This vignette introduces the SpaceTrooper package for preprocessing and
quality control of imaging-based spatial omics data, using the CosMx
Protein assay as an example.
To install SpaceTrooper, use the following commands:
# Install BiocManager if not already installed, then install SpaceTrooper
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("drighelli/SpaceTrooper")
In this section, we load data from various platforms using the package’s
functions. The goal is to provide a uniform SpatialExperiment object across
all technologies, allowing for consistent QC analysis.
The reading functions in SpaceTrooper compute missing metrics as needed and
can also load cell polygons through the keep_polygons argument, which stores
them in the colData of the SpatialExperiment.
# Load the SpaceTrooper library
library(SpaceTrooper)
# Load CosMx Protein data into a SpatialExperiment object (SPE)
protfolder <- system.file("extdata", "S01_prot", package="SpaceTrooper")
(spe <- readCosmxProteinSPE(protfolder))
## class: SpatialExperiment
## dim: 69 745
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(745): f200_c10 f200_c100 ... f201_c96 f201_c98
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id
colData(spe)
## DataFrame with 745 rows and 58 columns
## fov cellID Area AspectRatio Width Height
## <integer> <integer> <integer> <numeric> <integer> <integer>
## f200_c10 200 10 1596 0.55 58 32
## f200_c100 200 100 5855 0.92 101 93
## f200_c102 200 102 9468 0.89 106 119
## f200_c104 200 104 4888 0.84 90 76
## f200_c106 200 106 5757 0.74 106 78
## ... ... ... ... ... ... ...
## f201_c90 201 90 4425 0.93 83 77
## f201_c92 201 92 9563 0.83 108 130
## f201_c94 201 94 5388 0.91 92 84
## f201_c96 201 96 5587 0.79 76 96
## f201_c98 201 98 76154 0.81 322 396
## Mean.PanCK Max.PanCK Mean.CD68 Max.CD68 Mean.Membrane Max.Membrane
## <integer> <integer> <integer> <integer> <integer> <integer>
## f200_c10 896 1184 350 604 1475 2776
## f200_c100 809 1120 320 636 1918 4380
## f200_c102 556 1948 246 5816 375 3108
## f200_c104 694 7348 272 472 1264 3212
## f200_c106 612 1216 338 1548 527 2248
## ... ... ... ... ... ... ...
## f201_c90 1226 1516 499 696 653 1216
## f201_c92 570 1172 223 404 261 880
## f201_c94 484 1088 208 400 506 1516
## f201_c96 903 1564 395 4584 688 3412
## f201_c98 1047 2608 423 4552 801 3540
## Mean.CD45 Max.CD45 Mean.DAPI Max.DAPI SplitRatioToLocal NucArea
## <integer> <integer> <integer> <integer> <numeric> <integer>
## f200_c10 14940 31860 4901 7524 0.19 1292
## f200_c100 15531 33192 5157 8604 0.00 4696
## f200_c102 284 4976 1710 5544 0.00 3704
## f200_c104 12480 35720 3399 5944 0.00 3032
## f200_c106 268 668 202 320 0.00 0
## ... ... ... ... ... ... ...
## f201_c90 389 808 321 380 0 0
## f201_c92 211 532 172 296 0 0
## f201_c94 241 476 148 228 0 0
## f201_c96 357 5088 964 3736 0 1936
## f201_c98 373 4060 309 2580 0 1508
## NucAspectRatio Circularity Eccentricity Perimeter Solidity
## <numeric> <numeric> <numeric> <integer> <numeric>
## f200_c10 0.67 3.87 0.66 72 22.17
## f200_c100 0.98 0.93 0.56 282 20.76
## f200_c102 0.76 1.02 0.67 342 27.68
## f200_c104 0.97 0.92 0.80 259 18.87
## f200_c106 0.00 0.92 0.63 281 20.49
## ... ... ... ... ... ...
## f201_c90 0.00 0.95 0.69 242 18.29
## f201_c92 0.00 0.92 0.70 361 26.49
## f201_c94 0.00 0.96 0.72 265 20.33
## f201_c96 0.69 1.00 0.68 265 21.08
## f201_c98 0.80 0.74 0.72 1135 67.10
## cell_id X version dualfiles Run_name
## <character> <integer> <character> <character> <character>
## f200_c10 f200_c10 1 v6 ? Run0
## f200_c100 f200_c100 1 v6 ? Run0
## f200_c102 f200_c102 1 v6 ? Run0
## f200_c104 f200_c104 1 v6 ? Run0
## f200_c106 f200_c106 1 v6 ? Run0
## ... ... ... ... ... ...
## f201_c90 f201_c90 1 v6 ? Run0
## f201_c92 f201_c92 1 v6 ? Run0
## f201_c94 f201_c94 1 v6 ? Run0
## f201_c96 f201_c96 1 v6 ? Run0
## f201_c98 f201_c98 1 v6 ? Run0
## Run_Tissue_name ISH.concentration Dash tissue Panel
## <character> <character> <character> <character> <character>
## f200_c10 S0 1nM PILOT tissue WTx
## f200_c100 S0 1nM PILOT tissue WTx
## f200_c102 S0 1nM PILOT tissue WTx
## f200_c104 S0 1nM PILOT tissue WTx
## f200_c106 S0 1nM PILOT tissue WTx
## ... ... ... ... ... ...
## f201_c90 S0 1nM PILOT tissue WTx
## f201_c92 S0 1nM PILOT tissue WTx
## f201_c94 S0 1nM PILOT tissue WTx
## f201_c96 S0 1nM PILOT tissue WTx
## f201_c98 S0 1nM PILOT tissue WTx
## assay_type slide_ID median_RNA RNA_quantile_0.75 RNA_quantile_0.8
## <character> <integer> <numeric> <numeric> <numeric>
## f200_c10 protein 1 24859.4 141202 249217
## f200_c100 protein 1 24859.4 141202 249217
## f200_c102 protein 1 24859.4 141202 249217
## f200_c104 protein 1 24859.4 141202 249217
## f200_c106 protein 1 24859.4 141202 249217
## ... ... ... ... ... ...
## f201_c90 protein 1 11016.9 45561.7 56296.9
## f201_c92 protein 1 11016.9 45561.7 56296.9
## f201_c94 protein 1 11016.9 45561.7 56296.9
## f201_c96 protein 1 11016.9 45561.7 56296.9
## f201_c98 protein 1 11016.9 45561.7 56296.9
## RNA_quantile_0.85 RNA_quantile_0.9 RNA_quantile_0.95
## <numeric> <numeric> <numeric>
## f200_c10 439292 582926 1175591
## f200_c100 439292 582926 1175591
## f200_c102 439292 582926 1175591
## f200_c104 439292 582926 1175591
## f200_c106 439292 582926 1175591
## ... ... ... ...
## f201_c90 140621 208723 361311
## f201_c92 140621 208723 361311
## f201_c94 140621 208723 361311
## f201_c96 140621 208723 361311
## f201_c98 140621 208723 361311
## RNA_quantile_0.99 nCount_RNA nFeature_RNA median_negprobes
## <numeric> <numeric> <integer> <numeric>
## f200_c10 2758078 36035.65 67 6432.11
## f200_c100 2758078 39097.62 67 6432.11
## f200_c102 2758078 9059.81 67 6432.11
## f200_c104 2758078 30541.70 67 6432.11
## f200_c106 2758078 8988.49 67 6432.11
## ... ... ... ... ...
## f201_c90 1033930 6648.66 67 4190.43
## f201_c92 1033930 6344.07 67 4190.43
## f201_c94 1033930 7575.43 67 4190.43
## f201_c96 1033930 7713.20 67 4190.43
## f201_c98 1033930 6502.25 67 4190.43
## negprobes_quantile_0.75 negprobes_quantile_0.8
## <numeric> <numeric>
## f200_c10 7810.51 8086.19
## f200_c100 7810.51 8086.19
## f200_c102 7810.51 8086.19
## f200_c104 7810.51 8086.19
## f200_c106 7810.51 8086.19
## ... ... ...
## f201_c90 5042.51 5212.92
## f201_c92 5042.51 5212.92
## f201_c94 5042.51 5212.92
## f201_c96 5042.51 5212.92
## f201_c98 5042.51 5212.92
## negprobes_quantile_0.85 negprobes_quantile_0.9
## <numeric> <numeric>
## f200_c10 8361.87 8637.55
## f200_c100 8361.87 8637.55
## f200_c102 8361.87 8637.55
## f200_c104 8361.87 8637.55
## f200_c106 8361.87 8637.55
## ... ... ...
## f201_c90 5383.34 5553.76
## f201_c92 5383.34 5553.76
## f201_c94 5383.34 5553.76
## f201_c96 5383.34 5553.76
## f201_c98 5383.34 5553.76
## negprobes_quantile_0.95 negprobes_quantile_0.99 nCount_negprobes
## <numeric> <numeric> <numeric>
## f200_c10 8913.23 9133.77 15.20
## f200_c100 8913.23 9133.77 14.72
## f200_c102 8913.23 9133.77 15.70
## f200_c104 8913.23 9133.77 12.46
## f200_c106 8913.23 9133.77 20.58
## ... ... ... ...
## f201_c90 5724.17 5860.51 23.49
## f201_c92 5724.17 5860.51 14.87
## f201_c94 5724.17 5860.51 13.29
## f201_c96 5724.17 5860.51 21.39
## f201_c98 5724.17 5860.51 21.85
## nFeature_negprobes Area.um2 CenterX_local_px CenterY_local_px
## <integer> <numeric> <integer> <integer>
## f200_c10 2 22.9824 1365 16
## f200_c100 2 84.3120 762 204
## f200_c102 2 136.3392 2837 205
## f200_c104 2 70.3872 1443 202
## f200_c106 2 82.9008 3947 191
## ... ... ... ... ...
## f201_c90 2 63.7200 3576 787
## f201_c92 2 137.7072 128 837
## f201_c94 2 77.5872 2364 844
## f201_c96 2 80.4528 2772 870
## f201_c98 2 1096.6176 1923 1048
## cell sample_id
## <character> <character>
## f200_c10 c_1_200_10 sample01
## f200_c100 c_1_200_100 sample01
## f200_c102 c_1_200_102 sample01
## f200_c104 c_1_200_104 sample01
## f200_c106 c_1_200_106 sample01
## ... ... ...
## f201_c90 c_1_201_90 sample01
## f201_c92 c_1_201_92 sample01
## f201_c94 c_1_201_94 sample01
## f201_c96 c_1_201_96 sample01
## f201_c98 c_1_201_98 sample01
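The printed colData also shows how pixel-based and micron-based areas relate in this dataset: for every row shown, Area.um2 equals Area multiplied by 0.12², which is consistent with a pixel size of 0.12 µm. The check below only uses values copied from the output above; assuming the same 0.12 µm pixel size for other CosMx runs is not something this vignette states.

```r
# Values copied from the colData(spe) printout above
area_px  <- c(1596, 5855, 9468, 4888, 5757, 76154)
area_um2 <- c(22.9824, 84.3120, 136.3392, 70.3872, 82.9008, 1096.6176)

# Pixel size in microns, inferred from the ratio of the two columns (assumption)
px_size_um <- 0.12

# An area in px^2 converts to um^2 by multiplying by the squared pixel size
predicted_um2 <- area_px * px_size_um^2
print(all.equal(predicted_um2, area_um2))
```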
The package offers several functions for spatial data analysis, including quality control and visualization.
This tutorial focuses on CosMx protein data, which provides Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx.
Although this has not been tested, the same approach should extend to Akoya
CODEX data, provided that a SpatialExperiment object is created.
Polygons can be loaded later if needed.
The plotCellsFovs function shows a map of the FoVs within an experiment.
This plot is specific to CosMx data and uses cell centroids.
Please keep in mind that this specific experiment had misaligned fov_positions
and cell centroid positions.
An alignment approach can be found at the end of the scripts/datacreation.R
file.
# Plot the cells within their respective Field of Views (FOVs)
plotCellsFovs(spe)
Because the dataset is a small subset of the original experiment, containing only FoVs 200 and 201, the plot shows the FoV identifiers in black and the cell centroids in purple.
When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their identifiers.
The spatialPerCellQC function, inspired by scater::addPerCellQC, computes
additional metrics for each cell in the SpatialExperiment. It also allows for
the detection of negative control probes, which is crucial for QC.
By default, it removes cells with zero counts; this behaviour is controlled by
the rmZeros argument.
Here, for transparency, we specify the negProbList for CosMx protein assays,
although the function already includes a set of negative probes for the most
commonly used panels across multiple technologies.
Note that, although the same approach can be applied to CODEX data, no list of
negative probes is provided for this technology, so the user needs to
specify them.
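To make the new columns easier to interpret, here is a base-R sketch of the kind of per-cell metrics that spatialPerCellQC adds (sum, detected, control_sum, target_sum, ctrl_total_ratio) on a made-up protein-by-cell matrix. It is not the package's implementation: the toy counts are invented, and using sum as the denominator of ctrl_total_ratio is an assumption.

```r
# Toy protein-by-cell count matrix (rows = proteins, columns = cells); values are made up
counts <- matrix(
  c(10, 0, 5,   # CD45
    20, 0, 2,   # PanCK
     1, 0, 3,   # Ms IgG1 (negative control)
     0, 0, 1),  # Rb IgG  (negative control)
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD45", "PanCK", "Ms IgG1", "Rb IgG"), c("c1", "c2", "c3"))
)
neg_probes <- c("Ms IgG1", "Rb IgG")

# Drop cells with zero total counts (the default behaviour described for rmZeros)
counts <- counts[, colSums(counts) > 0, drop = FALSE]

is_ctrl <- rownames(counts) %in% neg_probes
qc <- data.frame(
  sum         = colSums(counts),                          # total counts per cell
  detected    = colSums(counts > 0),                      # features with non-zero counts
  control_sum = colSums(counts[is_ctrl, , drop = FALSE]), # negative-probe counts
  target_sum  = colSums(counts[!is_ctrl, , drop = FALSE]) # target-protein counts
)
# Fraction of each cell's counts coming from negative probes (assumed definition)
qc$ctrl_total_ratio <- qc$control_sum / qc$sum
print(qc)
```

Cell c2 has no counts at all, so it is dropped before the metrics are computed.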
# Perform per-cell quality control checks
spe <- spatialPerCellQC(spe, negProbList=c("Ms IgG1", "Rb IgG"))
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea"
You can investigate individual metrics by viewing their histograms. For
outliers, use the useFences argument to display the fences computed by
computeSpatialOutlier (see below).
# Plot a histogram of counts (sum)
plotMetricHist(spe, metric="sum")
# Plot a histogram of cell areas (Area_um)
plotMetricHist(spe, metric="Area_um")
# Plot a histogram of the log2 ratio between counts and cell area in square microns
plotMetricHist(spe, metric="log2CountArea")
# Plot a histogram of the log2 ratio between negative probe counts and total
# counts in each cell
plotMetricHist(spe, metric="log2Ctrl_total_ratio")
These plots show, respectively, the distributions of the total counts (sum),
the cell area in square microns (Area_um), the log2 ratio between the counts
and the area of each cell (log2CountArea), and the log2 ratio between the
negative probe counts and the total counts of each cell
(log2Ctrl_total_ratio).
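As a worked example of the two derived metrics, the sketch below computes them for three hypothetical cells, assuming CountArea is total counts divided by area in square microns and ctrl_total_ratio is negative-probe counts divided by total counts, each then log2-transformed. The numbers are invented, and how SpaceTrooper handles a zero ratio (log2(0) is -Inf in R) is not shown here.

```r
# Hypothetical per-cell values (not taken from the dataset)
sum_counts  <- c(400, 1200, 50)
area_um     <- c(80, 60, 100)
control_sum <- c(4, 3, 10)

# Counts per square micron, then log2-transformed
count_area     <- sum_counts / area_um
log2_countarea <- log2(count_area)

# Fraction of counts coming from negative probes, then log2-transformed
ctrl_total_ratio      <- control_sum / sum_counts
log2_ctrl_total_ratio <- log2(ctrl_total_ratio)

print(round(data.frame(count_area, log2_countarea,
                       ctrl_total_ratio, log2_ctrl_total_ratio), 3))
```

The third cell is both sparse for its size (log2CountArea of -1) and has the highest negative-probe fraction, the pattern these two histograms are meant to expose.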
Spatial outlier detection is another critical step in QC. While the quality score addresses some metrics, other outlier detection methods may be needed.
The computeSpatialOutlier function computes the medcouple statistic on a
specified metric (computeBy argument).
The medcouple is designed for skewed distributions, and the function issues a
warning when this requirement is not satisfied.
It can also use scuttle::isOutlier for symmetric distributions.
The method argument supports mc, scuttle, or both.
This outlier detection approach can help decide whether, and which, cells should be discarded based on a single metric.
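To illustrate how the two approaches differ, here is a base-R sketch of the medcouple, the skewness-adjusted boxplot fences commonly built on it, and MAD-based fences in the spirit of scuttle::isOutlier. This is not SpaceTrooper's implementation: the medcouple below is a naive O(n²) version that simply skips pairs tied at the median, the fence constants (1.5, exp(-4·MC), exp(3·MC)) are those of the standard adjusted boxplot, and whether computeSpatialOutlier uses the same constants is an assumption.

```r
# Naive O(n^2) medcouple. Simplification: pairs tied at the median
# (xi == xj) are skipped instead of using the special tie kernel.
medcouple_naive <- function(x) {
  x <- x[is.finite(x)]
  m <- median(x)
  upper <- x[x >= m]
  lower <- x[x <= m]
  # All pairs with xi from the upper half and xj from the lower half
  xi <- rep(upper, each = length(lower))
  xj <- rep(lower, times = length(upper))
  keep <- xi != xj
  h <- ((xi[keep] - m) - (m - xj[keep])) / (xi[keep] - xj[keep])
  median(h)
}

# Skewness-adjusted boxplot fences based on the medcouple
mc_fences <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  mc <- medcouple_naive(x)
  if (mc >= 0) {
    c(lower = q[1] - coef * exp(-4 * mc) * iqr,
      upper = q[2] + coef * exp(3 * mc) * iqr)
  } else {
    c(lower = q[1] - coef * exp(-3 * mc) * iqr,
      upper = q[2] + coef * exp(4 * mc) * iqr)
  }
}

# MAD-based fences in the spirit of scuttle::isOutlier (default nmads = 3)
mad_fences <- function(x, nmads = 3) {
  med <- median(x)
  dev <- mad(x)  # scaled by 1.4826 by default
  c(lower = med - nmads * dev, upper = med + nmads * dev)
}

x <- c(1:20, 200)  # toy right-skewed metric with one extreme value
mc_fences(x)
mad_fences(x)
# Values falling outside the medcouple-based fences
x[x < mc_fences(x)["lower"] | x > mc_fences(x)["upper"]]
```

A positive medcouple stretches the upper fence and tightens the lower one, which is why the medcouple and MAD approaches can flag different numbers of cells on skewed metrics such as Area_um.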
# Identify spatial outliers based on cell area (Area_um)
spe <- computeSpatialOutlier(spe, computeBy="Area_um", method="both")
# Identify spatial outliers based on mean DAPI intensity
spe <- computeSpatialOutlier(spe, computeBy="Mean.DAPI", method="both")
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc"
After computing outliers with the computeSpatialOutlier function, we can also
visualize the fences used to filter the cells.
# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_mc")
# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_sc")
# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_mc")
# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_sc")
These plots show the fences computed with the medcouple and scuttle approaches, so that we can directly inspect the differences between the methods and how many outliers each one detects.
If desired, these fences can already be used to remove the detected outliers.
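Removing outliers amounts to combining the per-metric flags into one logical vector and keeping the cells that are not flagged by any metric. The sketch below uses hypothetical flags in plain R; applying the same vector to the SPE as spe[, keep] (standard SummarizedExperiment column subsetting) assumes the outlier columns such as Area_um_outlier_mc behave as logical vectors, which we have not verified here.

```r
# Hypothetical outlier flags for five cells, one vector per metric
area_outlier <- c(FALSE, TRUE, FALSE, FALSE, FALSE)
dapi_outlier <- c(FALSE, FALSE, FALSE, TRUE, FALSE)
cell_ids <- paste0("cell", 1:5)

# Keep a cell only if it is not an outlier for any metric
keep <- !(area_outlier | dapi_outlier)
print(cell_ids[keep])
# With a SpatialExperiment, the same logical vector could subset cells: spe[, keep]
```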
Next, we use computeQCScore to calculate a quality score based on previously
computed metrics.
The quality score combines the counts relative to
cell area, the aspect ratio of each cell, and its
distance from the FoV border (the latter is used only for CosMx data).
See the Details section of help(computeQCScore) for more information.
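The border-distance component can be illustrated with a small sketch, assuming that dist_border_x, dist_border_y and dist_border are the distances from each cell centroid to the nearest vertical edge, the nearest horizontal edge, and the nearest edge overall of its FoV. This reading comes from the column names, not from the package documentation, and the FoV bounds and centroids below are hypothetical.

```r
# Hypothetical FoV bounds (in microns) and cell centroids; not taken from the dataset
fov_xmin <- 0; fov_xmax <- 500
fov_ymin <- 0; fov_ymax <- 400
cx <- c(10, 250, 495)
cy <- c(200, 200, 390)

# Distance to the nearest vertical and horizontal FoV edge
dist_border_x <- pmin(cx - fov_xmin, fov_xmax - cx)
dist_border_y <- pmin(cy - fov_ymin, fov_ymax - cy)
# Overall distance to the closest edge
dist_border <- pmin(dist_border_x, dist_border_y)
print(dist_border)
```

Cells one and three sit close to an edge, where segmentation is more likely to cut them, which is why a small border distance can lower a cell's score.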
# Calculate quality scores for each cell
spe <- computeQCScore(spe)
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc" "QC_score"
Logical filters can then be computed using computeQCScoreFlags, which requires
thresholds for various metrics. Currently, the function considers:
* Quality Score (qsThreshold): cells with scores below this threshold
  (default 0.5) are flagged for exclusion. When the useQSQuantiles argument is
  set to TRUE, this value is interpreted as a quantile instead.
* Quality Score Quantiles (useQSQuantiles): option to filter based on
  quantiles (default FALSE).
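The difference between an absolute threshold and a quantile-based one can be sketched with a hypothetical helper, flag_low_score, which is not part of SpaceTrooper. It assumes that, in quantile mode, the threshold is used as a probability passed to quantile() and cells scoring below that quantile are flagged.

```r
# Hypothetical helper, not part of SpaceTrooper
flag_low_score <- function(score, threshold = 0.5, use_quantiles = FALSE) {
  # Absolute cutoff, or the score value at the requested quantile
  cutoff <- if (use_quantiles) quantile(score, probs = threshold, names = FALSE) else threshold
  score < cutoff
}

qc_score <- c(0.05, 0.30, 0.55, 0.70, 0.90)
flag_low_score(qc_score, 0.5)                        # absolute threshold of 0.5
flag_low_score(qc_score, 0.2, use_quantiles = TRUE)  # lowest 20% of scores
```

An absolute threshold keeps its meaning across samples, while a quantile always flags roughly the same fraction of cells regardless of overall quality.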
# Compute flags to identify cells for filtering
spe <- computeQCScoreFlags(spe, qsThreshold=0.5)
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc" "QC_score"
## [89] "low_qcscore"
table(spe$low_qcscore)
##
## FALSE TRUE
## 688 57
We detected 57 cells to be removed.
For other metrics, such as the total counts and the negative probe ratio,
the function computeThresholdFlags considers:
* Total Counts (totalThreshold): minimum count threshold (default 0).
* Negative Probe Ratio (ctrlTotRatioThreshold): threshold on the ratio of
  negative probe counts to total counts (default 0.1).
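A hypothetical helper, flag_thresholds (not part of SpaceTrooper), sketches how two such thresholds could be combined into a single flag. The comparison directions are assumptions: cells are flagged when their total counts fall below the minimum or when their negative-probe ratio exceeds the ratio threshold.

```r
# Hypothetical helper; the comparison directions are assumptions
flag_thresholds <- function(total, ctrl_total_ratio,
                            total_threshold = 0, ratio_threshold = 0.1) {
  (total < total_threshold) | (ctrl_total_ratio > ratio_threshold)
}

total <- c(5, 500, 300)
ratio <- c(0.01, 0.02, 0.25)
flag_thresholds(total, ratio, total_threshold = 10, ratio_threshold = 0.1)
```

Here the first cell is flagged for low counts and the third for a high negative-probe ratio.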
spe <- computeThresholdFlags(spe, totalThreshold=0,
                             ctrlTotRatioThreshold=0.1)
table(spe$threshold_flags)
##
## FALSE
## 745
In this example, no cells are flagged for removal.
To better understand the quality score values, we now load the polygons, which give a better overview of the cells' characteristics.
We can load polygons and add them to the SPE object using the following functions.
Each technology has its own polygon-reading function (e.g. readPolygonsCosmx)
to standardize the loaded sf object and handle different file types.
# Read polygon data associated with cells in the SPE
# the polygon file path is stored in the spe metadata
pols <- readPolygonsCosmx(metadata(spe)$polygons)
# Add the polygon data to the SPE object
spe <- addPolygonsToSPE(spe, pols)
Once the polygons are stored in an sf object within colData, they can be
visualized using functions based on the ggplot2 library.
# Plot the polygons of the selected cells
plotPolygons(spe, bgColor="white")
Here, the cells are shown on a white background for better visibility.
# Plot polygons coloured by log2 aspect ratio and by cell area
plotPolygons(spe, colourBy="log2AspectRatio")
plotPolygons(spe, colourBy="Area_um")
The cells in yellow and dark violet are the few with extreme values of
log2AspectRatio and Area_um (cell area in square microns).
plotPolygons(spe, colourBy="QC_score")
plotPolygons(spe, colourBy="low_qcscore")
The quality score captures both aspects, highlighting the cells that lie mostly on the FoV border or have an unusual shape.
We always recommend being aware of the cell populations expected in the tissue under study before removing the detected cells.
The plotZoomFovsMap function allows you to visualize a map of the FoVs with
a zoom-in of selected FoVs, coloured by the colourBy argument.
plotZoomFovsMap(spe, fovs=c(201), colourBy="QC_score")
plotZoomFovsMap(spe, fovs=c(201), colourBy="low_qcscore")
The left panel shows the map of all the FoVs, and the right panel shows the polygons of the selected FoV (201), coloured by the quality score or by the low_qcscore flag. This gives a closer view of a specific tissue area within the whole experiment.
In this vignette, we explored the main functionalities of the SpaceTrooper
package for spatial data analysis.
Main steps shown are:
* data and polygon loading for CosMx Protein
* quality control:
+ outlier detection: medcouple and scuttle MAD
+ quality score: a score combining counts, cell area,
aspect ratio and distance from the FoV border
* visualization:
+ centroids: with ggplot2
+ polygons: sf + ggplot2
sessionInfo()
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SpaceTrooper_0.99.47 SpatialExperiment_1.19.1
## [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.2
## [5] Biobase_2.69.1 GenomicRanges_1.61.5
## [7] Seqinfo_0.99.2 IRanges_2.43.5
## [9] S4Vectors_0.47.4 BiocGenerics_0.55.4
## [11] generics_0.1.4 MatrixGenerics_1.21.0
## [13] matrixStats_1.5.0 BiocStyle_2.37.1
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3
## [3] rlang_1.1.6 magrittr_2.0.4
## [5] scater_1.37.0 e1071_1.7-16
## [7] compiler_4.5.1 DelayedMatrixStats_1.31.0
## [9] SpatialExperimentIO_1.1.0 sfheaders_0.4.4
## [11] vctrs_0.6.5 shape_1.4.6.1
## [13] pkgconfig_2.0.3 crayon_1.5.3
## [15] fastmap_1.2.0 backports_1.5.0
## [17] magick_2.9.0 XVector_0.49.1
## [19] labeling_0.4.3 scuttle_1.19.0
## [21] rmarkdown_2.30 ggbeeswarm_0.7.2
## [23] tinytex_0.57 purrr_1.1.0
## [25] bit_4.6.0 glmnet_4.1-10
## [27] xfun_0.53 cachem_1.1.0
## [29] beachmat_2.25.5 jsonlite_2.0.0
## [31] rhdf5filters_1.21.4 DelayedArray_0.35.3
## [33] Rhdf5lib_1.31.1 BiocParallel_1.43.4
## [35] irlba_2.3.5.1 broom_1.0.10
## [37] parallel_4.5.1 R6_2.6.1
## [39] bslib_0.9.0 RColorBrewer_1.1-3
## [41] limma_3.65.7 car_3.1-3
## [43] jquerylib_0.1.4 iterators_1.0.14
## [45] Rcpp_1.1.0 bookdown_0.45
## [47] assertthat_0.2.1 knitr_1.50
## [49] R.utils_2.13.0 splines_4.5.1
## [51] Matrix_1.7-4 tidyselect_1.2.1
## [53] viridis_0.6.5 dichromat_2.0-0.1
## [55] abind_1.4-8 yaml_2.3.10
## [57] codetools_0.2-20 lattice_0.22-7
## [59] tibble_3.3.0 withr_3.0.2
## [61] S7_0.2.0 evaluate_1.0.5
## [63] sf_1.0-21 survival_3.8-3
## [65] units_1.0-0 proxy_0.4-27
## [67] pillar_1.11.1 BiocManager_1.30.26
## [69] ggpubr_0.6.2 carData_3.0-5
## [71] KernSmooth_2.23-26 foreach_1.5.2
## [73] ggplot2_4.0.0 sparseMatrixStats_1.21.0
## [75] scales_1.4.0 class_7.3-23
## [77] glue_1.8.0 tools_4.5.1
## [79] BiocNeighbors_2.3.1 robustbase_0.99-6
## [81] data.table_1.17.8 ScaledMatrix_1.17.0
## [83] locfit_1.5-9.12 ggsignif_0.6.4
## [85] cowplot_1.2.0 rhdf5_2.53.6
## [87] grid_4.5.1 tidyr_1.3.1
## [89] DropletUtils_1.29.7 edgeR_4.7.6
## [91] beeswarm_0.4.0 BiocSingular_1.25.0
## [93] HDF5Array_1.37.0 vipor_0.4.7
## [95] rsvd_1.0.5 Formula_1.2-5
## [97] cli_3.6.5 viridisLite_0.4.2
## [99] S4Arrays_1.9.1 arrow_21.0.0.1
## [101] dplyr_1.1.4 DEoptimR_1.1-4
## [103] gtable_0.3.6 R.methodsS3_1.8.2
## [105] rstatix_0.7.3 sass_0.4.10
## [107] digest_0.6.37 classInt_0.4-11
## [109] ggrepel_0.9.6 SparseArray_1.9.1
## [111] dqrng_0.4.1 rjson_0.2.23
## [113] farver_2.1.2 htmltools_0.5.8.1
## [115] R.oo_1.27.1 lifecycle_1.0.4
## [117] h5mread_1.1.1 statmod_1.5.1
## [119] bit64_4.6.0-1