0.1 Introduction

SpaceTrooper is an R/Bioconductor package for Quality Control (QC) of imaging-based spatial transcriptomics and proteomics data. It provides multi-platform data harmonization, cell-level QC, and visualization utilities. The package leverages SpatialExperiment objects to support data from CosMx, Xenium, and MERFISH technologies.

0.2 Quality Control pipeline

In this section, we show how to run the standard SpaceTrooper workflow on
Spatial Proteomics data obtained with CosMx Protein Assay, to perform QC by computing the Quality Score (QS).

For demonstration purposes, we showcase a small subset of the CosMx human tonsil dataset analyzed also in the paper.

0.2.1 Load example data

SpaceTrooper provides a reading function to load data into a SpatialExperiment object. For more details on how to correctly specify the inputs, please refer to the SpaceTrooper utilities vignette.

library(SpaceTrooper)
library(ggplot2)

protfolder <- system.file( "extdata", "S01_prot", package="SpaceTrooper")
spe <- readCosmxProteinSPE(protfolder, sampleName = "CosMx_Protein_Tonsil")

spe
## class: SpatialExperiment 
## dim: 69 1793 
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(1793): f344_c1 f344_c10 ... f344_c998 f344_c999
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id

0.2.2 Field of View (FOVs) visualization

In this section, we show how to plot each cell position onto the map of the Fields of View (FOVs), whose coordinates are provided only for CosMx technology.

IMPORTANT: we have noticed that according to CosMx version, there could be a misalignment of FOVs and cell centroids, that can be easily corrected with a single line of code. Therefore, it is crucial to generate this plot to check for any spatial shift, as such misalignment directly affects the QS computation.

# to check misalignment
plotCellsFovs(spe, size = 3, alpha = 0.7)

If any misalignment is observed, it can be corrected by adjusting the FoV coordinates directly. And this is exactly the case. Indeed, FoVs are shifted upward by one FoV height (which, for CosMx technology, corresponds to 4,256 pixels). You can correct this by subtracting that value from the FoVs y coordinates:

# code line for shift correction
metadata(spe)$fov_positions$y_global_px <- metadata(spe)$fov_positions$y_global_px - 4256

Rerun the FoV plot to check whether the shift was corrected.

# check shift correction
plotCellsFovs(spe, size = 3, alpha = 0.7)

Cell centroids, shown in dark red, now are all contained by FoV boundaries, hence FoVs and cell centroids are aligned after shift correction.

The dataset is a subset with just a single FoVs, whose number is displayed at the center.

When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their numbers.

0.2.3 Load polygons

In this section we show how to load polygons after SpatialExperiment creation, only for visualization purposes. They are stored as an sf object within colData.

This step is not mandatory for CosMx, because the pipeline can be executed even without them.

# polygon loading
spe <- readAndAddPolygonsToSPE(spe, boundariesType="csv")

Please pay attention to any warnings. Cell polygons with fewer than four vertices cannot be handled by the geometry packages used by SpaceTrooper. Therefore, the corresponding cells are discarded from the SpatialExperiment object.

0.2.4 Add QC metrics

The following sections will work the same way for all types of data.

The spatialPerCellQC function computes additional metrics per each cell, that are saved inside the SpatialExperimentand accessible through colData(spe). It is mandatory to run this function before computing QS.

The negProbList parameter is, by default, a vector containing all control probe patterns encountered so far across the supported technologies. Because these patterns continue to evolve, some may not yet be included. If you find that your control probe patterns are missing from the default list, you can define a custom vector, as shown below.

By default, the function automatically removes 0 count cells, but this can be handled with the rmZeros parameter.

spe <- spatialPerCellQC(spe, rmZeros=TRUE,
        negProbList=c("Ms IgG1", "Rb IgG"))

colnames(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "polygons"                 "sum"                     
## [61] "detected"                 "subsets_Ms IgG1_sum"     
## [63] "subsets_Ms IgG1_detected" "subsets_Ms IgG1_percent" 
## [65] "subsets_Rb IgG_sum"       "subsets_Rb IgG_detected" 
## [67] "subsets_Rb IgG_percent"   "total"                   
## [69] "control_sum"              "control_detected"        
## [71] "target_sum"               "target_detected"         
## [73] "CenterX_global_px"        "CenterY_global_px"       
## [75] "ctrl_total_ratio"         "log2Ctrl_total_ratio"    
## [77] "CenterX_global_um"        "CenterY_global_um"       
## [79] "dist_border_x"            "dist_border_y"           
## [81] "dist_border"              "log2AspectRatio"         
## [83] "SignalDensity"            "log2SignalDensity"

Several metrics are added, both derived from protein assay and cell morphology. Some of them are directly used to compute QS:

  • log2SignalDensity: log2-transformed ratio between total protein intensity per cell and cell area in µm²
  • Area_um: cell area in µm²;
  • log2Ctrl_total_ratio: log2-transformed ratio between total negative control protein intensity and total total protein intensity per cell;
  • log2AspectRatio: log2-transformed ratio between cell maximum length along the x dimension and cell maximum length along the y dimension (pixels). This metric is taken in absolute value and combined with dist_border to consider only cells within 50 pixels from the nearest FoV border (only for CosMx technology).

For a better detailed explanation of the other metrics, please refer to the SpaceTrooper utilities vignette.

0.2.5 Compute Quality Score

computeQCScore function calculates QS per each cell. QS combines several metrics into a formula: log2SignalDensity, corresponding to signal density, Area_um, i.e. cell size, log2Ctrl_total_ratio, namely background signal and log2AspectRatio combined with dist_border, which jointly correspond to the border effect (only for CosMx technology).

glmnet package is used to estimate the coefficients of the formula, resulting in a robust score that captures low-quality cells. During model training, the selected cells are used as good or bad examples to learn how each term in the formula contributes to cell quality.

IMPORTANT: please, pay attention to any warning. If a term has too few bad examples (fewer than 0.1% of the total number of cells), it is excluded from the formula and therefore not used in the QS computation.

The QS (stored as QC_score in coldata) ranges from 0 to 1, with 0 meaning low-quality and 1 high-quality.

We are setting a seed to ensure reproducibility in this tutorial, because there are stochastic processes underlying QS computation.

set.seed(1713)

spe <- computeQCScore(spe)

format(summary(spe$QC_score), scientific=FALSE, digits = 4)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## "0.0000" "0.6901" "0.8816" "0.7753" "0.9519" "0.9975"

In this case, all the terms were used as no warning appeared.

Subsequently, it is possible also to assess which cells have a QS lower than a certain threshold (default is 0.5) with the following function. It creates a new column called low_qcscore inside of colData.

spe <- computeQCScoreFlags(spe, qsThreshold=0.5)

table(spe$low_qcscore)
## 
## FALSE  TRUE 
##  1519   274

Using this threshold, 274 cells are flagged as low-quality. We do not suggest a fixed default threshold, but it is advisable to check the QS distribution before setting any threshold.

0.2.6 Data visualization

SpaceTrooper comes with several functions to view cells and metrics.

To view the distribution of whatever quantitative metric, plotMetricHist comes in handy.

# view quantitative metric distribution
plotMetricHist(spe, metric = "QC_score")

The QS distribution exhibits a left tail starting around 0.75.

Cell visualization can be obtained by using either centroids (recommended when the dataset has a large number of cells) or polygons.

plotCentroids plots cell centroids that can be colored by a certain metric contained in the colData slot, by using the colour_by parameter.

Additionally, if you have a palette column in colData, containing colors for each cell, it can be given to palette parameter, so that it automatically matches the column passed in colour_by. As an example, we are using the cell types obtained as described in the paper with their own color palette.

labf <- system.file(file.path("extdata", "S01_prot",
                            "labels_tiny.tsv"), package="SpaceTrooper")
labs <- read.table(file=labf, sep="\t", header=TRUE, comment.char = "")

spe$labels <- as.factor(labs[match(spe$cell_id, labs$cell_id),]$label)
spe$labels_colors <- as.factor(labs[match(spe$cell_id, labs$cell_id),]$lab_color)

plotCentroids(spe, colourBy="labels", size=3, palette="labels_colors")

Cell centroids colored by cell types allow to view the spatial distribution of cell populations in the sample.

When possible, polygon visualization gives a better overview of the cells’ spatial organization and morphological characteristics. Polygons are stored as an sf object within colData, so they can be viewed using standard ggplot2 functions. For CosMx technology, the polygons must be explicitly loaded as described in Load polygons. SpaceTrooper provides plotPolygons function that works just like plotCentroids but takes polygons instead of centroids.

plotPolygons(spe, colourBy="log2SignalDensity")

plotPolygons(spe, colourBy="Area_um")

plotPolygons(spe, colourBy="log2Ctrl_total_ratio")

plotPolygons(spe, colourBy="log2AspectRatio")

We can see in yellow and darkviolet that there are few cells with extreme values of either log2SignalDensity, Area_um, log2Ctrl_total_ratio and log2AspectRatio.

Since all plotting functions are based on ggplot2, you can easily customize the graphical outputs by adding standard ggplot2 components.

plotPolygons(spe, colourBy="QC_score") + 
    scale_fill_viridis_c(option = "plasma")

We can see that the QS is able to detect both the aspects highlighted by log2SignalDensity,Area_um, log2Ctrl_total_ratio or log2AspectRatio. Cells that showed either lower signal density, bigger size, higher background signal or border effect also display low QS (darker color).

It’s up to the user to choose an appropriate threshold to flag cells according to the observed QS distribution.

plotPolygons(spe, colourBy="low_qcscore") + 
    scale_fill_manual(values=c("TRUE"="red", "FALSE" = "#c0c8cf"))

You can reruncomputeQCScoreFlags to check how many cells would be flagged using another threshold.

spe <- computeQCScoreFlags(spe, qsThreshold=0.75)

table(spe$low_qcscore)
## 
## FALSE  TRUE 
##  1266   527

Using this threshold, 527 cells are flagged as low-quality.

plotPolygons(spe, colourBy="low_qcscore") + 
    scale_fill_manual(values=c("TRUE"="red", "FALSE" = "#c0c8cf"))

The threshold is more stringent and more cells are flagged compared to the previous one. It’s up to you to choose a threshold that better suits your analysis needs.

0.3 Conclusion

In this vignette, we explored the main functionalities of the SpaceTrooper package for imaging-based spatial omics data QC.

For further insights on SpaceTrooper package usage, please refer to the SpaceTrooper utilities vignette.

Main steps shown are:

  • data loading: CosMx Protein Assay
  • Quality Control:
    • per-cell QC metrics
    • Quality Score: a score combining signal density, cell size, background noise and border effect
  • visualization:
    • centroids: with ggplot2
    • polygons: sf + ggplot2

0.4 Session Information

sessionInfo()
## R Under development (unstable) (2026-01-15 r89304)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_4.0.1               SpaceTrooper_1.1.4         
##  [3] SpatialExperiment_1.21.0    SingleCellExperiment_1.33.0
##  [5] SummarizedExperiment_1.41.0 Biobase_2.71.0             
##  [7] GenomicRanges_1.63.1        Seqinfo_1.1.0              
##  [9] IRanges_2.45.0              S4Vectors_0.49.0           
## [11] BiocGenerics_0.57.0         generics_0.1.4             
## [13] MatrixGenerics_1.23.0       matrixStats_1.5.0          
## [15] BiocStyle_2.39.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3        jsonlite_2.0.0           
##   [3] shape_1.4.6.1             magrittr_2.0.4           
##   [5] ggbeeswarm_0.7.3          magick_2.9.0             
##   [7] farver_2.1.2              rmarkdown_2.30           
##   [9] vctrs_0.7.1               DelayedMatrixStats_1.33.0
##  [11] tinytex_0.58              rstatix_0.7.3            
##  [13] htmltools_0.5.9           S4Arrays_1.11.1          
##  [15] BiocNeighbors_2.5.2       broom_1.0.11             
##  [17] Rhdf5lib_1.33.0           SparseArray_1.11.10      
##  [19] Formula_1.2-5             rhdf5_2.55.12            
##  [21] sass_0.4.10               KernSmooth_2.23-26       
##  [23] bslib_0.10.0              cachem_1.1.0             
##  [25] lifecycle_1.0.5           iterators_1.0.14         
##  [27] pkgconfig_2.0.3           rsvd_1.0.5               
##  [29] Matrix_1.7-4              R6_2.6.1                 
##  [31] fastmap_1.2.0             digest_0.6.39            
##  [33] scater_1.39.2             dqrng_0.4.1              
##  [35] irlba_2.3.5.1             ggpubr_0.6.2             
##  [37] beachmat_2.27.2           labeling_0.4.3           
##  [39] SpatialExperimentIO_1.3.0 abind_1.4-8              
##  [41] compiler_4.6.0            proxy_0.4-29             
##  [43] bit64_4.6.0-1             withr_3.0.2              
##  [45] S7_0.2.1                  backports_1.5.0          
##  [47] BiocParallel_1.45.0       carData_3.0-5            
##  [49] viridis_0.6.5             DBI_1.2.3                
##  [51] HDF5Array_1.39.0          R.utils_2.13.0           
##  [53] ggsignif_0.6.4            DelayedArray_0.37.0      
##  [55] rjson_0.2.23              classInt_0.4-11          
##  [57] tools_4.6.0               units_1.0-0              
##  [59] vipor_0.4.7               otel_0.2.0               
##  [61] beeswarm_0.4.0            R.oo_1.27.1              
##  [63] glue_1.8.0                h5mread_1.3.1            
##  [65] rhdf5filters_1.23.3       grid_4.6.0               
##  [67] sf_1.0-24                 gtable_0.3.6             
##  [69] R.methodsS3_1.8.2         class_7.3-23             
##  [71] tidyr_1.3.2               data.table_1.18.0        
##  [73] BiocSingular_1.27.1       ScaledMatrix_1.19.0      
##  [75] car_3.1-3                 XVector_0.51.0           
##  [77] ggrepel_0.9.6             foreach_1.5.2            
##  [79] pillar_1.11.1             limma_3.67.0             
##  [81] robustbase_0.99-6         splines_4.6.0            
##  [83] dplyr_1.1.4               lattice_0.22-7           
##  [85] survival_3.8-6            bit_4.6.0                
##  [87] tidyselect_1.2.1          locfit_1.5-9.12          
##  [89] scuttle_1.21.0            sfheaders_0.4.5          
##  [91] knitr_1.51                gridExtra_2.3            
##  [93] bookdown_0.46             edgeR_4.9.2              
##  [95] xfun_0.56                 statmod_1.5.1            
##  [97] DropletUtils_1.31.0       DEoptimR_1.1-4           
##  [99] yaml_2.3.12               evaluate_1.0.5           
## [101] codetools_0.2-20          tibble_3.3.1             
## [103] BiocManager_1.30.27       cli_3.6.5                
## [105] arrow_23.0.0              jquerylib_0.1.4          
## [107] dichromat_2.0-0.1         Rcpp_1.1.1               
## [109] parallel_4.6.0            assertthat_0.2.1         
## [111] sparseMatrixStats_1.23.0  glmnet_4.1-10            
## [113] viridisLite_0.4.2         scales_1.4.0             
## [115] e1071_1.7-17              purrr_1.2.1              
## [117] rlang_1.1.7               cowplot_1.2.0