0.1 Introduction

SpaceTrooper is an R/Bioconductor package for Quality Control (QC) of imaging-based spatial transcriptomics and proteomics data. It provides multi-platform data harmonization, cell-level QC, and visualization utilities. The package leverages SpatialExperiment objects to support data from CosMx, Xenium, and MERFISH technologies.

0.2 Quality Control pipeline

In this section, we show how to run the standard SpaceTrooper workflow on Spatial Transcriptomics data, to perform QC by computing the Quality Score (QS).

For demonstration purposes, we showcase the CosMx_DBKero_Tiny dataset as the main tutorial example, a small subset of the CosMx 1k breast cancer dataset analyzed also in the paper.

0.2.1 Load example data

SpaceTrooper provides three different reading functions, one per each technology, that load data into a SpatialExperiment object. For more details on how to correctly specify the inputs, please refer to the SpaceTrooper utilities vignette.

library(SpaceTrooper)
library(ggplot2)

cosmxFolder <- system.file(file.path("extdata", "CosMx_DBKero_Tiny"), 
                        package="SpaceTrooper")
spe <- readCosmxSPE(cosmxFolder, sampleName="DBKero_CosMx")

spe
## class: SpatialExperiment 
## dim: 1010 905 
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(1010): RAMP2 CD83 ... NegPrb09 NegPrb10
## rowData names(0):
## colnames(905): f16_c1 f16_c10 ... f16_c98 f16_c99
## colData names(20): fov cellID ... sample_id cell_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id

In addition, we provide the code for loading Xenium, and MERFISH example data. Please, note that for these technologies the parameters are configured to automatically load polygons and compute missing metrics, so these steps don’t need to be run again. To proceed with these technologies, skip directly to Add QC metrics.

xeniumFolder <- system.file( "extdata", "Xenium_small", package="SpaceTrooper")
xen <- readXeniumSPE(xeniumFolder, computeMissingMetrics=TRUE,
                        keepPolygons=TRUE)

merfishFolder <- system.file("extdata", "Merfish_Tiny", package="SpaceTrooper")
mer <- readMerfishSPE(merfishFolder, boundariesType="parquet",
                        computeMissingMetrics=TRUE, keepPolygons=TRUE)

0.2.2 Field of View (FOVs) visualization

In this section, we show how to plot each cell position onto the map of the Fields of View (FOVs), whose coordinates are provided only for CosMx technology.

IMPORTANT: we have noticed that according to CosMx version, there could be a misalignment of FOVs and cell centroids, that can be easily corrected with a single line of code. Therefore, it is crucial to generate this plot to check for any spatial shift, as such misalignment directly affects the QS computation.

# to check misalignment
plotCellsFovs(spe, size = 3, alpha = 0.7)

The dataset is a subset with just a single FoV, whose number is displayed at the center. Cell centroids, shown in dark red, are all contained by FoV boundaries, thus no shift correction is needed here.

When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their numbers.

If any misalignment is observed, it can be corrected by adjusting the FoV coordinates directly. For instance, if the FoVs are shifted upward by one FoV height (which, for CosMx technology, corresponds to 4,256 pixels), you can correct this by subtracting that value from the FoVs y coordinates:

# code line for shift correction
# metadata(spe)$fov_positions$y_global_px <- metadata(spe)$fov_positions$y_global_px - 4256

0.2.3 Load polygons

In this section we show how to load polygons after SpatialExperiment creation, only for visualization purposes. They are stored as an sf object within colData.

This step is not mandatory for CosMx, because the pipeline can be executed even without them.

# polygon loading
spe <- readAndAddPolygonsToSPE(spe, boundariesType="csv")

Please pay attention to any warnings. Cell polygons with fewer than four vertices cannot be handled by the geometry packages used by SpaceTrooper. Therefore, the corresponding cells are discarded from the SpatialExperiment object.

0.2.4 Add QC metrics

The following sections will work the same way for all types of data.

The spatialPerCellQC function computes additional metrics per each cell, that are saved inside the SpatialExperimentand accessible through colData(spe). It is mandatory to run this function before computing QS.

The negProbList parameter is, by default, a vector containing all control probe patterns encountered so far across the supported technologies. Because these patterns continue to evolve, some may not yet be included. If you find that your control probe patterns are missing from the default list, you can define a custom vector, as shown below.

By default, the function automatically removes 0 count cells, but this can be handled with the rmZeros parameter.

spe <- spatialPerCellQC(spe, rmZeros=TRUE,
        negProbList=c("NegPrb", "Negative", "SystemControl"))
## Removing 1 cells with 0 counts!
colnames(colData(spe))
##  [1] "fov"                     "cellID"                 
##  [3] "Area"                    "AspectRatio"            
##  [5] "CenterX_local_px"        "CenterY_local_px"       
##  [7] "Width"                   "Height"                 
##  [9] "Mean.PanCK"              "Max.PanCK"              
## [11] "Mean.CD68"               "Max.CD68"               
## [13] "Mean.MembraneStain_B2M"  "Max.MembraneStain_B2M"  
## [15] "Mean.CD45"               "Max.CD45"               
## [17] "Mean.DAPI"               "Max.DAPI"               
## [19] "sample_id"               "cell_id"                
## [21] "polygons"                "sum"                    
## [23] "detected"                "subsets_NegPrb_sum"     
## [25] "subsets_NegPrb_detected" "subsets_NegPrb_percent" 
## [27] "total"                   "control_sum"            
## [29] "control_detected"        "target_sum"             
## [31] "target_detected"         "CenterX_global_px"      
## [33] "CenterY_global_px"       "ctrl_total_ratio"       
## [35] "log2Ctrl_total_ratio"    "CenterX_global_um"      
## [37] "CenterY_global_um"       "Area_um"                
## [39] "dist_border_x"           "dist_border_y"          
## [41] "dist_border"             "log2AspectRatio"        
## [43] "SignalDensity"           "log2SignalDensity"

Several metrics are added, both derived from expression assay and cell morphology. Some of them are directly used to compute QS:

  • log2SignalDensity: log2-transformed ratio between total probe counts per cell and cell area in µm² (or volume in µm³ for MERFISH technology);
  • Area_um: cell area in µm² (or volume in µm³ in the case of MERFISH technology);
  • log2Ctrl_total_ratio: log2-transformed ratio between total negative control probe counts and total probe count per cell;
  • log2AspectRatio: log2-transformed ratio between cell maximum length along the x dimension and cell maximum length along the y dimension (pixels). This metric is taken in absolute value and combined with dist_border to consider only cells within 50 pixels from the nearest FoV border (only for CosMx technology).

For a better detailed explanation of the other metrics, please refer to the SpaceTrooper utilities vignette.

0.2.5 Compute Quality Score

computeQCScore function calculates QS per each cell. QS combines several metrics into a formula: log2SignalDensity, corresponding to signal density, Area_um or volume, i.e. cell size, log2Ctrl_total_ratio, namely background signal and log2AspectRatio combined with dist_border, which jointly correspond to the border effect (only for CosMx technology).

glmnet package is used to estimate the coefficients of the formula, resulting in a robust score that captures low-quality cells. During model training, the selected cells are used as good or bad examples to learn how each term in the formula contributes to cell quality.

IMPORTANT: please, pay attention to any warning. If a term has too few bad examples (fewer than 0.1% of the total number of cells), it is excluded from the formula and therefore not used in the QS computation.

The QS (stored as QC_score in coldata) ranges from 0 to 1, with 0 meaning low-quality and 1 high-quality.

We are setting a seed to ensure reproducibility in this tutorial, because there are stochastic processes underlying QS computation.

set.seed(1713)

spe <- computeQCScore(spe)

summary(spe$QC_score)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.003429 0.786016 0.883948 0.798579 0.936674 0.988138

In this case, all the terms were used as no warning appeared.

Subsequently, it is possible also to assess which cells have a QS lower than a certain threshold (default is 0.5) with the following function. It creates a new column called low_qcscore inside of colData.

spe <- computeQCScoreFlags(spe, qsThreshold=0.5)

table(spe$low_qcscore)
## 
## FALSE  TRUE 
##   807    97

Using this threshold, 97 cells are flagged as low-quality. We do not suggest a fixed default threshold, but it is advisable to check the QS distribution before setting any threshold.

0.2.6 Data visualization

SpaceTrooper comes with several functions to view cells and metrics.

To view the distribution of whatever quantitative metric, plotMetricHist comes in handy.

# view quantitative metric distribution
plotMetricHist(spe, metric = "QC_score")

The QS distribution exhibits a left tail starting around 0.75.

Cell visualization can be obtained by using either centroids (recommended when the dataset has a large number of cells) or polygons.

plotCentroids plots cell centroids that can be colored by a certain metric contained in the colData slot, by using the colour_by parameter.

Additionally, if you have a palette column in colData, containing colors for each cell, it can be given to palette parameter, so that it automatically matches the column passed in colour_by. As an example, we are using the cell types obtained as described in the paper with their own color palette.

labf <- system.file(file.path("extdata", "CosMx_DBKero_Tiny",
                            "labels_tiny.tsv"), package="SpaceTrooper")
labs <- read.table(file=labf, sep="\t", header=TRUE)

spe$labels <- as.factor(labs[match(spe$cell_id, labs$cell_id),]$label)
spe$labels_colors <- as.factor(labs[match(spe$cell_id, labs$cell_id),]$lab_color)

plotCentroids(spe, colourBy="labels", size=3, palette="labels_colors")

Cell centroids colored by cell types allow to view the spatial distribution of cell populations in the sample.

When possible, polygon visualization gives a better overview of the cells’ spatial organization and morphological characteristics. Polygons are stored as an sf object within colData, so they can be viewed using standard ggplot2 functions. For CosMx technology, the polygons must be explicitly loaded as described in Load polygons. SpaceTrooper provides plotPolygons function that works just like plotCentroids but takes polygons instead of centroids.

plotPolygons(spe, colourBy="log2SignalDensity")

plotPolygons(spe, colourBy="Area_um")

plotPolygons(spe, colourBy="log2Ctrl_total_ratio")

plotPolygons(spe, colourBy="log2AspectRatio")

We can see in yellow and darkviolet that there are few cells with extreme values of either log2SignalDensity, Area_um, log2Ctrl_total_ratio and log2AspectRatio.

Since all plotting functions are based on ggplot2, you can easily customize the graphical outputs by adding standard ggplot2 components.

plotPolygons(spe, colourBy="QC_score") + 
    scale_fill_viridis_c(option = "plasma")

We can see that the QS is able to detect both the aspects highlighted by log2SignalDensity,Area_um, log2Ctrl_total_ratio or log2AspectRatio. Cells that showed either lower signal density, bigger size, higher background signal or border effect also display low QS (darker color).

It’s up to the user to choose an appropriate threshold to flag cells according to the observed QS distribution.

plotPolygons(spe, colourBy="low_qcscore") + 
    scale_fill_manual(values=c("TRUE"="red", "FALSE" = "#c0c8cf"))

You can reruncomputeQCScoreFlags to check how many cells would be flagged using another threshold.

spe <- computeQCScoreFlags(spe, qsThreshold=0.75)

table(spe$low_qcscore)
## 
## FALSE  TRUE 
##   711   193

Using this threshold, 193 cells are flagged as low-quality.

plotPolygons(spe, colourBy="low_qcscore") + 
    scale_fill_manual(values=c("TRUE"="red", "FALSE" = "#c0c8cf"))

The threshold is more stringent and more cells are flagged compared to the previous one. It’s up to you to choose a threshold that better suits your analysis needs.

0.3 Conclusion

In this vignette, we explored the main functionalities of the SpaceTrooper package for imaging-based spatial omics data QC.

For further insights on SpaceTrooper package usage, please refer to the SpaceTrooper utilities vignette.

Main steps shown are:

  • data loading: CosMx RNA, Xenium, MERFISH
  • Quality Control:
    • per-cell QC metrics
    • Quality Score: a score combining signal density, cell size, background noise and border effect
  • visualization:
    • centroids: with ggplot2
    • polygons: sf + ggplot2

0.4 Session Information

sessionInfo()
## R Under development (unstable) (2026-01-15 r89304)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_4.0.1               SpaceTrooper_1.1.4         
##  [3] SpatialExperiment_1.21.0    SingleCellExperiment_1.33.0
##  [5] SummarizedExperiment_1.41.0 Biobase_2.71.0             
##  [7] GenomicRanges_1.63.1        Seqinfo_1.1.0              
##  [9] IRanges_2.45.0              S4Vectors_0.49.0           
## [11] BiocGenerics_0.57.0         generics_0.1.4             
## [13] MatrixGenerics_1.23.0       matrixStats_1.5.0          
## [15] BiocStyle_2.39.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3        jsonlite_2.0.0           
##   [3] shape_1.4.6.1             magrittr_2.0.4           
##   [5] ggbeeswarm_0.7.3          magick_2.9.0             
##   [7] farver_2.1.2              rmarkdown_2.30           
##   [9] vctrs_0.7.1               DelayedMatrixStats_1.33.0
##  [11] tinytex_0.58              rstatix_0.7.3            
##  [13] htmltools_0.5.9           S4Arrays_1.11.1          
##  [15] BiocNeighbors_2.5.2       broom_1.0.11             
##  [17] Rhdf5lib_1.33.0           SparseArray_1.11.10      
##  [19] Formula_1.2-5             rhdf5_2.55.12            
##  [21] sass_0.4.10               KernSmooth_2.23-26       
##  [23] bslib_0.10.0              cachem_1.1.0             
##  [25] lifecycle_1.0.5           iterators_1.0.14         
##  [27] pkgconfig_2.0.3           rsvd_1.0.5               
##  [29] Matrix_1.7-4              R6_2.6.1                 
##  [31] fastmap_1.2.0             digest_0.6.39            
##  [33] scater_1.39.2             dqrng_0.4.1              
##  [35] irlba_2.3.5.1             ggpubr_0.6.2             
##  [37] beachmat_2.27.2           labeling_0.4.3           
##  [39] SpatialExperimentIO_1.3.0 abind_1.4-8              
##  [41] compiler_4.6.0            proxy_0.4-29             
##  [43] bit64_4.6.0-1             withr_3.0.2              
##  [45] S7_0.2.1                  backports_1.5.0          
##  [47] BiocParallel_1.45.0       carData_3.0-5            
##  [49] viridis_0.6.5             DBI_1.2.3                
##  [51] HDF5Array_1.39.0          R.utils_2.13.0           
##  [53] ggsignif_0.6.4            DelayedArray_0.37.0      
##  [55] rjson_0.2.23              classInt_0.4-11          
##  [57] tools_4.6.0               units_1.0-0              
##  [59] vipor_0.4.7               otel_0.2.0               
##  [61] beeswarm_0.4.0            R.oo_1.27.1              
##  [63] glue_1.8.0                h5mread_1.3.1            
##  [65] rhdf5filters_1.23.3       grid_4.6.0               
##  [67] sf_1.0-24                 gtable_0.3.6             
##  [69] R.methodsS3_1.8.2         class_7.3-23             
##  [71] tidyr_1.3.2               data.table_1.18.0        
##  [73] BiocSingular_1.27.1       ScaledMatrix_1.19.0      
##  [75] car_3.1-3                 XVector_0.51.0           
##  [77] ggrepel_0.9.6             foreach_1.5.2            
##  [79] pillar_1.11.1             limma_3.67.0             
##  [81] robustbase_0.99-6         splines_4.6.0            
##  [83] dplyr_1.1.4               lattice_0.22-7           
##  [85] survival_3.8-6            bit_4.6.0                
##  [87] tidyselect_1.2.1          locfit_1.5-9.12          
##  [89] scuttle_1.21.0            sfheaders_0.4.5          
##  [91] knitr_1.51                gridExtra_2.3            
##  [93] bookdown_0.46             edgeR_4.9.2              
##  [95] xfun_0.56                 statmod_1.5.1            
##  [97] DropletUtils_1.31.0       DEoptimR_1.1-4           
##  [99] yaml_2.3.12               evaluate_1.0.5           
## [101] codetools_0.2-20          tibble_3.3.1             
## [103] BiocManager_1.30.27       cli_3.6.5                
## [105] arrow_23.0.0              jquerylib_0.1.4          
## [107] dichromat_2.0-0.1         Rcpp_1.1.1               
## [109] parallel_4.6.0            assertthat_0.2.1         
## [111] sparseMatrixStats_1.23.0  glmnet_4.1-10            
## [113] viridisLite_0.4.2         scales_1.4.0             
## [115] e1071_1.7-17              purrr_1.2.1              
## [117] rlang_1.1.7               cowplot_1.2.0