1 Introduction

This vignette introduces the SpaceTrooper package for the preprocessing and quality control of imaging-based spatial omics data, using a CosMx Protein assay dataset as an example.

2 Installation

To install SpaceTrooper, use the following commands:

# Install BiocManager if not already installed, then install SpaceTrooper
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("drighelli/SpaceTrooper")

3 Data Loading

In this section, we load data using the package's reader functions. The goal is to provide a uniform SpatialExperiment object across all supported technologies, allowing for consistent QC analysis.

The functions in SpaceTrooper compute missing metrics as needed and allow for the inclusion of polygons with the keep_polygons argument. This stores polygons in the colData of the SpatialExperiment.
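To make the target data structure concrete, the following sketch builds a tiny SpatialExperiment by hand with the SpatialExperiment constructor. The marker names, counts and coordinates are invented for illustration only; with real data, the SpaceTrooper reader functions build the object from the platform output.

```r
# Toy example: every value below is invented for illustration.
library(SpatialExperiment)

# Count matrix with 3 features (rows) and 4 cells (columns)
counts <- matrix(
    c(5, 0, 2,
      1, 3, 0,
      0, 7, 4,
      2, 2, 9),
    nrow = 3,
    dimnames = list(c("CD45", "PanCK", "Rb IgG"),
                    paste0("f1_c", 1:4))
)

# Per-cell metadata, including the centroid coordinates
cd <- S4Vectors::DataFrame(
    fov = rep(1L, 4),
    CenterX_global_px = c(10, 20, 30, 40),
    CenterY_global_px = c(5, 15, 25, 35),
    row.names = colnames(counts)
)

# spatialCoordsNames tells the constructor which colData columns
# hold the spatial coordinates
toy_spe <- SpatialExperiment(
    assays = list(counts = counts),
    colData = cd,
    spatialCoordsNames = c("CenterX_global_px", "CenterY_global_px"),
    sample_id = "sample01"
)

dim(toy_spe)              # features x cells
spatialCoords(toy_spe)    # centroid coordinates as a numeric matrix
```

Whatever the source technology, downstream QC functions only need this common layout: a counts assay, per-cell metadata in colData, and centroid coordinates in spatialCoords.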

# Load the SpaceTrooper library
library(SpaceTrooper)

# Load CosMx protein data into a SpatialExperiment object (SPE)
protfolder <- system.file("extdata", "S01_prot", package="SpaceTrooper")
(spe <- readCosmxProteinSPE(protfolder))
## class: SpatialExperiment 
## dim: 69 745 
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(745): f200_c10 f200_c100 ... f201_c96 f201_c98
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id
colData(spe)
## DataFrame with 745 rows and 58 columns
##                 fov    cellID      Area AspectRatio     Width    Height
##           <integer> <integer> <integer>   <numeric> <integer> <integer>
## f200_c10        200        10      1596        0.55        58        32
## f200_c100       200       100      5855        0.92       101        93
## f200_c102       200       102      9468        0.89       106       119
## f200_c104       200       104      4888        0.84        90        76
## f200_c106       200       106      5757        0.74       106        78
## ...             ...       ...       ...         ...       ...       ...
## f201_c90        201        90      4425        0.93        83        77
## f201_c92        201        92      9563        0.83       108       130
## f201_c94        201        94      5388        0.91        92        84
## f201_c96        201        96      5587        0.79        76        96
## f201_c98        201        98     76154        0.81       322       396
##           Mean.PanCK Max.PanCK Mean.CD68  Max.CD68 Mean.Membrane Max.Membrane
##            <integer> <integer> <integer> <integer>     <integer>    <integer>
## f200_c10         896      1184       350       604          1475         2776
## f200_c100        809      1120       320       636          1918         4380
## f200_c102        556      1948       246      5816           375         3108
## f200_c104        694      7348       272       472          1264         3212
## f200_c106        612      1216       338      1548           527         2248
## ...              ...       ...       ...       ...           ...          ...
## f201_c90        1226      1516       499       696           653         1216
## f201_c92         570      1172       223       404           261          880
## f201_c94         484      1088       208       400           506         1516
## f201_c96         903      1564       395      4584           688         3412
## f201_c98        1047      2608       423      4552           801         3540
##           Mean.CD45  Max.CD45 Mean.DAPI  Max.DAPI SplitRatioToLocal   NucArea
##           <integer> <integer> <integer> <integer>         <numeric> <integer>
## f200_c10      14940     31860      4901      7524              0.19      1292
## f200_c100     15531     33192      5157      8604              0.00      4696
## f200_c102       284      4976      1710      5544              0.00      3704
## f200_c104     12480     35720      3399      5944              0.00      3032
## f200_c106       268       668       202       320              0.00         0
## ...             ...       ...       ...       ...               ...       ...
## f201_c90        389       808       321       380                 0         0
## f201_c92        211       532       172       296                 0         0
## f201_c94        241       476       148       228                 0         0
## f201_c96        357      5088       964      3736                 0      1936
## f201_c98        373      4060       309      2580                 0      1508
##           NucAspectRatio Circularity Eccentricity Perimeter  Solidity
##                <numeric>   <numeric>    <numeric> <integer> <numeric>
## f200_c10            0.67        3.87         0.66        72     22.17
## f200_c100           0.98        0.93         0.56       282     20.76
## f200_c102           0.76        1.02         0.67       342     27.68
## f200_c104           0.97        0.92         0.80       259     18.87
## f200_c106           0.00        0.92         0.63       281     20.49
## ...                  ...         ...          ...       ...       ...
## f201_c90            0.00        0.95         0.69       242     18.29
## f201_c92            0.00        0.92         0.70       361     26.49
## f201_c94            0.00        0.96         0.72       265     20.33
## f201_c96            0.69        1.00         0.68       265     21.08
## f201_c98            0.80        0.74         0.72      1135     67.10
##               cell_id         X     version   dualfiles    Run_name
##           <character> <integer> <character> <character> <character>
## f200_c10     f200_c10         1          v6           ?        Run0
## f200_c100   f200_c100         1          v6           ?        Run0
## f200_c102   f200_c102         1          v6           ?        Run0
## f200_c104   f200_c104         1          v6           ?        Run0
## f200_c106   f200_c106         1          v6           ?        Run0
## ...               ...       ...         ...         ...         ...
## f201_c90     f201_c90         1          v6           ?        Run0
## f201_c92     f201_c92         1          v6           ?        Run0
## f201_c94     f201_c94         1          v6           ?        Run0
## f201_c96     f201_c96         1          v6           ?        Run0
## f201_c98     f201_c98         1          v6           ?        Run0
##           Run_Tissue_name ISH.concentration        Dash      tissue       Panel
##               <character>       <character> <character> <character> <character>
## f200_c10               S0               1nM       PILOT      tissue         WTx
## f200_c100              S0               1nM       PILOT      tissue         WTx
## f200_c102              S0               1nM       PILOT      tissue         WTx
## f200_c104              S0               1nM       PILOT      tissue         WTx
## f200_c106              S0               1nM       PILOT      tissue         WTx
## ...                   ...               ...         ...         ...         ...
## f201_c90               S0               1nM       PILOT      tissue         WTx
## f201_c92               S0               1nM       PILOT      tissue         WTx
## f201_c94               S0               1nM       PILOT      tissue         WTx
## f201_c96               S0               1nM       PILOT      tissue         WTx
## f201_c98               S0               1nM       PILOT      tissue         WTx
##            assay_type  slide_ID median_RNA RNA_quantile_0.75 RNA_quantile_0.8
##           <character> <integer>  <numeric>         <numeric>        <numeric>
## f200_c10      protein         1    24859.4            141202           249217
## f200_c100     protein         1    24859.4            141202           249217
## f200_c102     protein         1    24859.4            141202           249217
## f200_c104     protein         1    24859.4            141202           249217
## f200_c106     protein         1    24859.4            141202           249217
## ...               ...       ...        ...               ...              ...
## f201_c90      protein         1    11016.9           45561.7          56296.9
## f201_c92      protein         1    11016.9           45561.7          56296.9
## f201_c94      protein         1    11016.9           45561.7          56296.9
## f201_c96      protein         1    11016.9           45561.7          56296.9
## f201_c98      protein         1    11016.9           45561.7          56296.9
##           RNA_quantile_0.85 RNA_quantile_0.9 RNA_quantile_0.95
##                   <numeric>        <numeric>         <numeric>
## f200_c10             439292           582926           1175591
## f200_c100            439292           582926           1175591
## f200_c102            439292           582926           1175591
## f200_c104            439292           582926           1175591
## f200_c106            439292           582926           1175591
## ...                     ...              ...               ...
## f201_c90             140621           208723            361311
## f201_c92             140621           208723            361311
## f201_c94             140621           208723            361311
## f201_c96             140621           208723            361311
## f201_c98             140621           208723            361311
##           RNA_quantile_0.99 nCount_RNA nFeature_RNA median_negprobes
##                   <numeric>  <numeric>    <integer>        <numeric>
## f200_c10            2758078   36035.65           67          6432.11
## f200_c100           2758078   39097.62           67          6432.11
## f200_c102           2758078    9059.81           67          6432.11
## f200_c104           2758078   30541.70           67          6432.11
## f200_c106           2758078    8988.49           67          6432.11
## ...                     ...        ...          ...              ...
## f201_c90            1033930    6648.66           67          4190.43
## f201_c92            1033930    6344.07           67          4190.43
## f201_c94            1033930    7575.43           67          4190.43
## f201_c96            1033930    7713.20           67          4190.43
## f201_c98            1033930    6502.25           67          4190.43
##           negprobes_quantile_0.75 negprobes_quantile_0.8
##                         <numeric>              <numeric>
## f200_c10                  7810.51                8086.19
## f200_c100                 7810.51                8086.19
## f200_c102                 7810.51                8086.19
## f200_c104                 7810.51                8086.19
## f200_c106                 7810.51                8086.19
## ...                           ...                    ...
## f201_c90                  5042.51                5212.92
## f201_c92                  5042.51                5212.92
## f201_c94                  5042.51                5212.92
## f201_c96                  5042.51                5212.92
## f201_c98                  5042.51                5212.92
##           negprobes_quantile_0.85 negprobes_quantile_0.9
##                         <numeric>              <numeric>
## f200_c10                  8361.87                8637.55
## f200_c100                 8361.87                8637.55
## f200_c102                 8361.87                8637.55
## f200_c104                 8361.87                8637.55
## f200_c106                 8361.87                8637.55
## ...                           ...                    ...
## f201_c90                  5383.34                5553.76
## f201_c92                  5383.34                5553.76
## f201_c94                  5383.34                5553.76
## f201_c96                  5383.34                5553.76
## f201_c98                  5383.34                5553.76
##           negprobes_quantile_0.95 negprobes_quantile_0.99 nCount_negprobes
##                         <numeric>               <numeric>        <numeric>
## f200_c10                  8913.23                 9133.77            15.20
## f200_c100                 8913.23                 9133.77            14.72
## f200_c102                 8913.23                 9133.77            15.70
## f200_c104                 8913.23                 9133.77            12.46
## f200_c106                 8913.23                 9133.77            20.58
## ...                           ...                     ...              ...
## f201_c90                  5724.17                 5860.51            23.49
## f201_c92                  5724.17                 5860.51            14.87
## f201_c94                  5724.17                 5860.51            13.29
## f201_c96                  5724.17                 5860.51            21.39
## f201_c98                  5724.17                 5860.51            21.85
##           nFeature_negprobes  Area.um2 CenterX_local_px CenterY_local_px
##                    <integer> <numeric>        <integer>        <integer>
## f200_c10                   2   22.9824             1365               16
## f200_c100                  2   84.3120              762              204
## f200_c102                  2  136.3392             2837              205
## f200_c104                  2   70.3872             1443              202
## f200_c106                  2   82.9008             3947              191
## ...                      ...       ...              ...              ...
## f201_c90                   2   63.7200             3576              787
## f201_c92                   2  137.7072              128              837
## f201_c94                   2   77.5872             2364              844
## f201_c96                   2   80.4528             2772              870
## f201_c98                   2 1096.6176             1923             1048
##                  cell   sample_id
##           <character> <character>
## f200_c10   c_1_200_10    sample01
## f200_c100 c_1_200_100    sample01
## f200_c102 c_1_200_102    sample01
## f200_c104 c_1_200_104    sample01
## f200_c106 c_1_200_106    sample01
## ...               ...         ...
## f201_c90   c_1_201_90    sample01
## f201_c92   c_1_201_92    sample01
## f201_c94   c_1_201_94    sample01
## f201_c96   c_1_201_96    sample01
## f201_c98   c_1_201_98    sample01

4 Data Analysis for CosMx protein

The package offers several functions for spatial data analysis, including quality control and visualization.

This tutorial focuses on CosMx protein data, which provides Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx.

Although untested, the same approach can be extended to Akoya CODEX data, as long as a SpatialExperiment object is created. Polygons can be loaded later if needed.

5 Fields of View (FoVs) Visualization

The plotCellsFovs function shows a map of the FoVs within an experiment. This plot is specific to CosMx data and uses cell centroids.

Please keep in mind that, in this specific experiment, the fov_positions and the cell centroid positions were not aligned. An alignment approach can be found at the end of the scripts/datacreation.R file.

# Plot the cells within their respective Field of Views (FOVs)
plotCellsFovs(spe)

Because the dataset is a small subset of the original experiment (only FoVs 200 and 201 appear in the colData above), we can see the FoV identifiers in black and the cell centroids in purple.

When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their identifiers.

6 Quality control

The spatialPerCellQC function, inspired by scater::addPerCellQC, computes additional metrics for each cell in the SpatialExperiment. It also allows for the detection of negative control probes, which is crucial for QC.

By default, it removes cells with zero counts; this behaviour can be controlled with the rmZeros argument.

Here, for transparency, we specify negProbList for the CosMx protein assay, although the function already includes a set of negative probes for the most commonly used panels across technologies. Note that, although the same approach can be applied to CODEX data, no list of negative probes is provided for that technology, so the user needs to specify them.

# Perform per-cell quality control checks
spe <- spatialPerCellQC(spe, negProbList=c("Ms IgG1", "Rb IgG"))
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"
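As a rough illustration of what some of the added columns mean, the sketch below recomputes a few of them by hand in base R on a toy count matrix. The formulas (control counts divided by total counts, total counts divided by cell area) are our reading of the column names, not the package source code, and all numbers are invented.

```r
# Toy data, invented for illustration: 4 features x 3 cells
counts <- matrix(
    c(120, 30, 2, 1,
       80, 10, 0, 3,
       15,  5, 4, 6),
    nrow = 4,
    dimnames = list(c("CD45", "PanCK", "Ms IgG1", "Rb IgG"),
                    c("c1", "c2", "c3"))
)
area_um <- c(c1 = 50, c2 = 90, c3 = 30)   # toy cell areas
neg_probes <- c("Ms IgG1", "Rb IgG")      # negative control probes

is_ctrl <- rownames(counts) %in% neg_probes

qc <- data.frame(
    sum         = colSums(counts),        # total counts per cell
    detected    = colSums(counts > 0),    # features with non-zero counts
    control_sum = colSums(counts[is_ctrl, , drop = FALSE]),
    target_sum  = colSums(counts[!is_ctrl, , drop = FALSE])
)

# Assumed definitions, inferred from the column names
qc$ctrl_total_ratio     <- qc$control_sum / qc$sum
qc$log2Ctrl_total_ratio <- log2(qc$ctrl_total_ratio)
qc$CountArea            <- qc$sum / area_um[rownames(qc)]
qc$log2CountArea        <- log2(qc$CountArea)

qc
```

The log2 transforms put the ratios on a scale where cells with unusually low or high values stand out more clearly in a histogram.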

7 Metrics Histograms

You can investigate individual metrics by viewing their histograms. To highlight outliers, use the useFences argument to display the fences computed by computeSpatialOutlier (see the next section).

# Plot a histogram of counts (sum)
plotMetricHist(spe, metric="sum")

# Plot a histogram of cell areas (Area_um)
plotMetricHist(spe, metric="Area_um")

# Plot a histogram of the log2 ratio of counts to cell area (in microns)
plotMetricHist(spe, metric="log2CountArea")

# Plot a histogram of the log2 ratio of negative probe counts to total
# counts in each cell
plotMetricHist(spe, metric="log2Ctrl_total_ratio")

These plots show, respectively, the distributions of the total counts (sum), the cell area in microns (Area_um), the ratio of counts to cell area (log2CountArea), and the ratio of negative probe counts to total counts in each cell (log2Ctrl_total_ratio).

8 Spatial Outlier Detection

Spatial outlier detection is another critical step in QC. While the flag score addresses some metrics, other outlier detection methods may be needed.

The computeSpatialOutlier function computes the medcouple statistic on a specified metric (computeBy argument). The medcouple is specifically designed for asymmetric distributions, and the function issues a warning when this requirement is not met. It can also use scuttle::isOutlier for symmetric distributions. The method argument supports mc, scuttle, or both.

This outlier detection approach can be used to decide whether, and which, cells should be discarded based on a single metric.
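To give an intuition for the two approaches, here is a base R sketch on simulated, right-skewed data. It computes symmetric fences at the median plus or minus 3 MADs (the rule behind scuttle::isOutlier with its default nmads=3) and skew-adjusted boxplot fences driven by the medcouple, following the adjusted boxplot of Hubert and Vandervieren. The naive medcouple below is written for readability and assumes that at most one observation equals the median; for real data a tested implementation such as robustbase::mc is preferable. Whether computeSpatialOutlier uses exactly these fence formulas is an assumption on our side.

```r
set.seed(42)
x <- rlnorm(200, meanlog = 4, sdlog = 0.5)   # simulated right-skewed "areas"

# Symmetric fences: median +/- 3 MAD
mad_fences <- median(x) + c(-3, 3) * mad(x)

# Naive O(n^2) medcouple: median of the kernel h(xi, xj)
# over all pairs with xi <= median <= xj
medcouple_naive <- function(x) {
    med <- median(x)
    lo <- x[x <= med]
    hi <- x[x >= med]
    h <- outer(lo, hi, function(xi, xj) {
        ((xj - med) - (med - xi)) / (xj - xi)
    })
    # The pair (median, median) gives 0/0; with a single tie its kernel is 0
    h[is.nan(h)] <- 0
    median(h)
}

mc <- medcouple_naive(x)

# Adjusted boxplot fences: the whiskers are stretched on the side of the
# longer tail and shortened on the other side
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
if (mc >= 0) {
    mc_fences <- c(q[1] - 1.5 * exp(-4 * mc) * iqr,
                   q[2] + 1.5 * exp(3 * mc) * iqr)
} else {
    mc_fences <- c(q[1] - 1.5 * exp(-3 * mc) * iqr,
                   q[2] + 1.5 * exp(4 * mc) * iqr)
}

outlier_mad <- x < mad_fences[1] | x > mad_fences[2]
outlier_mc  <- x < mc_fences[1]  | x > mc_fences[2]

# Cross-tabulate the two calls
table(mad = outlier_mad, medcouple = outlier_mc)
```

On skewed metrics the symmetric fences tend to flag many cells in the long tail, while the medcouple-adjusted fences follow the shape of the distribution, which is the motivation for offering both methods.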

# Identify spatial outliers based on cell area (Area_um)
spe <- computeSpatialOutlier(spe, computeBy="Area_um", method="both")

# Identify spatial outliers based on mean DAPI intensity
spe <- computeSpatialOutlier(spe, computeBy="Mean.DAPI", method="both")
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"            "Area_um_outlier_mc"      
## [85] "Area_um_outlier_sc"       "Mean.DAPI_outlier_mc"    
## [87] "Mean.DAPI_outlier_sc"

Once outliers have been computed with the computeSpatialOutlier function, we can also visualize the fences used to filter the cells.

# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_mc")

# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_sc")

# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_mc")

# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_sc")

We visualize the fences computed with the medcouple and scuttle outlier detection approaches to directly inspect their differences and the number of outliers each method detects.

If desired, these fences can already be used to remove the detected outliers.

9 The Quality Score

Next, we use computeQCScore to calculate a quality score based on the previously computed metrics. The quality score combines the counts relative to cell area, the aspect ratio of each cell, and its distance from the FoV border (the last term is used only for CosMx data).

See the Details section of help(computeQCScore) for more information.

# Calculate quality scores for each cell
spe <- computeQCScore(spe)
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"            "Area_um_outlier_mc"      
## [85] "Area_um_outlier_sc"       "Mean.DAPI_outlier_mc"    
## [87] "Mean.DAPI_outlier_sc"     "QC_score"

Logical filters can then be computed using computeQCScoreFlags, which requires thresholds for various metrics. Currently, the function considers:

  • Quality Score (qsThreshold): Cells with scores below this threshold (default 0.5) are flagged for exclusion. When useQSQuantiles is set to TRUE, this value is instead interpreted as the quantile to use for filtering.

  • Quality Score Quantiles (useQSQuantiles): Option to filter based on quantiles (default FALSE).
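The two options can be pictured with a small base R sketch on simulated scores. This is our reading of the description above, not the package code: with a fixed threshold, cells scoring below the value are flagged; with quantiles, the value is read as a probability and the cut-off becomes the corresponding quantile of the observed scores. All variable names are hypothetical.

```r
set.seed(1)
qc_score <- rbeta(1000, shape1 = 5, shape2 = 2)   # simulated scores in [0, 1]

# Fixed cut-off: flag cells whose score is below the threshold
qs_threshold <- 0.5
low_fixed <- qc_score < qs_threshold

# Quantile cut-off: the value is read as a probability, so 0.05 flags
# roughly the 5% of cells with the lowest scores
qs_quantile <- 0.05
cutoff_q <- quantile(qc_score, probs = qs_quantile, names = FALSE)
low_quantile <- qc_score < cutoff_q

table(low_fixed)
table(low_quantile)
```

A fixed threshold keeps the meaning of the score comparable across samples, whereas a quantile always flags about the same fraction of cells regardless of overall sample quality.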

# Compute flags to identify cells for filtering
spe <- computeQCScoreFlags(spe, qsThreshold=0.5)
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"            "Area_um_outlier_mc"      
## [85] "Area_um_outlier_sc"       "Mean.DAPI_outlier_mc"    
## [87] "Mean.DAPI_outlier_sc"     "QC_score"                
## [89] "low_qcscore"
table(spe$low_qcscore)
## 
## FALSE  TRUE 
##   688    57

We detected 57 cells to be removed.

10 Additional metrics to filter out cells

For other metrics, such as the total counts and the negative probe ratio, the computeThresholdFlags function considers:

  • Total Counts (totalThreshold): Minimum count threshold (default 0).

  • Negative Probe Ratio (ctrlTotRatioThreshold): Maximum ratio of negative probe counts to total counts (default 0.1).
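A minimal base R sketch of how such threshold flags could be combined is shown below. The comparison directions (flag when the total is at or below its threshold, or when the negative probe ratio exceeds its threshold) are assumptions chosen to be consistent with the result shown in this vignette; the toy values and variable names are invented.

```r
# Toy per-cell metrics, invented for illustration
cells <- data.frame(
    total            = c(0, 150, 900, 40),
    ctrl_total_ratio = c(NA, 0.02, 0.001, 0.25),
    row.names        = c("c1", "c2", "c3", "c4")
)

total_threshold <- 0      # mirrors totalThreshold
ratio_threshold <- 0.1    # mirrors ctrlTotRatioThreshold

# Assumed rule 1: too few counts
low_total <- cells$total <= total_threshold

# Assumed rule 2: too large a share of negative probe counts
# (cells without a defined ratio are not flagged by this rule)
high_ctrl <- !is.na(cells$ctrl_total_ratio) &
    cells$ctrl_total_ratio > ratio_threshold

# A cell is flagged if it fails either rule
cells$threshold_flags <- low_total | high_ctrl
cells
```

In this toy table the empty cell c1 and the noisy cell c4 are flagged, while c2 and c3 pass both checks.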

spe <- computeThresholdFlags(spe, totalThreshold=0, 
                                ctrlTotRatioThreshold=0.1)
table(spe$threshold_flags)
## 
## FALSE 
##   745

In this example, we don’t find any cell to be removed.

11 Adding Polygon and Visualization

To better understand the quality score values, we first load the cell polygons, which give a better overview of the cells' characteristics.

We can load and add polygons to the SPE object using the following functions. Each technology has its own readPolygons function to standardize the loaded sf object and handle different file types.

# Read polygon data associated with cells in the SPE
# the polygon file path is stored in the spe metadata
pols <- readPolygonsCosmx(metadata(spe)$polygons)

# Add the polygon data to the SPE object
spe <- addPolygonsToSPE(spe, pols)

Once the polygons are stored in an sf object within colData, they can be visualized using functions based on the ggplot2 library.

# Plot the polygons of the selected cells
plotPolygons(spe, bgColor="white")

The cells are shown on a white background for better visualization.
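For readers unfamiliar with how sf geometries are drawn with ggplot2, here is a minimal sketch on two toy square "cells". It is not how plotPolygons is implemented: the helper sq, the toy object and its values are assumptions made purely for illustration, and only the general sf and ggplot2 calls are standard.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a square polygon; the ring must be closed
# (the first and last points are identical)
sq <- function(x0, y0, side) {
    st_polygon(list(rbind(
        c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
        c(x0, y0 + side), c(x0, y0)
    )))
}

# Toy sf object: one row per cell, with a made-up score column
toy <- st_sf(
    cell = c("a", "b"),
    QC_score = c(0.9, 0.2),
    geometry = st_sfc(sq(0, 0, 1), sq(2, 0, 1.5))
)

# geom_sf draws one shape per row; fill maps a column to the colour scale
p <- ggplot(toy) +
    geom_sf(aes(fill = QC_score)) +
    theme(panel.background = element_rect(fill = "white"))
p
```

Mapping a different column to fill plays the same role as changing the colourBy argument in the calls shown in this section.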

# Plot polygons colored by log2 aspect ratio
plotPolygons(spe, colourBy="log2AspectRatio")

# Plot polygons colored by cell area
plotPolygons(spe, colourBy="Area_um")

The cells shown in yellow and dark violet are the few with extreme values of log2AspectRatio and Area_um (the cell area in microns).

plotPolygons(spe, colourBy="QC_score")

plotPolygons(spe, colourBy="low_qcscore")

We can see that the quality score captures both of these aspects and highlights cells that are mostly isolated on the FoV border or that show an unusual conformation.

We always recommend considering which cell populations are expected in the tissue under study before removing the flagged cells.
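One way to act on this recommendation is to tabulate the flagged cells before dropping them. The sketch below uses a toy data frame in place of colData(spe): the fov and low_qcscore columns exist in the object above, whereas cell_type is a hypothetical annotation column that is not produced in this vignette, and all values are made up.

```r
# Toy stand-in for colData(spe) (values are made up)
cd <- data.frame(
    fov = c(200, 200, 200, 201, 201, 201),
    low_qcscore = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE),
    cell_type = c("T", "T", "B", "B", "B", "T")  # hypothetical annotation
)

# Number of flagged and unflagged cells per FoV
by_fov <- table(fov = cd$fov, flagged = cd$low_qcscore)
by_fov

# Fraction of each (hypothetical) cell type that would be removed
frac_removed <- tapply(cd$low_qcscore, cd$cell_type, mean)
frac_removed
```

A population that loses a large fraction of its cells may reflect genuine morphology (for example, elongated or very small cells) rather than poor segmentation, and deserves a closer look before filtering.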

12 FoV Zoom and Map

The plotZoomFovsMap function allows you to visualize a map of the FoVs with a zoom-in of selected FoVs, colored by the colourBy argument.

plotZoomFovsMap(spe, fovs=c(201), colourBy="QC_score")

plotZoomFovsMap(spe, fovs=c(201), colourBy="low_qcscore")

On the left side we see the map of all the FoVs, and on the right the polygons of the selected FoV (201 in this case), coloured by the quality score in the first plot and by the low_qcscore flag in the second. This gives a better view of a specific tissue area within the whole experiment.

13 Conclusion

In this vignette, we explored the main functionalities of the SpaceTrooper package for spatial data analysis. Main steps shown are:

  • data and polygons loading for CosMx Protein, CosMx

  • quality control:
      ◦ outlier detection: medcouple and scuttle MAD
      ◦ flag score: a score combining transcript counts, cell area, aspect ratio and distance from the FoV border

  • visualization:
      ◦ centroids: with ggplot2
      ◦ polygons: sf + ggplot2

14 Session Information

sessionInfo()
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SpaceTrooper_0.99.47        SpatialExperiment_1.19.1   
##  [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.2
##  [5] Biobase_2.69.1              GenomicRanges_1.61.5       
##  [7] Seqinfo_0.99.2              IRanges_2.43.5             
##  [9] S4Vectors_0.47.4            BiocGenerics_0.55.4        
## [11] generics_0.1.4              MatrixGenerics_1.21.0      
## [13] matrixStats_1.5.0           BiocStyle_2.37.1           
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3                 gridExtra_2.3            
##   [3] rlang_1.1.6               magrittr_2.0.4           
##   [5] scater_1.37.0             e1071_1.7-16             
##   [7] compiler_4.5.1            DelayedMatrixStats_1.31.0
##   [9] SpatialExperimentIO_1.1.0 sfheaders_0.4.4          
##  [11] vctrs_0.6.5               shape_1.4.6.1            
##  [13] pkgconfig_2.0.3           crayon_1.5.3             
##  [15] fastmap_1.2.0             backports_1.5.0          
##  [17] magick_2.9.0              XVector_0.49.1           
##  [19] labeling_0.4.3            scuttle_1.19.0           
##  [21] rmarkdown_2.30            ggbeeswarm_0.7.2         
##  [23] tinytex_0.57              purrr_1.1.0              
##  [25] bit_4.6.0                 glmnet_4.1-10            
##  [27] xfun_0.53                 cachem_1.1.0             
##  [29] beachmat_2.25.5           jsonlite_2.0.0           
##  [31] rhdf5filters_1.21.4       DelayedArray_0.35.3      
##  [33] Rhdf5lib_1.31.1           BiocParallel_1.43.4      
##  [35] irlba_2.3.5.1             broom_1.0.10             
##  [37] parallel_4.5.1            R6_2.6.1                 
##  [39] bslib_0.9.0               RColorBrewer_1.1-3       
##  [41] limma_3.65.7              car_3.1-3                
##  [43] jquerylib_0.1.4           iterators_1.0.14         
##  [45] Rcpp_1.1.0                bookdown_0.45            
##  [47] assertthat_0.2.1          knitr_1.50               
##  [49] R.utils_2.13.0            splines_4.5.1            
##  [51] Matrix_1.7-4              tidyselect_1.2.1         
##  [53] viridis_0.6.5             dichromat_2.0-0.1        
##  [55] abind_1.4-8               yaml_2.3.10              
##  [57] codetools_0.2-20          lattice_0.22-7           
##  [59] tibble_3.3.0              withr_3.0.2              
##  [61] S7_0.2.0                  evaluate_1.0.5           
##  [63] sf_1.0-21                 survival_3.8-3           
##  [65] units_1.0-0               proxy_0.4-27             
##  [67] pillar_1.11.1             BiocManager_1.30.26      
##  [69] ggpubr_0.6.2              carData_3.0-5            
##  [71] KernSmooth_2.23-26        foreach_1.5.2            
##  [73] ggplot2_4.0.0             sparseMatrixStats_1.21.0 
##  [75] scales_1.4.0              class_7.3-23             
##  [77] glue_1.8.0                tools_4.5.1              
##  [79] BiocNeighbors_2.3.1       robustbase_0.99-6        
##  [81] data.table_1.17.8         ScaledMatrix_1.17.0      
##  [83] locfit_1.5-9.12           ggsignif_0.6.4           
##  [85] cowplot_1.2.0             rhdf5_2.53.6             
##  [87] grid_4.5.1                tidyr_1.3.1              
##  [89] DropletUtils_1.29.7       edgeR_4.7.6              
##  [91] beeswarm_0.4.0            BiocSingular_1.25.0      
##  [93] HDF5Array_1.37.0          vipor_0.4.7              
##  [95] rsvd_1.0.5                Formula_1.2-5            
##  [97] cli_3.6.5                 viridisLite_0.4.2        
##  [99] S4Arrays_1.9.1            arrow_21.0.0.1           
## [101] dplyr_1.1.4               DEoptimR_1.1-4           
## [103] gtable_0.3.6              R.methodsS3_1.8.2        
## [105] rstatix_0.7.3             sass_0.4.10              
## [107] digest_0.6.37             classInt_0.4-11          
## [109] ggrepel_0.9.6             SparseArray_1.9.1        
## [111] dqrng_0.4.1               rjson_0.2.23             
## [113] farver_2.1.2              htmltools_0.5.8.1        
## [115] R.oo_1.27.1               lifecycle_1.0.4          
## [117] h5mread_1.1.1             statmod_1.5.1            
## [119] bit64_4.6.0-1