You can install the latest version of spacexr from Bioconductor:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spacexr")Robust Cell Type Decomposition (RCTD) is a statistical method for decomposing cell type mixtures in spatial transcriptomics data. In this vignette, we will use a simulated dataset to demonstrate how you can run RCTD on spatial transcriptomics data and visualize your results.
We will use the term pixel, synonymous with spot or bead, to refer to any observation that measures gene expression at a fixed location. We do not use the term cell because, in spatial transcriptomics data, single pixels may contain mixtures of multiple cells. The purpose of RCTD is to estimate the proportions of different cell types in each pixel of a dataset.
library(ggplot2)
library(SpatialExperiment)
library(SummarizedExperiment)
library(spacexr)Before running RCTD, we must first load in:
The reference data should be stored in a SummarizedExperiment object with:
Required components:
assay: A matrix (or dgCMatrix) of gene expression counts, with genes as
rows and cells as columns. Row names should be unique gene names, and column
names should be unique cell barcodes. Counts should be untransformed integers.
colData: A factor column containing cell type annotations for each cell in
the reference. The column name is assumed to be “cell_type”, but another name
may be specified using the cell_type_col parameter in createRctd.
Optional components:
colData: A numeric column named nUMI, containing total UMI counts for
each cell. If not provided, nUMI will be calculated as the column sums of the
counts matrix.In practice, your reference could be an annotated dataset from:
Here, we load counts and cell type annotations from the rctdSim
object, but in a real-world setting, the reference information may be
distributed across several files or R objects.
# Load simulated data.
data("rctdSim")
# Create SummarizedExperiment.
reference_se <- SummarizedExperiment(
    assays = list(counts = rctdSim$reference_counts),
    colData = rctdSim$reference_cell_types
)
# Examine reference. (optional)
print(dim(assay(reference_se))) # Gene expression matrix dimensions
#> [1] 750  75
table(colData(reference_se)$cell_type) # Number of occurrences of each cell type
#> 
#> ct1 ct2 ct3 
#>  25  25  25In this vignette, we have a reference dataset of 750 genes for 75 cells. We
find that there are three cell types (ct1, ct2, and ct3), each appearing
25 times in the reference.
Next, we will load the spatial transcriptomics data into a SpatialExperiment object, though we could use any SummarizedExperiment object with:
Required components:
assay: A matrix (or dgCMatrix) of gene expression counts, with genes as
rows and pixels as columns. Row names should be unique gene names, and column
names should be unique pixel barcodes. Counts should be untransformed integers.Optional components:
spatialCoords: A matrix containing x and y coordinates for each pixel.
colData: A numeric column named nUMI, containing total UMI counts for
each pixel. If not provided, nUMI will be calculated as the column sums of the
counts matrix.
Here, we load coordinates and counts from the rctdSim object. Again,
in practice, this step will depend on how you store your spatial transcriptomics
data.
# Create SpatialExperiment.
spatial_spe <- SpatialExperiment(
    assay = rctdSim$spatial_rna_counts,
    spatialCoords = rctdSim$spatial_rna_coords
)
# Examine pixels. (optional)
print(dim(assay(spatial_spe))) # Gene expression matrix dimensions
#> [1] 750  12
ggplot(spatialCoords(spatial_spe), aes(x, y)) + 
    geom_point(size = 5) + 
    theme(axis.text = element_blank(), axis.ticks = element_blank()
)
We find that we have a dataset of 750 genes for 12 pixels, plotted above.
The ground truth for our cell type mixtures is given in the
SpatialExperiment object rctdSim$proportions_spe.
(The ground truth has been stored in the same format as the RCTD output, which
will be discussed later.) Note that this ground truth will not be known in
practice—it is what we will need to estimate with RCTD.
unique(t(assay(rctdSim$proportions_spe)))
#>      ct1 ct2 ct3
#> mx11 0.9 0.1 0.0
#> mx21 0.2 0.4 0.4We find that there are two distinct cell type mixtures in our dataset:
ct1 and 10% ct2.ct1 and 40% ct2 and ct3 each.We can visualize these mixtures with the plotAllWeights function, which
plots each pixel as a pie chart of cell types. We will return to this function
later to visualize our RCTD results.
plotAllWeights(
    rctdSim$proportions_spe,
    r = 0.05, lwd = 0, title = "Ground Truth"
)
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the spacexr package.
#>   Please report the issue at <https://github.com/ggrajeda/spacexr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
Running RCTD is a two-step process:
createRctd function, which:
runRctd.Before running RCTD, we perform some preprocessing with createRctd. Some
important configuration options include:
gene_cutoff, fc_cutoff: Used for filtering genes in platform effect
normalization. gene_cutoff filters for average expression and fc_cutoff
filters for log fold change across cell types.gene_cutoff_reg, fc_cutoff_reg: Similar to above, but for the RCTD step.UMI_min, UMI_max: Minimum and maximum UMI count per pixel.# Preprocess data
rctd_data <- createRctd(spatial_spe, reference_se)Now we can run RCTD on the preprocessed data with the runRctd function.
RCTD has three modes, each suited for different spatial technologies:
"doublet": Assigns 1-2 cell types per pixel. Best for high-resolution
technologies like Slide-seq and MERFISH."multi": Like doublet mode but can assign more cell types (up to
max_multi_types). Best for lower-resolution technologies like 100-micron
Visium."full": No restrictions on number of cell types.The rctd_mode argument is used to specify the mode in which the RCTD algorithm
runs. Other important configuration options include:
max_cores: Number of cores used for parallel processing. We recommend
setting this to at least 4 to improve efficiency. If set to 1, parallel
processing is not used.max_multi_types: Maximum number of cell types per pixel (multi mode only).For the purposes of demonstration, we’ll decompose the data into doublets in this vignette.
results_spe <- runRctd(rctd_data, rctd_mode = "doublet", max_cores = 1)RCTD returns a SpatialExperiment object containing:
Three assays (one in full mode):
weights: Cell type proportions restricted according to the specified
modeweights_unconfident: Cell type proportions restricted according to the
specified mode, including unconfident predictions (not available in full
mode)weights_full: Unrestricted cell type proportions (not available in full
mode, use weights instead)Assays have cell types as rows and pixels as columns, with values representing the proportion (0 to 1) of each cell type in each pixel. Assay columns sum to 1 (except for rejected pixels, which sum to 0).
NOTE: Weights are transposed compared to the output of the original dmcable/spacexr package on GitHub.
Spatial coordinates for each pixel stored in spatialCoords.
Additional colData including:
spot_class: Classification as singlet, doublet_certain,
doublet_uncertain, or rejectfirst_type, second_type: Predicted cell typesmin_score, singlet_score for advanced
userscell_type_list: List of cell types per pixelconf_list: List of whether cell type predictions are confidentmin_score for advanced usersWe can view the doublet weights for our data:
# Cell type mixture 1
assay(results_spe, "weights")[, 1:6]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#>     mx11 mx12 mx13 mx14 mx15 mx16
#> ct1    1    1    1    1    1    1
#> ct2    .    .    .    .    .    .
#> ct3    .    .    .    .    .    .
# Cell type mixture 2
assay(results_spe, "weights")[, 7:12]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#>          mx21     mx22      mx23      mx24      mx25      mx26
#> ct1 .         .        .         .         .         .        
#> ct2 0.4935949 0.475943 0.4685428 0.4577326 0.5103718 0.4795314
#> ct3 0.5064051 0.524057 0.5314572 0.5422674 0.4896282 0.5204686You can access the full, unrestricted weights using
assay(results_spe, "weights_full"). We will visualize these weights later. You
may use assay(results_spe, "weights_unconfident") to access the less confident
predictions (e.g., the second_type weight for singlets and uncertain
doublets).
Below, we can view the results of RCTD deconvolution:
classification_df <- data.frame(
    pixel = colnames(assay(results_spe)),
    spot_class = colData(results_spe)$spot_class,
    first_type = colData(results_spe)$first_type,
    second_type = colData(results_spe)$second_type
)
classification_df
#>    pixel      spot_class first_type second_type
#> 1   mx11         singlet        ct1         ct2
#> 2   mx12         singlet        ct1         ct2
#> 3   mx13         singlet        ct1         ct2
#> 4   mx14         singlet        ct1         ct2
#> 5   mx15         singlet        ct1         ct2
#> 6   mx16         singlet        ct1         ct2
#> 7   mx21 doublet_certain        ct2         ct3
#> 8   mx22 doublet_certain        ct3         ct2
#> 9   mx23 doublet_certain        ct3         ct2
#> 10  mx24 doublet_certain        ct3         ct2
#> 11  mx25 doublet_certain        ct2         ct3
#> 12  mx26 doublet_certain        ct3         ct2In particular, this data frame provides us with the following information:
spot_class: A factor variable with the following levels of classification:
singlet (1 cell type in the pixel)doublet_certain (2 cell types in the pixel)doublet_uncertain (2 cell types in the pixel, but only confident of 1)reject (no prediction given for pixel)reject pixels should not be used, and doublet_uncertain
pixels should only be used for applications that do not require knowledge of
all cell types in a pixel.first_type: the first cell type predicted on the bead (for all spot_class
conditions except reject).second_type: the second cell type predicted on the bead for doublet
spot_class conditions (not a confident prediction for doublet_uncertain).We can see that RCTD generally succeeded in identifying the components of the
cell mixtures, although pixels with 90% ct1 and 10% ct2 were classified
as singlets. (As an exercise, try changing the doublet_threshold parameter
in runRctd so that these mixtures will be identified as doublets.)
We can visualize the results of our RCTD run with the plotAllWeights
function. Simply pass in the RCTD results (and some aesthetic arguments), and
the cell type proportions will be plotted as pie charts for each pixel.
plotAllWeights(results_spe, r = 0.05, lwd = 0, title = "Doublet Results")
Note that all mixtures with 90% ct1 (on the left) were classified as singlets,
so only ct1 is plotted. And for the other mixtures (on the right), only cell
types ct2 and ct3 are plotted (without any ct1) due to the restrictions of
doublet mode. We can visualize the full, unrestricted proportions by passing in
the name of the relevant assay ("weights_full") to the plotAllWeights
function:
plotAllWeights(
    results_spe, "weights_full", r = 0.05, lwd = 0, title = "Full Results"
)
This is very close to the ground truth!
Finally, we can visualize how the proportion of a specific cell type varies
across space with the plotCellTypeWeight function. We show an example with
the dummy cell type ct1.
plotCellTypeWeight(
    results_spe, "ct1", "weights_full",
    size = 5, stroke = 0.5, title = "\nDensity of Cell Type 1\n"
)
You are now ready to apply RCTD to your own spatial transcriptomics datasets!
sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] spacexr_1.1.1               SpatialExperiment_1.19.1   
#>  [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.2
#>  [5] Biobase_2.69.1              GenomicRanges_1.61.5       
#>  [7] Seqinfo_0.99.2              IRanges_2.43.5             
#>  [9] S4Vectors_0.47.4            BiocGenerics_0.55.3        
#> [11] generics_0.1.4              MatrixGenerics_1.21.0      
#> [13] matrixStats_1.5.0           ggplot2_4.0.0              
#> [15] BiocStyle_2.37.1           
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.1     dplyr_1.1.4          farver_2.1.2        
#>  [4] blob_1.2.4           filelock_1.0.3       S7_0.2.0            
#>  [7] fastmap_1.2.0        BiocFileCache_2.99.6 tweenr_2.0.3        
#> [10] digest_0.6.37        lifecycle_1.0.4      RSQLite_2.4.3       
#> [13] magrittr_2.0.4       compiler_4.5.1       rlang_1.1.6         
#> [16] sass_0.4.10          tools_4.5.1          yaml_2.3.10         
#> [19] knitr_1.50           S4Arrays_1.9.1       labeling_0.4.3      
#> [22] bit_4.6.0            curl_7.0.0           DelayedArray_0.35.3 
#> [25] RColorBrewer_1.1-3   abind_1.4-8          withr_3.0.2         
#> [28] purrr_1.1.0          grid_4.5.1           polyclip_1.10-7     
#> [31] scales_1.4.0         MASS_7.3-65          dichromat_2.0-0.1   
#> [34] tinytex_0.57         cli_3.6.5            rmarkdown_2.30      
#> [37] crayon_1.5.3         rjson_0.2.23         DBI_1.2.3           
#> [40] cachem_1.1.0         ggforce_0.5.0        BiocManager_1.30.26 
#> [43] XVector_0.49.1       yulab.utils_0.2.1    vctrs_0.6.5         
#> [46] Matrix_1.7-4         jsonlite_2.0.0       bookdown_0.45       
#> [49] bit64_4.6.0-1        magick_2.9.0         tidyr_1.3.1         
#> [52] jquerylib_0.1.4      glue_1.8.0           gtable_0.3.6        
#> [55] quadprog_1.5-8       tibble_3.3.0         pillar_1.11.1       
#> [58] rappdirs_0.3.3       htmltools_0.5.8.1    R6_2.6.1            
#> [61] dbplyr_2.5.1         httr2_1.2.1          evaluate_1.0.5      
#> [64] lattice_0.22-7       memoise_2.0.1        ggfun_0.2.0         
#> [67] bslib_0.9.0          Rcpp_1.1.0           SparseArray_1.9.1   
#> [70] scatterpie_0.2.6     xfun_0.53            fs_1.6.6            
#> [73] pkgconfig_2.0.3