This vignette demonstrates the use of the SpatialDecon package to estimate cell abundance in spatial gene expression studies.
We’ll analyze a dense GeoMx dataset from a lung tumor, looking for abundance of different immune cell types. This dataset has 30 ROIs. In each ROI, Tumor and Microenvironment segments have been profiled separately.
First, we load the package:
library(SpatialDecon)
library(GeomxTools)
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: NanoStringNCTools
#> Loading required package: S4Vectors
#> Loading required package: stats4
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
#> Loading required package: ggplot2
Now let’s load our example data and examine it:
data("nsclc")
dim(nsclc)
#> Features Samples
#> 1700 199
head(pData(nsclc))
#> Sample_ID Tissue Slide.name ROI AOI.name
#> ROI01Tumor ICP20th-L11-ICPKilo-ROI01-Tumor-A02 L11 ICPKilo ROI01 Tumor
#> ROI01TME ICP20th-L11-ICPKilo-ROI01-TME-A03 L11 ICPKilo ROI01 TME
#> ROI02Tumor ICP20th-L11-ICPKilo-ROI02-Tumor-A04 L11 ICPKilo ROI02 Tumor
#> ROI02TME ICP20th-L11-ICPKilo-ROI02-TME-A05 L11 ICPKilo ROI02 TME
#> ROI03Tumor ICP20th-L11-ICPKilo-ROI03-Tumor-A06 L11 ICPKilo ROI03 Tumor
#> ROI03TME ICP20th-L11-ICPKilo-ROI03-TME-A07 L11 ICPKilo ROI03 TME
#> AOI.annotation x y nuclei istumor
#> ROI01Tumor PanCK 0 8000 572 TRUE
#> ROI01TME TME 0 8000 733 FALSE
#> ROI02Tumor PanCK 600 8000 307 TRUE
#> ROI02TME TME 600 8000 697 FALSE
#> ROI03Tumor PanCK 1200 8000 583 TRUE
#> ROI03TME TME 1200 8000 484 FALSE
nsclc@assayData$exprs[seq_len(5), seq_len(5)]
#> ROI01Tumor ROI01TME ROI02Tumor ROI02TME ROI03Tumor
#> ABCF1 55 26 47 30 102
#> ABL1 21 22 27 18 47
#> ACVR1B 89 30 57 29 122
#> ACVR1C 9 7 4 8 14
#> ACVR2A 14 15 9 12 22
# better segment names:
sampleNames(nsclc) <-
paste0(nsclc$ROI, nsclc$AOI.name)
The spatialdecon function takes 3 arguments of expression data:
We estimate each data point’s expected background from the negative control probes from its corresponding observation: This study has two probesets, and the genes from each probeset will have distinct background values. The function “derive_GeoMx_background” conveniently estimates background of all data points, accounting for which probeset each gene belongs to:
A “cell profile matrix” is a pre-defined matrix that specifies the expected expression profiles of each cell type in the experiment. The SpatialDecon library comes with one such matrix pre-loaded, the “SafeTME” matrix, designed for estimation of immune and stroma cells in the tumor microenvironment. (This matrix was designed to avoid genes commonly expressed by cancer cells; see the SpatialDecon manuscript for details.)
Let’s take a glance at the safeTME matrix:
data("safeTME")
data("safeTME.matches")
signif(safeTME[seq_len(3), seq_len(3)], 2)
#> macrophages mast B.naive
#> A2M 0.8500 0.044 0.0043
#> ABCB1 0.0021 0.023 0.0250
#> ABCB4 0.0044 0.000 0.2200
heatmap(sweep(safeTME, 1, apply(safeTME, 1, max), "/"),
labRow = NA, margins = c(10, 5))
For studies of other tissue types, we have provided a library of cell profile matrices, available on Github and downloadable with the “download_profile_matrix” function.
For a complete list of matrices, see CellProfileLibrary GitHub Page.
Below we download a matrix of cell profiles derived from scRNA-seq of a mouse spleen.
mousespleen <- download_profile_matrix(species = "Mouse",
age_group = "Adult",
matrixname = "Spleen_MCA")
dim(mousespleen)
#> [1] 11125 9
mousespleen[1:4,1:4]
#> Dendritic.cell.S100a4.high Dendritic.cell.Siglech.high
#> 0610009B22Rik 0.02985075 0.0000000
#> 0610010F05Rik 0.00000000 0.0000000
#> 0610010K14Rik 0.02985075 0.0000000
#> 0610012G03Rik 0.08955224 0.1111111
#> Granulocyte Macrophage
#> 0610009B22Rik 0.00000000 0.00000000
#> 0610010F05Rik 0.00000000 0.00000000
#> 0610010K14Rik 0.00000000 0.03846154
#> 0610012G03Rik 0.08571429 0.03846154
head(cellGroups)
#> $Dendritic
#> [1] "Dendritic.cell.S100a4.high" "Dendritic.cell.Siglech.high"
#>
#> $Granulocyte
#> [1] "Granulocyte"
#>
#> $Macrophage
#> [1] "Macrophage"
#>
#> $`Marginal zone B`
#> [1] "Marginal.zone.B.cell"
#>
#> $Monocyte
#> [1] "Monocyte"
#>
#> $NK
#> [1] "NK.cell"
metadata
#> Profile Matrix Tissue Species Strain Age Age Group
#> 33 MCA Spleen Mouse C57BL/6 6-10 weeks Adult
#> URL
#> 33 https://pubmed.ncbi.nlm.nih.gov/29474909/
#> Citation
#> 33 Han, X. et al. Mapping the Mouse Cell Atlas by Microwell-Seq. Cell172, 1091-1107.e17 (2018).
heatmap(sweep(mousespleen, 1, apply(mousespleen, 1, max), "/"),
labRow = NA, margins = c(10, 5), cexCol = 0.7)
For studies where the provided cell profile matrices aren’t sufficient or if a specific single cell dataset is wanted, we can make a custom profile matrix using the function create_profile_matrix().
This mini single cell dataset is a fraction of the data from Kinchen, J. et al. Structural Remodeling of the Human Colonic Mesenchyme in Inflammatory Bowel Disease. Cell 175, 372-386.e17 (2018).
data("mini_singleCell_dataset")
mini_singleCell_dataset$mtx@Dim # genes x cells
#> [1] 1814 250
as.matrix(mini_singleCell_dataset$mtx)[1:4,1:4]
#> ACTGCTCGTAAGTTCC.S90 TGAAAGAAGGCGCTCT.S66 AGCTTGAGTTTGGGCC.S66
#> PLEKHN1 0 0 0
#> PERM1 0 0 0
#> C1orf159 0 0 0
#> TTLL10 0 0 0
#> ACGGCCATCGTCTGAA.S66
#> PLEKHN1 0
#> PERM1 0
#> C1orf159 0
#> TTLL10 0
head(mini_singleCell_dataset$annots)
#> CellID LabeledCellType
#> 2660 ACTGCTCGTAAGTTCC.S90 stromal cell
#> 2162 TGAAAGAAGGCGCTCT.S66 glial cell
#> 368 AGCTTGAGTTTGGGCC.S66 endothelial cell
#> 238 ACGGCCATCGTCTGAA.S66 stromal cell
#> 4158 TCTCTAACACTGTTAG.S90 stromal cell
#> 2611 ACGATGTGTGTGGTTT.S90 stromal cell
table(mini_singleCell_dataset$annots$LabeledCellType)
#>
#> endothelial cell glial cell
#> 14 12
#> pericyte cell plasma cell
#> 3 30
#> smooth muscle cell of colon stromal cell
#> 1 190
Pericyte cell and smooth muscle cell of colon will be dropped from this matrix due to low cell count. The average expression across all cells of one type is returned so the more cells of one type, the better reflection of the true gene expression. The confidence in these averages can be changed using the minCellNum filter.
custom_mtx <- create_profile_matrix(mtx = mini_singleCell_dataset$mtx, # cell x gene count matrix
cellAnnots = mini_singleCell_dataset$annots, # cell annotations with cell type and cell name as columns
cellTypeCol = "LabeledCellType", # column containing cell type
cellNameCol = "CellID", # column containing cell ID/name
matrixName = "custom_mini_colon", # name of final profile matrix
outDir = NULL, # path to desired output directory, set to NULL if matrix should not be written
normalize = FALSE, # Should data be normalized?
minCellNum = 5, # minimum number of cells of one type needed to create profile, exclusive
minGenes = 10, # minimum number of genes expressed in a cell, exclusive
scalingFactor = 5, # what should all values be multiplied by for final matrix
discardCellTypes = TRUE) # should cell types be filtered for types like mitotic, doublet, low quality, unknown, etc.
#> [1] "Creating Atlas"
#> [1] "1 / 6 : stromal cell"
#> [1] "2 / 6 : glial cell"
#> [1] "3 / 6 : endothelial cell"
#> [1] "4 / 6 : plasma cell"
#> [1] "5 / 6 : pericyte cell"
#> Warning in create_profile_matrix(mtx = mini_singleCell_dataset$mtx, cellAnnots = mini_singleCell_dataset$annots, :
#> pericyte cell was dropped from matrix because it didn't have enough viable cells based on current filtering thresholds.
#> If this cell type is necessary consider changing minCellNum or minGenes
#> [1] "6 / 6 : smooth muscle cell of colon"
#> Warning in create_profile_matrix(mtx = mini_singleCell_dataset$mtx, cellAnnots = mini_singleCell_dataset$annots, :
#> smooth muscle cell of colon was dropped from matrix because it didn't have enough viable cells based on current filtering thresholds.
#> If this cell type is necessary consider changing minCellNum or minGenes
head(custom_mtx)
#> stromal cell glial cell endothelial cell plasma cell
#> PLEKHN1 0.0000000 0 0.0000000 0.05787067
#> PERM1 0.1167131 0 0.0000000 0.00000000
#> C1orf159 0.0000000 0 0.0000000 0.07922668
#> TTLL10 0.0000000 0 0.1853931 0.00000000
#> TAS1R3 0.0000000 0 0.0000000 0.07039008
#> ATAD3C 0.1219237 0 0.0000000 0.07922668
heatmap(sweep(custom_mtx, 1, apply(custom_mtx, 1, max), "/"),
labRow = NA, margins = c(10, 5), cexCol = 0.7)
Now our data is ready for deconvolution. First we’ll show how to use spatialdecon under the basic settings, omitting optional bells and whistles.
res = runspatialdecon(object = nsclc,
norm_elt = "exprs_norm",
raw_elt = "exprs",
X = safeTME,
align_genes = TRUE)
str(pData(res))
#> 'data.frame': 199 obs. of 17 variables:
#> $ Sample_ID : chr "ICP20th-L11-ICPKilo-ROI01-Tumor-A02" "ICP20th-L11-ICPKilo-ROI01-TME-A03" "ICP20th-L11-ICPKilo-ROI02-Tumor-A04" "ICP20th-L11-ICPKilo-ROI02-TME-A05" ...
#> $ Tissue : chr "L11" "L11" "L11" "L11" ...
#> $ Slide.name : chr "ICPKilo" "ICPKilo" "ICPKilo" "ICPKilo" ...
#> $ ROI : chr "ROI01" "ROI01" "ROI02" "ROI02" ...
#> $ AOI.name : chr "Tumor" "TME" "Tumor" "TME" ...
#> $ AOI.annotation : chr "PanCK" "TME" "PanCK" "TME" ...
#> $ x : int 0 0 600 600 1200 1200 1800 1800 2400 2400 ...
#> $ y : num 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 ...
#> $ nuclei : int 572 733 307 697 583 484 506 706 851 552 ...
#> $ istumor : logi TRUE FALSE TRUE FALSE TRUE FALSE ...
#> $ beta : num [1:199, 1:18] 3.54 6.25 4.87 14.94 2.53 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ p : num [1:199, 1:18] 3.42e-07 3.56e-09 1.57e-08 6.66e-16 9.28e-06 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ t : num [1:199, 1:18] 5.1 5.9 5.65 8.06 4.43 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ se : num [1:199, 1:18] 0.694 1.059 0.861 1.853 0.57 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ prop_of_all : num [1:199, 1:18] 0.208 0.182 0.225 0.368 0.177 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ prop_of_nontumor: num [1:199, 1:18] 0.208 0.182 0.225 0.368 0.177 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ sigmas : num [1:199, 1:18, 1:18] 0.482 1.122 0.741 3.435 0.325 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:18] "var1" "var2" "var3" "var4" ...
#> .. ..$ : chr [1:18] "var1" "var2" "var3" "var4" ...
names(res@assayData)
#> [1] "exprs_norm" "resids" "exprs" "yhat"
We’re most interested in “beta”, the matrix of estimated cell abundances.
spatialdecon has several abilities beyond basic deconvolution:
Let’s take a look at an example cell matching object:
str(safeTME.matches)
#> List of 14
#> $ macrophages : chr "macrophages"
#> $ mast : chr "mast"
#> $ B : chr [1:2] "B.naive" "B.memory"
#> $ plasma : chr "plasma"
#> $ CD4.T.cells : chr [1:2] "T.CD4.naive" "T.CD4.memory"
#> $ CD8.T.cells : chr [1:2] "T.CD8.naive" "T.CD8.memory"
#> $ NK : chr "NK"
#> $ pDC : chr "pDCs"
#> $ mDCs : chr "mDCs"
#> $ monocytes : chr [1:2] "monocytes.C" "monocytes.NC.I"
#> $ neutrophils : chr "neutrophils"
#> $ Treg : chr "Treg"
#> $ endothelial.cells: chr "endothelial.cells"
#> $ fibroblasts : chr "fibroblasts"
Now let’s run spatialdecon:
# vector identifying pure tumor segments:
nsclc$istumor = nsclc$AOI.name == "Tumor"
# run spatialdecon with all the bells and whistles:
restils = runspatialdecon(object = nsclc,
norm_elt = "exprs_norm", # normalized data
raw_elt = "exprs", # expected background counts for every data point in norm
X = safeTME, # safeTME matrix, used by default
cellmerges = safeTME.matches, # safeTME.matches object, used by default
cell_counts = nsclc$nuclei, # nuclei counts, used to estimate total cells
is_pure_tumor = nsclc$istumor, # identities of the Tumor segments/observations
n_tumor_clusters = 5) # how many distinct tumor profiles to append to safeTME
str(pData(restils))
#> 'data.frame': 199 obs. of 21 variables:
#> $ Sample_ID : chr "ICP20th-L11-ICPKilo-ROI01-Tumor-A02" "ICP20th-L11-ICPKilo-ROI01-TME-A03" "ICP20th-L11-ICPKilo-ROI02-Tumor-A04" "ICP20th-L11-ICPKilo-ROI02-TME-A05" ...
#> $ Tissue : chr "L11" "L11" "L11" "L11" ...
#> $ Slide.name : chr "ICPKilo" "ICPKilo" "ICPKilo" "ICPKilo" ...
#> $ ROI : chr "ROI01" "ROI01" "ROI02" "ROI02" ...
#> $ AOI.name : chr "Tumor" "TME" "Tumor" "TME" ...
#> $ AOI.annotation : chr "PanCK" "TME" "PanCK" "TME" ...
#> $ x : int 0 0 600 600 1200 1200 1800 1800 2400 2400 ...
#> $ y : num 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 ...
#> $ nuclei : int 572 733 307 697 583 484 506 706 851 552 ...
#> $ istumor : logi TRUE FALSE TRUE FALSE TRUE FALSE ...
#> $ beta : num [1:199, 1:14] 0.4668 2.7947 0.9003 10.2149 0.0131 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ p : num [1:199, 1:14] 2.88e-01 7.34e-04 1.20e-01 3.84e-10 9.63e-01 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ t : num [1:199, 1:14] 1.0616 3.3765 1.5565 6.2603 0.0466 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ se : num [1:199, 1:14] 0.44 0.828 0.578 1.632 0.281 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ prop_of_all : num [1:199, 1:14] 0.2979 0.1734 0.4577 0.4794 0.0763 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ prop_of_nontumor : num [1:199, 1:14] 0.2979 0.1734 0.4577 0.4794 0.0763 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ sigma : num [1:199, 1:14, 1:14] 0.1933 0.6851 0.3346 2.6624 0.0788 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:14] "var1" "var2" "var3" "var4" ...
#> .. ..$ : chr [1:14] "var1" "var2" "var3" "var4" ...
#> $ beta.granular : num [1:199, 1:23] 0.4668 2.7947 0.9003 10.2149 0.0131 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:23] "macrophages" "mast" "B.naive" "B.memory" ...
#> $ sigma.granular : num [1:199, 1:23, 1:23] 0.1933 0.6851 0.3346 2.6624 0.0788 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. ..$ : chr [1:23] "var1" "var2" "var3" "var4" ...
#> .. ..$ : chr [1:23] "var1" "var2" "var3" "var4" ...
#> $ cell.counts :'data.frame': 199 obs. of 2 variables:
#> ..$ cells.per.100: num [1:199, 1:14] 1.1831 7.0835 2.282 25.8905 0.0332 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> ..$ cell.counts : num [1:199, 1:14] 6.767 51.922 7.006 180.457 0.193 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. .. ..$ : chr [1:14] "macrophages" "mast" "B" "plasma" ...
#> $ cell.counts.granular:'data.frame': 199 obs. of 2 variables:
#> ..$ cells.per.100: num [1:199, 1:18] 1.1831 7.0835 2.282 25.8905 0.0332 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
#> ..$ cell.counts : num [1:199, 1:18] 6.767 51.922 7.006 180.457 0.193 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:199] "ROI01Tumor" "ROI01TME" "ROI02Tumor" "ROI02TME" ...
#> .. .. ..$ : chr [1:18] "macrophages" "mast" "B.naive" "B.memory" ...
names(restils@assayData)
#> [1] "exprs_norm" "resids" "exprs" "yhat"
str(restils@experimentData@other)
#> List of 1
#> $ SpatialDeconMatrix: num [1:544, 1:23] 0.0137 0.0379 0.2697 0.2754 0.4022 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:544] "ADAM12" "ANGPTL4" "BCAT1" "BCL2A1" ...
#> .. ..$ : chr [1:23] "macrophages" "mast" "B.naive" "B.memory" ...
There are quite a few readouts here. Let’s review the important ones:
To illustrate the derivation of tumor profiles, let’s look at the cell profile matrix output by spatialdecon:
heatmap(sweep(restils@experimentData@other$SpatialDeconMatrix, 1, apply(restils@experimentData@other$SpatialDeconMatrix, 1, max), "/"),
labRow = NA, margins = c(10, 5))
Note the new tumor-specific columns.
The SpatialDecon package contains two specialized plotting functions, and a default color palette for the safeTME matrix.
The first function is “TIL_barplot”, which is just a convenient way of drawing barplots of cell type abundance.
# For reference, show the TILs color data object used by the plotting functions
# when safeTME has been used:
data("cellcols")
cellcols
#> CD4.T.cells CD8.T.cells Treg T.CD4.naive
#> "red" "firebrick" "#FF66FF" "#CC0000"
#> T.CD4.memory T.CD8.naive T.CD8.memory NK
#> "#FF0000" "#FF6633" "#FF9900" "grey10"
#> B B.naive B.memory plasma
#> "darkblue" "#000099" "#0000FF" "#3399CC"
#> pDC pDCs macrophages monocytes
#> "#00FFFF" "#00FFFF" "#006600" "#33CC00"
#> monocytes.C monocytes.NC.I mDCs neutrophils
#> "#66CC66" "#33CC00" "#00FF00" "#9966CC"
#> mast fibroblasts endothelial.cells tumor
#> "#FFFF00" "#999999" "#996633" "#333333"
# show just the TME segments, since that's where the immune cells are:
o = hclust(dist(t(restils$cell.counts$cell.counts)))$order
layout(mat = (matrix(c(1, 2), 1)), widths = c(7, 3))
TIL_barplot(t(restils$cell.counts$cell.counts[, o]), draw_legend = TRUE,
cex.names = 0.5)
# or the proportions of cells:
temp = replace(restils$prop_of_nontumor, is.na(restils$prop_of_nontumor), 0)
o = hclust(dist(temp[restils$AOI.name == "TME",]))$order
TIL_barplot(t(restils$prop_of_nontumor[restils$AOI.name == "TME",])[, o],
draw_legend = TRUE, cex.names = 0.75)
The second function is “florets”, used for plotting cell abundances atop some 2-D projection. Here, we’ll plot cell abundances atop the first 2 principal components of the data:
# PCA of the normalized data:
pc = prcomp(t(log2(pmax(nsclc@assayData$exprs_norm, 1))))$x[, c(1, 2)]
# run florets function:
par(mar = c(5,5,1,1))
layout(mat = (matrix(c(1, 2), 1)), widths = c(6, 2))
florets(x = pc[, 1], y = pc[, 2],
b = t(restils$beta), cex = .5,
xlab = "PC1", ylab = "PC2")
par(mar = c(0,0,0,0))
frame()
legend("center", fill = cellcols[colnames(restils$beta)],
legend = colnames(restils$beta), cex = 0.7)
So we can see that PC1 roughly tracks many vs. few immune cells, and PC2 tracks the relative abundance of lymphoid/myeloid populations.
The SpatialDecon library includes several helpful functions for further analysis/fine-tuning of deconvolution results.
When two cell types are too similar, the estimation of their abundances becomes unstable. However, their sum can still be estimated easily. The function “collapseCellTypes” takes a deconvolution results object and collapses any closely-related cell types you tell it to:
matching = list()
matching$myeloid = c( "macrophages", "monocytes", "mDCs")
matching$T.NK = c("CD4.T.cells","CD8.T.cells", "Treg", "NK")
matching$B = c("B")
matching$mast = c("mast")
matching$neutrophils = c("neutrophils")
matching$stroma = c("endothelial.cells", "fibroblasts")
collapsed = runCollapseCellTypes(object = restils,
matching = matching)
heatmap(t(collapsed$beta), cexRow = 0.85, cexCol = 0.75)
Sometimes a cell profile matrix will omit a cell type you know to be present in a tissue. If your data includes any regions that are purely this unmodelled cell type - e.g. because you’ve used the GeoMx platform’s segmentation capability to specifically select them - then you can infer a profile for that cell type and merge it with your cell profile matrix. The algorithm clusters all the observations you designate as purely the unmodelled cell type, and collapses those clusters into as many profiles of that cell type as you wish. For cancer cell, it may be appropriate to specify 10 or more clusters; for highly regular healthy cells, one cluster may suffice.
(Note: this functionality can also be run within the spatialdecon function, as is demonstrated further above.)
pure.tumor.ids = res$AOI.name == "Tumor"
safeTME.with.tumor = runMergeTumorIntoX(object = nsclc,
norm_elt = "exprs_norm",
pure_tumor_ids = pure.tumor.ids,
X = safeTME,
K = 3)
heatmap(sweep(safeTME.with.tumor, 1, apply(safeTME.with.tumor, 1, max), "/"),
labRow = NA, margins = c(10, 5))
Once cell type abundance has been estimated, we can flip the deconvolution around, modelling the expression data as a function of cell abundances, and thereby deriving:
The function “reversedecon” runs this model.
rdecon = runReverseDecon(object = nsclc,
norm_elt = "exprs_norm",
beta = res$beta)
str(fData(rdecon))
#> 'data.frame': 1700 obs. of 12 variables:
#> $ TargetName : chr "ABCF1" "ABL1" "ACVR1B" "ACVR1C" ...
#> $ HUGOSymbol : chr "ABCF1" "ABL1" "ACVR1B" "ACVR1C" ...
#> $ TargetGroup : chr "All Probes;Transport of small molecules" "All Probes;Cell Cycle;Signaling by Rho GTPases;DNA Repair;Factors involved in megakaryocyte development and pla"| __truncated__ "All Probes;Signaling by NODAL;Signaling by TGF-beta family members" "All Probes;Signaling by NODAL;Signaling by TGF-beta family members" ...
#> $ AnalyteType : chr "RNA" "RNA" "RNA" "RNA" ...
#> $ Codeclass : chr "Endogenous" "Endogenous" "Endogenous" "Endogenous" ...
#> $ Module : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ CorrelationToNegatives: num 0.597 0.876 0.435 0.928 0.776 ...
#> $ GlobalOutliers : num 0 0 0 0 0 0 0 0 0 0 ...
#> $ Negative : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ coefs : num [1:1700, 1:19] 2.13 2.03 1.3 1.13 1.28 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:1700] "ABCF1" "ABL1" "ACVR1B" "ACVR1C" ...
#> .. ..$ : chr [1:19] "(Intercept)" "macrophages" "mast" "B.naive" ...
#> $ cors : num 0.763 0.517 0.804 0.191 0.662 ...
#> $ resid.sd : num 0.468 0.315 0.55 0.322 0.383 ...
names(rdecon@assayData)
#> [1] "exprs_norm" "resids" "exprs" "yhat"
# look at the residuals:
heatmap(pmax(pmin(rdecon@assayData$resids, 2), -2))
# look at the two metrics of goodness-of-fit:
plot(fData(rdecon)$cors, fData(rdecon)$resid.sd, col = 0)
showgenes = c("CXCL14", "LYZ", "FOXP3")
text(fData(rdecon)$cors[!rownames(fData(rdecon)) %in% showgenes],
fData(rdecon)$resid.sd[!rownames(fData(rdecon)) %in% showgenes],
setdiff(rownames(fData(rdecon)), showgenes), cex = 0.5)
text(fData(rdecon)$cors[rownames(fData(rdecon)) %in% showgenes], fData(rdecon)$resid.sd[rownames(fData(rdecon)) %in% showgenes],
showgenes, cex = 0.75, col = 2)
From the above plot, we can see that genes like CXCL14 vary independently of cell mixing, genes like LYZ are correlated with cell mixing but still have variable expression, and genes like FOXP3 serve as nothing but obtuse readouts of cell mixing.
sessionInfo()
#> R version 4.2.0 RC (2022-04-19 r82224)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] GeomxTools_3.0.0 NanoStringNCTools_1.4.0 ggplot2_3.3.5
#> [4] S4Vectors_0.34.0 Biobase_2.56.0 BiocGenerics_0.42.0
#> [7] SeuratObject_4.0.4 SpatialDecon_1.6.0
#>
#> loaded via a namespace (and not attached):
#> [1] nlme_3.1-157 bitops_1.0-7 EnvStats_2.7.0
#> [4] RColorBrewer_1.1-3 httr_1.4.2 GenomeInfoDb_1.32.0
#> [7] R.cache_0.15.0 numDeriv_2016.8-1.1 tools_4.2.0
#> [10] bslib_0.3.1 utf8_1.2.2 R6_2.5.1
#> [13] vipor_0.4.5 DBI_1.1.2 colorspace_2.0-3
#> [16] withr_2.5.0 tidyselect_1.1.2 GGally_2.1.2
#> [19] repmis_0.5 curl_4.3.2 compiler_4.2.0
#> [22] cli_3.3.0 sass_0.4.1 scales_1.2.0
#> [25] systemfonts_1.0.4 stringr_1.4.0 digest_0.6.29
#> [28] minqa_1.2.4 rmarkdown_2.14 R.utils_2.11.0
#> [31] XVector_0.36.0 pkgconfig_2.0.3 htmltools_0.5.2
#> [34] lme4_1.1-29 fastmap_1.1.0 highr_0.9
#> [37] htmlwidgets_1.5.4 rlang_1.0.2 ggthemes_4.2.4
#> [40] readxl_1.4.0 jquerylib_0.1.4 generics_0.1.2
#> [43] jsonlite_1.8.0 dplyr_1.0.8 R.oo_1.24.0
#> [46] RCurl_1.98-1.6 magrittr_2.0.3 GenomeInfoDbData_1.2.8
#> [49] Matrix_1.4-1 Rcpp_1.0.8.3 ggbeeswarm_0.6.0
#> [52] munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.1
#> [55] R.methodsS3_1.8.1 stringi_1.7.6 yaml_2.3.5
#> [58] MASS_7.3-57 zlibbioc_1.42.0 plyr_1.8.7
#> [61] grid_4.2.0 parallel_4.2.0 crayon_1.5.1
#> [64] lattice_0.20-45 Biostrings_2.64.0 splines_4.2.0
#> [67] knitr_1.38 pillar_1.7.0 uuid_1.1-0
#> [70] boot_1.3-28 rjson_0.2.21 reshape2_1.4.4
#> [73] glue_1.6.2 ggiraph_0.8.2 evaluate_0.15
#> [76] data.table_1.14.2 vctrs_0.4.1 nloptr_2.0.0
#> [79] cellranger_1.1.0 gtable_0.3.0 purrr_0.3.4
#> [82] reshape_0.8.9 assertthat_0.2.1 xfun_0.30
#> [85] tibble_3.1.6 pheatmap_1.0.12 lmerTest_3.1-3
#> [88] beeswarm_0.4.0 IRanges_2.30.0 ellipsis_0.3.2