1. Introduction
Single-cell RNA sequencing (scRNA-seq) and Spatial RNA sequencing are widely used techniques for profiling gene expression in individual cells with their locations in the histological sections. These allow molecular biology to be studied at a resolution that cannot be matched by bulk sequencing of cell populations. To better visualize the result of reduction, spatial gene expression pattern in single cell or spatial experiment data, ggsc
provides some layer functions based on the ggplot2
grammar. It can work with the SingleCellExperiment
class or Seurat
class, which are the widely used classes for storing data from single cell experiment.
2. Installation
To install ggsc
package, please enter the following codes in R:
3. The data pre-processing
Here we use an example data from a single sample (sample 151673) of human brain dorsolateral prefrontal cortex (DLPFC) in the human brain, measured using the 10x Genomics Visium platform. First, a brief/standard data pre-processing were done with the scater
and scran
packages.
library(BiocParallel)
library(STexampleData)
library(scater)
library(scran)
library(ggplot2)
# create ExperimentHub instance
eh <- ExperimentHub()
# query STexampleData datasets
myfiles <- query(eh, "STexampleData")
spe <- myfiles[["EH7538"]]
spe <- addPerCellQC(spe, subsets=list(Mito=grep("^MT-", rowData(spe)$gene_name)))
colData(spe) |> head()
## DataFrame with 6 rows and 13 columns
## barcode_id sample_id in_tissue array_row
## <character> <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673 0 0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673 1 50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673 1 3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673 1 59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673 1 14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673 1 43
## array_col ground_truth cell_count sum detected
## <integer> <character> <integer> <numeric> <numeric>
## AAACAACGAATAGTTC-1 16 NA NA 622 526
## AAACAAGTATCTCCCA-1 102 Layer3 6 8458 3586
## AAACAATCTACTAGCA-1 43 Layer1 16 1667 1150
## AAACACCAATAACTGC-1 19 WM 5 3769 1960
## AAACAGAGCGACTCCT-1 94 Layer3 2 5433 2424
## AAACAGCTTTCAGAAG-1 9 Layer5 4 4278 2264
## subsets_Mito_sum subsets_Mito_detected subsets_Mito_percent
## <numeric> <numeric> <numeric>
## AAACAACGAATAGTTC-1 37 9 5.94855
## AAACAAGTATCTCCCA-1 1407 13 16.63514
## AAACAATCTACTAGCA-1 204 11 12.23755
## AAACACCAATAACTGC-1 430 13 11.40886
## AAACAGAGCGACTCCT-1 1316 13 24.22234
## AAACAGCTTTCAGAAG-1 651 12 15.21739
## total
## <numeric>
## AAACAACGAATAGTTC-1 622
## AAACAAGTATCTCCCA-1 8458
## AAACAATCTACTAGCA-1 1667
## AAACACCAATAACTGC-1 3769
## AAACAGAGCGACTCCT-1 5433
## AAACAGCTTTCAGAAG-1 4278
colData(spe) |> data.frame() |>
ggplot(aes(x = sum, y = detected, colour = as.factor(in_tissue))) +
geom_point()
plotColData(spe, x='sum', y = 'subsets_Mito_percent', other_fields="in_tissue") + facet_wrap(~in_tissue)
Firstly, we filter the data to retain the cells that are in the tissue. Then cell-specific biases are normalized using the computeSumFactors
method.
spe <- spe[, spe$in_tissue == 1]
clusters <- quickCluster(
spe,
BPPARAM = BiocParallel::MulticoreParam(workers=2),
block.BPPARAM = BiocParallel::MulticoreParam(workers=2)
)
spe <- computeSumFactors(spe, clusters = clusters, BPPARAM = BiocParallel::MulticoreParam(workers=2))
spe <- logNormCounts(spe)
Next, we use the Graph-based clustering method to do the reduction with the runPCA
and runTSNE
functions provided in the scater
package.
# identify genes that drive biological heterogeneity in the data set by
# modelling the per-gene variance
dec <- modelGeneVar(spe)
# Get the top 15% genes.
top.hvgs <- getTopHVGs(dec, prop=0.15)
spe <- runPCA(spe, subset_row=top.hvgs)
output <- getClusteredPCs(reducedDim(spe), BPPARAM = BiocParallel::MulticoreParam(workers=2))
npcs <- metadata(output)$chosen
npcs
## [1] 13
reducedDim(spe, "PCAsub") <- reducedDim(spe, "PCA")[,1:npcs,drop=FALSE]
g <- buildSNNGraph(spe, use.dimred="PCAsub", BPPARAM = MulticoreParam(workers=2))
cluster <- igraph::cluster_walktrap(g)$membership
colLabels(spe) <- factor(cluster)
set.seed(123)
spe <- runTSNE(spe, dimred="PCAsub", BPPARAM = MulticoreParam(workers=2))
Dimensional reduction plot
Here, we used the sc_dim
function provided in the ggsc
package to visualize the TSNE
reduction result. Unlike other packages, ggsc
implemented the ggplot2
graphic of grammar syntax and visual elements are overlaid through the combinations of graphic layers. The sc_dim_geom_label
layer is designed to add cell cluster labels to a dimensional reduction plot, and can utilized different implementation of text geoms, such as geom_shadowtext
in the shadowtext
package and geom_text
in the ggplot2
package (default) through the geom
argument.
Visualize ‘features’ on a dimensional reduction plot
To visualize the gene expression of cells in the result of reduction, ggsc
provides sc_feature
function to highlight on a dimensional reduction plot.
genes <- c('MOBP', 'PCP4', 'SNAP25', 'HBB', 'IGKC', 'NPY')
target.features <- rownames(spe)[match(genes, rowData(spe)$gene_name)]
sc_feature(spe, target.features[1], slot='logcounts', reduction = 'TSNE')
In addition, it provides sc_dim_geom_feature
layer working with sc_dim
function to visualize the cells expressed the gene and the cell clusters information simultaneously.
sc_dim(spe, slot='logcounts', reduction = 'TSNE') +
sc_dim_geom_feature(spe, target.features[1], color='black')
sc_dim(spe, alpha=.3, slot='logcounts', reduction = 'TSNE') +
ggnewscale::new_scale_color() +
sc_dim_geom_feature(spe, target.features, mapping=aes(color=features)) +
scale_color_viridis_d()
It also provides sc_dim_geom_ellipse
to add confidence levels of the the cluster result, and sc_dim_geom_sub
to select and highlight a specific cluster of cells.
Violin plot of gene expression
ggsc
provides sc_violin
to visualize the expression information of specific genes using the violin layer with common legend, the genes can be compared more intuitively.
Spatial features
To visualize the spatial pattern of gene, ggsc
provides sc_spatial
to visualize specific features/genes with image information.
Session information
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] aplot_0.2.2 ggsc_1.0.2
## [3] Matrix_1.6-1.1 scran_1.30.0
## [5] scater_1.30.0 ggplot2_3.4.4
## [7] scuttle_1.12.0 STexampleData_1.10.0
## [9] SpatialExperiment_1.12.0 SingleCellExperiment_1.24.0
## [11] SummarizedExperiment_1.32.0 Biobase_2.62.0
## [13] GenomicRanges_1.54.1 GenomeInfoDb_1.38.0
## [15] IRanges_2.36.0 S4Vectors_0.40.1
## [17] MatrixGenerics_1.14.0 matrixStats_1.0.0
## [19] ExperimentHub_2.10.0 AnnotationHub_3.10.0
## [21] BiocFileCache_2.10.1 dbplyr_2.4.0
## [23] BiocGenerics_0.48.0 BiocParallel_1.36.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.3 spatstat.sparse_3.0-3
## [3] bitops_1.0-7 httr_1.4.7
## [5] RColorBrewer_1.1-3 tools_4.3.1
## [7] sctransform_0.4.1 utf8_1.2.4
## [9] R6_2.5.1 lazyeval_0.2.2
## [11] uwot_0.1.16 withr_2.5.2
## [13] sp_2.1-1 gridExtra_2.3
## [15] progressr_0.14.0 cli_3.6.1
## [17] spatstat.explore_3.2-5 labeling_0.4.3
## [19] sass_0.4.7 Seurat_4.4.0
## [21] spatstat.data_3.0-3 ggridges_0.5.4
## [23] pbapply_1.7-2 yulab.utils_0.1.0
## [25] parallelly_1.36.0 limma_3.58.1
## [27] RSQLite_2.3.2 gridGraphics_0.5-1
## [29] generics_0.1.3 ica_1.0-3
## [31] spatstat.random_3.2-1 dplyr_1.1.3
## [33] ggbeeswarm_0.7.2 fansi_1.0.5
## [35] abind_1.4-5 lifecycle_1.0.3
## [37] yaml_2.3.7 edgeR_4.0.1
## [39] SparseArray_1.2.0 Rtsne_0.16
## [41] grid_4.3.1 blob_1.2.4
## [43] promises_1.2.1 dqrng_0.3.1
## [45] crayon_1.5.2 miniUI_0.1.1.1
## [47] lattice_0.22-5 beachmat_2.18.0
## [49] cowplot_1.1.1 KEGGREST_1.42.0
## [51] magick_2.8.1 pillar_1.9.0
## [53] knitr_1.45 metapod_1.10.0
## [55] rjson_0.2.21 future.apply_1.11.0
## [57] codetools_0.2-19 leiden_0.4.3
## [59] glue_1.6.2 ggfun_0.1.3
## [61] data.table_1.14.8 vctrs_0.6.4
## [63] png_0.1-8 spam_2.10-0
## [65] gtable_0.3.4 cachem_1.0.8
## [67] xfun_0.40 S4Arrays_1.2.0
## [69] mime_0.12 survival_3.5-7
## [71] statmod_1.5.0 bluster_1.12.0
## [73] interactiveDisplayBase_1.40.0 ellipsis_0.3.2
## [75] fitdistrplus_1.1-11 ROCR_1.0-11
## [77] nlme_3.1-163 bit64_4.0.5
## [79] filelock_1.0.2 RcppAnnoy_0.0.21
## [81] bslib_0.5.1 irlba_2.3.5.1
## [83] vipor_0.4.5 KernSmooth_2.23-22
## [85] colorspace_2.1-0 DBI_1.1.3
## [87] tidyselect_1.2.0 bit_4.0.5
## [89] compiler_4.3.1 curl_5.1.0
## [91] BiocNeighbors_1.20.0 DelayedArray_0.28.0
## [93] plotly_4.10.3 shadowtext_0.1.2
## [95] scales_1.2.1 lmtest_0.9-40
## [97] rappdirs_0.3.3 stringr_1.5.0
## [99] digest_0.6.33 goftest_1.2-3
## [101] spatstat.utils_3.0-4 rmarkdown_2.25
## [103] XVector_0.42.0 htmltools_0.5.6.1
## [105] pkgconfig_2.0.3 sparseMatrixStats_1.14.0
## [107] highr_0.10 fastmap_1.1.1
## [109] rlang_1.1.1 htmlwidgets_1.6.2
## [111] shiny_1.7.5.1 prettydoc_0.4.1
## [113] DelayedMatrixStats_1.24.0 farver_2.1.1
## [115] jquerylib_0.1.4 zoo_1.8-12
## [117] jsonlite_1.8.7 BiocSingular_1.18.0
## [119] RCurl_1.98-1.12 magrittr_2.0.3
## [121] GenomeInfoDbData_1.2.11 ggplotify_0.1.2
## [123] dotCall64_1.1-0 patchwork_1.1.3
## [125] munsell_0.5.0 Rcpp_1.0.11
## [127] ggnewscale_0.4.9 viridis_0.6.4
## [129] reticulate_1.34.0 stringi_1.7.12
## [131] zlibbioc_1.48.0 MASS_7.3-60
## [133] plyr_1.8.9 parallel_4.3.1
## [135] listenv_0.9.0 ggrepel_0.9.4
## [137] deldir_1.0-9 Biostrings_2.70.1
## [139] splines_4.3.1 tensor_1.5
## [141] locfit_1.5-9.8 igraph_1.5.1
## [143] spatstat.geom_3.2-7 reshape2_1.4.4
## [145] ScaledMatrix_1.10.0 BiocVersion_3.18.0
## [147] evaluate_0.22 SeuratObject_5.0.0
## [149] RcppParallel_5.1.7 BiocManager_1.30.22
## [151] tweenr_2.0.2 httpuv_1.6.12
## [153] RANN_2.6.1 tidyr_1.3.0
## [155] purrr_1.0.2 polyclip_1.10-6
## [157] future_1.33.0 scattermore_1.2
## [159] ggforce_0.4.1 rsvd_1.0.5
## [161] xtable_1.8-4 tidydr_0.0.5
## [163] later_1.3.1 viridisLite_0.4.2
## [165] tibble_3.2.1 memoise_2.0.1
## [167] beeswarm_0.4.0 AnnotationDbi_1.64.0
## [169] cluster_2.1.4 globals_0.16.2