--- title: "Interoperability with SingleCellExperiment" author: "Givanna Putri" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Using SuperCellCyto with Single-Cell Based Objects} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r global_options, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction This vignette demonstrates how to integrate SuperCellCyto results with cytometry data stored in SingleCellExperiment (SCE) objects, and how to analyse SuperCellCyto output using Bioconductor packages that take SCE objects as input. We use a subsampled Levine_32dim dataset stored in an SCE object to illustrate how to create supercells and conduct downstream analyses. ```{r setup, message=FALSE, warning=FALSE} library(SuperCellCyto) library(qs2) library(scran) library(BiocSingular) library(scater) library(bluster) library(data.table) ``` ## Preparing SCE object We first load the subsampled Levine_32dim data, stored as a [qs2](https://cran.r-project.org/web/packages/qs2/index.html) using the `qs_read` function. ```{r load_sce_obj} sce <- qs_read( system.file( "extdata", "Levine_32dim_sce_sub.qs2", package = "SuperCellCyto" ) ) sce ``` The data is stored in the `counts` assay. We will subset it to include only markers we need to perform downstream analysis, transform it using arcsinh transformation, and store the transformed data in the `logcounts` assay. ```{r subset_and_transform} markers <- c( "CD45RA", "CD133", "CD19", "CD22", "CD11b", "CD4", "CD8", "CD34", "Flt3", "CD20", "CXCR4", "CD235ab", "CD45", "CD123", "CD321", "CD14", "CD33", "CD47", "CD11c", "CD7", "CD15", "CD16", "CD44", "CD38", "CD13", "CD3", "CD61", "CD117", "CD49d", "HLA-DR", "CD64", "CD41" ) # keep only the relevant markers sce <- sce[markers, ] # to store arcsinh transformed data exprs(sce) <- asinh(counts(sce) / 5) sce ``` ## Run SuperCellCyto SuperCellCyto requires input data in a `data.table` format. Therefore, we need to extract the arcsinh-transformed data into a `data.table` object, and add the sample information and IDs of the cells. Note that SCE typically stores cells as columns and features as rows. SuperCellCyto, conversely, requires cells as rows and features as columns, a format typical for cytometry data where we typically have more cells than features. Hence, we will transpose the extracted data accordingly when creating the `data.table` object. ```{r extract_dt_and_run_supercellcyto} dt <- data.table(t(exprs(sce))) dt$sample <- colData(sce)$sample dt$cell_id <- colnames(sce) supercells <- runSuperCellCyto( dt = dt, markers = markers, sample_colname = "sample", cell_id_colname = "cell_id", gam = 5 ) head(supercells$supercell_expression_matrix) ``` We can now embed the supercell ID in the `colData` of our SCE object. ```{r add_supercell_id_to_coldata} colData(sce)$supercell_id <- factor(supercells$supercell_cell_map$SuperCellID) head(colData(sce)) ``` ## Analyse Supercells as SCE object As the number of supercells is less than the number of cells in our SCE object, we store the supercell expression matrix as a separate SCE object. This then allows us to use Bioconductor packages to analyse our supercells. ```{r create_supercell_sce} supercell_sce <- SingleCellExperiment( list(logcounts = t( supercells$supercell_expression_matrix[, markers, with = FALSE] )), colData = DataFrame( SuperCellId = supercells$supercell_expression_matrix$SuperCellId, sample = supercells$supercell_expression_matrix$sample ) ) colnames(supercell_sce) <- colData(supercell_sce)$SuperCellId supercell_sce ``` The code above essentially transpose the supercell expression matrix, making supercells columns and markers rows, and store it in the `logcounts` assay of our new SCE object. We also populate the `colData` with SuperCellId and the sample name for each supercell. With the supercell expression matrix now in an SCE format, we can perform downstream analyses such as clustering and and drawing UMAP plots using Bioconductor packages. ```{r cluster_and_umap_supercells} set.seed(42) supercell_sce <- fixedPCA( supercell_sce, rank = 10, subset.row = NULL, BSPARAM = RandomParam() ) supercell_sce <- runUMAP(supercell_sce, dimred = "PCA") clusters <- clusterCells( supercell_sce, use.dimred = "PCA", BLUSPARAM = SNNGraphParam(cluster.fun = "leiden") ) colLabels(supercell_sce) <- clusters plotReducedDim(supercell_sce, dimred = "UMAP", colour_by = "label") ``` Any functions which operate on SCE object should work. For example, we can use `plotExpression` in `scater` package to create violin plots of the markers against clusters. Note, the y-axis says "logcounts", but the data is actually arcsinh transformed, not log transformed. ```{r plot_marker_expression_supercells} plotExpression( supercell_sce, c("CD4", "CD8", "CD19", "CD34", "CD11b"), x = "label", colour_by = "sample" ) ``` ## Transfer information from supercells SCE object to single cell SCE object To transfer analysis results (e.g., clusters) from the supercell SCE object back to the single-cell SCE object, we need to do some data wrangling. It is vital to ensure that the order of the analysis results (e.g., clusters) aligns with the cell order in the SCE object. Using the cluster information as an example, we need to first extract the `colData` of the SCE objects into two separate `data.table` objects. We then use `merge.data.table` to match and merge them using the supercell ID as the common identifiers. Make sure you set the `sort` parameter to FALSE and set `x` to the `colData` of your single cell SCE object. This ensures that the order of the resulting `data.table` aligns with the order of the `colData` of our single-cell SCE object. ```{r transfer_cluster_to_singlecell} cell_id_sce <- data.table(as.data.frame(colData(sce))) supercell_cluster <- data.table(as.data.frame(colData(supercell_sce))) cell_id_sce_with_clusters <- merge.data.table( x = cell_id_sce, y = supercell_cluster, by.x = "supercell_id", by.y = "SuperCellId", sort = FALSE ) ``` Finally, we can then add the cluster assignment as a column in the `colData` of our single-cell SCE object. ```{r add_cluster_to_coldata} colData(sce)$cluster <- cell_id_sce_with_clusters$label ``` Visualise them as UMAP plot. ```{r umap_singlecell_colored_by_cluster} sce <- fixedPCA(sce, rank = 10, subset.row = NULL, BSPARAM = RandomParam()) sce <- runUMAP(sce, dimred = "PCA") plotReducedDim(sce, dimred = "UMAP", colour_by = "cluster") ``` Or violin plot to see the distribution of their marker expressions. Note, the y-axis says "logcounts", but the data is actually arcsinh transformed, not log transformed. ```{r plot_marker_expression_singlecell} plotExpression( sce, c("CD4", "CD8", "CD19", "CD34", "CD11b"), x = "cluster", colour_by = "sample" ) ``` ## Session information ```{r session_info} sessionInfo() ```