--- title: detecting relevant genes with destiny 3 output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{detecting relevant genes with destiny 3} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Single Cell RNA-Sequencing data and gene relevance ================================================== Libraries --------- We need of course destiny, scran for preprocessing, and some tidyverse niceties. ```{r} library(conflicted) library(destiny) suppressPackageStartupMessages(library(scran)) library(purrr) library(ggplot2) library(SingleCellExperiment) ``` Data ---- Let’s use data from the `scRNAseq`[1] package. If necessary, install it via `BiocManager::install('scRNAseq')`. [1] Risso D, Cole M (2019). [scRNAseq: A Collection of Public Single-Cell RNA-Seq Datasets](https://bioconductor.org/packages/scRNAseq/). ```{r} # The parts of the help we’re interested in help('scRNAseq-package', package = 'scRNAseq') %>% repr::repr_html() %>% stringr::str_extract_all(stringr::regex('
The dataset.*?
', dotall = TRUE)) %>% unlist() %>% paste(collapse = '') %>% knitr::raw_html() ``` 379 cells seems sufficient to see something! ```{r} allen <- scRNAseq::ReprocessedAllenData() ``` Preprocessing ------------- We’ll mostly stick to the [scran vignette][] here. Let’s add basic information to the data and choose what to work with. As `scran` expects the raw counts in the `counts` assay, we rename the more accurate RSEM counts to `counts`. Our data has ERCC spike-ins in an `altExp` slot: [scran vignette]: https://bioconductor.org/packages/devel/bioc/vignettes/scran/inst/doc/scran.html ```{r} rowData(allen)$Symbol <- rownames(allen) rowData(allen)$EntrezID <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db, rownames(allen), 'ENTREZID', 'ALIAS') rowData(allen)$Uniprot <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db, rownames(allen), 'UNIPROT', 'ALIAS', multiVals = 'list') assayNames(allen)[assayNames(allen) == 'rsem_counts'] <- 'counts' assayNames(altExp(allen, 'ERCC'))[assayNames(altExp(allen, 'ERCC')) == 'rsem_counts'] <- 'counts' allen ``` Now we can use it to renormalize the data. We normalize the `counts` using the spike-in size factors and logarithmize them into `logcounts`. ```{r} allen <- computeSpikeFactors(allen, 'ERCC') allen <- logNormCounts(allen) allen ``` We also use the spike-ins to detect highly variable genes more accurately: ```{r} decomp <- modelGeneVarWithSpikes(allen, 'ERCC') rowData(allen)$hvg_order <- order(decomp$bio, decreasing = TRUE) ``` We create a subset of the data containing only rasonably highly variable genes: ```{r} allen_hvg <- subset(allen, hvg_order <= 5000L) ``` Let’s create a Diffusion map. For rapid results, people often create a PCA first, which can be stored in your `SingleCellExperiment` before creating the Diffusion map or simply created implicitly using `DiffusionMap(..., n_pcs =