---
title: "Quick start of CytoTree"
author: "Yuting Dai"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    highlight: github
    theme: cayman
    toc: yes
  pdf_document:
    toc: yes
  html_document:
    df_print: paged
    toc: yes
package: CytoTree
vignette: |
  %\VignetteIndexEntry{Quick_start}
  \usepackage[utf8]{inputenc}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r echo = TRUE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, eval = TRUE,
                      warning = TRUE, message = TRUE,
                      fig.width = 6, fig.height = 5)
```

## Introduction

Although multidimensional single-cell-based flow and mass cytometry have been increasingly applied to microenvironmental composition and stem-cell research, integrated analysis workflows to facilitate the interpretation of experimental cytometry data remain underdeveloped. We present CytoTree, a comprehensive R package designed for the analysis and interpretation of flow and mass cytometry data. We applied CytoTree to mass cytometry and time-course flow cytometry data to demonstrate the usage and practical utility of its computational modules. CytoTree is a reliable tool for multidimensional cytometry data workflows and produces compelling results for trajectory construction and pseudotime estimation.

**For the detailed tutorial of CytoTree, please visit [Tutorial of CytoTree](https://ytdai.github.io/CytoTree/index.html).**

## Overview of Workflow

The `CytoTree` package is developed to cover the majority of the standard analysis and visualization workflow for FCS data. In the CytoTree workflow, an S4 object in R is built to implement the statistical and computational approach, and all computational functionalities are integrated into a single workflow that requires only a specified input data format. The computational functionalities of `CytoTree` can be divided into four main parts (Figure 1): preprocessing, trajectory, analysis and visualization.

- **Preprocessing**. Data import, compensation, quality control, filtration, normalization and merging of cells from different samples are implemented in the preprocessing part. After preprocessing, a matrix that contains clean cytometric signaling data is required to build a CYT object. Other optional data are recommended when building the CYT object, including a data frame containing meta-information of the experiment and a vector containing all markers enrolled in the computational process.

- **Trajectory**. Cells in the CYT object are classified into different clusters based on the expression level of the input markers. You can choose different clustering methods by inputting different parameters. After clustering, cells are downsampled in a cluster-dependent fashion to reduce the total cell number while avoiding the deletion of small clusters. Dimensionality reduction for both cells and clusters is also implemented in the clustering procedure. After dimensionality reduction, a Minimum Spanning Tree (MST) is used to construct the cell trajectory.

- **Analysis**. This part is designed for time-course FCS data. Before running pseudotime, root cells must be defined based on the user's prior knowledge. Root cells in the `CytoTree` workflow are the initial cells of the trajectory tree. They can be set using one vertex node of the tree or a cluster of cells with a specific antibody combination. Intermediate state evaluation is also part of the pseudotime analysis. Leaf cells are defined by the end node of the trajectory or the end stage of the experiment. Intermediate state cells are cells with higher betweenness in the graph built on cell-cell connections, and they play an important role in connecting root cells and leaf cells.

- **Visualization**. The visualization part provides clear and concise visualization of FCS data in an effective and easy-to-comprehend manner. The `CytoTree` package offers various plotting functions to generate customizable and publication-quality plots. A two-dimensional or three-dimensional plot fits most requirements for dimensionality reduction results. A tree-based plot visualizes the cell trajectory as a force-directed layout tree. Other special plots such as heatmaps and violin plots are also provided in `CytoTree`.

Figure 1 Workflow of CytoTree
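
The trajectory steps described above (clustering, cluster-dependent downsampling, and a minimum spanning tree over clusters) can be sketched in base R. This is only an illustration of the idea under stated assumptions, not the implementation used inside `CytoTree`: the data are simulated, k-means stands in for whichever clustering method is chosen, and the helpers `downsample_by_cluster()` and `prim_mst()` are hypothetical names introduced here.

```r
# Illustrative sketch in base R only; none of these helpers belong to CytoTree.
set.seed(1)

# Simulated "clean" expression matrix: 300 cells x 3 hypothetical markers
expr <- rbind(
  matrix(rnorm(300, mean = 0), ncol = 3),
  matrix(rnorm(300, mean = 3), ncol = 3),
  matrix(rnorm(300, mean = 6), ncol = 3)
)
colnames(expr) <- c("markerA", "markerB", "markerC")

# Step 1: cluster cells (k-means is a stand-in for the clustering step)
k <- 6
km <- kmeans(expr, centers = k, nstart = 10)
cluster.id <- km$cluster

# Step 2: cluster-dependent downsampling. Every cluster keeps at least
# `min.keep` cells (or all of its cells if it is smaller), so small
# clusters are not lost.
downsample_by_cluster <- function(cluster.id, fraction = 0.5, min.keep = 5) {
  keep <- lapply(split(seq_along(cluster.id), cluster.id), function(idx) {
    n <- min(length(idx), max(min.keep, ceiling(length(idx) * fraction)))
    idx[sample.int(length(idx), n)]
  })
  sort(unlist(keep, use.names = FALSE))
}
kept <- downsample_by_cluster(cluster.id)

# Step 3: pairwise Euclidean distances between cluster centers
d <- as.matrix(dist(km$centers))

# Step 4: minimum spanning tree over cluster centers (Prim's algorithm)
prim_mst <- function(d) {
  n <- nrow(d)
  in.tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2,
                  dimnames = list(NULL, c("from", "to")))
  for (i in seq_len(n - 1)) {
    # Cheapest edge leaving the current tree
    cand <- d[in.tree, !in.tree, drop = FALSE]
    pos <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    edges[i, ] <- c(which(in.tree)[pos[1]], which(!in.tree)[pos[2]])
    in.tree[edges[i, 2]] <- TRUE
  }
  edges
}
tree.edges <- prim_mst(d)
tree.edges
```

Each row of `tree.edges` is one edge between two cluster indices. A tree over `k` clusters always has `k - 1` edges, which is why a trajectory built this way has no cycles.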
## Installation

### GitHub

This requires the `devtools` package to be installed first.

``` {r eval = FALSE}

# If not already installed
install.packages("devtools")
devtools::install_github("JhuangLab/CytoTree")

library(CytoTree)

```

`CytoTree` is available on GitHub at https://github.com/JhuangLab/CytoTree.

### Bioconductor

This requires the `BiocManager` package to be installed first. Make sure that the Bioconductor version is 3.12. To install this package, start R (version "4.0") and enter:

``` {r eval = FALSE}

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CytoTree")

```

`CytoTree` is available on Bioconductor at https://bioconductor.org/packages/CytoTree/.

## Quick-start code

To run `CytoTree`, the first step is to build a CYT object.

The main functions in `CytoTree` are summarized in a figure that describes the available functionalities: preprocessing, trajectory, analysis, visualization, and set operations. A short description (black font) and the corresponding function (blue font) are provided for each function. The CytoTree workflow begins with reading the FCS data. Compensation, filtration, concatenation, and normalization are included in the preprocessing part. A clean matrix after preprocessing is required to build a CYT object, and the analysis workflows of all other functionalities are based on the CYT object. The trajectory module contains functions used to perform clustering and dimensionality reduction for cells. The analysis module is based on calculation results from the trajectory part. The visualization part includes functions to generate publication-quality plots from the CYT object. The set operations part includes a function for subsetting a CYT object based on user-defined cells and for fetching meta-information for clusters and cells during the analysis.

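
Because a clean matrix is the required input for a CYT object, it can help to sanity-check that matrix first. The following is a minimal base-R sketch, not part of `CytoTree`: the helper `check_expression_matrix()` is a hypothetical name, and it assumes that rows are cells and columns are markers.

```r
# Hypothetical helper (not part of CytoTree): basic sanity checks on a
# preprocessed expression matrix. Assumes rows = cells, columns = markers.
check_expression_matrix <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  problems <- character(0)
  if (anyNA(x)) {
    problems <- c(problems, "contains NA values")
  }
  if (any(!is.finite(x[!is.na(x)]))) {
    problems <- c(problems, "contains infinite values")
  }
  if (is.null(colnames(x)) || anyDuplicated(colnames(x)) > 0) {
    problems <- c(problems, "marker (column) names missing or duplicated")
  }
  if (is.null(rownames(x)) || anyDuplicated(rownames(x)) > 0) {
    problems <- c(problems, "cell (row) names missing or duplicated")
  }
  problems
}

# Toy matrix: 5 cells x 3 markers (simulated values, marker names as examples)
set.seed(1)
toy <- matrix(abs(rnorm(15)), nrow = 5,
              dimnames = list(paste0("cell_", 1:5),
                              c("FITC-A", "BV510-A", "BV650-A")))
ok.report <- check_expression_matrix(toy)

# A deliberately broken copy: one infinite value and a duplicated marker name
bad <- toy
bad[1, 1] <- Inf
colnames(bad)[2] <- "FITC-A"
bad.report <- check_expression_matrix(bad)
bad.report
```

An empty character vector means none of the checks failed. The helper reports problems instead of stopping, so all issues in a matrix can be seen in a single pass.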
``` {r eval = TRUE}

# Loading packages
suppressMessages({
  library(CytoTree)
})

# Read fcs files
fcs.path <- system.file("extdata", package = "CytoTree")
fcs.files <- list.files(fcs.path, pattern = "\\.FCS$", full.names = TRUE)

# Use runExprsMerge for multiple FCS files
# Or use runExprsExtract for one single FCS file
fcs.data <- runExprsMerge(fcs.files, comp = FALSE, transformMethod = "none")

# Build the CYT object
cyt <- createCYT(raw.data = fcs.data, normalization.method = "log")

# See information
cyt

################################################
##### Running CytoTree in one line of code
################################################

# Run without dimensionality reduction steps
# Run CytoTree as pipeline and visualize as tree
cyt <- cyt %>% runCluster() %>% processingCluster() %>% buildTree()
plotTree(cyt)

# Or you can run with dimensionality reduction steps
# Run CytoTree as pipeline and visualize as tree
cyt <- cyt %>% runCluster() %>% processingCluster() %>%
  runFastPCA() %>% runTSNE() %>% runDiffusionMap() %>% runUMAP() %>%
  buildTree()
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"))

```

The running template for trajectory inference using a CYT object is as follows:

``` {r eval = TRUE}

# Cluster cells by SOM algorithm
set.seed(1)
cyt <- runCluster(cyt)

# Processing Clusters
cyt <- processingCluster(cyt)

# This is an optional step
# run Principal Component Analysis (PCA)
cyt <- runFastPCA(cyt)

# This is an optional step
# run t-Distributed Stochastic Neighbor Embedding (tSNE)
cyt <- runTSNE(cyt)

# This is an optional step
# run Diffusion map
cyt <- runDiffusionMap(cyt)

# This is an optional step
# run Uniform Manifold Approximation and Projection (UMAP)
cyt <- runUMAP(cyt)

# build minimum spanning tree
cyt <- buildTree(cyt)

# Differentially expressed markers of different branches
diff.list <- runDiff(cyt)

# define root cells
cyt <- defRootCells(cyt, root.cells = c(32, 26))

# run pseudotime
cyt <- runPseudotime(cyt)

# define leaf cells
cyt <- defLeafCells(cyt, leaf.cells = c(30))

# run walk between root cells and leaf cells
cyt <- runWalk(cyt)

# Save object
if (FALSE) {
  saveRDS(cyt, file = "Path to your output directory")
}

```

## Visualization

The running template of visualization is as follows:

```{r eval = TRUE, out.width='50%', fig.asp=.75, fig.align='center'}

# Plot 2D tSNE. Cells are colored by cluster id
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "cluster.id",
       alpha = 1, main = "tSNE", category = "categorical", show.cluser.id = TRUE)

# Plot 2D UMAP. Cells are colored by cluster id
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "cluster.id",
       alpha = 1, main = "UMAP", category = "categorical", show.cluser.id = TRUE)

# Plot 2D tSNE. Cells are colored by branch id
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "branch.id",
       alpha = 1, main = "tSNE", category = "categorical", show.cluser.id = TRUE)

# Plot 2D UMAP. Cells are colored by branch id
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "branch.id",
       alpha = 1, main = "UMAP", category = "categorical", show.cluser.id = TRUE)

# Plot 2D tSNE. Cells are colored by stage
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "stage",
       alpha = 1, main = "tSNE", category = "categorical")

# Plot 2D UMAP. Cells are colored by stage
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "stage",
       alpha = 1, main = "UMAP", category = "categorical")

# Tree plot colored by the percentage of D0 cells
plotTree(cyt, color.by = "D0.percent", show.node.name = TRUE, cex.size = 1)

# Tree plot colored by the FITC-A marker
plotTree(cyt, color.by = "FITC-A", show.node.name = TRUE, cex.size = 1)

# plot clusters
plotCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), category = "numeric",
            size = 100, color.by = "BV510-A")

# plot pie tree
plotPieTree(cyt, cex.size = 3, size.by.cell.number = TRUE)

# plot pie cluster
plotPieCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), cex.size = 40)

# plot heatmap of clusters
plotClusterHeatmap(cyt)

# plot heatmap of branches
plotBranchHeatmap(cyt)

# Violin plot grouped by cluster id
plotViolin(cyt, color.by = "cluster.id", marker = "BV510-A", text.angle = 90)

# Violin plot grouped by branch id
plotViolin(cyt, color.by = "branch.id", marker = "BV510-A", text.angle = 90)

# UMAP plot colored by pseudotime
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), category = "numeric",
       size = 1, color.by = "pseudotime")

# tSNE plot colored by pseudotime
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), category = "numeric",
       size = 1, color.by = "pseudotime")

# density plot by different stage
plotPseudotimeDensity(cyt, adjust = 1)

# Tree plot colored by pseudotime
plotTree(cyt, color.by = "pseudotime", cex.size = 1.5)

# Violin plot with clusters ordered by pseudotime
plotViolin(cyt, color.by = "cluster.id", order.by = "pseudotime",
           marker = "BV650-A", text.angle = 90)

# trajectory value
plotPseudotimeTraj(cyt, var.cols = TRUE)

# Heatmap plot
plotHeatmap(cyt, downsize = 1000, cluster_rows = TRUE,
            clustering_method = "ward.D",
            color = colorRampPalette(c("#00599F", "#EEEEEE", "#FF3222"))(100))

# plot cluster
plotCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "traj.value.log",
            size = 10, show.cluser.id = TRUE, category = "numeric")

```

## References

1. Hahne F, Arlt D, Sauermann M, Majety M, Poustka A, Wiemann S, Huber W: Statistical methods and software for the analysis of highthroughput reverse genetic assays using flow cytometry readouts. Genome Biol 2006, 7:R77.
2. Olsen LR, Leipold MD, Pedersen CB, Maecker HT: The anatomy of single cell mass cytometry data. Cytometry A 2019, 95:156-172.
3. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R: Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 2018, 36:411-420.
4. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL: The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014, 32:381-386.
5. Kiselev VY, Yiu A, Hemberg M: scmap: projection of single-cell RNA-seq data across data sets. Nat Methods 2018, 15:359-362.
6. Amir el AD, Davis KL, Tadmor MD, Simonds EF, Levine JH, Bendall SC, Shenfeld DK, Krishnaswamy S, Nolan GP, Pe'er D: viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat Biotechnol 2013, 31:545-552.
7. Haghverdi L, Buettner F, Theis FJ: Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 2015, 31:2989-2998.
8. Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, Ginhoux F, Newell EW: Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 2018.
9. Wang L, Hoffman RA: Standardization, Calibration, and Control in Flow Cytometry. Curr Protoc Cytom 2017, 79:1.3.1-1.3.27.
10. Hahne F, LeMeur N, Brinkman RR, Ellis B, Haaland P, Sarkar D, Spidlen J, Strain E, Gentleman R: flowCore: a Bioconductor package for high throughput flow cytometry. BMC Bioinformatics 2009, 10:106.
11. Sarkar D, Le Meur N, Gentleman R: Using flowViz to visualize flow cytometry data. Bioinformatics 2008, 24:878-879.
12. Van Gassen S, Callebaut B, Van Helden MJ, Lambrecht BN, Demeester P, Dhaene T, Saeys Y: FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A 2015, 87:636-645.
13. Qiu P, Simonds EF, Bendall SC, Gibbs KD Jr, Bruggner RV, Linderman MD, Sachs K, Nolan GP, Plevritis SK: Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat Biotechnol 2011, 29:886-891.
14. Chen H, Lau MC, Wong MT, Newell EW, Poidinger M, Chen J: Cytofkit: A Bioconductor Package for an Integrated Mass Cytometry Data Analysis Pipeline. PLoS Comput Biol 2016, 12:e1005112.
15. Chattopadhyay PK, Winters AF, Lomas WE 3rd, Laino AS, Woods DM: High-Parameter Single-Cell Analysis. Annu Rev Anal Chem (Palo Alto Calif) 2019, 12:411-430.
16. Bendall SC, Davis KL, Amir el AD, Tadmor MD, Simonds EF, Chen TJ, Shenfeld DK, Nolan GP, Pe'er D: Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell 2014, 157:714-725.
17. Nowicka M, Krieg C, Crowell HL, Weber LM, Hartmann FJ, Guglietta S, Becher B, Levesque MP, Robinson MD: CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets. F1000Res 2017, 6:748.