--- title: "carnation - airway tutorial" author: "Apratim Mitra" date: "`r format(Sys.Date(), '%m/%d/%Y')`" abstract: > `carnation` is an interactive Shiny dashboard that simplifies and transforms complex bulk RNA-Seq data using insightful visualizations and numerous interactive features. Designed for both computational and experimental biologists, `carnation` makes exploring differential expression analysis, functional enrichment, and pattern analysis intuitive and accessible, while providing a platform to manage multiple complex RNA-Seq datasets either locally or on a server to share with collaborators (package version: `r packageVersion("carnation")`). output: BiocStyle::html_document: toc: true vignette: > %\VignetteIndexEntry{carnation - airway tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r opts, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Now, we will use the `airway` dataset to explore some of `carnation's` functionality. ## Install carnation Install carnation from Bioconductor using `BiocManager::install`. ```{r pkg_install, eval=FALSE} # first check to see if BiocManager is available if(!requireNamespace('BiocManager', quietly=TRUE)){ install.packages('BiocManager') } BiocManager::install('carnation') ``` ## Load libraries & airway dataset First check for and load some libraries that we will need for this tutorial. ```{r presetup, eval=FALSE} # install optional packages if not present pkgs_to_check <- c('airway', 'org.Hs.eg.db', 'DEGreport') for(pkg in pkgs_to_check){ if(!requireNamespace(pkg, quietly=TRUE)){ BiocManager::install(pkg) } } ``` ```{r lib, message=FALSE} library(airway) library(DESeq2) library(dplyr) library(GeneTonic) library(org.Hs.eg.db) ``` We will be using the 'airway' dataset. First, we load this dataset. ```{r airway_load} data('airway') ``` Next, we extract the counts matrix and and metadata. ```{r extract} mat <- assay(airway) cdata <- colData(airway) ``` Now let's see what these look like. ```{r view} dim(mat) ``` So, `mat` is a matrix with 64102 rows and 8 columns. Each row corresponds to a single gene and each column corresponds to a single sample. As you will notice, the rownames of `mat` contains gene IDs and column names have the sample IDs. ```{r view_cnts} head(mat) ``` `cdata` contains the sample metadata. There is a lot of information here, but notice the `cell` and `dex` columns, as we will be using this for the differential expression analysis later. ```{r view_cdata} cdata ``` ## Get more gene annotation The gene IDs that come with the dataset are from ENSEMBL and are not human-readable. So, next we will extract gene symbols and `ENTREZID` for these genes from the `org.Hs.eg.db` package. ```{r annot, message=FALSE} keytypes <- list('SYMBOL'='SYMBOL', 'ENTREZID'='ENTREZID') anno_df <- do.call('cbind', lapply(keytypes, function(x){ mapIds(org.Hs.eg.db, column=x, keys=rownames(mat), keytype='ENSEMBL') }) ) # convert to data frame anno_df <- as.data.frame(anno_df) ``` Now, we have human readable gene names in the `SYMBOL` column and Entrez IDs in the `ENTREZ` column. ```{r view_annot} head(anno_df) ``` ## Create DESeqDataSet Next, we create a new `DESeqDataSet` using `mat` and `cdata`. ```{r dds} dds <- DESeqDataSetFromMatrix(mat, colData=cdata, design=~cell + dex) ``` Let's check to make sure that everything looks okay: ```{r view_dds} dds ``` Then we save `dds` in a list. ```{r save_dds} dds_list <- list(main=dds) ``` We also normalize the counts data and save it in a list. ```{r vst} rld_list <- lapply(dds_list, function(x) varianceStabilizingTransformation(x, blind=TRUE)) ``` ## Run differential expression analysis Now, the object `dds` is ready for differential expression (DE) analysis. ```{r deseq} dds <- DESeq(dds) ``` Since, we used `design=~cell + dex` while creating `dds`, the above step will automatically calculate some comparisons. ```{r results} resultsNames(dds) ``` The last comparison `dex_untrt_vs_trt` contains the effect of the `dex` treatment, while the other comparisons compare different cell lines. Next, we will extract two of these results and run `lfcShrink` on them. For the `cell` comparison, we choose a precomputed results using the `coef` parameter. ```{r contrast1, message=FALSE} cell_comparison <- lfcShrink(dds, coef='cell_N61311_vs_N052611', type='normal') ``` For the `dex` comparison, we use `contrast` to specify the direction of the comparison, since we want to use `untrt` as control. ```{r contrast2, message=FALSE} dex_comparison <- lfcShrink(dds, contrast=c('dex', 'trt', 'untrt'), type='normal') ``` Now, each of these comparisons, contain the DE analysis results. For example, ```{r view_contrast} head(dex_comparison) ``` Then, we save these results in a special nested list that `carnation` will use. Here, - `res` contains the actual DE analysis results - `dds` contains the name of the DESeqDataSet used for the DE analysis. These values should map to `dds_list` names - `label` is a description of the comparison ```{r res} res_list <- list( dex_trt_vs_untrt=list( res=dex_comparison, dds='main', label='dex, treated vs untreated'), cell_N61311_vs_N052611=list( res=cell_comparison, dds='main', label='cell, N61311 vs N052611') ) ``` Finally, we add `SYMBOL` and `ENTREZID` columns to the DE results from the `anno_df` data frame. ```{r bld_res} res_list <- lapply(res_list, function(x){ # save the rownames as a new 'gene' column x$res[[ 'gene' ]] <- rownames(x$res) # add 'SYMBOL' and 'ENTREZID' columns x$res[[ 'SYMBOL' ]] <- anno_df[rownames(x$res), 'SYMBOL'] x$res[[ 'ENTREZID' ]] <- anno_df[rownames(x$res), 'ENTREZID'] x }) ``` ## Add functional enrichment results (optional) Now we run functional enrichment on the DE genes from the two comparisons. For this, we first set significance thresholds and then extract the DE genes and save as a list. ```{r alpha} # padj cutoff alpha <- 0.01 # log2FoldChange threshold; 1 == 2x difference lfc_threshold <- 1 # list to save DE genes de.genes <- lapply(res_list, function(x){ # changed genes idx <- x$res$padj < alpha & !is.na(x$res$padj) & abs(x$res$log2FoldChange) >= lfc_threshold # return DE genes as a dataframe x$res[idx, c('gene', 'ENTREZID')] }) ``` Next, we run functional enrichment and save the results in a list called `enrich_list`. We also save a converted list called `genetonic` which carnation uses for several plots from the `GeneTonic` package. Since, this is a time-consuming step, we will use the pre-computed enrichment results included with carnation. ```{r enrich} # fe results from dex comparison data(eres_dex, package='carnation') # fe results from dex comparison data(eres_cell, package='carnation') # compile into a list go_list <- list(dex_trt_vs_untrt=eres_dex, cell_N61311_vs_N052611=eres_cell) # list to save functional enrichment results enrich_list <- list() # list to save a converted object for GeneTonic plots genetonic <- list() for(comp in names(res_list)){ # NOTE: this is the command used to generate the functional # enrichment results #go.res <- clusterProfiler::enrichGO( # gene=de.genes[[comp]][['ENTREZID']], # keyType='ENTREZID', # OrgDb=org.Hs.eg.db, # ont='BP', # pvalueCutoff=1, qvalueCutoff=1, # readable = TRUE) # we use the precomputed results here instead go.res <- go_list[[ comp ]] enrich_list[[ comp ]] <- list( res=comp, changed=list( BP=as.data.frame(go.res) ) ) genetonic[[ comp ]] <- list( res=comp, changed=list( BP=carnation::enrich_to_genetonic(go.res, res_list[[comp]]$res) ) ) } ``` `enrich_list` is a nested list where: - The top-level names are unique keys/identifiers - The second level corresponds to direction of change, e.g. `up`, `down` or `changed`. This level also contains a special entry `res` which maps to `res_list` names, as a way to record where the DE results came from. - The third level corresponds to the ontology, e.g. `BP` (GO Biological Process). Here, we are just using changed genes and the GO Biological Process (BP) ontology, so `enrich_list` looks like: ``` enrich_list ├─ dex_trt_vs_untrt │ ├─ res = 'des_trt_vs_untrt' <--- comparison used to get FE results │ └─ changed │ └─ BP <--- functional enrichment results │ └─ cell_N61311_vs_N052611 ├─ res = 'cell_N61311_vs_N052611' <--- comparison used to get FE results └─ changed └─ BP <--- functional enrichment results ``` `genetonic` mirrors the same structure. ## Add pattern analysis (optional) Finally, we add some pattern analysis for the `dex_trt_vs_untrt` comparison using the `DEGreport` package. First, we extract normalized data for the 755 DE genes from this comparison. ```{r deg_metadata} # extract normalized data & metadata ma <- assay(rld_list[['main']]) colData.i <- colData(rld_list[['main']]) # only keep data from DE genes idx <- rownames(ma) %in% de.genes[['dex_trt_vs_untrt']][['gene']] ma.i <- ma[idx,] # remove any genes with 0 variance ma.i <- ma.i[rowVars(ma.i) != 0, ] ``` Then, we run the pattern analysis, using `cell` as the *time* variable and `dex` as the *color* variable. Again, since this is a time-consuming step, we will use the pre-computed pattern analysis results included with carnation. ```{r degpatterns} # NOTE: This is the command used to perform pattern analysis # degpatterns_dex <- DEGreport::degPatterns( # ma.i, # colData.i, # # time='cell', # col='dex', # # # NOTE: reduce and merge cutoff---------------------------------------- # # Reduce will merge clusters that are similar; similarity determined # # by cutoff # reduce=TRUE, # # plot=FALSE # ) # We use the pre-computed results here instead data(degpatterns_dex, package='carnation') ``` Next, we extract the `normalized` slot from this object and save as a list. ```{r deg_extract} # extract normalized slot and add symbol column p_norm <- degpatterns_dex$normalized p_norm[[ 'SYMBOL' ]] <- anno_df[p_norm[['genes']], 'SYMBOL'] # save pattern analysis results degpatterns <- list(dex_by_cell=p_norm) ``` ## Compose carnation object Now we have all the pieces to build the carnation object. ```{r compose} combined <- list(res_list=res_list, dds_list=dds_list, rld_list=rld_list, enrich_list=enrich_list, genetonic=genetonic, degpatterns_list=degpatterns) saveRDS(combined, 'carnation_vignette.rds', compress=FALSE) ``` ## Data Organization Organize your data in a directory structure that Carnation can easily navigate, e.g. in a folder `carnation/data` in your home directory. Here, we make a subfolder `airway` in this folder and copy the `RDS` file we created above. ``` ~/carnation/data/ └─ airway └─ carnation-vignette.rds ``` Now, we're ready for the first run. ## First Run Load the carnation package. ```{r first_load} library(carnation) ``` Carnation allows you to download interactive plots as PDF. To use this functionality you need to install some required Python dependencies: ```{r install, eval=FALSE} install_carnation() # Installs plotly and kaleido for PDF export ``` Now, run the app: ```{r run_app, eval=FALSE} run_carnation() ``` To run on a fixed port, e.g. when using remote servers with SSH port forwarding, specify `port` within a list of options. ```{r run_app2, eval=FALSE} run_carnation(options=list(port=12345, launch.browser=FALSE)) ``` Then access Carnation by opening the URL: `http://127.0.0.1:12345` ### Initial setup - The first time you run carnation, you will be asked to specify where your data is located with a modal dialog. Enter the folder where you saved the RDS file, `~/carnation/data`, then click 'OK'. ![ ](images/01_first_run.png){width=600px} - Now carnation will automatically refresh and you will see `data/airway` in the "Available Projects" menu, and `carnation_vignette` in the "Available Datasets". Since this is the only dataset we have, these options are automatically selected. Now click 'Go' to load the data. ![ ](images/02_pre_load.png){width=600px} Now carnation will load the airway data and switch to the "Summary" tab. ### General layout The interface of Carnation consists of a main central panel for content, a sidebar containing settings and 'Gene scratchpad' buttons and context-specific help buttons. ![ ](images/03_layout.png){width=700px} ## Feature overview Carnation is organized into three major sections each with its own set of subtabs to organize different portions of the RNA-Seq analysis. ### DE analysis Here you can analyze differential expression through different tables and visualizations: - *Summary*: Get a quick overview of your differential expression results ![ ](images/04_data_loaded.png){width=600px} - *Metadata*: Explore sample metadata and experimental design. You can also add metadata columns. ![ ](images/05_metadata.png){width=600px} - *PCA Plot*: Visualize sample relationships with PCA plot. ![ ](images/06a_pca_analysis_cell_2d.png){width=600px} You can also view the plot in 3D ![ ](images/06b_pca_analysis_dex_3d.png){width=600px} or with gene loadings overlaid. ![ ](images/06c_pca_analysis_loadings.png){width=600px} - *MA Plot*: Identify differentially expressed genes with statistical significance ![ ](images/07_ma_plot.png){width=600px} - *Scatter Plot*: Compare fold-changes of genes across different comparisons ![ ](images/08_scatter.png){width=600px} - *Gene Plot*: Create customizable expression visualizations for genes of interest ![ ](images/09_geneplot_facet.png){width=600px} - *UpSet Plot*: Discover overlapping gene sets across multiple comparisons ![ ](images/10_upset.png){width=600px} - *Heatmap*: Examine expression patterns across samples and conditions ![ ](images/11_heatmap.png){width=600px} ### Functional enrichment Explore functional enrichment analysis results using this module: - **Table**: Interactive tables with powerful search capabilities ![ ](images/12_functable.png){width=650px} - **Plots**: Seven different visualizations including network plots and dendrograms from the GeneTonic and clusterProfiler packages: - Summary overview to view top enriched terms ![ ](images/13a_sumov.png){width=600px} - Three flavors of enrichment map including one with similar terms grouped together ![ ](images/13b_emap_distill.png){width=600px} - Cnetplot: network plot highlighting genes connecting functional terms ![ ](images/13c_cnetplot.png){width=600px} - Radar plot: top enriched terms shown in a circular view ![ ](images/13d_radar.png){width=600px} - Alluvial plot: flow diagram highlighting genes connecting functional terms ![ ](images/13e_alluvial.png){width=600px} - Dendrogram: hierarchical tree view of top enriched terms ![ ](images/13f_dendrogram.png){width=600px} - **Compare Results**: Directly compare enrichment results between conditions using three different visualizations. - Summary overview comparing contrasts ![ ](images/14a_comp_sumov.png){width=600px} - Radar plot for two comparisons ![ ](images/14b_comp_radar.png){width=600px} - Horizon plot for top terms connected by comparison ![ ](images/14c_comp_horizon.png){width=600px} ### Pattern analysis Identify co-regulated gene clusters across conditions - **Plot**: Visualize expression patterns of gene clusters ![ ](images/15_pattern_plot.png){width=600px} - **Cluster Membership**: Explore which genes belong to which clusters using an interactive table ![ ](images/15a_pattern_tbl.png){width=600px} ### Gene scratchpad You can keep track of genes of interest across Carnation using the gene scratchpad. - To use this feature, click on the "Gene scratchpad" button in the sidebar. - To start you can add top differentially expressed genes from a specific comparison by using the "Top genes settings" and clicking "Add top genes". ![ ](images/11a_gene_scratchpad.png){width=300px} Now several plots in the "DE analysis" section will have these genes labeled, e.g. the MA plot, scatter plot, and others. Labeled MA plot ![ ](images/11b_ma_plot_label.png){width=600px} Labeled scatter plot ![ ](images/11c_scatter_plot_label.png){width=600px} These genes can also be used to label the pattern analysis plot. Set "label" to "gene_scratchpad" and click "Refresh plot". ![ ](images/11d_pattern_plt_label.png){width=400px} Now the genes are labeled on the plot. You will also see a table below summarizing the clusters where these genes were found. ![ ](images/11e_pattern_plt_label2.png){width=600px} - You can also quickly add genes to the scratchpad from functional enrichment tables First, select your term of interest from the enrichment table ![ ](images/11f_functbl_click.png){width=700px} then, click the 'Add to scratchpad' button at the bottom of the page. ![ ](images/11g_functbl_add.png){width=700px} Now, these genes will be added to the scratchpad and can be used to label plots as shown above. ![ ](images/11h_functbl_scratch_show.png){width=300px} ### Server Mode Carnation supports multi-user environments with authentication: ```{r server, eval=FALSE} # Create user database credentials <- data.frame( user = c('shinymanager'), password = c('12345'), admin = c(TRUE), stringsAsFactors = FALSE ) # Initialize the database shinymanager::create_db( credentials_data = credentials, sqlite_path = 'credentials.sqlite', passphrase = 'admin_passphrase' ) ``` Now run carnation with authentication ```{r run_server, eval=FALSE} run_carnation(credentials='credentials.sqlite', passphrase='admin_passphrase') ``` # Summary In summary, `carnation` adds to the rich Bioconductor ecosystem providing a sophisticated interactive tool that makes it easier to ask complex questions of the data and accelerate the process of scientific discovery for biologists and computational scientists alike. Other packages with comparable functionality include `r BiocStyle::Biocpkg('pcaExplorer')` & `r BiocStyle::Biocpkg('iSEE')`. However, `carnation's` unique combination of thoughtful features, e.g. fuzzy search & gene scratchpad, in-built access management system and deployment flexibility to single or multi-user environments, makes it ideal as a platform for managing and sharing complex bulk RNA-Seq projects with collaborators. `carnation` currently supports the following widely-used tools: - Differential expression analysis: `r BiocStyle::Biocpkg('DESeq2')` - Functional enrichment analysis: `r BiocStyle::Biocpkg('clusterProfiler')` (both overrepresentation & gene set enrichment analysis). - Pattern analysis: `r BiocStyle::Biocpkg('DEGreport')` The functional enrichment module incorporates plots from the `r BiocStyle::Biocpkg('GeneTonic')` and `r BiocStyle::Biocpkg('enrichplot')` packages. Future plans include extending functionality to include support for other differential expression frameworks, e.g. `r BiocStyle::Biocpkg('edgeR')` & `r BiocStyle::Biocpkg('limma')`, functional enrichment tools, e.g. `r BiocStyle::Biocpkg('topGO')` and coexpression analysis methods, e.g. `r BiocStyle::CRANpkg('WGCNA')`. # sessionInfo ```{r sessionInfo} sessionInfo() ```