--- title: "Complete RNA-seq Analysis Workflow with VISTA" author: "VISTA Development Team" package: VISTA output: BiocStyle::html_document: toc: true toc_float: true toc_depth: 4 number_sections: true fig_width: 8 fig_height: 6 code_folding: show vignette: > %\VignetteIndexEntry{Complete RNA-seq Analysis Workflow with VISTA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, fig.align = "center" ) ensure_pkgs_available <- function(pkgs) { missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)] if (length(missing)) { stop("Missing required packages: ", paste(missing, collapse = ", ")) } } # This vignette requires all dependencies to already be installed. ensure_pkgs_available(c("VISTA", "ggplot2", "airway", "org.Hs.eg.db")) ``` # Introduction This vignette demonstrates a complete RNA-seq differential expression workflow using **VISTA** (Visualization and Integrated System for Transcriptomic Analysis). We'll use the well-known **airway** dataset from Bioconductor, which contains RNA-seq data from airway smooth muscle cells treated with dexamethasone. ## What is VISTA? VISTA streamlines RNA-seq analysis by: - **Unifying DE workflows**: Wraps DESeq2 and edgeR with consistent output - **Simplifying visualization**: 28+ publication-ready plotting functions - **Integrating enrichment**: Built-in MSigDB, GO, and KEGG analysis - **Ensuring reproducibility**: S4 class structure with comprehensive metadata ## Dataset Overview The **airway** dataset (Himes et al. 2014) includes: - **Samples**: 8 human airway smooth muscle cell lines - **Treatment**: 4 treated with dexamethasone, 4 untreated - **Sequencing**: Illumina HiSeq 2000 - **Features**: \~64,000 genes **Reference**: Himes BE et al. (2014). "RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine function in airway smooth muscle cells." *PLoS One* 9(6): e99625. # Installation and Setup ```{r install-packages, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install(c("VISTA", "airway", "org.Hs.eg.db")) install.packages("ggplot2") ``` ```{r load-packages} # Load required packages library(VISTA) library(ggplot2) # For plotting functions library(airway) # Dataset library(org.Hs.eg.db) # Human gene annotations library(magrittr) # %>% ``` If required packages are missing, install them before rendering this vignette. # Data Preparation ## Load the airway dataset ```{r load-data} # Load the SummarizedExperiment object data("airway", package = "airway") # Examine the structure airway ``` ## Extract counts and metadata ```{r extract-data} # Extract count matrix counts_matrix <- assay(airway, "counts") # Preview counts (first 5 genes, first 4 samples) counts_matrix[1:5, 1:4] # Extract sample metadata sample_metadata <- as.data.frame(colData(airway)) sample_metadata # Simplify treatment labels for clarity sample_metadata$treatment <- ifelse( sample_metadata$dex == "trt", "Dexamethasone", "Untreated" ) ``` ## Prepare data for VISTA VISTA includes helper functions that standardize counts and sample metadata before calling `create_vista()`: ```{r prepare-inputs} prepared_counts <- read_vista_counts( counts_matrix, format = "matrix" ) prepared_samples <- read_vista_metadata( sample_metadata %>% tibble::as_tibble() %>% dplyr::rename("sample_names" = "Run") ) matched_inputs <- match_vista_inputs(prepared_counts, prepared_samples) # Preview the create_vista-ready count table matched_inputs$counts[1:5, 1:5] ``` For a more complete guide covering file-derived sample-name repair and starter metadata generation with `derive_vista_metadata()`, see the pkgdown article `Preparing Counts and Metadata for VISTA`. The matched sample sheet now has stable `sample_names` aligned to the count columns: ```{r prepare-samples} matched_inputs$sample_info[, c("sample_names", "cell", "treatment", "dex")] ``` # Create VISTA Object ## Using DESeq2 backend The primary method for creating a VISTA object: ```{r create-vista} # Create VISTA object with DESeq2 backend vista <- create_vista( counts = matched_inputs$counts, sample_info = matched_inputs$sample_info, column_geneid = matched_inputs$column_geneid, group_column = "treatment", group_numerator = "Dexamethasone", group_denominator = "Untreated", method = "deseq2", min_counts = 10, min_replicates = 2, log2fc_cutoff = 1.0, pval_cutoff = 0.05, p_value_type = "padj" ) # Examine the VISTA object vista ``` The VISTA object stores: - **Normalized counts** in `assays` - **Sample metadata** in `colData` - **Gene annotations** in `rowData` - **DE results** in `metadata` ## Validate object integrity `create_vista()` runs validation by default (`validate = TRUE`). You can also run it explicitly: ```{r validate-vista} validate_vista(vista, level = "full") ``` For advanced users importing a pre-built `SummarizedExperiment`, use `as_vista()` and then validate: ```{r as-vista-example, eval=FALSE} se <- SummarizedExperiment::SummarizedExperiment( assays = list(norm_counts = norm_counts(vista)), colData = S4Vectors::DataFrame(sample_info(vista), row.names = sample_info(vista)$sample_names), rowData = S4Vectors::DataFrame(row_data(vista), row.names = rownames(norm_counts(vista))) ) vista2 <- as_vista(se, group_column = "treatment") validate_vista(vista2, level = "full") ``` ## Alternative: Using edgeR backend ```{r create-vista-edger, eval=FALSE} # Create VISTA object with edgeR backend vista_edger <- create_vista( counts = count_data, sample_info = sample_info, column_geneid = "gene_id", group_column = "treatment", group_numerator = "Dexamethasone", group_denominator = "Untreated", method = "edger", # Use edgeR instead of DESeq2 min_counts = 10, min_replicates = 2, log2fc_cutoff = 1.0, pval_cutoff = 0.05, p_value_type = "padj" ) ``` ## Alternative: Using limma-voom backend ```{r create-vista-limma, eval=FALSE} # Create VISTA object with limma-voom backend vista_limma <- create_vista( counts = count_data, sample_info = sample_info, column_geneid = "gene_id", group_column = "treatment", group_numerator = "Dexamethasone", group_denominator = "Untreated", method = "limma", min_counts = 10, min_replicates = 2, log2fc_cutoff = 1.0, pval_cutoff = 0.05, p_value_type = "padj" ) ``` ## Advanced: covariates, design formula, and consensus mode ```{r advanced-design-consensus, eval=FALSE} # Covariate-adjusted model (additive design) vista_cov <- create_vista( counts = count_data, sample_info = sample_info, column_geneid = "gene_id", group_column = "treatment", group_numerator = "Dexamethasone", group_denominator = "Untreated", method = "deseq2", covariates = c("cell") ) # Equivalent explicit model formula vista_formula <- create_vista( counts = count_data, sample_info = sample_info, column_geneid = "gene_id", group_column = "treatment", group_numerator = "Dexamethasone", group_denominator = "Untreated", method = "deseq2", design_formula = "~ cell + treatment" ) # Run both DESeq2 and edgeR and keep consensus as active source vista_both <- create_vista( counts = count_data, sample_info = sample_info, column_geneid = "gene_id", group_column = "treatment", group_numerator = "Dexamethasone", group_denominator = "Untreated", method = "both", consensus_mode = "intersection", # or "union" result_source = "consensus" # or "deseq2"/"edger" ) # Access source-specific outputs comparisons(vista_both, source = "consensus") comparisons(vista_both, source = "deseq2") comparisons(vista_both, source = "edger") # Switch the active source used by plotting functions vista_both <- set_de_source(vista_both, "edger") ``` ## Add gene annotations Enhance the object with gene symbols and descriptions: ```{r add-annotations} vista <- set_rowdata( vista, orgdb = org.Hs.eg.db, columns = c("SYMBOL", "GENENAME", "ENTREZID"), keytype = "ENSEMBL" ) # View updated gene annotations head(rowData(vista)) ``` # Explore the Results ## Access differential expression results ```{r access-results} # Get comparison names comp_names <- names(comparisons(vista)) comp_names # Get DE results for the comparison de_results <- comparisons(vista)[[1]] head(de_results) # Get DEG summary deg_summary(vista) # Get analysis cutoffs cutoffs(vista) ``` ## Count significant genes ```{r count-degs} # Extract upregulated genes up_genes <- get_genes_by_regulation( vista, sample_comparisons = comp_names[1], regulation = "Up", #top_n = 50 ) # Extract downregulated genes down_genes <- get_genes_by_regulation( vista, sample_comparisons = comp_names[1], regulation = "Down", #top_n = 50 ) # Summary cat("Upregulated genes:", length(up_genes[[1]]), "\n") cat("Downregulated genes:", length(down_genes[[1]]), "\n") ``` # Quality Control Visualizations ## Sample Correlation Heatmap Check sample relationships and potential batch effects. ### Basic correlation heatmap ```{r corr-heatmap-basic, fig.width=7, fig.height=6} get_corr_heatmap(vista, label_size = 18,base_size = 18, viridis_direction = -1) ``` ### Customize color scheme ```{r corr-heatmap-colors, fig.width=7, fig.height=6} # Reverse viridis color direction get_corr_heatmap( vista, viridis_direction = -1, viridis_option = "plasma", label_size = 18, base_size = 18 ) ``` ### Show correlation values ```{r corr-heatmap-values, fig.width=7, fig.height=6} # Display correlation coefficients get_corr_heatmap( vista, viridis_direction = -1, show_corr_values = TRUE, col_corr_values = 'white', viridis_option = "mako", label_size = 14, base_size = 14 ) ``` ## Principal Component Analysis (PCA) Visualize sample clustering and variation. ### Basic PCA with labels ```{r pca-basic, fig.width=8, fig.height=6} get_pca_plot( vista, label = TRUE,label_size = 5 ) ``` ### PCA colored by different metadata ```{r pca-shape, fig.width=8, fig.height=6} # Shape points by cell line get_pca_plot( vista, label = TRUE, label_size = 5, shape_by = "cell" ) ``` ### PCA with top variable genes ```{r pca-top-genes, fig.width=8, fig.height=6} # Use top 500 most variable genes get_pca_plot( vista, top_n_genes = 500, show_clusters = TRUE, shape_by = "cell" ) ``` ### PCA with custom circle size ```{r pca-circle-size, fig.width=8, fig.height=6} # Larger points for better visibility get_pca_plot( vista, label = TRUE, point_size = 15,label_size = 5 ) ``` ### PCA without labels ```{r pca-no-labels, fig.width=8, fig.height=6} # Clean plot without sample labels get_pca_plot( vista, label = FALSE, point_size = 12 ) ``` ## Multidimensional Scaling (MDS) Alternative dimensionality reduction method. ### Basic MDS plot ```{r mds-basic, fig.width=8, fig.height=6} get_mds_plot( vista, label = TRUE ) ``` ### MDS with top variable genes ```{r mds-top-genes, fig.width=8, fig.height=6} get_mds_plot( vista, top_n_genes = 500, label = TRUE ) ``` ### MDS with custom shapes ```{r mds-shapes, fig.width=8, fig.height=6} # Shape points by cell line get_mds_plot( vista, label = TRUE, shape_by = "cell" ) ``` ## Uniform Manifold Approximation and Projection (UMAP) Non-linear sample embedding for exploratory structure. ### Basic UMAP plot ```{r umap-basic, fig.width=8, fig.height=6, eval=requireNamespace("uwot", quietly=TRUE)} get_umap_plot( vista, label = TRUE ) ``` ### UMAP colored by a user-defined metadata column ```{r umap-color-by-cell, fig.width=8, fig.height=6, eval=requireNamespace("uwot", quietly=TRUE)} get_umap_plot( vista, color_by = "cell", shape_by = "treatment", label = TRUE ) ``` # Differential Expression Visualizations ## DEG Count Summary ### Basic count barplot ```{r deg-count-basic, fig.width=7, fig.height=5} get_deg_count_barplot(vista) ``` ### Faceted by regulation ```{r deg-count-facet, fig.width=8, fig.height=5} get_deg_count_barplot( vista, facet_by = "regulation" ) ``` ## Volcano Plot Classic volcano plot showing log2FC vs -log10(p-value). ### Basic volcano plot ```{r volcano-basic, fig.width=8, fig.height=7, eval=requireNamespace("EnhancedVolcano", quietly=TRUE)} get_volcano_plot( vista, sample_comparison = comp_names[1] ) ``` ### Customize cutoffs and labels ```{r volcano-custom, fig.width=8, fig.height=7, eval=requireNamespace("EnhancedVolcano", quietly=TRUE)} get_volcano_plot( vista, sample_comparison = comp_names[1], log2fc_cutoff = 1.5, pval_cutoff = 0.01, label_size = 3, display_id = "SYMBOL" ) ``` ### Custom colors ```{r volcano-colors, fig.width=8, fig.height=7, eval=requireNamespace("EnhancedVolcano", quietly=TRUE)} # Custom colors for up/down regulated genes get_volcano_plot( vista, sample_comparison = comp_names[1], col_up = "red", col_down = "blue", display_id = "SYMBOL" ) ``` ## MA Plot Mean expression vs log2 fold-change. ### Basic MA plot ```{r ma-basic, fig.width=8, fig.height=6} get_ma_plot( vista, sample_comparison = comp_names[1] ) ``` ### Label top genes ```{r ma-labeled, fig.width=8, fig.height=6} get_ma_plot( vista, sample_comparison = comp_names[1], label_n = 10, point_size = 2, display_id = "SYMBOL" ) ``` ### Custom cutoffs ```{r ma-custom, fig.width=8, fig.height=6} get_ma_plot( vista, sample_comparison = comp_names[1], label_n = 5, display_id = "SYMBOL", point_size = 1.5, alpha = 0.8 ) ``` # Expression Pattern Analysis ## Prepare gene sets ```{r prep-gene-sets} # Get top 50 DEGs by adjusted p-value de_table <- comparisons(vista)[[1]] top_degs <- get_genes_by_regulation(vista, names(comparisons(vista))[1],regulation = "Both",top_n = 50)[[1]] # top 50 by abs fold change top_up <- get_genes_by_regulation(vista, names(comparisons(vista))[1],regulation = "Up",top_n = 50)[[1]] top_down <- get_genes_by_regulation(vista, names(comparisons(vista))[1],regulation = "Down",top_n = 50)[[1]] cat("Top upregulated genes:\n") print(top_up) cat("\nTop downregulated genes:\n") print(top_down) ``` ## Expression Heatmaps ### Basic heatmap ```{r heatmap-basic, fig.width=8, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_expression_heatmap(vista) ``` ### Heatmap with explicit gene set ```{r heatmap-explicit, fig.width=8, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_expression_heatmap( vista, sample_group = levels(colData(vista)$treatment), genes = top_degs, display_id = "SYMBOL", show_row_names = TRUE ) ``` ### Heatmap with k-means clustering ```{r heatmap-kmeans, fig.width=8, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_expression_heatmap( vista, sample_group = levels(colData(vista)$treatment), genes = top_degs, display_id = "SYMBOL", kmeans_k = 3, show_row_names = TRUE ) ``` ### Heatmap with column annotations ```{r heatmap-annotated, fig.width=8, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_expression_heatmap( vista, sample_group = levels(colData(vista)$treatment), genes = top_degs, show_row_names = TRUE, display_id = "SYMBOL", kmeans_k = 3, cluster_row_slice = FALSE, summarise_replicates = FALSE, annotate_columns = TRUE ) ``` ### Heatmap with multiple column annotations and `cluster_by` ```{r heatmap-annotated-multi, fig.width=8, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} # Use multiple sample-level columns in top annotation. # By default, columns are split by the first annotation column. get_expression_heatmap( vista, sample_group = levels(colData(vista)$treatment), genes = top_degs, show_row_names = FALSE, display_id = "SYMBOL", summarise_replicates = FALSE, annotate_columns = c("treatment", "cell"), cluster_by = "cell" ) ``` ### Heatmap showing each replicate ```{r heatmap-summarized, fig.width=7, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_expression_heatmap( vista, sample_group = levels(colData(vista)$treatment), genes = top_degs, display_id = "SYMBOL", summarise_replicates = FALSE, show_row_names = FALSE, annotate_columns = TRUE, kmeans_k = 2 ) ``` ## Expression Barplots ### Basic barplot ```{r barplot-basic, fig.width=9, fig.height=6} get_expression_barplot( vista, genes = top_up[1:4], display_id = "SYMBOL", )+theme_minimal(base_size = 16) ``` ### Log-transformed with statistics ```{r barplot-log-stats, fig.width=9, fig.height=6} # Add statistical comparisons between groups get_expression_barplot( vista, genes = top_up[1:4], display_id = "SYMBOL", log_transform = TRUE, stats_group = TRUE # Enable statistical annotations ) + theme_minimal(base_size = 16) ``` ### Per-sample barplot for selected genes ```{r barplot-per-sample, fig.width=10, fig.height=6} get_expression_barplot( vista, genes = top_up[1:2], display_id = "SYMBOL", by = "sample", sample_order = "group" )+ theme(text = element_text(size = 16)) ``` ### Compare up and down regulated genes ```{r barplot-comparison, fig.width=10, fig.height=6} # Compare expression of both up- and down-regulated genes selected_genes <- c(top_up[1:3], top_down[1:3]) get_expression_barplot( vista, genes = selected_genes, display_id = "SYMBOL", log_transform = TRUE, stats_group = TRUE )+ theme(text = element_text(size = 16)) ``` ## Expression Boxplots ### Basic boxplot ```{r boxplot-basic, fig.width=9, fig.height=6} get_expression_boxplot( vista, genes = top_up[1:4], display_id = "SYMBOL" )+ theme(text = element_text(size = 16)) ``` ### Boxplot without faceting ```{r boxplot-no-facet, fig.width=9, fig.height=6} # All genes overlaid on same plot get_expression_boxplot( vista, genes = top_up[1:3], display_id = "SYMBOL", facet_by = "none" ) + theme(text = element_text(size = 16)) ``` ### Boxplot with faceting by gene ```{r boxplot-facets, fig.width=9, fig.height=6} # Each gene in separate panel - must specify facet_by = "gene" get_expression_boxplot( vista, genes = top_up[1:3], display_id = "SYMBOL", facet_by = "gene", facet_scales = "free_y" ) + theme(text = element_text(size = 16)) ``` ### Boxplot with gene facets AND statistics ```{r boxplot-facets-stats, fig.width=10, fig.height=7} # Each gene in separate panel WITH statistical comparisons get_expression_boxplot( vista, genes = top_up[1:4], display_id = "SYMBOL", log_transform = TRUE, facet_by = "gene", facet_scales = "free_y", stats_group = TRUE, # Add statistics to each gene panel p.label = "p.signif" )+ theme(text = element_text(size = 16)) ``` ### Pooled genes with statistics ```{r boxplot-pooled, fig.width=8, fig.height=6} # Pool all genes together for group comparison with statistical test get_expression_boxplot( vista, genes = top_up[1:5], display_id = "SYMBOL", log_transform = TRUE, pool_genes = TRUE, facet_by = "none", stats_group = TRUE, # Required for statistical annotations p.label = "p.signif" )+ theme(text = element_text(size = 16)) ``` ### Log-transformed with p-values ```{r boxplot-pvalues, fig.width=9, fig.height=6} # Show statistical comparisons between treatment groups get_expression_boxplot( vista, genes = top_up[1:4], display_id = "SYMBOL", log_transform = TRUE, stats_group = TRUE, # Enable statistical annotations p.label = "p.signif" )+ theme(text = element_text(size = 16)) ``` ## Expression Violin Plots ### Basic violin plot ```{r violin-basic, fig.width=9, fig.height=6} get_expression_violinplot( vista, genes = top_up[1:4], display_id = "SYMBOL" )+ theme(text = element_text(size = 16)) ``` ### Violin with log2 transformation ```{r violin-log, fig.width=9, fig.height=6} get_expression_violinplot( vista, genes = top_up[1:4],, display_id = "SYMBOL", value_transform = "none" )+ theme(text = element_text(size = 16)) ``` ### Violin with z-score transformation ```{r violin-zscore, fig.width=8, fig.height=6} get_expression_violinplot( vista, genes = top_up[1:4], value_transform = "zscore" )+ theme(text = element_text(size = 16)) ``` ## Additional Expression Plots ### Density plot ```{r density-plot, fig.width=9, fig.height=6} get_expression_density( vista, genes = top_up[1:50], log_transform = TRUE )+ theme(text = element_text(size = 16)) ``` ### Scatter plot (sample vs sample) ```{r scatter-plot, fig.width=8, fig.height=7} # Compare two samples samples <- colnames(vista) get_expression_scatter( vista, sample_x = samples[1], sample_y = samples[2], log_transform = TRUE, label_n = 50,display_id = "SYMBOL",label_size = 4 )+ theme(text = element_text(size = 16)) ``` ### Line plot (expression across samples) ```{r line-plot, fig.width=9, fig.height=6} get_expression_lineplot( vista, genes = top_up[1:3], log_transform = TRUE,display_id = "SYMBOL", by = "sample",facet_by = "none", group_column = "treatment",sample_group = c("Untreated","Dexamethasone") ) ``` ### Lollipop plot ```{r lollipop-plot, fig.width=9, fig.height=6} get_expression_lollipop( vista, genes = top_up[1:4], display_id = "SYMBOL", log_transform = TRUE ) ``` ### Per-sample lollipop plot ```{r lollipop-per-sample, fig.width=10, fig.height=6} get_expression_lollipop( vista, genes = top_up[1:2], display_id = "SYMBOL", by = "sample", sample_order = "expression" ) ``` ### Joyplot by treatment group ```{r joyplot, fig.width=8, fig.height=7, eval=requireNamespace("ggridges", quietly=TRUE)} # Ridges by treatment group - shows distribution for each group get_expression_joyplot( vista, genes = top_up[1:5], log_transform = TRUE, y_by = "group", # Each treatment group gets a ridge color_by = "group" # Color by treatment group ) ``` ### Joyplot by sample ```{r joyplot-sample, fig.width=8, fig.height=9, eval=requireNamespace("ggridges", quietly=TRUE)} # Ridges by individual sample - shows distribution for each sample get_expression_joyplot( vista, genes = top_up[1:3], log_transform = TRUE, y_by = "sample", # Each sample gets a ridge color_by = "group" # Color by treatment group ) ``` ### Raincloud plot (expression) ```{r raincloud-expression-message, eval=!requireNamespace("ggrain", quietly=TRUE), echo=FALSE} cat("Package 'ggrain' is not installed; expression raincloud plot example is skipped.\n") ``` ```{r raincloud-expression, fig.width=9, fig.height=6, eval=requireNamespace("ggrain", quietly=TRUE)} get_expression_raincloud( vista, genes = top_up, value_transform = "log2", summarise = TRUE, facet_by = "none", id.long.var = "gene", stats_group = TRUE ) ``` # Functional Enrichment Analysis ## MSigDB Enrichment ### Hallmark gene sets - Upregulated ```{r msigdb-up, eval=requireNamespace("msigdbr", quietly=TRUE)} msig_up <- get_msigdb_enrichment( vista, sample_comparison = comp_names[1], regulation = "Up", msigdb_category = "H", # Hallmark gene sets species = "Homo sapiens", from_type = "ENSEMBL" ) # View top enriched pathways if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { head(msig_up$enrich@result[, c("Description", "pvalue", "p.adjust", "Count")]) } ``` ### Hallmark gene sets - Downregulated ```{r msigdb-down, eval=requireNamespace("msigdbr", quietly=TRUE)} msig_down <- get_msigdb_enrichment( vista, sample_comparison = comp_names[1], regulation = "Down", msigdb_category = "H", species = "Homo sapiens", from_type = "ENSEMBL" ) if (!is.null(msig_down$enrich) && nrow(msig_down$enrich@result) > 0) { head(msig_down$enrich@result[, c("Description", "pvalue", "p.adjust", "Count")]) } ``` ## Enrichment Visualizations ### VISTA dotplot (default) ```{r enrichment-dotplot, fig.width=9, fig.height=7, eval=requireNamespace("msigdbr", quietly=TRUE)} # VISTA's wrapper function if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { get_enrichment_plot(msig_up$enrich, top_n = 10) } ``` ### Barplot (clusterProfiler native) ```{r enrichment-barplot, fig.width=9, fig.height=6, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("enrichplot", quietly=TRUE)} # Use generic barplot with enrichResult method if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { barplot(msig_up$enrich, showCategory = 10) } ``` ### Dotplot with customization ```{r enrichment-dotplot-native, fig.width=9, fig.height=7, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("enrichplot", quietly=TRUE)} # Customized dotplot with more categories if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { enrichplot::dotplot(msig_up$enrich, showCategory = 20, font.size = 12) } ``` ### Network plot (clusterProfiler native) ```{r enrichment-network, fig.width=10, fig.height=9, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("enrichplot", quietly=TRUE)} # Gene-concept network showing gene-pathway relationships if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { enrichplot::cnetplot(msig_up$enrich, showCategory = 5) } ``` ### Chord diagram (gene--pathway relationships) The chord diagram reveals which **hub genes** drive multiple enriched pathways and how much redundancy exists across terms. Chords can be coloured by fold-change when a VISTA object is supplied. ```{r enrichment-chord-pathway, fig.width=8, fig.height=8, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("circlize", quietly=TRUE)} # Pathway-coloured chord diagram (no VISTA object needed) if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { get_enrichment_chord( msig_up, top_n = 6, color_by = "pathway", title = "Gene-Pathway Membership (Up-regulated, Hallmark)" ) } ``` ```{r enrichment-chord-fc, fig.width=8, fig.height=8, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("circlize", quietly=TRUE)} # Fold-change coloured chords if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { get_enrichment_chord( msig_up, vista = vista, sample_comparison = names(comparisons(vista))[1], top_n = 6, color_by = "foldchange", display_id = "SYMBOL", title = "Gene-Pathway Chord (coloured by log2FC)" ) } ``` ```{r enrichment-chord-hub, fig.width=8, fig.height=8, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("circlize", quietly=TRUE)} # Show only hub genes shared across 2+ pathways if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { chord_result <- get_enrichment_chord( msig_up, vista = vista, sample_comparison = names(comparisons(vista))[1], top_n = 8, min_pathways = 2, color_by = "regulation", display_id = "SYMBOL", title = "Hub Genes Bridging Multiple Pathways" ) # Inspect hub genes returned invisibly if (length(chord_result$hub_genes)) { cat("Hub genes:", paste(head(chord_result$hub_genes, 10), collapse = ", "), "\n") } } ``` ## Pathway-Specific Expression Heatmaps Use enrichment output to extract pathway genes and visualize their expression directly. ### Extract genes from top pathways ```{r pathway-genes, eval=requireNamespace("msigdbr", quietly=TRUE)} if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { pathway_gene_list <- get_pathway_genes( msig_up$enrich, top_n = 3, return_type = "list" ) # Preview first few genes per pathway lapply(pathway_gene_list, head, n = 5) } ``` ### Heatmap of genes from top enriched pathways ```{r pathway-heatmap, fig.width=8, fig.height=10, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("ComplexHeatmap", quietly=TRUE)} if (!is.null(msig_up$enrich) && nrow(msig_up$enrich@result) > 0) { get_pathway_heatmap( x = vista, enrichment = msig_up$enrich, sample_group = c("Untreated", "Dexamethasone"), top_n = 2, gene_mode = "union", max_genes = 60, value_transform = "zscore", display_id = "SYMBOL", annotate_columns = TRUE, summarise_replicates = FALSE, show_row_names = FALSE ) } ``` ## GO Enrichment ### Biological Process ```{r go-bp, eval=requireNamespace("clusterProfiler", quietly=TRUE)} go_bp <- get_go_enrichment( vista, sample_comparison = comp_names[1], regulation = "Up", ont = "BP", # Biological Process species = "Homo sapiens", from_type = "ENSEMBL" ) if (!is.null(go_bp$enrich) && nrow(go_bp$enrich@result) > 0) { head(go_bp$enrich@result[, c("Description", "pvalue", "p.adjust", "Count")], n = 10) } ``` ### GO Visualization ```{r go-plot, fig.width=9, fig.height=7, eval=requireNamespace("clusterProfiler", quietly=TRUE)} if (!is.null(go_bp$enrich) && nrow(go_bp$enrich@result) > 0) { get_enrichment_plot(go_bp$enrich, top_n = 20) } ``` ## Gene Set Enrichment Analysis (GSEA) GSEA uses ranked gene lists to identify pathways enriched at the top or bottom of the ranking. VISTA automatically prepares the ranked list from your differential expression results. ### GSEA with MSigDB Hallmark gene sets ```{r gsea-hallmark, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("clusterProfiler", quietly=TRUE)} # Run GSEA using VISTA's native function gsea_results <- get_gsea( vista, sample_comparison = comp_names[1], set_type = "msigdb", from_type = "ENSEMBL", species = "Homo sapiens", msigdb_category = "H", # Hallmark gene sets pvalueCutoff = 0.05, pAdjustMethod = "BH" ) # Show results if (!is.null(gsea_results$enrich) && nrow(gsea_results$enrich@result) > 0) { head(gsea_results$enrich@result[, c("Description", "NES", "pvalue", "p.adjust")], n = 10) } ``` ### GSEA with GO Biological Process ```{r gsea-go, eval=requireNamespace("clusterProfiler", quietly=TRUE)} # Run GSEA with GO terms gsea_go <- get_gsea( vista, sample_comparison = comp_names[1], set_type = "go", from_type = "ENSEMBL", orgdb = org.Hs.eg.db, ont = "BP", # Biological Process pvalueCutoff = 0.05 ) # Show results if (!is.null(gsea_go$enrich) && nrow(gsea_go$enrich@result) > 0) { head(gsea_go$enrich@result[, c("Description", "NES", "pvalue", "p.adjust")], n = 10) } ``` ### GSEA enrichment overview ```{r gsea-dotplot, fig.width=9, fig.height=7, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("clusterProfiler", quietly=TRUE)} # Show all significant pathways using VISTA's visualization if (!is.null(gsea_results$enrich) && nrow(gsea_results$enrich@result) > 0) { get_enrichment_plot(gsea_results$enrich, top_n = 15) } ``` ### GSEA plot for top pathway ```{r gsea-pathway, fig.width=10, fig.height=6, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("enrichplot", quietly=TRUE)} # Show enrichment plot for the top pathway if (!is.null(gsea_results$enrich) && nrow(gsea_results$enrich@result) > 0) { # Create GSEA enrichment plot with running enrichment score enrichplot::gseaplot2( gsea_results$enrich, geneSetID = 1, # Top pathway title = gsea_results$enrich@result$Description[1], pvalue_table = TRUE ) } ``` ### GSEA plot for multiple pathways ```{r gsea-multi, fig.width=10, fig.height=8, eval=requireNamespace("msigdbr", quietly=TRUE) && requireNamespace("enrichplot", quietly=TRUE)} # Show top 3 pathways together if (!is.null(gsea_results$enrich) && nrow(gsea_results$enrich@result) > 0) { enrichplot::gseaplot2( gsea_results$enrich, geneSetID = 1:3, # Top 3 pathways pvalue_table = TRUE, ES_geom = "line" ) } ``` ### GSEA with GO visualization ```{r gsea-go-plot, fig.width=9, fig.height=7, eval=requireNamespace("clusterProfiler", quietly=TRUE)} # Visualize GO GSEA results if (!is.null(gsea_go$enrich) && nrow(gsea_go$enrich@result) > 0) { get_enrichment_plot(gsea_go$enrich, top_n = 15) } ``` ## KEGG Pathway Enrichment ### KEGG upregulated genes ```{r kegg-up, eval=FALSE} kegg_up <- get_kegg_enrichment( vista, sample_comparison = comp_names[1], regulation = "Up", species = "Homo sapiens", from_type = "ENSEMBL" ) if (!is.null(kegg_up$enrich) && nrow(kegg_up$enrich@result) > 0) { head(kegg_up$enrich@result[, c("Description", "pvalue", "p.adjust", "Count")], n = 10) } ``` ### KEGG downregulated genes ```{r kegg-down, eval=FALSE} kegg_down <- get_kegg_enrichment( vista, sample_comparison = comp_names[1], regulation = "Down", species = "Homo sapiens", from_type = "ENSEMBL" ) if (!is.null(kegg_down$enrich) && nrow(kegg_down$enrich@result) > 0) { head(kegg_down$enrich@result[, c("Description", "pvalue", "p.adjust", "Count")]) } ``` ### KEGG Visualization ```{r kegg-plot, fig.width=9, fig.height=6, eval=FALSE} if (!is.null(kegg_up$enrich) && nrow(kegg_up$enrich@result) > 0) { get_enrichment_plot(kegg_up$enrich, top_n = 15) } ``` # Fold-Change Analysis ## Fold-change Matrix Useful for comparing multiple comparisons: ```{r foldchange-matrix} fc_matrix <- get_foldchange_matrix(vista) head(fc_matrix, n = 10) ``` ## Fold-change Barplot and Lollipop ### Per-gene fold-change barplot ```{r fc-barplot-gene, fig.width=9, fig.height=6} get_foldchange_barplot( vista, genes = top_up[1:3], sample_comparisons = comp_names, display_id = "SYMBOL", facet_by = "gene" ) ``` ### Per-gene fold-change lollipop ```{r fc-lollipop-gene, fig.width=9, fig.height=6} get_foldchange_lollipop( vista, sample_comparison = comp_names[1], genes = top_up[1:3], display_id = "SYMBOL", facet_by = "gene" ) ``` ## Fold-change Raincloud ```{r raincloud-foldchange-message, eval=!requireNamespace("ggrain", quietly=TRUE), echo=FALSE} cat("Package 'ggrain' is not installed; fold-change raincloud plot example is skipped.\n") ``` ```{r raincloud-foldchange, fig.width=9, fig.height=6, eval=requireNamespace("ggrain", quietly=TRUE)} get_foldchange_raincloud( vista, sample_comparisons = comp_names, facet_by = "none", id.long.var = "gene_id", stats_group = TRUE ) ``` ## Fold-change Heatmap ### Basic FC heatmap ```{r fc-heatmap-basic, fig.width=7, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_foldchange_heatmap(vista) ``` ### FC heatmap for selected genes ```{r fc-heatmap-explicit, fig.width=7, fig.height=9, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} # Select genes with large fold-changes fc_genes <- rownames(fc_matrix)[abs(fc_matrix[, 1]) > 2][1:30] get_foldchange_heatmap( vista, sample_comparisons = comp_names, genes = fc_genes, display_id = "SYMBOL" ) ``` ### FC heatmap with gene names ```{r fc-heatmap-names, fig.width=7, fig.height=10, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} get_foldchange_heatmap( vista, sample_comparisons = comp_names, genes = fc_genes[1:25], show_row_names = TRUE, display_id = "SYMBOL" ) ``` ### FC heatmap for specific gene set ```{r fc-heatmap-custom, fig.width=7, fig.height=8, eval=requireNamespace("ComplexHeatmap", quietly=TRUE)} # Use top upregulated genes get_foldchange_heatmap( vista, sample_comparisons = comp_names, genes = top_up, show_row_names = TRUE, display_id = "SYMBOL" ) ``` # Export Results ## Export DE results to file ```{r export-results, eval=FALSE} # Export complete DE table with annotations de_annotated <- merge( comparisons(vista)[[1]], as.data.frame(rowData(vista)), by.x = "gene_id", by.y = "row.names" ) # Write to CSV write.csv( de_annotated, file = "airway_dexamethasone_vs_untreated_DE_results.csv", row.names = FALSE ) # Export significant genes only sig_genes <- de_annotated[de_annotated$regulation %in% c("Up", "Down"), ] write.csv( sig_genes, file = "airway_significant_DEGs.csv", row.names = FALSE ) ``` ## Save VISTA object ```{r save-vista, eval=FALSE} # Save the complete VISTA object for later use saveRDS(vista, file = "airway_vistaect.rds") # Load it back # vista <- readRDS("airway_vistaect.rds") ``` # Summary ## Workflow Completed In this workflow, we: 1. ✅ Loaded the airway RNA-seq dataset 2. ✅ Created a VISTA object with DESeq2 analysis 3. ✅ Added gene annotations from org.Hs.eg.db 4. ✅ Performed quality control (PCA, MDS, UMAP, correlation) 5. ✅ Visualized differential expression (volcano, MA plots) 6. ✅ Analyzed expression patterns (heatmaps, barplots, boxplots, violin/raincloud plots, and more) 7. ✅ Performed functional enrichment (MSigDB, GO, KEGG) 8. ✅ Explored fold-change patterns 9. ✅ Generated publication-ready visualizations ## Key Features Demonstrated - **Single-function workflow**: `create_vista()` handles DE analysis - **Consistent interface**: All plot functions follow the same pattern - **Flexible visualizations**: Easy to customize colors, labels, thresholds - **Multiple plot types**: 29+ plotting functions for every analysis need - **Integrated enrichment**: No need to wrangle gene IDs manually - **Publication-ready**: All plots are ggplot2/ComplexHeatmap objects ## Plotting Functions Used ### QC Plots - `get_corr_heatmap()` - Sample correlation - `get_pca_plot()` - Principal component analysis - `get_mds_plot()` - Multidimensional scaling - `get_umap_plot()` - Nonlinear sample embedding ### DE Visualization - `get_deg_count_barplot()` - DEG summary counts - `get_volcano_plot()` - Volcano plots - `get_ma_plot()` - MA plots ### Expression Plots - `get_expression_heatmap()` - Expression heatmaps - `get_expression_barplot()` - Expression barplots - `get_expression_boxplot()` - Expression boxplots - `get_expression_violinplot()` - Violin plots - `get_expression_density()` - Density plots - `get_expression_scatter()` - Sample-vs-sample scatter - `get_expression_lineplot()` - Expression across samples - `get_expression_lollipop()` - Lollipop plots - `get_expression_joyplot()` - Ridgeline plots ### Enrichment Plots - `get_enrichment_plot()` - Generic enrichment visualization - `get_msigdb_enrichment()` - MSigDB enrichment - `get_go_enrichment()` - GO enrichment - `get_kegg_enrichment()` - KEGG pathway enrichment - `get_pathway_genes()` - Extract genes driving enriched pathways - `get_pathway_heatmap()` - Plot pathway-derived expression heatmaps - `get_enrichment_chord()` - Chord diagram of gene-pathway relationships ### Fold-Change - `get_foldchange_matrix()` - Extract FC matrix - `get_foldchange_heatmap()` - Visualize FC patterns ## Next Steps - Try with your own data - Explore edgeR backend: `method = "edger"` - Explore limma-voom backend: `method = "limma"` - Test multiple comparisons simultaneously - Customize plots with ggplot2 themes - Generate automated reports with `run_vista_report()` - Integrate with downstream tools # Session Information ```{r session-info} sessionInfo() ``` # References - Himes BE et al. (2014). RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells. *PLoS One*, 9(6), e99625. - Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. *Genome Biology*, 15, 550. - Robinson MD, McCarthy DJ, Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. *Bioinformatics*, 26(1), 139-140. - Subramanian A et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. *PNAS*, 102(43), 15545-15550. - Yu G et al. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. *OMICS*, 16(5), 284-287.