--- title: "Visualization of Functional Enrichment Result" author: "\\ Guangchuang Yu\\ School of Basic Medical Sciences, Southern Medical University" date: "`r Sys.Date()`" bibliography: enrichplot.bib biblio-style: apalike output: prettydoc::html_pretty: toc: true theme: cayman highlight: github pdf_document: toc: true vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{enrichplot introduction} %\VignetteDepends{ggplot2} %\VignetteDepends{ggraph} %\usepackage[utf8]{inputenc} --- ```{r include=FALSE} knitr::opts_chunk$set(warning = FALSE, message = TRUE) library(DOSE) library(org.Hs.eg.db) library(ggplot2) library(ggraph) library(cowplot) library(UpSetR) library(enrichplot) CRANpkg <- function (pkg) { cran <- "https://CRAN.R-project.org/package" fmt <- "[%s](%s=%s)" sprintf(fmt, pkg, cran, pkg) } Biocpkg <- function (pkg) { sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg) } ``` The `r Biocpkg("enrichplot")` package implements several visualization methods to help interpreting enrichment results. It supports visualizing enrichment results obtained from `r Biocpkg("DOSE")` [@yu_dose_2015], `r Biocpkg("clusterProfiler")` [@yu_clusterprofiler_2012], `r Biocpkg("ReactomePA")` [@yu_reactomepa_2016] and `r Biocpkg("meshes")`. Both over representation analysis (ORA) and gene set enrichment analysis (GSEA) are supported. # Enrichment Analysis ## Over Representation Analysis ```{r fig.width=12, fig.height=8} library(DOSE) data(geneList) de <- names(geneList)[abs(geneList) > 2] edo <- enrichDGN(de) ``` ## Gene Set Enrichment Analysis ```{r message=FALSE} edo2 <- gseNCG(geneList, nPerm=10000) ``` # Visualization methods ## Bar plot Bar plot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (*e.g.* p values) and gene count or ratio as bar height and color. ```{r fig.width=12, fig.height=8} barplot(edo, showCategory=20) ``` ## Dot plot Dot plot is similar to bar plot with the capability to encode another score as dot size. ```{r fig.width=12, fig.height=8} p1 <- dotplot(edo, showCategory=30) + ggtitle("dotplot for ORA") p2 <- dotplot(edo2, showCategory=30) + ggtitle("dotplot for GSEA") plot_grid(p1, p2, ncol=2) ``` Users can use formula to specify derived variable of x-axis. ```{r} N <- as.numeric(sub("\\d+/", "", edo[1, "BgRatio"])) N dotplot(edo, showCategory=15, x = ~Count/N) + ggplot2::xlab("Rich Factor") ``` ## Gene-Concept Network Both the `barplot` and `dotplot` only displayed most significant enriched terms, while users may want to know which genes are involved in these significant terms. The `cnetplot` depicts the linkages of genes and biological concepts (*e.g.* GO terms or KEGG pathways) as a network. GSEA result is also supported with only core enriched genes displayed. ```{r fig.width=12, fig.height=8} ## convert gene ID to Symbol edox <- setReadable(edo, 'org.Hs.eg.db', 'ENTREZID') cnetplot(edox, foldChange=geneList) cnetplot(edox, foldChange=geneList, circular = TRUE, colorEdge = TRUE) ``` ## UpSet Plot The `upsetplot` is an alternative to `cnetplot` for visualizing the complex association between genes and gene sets. It emphasizes the gene overlapping among different gene sets. ```{r fig.width=12, fig.height=5} upsetplot(edo) ``` ## Heatmap-like functional classification The `heatplot` is similar to `cnetplot`, while displaying the relationships as a heatmap. The gene-concept network may become too complicated if user want to show a large number significant terms. The `heatplot` can simplify the result and more easy to identify expression patterns. ```{r fig.width=16, fig.height=4} heatplot(edox) heatplot(edox, foldChange=geneList) ``` ## Enrichment Map Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module. ```{r fig.width=12, fig.height=10} emapplot(edo) ``` ## ridgeline plot for expression distribution of GSEA result The `ridgeplot` will visualize expression distributions of core enriched genes for GSEA enriched categories. It helps users to interpret up/down-regulated pathways. ```{r fig.width=12, fig.height=8, message=FALSE} ridgeplot(edo2) ``` ## running score and preranked list of GSEA result Running score and preranked list are traditional methods for visualizing GSEA result. The `r Biocpkg("enrichplot")` package supports both of them to visualize the distribution of the gene set and the enrichment score. ```{r fig.width=12, fig.height=4} gseaplot(edo2, geneSetID = 1, by = "runningScore", title = edo2$Description[1]) gseaplot(edo2, geneSetID = 1, by = "preranked", title = edo2$Description[1]) ``` ```{r fig.width=12, fig.height=8} gseaplot(edo2, geneSetID = 1, title = edo2$Description[1]) ``` Another method to plot GSEA result is the `gseaplot2` function: ```{r fig.width=12, fig.height=8} gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1]) ``` The `gseaplot2` also supports multile gene sets to be displayed on the same figure: ```{r fig.width=12, fig.height=8} gseaplot2(edo2, geneSetID = 1:3) ``` User can also displaying the pvalue table on the plot via `pvalue_table` parameter: ```{r fig.width=12, fig.height=8} gseaplot2(edo2, geneSetID = 1:3, pvalue_table = TRUE, color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot") ``` User can specify `subplots` to only display a subset of plots: ```{r fig.width=12, fig.height=4} gseaplot2(edo2, geneSetID = 1:3, subplots = 1) ``` ```{r fig.width=12, fig.height=8} gseaplot2(edo2, geneSetID = 1:3, subplots = 1:2) ``` The `gsearank` function plot the ranked list of genes belong to the specific gene set. ```{r fig.width=8, fig.height=4} gsearank(edo2, 1, title = edo2[1, "Description"]) ``` Multiple gene sets can be aligned using `cowplot`: ```{r fig.width=8, fig.height=6} pp <- lapply(1:3, function(i) { anno <- edo2[i, c("NES", "pvalue", "p.adjust")] lab <- paste0(names(anno), "=", round(anno, 3), collapse="\n") gsearank(edo2, i, edo2[i, 2]) + xlab(NULL) +ylab(NULL) + annotate("text", 0, edo2[i, "enrichmentScore"] * .9, label = lab, hjust=0, vjust=0) }) plot_grid(plotlist=pp, ncol=1) ``` ## pubmed trend of enriched terms One of the problem of enrichment analysis is to find pathways for further investigation. Here, we provide `pmcplot` function to plot the number/proportion of publications trend based on the query result from PubMed Central. Of course, users can use `pmcplot` in other scenarios. All text that can be queried on PMC is valid as input of `pmcplot`. ```{r fig.width=12, fig.height=4} terms <- edo$Description[1:3] p <- pmcplot(terms, 2010:2017) p2 <- pmcplot(terms, 2010:2017, proportion=FALSE) plot_grid(p, p2, ncol=2) ``` # References