---
title: "Visualization of Functional Enrichment Result"
author: "\\

	Guangchuang Yu\\

        School of Basic Medical Sciences, Southern Medical University"
date: "`r Sys.Date()`"
bibliography: enrichplot.bib
biblio-style: apalike
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{enrichplot introduction}
  %\VignetteDepends{ggplot2}
  %\VignetteDepends{ggraph}
  %\usepackage[utf8]{inputenc}
---


```{r include=FALSE}
knitr::opts_chunk$set(warning = FALSE,
                      message = TRUE)

library(DOSE)
library(org.Hs.eg.db)
library(ggplot2)
library(ggraph)
library(cowplot)
library(UpSetR)
library(enrichplot)

CRANpkg <- function (pkg) {
    cran <- "https://CRAN.R-project.org/package"
    fmt <- "[%s](%s=%s)"
    sprintf(fmt, pkg, cran, pkg)
}

Biocpkg <- function (pkg) {
    sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg)
}
```

The `r Biocpkg("enrichplot")` package implements several visualization methods
to help interpreting enrichment results. It supports visualizing enrichment
results obtained from `r Biocpkg("DOSE")` [@yu_dose_2015],
`r Biocpkg("clusterProfiler")` [@yu_clusterprofiler_2012],
`r Biocpkg("ReactomePA")` [@yu_reactomepa_2016] and `r Biocpkg("meshes")`. Both
over representation analysis (ORA) and gene set enrichment analysis (GSEA) are
supported.

# Enrichment Analysis


## Over Representation Analysis

```{r fig.width=12, fig.height=8}
library(DOSE)
data(geneList)
de <- names(geneList)[abs(geneList) > 2]

edo <- enrichDGN(de)
```

## Gene Set Enrichment Analysis

```{r message=FALSE}
edo2 <- gseNCG(geneList, nPerm=10000)
```

# Visualization methods

## Bar plot


Bar plot is the most widely used method to visualize enriched terms. It depicts
the enrichment scores (*e.g.* p values) and gene count or ratio as bar height
and color.

```{r fig.width=12, fig.height=8}
barplot(edo, showCategory=20)
```

## Dot plot


Dot plot is similar to bar plot with the capability to encode another score as
dot size.

```{r fig.width=12, fig.height=8}
p1 <- dotplot(edo, showCategory=30) + ggtitle("dotplot for ORA")
p2 <- dotplot(edo2, showCategory=30) + ggtitle("dotplot for GSEA")
plot_grid(p1, p2, ncol=2)
```

Users can use formula to specify derived variable of x-axis.


```{r}
N <- as.numeric(sub("\\d+/", "", edo[1, "BgRatio"]))
N
dotplot(edo, showCategory=15, x = ~Count/N) + ggplot2::xlab("Rich Factor")
```


## Gene-Concept Network

Both the `barplot` and `dotplot` only displayed most significant enriched terms,
while users may want to know which genes are involved in these significant
terms. The `cnetplot` depicts the linkages of genes and biological concepts
(*e.g.* GO terms or KEGG pathways) as a network. GSEA result is also supported
with only core enriched genes displayed.

```{r fig.width=12, fig.height=8}
## convert gene ID to Symbol
edox <- setReadable(edo, 'org.Hs.eg.db', 'ENTREZID')
cnetplot(edox, foldChange=geneList)
cnetplot(edox, foldChange=geneList, circular = TRUE, colorEdge = TRUE)
```

## UpSet Plot


The `upsetplot` is an alternative to `cnetplot` for visualizing the complex
association between genes and gene sets. It emphasizes the gene overlapping
among different gene sets.

```{r fig.width=12, fig.height=5}
upsetplot(edo)
```

## Heatmap-like functional classification


The `heatplot` is similar to `cnetplot`, while displaying the relationships as a
heatmap. The gene-concept network may become too complicated if user want to
show a large number significant terms. The `heatplot` can simplify the result
and more easy to identify expression patterns.

```{r fig.width=16, fig.height=4}
heatplot(edox)
heatplot(edox, foldChange=geneList)
```


## Enrichment Map


Enrichment map organizes enriched terms into a network with edges connecting
overlapping gene sets. In this way, mutually overlapping gene sets are tend to
cluster together, making it easy to identify functional module.


```{r fig.width=12, fig.height=10}
emapplot(edo)
```

## ridgeline plot for expression distribution of GSEA result

The `ridgeplot` will visualize expression distributions of core enriched genes
for GSEA enriched categories. It helps users to interpret up/down-regulated pathways.

```{r fig.width=12, fig.height=8, message=FALSE}
ridgeplot(edo2)
```


## running score and preranked list of GSEA result

Running score and preranked list are traditional methods for visualizing GSEA
result. The `r Biocpkg("enrichplot")` package supports both of them to visualize
the distribution of the gene set and the enrichment score.


```{r fig.width=12, fig.height=4}
gseaplot(edo2, geneSetID = 1, by = "runningScore", title = edo2$Description[1])
gseaplot(edo2, geneSetID = 1, by = "preranked", title = edo2$Description[1])
```

```{r fig.width=12, fig.height=8}
gseaplot(edo2, geneSetID = 1, title = edo2$Description[1])
```

Another method to plot GSEA result is the `gseaplot2` function:

```{r fig.width=12, fig.height=8}
gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1])
```

The `gseaplot2` also supports multile gene sets to be displayed on the same figure:

```{r fig.width=12, fig.height=8}
gseaplot2(edo2, geneSetID = 1:3)
```

User can also displaying the pvalue table on the plot via `pvalue_table`
parameter:

```{r fig.width=12, fig.height=8}
gseaplot2(edo2, geneSetID = 1:3, pvalue_table = TRUE,
          color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot")
```


User can specify `subplots` to only display a subset of plots:

```{r fig.width=12, fig.height=4}
gseaplot2(edo2, geneSetID = 1:3, subplots = 1)
```

```{r fig.width=12, fig.height=8}
gseaplot2(edo2, geneSetID = 1:3, subplots = 1:2)
```


The `gsearank` function plot the ranked list of genes belong to the specific
gene set.

```{r fig.width=8, fig.height=4}
gsearank(edo2, 1, title = edo2[1, "Description"])
```

Multiple gene sets can be aligned using `cowplot`:

```{r fig.width=8, fig.height=6}
pp <- lapply(1:3, function(i) {
    anno <- edo2[i, c("NES", "pvalue", "p.adjust")]
    lab <- paste0(names(anno), "=",  round(anno, 3), collapse="\n")

    gsearank(edo2, i, edo2[i, 2]) + xlab(NULL) +ylab(NULL) +
        annotate("text", 0, edo2[i, "enrichmentScore"] * .9, label = lab, hjust=0, vjust=0)
})
plot_grid(plotlist=pp, ncol=1)
```


## pubmed trend of enriched terms

One of the problem of enrichment analysis is to find pathways for further
investigation. Here, we provide `pmcplot` function to plot the number/proportion
of publications trend based on the query result from PubMed Central. Of course,
users can use `pmcplot` in other scenarios. All text that can be queried on PMC
is valid as input of `pmcplot`.


```{r fig.width=12, fig.height=4}
terms <- edo$Description[1:3]
p <- pmcplot(terms, 2010:2017)
p2 <- pmcplot(terms, 2010:2017, proportion=FALSE)
plot_grid(p, p2, ncol=2)
```


# References