---
title: >
The `GeneTonic` User's Guide
author:
- name: Federico Marini
affiliation:
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
- Center for Thrombosis and Hemostasis (CTH), Mainz
email: marinif@uni-mainz.de
- name: Annekathrin Ludt
affiliation:
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
email: anneludt@uni-mainz.de
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('GeneTonic')`"
output:
BiocStyle::html_document:
toc_float: true
vignette: >
%\VignetteIndexEntry{The GeneTonic User's Guide}
%\VignetteEncoding{UTF-8}
%\VignettePackage{GeneTonic}
%\VignetteKeywords{GeneExpression, RNASeq, FunctionalAnnotation, Sequencing, Visualization, QualityControl, GUI}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
bibliography: GeneTonic.bib
---
**Compiled date**: `r Sys.Date()`
**Last edited**: 2022-04-04
**License**: `r packageDescription("GeneTonic")[["License"]]`
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
error = FALSE,
warning = FALSE,
eval = TRUE,
message = FALSE,
fig.width = 10
)
options(width = 100)
stopifnot(requireNamespace("htmltools"))
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
```
# Introduction {#introduction}
This vignette describes how to use the `r BiocStyle::Biocpkg("GeneTonic")` package for analyzing and integrating the results from Differential Expression analysis and functional enrichment analysis.
This package provides a Shiny application that aims to combine at different levels the existing pieces of the transcriptome data and results, in a way that makes it easier to generate insightful observations and hypothesis - combining the benefits of interactivity and reproducibility, e.g. by capturing the features and gene sets of interest highlighted during the live session, and creating an HTML report as an artifact where text, code, and output coexist.
In order to use `r BiocStyle::Biocpkg("GeneTonic")` in your workflow, the following inputs are required:
- `dds`, a `DESeqDataSet` containing the expression matrix;
- `res_de`, a `DESeqResults`, i.e. a `DataFrame` storing the results of the differential expression analysis;
- `res_enrich`, a `data.frame` with the results of functional enrichment analysis;
- `annotation_obj`, a `data.frame` with the correspondence between identifiers for the features under inspection in the `dds` object.
This workflow has mainly been tested with expression matrices with Ensembl identifiers, which are consistent and unambiguous across different releases, and results from functional enrichment analysis originating from ORA (OverRepresentation Analysis) methods, on the Gene Ontology signature database.
It may need to be slightly adjusted to work with other formats and sources, but the core functionality will remain available.
In the remainder of this vignette, we will illustrate the main features of `r BiocStyle::Biocpkg("GeneTonic")` on a publicly available dataset from Alasoo, et al. "Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response", published in Nature Genetics, January 2018 [@Alasoo2018]
[doi:10.1038/s41588-018-0046-7](https://doi.org/10.1038/s41588-018-0046-7).
The data is made available via the `r BiocStyle::Biocpkg("macrophage")` Bioconductor package, which contains the files output from the Salmon quantification (version 0.12.0, with Gencode v29 reference), as well as the values summarized at the gene level, which we will use to exemplify.
In the `macrophage` experimental setting, the samples are available from 6 different donors, in 4 different conditions (naive, treated with Interferon gamma, with SL1344, or with a combination of Interferon gamma and SL1344).
We will restrict our attention on the comparison between Interferon gamma treated samples vs naive samples.
# Getting started {#gettingstarted}
To install this package, start R and enter:
```{r install, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("GeneTonic")
```
Once installed, the package can be loaded and attached to your current workspace as follows:
```{r loadlib, eval = TRUE}
library("GeneTonic")
```
If you have all four input elements ready, you can launch the `GeneTonic()` app by running:
```{r launchapp, eval=FALSE}
GeneTonic(dds = dds_object,
res_de = res_de_object,
res_enrich = res_enrich_object,
annotation_obj = annotation_object,
project_id = "myFirstGeneTonic")
```
In this vignette, we showcase the functionality of `r BiocStyle::Biocpkg("GeneTonic")` using the gene-level summarized version of the `macrophage` dataset.
If you want to dive in and start playing with the app immediately, you can simply run:
```{r examplerun, eval=FALSE}
example("GeneTonic", ask = FALSE)
```
Otherwise, you can follow the next chunks of code to generate the required input objects, step by step.
In case some formatting requirements are expected, these will be specified in the text/comments, and they will be checked internally upon launching the `GeneTonic` app.
For running `GeneTonic`, you will need the four components mentioned above as parameters.
1. `dds`: First you will need a `DESeqDataSet`, which is the main component in the `DESeq2` framework and extends the widely adopted `SummarizedExperiment` class.
This object will store the information related to the expression matrix.
```{r create_dds, eval=TRUE}
library("macrophage")
library("DESeq2")
data("gse", package = "macrophage")
dds_macrophage <- DESeqDataSet(gse, design = ~line + condition)
# changing the ids to Ensembl instead of the Gencode used in the object
rownames(dds_macrophage) <- substr(rownames(dds_macrophage), 1, 15)
dds_macrophage
```
2. `res_de`: Next you are going to need the results of Differential Expression analysis, computed on the `dds_macrophage` object you just obtained.
The expected format is a `DESeqResults` object.
Here we are going to contrast two different conditions, `IFNg` and `naive`, while controlling for the cell line of origin (which has 6 levels, namely `r knitr::combine_words(levels(dds_macrophage$condition))`).
We start by filtering lowly expressed features (at least 10 counts in at least 6 samples - 6 being the size of the smallest group).
Then, we test against a null hypothesis of a log2FoldChange of 1 (instead of the default value of 0), in order to specify that we want to call DE genes with a consistent *and* robust expression change.
Finally, we add the gene symbol to the output `DataFrame`.
```{r create_resde1, eval = TRUE}
keep <- rowSums(counts(dds_macrophage) >= 10) >= 6
dds_macrophage <- dds_macrophage[keep, ]
dds_macrophage
```
```{r create_resde2, eval = FALSE}
dds_macrophage <- DESeq(dds_macrophage)
# vst_macrophage <- vst(dds_macrophage)
res_macrophage_IFNg_vs_naive <- results(dds_macrophage,
contrast = c("condition", "IFNg", "naive"),
lfcThreshold = 1, alpha = 0.05)
res_macrophage_IFNg_vs_naive$SYMBOL <- rowData(dds_macrophage)$SYMBOL
```
```{r load_resde, eval=TRUE}
## To speed up the operations in the vignette, we can also load this object directly
data("res_de_macrophage")
head(res_macrophage_IFNg_vs_naive)
```
3. `res_enrich`: With the DE results of the previous step, we are going to extract the vector of DE genes (via `deseqresult2df`), as well as the list of genes to be used as background, and feed these two objects into a function that computes the functional enrichment of the DE genes.
We are going to use the ORA method implemented in `r BiocStyle::Biocpkg("topGO")`, wrapped in the function `topGOtable` available in the `r BiocStyle::Biocpkg("pcaExplorer")` package, which by default uses the `BP` ontology and the `elim` method to decorrelate the GO graph structure and deliver less redundant functional categories.
You can also use as an alternative the `enrichGO` function from `r BiocStyle::Biocpkg("clusterProfiler")`.
```{r create_resenrich1, eval=TRUE}
library("AnnotationDbi")
de_symbols_IFNg_vs_naive <- deseqresult2df(res_macrophage_IFNg_vs_naive, FDR = 0.05)$SYMBOL
bg_ids <- rowData(dds_macrophage)$SYMBOL[rowSums(counts(dds_macrophage)) > 0]
```
```{r create_resenrich2, eval=FALSE}
library("topGO")
topgoDE_macrophage_IFNg_vs_naive <-
pcaExplorer::topGOtable(de_symbols_IFNg_vs_naive,
bg_ids,
ontology = "BP",
mapping = "org.Hs.eg.db",
geneID = "symbol",
topTablerows = 500)
```
```{r load_resenrich, eval=TRUE}
## To speed up the operations in the vignette, we also load this object directly
data("res_enrich_macrophage")
head(topgoDE_macrophage_IFNg_vs_naive, 2)
```
Since in this step the object can be created from different methods, but the `GeneTonic` main functions require some specific columns to be present, we have created some functions to convert the output of the upstream tools into a format compatible with `GeneTonic`.
The functions `shake_enrichResult()` and `shake_topGOtableResult()` serve this purpose - if your favorite tool might be different, you can write your own conversion function and contribute it to `GeneTonic`.
```{r convert_resenrich, eval=TRUE}
res_enrich_macrophage <- shake_topGOtableResult(topgoDE_macrophage_IFNg_vs_naive)
colnames(res_enrich_macrophage)
```
4. `annotation_obj`: Last object required is the annotation data frame, i.e. a simple data frame composed at least of two columns, `gene_id`, with a set of unambiguous identifiers (e.g. ENSEMBL ids) corresponding to the row names of the `dds` object, and `gene_name`, containing e.g. HGNC-based gene symbols.
This object can be constructed via the `org.eg.XX.db` packages, e.g. with convenience functions such as `pcaExplorer::get_annotation_orgdb()`.
Here we display the general way of obtaining this object.
```{r create_anno, eval=TRUE}
library("org.Hs.eg.db")
anno_df <- data.frame(
gene_id = rownames(dds_macrophage),
gene_name = mapIds(org.Hs.eg.db, keys = rownames(dds_macrophage), column = "SYMBOL", keytype = "ENSEMBL"),
stringsAsFactors = FALSE,
row.names = rownames(dds_macrophage)
)
## alternatively:
# anno_df <- pcaExplorer::get_annotation_orgdb(dds_macrophage, "org.Hs.eg.db", "ENSEMBL")
```
For some operations in the main app and in the single functions, sometimes it is required to have aggregated scores for the `res_enrich` data frame.
We can do so by calling `get_aggrscores()` on the combination of objects we just generated.
This adds two columns to the provided `res_enrich` object, `z_score` and `aggr_score`, which summarize at the gene set level the effect (log2FoldChange) of the differentially expressed genes which are its members.
In particular, the z score attempts to determine the "direction" of change, computed as $z = \frac{(upgenes - downgenes)}{\sqrt{(upgenes + downgenes)}}$, regardless of the effect size of the single members of the gene set.
```{r aggr_enrich, eval=TRUE}
res_enrich_macrophage <- get_aggrscores(res_enrich = res_enrich_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
annotation_obj = anno_df,
aggrfun = mean)
```
## All set!
Once you have the four objects specified above, you can set the `project_id` to be a character string at wish, and you can just launch the `GeneTonic()` app:
```{r dryrun, eval=FALSE}
GeneTonic(dds = dds_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
res_enrich = res_enrich_macrophage,
annotation_obj = anno_df,
project_id = "GT1")
```
Alternatively (this applies from version >= 1.3.0), it is possible to provide a single list object to `GeneTonic()` and many of its functions.
Please note that the arguments should be named as specified in the example below.
```{r}
gtl <- GeneTonicList(
dds = DESeq2::estimateSizeFactors(dds_macrophage),
res_de = res_macrophage_IFNg_vs_naive,
res_enrich = res_enrich_macrophage,
annotation_obj = anno_df
)
# if nothing is returned, the object is ready to be provided to GeneTonic
checkup_gtl(gtl)
if (interactive()) {
GeneTonic(gtl = gtl)
}
```
From version 2.0.0 onwards, `GeneTonic` offers the possibility to upload the `GeneTonicList` objects at runtime - these need to be provided as serialized RDS objects (which can be generated when calling `saveRDS()`, or alternatively exported from within the `GeneTonic` app).
This means it is possible to use `GeneTonic` also as a server-like application for internal deployment. An exemplary `app.R` file to be used on a Shiny Server could be as simple as
```
max_gtl_size <- 250 ## the max size in Megabytes for the gtl object to be uploaded
library("GeneTonic")
GeneTonic(size_gtl = max_gtl_size)
```
To upload your `GeneTonicList` object, click on the "Browse" widget and select the specific RDS file from the explorer window that will open up.
If the upload was successful, `GeneTonic` will make all the panels available, ready to be interactively explored.
If providing the data in a wrong or unexpected format, the app should be able to recognize this and notify the users in the lower right corner.
```{r, fig.cap="Screenshot of the `GeneTonic` application, when launched as a server where users can directly upload `GeneTonicList` objects. Information on the format and content for this object type are provided in the collapsible element on the right side of the dashboard body.", echo=FALSE}
knitr::include_graphics("upload_gtl.png")
```
# Description of the `GeneTonic` user interface {#userinterface}
The `GeneTonic` app is built with `r BiocStyle::CRANpkg("shiny")`, and its layout is built around the components of `r BiocStyle::CRANpkg("bs4Dash")`, a modern looking dashboard based on Bootstrap 4.
## Header (navbar)
The dashboard navbar (as it is called in `bs4Dash`) contains two dropdown menus, which can be expanded by clicking on the respective icons ( and ).
These provide additional buttons to do one the following:
- - inspect the `r BiocStyle::Biocpkg("GeneTonic")` vignette (either the one installed in bundle with the package, or the online version)
- - read some first help information on `GeneTonic` (what the package does in brief, what is required to run properly)
- - display information about the current session (via `sessionInfo()`)
- - show some general information on `r BiocStyle::Biocpkg("GeneTonic")`, like where to find its development version (and contribute to that), or how to cite this tool.
Additionally, one fundamental button is present in the navbar - labeled "Bookmark", with the heart icon.
This serves the purpose of bookmarking genes and gene sets of interest while exploring your data in an interactive session.
This functionality will be exploited in the Bookmarks section of the app, where a complete HTML report can be generated as a reproducible artifact.
## Sidebar
On the left side of the app, by clicking on the menu bar icon (or if the app is viewed in full screen, by simply moving the mouse over to the left side), the sidebar menu is triggered.
This constitutes the main way to access the different tabs of `GeneTonic()`, which will be explained in more detail in the next section (\@ref(functionality)).
## Controlbar
The controlbar is triggered by clicking on the cogs icon on the upper right part of the navbar.
There you can find widgets that control the appearance of output (plots, tables, graphs) in multiple tabs of the main application, such as for example the number of gene sets to focus your attention on.
## Body
The main body of the `GeneTonic` application is structured in tabs that are activated by clicking on the respective icons or text in the sidebar.
While the Welcome tab might be self-explanatory, the functionality of each tab can go in depth, and new users can benefit of the question circle button () button to trigger an interactive tour of `GeneTonic`, which allows users to learn the basic usage mechanisms by doing.
During a tour, the highlighted elements respond to the user's actions, while the rest of the UI is kept shaded.
Tours can be interrupted anytime by clicking outside of the focused window, and arrows (left, right) can be used as well to navigate between steps.
The tour functionality is provided by the beautiful `r BiocStyle::CRANpkg("rintrojs")` package.
# The `GeneTonic` functionality {#functionality}
The main `GeneTonic` app is structured in different panels, and each of these is presented in detail in the subsequent sections.
## The Welcome panel
This panel in intended to give an overview on the different ingredients you just provided for launching `GeneTonic`.
Collapsible elements, colored each by the typology of data it refers to, contain the underlying tabular information, and a value box provides a quick summary on the dimensions of the input parameters.
The tables are rendered with `DT::datatable`, which provides a practical interface for exploring, sorting, and filtering.
This framework is also extendable, for example as we do for the `log2FoldChange` column in the `res_de` object, where the values are also visually encoded as diverging bars of different color based on the expression change direction.
In the Welcome panel, the GeneSpector box will also enable users to explore a gene expression plot for any of the genes in the `dds` object, irrespective of them being members or not of any geneset.
To do so, select a gene from the dropdown menu (autocompletion is supported), and a covariate from the "Group/color by" selector in the control menu on the right (opened by clicking on the cogs icon).
If launched without providing input data (as individual components, or as `GeneTonicList` object), `GeneTonic` offers the possibility to upload the `GeneTonicList` objects at runtime.
The upload functionality is showing in this panel, and enables `GeneTonic` being used also as a server-like application for internal deployment.
## The Gene-Geneset panel
This panel has as its main component an interactive bipartite graph illustrating the relationship between genesets and genes that belong to them.
One of the relevant parameters for this output is the number of gene sets to be displayed, adjustable from the widget in the control panel on the right.
Such a representation is particularly useful to see overlap between terms, and grasp the way how different genes may play a concerting role in different processes.
The color and shape of the nodes depend on the entity they are related to:
- genes are shown as ellipses, colored with a palette diverging from a neutral color to red (log2FC > 0) or to blue (log2FC < 0)
- gene sets are enclosed in gold boxes
The interactivity in this graph is essential for zooming, panning, and selecting its nodes.
The node selection, in particular, triggers a drill-down exploration, which populates a box on the right side, either with a signature heatmap (for a gene set), or with an expression plot, split by the experimental covariates of interest (for a gene) - this behavior can be controlled by the "Group/color by" selectize widget in the control sidebar.
In both cases, some HTML content is also generated, with buttons linking to external databases where to retrieve additional information (e.g. ENSEMBL, NCBI, AmiGO, ...).
A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.
**Bookmarking from the Gene-Geneset panel:** When clicking on the Bookmark button while in this tab, it is possible to add to the shortlisted items of interest both genes and gene sets.
To do so, while one of the nodes is selected, click on the button with the mouse - or alternatively, click on the Left Ctrl button of your keyboard.
A notification will be triggered, informing the status of the bookmarked elements.
## The Enrichment Map panel
This panel, similar to the Gene-Geneset tab, has again an interactive graph of a subset of the gene sets (you can control again the number of included sets from the control sidebar).
This is also called an enrichment map - it is also possible to generate them in tools such as Cytoscape.
In this case, the genesets are directly connected by an edge, where the thickness encodes the degree of similarity (e.g. in terms of overlap) between them; sets below a similarity threshold are not displayed for the sake of better clarity.
The size of the node encodes the information of the number of genes detected as Differentially Expressed (from the column `gs_de_count`), while the color is representative of the computed Z score for each set.
This value is computed as $z = \frac{(upgenes - downgenes)}{\sqrt{(upgenes + downgenes)}}$ and is indicative of the expression changes of the genes assigned to a particular set.
Please, keep in mind that methods based on overrepresentation analysis only do not account for the topology that might be behind the set of interacting features (i.e. an upregulated gene involved in inhibition of a pathway would still be counted as a positive element in the equation above).
The `z_score` and `aggr_score` (in the simplest form, a mean of the log2FoldChange values for all the genes of a set) can be easily computed with the `get_aggrscores()`, which appends these scores for each gene set to the original `res_enrich` data frame.
The interactive flavor in this tab permits a quick exploration of the signature heatmaps for the genesets of interest by simply clicking on a node; this also triggers the display of some HTML content, if related to a Gene Ontology term.
A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.
In the section below the interactive enrichment map, the functionality for the _distillation_ of gene sets into meta-genesets is provided.
In brief, a community detection algorithm is run on the graph underlying the enrichment map, and the additional information extracted is rendered as a table, linked to a meta-geneset box to display the expression signature for all its member genes.
**Bookmarking from the Enrichment Map panel:** If the user clicks on the Bookmark button while in this tab, it will add the selected node to the shortlist of genesets, visible later in the Bookmarks panel.
The Left Ctrl button of your keyboard can again be used for this purpose, and this will trigger a notification in the lower right part of the screen.
## The Overview panel
This panel presents to the user a series of summary plots, based on the outputs of `enhance_table()` and `gs_volcano()`, aiming to provide a mean to summarize the most relevant genesets in the `res_enrich` object.
The number of displayed (or labeled, according to each function) genesets can be as usual controlled by the widget in the right sidebar.
We avoid (re)rendering all plots at once by placing them as content of different tabs in a `bs4TabCard` container - this allows the user to navigate in a more agile way to the desired output elements.
Interactive versions of the plots are returned via calls to `ggplotly`.
A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.
## The GSViz panel
This panel, similarly to the Overview panel, presents a variety of plots visualizing the gene sets - and their mutual relationships, or their behavior across samples.
This includes, for example:
- a genesets by sample heatmap (via `gs_scores()` and `gs_scoresheat()`), with a score for each matrix element corresponding to the sum of the standardized expression values of its components
- a genesets-gene Sankey diagram (via `gs_alluvial()`, aliased to `gs_sankey()`), to represent the N:N membership relationships
- a summary heatmap (via `gs_summary_heat()`) to display the memberships and at the same time the expression change of each gene set member
- a multi-dimensional scaling (MDS) plot, based on `gs_mds()`, to provide a 2d visualization of the distance among genesets (based on a similarity measure e.g. their overlap or other criteria, such as their semantic similarity)
- a compact summary of `res_enrich`, via `gs_summary_overview()`, where aggregated scores and significance can be displayed on the same plot - for this, there are functions to compare different `res_enrich` objects (`gs_summary_overview()`, `gs_summary_overview_pair()`)
- alternative visual representation of the `res_enrich` objects, based on `gs_radar()` (equivalent to `gs_spider()`) and `gs_dendro()`
As in many other panels, the control over the number of genesets displayed is tweaked with the widget in the right sidebar, "Number of genesets".
A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.
## The Bookmarks panel
In this panel, the user can obtain a complete overview on the bookmarked features of interest, both genes (left side) and genesets (right side).
During a live exploration session, it might be tricky to recall the exact aspect of each point of interest.
Below the interactive `DT::datatable` objects, the content of each table can also be separately downloaded by clicking on the respective button.
Moreover, there is a button (with the cocktail icon ) that triggers the generation of a content-rich report which will focus on all the shortlisted elements.
This leverages a template report delivered as a part of `r BiocStyle::Biocpkg("GeneTonic")`, which accesses the input elements and the reactive values for the bookmarks, based on the `happy_hour()` function.
If a user already knows a set of features and geneset of interest, it is also possible to practically call `happy_hour()` directly from the command line (see the example in the chunk below).
Please keep in mind that the identifiers to be used should match the values used as row names of the `dds` object and as `gs_id` column of the original `res_enrich` input.
```{r starthappyhour, eval = FALSE}
happy_hour(dds = dds_macrophage,
res_de = res_de,
res_enrich = res_enrich,
annotation_obj = anno_df,
project_id = "examplerun",
mygenesets = res_enrich$gs_id[c(1:5,11,31)],
mygenes = c("ENSG00000125347",
"ENSG00000172399",
"ENSG00000137496")
)
```
This report will serve as a useful means to generate a permanent and reproducible analysis output, that can be further stored or shared.
Ideally, each live session with `GeneTonic` could terminate with a call to `happy_hour()` - admittedly, an aperitivo should start with that, and not end - please accept our apologies :) .
A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.
In this panel, an export button to a `SummarizedExperiment` object to be provided as input to the `r BiocStyle::Biocpkg("iSEE")` package has been added in version >= 1.2.0.
This enables further and fine-grained visualization options in the `iSEE` framework - if the flavors of `GeneTonic` are not yet what you would want, you might find an excellent venue there.
# Using the `GeneTonic` functions {#functions}
The `r BiocStyle::Biocpkg("GeneTonic")` package provides, apart from the main app where to perform the analysis in an interactive way, a number of functions which can be used also as standalone components for a workflow.
This sections contains some showcase examples on how to use them.
## Overview functions: genes and gene sets
Summarize the top genesets by displaying the logFC of each set's components - and transform the `ggplot` output into a `plotly` visualization, where tooltips can deliver additional information:
```{r enhancetable}
p <- enhance_table(res_enrich_macrophage,
res_macrophage_IFNg_vs_naive,
n_gs = 30,
annotation_obj = anno_df,
chars_limit = 60)
p
library("plotly")
ggplotly(p)
```
Display the relationship between genesets and genes that belong to them (useful e.g. to see overlap between terms):
```{r alluvial}
gs_alluvial(res_enrich = res_enrich_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
annotation_obj = anno_df,
n_gs = 4)
```
Display a bipartite graph where genesets and genes are included, and make it interactive:
```{r ggs}
ggs <- ggs_graph(res_enrich_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
anno_df,
n_gs = 20)
ggs
# could be viewed interactively with
library(visNetwork)
library(magrittr)
ggs %>%
visIgraph() %>%
visOptions(highlightNearest = list(enabled = TRUE,
degree = 1,
hover = TRUE),
nodesIdSelection = TRUE)
```
## Summary representations
Plot an enrichment map of the enrichment results, where you can choose with `n_gs` the number of top genesets to be displayed, and specify in `gs_ids` the gene set identifiers - these will be merged together for each function's context:
```{r summaryrep}
em <- enrichment_map(res_enrich_macrophage,
res_macrophage_IFNg_vs_naive,
n_gs = 30,
color_by = "z_score",
anno_df)
library("igraph")
library("visNetwork")
library("magrittr")
em %>%
visIgraph() %>%
visOptions(highlightNearest = list(enabled = TRUE,
degree = 1,
hover = TRUE),
nodesIdSelection = TRUE)
```
Expanding on the enrichment map, the distilled content of the `res_enrich` can be represented as enhanced tables and interactive graphs, depicting the membership of the respective meta-genesets
```{r}
distilled <- distill_enrichment(res_enrich_macrophage,
res_macrophage_IFNg_vs_naive,
anno_df,
n_gs = 60,
cluster_fun = "cluster_markov")
DT::datatable(distilled$distilled_table[,1:4])
dim(distilled$distilled_table)
DT::datatable(distilled$res_enrich[,])
dg <- distilled$distilled_em
library("igraph")
library("visNetwork")
library("magrittr")
# defining a color palette for nicer display
colpal <- colorspace::rainbow_hcl(length(unique(V(dg)$color)))[V(dg)$color]
V(dg)$color.background <- scales::alpha(colpal, alpha = 0.8)
V(dg)$color.highlight <- scales::alpha(colpal, alpha = 1)
V(dg)$color.hover <- scales::alpha(colpal, alpha = 0.5)
V(dg)$color.border <- "black"
visNetwork::visIgraph(dg) %>%
visOptions(highlightNearest = list(enabled = TRUE,
degree = 1,
hover = TRUE),
nodesIdSelection = TRUE,
selectedBy = "membership")
```
Another possibility to summarize the result of enrichment is by applying fuzzy clustering on it to detect groups of related genesets, with the peculiarity that a geneset can belong to different clusters - this reflects the original implementation of DAVID [@Huang2009].
The `gs_fuzzyclustering()` function returns a table with additional columns, related to the cluster id and the status for the geneset in that cluster ("Representative" or "Member", where the Representative geneset is the one with lowest p-value in that cluster).
```{r}
res_enrich_subset <- res_enrich_macrophage[1:100, ]
fuzzy_subset <- gs_fuzzyclustering(
res_enrich = res_enrich_subset,
n_gs = nrow(res_enrich_subset),
gs_ids = NULL,
similarity_matrix = NULL,
similarity_threshold = 0.35,
fuzzy_seeding_initial_neighbors = 3,
fuzzy_multilinkage_rule = 0.5)
# show all genesets members of the first cluster
fuzzy_subset[fuzzy_subset$gs_fuzzycluster == "1", ]
# list only the representative clusters
DT::datatable(
fuzzy_subset[fuzzy_subset$gs_cluster_status == "Representative", ]
)
```
Display a volcano plot for genesets, plotting significance (y-axis) versus Z score (x-axis), with color and size encoding some geneset features:
```{r volcano}
gs_volcano(res_enrich_macrophage,
p_threshold = 0.05,
color_by = "aggr_score",
volcano_labels = 10,
gs_ids = NULL,
plot_title = "my volcano")
res_enrich_simplified <- gs_simplify(res_enrich_macrophage,
gs_overlap = 0.7)
dim(res_enrich_macrophage)
dim(res_enrich_simplified)
gs_volcano(res_enrich_simplified,
color_by = "aggr_score")
```
Plot a dendrogram of the enrichment results (or a subset thereof), using node color, node size, and branch color to encode relevant geneset features (and their groups):
```{r dendro}
gs_dendro(res_enrich_macrophage,
n_gs = 50,
gs_dist_type = "kappa",
clust_method = "ward.D2",
color_leaves_by = "z_score",
size_leaves_by = "gs_pvalue",
color_branches_by = "clusters",
create_plot = TRUE)
```
Produce a MultiDimensional Scaling plot of the genesets in the enrichment results:
```{r mds}
gs_mds(res_enrich_macrophage,
res_macrophage_IFNg_vs_naive,
anno_df,
n_gs = 200,
gs_ids = NULL,
similarity_measure = "kappa_matrix",
mds_k = 2,
mds_labels = 5,
mds_colorby = "z_score",
gs_labels = NULL,
plot_title = NULL)
```
Obtain a simple overview of enrichment results, showing e.g. significance and direction of change (`z_score`)
```{r overview}
gs_summary_overview(res_enrich_macrophage,
n_gs = 30,
p_value_column = "gs_pvalue",
color_by = "z_score")
```
Plot a summary heatmap of genes vs genesets, encoding the logFC of genes and representing the overlap among genesets:
```{r sumheat}
gs_summary_heat(res_enrich_macrophage,
res_macrophage_IFNg_vs_naive,
anno_df,
n_gs = 15)
```
Recalling that you can also use a `gtl` list object as input, a similar call would be:
```{r}
gs_summary_heat(gtl = gtl,
n_gs = 10)
```
Plot a heatmap on the geneset scores, computed sample-wise via `gs_scores` so that the geneset members compose a Z score that gets summarized:
```{r scoresheat}
vst_macrophage <- vst(dds_macrophage)
scores_mat <- gs_scores(
se = vst_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
res_enrich = res_enrich_macrophage,
annotation_obj = anno_df
)
gs_scoresheat(scores_mat,
n_gs = 30)
```
## Reporting
Enjoy a fully fledged `happy_hour` by running offline the analysis contained in the report, included in the `r BiocStyle::Biocpkg("GeneTonic")` package.
```{r happyhour, eval=FALSE}
happy_hour(dds = dds_macrophage,
res_de = res_de,
res_enrich = res_enrich,
annotation_obj = anno_df,
project_id = "examplerun",
mygenesets = res_enrich$gs_id[c(1:5,11,31)],
mygenes = c("ENSG00000125347",
"ENSG00000172399",
"ENSG00000137496")
)
```
Again, if used with the `gtl` object, the call would look like this:
```{r happyhour2, eval=FALSE}
happy_hour(gtl = gtl,
project_id = "examplerun",
mygenesets = gtl$res_enrich$gs_id[c(1:5,11,31)],
mygenes = c("ENSG00000125347",
"ENSG00000172399",
"ENSG00000137496"),
open_after_creating = TRUE
)
```
The functionality is also the same activated in the Bookmarks section of the interactive app.
Here you can choose a number of genesets (`mygenesets`) and genes (`mygenes`), specifying their identifiers, and obtain in batch a number of summary representations focused on these elements.
Advanced users can also provide a custom RMarkdown file, and to guarantee it works in the context of `happy_hour`, we recommend to inspect the template available here:
```{r template}
template_rmd <- system.file("extdata",
"cocktail_recipe.Rmd",
package = "GeneTonic")
template_rmd
```
## Comparison between sets
While `GeneTonic` works currently with a single `res_enrich` object, there are some functions that allow a visual comparison between two different enrichment results (the functions are displaying a shuffled instance of the single `res_enrich`):
```{r comparepair}
# generating some shuffled gene sets
res_enrich2 <- res_enrich_macrophage[1:50, ]
set.seed(42)
shuffled_ones <- sample(seq_len(50)) # to generate permuted p-values
res_enrich2$gs_pvalue <- res_enrich2$gs_pvalue[shuffled_ones]
res_enrich2$z_score <- res_enrich2$z_score[shuffled_ones]
res_enrich2$aggr_score <- res_enrich2$aggr_score[shuffled_ones]
gs_summary_overview_pair(res_enrich = res_enrich_macrophage,
res_enrich2 = res_enrich2,
n_gs = 25)
```
```{r compare4}
res_enrich2 <- res_enrich_macrophage[1:42, ]
res_enrich3 <- res_enrich_macrophage[1:42, ]
res_enrich4 <- res_enrich_macrophage[1:42, ]
set.seed(2*42)
shuffled_ones_2 <- sample(seq_len(42)) # to generate permuted p-values
res_enrich2$gs_pvalue <- res_enrich2$gs_pvalue[shuffled_ones_2]
res_enrich2$z_score <- res_enrich2$z_score[shuffled_ones_2]
res_enrich2$aggr_score <- res_enrich2$aggr_score[shuffled_ones_2]
set.seed(3*42)
shuffled_ones_3 <- sample(seq_len(42)) # to generate permuted p-values
res_enrich3$gs_pvalue <- res_enrich3$gs_pvalue[shuffled_ones_3]
res_enrich3$z_score <- res_enrich3$z_score[shuffled_ones_3]
res_enrich3$aggr_score <- res_enrich3$aggr_score[shuffled_ones_3]
set.seed(4*42)
shuffled_ones_4 <- sample(seq_len(42)) # to generate permuted p-values
res_enrich4$gs_pvalue <- res_enrich4$gs_pvalue[shuffled_ones_4]
res_enrich4$z_score <- res_enrich4$z_score[shuffled_ones_4]
res_enrich4$aggr_score <- res_enrich4$aggr_score[shuffled_ones_4]
compa_list <- list(
scenario2 = res_enrich2,
scenario3 = res_enrich3,
scenario4 = res_enrich4
)
gs_horizon(res_enrich_macrophage,
compared_res_enrich_list = compa_list,
n_gs = 20,
sort_by = "clustered")
```
```{r compareradar}
# with only one set
gs_radar(res_enrich = res_enrich_macrophage)
# with a dataset to compare against
gs_radar(res_enrich = res_enrich_macrophage,
res_enrich2 = res_enrich2)
```
In particular, the `gs_horizon` function tries to group the different genesets based on their relative enrichment across conditions, similar to a representation of [@Dutertre2019] (e.g. Fig. 5G).
Thanks to Charlotte Soneson for providing a compact working implementation of this!
## Miscellaneous functions
Extract results and sort them (filtering by FDR) with `deseqresult2df`, select one top-ranking gene, extract its expression values and plot them, and display some buttons to link to external databases:
```{r misc}
head(deseqresult2df(res_macrophage_IFNg_vs_naive))
# to make sure normalized values are available...
dds_macrophage <- estimateSizeFactors(dds_macrophage)
gene_plot(dds_macrophage,
gene = "ENSG00000125347",
intgroup = "condition",
annotation_obj = anno_df,
plot_type = "auto")
gene_plot(dds_macrophage,
gene = "ENSG00000174944",
intgroup = "condition",
assay = "abundance",
annotation_obj = anno_df,
plot_type = "auto")
geneinfo_2_html("IRF1")
```
Plot a signature heatmap for a gene set, where a matrix of genes times samples for the set members is extracted, plus generate some HTML code to summarize the information of a Gene Ontology term:
```{r gsheatmap}
gs_heatmap(se = vst_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
res_enrich = res_enrich_macrophage,
annotation_obj = anno_df,
geneset_id = "GO:0060337" ,
cluster_columns = TRUE,
anno_col_info = "condition"
)
go_2_html("GO:0060337",
res_enrich = res_enrich_macrophage)
```
Plot a signature volcano plot for a gene set, with the genes of the geneset highlighted in color and the remaining genes shown shaded in the background:
```{r signaturevolcano}
signature_volcano(res_de = res_macrophage_IFNg_vs_naive,
res_enrich = res_enrich_macrophage,
annotation_obj = anno_df,
geneset_id = "GO:0060337",
FDR = 0.05,
color = "#1a81c2"
)
```
Some functions are just to ensure that the input objects are conform with the format expected by `GeneTonic`:
```{r shakers, eval=FALSE}
res_enrich <- shake_enrichResult(enrichment_results_from_clusterProfiler)
res_enrich <- shake_topGOtableResult(enrichment_results_from_topGOtable)
```
As an internal check, one can see if all required arguments are fine for running `GeneTonic` by running `checkup_GeneTonic`:
```{r checkup}
checkup_GeneTonic(dds = dds_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
res_enrich = res_enrich_macrophage,
annotation_obj = anno_df)
# if all is fine, it should return an invisible NULL and a simple message
```
# Additional Information {#additionalinfo}
Bug reports can be posted on the Bioconductor support site (https://support.bioconductor.org/) or raised as issues in the `GeneTonic` GitHub repository (https://github.com/federicomarini/GeneTonic/issues).
The GitHub repository also contains the development version of the package, where new functionality is added over time - careful, you might be running bleeding edge versions!
The authors appreciate well-considered suggestions for improvements or new features, or even better, pull requests.
If you use `GeneTonic` for your analysis, please cite it as shown below:
```{r cite}
citation("GeneTonic")
```
# FAQs {#faqs}
**Q: Why is this package called `GeneTonic`?**
A: For obvious reasons :) Like in a cocktail, ingredients (expression matrix, results from DE, and results from functional enrichment analysis) do taste better together.
Plus, I like puns with words.
And cocktails.
Sometimes simultaneously.
**Q: My configuration on two machines is somewhat different, so I am having difficulty in finding out what packages are different. Is there something to help on this?**
A: Yes, you can check out `r BiocStyle::Githubpkg("federicomarini/sessionDiffo")`, a small utility to compare the outputs of two different `sessionInfo` outputs.
This can help you pinpoint what packages might be causing the issue.
**Q: I am using a different service/software for generating the results of functional enrichment analysis. How do I plug this into `GeneTonic`?**
A: You can set up a small conversion function, on the line of `shake_topGOtableResult` and `shake_enrichResult` (*use the source, Luke* to see how you can build your own), or request such a function - make sure to specify the formatting specifications of your tool of choice (e.g. DAVID, g:Profiler, or commercial solutions such as Ingenuity Pathway Analysis).
As of Bioconductor release 3.12 (version >= 1.2.0 for `r BiocStyle::Biocpkg("GeneTonic")`), shaker functions are available for these tools:
- `DAVID` (https://david.ncifcrf.gov/), with the result exported in text format
- `topGO` (available in the `r BiocStyle::Biocpkg("topGO")` package), as in the wrapper function provided by `r BiocStyle::Biocpkg("pcaExplorer")`, `pcaExplorer::topGOtable()`
- `Enrichr` (https://amp.pharm.mssm.edu/Enrichr/), either from text format or from a call via the `r BiocStyle::CRANpkg("enrichr")` package
- `fgsea`, taking the output of the workflow from the `r BiocStyle::Biocpkg("fgsea")` package
- `g:Profiler` (https://biit.cs.ut.ee/gprofiler/), either from text file format or with the output of `gost()` in `r BiocStyle::CRANpkg("gprofiler2")`
- `clusterProfiler`, providing `enrichResult` objects, commonly returned by `enrichGO()` from `r BiocStyle::Biocpkg("clusterProfiler")` or `enrichPathway()` from `r BiocStyle::Biocpkg("ReactomePA")`
**Q: I'd like to try `GeneTonic` but I could not install it/I just want a quick peek into it. What can I do for this?**
A: We set up an instance of `GeneTonic` running on the `macrophage` dataset at this address: http://shiny.imbei.uni-mainz.de:3838/GeneTonic.
# Session Info {- .smaller}
```{r sessioninfo}
sessionInfo()
```
# References {-}