Interactive Differential Expression Analysis Tool
Differential expression (DE) analysis has become an increasingly popular way to identify and view genes that are up- and/or down-regulated between two sets of samples. The goal of differential gene expression analysis is to find genes or transcripts whose difference in expression, when accounting for the variance within condition, is higher than expected by chance. DESeq2 is an R package available via Bioconductor, designed to normalize count data from high-throughput sequencing assays such as RNA-Seq and to test for differential expression (Love et al., 2014). With multiple parameters such as adjusted p-value (padj) cutoffs, log fold changes, and plot styles, altering plots created from your DE data can be tedious and time-consuming. The Differential Expression Browser (DEBrowser) uses DESeq2 (Love et al., 2014), edgeR (Robinson et al., 2010), and limma (Ritchie et al., 2015) coupled with Shiny (Chang et al., 2016) to update plots in real time as you change your queries, and allows interactive browsing of your DE results. In addition to DE analysis, DEBrowser offers a variety of other plots and analysis tools to help you visualize your data further.
DEBrowser utilizes Shiny, an R-based application development tool that provides an interactive user interface (UI) backed by the full computing power of R. After the user has selected the data to analyze and has used the Shiny UI to run DESeq2, the results are passed to DEBrowser. DEBrowser then prepares the results for interactive plotting, so that changing the padj or fold change limits also changes the displayed graph(s). For more details about these plots and tables, please visit our quick start guide for some helpful tutorials.
For comparisons against other popular data visualization tools, see the comparison table below (Figure 40).
Before you start:
First, you will have to install R and/or RStudio. (On Fedora/Red Hat/CentOS, the following packages must also be installed: openssl-devel, libxml2-devel, libcurl-devel, libpng-devel.) Running the commands below will launch DEBrowser on your local machine:
# Installation instructions:
# 1. Install DEBrowser and its dependencies by running the lines below
# in R or RStudio.
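# (biocLite is the legacy installer; on Bioconductor 3.8 or later use
# BiocManager::install("debrowser") instead.)
source("https://www.bioconductor.org/biocLite.R")
biocLite("debrowser")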
# 2. Load the library
library(debrowser)
# 3. Start DEBrowser
startDEBrowser()
Once DEBrowser is running, a page will load asking you to choose a TSV file or to load the demo data. In order to run DESeq2, you will need gene quantifications in tab-separated values (TSV) format. A gene quantification table can be obtained by running standard software such as HTSeq (Anders et al., 2014) or RSEM (Li and Dewey, 2011). The file must contain the gene, the transcript(s), and the raw count values for each sample you wish to load into DEBrowser.
Note that if your rows contain duplicate gene names, DEBrowser will reject your TSV file, so please keep gene names unique. A sample file looks like:
gene | transcript | exper_rep1 | exper_rep2 | control_rep1 | control_rep2 |
---|---|---|---|---|---|
DQ714826 | uc007tfl.1 | 0.00 | 0.00 | 0.00 | 0.00 |
DQ551521 | uc008bml.1 | 0.00 | 0.00 | 0.00 | 0.00 |
AK028549 | uc011wpi.1 | 2.00 | 1.29 | 0.00 | 0.00 |
You can also explore the demo data by clicking the ‘Load Demo!’ text. For the case study demo data, feel free to download our case study demo file at https://bioinfo.umassmed.edu/pub/debrowser/advanced_demo.tsv or a simplified version at https://bioinfo.umassmed.edu/pub/debrowser/simple_demo.tsv. Please also note that DEBrowser skips the second column and starts reading the quantification values from the third column.
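The input requirements above (unique gene names, an ignored second column, counts from the third column onward) can be checked before uploading. The following is a minimal base-R sketch, not DEBrowser's own loader; the helper name `check_counts_tsv` is hypothetical, and the demo table simply reuses the three sample rows shown above.

```r
# Hypothetical helper (not part of DEBrowser): sanity-check a count table
# before uploading it.
check_counts_tsv <- function(path) {
  tab <- read.delim(path, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE, check.names = FALSE)
  if (ncol(tab) < 3) {
    stop("Expected a gene column, a transcript column, and at least one sample column")
  }
  # Column 1 holds the gene names, which must be unique.
  dup <- unique(tab[[1]][duplicated(tab[[1]])])
  if (length(dup) > 0) {
    stop("Duplicate gene names: ", paste(dup, collapse = ", "))
  }
  # Column 2 (transcripts) is skipped; counts start at column 3.
  counts <- as.matrix(tab[, -(1:2), drop = FALSE])
  if (!is.numeric(counts)) {
    stop("Sample columns must be numeric")
  }
  rownames(counts) <- tab[[1]]
  counts
}

# Usage with the sample rows shown above, written to a temporary file.
demo <- data.frame(
  gene         = c("DQ714826", "DQ551521", "AK028549"),
  transcript   = c("uc007tfl.1", "uc008bml.1", "uc011wpi.1"),
  exper_rep1   = c(0, 0, 2),
  exper_rep2   = c(0, 0, 1.29),
  control_rep1 = c(0, 0, 0),
  control_rep2 = c(0, 0, 0)
)
demo_path <- tempfile(fileext = ".tsv")
write.table(demo, demo_path, sep = "\t", quote = FALSE, row.names = FALSE)
counts <- check_counts_tsv(demo_path)
```

The function returns a numeric matrix with genes as row names and samples as columns, which is a convenient shape for quick checks in R before moving to the browser.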
DEBrowser also accepts JSON objects via hyperlink by following a few conversion steps.
First, using the API provided by Dolphin, we will convert a TSV file into a JSON object using this web API:
https://dolphin.umassmed.edu/public/api/
The two parameters it accepts (with example values) are:
1. source=https://bioinfo.umassmed.edu/pub/debrowser/advanced_demo.tsv
2. format=JSON
This leaves you with the hyperlink:
https://dolphin.umassmed.edu/public/api/?source=https://bioinfo.umassmed.edu/pub/debrowser/advanced_demo.tsv&format=JSON
Next you will need to encode the URL so you can pass it to the DEBrowser website. You can find multiple URL encoders online, such as the one located at this web address: https://www.url-encode-decode.com/.
Encoding our URL will turn it into this:
https%3A%2F%2Fdolphin.umassmed.edu%2Fpublic%2Fapi%2F%3Fsource%3Dhttps%3A%2F%2Fbioinfo.umassmed.edu%2Fpub%2Fdebrowser%2Fadvanced_demo.tsv%26format%3DJSON
Now this link can be used in DEBrowser at:
https://debrowser.umassmed.edu:444/debrowser/R/
It accepts two parameters:
1. jsonobject=https%3A%2F%2Fdolphin.umassmed.edu%2Fpublic%2Fapi%2F%3Fsource%3Dhttps%3A%2F%2Fbioinfo.umassmed.edu%2Fpub%2Fdebrowser%2Fadvanced_demo.tsv%26format%3DJSON
2. title=no
The finished link will look like this:
https://debrowser.umassmed.edu:444/debrowser/R/?jsonobject=https%3A%2F%2Fdolphin.umassmed.edu%2Fpublic%2Fapi%2F%3Fsource%3Dhttps%3A%2F%2Fbioinfo.umassmed.edu%2Fpub%2Fdebrowser%2Fadvanced_demo.tsv%26format%3DJSON&title=no
Entering this URL into your web browser will automatically load in your data as a JSON object, allowing you to start browsing your data right away.
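If you would rather not use an online encoder, the same link can be assembled in R. This is a minimal sketch using only `utils::URLencode` from base R; the variable names are illustrative, and the URLs are the ones given above.

```r
# Base-R sketch of the link construction described above.
tsv_url <- "https://bioinfo.umassmed.edu/pub/debrowser/advanced_demo.tsv"
api_url <- paste0("https://dolphin.umassmed.edu/public/api/",
                  "?source=", tsv_url, "&format=JSON")

# reserved = TRUE also escapes ':', '/', '?', '=' and '&', so the whole
# API URL can travel as a single query parameter value.
encoded <- utils::URLencode(api_url, reserved = TRUE)

debrowser_link <- paste0("https://debrowser.umassmed.edu:444/debrowser/R/",
                         "?jsonobject=", encoded, "&title=no")
cat(debrowser_link, "\n")
```

Only the value of `jsonobject` is encoded; the outer `?` and `&title=no` stay literal so that DEBrowser sees two separate parameters.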
In addition to the sample TSV file you provide, you can also correct for batch effects or any other normalizing conditions that might be present in your results. To apply these corrections, simply create a TSV file such as the one below:
sample | batch | condition |
---|---|---|
s1_b1_cA | 1 | A |
s2_b1_cA | 1 | A |
s3_b2_cB | 2 | B |
s4_b2_cB | 2 | B |
s5_b1_cB | 1 | B |
This metadata file is a custom TSV created by the user and is used to establish different batch effects for multiple conditions. You can have as many conditions as you require, as long as all of the samples are present. Once this file has been loaded along with your data TSV file, DEBrowser uses ComBat (part of the sva Bioconductor package) to adjust for possible batch effects or conditional biases. For more information about ComBat within the sva package, see: https://bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf.
To load the file that contains the batch metadata, use the “Choose Meta Data File (Optional)” field at the start of DEBrowser to select the batch metadata file for this analysis. Once the metadata has loaded, you can select from a drop-down box which condition column you want to use for the analysis.
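The batch metadata table shown above can be written from R and checked against your count table before uploading. This is a rough sketch rather than anything DEBrowser does internally; `missing_samples` is a hypothetical helper, and `count_samples` stands in for the sample column names of your own count file.

```r
# Sketch: build the batch metadata table shown above and check it.
meta <- data.frame(
  sample    = c("s1_b1_cA", "s2_b1_cA", "s3_b2_cB", "s4_b2_cB", "s5_b1_cB"),
  batch     = c(1, 1, 2, 2, 1),
  condition = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: samples in the count table with no metadata row.
missing_samples <- function(count_samples, meta) {
  setdiff(count_samples, meta$sample)
}

# Stand-in for the sample column names of your count table.
count_samples <- meta$sample
stopifnot(length(missing_samples(count_samples, meta)) == 0)

# Cross-tabulate batch against condition to see how samples are spread.
print(table(batch = meta$batch, condition = meta$condition))

# Write the tab-separated file to upload as the metadata file.
meta_path <- tempfile(fileext = ".tsv")
write.table(meta, meta_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

The cross-tabulation is optional, but it makes it easy to see at a glance how batches and conditions overlap in your design.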
After obtaining and loading the gene quantifications file and, if specified, the metadata file containing your batch correction fields, you have the option to view QC information for your quantifications or to continue on to running DESeq2 (Figure 1).
Figure 1. (A) The initial data selection menu. Initial TSV data is loaded under ‘Choose TSV File’, while the optional metadata file is loaded under ‘Choose Meta Data File (Optional)’. (B) Options list once data/metadata have been loaded.
If you prefer to select conditions beforehand and save them as a TSV file to upload, this option has been available since February 2017. You can split conditions into two groups in a TSV file, and include as many selection columns as you want for different groupings.
To load the file that contains the metadata, use the “Choose Meta Data File (Optional)” field at the start of DEBrowser to select the metadata file for this analysis. The metadata file must have a sample column as its first column; every following column must contain exactly 2 groups ([cond1, cond2], [1, 2], etc.) that are matched to the sample column. Sample TSV:
sample | select1 | selection2 |
---|---|---|
exper_rep1 | cond1 | 1 |
exper_rep2 | cond1 | 2 |
exper_rep3 | cond2 | 1 |
control_rep1 | cond2 | 2 |
control_rep2 | cond2 | 1 |
control_rep3 | cond2 | 2 |
The example above would result in ‘select1’ having the first set of conditions as {exper_rep1, exper_rep2} from ‘cond1’ and the second set as {exper_rep3, control_rep1, control_rep2, control_rep3} from ‘cond2’, according to the value each row of the ‘sample’ column has in the ‘select1’ column.
In the same way, ‘selection2’ would have the first set as {exper_rep1, exper_rep3, control_rep2} from ‘1’ and the second set as {exper_rep2, control_rep1, control_rep3} from ‘2’.
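The grouping described above can be reproduced in a few lines of base R, which is a handy way to confirm that a selection column splits your samples as intended. This is only an illustration of the mapping, not DEBrowser's code; `groups_for` is a hypothetical helper.

```r
# The sample metadata table from above.
meta <- data.frame(
  sample     = c("exper_rep1", "exper_rep2", "exper_rep3",
                 "control_rep1", "control_rep2", "control_rep3"),
  select1    = c("cond1", "cond1", "cond2", "cond2", "cond2", "cond2"),
  selection2 = c(1, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: samples grouped by the values of one selection column.
groups_for <- function(meta, column) {
  groups <- split(meta$sample, meta[[column]])
  if (length(groups) != 2) {
    stop("Column '", column, "' must define exactly 2 groups")
  }
  groups
}

str(groups_for(meta, "select1"))
str(groups_for(meta, "selection2"))
```

Each call returns a list of two character vectors, one per group, named after the values found in the chosen column.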
Upon selection of QC information, you will be shown an all-to-all plot of your samples (Figure 2). This sample-by-sample comparison will help you visualize possible discrepancies between replicate samples, in case you want to omit them from further analysis. This graph includes sample-to-sample dotplot correlations as well as sample histograms. To the left of this plot are various plot-shaping options you can alter to view the all-to-all plot more easily.
Additionally, two more QC plots are available: heatmap and PCA plots. The heatmap (Figure 3) displays genes for each sample within your dataset, based on your dataset selection, and the PCA plot (Figure 4) displays a principal component analysis of your dataset. You can also view the IQR (interquartile range) for both your raw data and your data after normalization (Figure 5), as well as a density plot of your sample data before and after normalization (Figure 6). IQR and density plots are another useful visualization tool to help you spot outliers within your sample data, in case you want to remove them or look into any possible discrepancies.
All of these plots will aid in viewing your preliminary data to see if there are any potential errors between replicates or batch effects (Reese et al., 2013; Risso et al., 2014). You can view an interactive heatmap by selecting the ‘Interactive’ checkbox in the left side panel when you have selected the Heatmap option. This interactive heatmap displays gene names as you hover over them for a more in-depth understanding.
You can select these various plot options by selecting the type of plot you wish to view on the left panel.
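For readers who want to see what these QC summaries compute, here is a rough base-R sketch on simulated counts. It is not DEBrowser's implementation: the data are random, the log2(count + 1) transform is an assumption of this sketch, and no normalization step is shown.

```r
set.seed(1)
# Simulated counts: 200 genes x 4 samples (illustrative data only).
counts <- matrix(rnbinom(200 * 4, mu = 50, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200),
                                 c("exper_rep1", "exper_rep2",
                                   "control_rep1", "control_rep2")))
logc <- log2(counts + 1)   # log scale is an assumption of this sketch

# All-to-all comparison: pairwise correlations between samples.
sample_cor <- cor(logc)

# PCA with samples as observations and genes as variables.
pca <- prcomp(t(logc), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Interquartile range per sample.
sample_iqr <- apply(logc, 2, IQR)

print(round(sample_cor, 2))
print(round(var_explained, 3))
print(round(sample_iqr, 2))
```

In base graphics, `pairs(logc)`, `boxplot(logc)`, and `plot(density(logc[, 1]))` give rough counterparts to the all-to-all, IQR, and density views.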
Figure 2. Display of the all-to-all plot in the initial QC plots page.
You can also view the genes within your quantification file in various ways. The ‘Tables’ tab will bring you to a table based on the dataset you have selected in the left options panel. The ‘All detected’ option lists all of the genes present within your file. The ‘Selected’ option lets you browse the genes chosen through your interactive heatmap selection. The last option, ‘Most Varied’ (Figure 7), displays your top N most varied genes. You can alter the value of N by selecting ‘most-varied’ from the dropdown menu on the left.
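As an illustration of what a "top N most varied" table contains, the sketch below ranks genes by their variance across samples. The choice of variance as the ranking measure is an assumption made for this example, and `most_varied` is a hypothetical helper; DEBrowser's own ranking may differ.

```r
# Hypothetical helper: top-N genes ranked by variance across samples.
# Variance is an assumption here; DEBrowser's own ranking may differ.
most_varied <- function(mat, n = 10) {
  v <- apply(mat, 1, var)
  top <- order(v, decreasing = TRUE)[seq_len(min(n, nrow(mat)))]
  mat[top, , drop = FALSE]
}

# Small illustrative matrix: genes in rows, samples in columns.
mat <- rbind(geneA = c(5, 5, 5, 5),
             geneB = c(0, 10, 0, 10),
             geneC = c(4, 5, 4, 5))
colnames(mat) <- c("exper_rep1", "exper_rep2", "control_rep1", "control_rep2")

most_varied(mat, n = 2)
```

Here geneB varies the most and geneA not at all, so asking for the top two returns geneB followed by geneC.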