---
title: 'MetaVolcanoR: Differential expression meta-analysis tool'
date: June 5, 2019
output:
  html_document:
    toc: yes
  pdf_document:
    toc: yes
  prettydoc::html_pretty:
    highlight: github
    theme: lumen
    toc: yes
vignette: >
  %\VignetteIndexEntry{MetaVolcanoR: Differential expression meta-analysis tool}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
                      warning = FALSE,
                      message = FALSE,
                      cache = TRUE)
```

# Introduction

Measuring how gene expression changes under different biological conditions is necessary to reveal the gene regulatory programs that determine the cellular phenotype. Comparing the expression of genes under a given condition against a reference biological state is the usual way to identify sets of differentially expressed genes (DEGs). These DEGs point to the genomic regions that are functionally relevant under the biological condition of interest.

Although individual genome-wide expression studies have a low signal-to-noise ratio, the amount of genomic data available today often makes it possible to combine differential gene expression results from independent studies and overcome this limitation. Databases such as GEO (https://www.ncbi.nlm.nih.gov/geo/), SRA (https://www.ncbi.nlm.nih.gov/sra), ArrayExpress (https://www.ebi.ac.uk/arrayexpress/), and ENA (https://www.ebi.ac.uk/ena) offer systematic access to vast amounts of transcriptome data. For many biological conditions, more than one gene expression study exists. This redundancy can be exploited by meta-analysis approaches to reveal genes that are consistently and differentially expressed under given conditions.

MetaVolcanoR was designed to identify the genes whose expression is consistently perturbed across several studies.

# Usage

## Overview

The MetaVolcanoR R package combines differential gene expression results. It implements three strategies to summarize gene expression activities from different studies:
i) a Random Effects Model (REM) approach, ii) a vote-counting approach, and iii) a p-value combining approach.

MetaVolcano uses volcano plot reasoning to visualize the gene expression meta-analysis results.

## Installation

```{r, eval=FALSE}
BiocManager::install("MetaVolcanoR")
```

## Load library

```{r}
library(MetaVolcanoR)
```

## Input Data

Users should provide a named list of `data.table`/`data.frame` objects containing differential gene expression results. Each object in the list must contain a *gene name*, a *fold change*, and a *p-value* variable. It is highly recommended to also include the *variance* or the confidence intervals of the *fold change* variable.

Take a look at the demo data. It includes differential gene expression results from five studies.

```{r}
data(diffexplist)
class(diffexplist)
head(diffexplist[[1]])
length(diffexplist)
```

# Implemented meta-analysis approaches

## Random Effects Model MetaVolcano

The *REM* MetaVolcano summarizes the fold change of each gene across several studies, taking the variance into account. The *REM* estimates a *summary p-value*, which tests the null hypothesis that the *summary fold-change* is not different from zero. Users can set the *metathr* parameter to highlight the top percentage of the most consistently perturbed genes. This perturbation ranking is defined following the *topconfects* approach.
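
To illustrate how a random-effects summary of this kind can be computed, here is a minimal base R sketch for a single gene. It rests on several assumptions that are not taken from the package: the fold changes and confidence limits are made up, per-study variances are derived from 95% confidence intervals under a normal approximation, and the between-study variance is estimated with the DerSimonian-Laird method. The estimator used internally by `rem_mv` may differ.

```r
# Hypothetical log2 fold changes and 95% confidence limits for one gene
# measured in five studies (illustrative values, not from diffexplist)
fc   <- c(2.1,  0.4, 1.6,  0.2, 1.3)
ci_l <- c(1.6, -0.2, 0.9, -0.3, 0.5)
ci_r <- c(2.6,  1.0, 2.3,  0.7, 2.1)

# Assumption: under normality, a 95% CI spans 2 * 1.96 standard errors
vi <- ((ci_r - ci_l) / (2 * qnorm(0.975)))^2

# Fixed-effect (inverse-variance) weights and Cochran's Q heterogeneity statistic
w        <- 1 / vi
mu_fixed <- sum(w * fc) / sum(w)
q_stat   <- sum(w * (fc - mu_fixed)^2)
k        <- length(fc)

# DerSimonian-Laird estimate of the between-study variance, truncated at zero
tau2 <- max(0, (q_stat - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))

# Random-effects weights, summary fold change, and its standard error
w_re       <- 1 / (vi + tau2)
summary_fc <- sum(w_re * fc) / sum(w_re)
summary_se <- sqrt(1 / sum(w_re))

# Two-sided p-value for the null hypothesis that the summary fold change is zero
summary_p <- 2 * pnorm(-abs(summary_fc / summary_se))

c(summary_fc = summary_fc, tau2 = tau2, summary_p = summary_p)
```

Because the random-effects weights add the between-study variance to every study's own variance, studies that disagree strongly widen the summary standard error instead of being averaged away.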
```{r}
meta_degs_rem <- rem_mv(diffexp=diffexplist,
                        pcriteria="pvalue",
                        foldchangecol='Log2FC',
                        genenamecol='Symbol',
                        geneidcol=NULL,
                        collaps=FALSE,
                        llcol='CI.L',
                        rlcol='CI.R',
                        vcol=NULL,
                        cvar=TRUE,
                        metathr=0.01,
                        jobname="MetaVolcano",
                        outputfolder=".",
                        draw='HTML',
                        ncores=1)

head(meta_degs_rem@metaresult, 3)
meta_degs_rem@MetaVolcano

draw_forest(remres=meta_degs_rem,
            gene="MMP9",
            genecol="Symbol",
            foldchangecol="Log2FC",
            llcol="CI.L",
            rlcol="CI.R",
            jobname="MetaVolcano",
            outputfolder=".",
            draw="HTML")

draw_forest(remres=meta_degs_rem,
            gene="COL6A6",
            genecol="Symbol",
            foldchangecol="Log2FC",
            llcol="CI.L",
            rlcol="CI.R",
            jobname="MetaVolcano",
            outputfolder=".",
            draw="HTML")
```

The *REM* MetaVolcano also allows users to explore the forest plot of a given gene based on the REM results, as done above with `draw_forest` for MMP9 and COL6A6.

## Vote-counting approach

The *vote-counting* MetaVolcano identifies differentially expressed genes (DEGs) in each study based on the user-defined *p-value* and *fold change* thresholds. It displays the number of differentially expressed and unperturbed genes per study. In addition, it plots the inverse cumulative distribution of the consistent DEGs, so the user can identify the number of genes whose expression is perturbed in at least one, or in at least *n*, studies.

```{r}
meta_degs_vote <- votecount_mv(diffexp=diffexplist,
                               pcriteria='pvalue',
                               foldchangecol='Log2FC',
                               genenamecol='Symbol',
                               geneidcol=NULL,
                               pvalue=0.05,
                               foldchange=0,
                               metathr=0.01,
                               collaps=FALSE,
                               jobname="MetaVolcano",
                               outputfolder=".",
                               draw='HTML')

head(meta_degs_vote@metaresult, 3)
meta_degs_vote@degfreq
```

The *vote-counting* MetaVolcano visualizes genes based on the number of studies in which they were identified as differentially expressed and on the sign consistency of their fold change. For example, a gene that was differentially expressed in five studies, upregulated in two and downregulated in three, gets a sign consistency score of 2 + (-3) = -1.
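
The sign consistency arithmetic can be sketched in a few lines of base R. The values below are hypothetical, and the strict comparisons against the thresholds are an assumption made for illustration; `votecount_mv` may apply its thresholds differently.

```r
# Hypothetical results for one gene across five studies
log2fc <- c(0.9, 1.4, -1.1, -0.7, -1.3)
pval   <- c(0.01, 0.03, 0.002, 0.04, 0.02)

# Thresholds mirroring pvalue=0.05 and foldchange=0 from the call above
p_thr  <- 0.05
fc_thr <- 0

# One vote per study: +1 upregulated, -1 downregulated, 0 unperturbed
vote <- ifelse(pval < p_thr & abs(log2fc) > fc_thr, sign(log2fc), 0)

# Number of studies in which the gene is differentially expressed
n_deg <- sum(vote != 0)

# Sign consistency: upregulated votes minus downregulated votes
sign_consistency <- sum(vote)

c(n_deg = n_deg, sign_consistency = sign_consistency)
```

With two upregulated and three downregulated votes, this reproduces the score of -1 for a gene that is differentially expressed in all five studies.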
Based on user preference, MetaVolcano can highlight the top *metathr* percentage of consistently perturbed genes.

```{r}
meta_degs_vote@MetaVolcano
```

## Combining approach

The *combining* MetaVolcano summarizes the fold change of a gene across studies by the *mean* or the *median*, depending on the user preference. In addition, the *combining* MetaVolcano summarizes the differential expression *p-values* of each gene using the Fisher method. The *combining* MetaVolcano can highlight the top *metathr* percentage of consistently perturbed genes.

```{r}
meta_degs_comb <- combining_mv(diffexp=diffexplist,
                               pcriteria='pvalue',
                               foldchangecol='Log2FC',
                               genenamecol='Symbol',
                               geneidcol=NULL,
                               metafc='Mean',
                               metathr=0.01,
                               collaps=TRUE,
                               jobname="MetaVolcano",
                               outputfolder=".",
                               draw='HTML')

head(meta_degs_comb@metaresult, 3)
meta_degs_comb@MetaVolcano
```
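
For reference, the Fisher method mentioned in this section can be sketched in base R as follows. The helper name `fisher_combine` and the input values are hypothetical and only serve as an illustration; they are not part of the MetaVolcanoR API.

```r
# Hypothetical p-values and log2 fold changes for one gene across five studies
pvals  <- c(0.01, 0.20, 0.03, 0.04, 0.50)
log2fc <- c(1.2, 0.3, 0.9, 1.1, 0.2)

# Fisher's method: under the null hypothesis, -2 * sum(log(p)) follows a
# chi-squared distribution with 2 * k degrees of freedom (k = number of studies)
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

combined_p <- fisher_combine(pvals)

# Fold change summaries corresponding to the mean and median options
mean_fc   <- mean(log2fc)
median_fc <- median(log2fc)

c(combined_p = combined_p, mean_fc = mean_fc, median_fc = median_fc)
```

Fisher's method assumes that the combined p-values come from independent tests, so it is best suited to studies that do not share samples.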