--- title: "Supporting Material for Trefflich2019" author: "Sheyla Trefflich, Rodrigo J. S. Dalmolin, José M. Ortega, Mauro A. A. Castro." date: "`r BiocStyle::doc_date()`" bibliography: bibliography.bib output: BiocStyle::html_document: css: custom.css vignette: > %\VignetteIndexEntry{"Supporting Material for Trefflich2019."} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Context @Fletcher2013 reconstructed regulons for 809 transcription factors (TFs) using microarray transcriptomic data from the METABRIC breast cancer cohort [@Curtis2012]. **Our goal here is to assess the evolutionary root of the regulons reconstructed by @Fletcher2013** using the *geneplast* package. This script reproduces the main observations described in @Trefflich2019, which proposed a framework to explore the evolutionary roots of regulons. # Package installation and data sets Please make sure to install all required packages. Installing and then loading the *geneplast.data.string.v91* and *Fletcher2013b* data packages will make available all data required for this case study. ```{r eval=TRUE, message=FALSE, results='hide'} #-- Call packages library(geneplast) library(geneplast.data.string.v91) library(Fletcher2013b) library(ggplot2) library(ggpubr) ``` # Inferring evolutionary roots This analysis will determine the evolutionary root of a gene based on the distribution of its orthologs in a given species tree. We will need two data objects, `cogdata` and `phyloTree`, both loaded with the `gpdata_string_v91` call. The `cogdata` is a `data.frame` object listing orthologous groups (OGs) predicted for 121 eukaryotic species, while the `phyloTree` is a phylogenetic tree object of class `phylo`. The `groot.preprocess` function will check the input data and build an object of class `OGR`, which will be used in the subsequent steps of the analysis pipeline. ```{r eval=FALSE, message=FALSE, warning=FALSE} #-- Load orthology data from the 'geneplast.data.string.v91' package data(gpdata_string_v91) #-- Create an object of class 'OGR' for a reference 'spid' ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606") ``` The `groot` function addresses the problem of finding the evolutionary root of a feature in an phylogenetic tree. The method infers the probability that such feature was present in the Last Common Ancestor (LCA) of a given lineage. The `groot` function assesses the presence and absence of the orthologs in the extant species of the phylogenetic tree in order to build a probability distribution, which is used to identify vertical heritage patterns. The `spid=9606` parameter sets *Homo sapiens* as the reference species, which defines the ancestral lineage assessed in the query (*i.e. each ortholog of the reference species will be rooted in an ancestor of the reference species*). ```{r eval=FALSE, message=FALSE, warning=FALSE} #-- Run the 'groot' function and infer the evolutionary roots ogr <- groot(ogr, nPermutations=1000, verbose=TRUE) ``` # Mapping root-to-gene annotation In this section we will map the inferred evolutionary roots (available in the `ogr` object) to genes annotated in the regulons reconstructed by @Fletcher2013 (available in the `rtni1st` object). For a summary of the regulons in the `rtni1st` object we recommend using the `tni.regulon.summary` function, which shows that there are 809 regulatory elements (TFs) and 14131 targets. ```{r eval=FALSE, message=FALSE, warning=FALSE} #-- Load regulons data("rtni1st") tni.regulon.summary(rtni1st) ``` ```{r eval=FALSE, message=FALSE, warning=FALSE} ## This regulatory network comprised of 809 regulons. ## regulatoryElements Targets Edges ## 809 14131 47012 ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.0 10.0 37.0 58.1 80.0 523.0 ## regulatoryElements Targets Edges ## 809 14131 617672 ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 43 449 764 1245 4148 ## --- ``` We will transform the `rtni1st` into a `graph` object using the `tni.graph` function. The resulting `graph` will be assessed by the `ogr2igraph` function, which will map the root-to-gene annotation; the results will be available in the `roots_df` data frame for subsequent analysis. ```{r eval=FALSE, message=FALSE, warning=FALSE} #-- Put regulons into an 'igraph' object #-- Note: small regulons (n<15 targets) are romeved in this step. graph <- tni.graph(rtni1st, gtype = "rmap") #-- Map the 'ogr' object to the 'igraph' object graph <- ogr2igraph(ogr, cogdata, graph, idkey = "ENTREZ") #-- Make a data frame with the gene roots roots_df <- data.frame(SYMBOL = V(graph)$SYMBOL, ENTREZ = V(graph)$ENTREZ, Root = V(graph)$Root, TF_Targets = c("Targets","TFs")[V(graph)$tfs+1]) ``` Please note that some level of missing annotation is expected, as not all gene ids listed in the `cogdata` might be available in the `graph` object. Also, small regulons (n < 15 targets) are removed by the `tni.graph` function. As a final pre-processing step, we will remove genes rooted at the base of the phylogenetic tree, for which the predictions can not discriminate from earlier ancestor roots. Here, 307 TFs and 6308 targets were retained. ```{r eval=FALSE} #-- Remove NAs from missing annotation roots_df <- roots_df[complete.cases(roots_df),] #-- Remove genes rooted at the base of the phylogenetic tree roots_df <- roots_df[roots_df$RootFigure 1. Distribution of the inferred evolutionary roots of TFs and target genes. Next we compute the root distance between a TF and its targets, and then generate a pie chart and a boxplot that reproduce the evolutionary scenarios discussed in @Trefflich2019. ```{r eval=FALSE} #-- Get roots for TFs idx <- roots_df$TF_Targets=="TFs" tfroots <- roots_df$Root[idx] names(tfroots) <- roots_df$SYMBOL[idx] #-- Get roots for TF-target genes regulons <- tni.get(rtni1st, what = "regulons", idkey = "ENTREZ")[names(tfroots)] regroots <- lapply(regulons, function(reg){ roots_df$Root[roots_df$ENTREZ%in%reg] }) tfroots <- tfroots[names(regroots)] #-- Compute root distance between a TF and its targets rootdist <- lapply(names(regroots), function(reg){ regroots[[reg]]-tfroots[reg] }) names(rootdist) <- names(regroots) #-- Compute median root distance med_rootdist <- unlist(lapply(rootdist, median)) #-- Plot a pie chart grouping regulons based on #-- the median root distance cols = c("#1c92d5","grey","#98d1f2") n <- as.numeric(table(sign(med_rootdist))) pie(n, labels = paste(n,"regulons"), col = cols, border="white", cex=1.5) legend("bottomleft", legend = c("TF-target genes rooted before the TF", "TF-target genes rooted with the TF", "TF-target genes rooted after the TF"), fill = rev(cols), bty = "n") ``` ```{r eval=FALSE, include=FALSE} cols = c("#1c92d5","grey","#98d1f2") n <- as.numeric(table(sign(med_rootdist))) pdf(file = "pie_regulon_roots.pdf") pie(n, labels = paste(n,"regulons"), col = cols, border="white", cex=1.5) legend("bottomleft", legend = c("TF-target genes rooted before the TF", "TF-target genes rooted with the TF", "TF-target genes rooted after the TF"), fill = rev(cols), bty = "n") dev.off() ``` ![title](geneplast_Trefflich2019_2.png) Figure 2. Regulons grouped based on the median distance between a TF's root and its targets' roots. ```{r eval=FALSE} #-- Plot a boxplot showing individual regulons #-- Sort regulons based on the calculated root distances med_rootdist <- sort(med_rootdist, decreasing = T) rootdist <- rootdist[names(med_rootdist)] regroots <- regroots[names(med_rootdist)] tfroots <- tfroots[names(med_rootdist)] #-- Generate the boxplot plot.new() par(usr=c(c(0,length(rootdist)),range(rootdist))) idx <- sign(med_rootdist)+2 cols = c("#1c92d5","grey","#98d1f2") boxplot(rootdist, horizontal= F, outline=FALSE, las=2, axes=FALSE, add=T, pars = list(boxwex = 0.6, boxcol=cols[idx], whiskcol=cols[idx]), pch="|", lty=1, lwd=0.75, col = cols[idx]) abline(h=0, lmitre=5, col="#E69F00", lwd=3, lt=2) par(mgp=c(2,0.1,0)) axis(side=1, cex.axis=1.2, padj=0.5, hadj=0.5, las=1, lwd=1.5, tcl= -0.2) par(mgp=c(2.5,1.2,0.5)) axis(side=2, cex.axis=1.2, padj=0.5, hadj=0.5, las=1, lwd=1.5, tcl= -0.2) legend("topright",legend = c("TF-target genes rooted before the TF", "TF-target genes rooted with the TF", "TF-target genes rooted after the TF"), fill = rev(cols), bty = "n") title(xlab = "Regulons sorted by the median distance to TF root", ylab = "Distance to TF root") ``` ![title](geneplast_Trefflich2019_3.png) Figure 3. Regulons sorted by the median distance to TF root. # Session information ```{r label='Session information', eval=TRUE, echo=FALSE} sessionInfo() ``` # References