---
title: "Moonlight: an approach to identify multiple roles of biomarkers as oncogenes or tumor suppressors in different tumor types and stages."
author: "Antonio Colaprico^+^ ,
Catharina Olsen^+^,
Claudia Cava,
Thilde Terkelsen,
Laura Cantini,
Gloria Bertoli,
Andre Olsen,
Andrei Zinovyev,
Emmanuel Barillot,
Isabella Castiglioni,
Elena Papaleo,
Gianluca Bontempi"
subtitle: ^+^ These authors contributed equally to the paper as first authors.
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc: true
    number_sections: false
    toc_depth: 2
    highlight: haddock
references:
- id: ref1
title: Orchestrating high-throughput genomic analysis with Bioconductor
author:
- family: Huber, Wolfgang and Carey, Vincent J and Gentleman, Robert and Anders, Simon and Carlson, Marc and Carvalho, Benilton S and Bravo, Hector Corrada and Davis, Sean and Gatto, Laurent and Girke, Thomas and others
given:
journal: Nature methods
volume: 12
number: 2
pages: 115-121
issued:
year: 2015
- id: ref2
title: GC-content normalization for RNA-Seq data
author:
- family: Risso, Davide and Schwartz, Katja and Sherlock, Gavin and Dudoit, Sandrine
given:
journal: BMC bioinformatics
volume: 12
number: 1
pages: 480
issued:
year: 2011
- id: ref3
title: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
author:
- family: Bullard, James H and Purdom, Elizabeth and Hansen, Kasper D and Dudoit, Sandrine
given:
journal: BMC bioinformatics
volume: 11
number: 1
pages: 94
issued:
year: 2010
- id: ref4
title: Inferring regulatory element landscapes and transcription factor networks from cancer methylomes
author:
- family: Yao, L., Shen, H., Laird, P. W., Farnham, P. J., & Berman, B. P.
given:
journal: Genome biology
volume: 16
number: 1
pages: 105
issued:
year: 2015
- id: ref5
title: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
author:
- family: James H Bullard, Elizabeth Purdom, Kasper D Hansen and Sandrine Dudoit
given:
journal: BMC Bioinformatics
volume: 11
number: 1
pages: 94
issued:
year: 2010
- id: ref6
title: GC-content normalization for RNA-Seq data
author:
- family: Risso, D., Schwartz, K., Sherlock, G., & Dudoit, S.
given:
journal: BMC Bioinformatics
volume: 12
number: 1
pages: 480
issued:
year: 2011
- id: ref7
title: Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma
author:
- family: Noushmehr, H., Weisenberger, D.J., Diefes, K., Phillips, H.S., Pujara, K., Berman, B.P., Pan, F., Pelloski, C.E., Sulman, E.P., Bhat, K.P. et al.
given:
journal: Cancer cell
volume: 17
number: 5
pages: 510-522
issued:
year: 2010
- id: ref8
title: Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma
author:
- family: Ceccarelli, Michele and Barthel, Floris P and Malta, Tathiane M and Sabedot, Thais S and Salama, Sofie R and Murray, Bradley A and Morozova, Olena and Newton, Yulia and Radenbaugh, Amie and Pagnotta, Stefano M and others
given:
journal: Cell
URL: "http://doi.org/10.1016/j.cell.2015.12.028"
DOI: "10.1016/j.cell.2015.12.028"
volume: 164
number: 3
pages: 550-563
issued:
year: 2016
- id: ref9
title: Comprehensive molecular profiling of lung adenocarcinoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature13385"
DOI: "10.1038/nature13385"
volume: 511
number: 7511
pages: 543-550
issued:
year: 2014
- id: ref10
title: Comprehensive molecular characterization of gastric adenocarcinoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature13480"
DOI: "10.1038/nature13480"
issued:
year: 2014
- id: ref11
title: Comprehensive molecular portraits of human breast tumours
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature11412"
DOI: "10.1038/nature11412"
volume: 490
number: 7418
pages: 61-70
issued:
year: 2012
- id: ref12
title: Comprehensive molecular characterization of human colon and rectal cancer
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature11252"
DOI: "10.1038/nature11252"
volume: 487
number: 7407
pages: 330-337
issued:
year: 2012
- id: ref13
title: Genomic classification of cutaneous melanoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Cell
URL: "http://doi.org/10.1016/j.cell.2015.05.044"
DOI: "10.1016/j.cell.2015.05.044"
volume: 161
number: 7
pages: 1681-1696
issued:
year: 2015
- id: ref14
title: Comprehensive genomic characterization of head and neck squamous cell carcinomas
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature14129"
DOI: "10.1038/nature14129"
volume: 517
number: 7536
pages: 576-582
issued:
year: 2015
- id: ref15
title: The somatic genomic landscape of chromophobe renal cell carcinoma
author:
- family: Davis, Caleb F and Ricketts, Christopher J and Wang, Min and Yang, Lixing and Cherniack, Andrew D and Shen, Hui and Buhay, Christian and Kang, Hyojin and Kim, Sang Cheol and Fahey, Catherine C and others
given:
journal: Cancer Cell
URL: "http://doi.org/10.1016/j.ccr.2014.07.014"
DOI: "10.1016/j.ccr.2014.07.014"
volume: 26
number: 3
pages: 319-330
issued:
year: 2014
- id: ref16
title: Comprehensive genomic characterization of squamous cell lung cancers
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature11404"
DOI: "10.1038/nature11404"
volume: 489
number: 7417
pages: 519-525
issued:
year: 2012
- id: ref17
title: Integrated genomic characterization of endometrial carcinoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature12113"
DOI: "10.1038/nature12113"
volume: 497
number: 7447
pages: 67-73
issued:
year: 2013
- id: ref18
title: Integrated genomic characterization of papillary thyroid carcinoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Cell
URL: "http://doi.org/10.1016/j.cell.2014.09.050"
DOI: "10.1016/j.cell.2014.09.050"
volume: 159
number: 3
pages: 676-690
issued:
year: 2014
- id: ref19
title: The molecular taxonomy of primary prostate cancer
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Cell
URL: "http://doi.org/10.1016/j.cell.2015.10.025"
DOI: "10.1016/j.cell.2015.10.025"
volume: 163
number: 4
pages: 1011-1025
issued:
year: 2015
- id: ref20
title: Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma
author:
- family: Linehan, W Marston and Spellman, Paul T and Ricketts, Christopher J and Creighton, Chad J and Fei, Suzanne S and Davis, Caleb and Wheeler, David A and Murray, Bradley A and Schmidt, Laura and Vocke, Cathy D and others
given:
journal: NEW ENGLAND JOURNAL OF MEDICINE
URL: "http://doi.org/10.1056/NEJMoa1505917"
DOI: "10.1056/NEJMoa1505917"
volume: 374
number: 2
pages: 135-145
issued:
year: 2016
- id: ref21
title: Comprehensive molecular characterization of clear cell renal cell carcinoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Nature
URL: "http://doi.org/10.1038/nature12222"
DOI: "10.1038/nature12222"
volume: 499
number: 7456
pages: 43-49
issued:
year: 2013
- id: ref22
title: Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma
author:
- family: Cancer Genome Atlas Research Network and others
given:
journal: Cancer Cell
URL: "http://dx.doi.org/10.1016/j.ccell.2016.04.002"
DOI: "10.1016/j.ccell.2016.04.002"
volume: 29
number: 5
pages: 723-736
issued:
year: 2016
- id: ref23
title: Complex heatmaps reveal patterns and correlations in multidimensional genomic data
author:
- family: Gu, Zuguang and Eils, Roland and Schlesner, Matthias
given:
journal: Bioinformatics
URL: "http://dx.doi.org/10.1093/bioinformatics/btw313"
DOI: "10.1093/bioinformatics/btw313"
pages: "btw313"
issued:
year: 2016
- id: ref24
title: "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages"
author:
- family: Silva, TC and Colaprico, A and Olsen, C and D'Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H
given:
journal: F1000Research
URL: "http://dx.doi.org/10.12688/f1000research.8923.1"
DOI: "10.12688/f1000research.8923.1"
volume: 5
number: 1542
issued:
year: 2016
- id: ref25
title: "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data"
author:
- family: Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan
given:
journal: Nucleic Acids Research
URL: "http://dx.doi.org/10.1093/nar/gkv1507"
DOI: "10.1093/nar/gkv1507"
volume: 44
number: 8
pages: e71
issued:
year: 2016
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(dpi = 300)
knitr::opts_chunk$set(cache=FALSE)
```
```{r, echo = FALSE,hide=TRUE, message=FALSE,warning=FALSE}
devtools::load_all(".")
```
# Abstract
To shed light on cancer development, it is crucial to understand which genes play a role in the mechanisms linked to this disease and, moreover, which role that is. Biological processes such as proliferation and apoptosis have commonly been linked to cancer progression.
Based on expression data we perform functional enrichment analysis, infer gene regulatory networks and apply upstream regulator analysis to score the importance of well-known biological processes with respect to the studied cancer.
We then use these scores to predict two specific roles: genes that act as tumor suppressor genes (TSGs) and genes that act as oncogenes (OCGs). This methodology allows us not only to identify genes with a dual role (TSG in one cancer type and OCG in another) but also to elucidate the underlying biological processes.
# Introduction
Cancer development is influenced by mutations in two distinct categories of genes, known as tumor suppressor genes (TSGs) and oncogenes (OCGs). Mutations in genes of the first category lead to faster cell proliferation, while mutations in genes of the second category increase or change their function.
We propose `MoonlightR`, a new approach to define TSGs and OCGs that combines functional enrichment analysis, gene regulatory network inference and upstream regulator analysis to score the importance of well-known biological processes with respect to the studied cancer.
# Moonlight's pipeline
Moonlight's pipeline is shown below:
```{r, fig.width=6, fig.height=4, echo = FALSE, fig.align="center",hide=TRUE, message=FALSE,warning=FALSE}
library(png)
library(grid)
img <- readPNG("Moonlight_Pipeline.png")
grid.raster(img)
```
# Moonlight's proposed workflow
The proposed pipeline consists of the following eight steps:

1. **getDataTCGA** and **getDataGEO** for data collection: expression levels of genes in all samples obtained with IlluminaHiSeq RNASeqV2 in 18 normal tissues (NT) and 18 cancer tissues (CT) according to TCGA criteria, and GEO data sets matched to the 18 given TCGA cancer types, as described in the TCGA/GEO table below.
2. **DPA** Differential Phenotype Analysis (DPA) to identify genes or probes that differ significantly between two phenotypes, such as normal vs. tumor, normal vs. stage I, or normal vs. molecular subtype.
3. **FEA** Functional Enrichment Analysis (FEA), using Fisher's test, to identify gene sets (with biological functions linked to cancer^1^) significantly enriched by the regulated genes (RG) defined in step 4.
4. **GRN** Gene Regulatory Network inferred between each single DEG (sDEG) and all genes by means of mutual information, yielding a list of regulated genes (RG) for each DEG.
5. **URA** Upstream Regulator Analysis for the DEGs in each enriched gene set: we compute a z-score as the ratio between the sum of the predicted effects of all genes involved in the specific function and the square root of the number of these genes.
6. **PRA** Pattern Recognition Analysis identifies candidate TSGs (down) and OCGs (up), using either user-defined biological processes or random forests.
7. We applied the above procedure to multiple cancer types to obtain cancer-specific lists of TSGs and OCGs and compared the lists across cancers: if an sDEG was a TSG in one cancer and an OCG in another, we defined it as a dual-role TSG-OCG; if an sDEG was defined as an OCG or a TSG in only one tissue, we defined it as a tissue-specific biomarker.
8. We use the COSMIC database to define a gold-standard list of TSGs and OCGs to assess the accuracy of the proposed method.

^1^ For the devel version of MoonlightR we use a short extract of ten biological functions from QIAGEN's Ingenuity Pathway Analysis (IPA). We are still working to integrate the `r BiocStyle::Biocpkg("ReactomePA")` package.
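
The z-score described in the URA step can be illustrated with a few lines of base R. This is only a minimal sketch of the formula as stated above (sum of predicted effects divided by the square root of the number of genes); the vector `predicted_effects` and the helper `ura_zscore` are hypothetical names introduced for illustration and are not part of the `MoonlightR` API.

```r
# Hypothetical predicted effects for the genes involved in one biological
# process: +1 = predicted to increase the process, -1 = predicted to decrease it.
predicted_effects <- c(1, 1, -1, 1, 1, -1, 1, 1, 1)

# z-score as described in the text:
# sum of all predicted effects / square root of the number of genes
ura_zscore <- function(effects) {
  sum(effects) / sqrt(length(effects))
}

z <- ura_zscore(predicted_effects)
z
```

With nine genes and a net effect of +5, the score is 5 / 3; a larger positive value suggests that the process is predicted to be increased, a larger negative value that it is predicted to be decreased.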
# Installation
To install use the code below.
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MoonlightR")
```
## Citation
Please cite the TCGAbiolinks package:

* "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data." Nucleic Acids Research (2015): [gkv1507](http://dx.doi.org/10.1093/nar/gkv1507). [@ref25]

Related publications:

* "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages". F1000Research [10.12688/f1000research.8923.1](http://dx.doi.org/10.12688/f1000research.8923.1) [@ref24]
# `Download`: Get TCGA data
You can search TCGA data using the `getDataTCGA` function.
## `getDataTCGA`: Search by cancer type and data type [Gene Expression]
The user can query and download data for the cancer types supported by TCGA using the function `getDataTCGA`.
In this example we use LUAD gene expression data with only 4 samples to reduce the download time.
```{r, eval = FALSE}
dataFilt <- getDataTCGA(cancerType = "LUAD",
                        dataType = "Gene expression",
                        directory = "data",
                        nSample = 4)
```
## `getDataTCGA`: Search by cancer type and data type [Methylation]
The user can also query and download methylation data using the function `getDataTCGA`:
```{r, eval = FALSE}
dataFilt <- getDataTCGA(cancerType = "BRCA",
                        dataType = "Methylation",
                        directory = "data",
                        nSample = 4)
```
# `Download`: Get GEO data
You can search GEO data using the `getDataGEO` function.
`GEO_TCGAtab` is an 18x12 matrix that lists the GEO data set matched to each of the 18 given TCGA cancer types.
```{r, eval = TRUE, echo = TRUE}
knitr::kable(GEO_TCGAtab, digits = 2,
             caption = "Table with GEO data set matched to one of the 18 given TCGA cancer types",
             row.names = TRUE)
```
## `getDataGEO`: Search by cancer type and data type [Gene Expression]
The user can query and download GEO data using the function `getDataGEO`, either by GEO accession and platform or by the matched TCGA cancer type:
```{r, eval = FALSE , echo = TRUE, results='hide', warning = FALSE, message = FALSE}
dataFilt <- getDataGEO(GEOobject = "GSE20347",platform = "GPL571")
```
```{r, eval = FALSE, echo = TRUE, results='hide', warning = FALSE, message = FALSE}
dataFilt <- getDataGEO(TCGAtumor = "ESCA")
```
# `Analysis`: To analyze TCGA data
## `DPA`: Differential Phenotype Analysis
Differential phenotype analysis identifies genes or probes that are significantly different between two phenotypes, such as normal vs. tumor, normal vs. stage I, or normal vs. molecular subtype.
For gene expression data, DPA runs a differential expression analysis (DEA) to identify differentially expressed genes (DEGs) using the `TCGAanalyze_DEA` function from `r BiocStyle::Biocpkg("TCGAbiolinks")`.
For methylation data, DPA runs a differentially methylated regions (DMR) analysis to identify differentially methylated CpG sites using the `TCGAanalyze_DMR` function from `r BiocStyle::Biocpkg("TCGAbiolinks")`.
```{r, eval = FALSE, message=FALSE, results='hide', warning=FALSE}
dataDEGs <- DPA(dataFilt = dataFilt,
                dataType = "Gene expression")
```
For GEO gene expression data, DPA runs a differential expression analysis (DEA) to identify differentially expressed genes (DEGs) using the `eBayes` and `topTable` functions from `r BiocStyle::Biocpkg("limma")`.
```{r, eval = FALSE, echo = TRUE, hide=TRUE, results='hide', warning = FALSE, message = FALSE}
data(GEO_TCGAtab)
DataAnalysisGEO <- "../GEO_dataset/"
# Select one row of the TCGA/GEO table
i <- 5
cancer <- GEO_TCGAtab$Cancer[i]
cancerGEO <- GEO_TCGAtab$Dataset[i]
cancerPLT <- GEO_TCGAtab$Platform[i]
fileCancerGEO <- paste0(cancer, "_GEO_", cancerGEO, "_", cancerPLT, ".RData")
dataFilt <- getDataGEO(TCGAtumor = cancer)
GEOdegs <- DPA(dataConsortium = "GEO",
               gset = dataFilt,
               colDescription = "title",
               samplesType = c(GEO_TCGAtab$GEO_Normal[i],
                               GEO_TCGAtab$GEO_Tumor[i]),
               fdr.cut = 0.01,
               logFC.cut = 1,
               gsetFile = paste0(DataAnalysisGEO, fileCancerGEO))
```
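
The role of the `fdr.cut` and `logFC.cut` arguments can be illustrated with a small base R sketch. It assumes that the cut-offs act as a joint filter on the adjusted p-value and the absolute log fold change; the table `deg_table`, its values and the exact comparison operators are hypothetical and are not taken from the `DPA` implementation.

```r
# Hypothetical differential expression results (values invented for illustration)
deg_table <- data.frame(
  logFC  = c(2.5, -1.8, 0.3, 1.2, -0.4),
  pvalue = c(1e-6, 1e-4, 0.2, 0.03, 0.6),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Benjamini-Hochberg adjustment to control the false discovery rate
deg_table$FDR <- p.adjust(deg_table$pvalue, method = "BH")

# Assumed filter: keep genes below the FDR cut-off AND above the |logFC| cut-off
fdr_cut <- 0.01
logfc_cut <- 1
selected <- deg_table[deg_table$FDR < fdr_cut &
                        abs(deg_table$logFC) > logfc_cut, ]
rownames(selected)
```

In this toy table `geneD` passes the fold-change cut-off but not the FDR cut-off, so only `geneA` and `geneB` are retained.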
We can visualize the differentially expressed genes (DEGs) with a volcano plot using the `TCGAVisualize_volcano` function from `r BiocStyle::Biocpkg("TCGAbiolinks")`.
```{r, eval = TRUE, echo = TRUE}
library(TCGAbiolinks)
# Example DEG table shipped with the package
data(DEGsmatrix)
TCGAVisualize_volcano(DEGsmatrix$logFC, DEGsmatrix$FDR,
                      filename = "DEGs_volcano.png",
                      x.cut = 7,
                      y.cut = 10^-5,
                      names = rownames(DEGsmatrix),
                      color = c("black", "red", "dodgerblue3"),
                      names.size = 2,
                      xlab = "Gene expression fold change (Log2)",
                      legend = "State",
                      title = "Volcano plot (Normal NT vs Tumor TP)",
                      width = 10)
```
The figure generated by the code above is shown below:
```{r, fig.width=6, fig.height=4, echo = FALSE, fig.align="center",hide=TRUE, message=FALSE,warning=FALSE}
img <- readPNG("DEGs_volcano.png")
grid.raster(img)
```
## `LPA`: Literature Phenotype Analysis
The user can perform a literature phenotype analysis using the function `LPA`, for example for the biological process 'apoptosis'.
```{r, eval = TRUE, echo = TRUE, results='hide'}
data(DEGsmatrix)
BPselected <- "apoptosis"
BPannotations <- DiseaseList[[match(BPselected, names(DiseaseList))]]$ID
dataLPA <- LPA(dataDEGs = DEGsmatrix[1:5, ],
               BP = BPselected,
               BPlist = BPannotations)
DiseaseListNew <- dataLPA
names(DiseaseListNew) <- BPselected
```
## `FEA`: Functional Enrichment Analysis
The user can perform a functional enrichment analysis using the function `FEA`. For each DEG in the gene set a z-score is calculated. This score indicates how the genes act in the gene set.
```{r, eval = TRUE, echo = TRUE, results='hide'}
data(DEGsmatrix)
dataFEA <- FEA(DEGsmatrix = DEGsmatrix)
```
The output can be visualized with a FEA plot.
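
The workflow overview mentions Fisher's test as the enrichment test. The following base R sketch shows how such a test can be set up for a single gene set; all counts are invented for illustration, and the code is not meant to reproduce the internals of `FEA`.

```r
# Hypothetical counts (not taken from any real data set)
n_universe <- 20000  # all genes considered
n_process  <- 400    # genes annotated to the biological process
n_deg      <- 1000   # differentially expressed genes
n_overlap  <- 45     # DEGs that are annotated to the process

# 2x2 contingency table: process membership (rows) vs. DEG status (columns)
contingency <- matrix(
  c(n_overlap,         n_process - n_overlap,
    n_deg - n_overlap, n_universe - n_deg - n_process + n_overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in_process", "not_in_process"),
                  c("DEG", "not_DEG"))
)

# One-sided Fisher's exact test for over-representation of DEGs in the process
enrichment <- fisher.test(contingency, alternative = "greater")
enrichment$p.value
```

Under independence about 20 of the 1000 DEGs would be expected in the process, so an overlap of 45 yields a small p-value and an odds ratio above 1.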
## `FEAplot`: Functional Enrichment Analysis Plot
The user can plot the result of a functional enrichment analysis using the function `plotFEA`. A negative z-score indicates that the process' activity is decreased. A positive z-score indicates that the process' activity is increased.
```{r, eval = TRUE, echo = TRUE, message=FALSE, results='hide', warning=FALSE}
plotFEA(dataFEA = dataFEA, additionalFilename = "_exampleVignette", height = 20, width = 10)
```
The figure generated by the above code is shown below:
```{r, fig.width=6, fig.height=4, echo = FALSE, fig.align="center",hide=TRUE, message=FALSE,warning=FALSE}
img <- readPNG("FEAplot.png")
grid.raster(img)
```
## `GRN`: Gene Regulatory Network
The user can perform a gene regulatory network analysis using the function `GRN`, which infers the network using the parmigene package.
```{r, eval = TRUE}
dataGRN <- GRN(TFs = rownames(DEGsmatrix)[1:100],
               normCounts = dataFilt,
               nGenesPerm = 10,
               kNearest = 3,
               nBoot = 10)
```
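
To give an intuition for the mutual information on which the network inference is based, the sketch below estimates it for simulated expression vectors. It deliberately uses a simple binned (plug-in) estimator in base R as a stand-in; this is not the estimator used by parmigene, and `mutual_information` as well as the simulated vectors are hypothetical.

```r
set.seed(1)
n <- 200
regulator <- rnorm(n)
target    <- 2 * regulator + rnorm(n, sd = 0.5)  # depends on the regulator
unrelated <- rnorm(n)                            # independent of the regulator

# Plug-in mutual information (in nats) after binning both vectors into
# equal-width bins. A simple stand-in, not a k-nearest-neighbour estimator.
mutual_information <- function(x, y, bins = 5) {
  joint <- unclass(table(cut(x, bins), cut(y, bins))) / length(x)
  px <- rowSums(joint)  # marginal distribution of x
  py <- colSums(joint)  # marginal distribution of y
  nonzero <- joint > 0  # skip empty cells to avoid log(0)
  sum(joint[nonzero] * log(joint[nonzero] / outer(px, py)[nonzero]))
}

mi_target    <- mutual_information(regulator, target)
mi_unrelated <- mutual_information(regulator, unrelated)
c(target = mi_target, unrelated = mi_unrelated)
```

The dependent pair receives a clearly higher score than the independent pair, which is the property used to connect a regulator to its regulated genes.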
## `URA`: Upstream Regulator Analysis
The user can perform upstream regulator analysis using the function `URA`. This function is applied to each DEG in the enriched gene set and its neighbors in the GRN.
```{r, eval = TRUE, echo = TRUE, results='hide'}
data(dataGRN)
data(DEGsmatrix)
dataURA <- URA(dataGRN = dataGRN,
               DEGsmatrix = DEGsmatrix,
               BPname = NULL,
               nCores = 2)
```
## `PRA`: Pattern Recognition Analysis
The user can retrieve TSG/OCG candidates using either selected biological processes or a random forest classifier trained on known COSMIC OCGs/TSGs.
```{r, eval = TRUE}
data(dataURA)
dataDual <- PRA(dataURA = dataURA,
                BPname = c("apoptosis", "proliferation of cells"),
                thres.role = 0)
```
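
When biological processes are selected by the user, the classification can be thought of as a sign pattern over the URA z-scores. The sketch below is an illustration under an explicit assumption: a TSG candidate is taken to increase apoptosis and decrease proliferation, and an OCG candidate to do the opposite. The matrix `ura_scores`, the gene names and this rule are hypothetical and are not extracted from the `PRA` source code.

```r
# Hypothetical URA z-scores: rows are genes, columns are biological processes
ura_scores <- matrix(
  c( 3.1, -2.4,
    -2.8,  3.5,
     0.4,  0.2,
     1.9,  2.2),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("apoptosis", "proliferation of cells"))
)
thres_role <- 0

apoptosis     <- ura_scores[, "apoptosis"]
proliferation <- ura_scores[, "proliferation of cells"]

# Assumed rule: TSG candidates raise apoptosis and lower proliferation,
# OCG candidates lower apoptosis and raise proliferation
tsg_candidates <- names(which(apoptosis >  thres_role & proliferation < -thres_role))
ocg_candidates <- names(which(apoptosis < -thres_role & proliferation >  thres_role))

list(TSG = tsg_candidates, OCG = ocg_candidates)
```

Genes whose scores point in the same direction for both processes (`geneC`, `geneD`) are left unclassified under this rule.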
## `plotNetworkHive`: GRN hive visualization taking into account COSMIC cancer genes
In the following plot the nodes are separated into three groups: known tumor suppressor genes (yellow), known oncogenes (green) and all remaining genes (gray).
```{r, eval = TRUE, echo = TRUE, results='hide', warning = FALSE, message = FALSE}
data(knownDriverGenes)
data(dataGRN)
plotNetworkHive(dataGRN, knownDriverGenes, 0.55)
```
# TCGA Downstream Analysis: Case Studies
## Introduction
This vignette shows a complete workflow of the `MoonlightR` package, except for the data download.
The code is divided into three case studies:

1. Downstream analysis of LUAD using RNA expression data
2. Pan-cancer expression pipeline with five cancer types
3. Expression pipeline with stages I, II, III and IV (BRCA)
## Case study n. 1: Downstream analysis LUAD
```{r,eval = TRUE,echo=TRUE,message=FALSE,warning=FALSE, results='hide'}
# Differential phenotype analysis
dataDEGs <- DPA(dataFilt = dataFilt,
                dataType = "Gene expression")
# Functional enrichment analysis
dataFEA <- FEA(DEGsmatrix = dataDEGs)
# Gene regulatory network for the first 100 DEGs
dataGRN <- GRN(TFs = rownames(dataDEGs)[1:100],
               DEGsmatrix = dataDEGs,
               DiffGenes = TRUE,
               normCounts = dataFilt)
# Upstream regulator analysis
dataURA <- URA(dataGRN = dataGRN,
               DEGsmatrix = dataDEGs,
               BPname = c("apoptosis",
                          "proliferation of cells"))
# Pattern recognition analysis
dataDual <- PRA(dataURA = dataURA,
                BPname = c("apoptosis",
                           "proliferation of cells"),
                thres.role = 0)
CancerGenes <- list("TSG" = names(dataDual$TSG), "OCG" = names(dataDual$OCG))
```
## `plotURA`: Upstream regulator analysis plot
The user can plot the result of the upstream regulator analysis using the function `plotURA`.
```{r, eval = TRUE,message=FALSE,warning=FALSE, results='hide'}
plotURA(dataURA = dataURA[c(names(dataDual$TSG), names(dataDual$OCG)),, drop = FALSE], additionalFilename = "_exampleVignette")
```
The figure generated by the code above is shown below:
```{r, fig.width=6, fig.height=4, echo = FALSE, fig.align="center",hide=TRUE, message=FALSE,warning=FALSE}
img <- readPNG("URAplot.png")
grid.raster(img)
```
## Case study n. 2: Expression pipeline Pan Cancer 5 cancer types
```{r,eval = FALSE,echo=TRUE,message=FALSE,warning=FALSE}
cancerList <- c("BLCA","COAD","ESCA","HNSC","STAD")
listMoonlight <- moonlight(cancerType = cancerList,
                           dataType = "Gene expression",
                           directory = "data",
                           nSample = 10,
                           nTF = 100,
                           DiffGenes = TRUE,
                           BPname = c("apoptosis", "proliferation of cells"))
# File name matches the five cancer types in cancerList
save(listMoonlight, file = "listMoonlight_ncancer5.Rdata")
```
## `plotCircos`: Moonlight Circos Plot
The results of the moonlight pipeline can be visualized with a circos plot:

* Outer ring: color by cancer type
* Inner ring: OCGs and TSGs
* Inner connections: green for common OCGs, yellow for common TSGs, red for possible dual role
```{r, eval = TRUE, echo = TRUE, results='hide', warning = FALSE, message = FALSE}
plotCircos(listMoonlight = listMoonlight, additionalFilename = "_ncancer5")
```
The figure generated by the code above is shown below:
```{r, fig.width=6, fig.height=4, echo = FALSE, fig.align="center",hide=TRUE, message=FALSE,warning=FALSE}
img <- readPNG("circos_ocg_tsg_ncancer5.png")
grid.raster(img)
```
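
The connections drawn in the circos plot correspond to simple set operations on the per-cancer gene lists. The following base R sketch shows one way to derive common and dual-role genes; the list `predictions` and its gene labels are placeholders, and the structure is a simplification rather than the actual layout of `listMoonlight`.

```r
# Hypothetical per-cancer predictions (gene labels are placeholders)
predictions <- list(
  cancer1 = list(TSG = c("g1", "g2"), OCG = c("g3", "g4")),
  cancer2 = list(TSG = c("g3", "g5"), OCG = c("g4", "g6")),
  cancer3 = list(TSG = c("g2"),       OCG = c("g1", "g4"))
)

tsg_lists <- lapply(predictions, `[[`, "TSG")
ocg_lists <- lapply(predictions, `[[`, "OCG")

# Genes predicted as TSG in at least one cancer and as OCG in at least one other
all_tsg <- unique(unlist(tsg_lists, use.names = FALSE))
all_ocg <- unique(unlist(ocg_lists, use.names = FALSE))
dual_role <- intersect(all_tsg, all_ocg)

# Genes predicted as OCG in every cancer type
common_ocg <- Reduce(intersect, ocg_lists)

list(dual_role = dual_role, common_ocg = common_ocg)
```

Here `g1` and `g3` change role between cancer types and would be reported as dual-role candidates, while `g4` is an OCG candidate in all three.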
## Case study n. 3: Downstream analysis BRCA with stages
```{r,eval = FALSE,echo=TRUE,message=FALSE,warning=FALSE}
listMoonlight <- NULL
# Run the pipeline separately for stages I to IV
for (i in 1:4) {
  dataDual <- moonlight(cancerType = "BRCA",
                        dataType = "Gene expression",
                        directory = "data",
                        nSample = 10,
                        nTF = 5,
                        DiffGenes = TRUE,
                        BPname = c("apoptosis", "proliferation of cells"),
                        stage = i)
  listMoonlight <- c(listMoonlight, list(dataDual))
  save(dataDual, file = paste0("dataDual_stage", as.roman(i), ".Rdata"))
}
names(listMoonlight) <- c("stage1", "stage2", "stage3", "stage4")
# Prepare mutation data for stages
mutation <- GDCquery_Maf(tumor = "BRCA")

# Assumption: dataClin is not defined elsewhere in this vignette; here it is
# taken to be the BRCA clinical table returned by TCGAbiolinks::GDCquery_clinic()
dataClin <- GDCquery_clinic(project = "TCGA-BRCA", type = "clinical")

# Normalize the stage labels once, e.g. "stage iia" becomes "Stage II"
dataClin$tumor_stage <- toupper(dataClin$tumor_stage)
dataClin$tumor_stage <- gsub("[ABCDEFGH]", "", dataClin$tumor_stage)
dataClin$tumor_stage <- gsub("ST", "Stage", dataClin$tumor_stage)

res.mutation <- NULL
for (stage in 1:4) {
  curStage <- paste0("Stage ", as.roman(stage))
  dataStg <- dataClin[dataClin$tumor_stage %in% curStage, ]
  message(paste(curStage, "with", nrow(dataStg), "samples"))
  # Keep tumor samples whose patient barcode belongs to the current stage
  dataSmTP <- mutation$Tumor_Sample_Barcode
  dataSmTP <- dataSmTP[substr(dataSmTP, 1, 12) %in% dataStg$bcr_patient_barcode]
  info.mutation <- mutation[mutation$Tumor_Sample_Barcode %in% dataSmTP, ]
  # Keep in-frame deletions, in-frame insertions and missense variants
  ind <- which(info.mutation[, "Consequence"] == "inframe_deletion")
  ind2 <- which(info.mutation[, "Consequence"] == "inframe_insertion")
  ind3 <- which(info.mutation[, "Consequence"] == "missense_variant")
  res.mutation <- c(res.mutation, list(info.mutation[c(ind, ind2, ind3), c(1, 51)]))
}
names(res.mutation) <- c("stage1", "stage2", "stage3", "stage4")
# First element of each stage's result; lapply() keeps the stage names
tmp <- lapply(listMoonlight, function(x) x[[1]])
plotCircos(listMoonlight = listMoonlight,
           listMutation = res.mutation,
           additionalFilename = "proc2_wmutation",
           intensityColDual = 0.2,
           fontSize = 2)
```
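
The stage labels in the clinical data are harmonized with `toupper` and two `gsub` calls before samples are assigned to a stage. The short trace below applies the same three calls to a few example labels to show their effect; the input vector `raw_stage` is hypothetical.

```r
# Hypothetical raw stage labels as they might appear in clinical data
raw_stage <- c("stage i", "stage iia", "stage iiib", "stage iv", "not reported")

stage <- toupper(raw_stage)              # "STAGE IIA"
stage <- gsub("[ABCDEFGH]", "", stage)   # drops sub-stage letters: "ST II"
stage <- gsub("ST", "Stage", stage)      # restores the prefix: "Stage II"
stage

# Labels that can be matched against "Stage I" ... "Stage IV"
valid <- stage %in% paste0("Stage ", as.roman(1:4))
valid
```

Sub-stages such as IIA and IIIB collapse to their main stage, and labels that do not describe a stage are left unmatched and are therefore excluded from the per-stage subsets.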
The results of the moonlight pipeline can be visualized with a circos plot:

* Outer ring: color by cancer type
* Inner ring: OCGs and TSGs
* Inner connections: green for common OCGs, yellow for common TSGs, red for possible dual role

The figure generated by the code above is shown below:
```{r, fig.width=6, fig.height=4, echo = FALSE, fig.align="center",hide=TRUE, message=FALSE,warning=FALSE}
img <- readPNG("circos_ocg_tsg_stages.png")
grid.raster(img)
```
******
Session Information
******
```{r sessionInfo}
sessionInfo()
```
# References