Organized international consortia such as The Cancer Genome Atlas (TCGA) (Weinstein et al. 2013), the Encyclopedia of DNA Elements (ENCODE) (ENCODE Project Consortium 2011), and the NIH Roadmap Epigenomics Mapping Consortium (Roadmap) (Fingerman et al. 2011; Bernstein et al. 2010) have driven an explosion of sequencing-based biological data and now provide access to the largest publicly available genomic, transcriptomic, and epigenomic datasets to date.
These projects have given researchers unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines and of fresh normal and tumor tissues at high genomic resolution. However, using such data in analyses involves the arduous task of searching for, downloading, and processing them in a reproducible manner. Furthermore, most bioinformatics tools are designed for a specific data type (e.g., expression, epigenetics, genomics) and therefore provide only a partial view of the underlying biological process.
Integrated analysis of molecular datasets together with clinical information has been shown to improve prognostic and predictive accuracy for cancer phenotypes compared with clinical features alone. This workshop will focus on helping researchers perform an integrative analysis of the molecular and clinical data available through TCGA by harnessing open-source packages within the Bioconductor platform.
Participants will learn to search for and download DNA methylation (epigenetic) and gene expression (transcription) data from the newly created NCI Genomic Data Commons (GDC) portal and to prepare them as a SummarizedExperiment object. We will introduce the workflow using our recently developed TCGAbiolinks package (Colaprico et al. 2016) and, if time permits, highlight its graphical user interface version, TCGAbiolinksGUI (Silva et al. 2017).
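The search, download, and prepare steps described above can be sketched as a small wrapper around the `GDCquery()`, `GDCdownload()`, and `GDCprepare()` functions of TCGAbiolinks. The wrapper name `fetch_tcga_se`, its `samples` argument, and the specific `data.category`, `data.type`, and `platform` values are assumptions for illustration; accepted values have changed between TCGAbiolinks releases, so they should be checked against `?GDCquery` for the installed version.

```r
# Hypothetical helper (not part of TCGAbiolinks): query the GDC for DNA
# methylation data from one project and return the prepared object.
# Argument values are assumptions and may differ between package versions.
fetch_tcga_se <- function(project, samples = NULL) {
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("TCGAbiolinks is required; install it from Bioconductor first.")
  }

  query_args <- list(
    project       = project,
    data.category = "DNA Methylation",
    data.type     = "Methylation Beta Value",
    platform      = "Illumina Human Methylation 450"
  )
  # Optionally restrict the query to a set of TCGA barcodes.
  if (!is.null(samples)) {
    query_args$barcode <- samples
  }

  query <- do.call(TCGAbiolinks::GDCquery, query_args)  # search the GDC
  TCGAbiolinks::GDCdownload(query)                      # download the files
  TCGAbiolinks::GDCprepare(query)                       # build the R object
}
```

Calling, for example, `fetch_tcga_se("TCGA-ACC")` would contact the GDC and download files into the working directory, so it is best tried on a small project or a handful of barcodes first.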
We will also introduce a second Bioconductor package, ELMER (Yao et al. 2015; Chedraoui Silva et al. 2017), which identifies DNA methylation changes in distal regulatory regions and correlates these signatures with the expression of nearby genes to identify transcriptional targets associated with cancer. For the distal probes that correlate with a gene, a transcription factor motif analysis is performed, followed by an expression analysis of transcription factors to infer upstream regulators.
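The idea of linking a distal probe to nearby genes can be illustrated with a toy example in base R. This is a simplified sketch on simulated data, not ELMER's implementation: the helper `test_probe_gene`, the 20% group size, the one-sided Mann-Whitney U test, and the Benjamini-Hochberg adjustment are all assumptions chosen for brevity.

```r
set.seed(42)
n_samples <- 100

# Simulated beta values (between 0 and 1) for a single distal probe.
methylation <- runif(n_samples)

# Simulated expression for two nearby genes: gene_A decreases as the probe
# gains methylation, gene_B is unrelated noise.
expression <- rbind(
  gene_A = 10 - 6 * methylation + rnorm(n_samples, sd = 1),
  gene_B = rnorm(n_samples, mean = 7, sd = 1)
)

# Hypothetical helper: compare expression between the most- and
# least-methylated samples (20% of samples in each group by default).
test_probe_gene <- function(meth, expr, fraction = 0.2) {
  k <- ceiling(length(meth) * fraction)
  ord <- order(meth)
  unmethylated <- ord[seq_len(k)]
  methylated <- rev(ord)[seq_len(k)]
  # One-sided test: is expression lower in the methylated group?
  wilcox.test(expr[methylated], expr[unmethylated],
              alternative = "less")$p.value
}

# Test every candidate gene against the probe and adjust for multiple testing.
p_values <- apply(expression, 1, function(g) test_probe_gene(methylation, g))
p_adjusted <- p.adjust(p_values, method = "BH")
print(p_adjusted)
```

In this simulation only `gene_A` should come out as a putative target of the probe. The motif enrichment and transcription factor ranking steps that follow in the real workflow are not covered by this sketch.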
We expect participants to leave the workshop understanding the integrative analysis performed with TCGAbiolinks and ELMER, and able to execute it from data acquisition through to the final interpretation of the results. The workshop assumes an intermediate level of familiarity with R and a basic understanding of tumor biology.
Section | Duration |
---|---|
Introduction | 10 min |
Get data - TCGAbiolinks/TCGAbiolinksGUI packages | 20 min |
Analysis - ELMER package (Slides) | 1 hour |
To install this package and build the vignette, run the following commands:
source("http://bioconductor.org/biocLite.R")
useDevel()
biocLite("devtools")
biocLite("BioinformaticsFMRP/Bioc2017.TCGAbiolinks.ELMER",
         dependencies = TRUE, build_vignettes = TRUE)
The vignette can be launched with the following commands:
library("Bioc2017.TCGAbiolinks.ELMER")
Biobase::openVignette("Bioc2017.TCGAbiolinks.ELMER")
The complete R code for the workshop can be found in this gist.
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Bioc2017.TCGAbiolinks.ELMER_0.0.0.9000
## [2] SummarizedExperiment_1.7.5
## [3] DelayedArray_0.3.19
## [4] matrixStats_0.52.2
## [5] Biobase_2.37.2
## [6] GenomicRanges_1.29.12
## [7] GenomeInfoDb_1.13.4
## [8] IRanges_2.11.12
## [9] S4Vectors_0.15.5
## [10] BiocGenerics_0.23.0
## [11] TCGAbiolinks_2.5.6
## [12] bindrcpp_0.2
## [13] MultiAssayExperiment_1.3.20
## [14] dplyr_0.7.2
## [15] DT_0.2
## [16] ELMER_2.0.1
## [17] ELMER.data_2.0.1
##
## loaded via a namespace (and not attached):
## [1] shinydashboard_0.6.1 R.utils_2.5.0
## [3] RSQLite_2.0 AnnotationDbi_1.39.2
## [5] htmlwidgets_0.9 grid_3.4.1
## [7] trimcluster_0.1-2 BiocParallel_1.11.4
## [9] devtools_1.13.2 DESeq_1.29.0
## [11] munsell_0.4.3 codetools_0.2-15
## [13] withr_2.0.0 colorspace_1.3-2
## [15] BiocInstaller_1.27.2 knitr_1.16
## [17] robustbase_0.92-7 labeling_0.3
## [19] GenomeInfoDbData_0.99.1 KMsurv_0.1-5
## [21] mnormt_1.5-5 hwriter_1.3.2
## [23] bit64_0.9-7 rprojroot_1.2
## [25] downloader_0.4 biovizBase_1.25.1
## [27] ggthemes_3.4.0 EDASeq_2.11.0
## [29] diptest_0.75-7 R6_2.2.2
## [31] doParallel_1.0.10 locfit_1.5-9.1
## [33] AnnotationFilter_1.1.3 flexmix_2.3-14
## [35] reshape_0.8.6 bitops_1.0-6
## [37] assertthat_0.2.0 scales_0.4.1
## [39] nnet_7.3-12 gtable_0.2.0
## [41] ensembldb_2.1.10 rlang_0.1.1
## [43] genefilter_1.59.0 cmprsk_2.2-7
## [45] GlobalOptions_0.0.12 splines_3.4.1
## [47] rtracklayer_1.37.3 lazyeval_0.2.0
## [49] acepack_1.4.1 dichromat_2.0-0
## [51] selectr_0.3-1 broom_0.4.2
## [53] checkmate_1.8.3 yaml_2.1.14
## [55] reshape2_1.4.2 GenomicFeatures_1.29.8
## [57] backports_1.1.0 httpuv_1.3.5
## [59] Hmisc_4.0-3 tools_3.4.1
## [61] psych_1.7.5 ggplot2_2.2.1
## [63] RColorBrewer_1.1-2 Rcpp_0.12.12
## [65] plyr_1.8.4 base64enc_0.1-3
## [67] progress_1.1.2 zlibbioc_1.23.0
## [69] purrr_0.2.2.2 RCurl_1.95-4.8
## [71] prettyunits_1.0.2 ggpubr_0.1.4
## [73] rpart_4.1-11 GetoptLong_0.1.6
## [75] viridis_0.4.0 zoo_1.8-0
## [77] ggrepel_0.6.5 cluster_2.0.6
## [79] magrittr_1.5 data.table_1.10.4
## [81] circlize_0.4.1 survminer_0.4.0
## [83] mvtnorm_1.0-6 whisker_0.3-2
## [85] ProtGenerics_1.9.0 aroma.light_3.7.0
## [87] hms_0.3 mime_0.5
## [89] evaluate_0.10.1 xtable_1.8-2
## [91] XML_3.98-1.9 mclust_5.3
## [93] gridExtra_2.2.1 shape_1.4.2
## [95] compiler_3.4.1 biomaRt_2.33.3
## [97] tibble_1.3.3 R.oo_1.21.0
## [99] htmltools_0.3.6 Formula_1.2-2
## [101] tidyr_0.6.3 geneplotter_1.55.0
## [103] DBI_0.7 matlab_1.0.2
## [105] ComplexHeatmap_1.15.0 MASS_7.3-47
## [107] fpc_2.1-10 BiocStyle_2.5.8
## [109] ShortRead_1.35.1 Matrix_1.2-10
## [111] readr_1.1.1 R.methodsS3_1.7.1
## [113] Gviz_1.21.1 bindr_0.1
## [115] km.ci_0.5-2 pkgconfig_2.0.1
## [117] GenomicAlignments_1.13.4 foreign_0.8-69
## [119] plotly_4.7.1 xml2_1.1.1
## [121] roxygen2_6.0.1 foreach_1.4.3
## [123] annotate_1.55.0 XVector_0.17.0
## [125] rvest_0.3.2 stringr_1.2.0
## [127] VariantAnnotation_1.23.6 digest_0.6.12
## [129] ConsensusClusterPlus_1.41.0 Biostrings_2.45.3
## [131] rmarkdown_1.6 survMisc_0.5.4
## [133] htmlTable_1.9 dendextend_1.5.2
## [135] edgeR_3.19.3 curl_2.8.1
## [137] kernlab_0.9-25 shiny_1.0.3
## [139] Rsamtools_1.29.0 commonmark_1.2
## [141] modeltools_0.2-21 rjson_0.2.15
## [143] nlme_3.1-131 jsonlite_1.5
## [145] viridisLite_0.2.0 limma_3.33.7
## [147] BSgenome_1.45.1 lattice_0.20-35
## [149] httr_1.2.1 DEoptimR_1.0-8
## [151] survival_2.41-3 interactiveDisplayBase_1.15.0
## [153] glue_1.1.1 prabclus_2.2-6
## [155] iterators_1.0.8 bit_1.1-12
## [157] class_7.3-14 stringi_1.1.5
## [159] blob_1.1.0 AnnotationHub_2.9.5
## [161] latticeExtra_0.6-28 memoise_1.1.0
Bernstein, B. E., J. A. Stamatoyannopoulos, J. F. Costello, B. Ren, A. Milosavljevic, A. Meissner, M. Kellis, et al. 2010. “The NIH Roadmap Epigenomics Mapping Consortium.” Nature Biotechnology 28 (10): 1045–8.
Chedraoui Silva, Tiago, Simon G. Coetzee, Lijing Yao, Dennis J. Hazelett, Houtan Noushmehr, and Benjamin P. Berman. 2017. “Enhancer Linking by Methylation/Expression Relationships with the R Package ELMER Version 2.” bioRxiv. Cold Spring Harbor Labs Journals. doi:10.1101/148726.
Colaprico, Antonio, Tiago C. Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S. Sabedot, et al. 2016. “TCGAbiolinks: An R/Bioconductor Package for Integrative Analysis of TCGA Data.” Nucleic Acids Research 44 (8): e71. doi:10.1093/nar/gkv1507.
ENCODE Project Consortium. 2011. “A User’s Guide to the Encyclopedia of DNA Elements (ENCODE).” PLoS Biology 9 (4): e1001046.
Fingerman, I. M., L. McDaniel, X. Zhang, W. Ratzat, T. Hassan, Z. Jiang, R. F. Cohen, and G. D. Schuler. 2011. “NCBI Epigenomics: A New Public Resource for Exploring Epigenomic Data Sets.” Nucleic Acids Research 39 (Database issue): D908–12.
Silva, Tiago C., Antonio Colaprico, Catharina Olsen, Gianluca Bontempi, Michele Ceccarelli, Benjamin P. Berman, and Houtan Noushmehr. 2017. “TCGAbiolinksGUI: A Graphical User Interface to Analyze Cancer Molecular and Clinical Data.” bioRxiv. Cold Spring Harbor Labs Journals. doi:10.1101/147496.
Weinstein, John N., Eric A. Collisson, Gordon B. Mills, Kenna R. Mills Shaw, Brad A. Ozenberger, Kyle Ellrott, Ilya Shmulevich, et al. 2013. “The Cancer Genome Atlas Pan-Cancer Analysis Project.” Nature Genetics 45 (10): 1113–20.
Yao, L., H. Shen, P. W. Laird, P. J. Farnham, and B. P. Berman. 2015. “Inferring Regulatory Element Landscapes and Transcription Factor Networks from Cancer Methylomes.” Genome Biology 16 (1): 105.