---
title: "Data for mCSEA package"
author:
- name: Jordi Martorell-Marugán
  affiliation:
  - Bioinformatics and Health Data Science. GENYO, Centre for Genomics and Oncological Research
- name: Raúl López-Domínguez
  affiliation:
  - Genomics Unit. GENYO, Centre for Genomics and Oncological Research
  email: raul.lopez@genyo.es
- name: Pedro Carmona-Sáez
  affiliation:
  - Bioinformatics and Health Data Science. GENYO, Centre for Genomics and Oncological Research
  email: pedro.carmona@genyo.es
package: mCSEAdata
date: "`r doc_date()`"
abstract: >
    The _mCSEAdata_ package contains the files needed to run the core analysis
    in the _mCSEA_ package. It also contains example data used by _mCSEA_ to
    show its functionality.
output:
  BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{Data for mCSEA package}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

# Package contents

```{r}
library(mCSEAdata)
data(mcseadata)
data(bandTable)
```

First, **betaTest**, **phenoTest** and **exprTest** are the objects needed to
run the examples in the _mCSEA_ package. **betaTest** is a matrix with the
beta-values of 9241 EPIC probes for 20 samples. **exprTest** contains the
expression of a subset of 100 genes in bone marrow samples from 10 healthy
individuals and 10 leukemia patients. **phenoTest** is a data frame with the
explanatory variable and covariates associated with the samples.

```{r}
class(betaTest)
dim(betaTest)
head(betaTest, 3)

class(phenoTest)
dim(phenoTest)
head(phenoTest, 3)

class(exprTest)
dim(exprTest)
head(exprTest, 3)
```

In addition, there are 9 association objects. Each one is a list of features
with their associated CpG probes for one platform (450k, EPIC or EPICv2). The
features included are promoters (**assocPromoters450k**,
**assocPromotersEPIC** and **assocPromotersEPICv2**), gene bodies
(**assocGenes450k**, **assocGenesEPIC** and **assocGenesEPICv2**) and CpG
islands (**assocCGI450k**, **assocCGIEPIC** and **assocCGIEPICv2**). These
objects are used internally by the _mCSEATest()_ function in the _mCSEA_
package.

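As a rough illustration of how a list of this kind can be queried, the sketch
below uses a small made-up list instead of the package data. It assumes that
each element is a character vector of probe IDs named after its feature. The
object `toyAssoc`, the helper `featuresOfProbe()`, the feature names `GENE_A`
and `GENE_B`, and every probe ID other than cg00212031 are hypothetical and
are not part of _mCSEAdata_.

```{r}
# Toy stand-in for an association object (hypothetical contents, not package
# data): a named list mapping each feature to a character vector of probe IDs.
toyAssoc <- list(
    TTTY14 = c("cg00212031", "cg00000001"),
    GENE_A = c("cg00000002", "cg00000001"),
    GENE_B = character(0)
)

# Probes associated with one feature
toyAssoc[["TTTY14"]]

# Number of probes associated with each feature
probesPerFeature <- lengths(toyAssoc)
probesPerFeature

# Reverse lookup (hypothetical helper): features that contain a given probe
featuresOfProbe <- function(assoc, probe) {
    hasProbe <- vapply(assoc, function(probes) probe %in% probes, logical(1))
    names(assoc)[hasProbe]
}
featuresOfProbe(toyAssoc, "cg00000001")
```

The same calls should work on the real association objects if they follow
this layout, since a probe may be associated with more than one feature.
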
```{r}
class(assocPromoters450k)
length(assocPromoters450k)
head(assocPromoters450k, 3)

class(assocGenes450k)
length(assocGenes450k)
head(assocGenes450k, 3)

class(assocCGI450k)
length(assocCGI450k)
head(assocCGI450k, 3)

class(assocPromotersEPIC)
length(assocPromotersEPIC)
head(assocPromotersEPIC, 3)

class(assocGenesEPIC)
length(assocGenesEPIC)
head(assocGenesEPIC, 3)

class(assocCGIEPIC)
length(assocCGIEPIC)
head(assocCGIEPIC, 3)

class(assocPromotersEPICv2)
length(assocPromotersEPICv2)
head(assocPromotersEPICv2, 3)

class(assocGenesEPICv2)
length(assocGenesEPICv2)
head(assocGenesEPICv2, 3)

class(assocCGIEPICv2)
length(assocCGIEPICv2)
head(assocCGIEPICv2, 3)
```

There are also 3 GRanges objects with the locations of the 450K, EPIC and
EPICv2 probes, used by the _mCSEAPlot()_ and _mCSEAIntegrate()_ functions:

```{r, message = FALSE}
class(annot450K)
head(annot450K, 3)

class(annotEPIC)
head(annotEPIC, 3)

class(annotEPICv2)
head(annotEPICv2, 3)
```

Finally, the **bandTable** data set contains two objects with chromosome band
information and centromere locations, one for each genome build (hg19 and
hg38). They are used by the _mCSEAPlot()_ function to plot the chromosome
track.

```{r}
head(bandTablehg19)
head(bandTablehg38)
```

# Sources

* Example objects:
    + **betaTest** contains simulated beta-values for EPIC platform probes.
    + **exprTest** contains expression data from leukemia and healthy
    patients extracted from the `r Biocpkg("leukemiaEset")` package.
    + **phenoTest** contains arbitrary phenotypes for each sample.

* Association objects: They were all constructed from the annotation data in
the `r Biocpkg("IlluminaHumanMethylation450kanno.ilmn12.hg19")`,
`r Biocpkg("IlluminaHumanMethylationEPICanno.ilm10b2.hg19")` and
`r Biocpkg("IlluminaHumanMethylationEPICv2anno.20a1.hg38")` packages. For that
purpose, an _RGChannelSet_ object was obtained with the `r Biocpkg("minfi")`
package and the _getAnnotation()_ function was applied to that object to get
the annotation DataFrame.
That was done for all three platforms (450k, EPIC and EPICv2). The annotation
DataFrame contains several fields for each CpG probe, and we used them to
associate each probe with one or more promoters, gene bodies or CpG islands
following this scheme:

Region type | mCSEAdata association objects | Column from annotation DataFrame used | Column values | Feature name column
----------- | ----------------------------- | ------------------------------------- | ------------- | -------------------
Promoters | assocPromoters450k, assocPromotersEPIC and assocPromotersEPICv2 | UCSC_RefGene_Group | TSS1500, TSS200, 5'UTR, 1stExon or exon_1 | UCSC_RefGene_Name
Gene bodies | assocGenes450k, assocGenesEPIC and assocGenesEPICv2 | UCSC_RefGene_Group | Body | UCSC_RefGene_Name
CpG Islands | assocCGI450k, assocCGIEPIC and assocCGIEPICv2 | Relation_to_Island | Island, N_Shore, S_Shore, N_Shelf or S_Shelf | Islands_Name

For instance, the _cg00212031_ probe from the 450k platform has the following
data in the annotation DataFrame:

UCSC_RefGene_Group | UCSC_RefGene_Name | Relation_to_Island | Islands_Name
------------------ | ----------------- | ------------------ | ------------
TSS200 | TTTY14 | Island | chrY:21238448-21240005

So this probe is associated with the TTTY14 promoter in the
assocPromoters450k object and with the chrY:21238448-21240005 CpG island in
the assocCGI450k object.

* Annotation objects (**annot450K**, **annotEPIC** and **annotEPICv2**): They
were all constructed with the `r Biocpkg("minfi")` package. An _RGChannelSet_
object was obtained for each platform and the _getLocations()_ function was
applied to each of these objects.

* **bandTablehg19** and **bandTablehg38**: They were constructed with the
`r Biocpkg("Gviz")` package, specifically with the _IdeogramTrack()_ function.

# Session info

```{r sessionInfo, echo=FALSE}
sessionInfo()
```