---
title: "Data for mCSEA package"
author:
- name: Jordi Martorell-Marugán
  affiliation:
  - Bioinformatics and Health Data Science. GENYO, Centre for Genomics and Oncological Research
- name: Pedro Carmona-Sáez
  affiliation:
  - Bioinformatics and Health Data Science. GENYO, Centre for Genomics and Oncological Research
  email: pedro.carmona@genyo.es
package: mCSEAdata
date: "`r doc_date()`"
abstract: >
    The _mCSEAdata_ package contains the files needed to run the core analysis in the _mCSEA_ package. It also contains the example data used by _mCSEA_ to show its functionality.
output:
  BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{Data for mCSEA package}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

# Package contents

```{r}
library(mCSEAdata)
data(mcseadata)
data(bandTable)
```

First, **betaTest**, **phenoTest** and **exprTest** are the objects needed to run the examples in the _mCSEA_ package. **betaTest** is a matrix with the beta-values of 9241 EPIC probes for 20 samples. **exprTest** contains the expression of a subset of 100 genes in bone marrow samples from 10 healthy individuals and 10 leukemia patients. **phenoTest** is a data frame with the explanatory variable and covariates associated with the samples.

```{r}
class(betaTest)
dim(betaTest)
head(betaTest, 3)
class(phenoTest)
dim(phenoTest)
head(phenoTest, 3)
class(exprTest)
dim(exprTest)
head(exprTest, 3)
```

In addition, there are 6 association objects. Each one is a list of features with their associated 450k or EPIC CpG probes. The features included are promoters (**assocPromoters450k** and **assocPromotersEPIC**), gene bodies (**assocGenes450k** and **assocGenesEPIC**) and CpG islands (**assocCGI450k** and **assocCGIEPIC**). These objects are used internally by the _mCSEATest()_ function in the _mCSEA_ package.

```{r}
class(assocPromoters450k)
length(assocPromoters450k)
head(assocPromoters450k, 3)
class(assocGenes450k)
length(assocGenes450k)
head(assocGenes450k, 3)
class(assocCGI450k)
length(assocCGI450k)
head(assocCGI450k, 3)
class(assocPromotersEPIC)
length(assocPromotersEPIC)
head(assocPromotersEPIC, 3)
class(assocGenesEPIC)
length(assocGenesEPIC)
head(assocGenesEPIC, 3)
class(assocCGIEPIC)
length(assocCGIEPIC)
head(assocCGIEPIC, 3)
```

There are also 2 GRanges objects with the locations of the 450k and EPIC probes, used by the _mCSEAPlot()_ and _mCSEAIntegrate()_ functions:

```{r, message = FALSE}
class(annot450K)
head(annot450K, 3)
class(annotEPIC)
head(annotEPIC, 3)
```

Finally, the **bandTable** object contains chromosome band information and centromere locations. It is used by the _mCSEAPlot()_ function to plot the chromosome track.

```{r}
head(bandTable)
```

# Sources

* Example objects:
    + **betaTest** contains simulated beta-values for EPIC platform probes.
    + **exprTest** contains expression data from leukemia patients and healthy individuals, extracted from the `r Biocpkg("leukemiaEset")` package.
    + **phenoTest** contains arbitrary phenotypes for each sample.

* Association objects: They were all constructed from the annotation data in the `r Biocpkg("IlluminaHumanMethylation450kanno.ilmn12.hg19")` and `r Biocpkg("IlluminaHumanMethylationEPICanno.ilm10b2.hg19")` packages. For that purpose, an _RGChannelSet_ object was obtained with the `r Biocpkg("minfi")` package and the _getAnnotation()_ function was applied to it to get the annotation DataFrame. This was done for both the 450k and EPIC platforms.
The annotation DataFrame contains detailed information about each CpG probe, and we used that information to associate each probe with one or more promoters, gene bodies or CpG islands following this scheme:

Region type | mCSEAdata association objects | Column from annotation DataFrame used | Column values | Feature name column
----------- | ----------------------------- | ------------------------------------- | ------------- | -------------------
Promoters | assocPromoters450k and assocPromotersEPIC | UCSC_RefGene_Group | TSS1500, TSS200, 5'UTR or 1stExon | UCSC_RefGene_Name
Gene bodies | assocGenes450k and assocGenesEPIC | UCSC_RefGene_Group | Body | UCSC_RefGene_Name
CpG Islands | assocCGI450k and assocCGIEPIC | Relation_to_Island | Island, N_Shore, S_Shore, N_Shelf or S_Shelf | Islands_Name

For instance, the _cg00212031_ probe from the 450k platform has the following annotation data in the annotation DataFrame:

UCSC_RefGene_Group | UCSC_RefGene_Name | Relation_to_Island | Islands_Name
------------------ | ----------------- | ------------------ | ------------
TSS200 | TTTY14 | Island | chrY:21238448-21240005

This probe is therefore associated with the TTTY14 promoter in the assocPromoters450k object and with the chrY:21238448-21240005 CpG island in the assocCGI450k object.

* Annotation objects (**annot450K** and **annotEPIC**): They were both constructed with the `r Biocpkg("minfi")` package. An _RGChannelSet_ object was obtained for each platform and the _getLocations()_ function was applied to each of them.

* **bandTable**: It was constructed with the `r Biocpkg("Gviz")` package, specifically with the _IdeogramTrack()_ function.

# Session info

```{r sessionInfo, echo=FALSE}
sessionInfo()
```
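
# Appendix: sketch of the association scheme

The association scheme in the Sources section can be illustrated with a small base R sketch. This is not the code used to build _mCSEAdata_; it is a minimal example under stated assumptions. The toy table `annot` and the probe and gene names `cg_toy_*`, `GENEA` and `GENEB` are hypothetical; only the _cg00212031_ row comes from the example above. The sketch also assumes that a probe annotated to several transcripts stores them as semicolon-separated values, with the same number of entries in both columns.

```r
# Toy annotation table. Only the cg00212031 row comes from the vignette;
# every other probe and gene name is hypothetical.
annot <- data.frame(
  UCSC_RefGene_Group = c("TSS200", "Body;TSS1500", "Body", ""),
  UCSC_RefGene_Name  = c("TTTY14", "GENEA;GENEB", "GENEA", ""),
  row.names = c("cg00212031", "cg_toy_0001", "cg_toy_0002", "cg_toy_0003"),
  stringsAsFactors = FALSE
)

# Group values that define each region type (taken from the scheme table)
promoterGroups <- c("TSS1500", "TSS200", "5'UTR", "1stExon")
bodyGroups <- "Body"

# Assumption: multi-valued fields are semicolon-separated and parallel
groups <- strsplit(annot$UCSC_RefGene_Group, ";", fixed = TRUE)
genes  <- strsplit(annot$UCSC_RefGene_Name, ";", fixed = TRUE)
stopifnot(identical(lengths(groups), lengths(genes)))

# One row per (probe, group, gene) combination; unannotated probes drop out
long <- data.frame(
  probe = rep(rownames(annot), lengths(groups)),
  group = unlist(groups),
  gene  = unlist(genes),
  stringsAsFactors = FALSE
)

# Build a named list: feature name -> unique associated probes
buildAssoc <- function(tab, wanted) {
  hits <- tab[tab$group %in% wanted, ]
  lapply(split(hits$probe, hits$gene), unique)
}

assocPromotersToy <- buildAssoc(long, promoterGroups)
assocGenesToy <- buildAssoc(long, bodyGroups)

print(assocPromotersToy)
print(assocGenesToy)
```

The result has the same shape as the association objects in this package: a list named by feature, where each element is a character vector of probe identifiers. CpG island associations would follow the same pattern using the Relation_to_Island and Islands_Name columns.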