%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{3. Methylation Arrays -- Lab}
# Methylation Arrays -- Lab
Epigenomics 2014
Author: Martin Morgan (mtmorgan@fhcrc.org)
Date: 24 August, 2014
```{r, echo=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```
```{r, echo=FALSE}
suppressPackageStartupMessages({
require(minfi)
require(minfiData)
})
```
This case study takes a quick tour through the
[minfi](http://bioconductor.org/packages/release/bioc/html/minfi.html)
Bioconductor package. The main goal is to become familiar with the use
of Bioconductor objects and methods; see the
[vignette](http://bioconductor.org/packages/release/bioc/vignettes/epivizr/inst/doc/minfi.pdf)
accompanying the minfi package for more background on analysis of
Illumina arrays for methylation analysis.
Start by attaching the `minfi` and `minfiData` packages. Use
`browseVignettes("minfi")` to access the vignette for additional
background information.
```{r, eval=FALSE}
require(minfi)
require(minfiData)
browseVignettes("minfi")
```
The first step in any work flow is to read in the data. A sample data
set is provided at the following location.
```{r}
baseDir <- system.file("extdata", package = "minfiData")
baseDir
dir(baseDir)
dir(file.path(baseDir, "5723646052"))
```
Of course your own data would be at another location, and you might
enter the path (with 'tab completion') instead of using
`system.file()`. A typical organization is that each 'slide'
(containing 12 arrays) is stored in a separate directory. The
top-level directory contains a `.csv` file describing the samples;
inside each slide directory are IDAT files representing the output of
the Illumina scanner.
Next read in the sample sheet, and then the raw probe-level data. Take
a moment to use R to explore the sample sheet. Read the 'man' page for
`read.450k.sheet` and `read.450k.exp` to see what options are
available.
```{r}
## 'pData'
targets <- read.450k.sheet(baseDir)
head(targets)
## 'raw' probe-level data
RGset <- read.450k.exp(base = baseDir, targets = targets)
```
As a basic quality assessment, visualize the distribution of beta
values across each array, coloring the density functions by sample.
Are there any concerns about the data?
```{r}
## Basic QA -- comparable densities across samples?
densityPlot(RGset, sampGroups = RGset$Sample_Group,
main = "Beta", xlab = "Beta")
```
A _technical artifact_ is that probe intensities differ depending on
their sequence composition, so it is necessary to perform a
'background correction'. Also, the distribution of probe intensities
differ from one another as a consequence of sample preparation steps,
e.g., slightly different initial amounts of DNA from one sample
compared to another. Basic steps in microarray analysis (many of these
steps are shared by other high-throughput assays) are therefore
_background correction_ and between-array _normalization_. Use the
`preprocessIllumina()` command to perform these steps. Are there other
normalization strategies available in this package?
```{r}
## background correction and normalization
## like Illumina Genome Studio (other approaches available)
MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE,
normalize = "controls", reference = 2)
```
Once data are background-corrected and normalized, it is possible to
compare the vector of methylation values of each sample. Use the
`mdsPlot()` function to visualize the multidimensional relationship
between samples in reduced dimensions. Use arguments of `mdsPlot()` to
name and highlight the different sample groups.
```{r}
## How similar are the samples to one another?
mdsPlot(MSet.norm, numPositions = 1000, sampGroups = MSet.norm$Sample_Group,
sampNames = MSet.norm$Sample_Name)
```
Take a portion of the data (the first 100,000 probes), retrieve the
logit-transformed beta values, and then use `dmpFinder()` CpGs
where methylation status is associated with sample group. From the
help page, references, and your own knowledge, any ideas about
`shrinkVar`?
```{r}
## Identify probes with methylation status differing between groups
mset <- MSet.norm[1:100000,]
## logit(beta)
M <- getM(mset, type = "beta", betaThreshold = 0.001)
dmp <- dmpFinder(M, pheno=mset$Sample_Group, type="categorical")
head(dmp)
```
Visualize differential methylation using `plotCpg()`
```{r}
plotCpg(mset, cpg=rownames(dmp)[1], pheno=mset$Sample_Group)
```
Probes interrogate genomic locations. Use `mapToGenome()` to translate
the probe identifiers to genomic coordinates. This transforms `mset`
into an object of class `SummarizedExperiment`. A
`SummarizedExperiment` is similar to an expression set, bui with
`GRanges` to annotate rows, rather than a `data.frame`. Use
`rowData()` to extract the `GRanges` from the `SummarizedExperiment`;
explore it. Add the annotations about differentially expressed probes
to the row data of the `SummarizedExperiment`.
```{r}
## Genomic locations
mset <- mset[rownames(dmp),]
mse <- mapToGenome(mset) # 'SummarizedExperiment'
rowData(mse)
mcols(rowData(mse)) <- cbind(mcols(rowData(mse)), dmp)
```