---
title: "3.4 - Motif enrichment analysis on the selected probes"
output:
html_document:
self_contained: true
number_sections: no
theme: flatly
highlight: tango
mathjax: default
toc: true
toc_float: true
toc_depth: 2
css: style.css
bibliography: bibliography.bib
vignette: >
%\VignetteIndexEntry{"3.4 - Motif enrichment analysis on the selected probes"}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r, echo = FALSE,hide=TRUE, message=FALSE, warning=FALSE}
library(ELMER.data)
library(ELMER)
library(DT)
library(dplyr)
library(BiocStyle)
```
# Motif enrichment analysis on the selected probes
## Introduction
This step is to identify enriched motif in a set of probes which is carried out by
function `get.enriched.motif`.
## Description
In order to identify enriched motifs and potential upstream regulatory TFs, all probes with occurring in significant probe-gene pairs are combined for motif enrichment analysis. HOMER (Hypergeometric Optimization of Motif EnRichment) [@heinz2010simple] is used to find motif occurrences in a $\pm 250bp$ region around each probe, using HOCOMOCO (HOmo sapiens COmprehensive MOdel COllection) v11 [@kulakovskiy2016hocomoco]. Transcription factor (TF) binding models are available at https://hocomoco.autosome.org/downloads. HOCOMOCO is the most comprehensive TFBS database and is consistently updated, marking an improvement over ELMER version 1.
For each probe set tested (i.e. the set of all probes occurring in significant probe-gene pairs), we quantify enrichments using Fisher's exact test (where $a$ is the number of probes within the selected probe set that contains one or more motif occurrences; $b$ is the number of probes within the selected probe set that do not contain a motif occurrence; $c$ and $d$ are the same counts within
the entire array probe set drawn from the same set of distal-only probes using the same definition as the primary analysis) and multiple testing correction with the Benjamini-Hochberg procedure [@fisher].
A probe set was considered significantly enriched
for a particular motif if the 95\% confidence interval of the Odds Ratio was greater than $1.1$ (specified by option `lower.OR`, $1.1$ is default), the motif
occurred at least 10 times (specified by option `min.incidence`, $10$ is default) in
the probe set and $FDR < 0.05$.
# Function arguments
Main get.pair arguments
| Argument | Description |
|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| data | A multi Assay Experiment from createMAE function. If set and probes.motif/background probes are missing this will be used to get this other two arguments correctly. This argument is not require, you can set probes.motif and the backaground.probes manually. |
| probes | A vector lists the name of probes to define the set of probes in which motif enrichment OR and confidence interval will be calculated. |
| lower.OR | A number specifies the smallest lower boundary of 95% confidence interval for Odds Ratio. The motif with higher lower boudnary of 95% confidence interval for Odds Ratio than the number are the significantly enriched motifs (detail see reference). |
| min.incidence | A non-negative integer specifies the minimum incidence of motif in the given probes set. 10 is default. |
# Example of use
```{r,eval=TRUE, message=FALSE, warning = FALSE,results = "hide"}
# Load results from previous sections
mae <- get(load("mae.rda"))
sig.diff <- read.csv("result/getMethdiff.hypo.probes.significant.csv")
pair <- read.csv("result/getPair.hypo.pairs.significant.csv")
head(pair) # significantly hypomethylated probes with putative target genes
# Identify enriched motif for significantly hypomethylated probes which
# have putative target genes.
enriched.motif <- get.enriched.motif(
data = mae,
probes = pair$Probe,
dir.out = "result",
label = "hypo",
min.incidence = 10,
lower.OR = 1.1
)
```
```{r,eval=TRUE, message=FALSE, warning = FALSE}
names(enriched.motif) # enriched motifs
head(enriched.motif[names(enriched.motif)[1]]) ## probes in the given set that have the first motif.
# get.enriched.motif automatically save output files.
# getMotif.hypo.enriched.motifs.rda contains enriched motifs and the probes with the motif.
# getMotif.hypo.motif.enrichment.csv contains summary of enriched motifs.
dir(path = "result", pattern = "getMotif")
# motif enrichment figure will be automatically generated.
dir(path = "result", pattern = "motif.enrichment.pdf")
```
# Bibliography