The purpose of this quick start is to introduce the four newly implemented functions, toRanges
, annoGO
, annotatePeakInBatch
, and addGeneIDs
in the new version of the ChIPpeakAnno. With those wrapper functions, the annotation of ChIP-Seq peaks becomes streamlined into four major steps:
1 Read peak data with toGRanges
2 Generate annotation data with toGRanges
3 Annotate peaks with annotatePeakInBatch
4 Add additional information with addGeneIDs
Most of time user can use the default settings of the arguments of those functions. This makes the annotation pipeline straightforward and easy to use.
Note that the version of the annotation data must match with the genome used for mapping because the coordinates may differ for different genome releases. For example, if you are using Mus_musculus.v103 for mapping, you’d best also use EnsDb.Mmusculus.v103 for annotation. For more information about how to prepare the annotation data, please refer ?getAnnotation.
GRanges
with toGRanges
## First, load the ChIPpeakAnno package
library(ChIPpeakAnno)
path <- system.file("extdata", "Tead4.broadPeak", package="ChIPpeakAnno")
peaks <- toGRanges(path, format="broadPeak")
peaks[1:2]
## GRanges object with 2 ranges and 4 metadata columns:
## seqnames ranges strand | score signalValue pValue
## <Rle> <IRanges> <Rle> | <integer> <numeric> <numeric>
## peak12338 chr2 175473-176697 * | 206 668.42 -1
## peak12339 chr2 246412-246950 * | 31 100.23 -1
## qValue
## <numeric>
## peak12338 -1
## peak12339 -1
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
toGRanges
library(EnsDb.Hsapiens.v75)
annoData <- toGRanges(EnsDb.Hsapiens.v75)
annoData[1:2]
## GRanges object with 2 ranges and 1 metadata column:
## seqnames ranges strand | gene_name
## <Rle> <IRanges> <Rle> | <character>
## ENSG00000223972 chr1 11869-14412 + | DDX11L1
## ENSG00000227232 chr1 14363-29806 - | WASH7P
## -------
## seqinfo: 273 sequences from 2 genomes (hg19, GRCh37)
annotatePeakInBatch
## keep the seqnames in the same style
seqlevelsStyle(peaks) <- seqlevelsStyle(annoData)
## do annotation by nearest TSS
anno <- annotatePeakInBatch(peaks, AnnotationData=annoData)
anno[1:2]
## GRanges object with 2 ranges and 13 metadata columns:
## seqnames ranges strand | score
## <Rle> <IRanges> <Rle> | <integer>
## peak12338.ENSG00000227061 chr2 175473-176697 * | 206
## peak12339.ENSG00000143727 chr2 246412-246950 * | 31
## signalValue pValue qValue peak
## <numeric> <numeric> <numeric> <character>
## peak12338.ENSG00000227061 668.42 -1 -1 peak12338
## peak12339.ENSG00000143727 100.23 -1 -1 peak12339
## feature start_position end_position
## <character> <integer> <integer>
## peak12338.ENSG00000227061 ENSG00000227061 197569 202605
## peak12339.ENSG00000143727 ENSG00000143727 264140 278283
## feature_strand insideFeature distancetoFeature
## <character> <character> <numeric>
## peak12338.ENSG00000227061 + upstream -22096
## peak12339.ENSG00000143727 + upstream -17728
## shortestDistance fromOverlappingOrNearest
## <integer> <character>
## peak12338.ENSG00000227061 20872 NearestLocation
## peak12339.ENSG00000143727 17190 NearestLocation
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
# A pie chart can be used to demonstrate the overlap features of the peaks.
pie1(table(anno$insideFeature))