Introduction

Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is an alternative or complementary technique to MNase-seq, DNase-seq, and FAIRE-seq for chromatin accessibility analysis, and its results are similar to those from DNase-seq and FAIRE-seq. ATAC-seq is gaining popularity because, compared to the other techniques, it does not require cross-linking, has a higher signal-to-noise ratio, requires much less biological material, and is faster and easier to perform [1].

To help researchers quickly assess the quality of ATAC-seq data, we have developed the ATACseqQC package, which generates diagnostic plots following the published guidelines [1]. It also provides functions to preprocess ATAC-seq data for subsequent peak calling.

Quick start

Here is an example using ATACseqQC with a subset of published ATAC-seq data [1]. Currently, only the BAM input file format is supported.

First, install ATACseqQC and the other packages required to run the examples. Note that the example dataset used here is from human. To analyze data from a different species or a different assembly, install the corresponding BSgenome, TxDb and phastCons packages. For example, to analyze mouse data aligned to mm10, install BSgenome.Mmusculus.UCSC.mm10, TxDb.Mmusculus.UCSC.mm10.knownGene and phastCons60way.UCSC.mm10. The phastCons60way.UCSC.mm10 package is optional and can be obtained by following the GenomicScores vignettes.

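## install BiocManager first if it is not available
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")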
BiocManager::install(c("ATACseqQC", "ChIPpeakAnno", "MotifDb", "GenomicAlignments",
           "BSgenome.Hsapiens.UCSC.hg19", "TxDb.Hsapiens.UCSC.hg19.knownGene",
           "phastCons100way.UCSC.hg19"))
## load the library
library(ATACseqQC)
## input the bamFile from the ATACseqQC package 
bamfile <- system.file("extdata", "GL1.bam", 
                        package="ATACseqQC", mustWork=TRUE)
## strip the literal ".bam" extension to get a sample label
bamfile.labels <- sub("\\.bam$", "", basename(bamfile))
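
For data from another species, the same installation pattern applies with the species-specific annotation packages. The following is a minimal sketch, assuming mouse data aligned to mm10 and the two package names given above. The variable names mm10.pkgs and missing.pkgs are illustrative only, and the install call is left commented out so that nothing is installed unintentionally.

```r
## annotation packages for mouse mm10 (names taken from the text above)
mm10.pkgs <- c("BSgenome.Mmusculus.UCSC.mm10",
               "TxDb.Mmusculus.UCSC.mm10.knownGene")
## find which of them are not yet installed
missing.pkgs <- setdiff(mm10.pkgs, rownames(installed.packages()))
missing.pkgs
## uncomment to install the missing packages (assumes BiocManager is available)
# if (length(missing.pkgs) > 0) BiocManager::install(missing.pkgs)
```

The optional phastCons60way.UCSC.mm10 resource is left out of this sketch because it is obtained through GenomicScores rather than installed this way.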

IGV snapshot

The source code of the IGVSnapshot function is available in the extdata folder of the package. To load the function, run

source(system.file("extdata", "IGVSnapshot.R", package = "ATACseqQC"))

Estimate the library complexity

#bamQC(bamfile, outPath=NULL)
estimateLibComplexity(readsDupFreq(bamfile))

Fragment size distribution

The fragment size distribution should show two features. First, a large proportion of fragments should be shorter than 100 bp, representing nucleosome-free regions. Second, the distribution should show a clear periodicity, evident in the inset figure, which is indicative of nucleosome occupancy (fragment sizes corresponding to integer multiples of nucleosomes).

## generate fragment size distribution
fragSize <- fragSizeDist(bamfile, bamfile.labels)
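
The two features described above can be illustrated without a BAM file. The following is a rough sketch in base R on simulated fragment lengths. It does not use ATACseqQC, and the component means, spreads and sizes are hypothetical values chosen only to mimic a nucleosome-free peak plus mono- and di-nucleosome peaks.

```r
## simulated fragment lengths (hypothetical values, not real ATAC-seq data)
set.seed(1)
fragLen <- round(c(rnorm(6000, mean=60, sd=15),    # nucleosome-free
                   rnorm(3000, mean=200, sd=20),   # mono-nucleosome
                   rnorm(1000, mean=400, sd=25)))  # di-nucleosome
fragLen <- fragLen[fragLen > 0]
## proportion of fragments shorter than 100 bp
propShort <- mean(fragLen < 100)
propShort
## fragment counts in 50-bp bins, to make the periodic peaks visible
binCounts <- table(cut(fragLen, breaks=seq(0, 550, by=50)))
binCounts
```

In this toy example, most short fragments come from the nucleosome-free component, and the binned counts rise again around the simulated mono- and di-nucleosome lengths, which is the kind of periodicity to look for in the real plot.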