derfinderPlot 1.34.0
R
is an open-source statistical environment which can be easily modified to enhance its functionality via packages. derfinderPlot is a R
package available via the Bioconductor repository for packages. R
can be installed on any operating system from CRAN after which you can install derfinderPlot by using the following commands in your R
session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("derfinderPlot")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
derfinderPlot is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. A derfinderPlot user is not expected to deal with those packages directly but will need to be familiar with derfinder and for some plots with ggbio.
If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.
As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R
and Bioconductor
have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the derfinder
or derfinderPlot
tags and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.
We hope that derfinderPlot will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info
citation("derfinderPlot")
## To cite package 'derfinderPlot' in publications use:
##
## Collado-Torres L, Jaffe AE, Leek JT (2017). _derfinderPlot: Plotting
## functions for derfinder_. doi:10.18129/B9.bioc.derfinderPlot
## <https://doi.org/10.18129/B9.bioc.derfinderPlot>,
## https://github.com/leekgroup/derfinderPlot - R package version
## 1.34.0, <http://www.bioconductor.org/packages/derfinderPlot>.
##
## Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B,
## Irizarry RA, Leek JT, Jaffe AE (2017). "Flexible expressed region
## analysis for RNA-seq with derfinder." _Nucl. Acids Res._.
## doi:10.1093/nar/gkw852 <https://doi.org/10.1093/nar/gkw852>,
## <http://nar.oxfordjournals.org/content/early/2016/09/29/nar.gkw852>.
##
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
derfinderPlot (Collado-Torres, Jaffe, and Leek, 2017) is an addon package for derfinder (Collado-Torres, Nellore, Frazee, Wilks, Love, Langmead, Irizarry, Leek, and Jaffe, 2017) with functions that allow you to visualize the results.
While the functions in derfinderPlot assume you generated the data with derfinder, they can be used with other GRanges
objects properly formatted.
The functions in derfinderPlot are:
plotCluster()
is a tailored ggbio (Yin, Cook, and Lawrence, 2012) plot that shows all the regions in a cluster (defined by distance). It shows the base-level coverage for each sample as well as the mean for each group. If these regions overlap any known gene, the gene and the transcript annotation is displayed.plotOverview()
is another tailored ggbio (Yin, Cook, and Lawrence, 2012) plot showing an overview of the whole genome. This plot can be useful to observe if the regions are clustered in a subset of a chromosome. It can also be used to check whether the regions match predominantly one part of the gene structure (for example, 3’ overlaps).plotRegionCoverage()
is a fast plotting function using R
base graphics that shows the base-level coverage for each sample inside a specific region of the genome. If the region overlaps any known gene or intron, the information is displayed. Optionally, it can display the known transcripts. This function is most likely the easiest to use with GRanges
objects from other packages.As an example, we will analyze a small subset of the samples from the BrainSpan Atlas of the Human Brain (BrainSpan, 2011) publicly available data.
We first load the required packages.
## Load libraries
suppressPackageStartupMessages(library("derfinder"))
## Warning: replacing previous import 'utils::findMatches' by
## 'S4Vectors::findMatches' when loading 'AnnotationDbi'
library("derfinderData")
library("derfinderPlot")
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
For this example, we created a small table with the relevant phenotype data for 12 samples: 6 from fetal samples and 6 from adult samples. We chose at random a brain region, in this case the primary auditory cortex (core) and for the example we will only look at data from chromosome 21. Other variables include the age in years and the gender. The data is shown below.
library("knitr")
## Get pheno table
pheno <- subset(brainspanPheno, structure_acronym == "A1C")
## Display the main information
p <- pheno[, -which(colnames(pheno) %in% c(
"structure_acronym",
"structure_name", "file"
))]
rownames(p) <- NULL
kable(p, format = "html", row.names = TRUE)
gender | lab | Age | group | |
---|---|---|---|---|
1 | M | HSB114.A1C | -0.5192308 | fetal |
2 | M | HSB103.A1C | -0.5192308 | fetal |
3 | M | HSB178.A1C | -0.4615385 | fetal |
4 | M | HSB154.A1C | -0.4615385 | fetal |
5 | F | HSB150.A1C | -0.5384615 | fetal |
6 | F | HSB149.A1C | -0.5192308 | fetal |
7 | F | HSB130.A1C | 21.0000000 | adult |
8 | M | HSB136.A1C | 23.0000000 | adult |
9 | F | HSB126.A1C | 30.0000000 | adult |
10 | M | HSB145.A1C | 36.0000000 | adult |
11 | M | HSB123.A1C | 37.0000000 | adult |
12 | F | HSB135.A1C | 40.0000000 | adult |
We can load the data from derfinderData (Collado-Torres, Jaffe, and Leek, 2023) by first identifying the paths to the BigWig files with derfinder::rawFiles()
and then loading the data with derfinder::fullCoverage()
.
## Determine the files to use and fix the names
files <- rawFiles(system.file("extdata", "A1C", package = "derfinderData"),
samplepatt = "bw", fileterm = NULL
)
names(files) <- gsub(".bw", "", names(files))
## Load the data from disk
system.time(fullCov <- fullCoverage(files = files, chrs = "chr21"))
## 2023-04-25 16:52:51.956415 fullCoverage: processing chromosome chr21
## 2023-04-25 16:52:51.974605 loadCoverage: finding chromosome lengths
## 2023-04-25 16:52:52.008233 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB103.bw
## 2023-04-25 16:52:52.244951 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB114.bw
## 2023-04-25 16:52:52.453885 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB123.bw
## 2023-04-25 16:52:52.612772 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB126.bw
## 2023-04-25 16:52:52.723077 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB130.bw
## 2023-04-25 16:52:52.8591 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB135.bw
## 2023-04-25 16:52:52.969808 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB136.bw
## 2023-04-25 16:52:53.097058 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB145.bw
## 2023-04-25 16:52:53.232105 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB149.bw
## 2023-04-25 16:52:53.3848 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB150.bw
## 2023-04-25 16:52:53.499089 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB154.bw
## 2023-04-25 16:52:53.645381 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.17-bioc/R/site-library/derfinderData/extdata/A1C/HSB178.bw
## 2023-04-25 16:52:53.804252 loadCoverage: applying the cutoff to the merged data
## 2023-04-25 16:52:53.842834 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
## user system elapsed
## 1.881 0.057 1.959
Alternatively, since the BigWig files are publicly available from BrainSpan (see here), we can extract the relevant coverage data using derfinder::fullCoverage()
. Note that as of rtracklayer 1.25.16 BigWig files are not supported on Windows: you can find the fullCov
object inside derfinderData to follow the examples.
## Determine the files to use and fix the names
files <- pheno$file
names(files) <- gsub(".A1C", "", pheno$lab)
## Load the data from the web
system.time(fullCov <- fullCoverage(files = files, chrs = "chr21"))
Once we have the base-level coverage data for all 12 samples, we can construct the models. In this case, we want to find differences between fetal and adult samples while adjusting for gender and a proxy of the library size.
## Get some idea of the library sizes
sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
## 2023-04-25 16:52:53.931901 sampleDepth: Calculating sample quantiles
## 2023-04-25 16:52:54.066912 sampleDepth: Calculating sample adjustments
## Define models
models <- makeModels(sampleDepths,
testvars = pheno$group,
adjustvars = pheno[, c("gender")]
)
Next, we can find candidate differentially expressed regions (DERs) using as input the segments of the genome where at least one sample has coverage greater than 3. In this particular example, we chose a low theoretical F-statistic cutoff and used 20 permutations.
## Filter coverage
filteredCov <- lapply(fullCov, filterData, cutoff = 3)
## 2023-04-25 16:52:54.383926 filterData: originally there were 48129895 rows, now there are 90023 rows. Meaning that 99.81 percent was filtered.
## Perform differential expression analysis
suppressPackageStartupMessages(library("bumphunter"))
system.time(results <- analyzeChr(
chr = "chr21", filteredCov$chr21,
models, groupInfo = pheno$group, writeOutput = FALSE,
cutoffFstat = 5e-02, nPermute = 20, seeds = 20140923 + seq_len(20)
))
## 2023-04-25 16:52:55.606505 analyzeChr: Pre-processing the coverage data
## 2023-04-25 16:52:57.348607 analyzeChr: Calculating statistics
## 2023-04-25 16:52:57.351911 calculateStats: calculating the F-statistics
## 2023-04-25 16:52:57.551916 analyzeChr: Calculating pvalues
## 2023-04-25 16:52:57.552291 analyzeChr: Using the following theoretical cutoff for the F-statistics 5.31765507157871
## 2023-04-25 16:52:57.553561 calculatePvalues: identifying data segments
## 2023-04-25 16:52:57.561697 findRegions: segmenting information
## 2023-04-25 16:52:58.578086 findRegions: identifying candidate regions
## 2023-04-25 16:52:58.637298 findRegions: identifying region clusters
## 2023-04-25 16:52:58.794623 calculatePvalues: calculating F-statistics for permutation 1 and seed 20140924
## 2023-04-25 16:52:58.961293 findRegions: segmenting information
## 2023-04-25 16:52:58.986481 findRegions: identifying candidate regions
## 2023-04-25 16:52:59.050169 calculatePvalues: calculating F-statistics for permutation 2 and seed 20140925
## 2023-04-25 16:52:59.22494 findRegions: segmenting information
## 2023-04-25 16:52:59.248339 findRegions: identifying candidate regions
## 2023-04-25 16:52:59.294041 calculatePvalues: calculating F-statistics for permutation 3 and seed 20140926
## 2023-04-25 16:52:59.450518 findRegions: segmenting information
## 2023-04-25 16:52:59.476429 findRegions: identifying candidate regions
## 2023-04-25 16:52:59.525787 calculatePvalues: calculating F-statistics for permutation 4 and seed 20140927
## 2023-04-25 16:52:59.693424 findRegions: segmenting information
## 2023-04-25 16:52:59.717341 findRegions: identifying candidate regions
## 2023-04-25 16:52:59.76445 calculatePvalues: calculating F-statistics for permutation 5 and seed 20140928
## 2023-04-25 16:52:59.92518 findRegions: segmenting information
## 2023-04-25 16:52:59.949241 findRegions: identifying candidate regions
## 2023-04-25 16:52:59.994784 calculatePvalues: calculating F-statistics for permutation 6 and seed 20140929
## 2023-04-25 16:53:00.152121 findRegions: segmenting information
## 2023-04-25 16:53:00.187271 findRegions: identifying candidate regions
## 2023-04-25 16:53:00.232915 calculatePvalues: calculating F-statistics for permutation 7 and seed 20140930
## 2023-04-25 16:53:00.381112 findRegions: segmenting information
## 2023-04-25 16:53:00.404989 findRegions: identifying candidate regions
## 2023-04-25 16:53:00.450619 calculatePvalues: calculating F-statistics for permutation 8 and seed 20140931
## 2023-04-25 16:53:00.61495 findRegions: segmenting information
## 2023-04-25 16:53:00.638739 findRegions: identifying candidate regions
## 2023-04-25 16:53:00.685417 calculatePvalues: calculating F-statistics for permutation 9 and seed 20140932
## 2023-04-25 16:53:00.833884 findRegions: segmenting information
## 2023-04-25 16:53:00.857607 findRegions: identifying candidate regions
## 2023-04-25 16:53:00.911461 calculatePvalues: calculating F-statistics for permutation 10 and seed 20140933
## 2023-04-25 16:53:01.059691 findRegions: segmenting information
## 2023-04-25 16:53:01.083647 findRegions: identifying candidate regions
## 2023-04-25 16:53:01.129537 calculatePvalues: calculating F-statistics for permutation 11 and seed 20140934
## 2023-04-25 16:53:01.292764 findRegions: segmenting information
## 2023-04-25 16:53:01.316476 findRegions: identifying candidate regions
## 2023-04-25 16:53:01.3632 calculatePvalues: calculating F-statistics for permutation 12 and seed 20140935
## 2023-04-25 16:53:01.524999 findRegions: segmenting information
## 2023-04-25 16:53:01.558654 findRegions: identifying candidate regions
## 2023-04-25 16:53:01.60663 calculatePvalues: calculating F-statistics for permutation 13 and seed 20140936
## 2023-04-25 16:53:01.762706 findRegions: segmenting information
## 2023-04-25 16:53:01.78673 findRegions: identifying candidate regions
## 2023-04-25 16:53:01.833941 calculatePvalues: calculating F-statistics for permutation 14 and seed 20140937
## 2023-04-25 16:53:02.005339 findRegions: segmenting information
## 2023-04-25 16:53:02.02975 findRegions: identifying candidate regions
## 2023-04-25 16:53:02.076789 calculatePvalues: calculating F-statistics for permutation 15 and seed 20140938
## 2023-04-25 16:53:02.234826 findRegions: segmenting information
## 2023-04-25 16:53:02.259419 findRegions: identifying candidate regions
## 2023-04-25 16:53:02.316828 calculatePvalues: calculating F-statistics for permutation 16 and seed 20140939
## 2023-04-25 16:53:02.468816 findRegions: segmenting information
## 2023-04-25 16:53:02.492605 findRegions: identifying candidate regions
## 2023-04-25 16:53:02.538512 calculatePvalues: calculating F-statistics for permutation 17 and seed 20140940
## 2023-04-25 16:53:02.700124 findRegions: segmenting information
## 2023-04-25 16:53:02.724091 findRegions: identifying candidate regions
## 2023-04-25 16:53:02.772057 calculatePvalues: calculating F-statistics for permutation 18 and seed 20140941
## 2023-04-25 16:53:02.933803 findRegions: segmenting information
## 2023-04-25 16:53:02.967156 findRegions: identifying candidate regions
## 2023-04-25 16:53:03.013512 calculatePvalues: calculating F-statistics for permutation 19 and seed 20140942
## 2023-04-25 16:53:03.169222 findRegions: segmenting information
## 2023-04-25 16:53:03.193487 findRegions: identifying candidate regions
## 2023-04-25 16:53:03.240727 calculatePvalues: calculating F-statistics for permutation 20 and seed 20140943
## 2023-04-25 16:53:03.414072 findRegions: segmenting information
## 2023-04-25 16:53:03.438164 findRegions: identifying candidate regions
## 2023-04-25 16:53:03.509835 calculatePvalues: calculating the p-values
## 2023-04-25 16:53:03.586116 analyzeChr: Annotating regions
## No annotationPackage supplied. Trying org.Hs.eg.db.
## Loading required package: org.Hs.eg.db
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Getting TSS and TSE.
## Getting CSS and CSE.
## Getting exons.
## Annotating genes.
## ...
## user system elapsed
## 55.698 1.303 57.004
## Quick access to the results
regions <- results$regions$regions
## Annotation database to use
suppressPackageStartupMessages(library("TxDb.Hsapiens.UCSC.hg19.knownGene"))
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
plotOverview()
Now that we have obtained the main results using derfinder, we can proceed to visualizing the results using derfinderPlot. The easiest to use of all the functions is plotOverview()
which takes a set of regions and annotation information produced by bumphunter::matchGenes()
.
Figure 1 shows the candidate DERs colored by whether their q-value was less than 0.10 or not.
## Q-values overview
plotOverview(regions = regions, annotation = results$annotation, type = "qval")
## 2023-04-25 16:53:52.761346 plotOverview: assigning chromosome lengths from hg19!
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
Figure 2 shows the candidate DERs colored by the type of gene feature they are nearest too.
## Annotation overview
plotOverview(
regions = regions, annotation = results$annotation,
type = "annotation"
)
## 2023-04-25 16:53:54.337801 plotOverview: assigning chromosome lengths from hg19!
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
In this particular example, because we are only using data from one chromosome the above plot is not as informative as in a real case scenario. However, with this plot we can quickly observe that nearly all of the candidate DERs are inside an exon.
plotRegionCoverage()
The complete opposite of visualizing the candidate DERs at the genome-level is to visualize them one region at a time. plotRegionCoverage()
allows us to do this quickly for a large number of regions.
Before using this function, we need to process more detailed information using two derfinder functions: annotateRegions()
and getRegionCoverage()
as shown below.
## Get required information for the plots
annoRegs <- annotateRegions(regions, genomicState$fullGenome)
## 2023-04-25 16:53:55.279237 annotateRegions: counting
## 2023-04-25 16:53:55.365254 annotateRegions: annotating
regionCov <- getRegionCoverage(fullCov, regions)
## 2023-04-25 16:53:55.494981 getRegionCoverage: processing chr21
## 2023-04-25 16:53:55.5522 getRegionCoverage: done processing chr21
Once we have the relevant information we can proceed to plotting the first 10 regions. In this case, we will supply plotRegionCoverage()
with the information it needs to plot transcripts overlapping these 10 regions (Figures ??, ??, ??, ??, ??, ??, ??, ??, ??, ??).
## Plot top 10 regions
plotRegionCoverage(
regions = regions, regionCoverage = regionCov,
groupInfo = pheno$group, nearestAnnotation = results$annotation,
annotatedRegions = annoRegs, whichRegions = 1:10, txdb = txdb, scalefac = 1,
ask = FALSE, verbose = FALSE
)