---
title: "Lab 1.6: Annotation Resources"
output:
BiocStyle::html_document:
toc: true
vignette: >
% \VignetteIndexEntry{Lab 1.6: Annotation Resources}
% \VignetteEngine{knitr::rmarkdown}
---
```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```
```{r setup, echo=FALSE, warning=FALSE}
knitr::opts_chunk$set(
eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE"))
)
options(max.print=1000)
suppressPackageStartupMessages({
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(GenomicRanges)
library(biomaRt)
library(rtracklayer)
library(Gviz)
library(AnnotationHub)
})
```
Original Authors: Martin Morgan, Sonali Arora
Presenting Authors: [Martin Morgan][], [Lori Shepherd][]
Date: 22 July, 2019
Back: [Monday labs](lab-1-intro-to-r-bioc.html)
[Martin Morgan]: mailto: Martin.Morgan@RoswellPark.org
[Lori Shepherd]: mailto: Lori.Shepherd@RoswellPark.org
**Objective**: Learn about _Bioconductor_ resources for gene and
genome annotation.
**Lessons learned**:
- Use `org.*` packages for mapping between gene symbols.
- Use `TxDb.*` and `ensembldb` (`EnsDb.*`) packages for working with gene
models.
- Use `AnnotationHub` to easily obtain select consortium-level resources
- Access `biomaRt` and other internet-based resources for highly
flexible annotation.
- Use `VariantAnnotation` and `VariantFiltering` for annotating SNPs.
# Gene annotation
## Data packages
Organism-level ('org') packages contain mappings between a central
identifier (e.g., Entrez gene ids) and other identifiers (e.g. GenBank
or Uniprot accession number, RefSeq id, etc.). The name of an org
package is always of the form `org...db`
(e.g. [org.Sc.sgd.db][]) where `` is a 2-letter abbreviation of
the organism (e.g. `Sc` for *Saccharomyces cerevisiae*) and `` is
an abbreviation (in lower-case) describing the type of central
identifier (e.g. `sgd` for gene identifiers assigned by the
*Saccharomyces* Genome Database, or `eg` for Entrez gene ids). The
"How to use the '.db' annotation packages" vignette in the
[AnnotationDbi][] package (org packages are only one type of ".db"
annotation packages) is a key reference. The '.db' and most other
Bioconductor annotation packages are updated every 6 months.
Annotation packages usually contain an object named after the package
itself. These objects are collectively called `AnnotationDb` objects,
with more specific classes named `OrgDb`, `ChipDb` or `TranscriptDb`
objects. Methods that can be applied to these objects include
`cols()`, `keys()`, `keytypes()` and `select()`. Common operations
for retrieving annotations are summarized in the table.
| Category | Function | Description |
|------------|---------------------------------------|------------------------------------------------------------------|
| Discover | `columns()` | List the kinds of columns that can be returned |
| | `keytypes()` | List columns that can be used as keys |
| | `keys()` | List values that can be expected for a given keytype |
| | `select()` | Retrieve annotations matching `keys`, `keytype` and `columns` |
| Manipulate | `setdiff()`, `union()`, `intersect()` | Operations on sets |
| | `duplicated()`, `unique()` | Mark or remove duplicates |
| | `%in%`, `match()` | Find matches |
| | `any()`, `all()` | Are any `TRUE`? Are all? |
| | `merge()` | Combine two different \Robject{data.frames} based on shared keys |
| `GRanges*` | `transcripts()`, `exons()`, `cds()` | Features (transcripts, exons, coding sequence) as `GRanges`. |
| | `transcriptsBy()` , `exonsBy()` | Features group by gene, transcript, etc., as `GRangesList`. |
| | `cdsBy()` | |
## Internet resources
A short summary of select Bioconductor packages enabling web-based
queries is in following Table.
| Package | Description |
|-----------------------------------------------------|-------------------------------------------|
| [AnnotationHub][] | Ensembl, Encode, dbSNP, UCSC data objects |
| [biomaRt](http://biomart.org) | Ensembl and other annotations |
| [PSICQUIC](https://code.google.com/p/psicquic) | Protein interactions |
| [uniprot.ws](http://uniprot.org) | Protein annotations |
| [KEGGREST](http://www.genome.jp/kegg) | KEGG pathways |
| [SRAdb](http://www.ncbi.nlm.nih.gov/sra) | Sequencing experiments. |
| [rtracklayer](http://genome.ucsc.edu) | genome tracks. |
| [GEOquery](http://www.ncbi.nlm.nih.gov/geo/) | Array and other data |
| [ArrayExpress](http://www.ebi.ac.uk/arrayexpress/) | Array and other data |
## Exercises
**Exercise 1**: This exercise illustrates basic use of the `select'
interface to annotation packages.
1. Install and attach the [org.Hs.eg.db][] annotation package; it
contains 'symbol mapping' information for _Homo sapiens_, based on
NCBI 'Entrez' identifiers.
```{r}
library(org.Hs.eg.db)
```
2. Take a quick look at a summary of data in this package
```{r}
org.Hs.eg.db
```
3. The idea is that there are `keytypes()` that can be mapped to
different `columns()`; `keys()` can be used to see available
keys. Explore the package to see what sorts of information is
available, e.g.,
```{r}
keytypes(org.Hs.eg.db)
columns(org.Hs.eg.db)
head(keys(org.Hs.eg.db, "SYMBOL"))
```
4. There are two basic ways of extracting data from an `org.*` package
-- `mapIds()` to create a 1:1 mapping between key and a single
column, and `select()` (it's often necessary to specify this
function directly, to avoid a conflict with dplyr, as
`AnnotationDbi::select()`). Explore these functions, e.g.,
```{r}
set.seed(123)
egid <- sample(keys(org.Hs.eg.db), 6)
mapIds(org.Hs.eg.db, egid, "SYMBOL", "ENTREZID")
AnnotationDbi::select(
org.Hs.eg.db, egid, c("SYMBOL", "ENSEMBL", "GENENAME"), "ENTREZID"
)
```
5. Some key - column mappings are 1:many, e.g., Entrez ID `"3812"`
maps to 44 Ensembl Ids. What does `mapIds()` return when mapping
Entrez ID `"3812"` to Ensembl ids? Use the additional argument
`multiVals = "CharacterList"` to explore further. Compare results
to those returned by `select()`.
```{r}
egid <- "3812"
mapIds(org.Hs.eg.db, egid, "ENSEMBL", "ENTREZID")
mapIds(
org.Hs.eg.db, egid, "ENSEMBL", "ENTREZID",
multiVals = "CharacterList"
)
AnnotationDbi::select(
org.Hs.eg.db, egid, c("SYMBOL", "ENSEMBL"),
multiVals = "CharacterList"
)
```
6. It seems like it might often be useful to use the tidyverse on
return values from `mapIds()` and `select()`; explore this usage
```{r, message=FALSE}
library(tidyverse)
egid <- keys(org.Hs.eg.db) # all ENTREZIDs
mapIds(org.Hs.eg.db, egid, "SYMBOL", "ENTREZID") %>%
as_tibble() %>%
rownames_to_column("ENTREZID")
AnnotationDbi::select(
org.Hs.eg.db, egid, c("SYMBOL", "GO", "GENENAME"), "ENTREZID"
) %>% as_tibble()
```
**Exercise 2**: [biomaRt][].
Internet access required for this exercise
1. Explore the Biomart web site https://www.ensembl.org/biomart for
retrieving all kinds of genomic annotations.
Start by choosing a database (e.g., 'Ensembl Genes 92'), dataset
(e.g., 'Human genes (GRCh38.p12)'), filter (e.g., 'GENE' / 'Input
external reference' / 'Gene stable id' and enter
'ENSG00000000003'), attributes (default is ok), then press
'Results' to map from Ensembl identifier to transcript identifier.
2. Install (if necessary) and load the [biomaRt][] package. Use
`listMarts()` to see availble databases, `useMart()` to select the
mart you're interested in.
```{r}
library(biomaRt)
head(listMarts())
mart <- useMart("ENSEMBL_MART_ENSEMBL")
```
3. Use `listDatasets()` and `useDataset()` to select the _Homo
sapiens_ gene dataset.
```{r}
head(listDatasets(mart))
dataset <- useDataset("hsapiens_gene_ensembl", mart)
```
4. Use `listFilters()` to see available filters. The filter is the
type of data that you are querying with. Choose one.
```{r}
head(listFilters(dataset))
filters <- "ensembl_gene_id" # see `listFilters()`
```
5. Use `listAttrbutes()` to see available attributes. Attributes
represent the information you'd like to retrieve. Choose some!
```{r}
head(listAttributes(dataset))
attrs <- c("ensembl_gene_id", "hgnc_symbol") # see `listAttributes()`
```
6. Create a character vector of Ensembl gene ids, compose and execute
the query, transforming the result to a tibble.
```{r}
ids <- c(
"ENSG00000000003", "ENSG00000000005", "ENSG00000000419",
"ENSG00000000457", "ENSG00000000460", "ENSG00000000938"
)
tbl <- getBM(attrs, filters, ids, dataset) %>% as_tibble()
tbl
```
**Exercise 3**: [KEGGREST][]
Internet access required for this exercise
1. Explore the KEGG web site https://www.genome.jp/kegg/ KEGG is a
database of information on pathways.
2. Load the [KEGGREST][] package and discover available databases
```{r}
library(KEGGREST)
KEGGREST::listDatabases()
```
3. Use `keggList()` to query the pathway database for human pathways;
present the result as a tibble
```{r}
hsa_pathways <- keggList("pathway", "hsa") %>%
tibble(pathway = names(.), description = .)
hsa_pathways
```
4. Use `keggLink()` to recover the genes in each pathway.
```{r}
hsa_path_eg <- keggLink("pathway", "hsa") %>%
tibble(pathway = ., egid = sub("hsa:", "", names(.)))
hsa_path_eg
hsa_path_eg %>% group_by(pathway) %>% summarize(genes = list(egid))
```
5. Update the `hsa_path_eg` table to include information on gene
symbol and Ensembl id from the `org.Hs.eg.db` package. Retrieve the
relevant information using `mapIds()`. How would you deal with
entrez gene ids that map to multiple Ensembl ids?
```{r}
hsa_kegg_anno <- hsa_path_eg %>%
mutate(
symbol = mapIds(org.Hs.eg.db, egid, "SYMBOL", "ENTREZID"),
ensembl = mapIds(org.Hs.eg.db, egid, "ENSEMBL", "ENTREZID")
)
```
6. Use `left_join()` to append pathway descriptions to the
`hsa_kegg_anno` table.
```{r}
left_join(hsa_kegg_anno, hsa_pathways)
```
[KEGGREST]: https://bioconductor.org/packages/KEGGREST
# Genome annotation
There are a diversity of packages and classes available for
representing large genomes. Several include:
- `TxDb.*` and `EnsDb.*` For transcript and other genome / coordinate
annotation.
- [BSgenome][] For whole-genome representation. See
`available.genomes()` for pre-packaged genomes, and the vignette
'How to forge a BSgenome data package' in the
- [Homo.sapiens][] For integrating 'TxDb*' and 'org.*' packages.
- 'SNPlocs.*' For model organism SNP locations derived from dbSNP.
- `FaFile()` ([Rsamtools][]) for accessing indexed FASTA files.
- [ensemblVEP][] Variant effect scores.
## Transcript annotation packages
Genome-centric packages are very useful for annotations involving
genomic coordinates. It is straight-forward, for instance, to discover
the coordinates of coding sequences in regions of interest, and from
these retrieve corresponding DNA or protein coding sequences. Other
examples of the types of operations that are easy to perform with
genome-centric annotations include defining regions of interest for
counting aligned reads in RNA-seq experiments and retrieving DNA
sequences underlying regions of interest in ChIP-seq analysis, e.g.,
for motif characterization.
## _rtracklayer_
The [rtracklayer][] package allows us to query the UCSC genome
browser, as well as providing `import()` and `export()` functions for
common annotation file formats like GFF, GTF, and BED. The exercise
below illustrates some of the functionality of [rtracklayer][].
## Exercises
**Exercise 4**: `TxDb.*` packages
1. Install and attach the [TxDb.Hsapiens.UCSC.hg38.knownGene][]
package. This contains the gene models for _Homo sapiens_ based on
the 'hg38' build of the human genome, using gene annotations in the
UCSC 'knownGene' annotation track; TxDb's for more recent builds
and for different annotation tracks are available. Take a look at a
summary of the package, and create an alias for easy typing
```{r}
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
TxDb.Hsapiens.UCSC.hg38.knownGene
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
```
2. The main purpose of this package is to provide genomic coordinates
of genomic features such as `exons()`, coding sequences (`cds()`),
`transcripts()` and `genes()`. Explore, for example,
```{r}
ex <- exons(txdb)
ex
library(ggplot2)
qplot(log10(width(ex)))
ex[ which.max(width(ex)) ]
```
3. Extract all genes, and then keep only the 'standard' chromosomes
1:22, X, Y, and M. Use `table()` of `seqnames()` to determine how
many genes are on each chromosome. Also do this in a dplyr way;
note that the `seqnames(gn)` need to be coerced with `as.factor()`.
```{r}
gn <- genes(txdb)
length(gn)
std <- paste0("chr", c(1:22, "X", "Y", "M"))
seqlevels(gn, pruning.mode = "coarse") <- std
length(gn)
seqlevels(gn)
table( seqnames(gn) )
tibble(chr = as.factor(seqnames(gn))) %>%
group_by(chr) %>%
summarize(n = n())
```
4. `exonsBy()` groups exons by gene or transcript; extract exons
grouped by gene. (Challenging!) can you identify genes with exons
on different chromosomes? Are there any of these genes on the
standard chromosomes?
```{r}
exByGn <- exonsBy(txdb, "gene")
##
trans <- lengths(unique(seqnames(exByGn)))
table( trans )
seqnames( exByGn[ trans > 1 ] )
##
std <- paste0("chr", c(1:22, "X", "Y", "M"))
unames <- unique(seqnames(exByGn[ trans > 1 ]))
transstd <- all(unames %in% std)
unames[transstd]
```
5. The previous exercise indicated that gene `"22947"` has exons on
both chromosomes 4 and 10. Find out more about this gene using the
[org.Hs.eg.db][] package and by searching for the gene symbol on
the NCBI web site.
```{r}
egid <- "22947"
AnnotationDbi::select(
org.Hs.eg.db, egid, c("SYMBOL", "GENENAME"), "ENTREZID"
)
```
```{r, eval = FALSE}
url <- paste0("https://www.ncbi.nlm.nih.gov/gene/", egid)
browseURL(url)
```
6. Note that the `TxDb.*` packages also support `keytypes()`,
`columns()`, and `select()` for mapping between exon, cds,
transcript, and gene identifiers.
**Exercise 5**: `BSgenome.*` packages
1. Install (if necessary) and load the [BSgenome.Hsapiens.UCSC.hg38][]
package, containing the entire sequence of the hg38 build of _Homo
sapiens_. Check out it's contents, and create a simple alias.
```{r}
library(BSgenome.Hsapiens.UCSC.hg38)
BSgenome.Hsapiens.UCSC.hg38
hg38 <- BSgenome.Hsapiens.UCSC.hg38
```
2. Genomic sequence can be retrieved by chromosome, e.g.,
`hg38[["chr1"]]`, or by genomic range, e.g., `getSeq(hg38,
GRanges("chr1:1000000-2000000"))`. Retrieve your favorite chunk(s)
of DNA and calculate GC content.
```{r}
dna <- getSeq(hg38, GRanges("chr1:1000000-2000000"))
letterFrequency(dna, "GC", as.prob=TRUE)
```
3. Use the `org.*`, `TxDb.*`, and `BSgenome.*` packages to retrieve
the BRCA1 exon DNA sequence.
```{r}
brca1_egid <- mapIds(org.Hs.eg.db, "BRCA1", "ENTREZID", "SYMBOL")
brca1_exons <- exonsBy(txdb, "gene")[[brca1_egid]]
getSeq(hg38, brca1_exons)
```
**Exercise 6**
This exercise uses annotation resources to go from a gene symbol
'BRCA1' through to the genomic coordinates of each transcript
associated with the gene, and finally to the DNA sequences of the
transcripts. This can be achieved using an `EnsDb` package along with
a [BSgenome][] package, or with a combination of `TxDb`, [Homo.sapiens][]
and [BSgenome][] packages. We will focus here on the former approach.
0. Use AnnotationHub to discover and retrieve a current Ensembl
annotation ('EnsDb') for _Homo sapiens_.
1. Use the `cdsBy()` function to retrieve the genomic coordinates of all coding
sequences for the gene 'BRCA1' from the [EnsDb.Hsapiens.v86][] package. To
retrieve only data for the specified gene, submit either a `GenenameFilter`
or a filter formula/expression to the function's `filter` parameter. This
avoids to extract the coding region for all genes, which takes a long time.
4. Visualize the transcripts in genomic coordinates using the [Gviz][]
package to construct a `GeneRegionTrack`, and plotting it using
`plotTracks()`.
5. Use the [Bsgenome.Hsapiens.UCSC.hg38][] package and
`extractTranscriptSeqs()` function to extract the DNA sequence of
each transcript.
**Solution**
Retrieve the coding sequences grouped by transcript for the gene of interest and
verify that each coding sequence is a multiple of 3.
```{r edb-brca1-cds, message = FALSE}
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
brca1cds <- cdsBy(edb, by = "tx", filter = ~ genename == "BRCA1")
class(brca1cds)
length(brca1cds)
brca1cds[[1]] # exons in cds
cdswidth <- width(brca1cds) # width of each exon
all((sum(cdswidth) %% 3) == 0) # sum within cds, modulus 3
```
The CDS for some transcripts is not of the expected length, how come?
Get the transcript ID of the first transcript that does have a CDS of
the wrong size and look this transcript up in the Ensembl genome
browser (http://www.ensembl.org).
```{r edb-brca1-cds-wrongsize}
tx_cds_fail <- names(brca1cds)[(sum(cdswidth) %% 3) != 0]
length(tx_cds_fail)
tx_cds_fail[1]
```
In the description of the transcript it says *CDS 5'
incomplete*. Thus, in addition to known protein coding transcripts,
Ensembl provides annotations for transcripts known to be targeted for
nonsense mediated mRNA decay or that have incomplete CDS. Such
transcripts would however not be listed in e.g. the
[TxDb.Hsapiens.UCSC.hg38.knownGene][] package.
Next we visualize the BRCA1 transcripts using [Gviz][] (this package has an
excellent vignette, `vignette("Gviz")`)
```{r edb-brca1-Gviz, message=FALSE}
library(Gviz)
## Use the function from the ensembldb package to extract the data in the
## format suitable for Gviz
grt <- getGeneRegionTrackForGviz(edb, filter = ~genename == "BRCA1")
plotTracks(list(GenomeAxisTrack(), GeneRegionTrack(grt)))
```
Extract the coding sequences of each transcript. `EnsDb` databases provide
annotations from Ensembl and use hence Ensembl style chromosome names (such as
"Y") while the `BSgenome` package is based on UCSC annotations that use a naming
style that prepends a "chr" to each chromosome name (e.g. "chrY"). Change thus
the `seqlevelsStyle` from the default UCSC chromosome naming to Ensembl naming
style.
```{r edb-cds-to-seq}
library(BSgenome.Hsapiens.UCSC.hg19)
genome <- BSgenome.Hsapiens.UCSC.hg19
## Change the seqlevelsStyle from UCSC to Ensembl
seqlevelsStyle(genome) <- "Ensembl"
tx_seq <- extractTranscriptSeqs(genome, brca1cds)
tx_seq
```
We can also inspect the CDS sequence for the transcripts with incomplete
CDS. Many of them do not start with a start codon hence indicating that the CDS
is incomplete on their 5' end.
```{r edb-fail-cds}
tx_seq[tx_cds_fail]
```
Intron coordinates can be identified by first calculating the range of
the genome (from the start of the first exon to the end of the last
exon) covered by each transcript, and then taking the (algebraic) set
difference between this and the genomic coordinates covered by each
exon
```{r edb-introns}
introns <- psetdiff(unlist(range(brca1cds)), brca1cds)
```
Retrieve the intronic sequences with `getSeq()` (these are *not*
assembled, the way that `extractTranscriptSeqs()` assembles exon
sequences into mature transcripts); note that introns start and end
with the appropriate acceptor and donor site sequences.
Unfortunately, UCSC and Ensembl do also use different names for the genome
assembly. Change the genome name for the `introns` object to matche the one from
the `genome` object.
```{r edb-intron-seqs}
unique(genome(genome))
genome(introns)
## Change the genome name on introns to match the one from the
## BSgenome package
genome(introns) <- c(`17` = unique(genome(genome)))
seq <- getSeq(genome, introns)
names(seq)
seq[["ENST00000352993"]] # 20 introns
```
**Exercise 7**
Internet access required for this exercise
Here we use [rtracklayer][] to retrieve estrogen receptor binding
sites identified across cell lines in the ENCODE project. We focus on
binding sites in the vicinity of a particularly interesting region.
1. Define our region of interest by creating a `GRanges` instance with
appropriate genomic coordinates. Our region corresponds to 10Mb up-
and down-stream of a particular gene.
2. Create a session for the UCSC genome browser
3. Query the UCSC genome browser for ENCODE estrogen receptor
ERalphaa transcription marks; identifying the
appropriate track, table, and transcription factor requires
biological knowledge and detective work.
4. Visualize the location of the binding sites and their scores;
annotate the mid-point of the region of interest.
**Solution**
Define the region of interest
```{r rtracklayer-roi}
library(GenomicRanges)
roi <- GRanges("chr10", IRanges(92106877, 112106876, names="ENSG00000099194"))
```
Create a session
```{r rtracklayer-session, eval=FALSE}
library(rtracklayer)
session <- browserSession()
```
Query the UCSC for a particular track, table, and transcription
factor, in our region of interest
```{r rtracklayer-marks, eval=FALSE}
trackName <- "wgEncodeRegTfbsClusteredV2"
tableName <- "wgEncodeRegTfbsClusteredV2"
trFactor <- "ERalpha_a"
ucscTable <- getTable(ucscTableQuery(session, track=trackName,
range=roi, table=tableName, name=trFactor))
```
Visualize the result
```{r rtracklayer-plot, fig.height=3, eval=FALSE}
plot(score ~ chromStart, ucscTable, pch="+")
abline(v=start(roi) + (end(roi) - start(roi) + 1) / 2, col="blue")
```
# AnnotationHub
[AnnotationHub][] is a data base of large-scale whole-genome
resources, e.g., regulatory elements from the Roadmap Epigenomics
project, Ensembl GTF and FASTA files for model and other organisms,
and the NHLBI [grasp2db][] data base of GWAS results. There are many interesting ways in which these resources can be used. Examples include
- Easily access and import Roadmap Epigenomics files.
- 'liftOver' genomic range-based annotations from one coordinate
system (e.g, hg38) to another (e.g., GRCh 38);
- Create TranscriptDb and BSgenome-style annotation resources 'on the
fly' for a diverse set of organisms.
- Programmatically access the genomic coordiantes of clinically
relevant variants cataloged in dbSNP.
Unfortunately, [AnnotationHub][] makes extensive use of internet
resources and so we will not pursue it in this course; see the
vignettes that come with the pacakge, for instance
[_AnnotationHub_ HOW-TOs][AH-howto].
[AH-howto]: http://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub-HOWTO.html
# Annotating variants
_Bioconductor_ provides facilities for reading VCF files. These work
very well with the annotation resources described above, so for
instance it is straight-forward to identify variants in coding or
other regions of interest.
To develop a sense of the capabilities available, work through the
[VariantAnnotation][] vignette 'Introduction to Variant Annotation',
and the [VariantFiltering][] vignette.
[AnnotationDbi]: http://bioconductor.org/packages/AnnotationDbi
[AnnotationHub]: http://bioconductor.org/packages/AnnotationHub
[BSgenome]: http://bioconductor.org/packages/release/BSgenome
[Bsgenome.Hsapiens.UCSC.hg38]: http://bioconductor.org/packages/Bsgenome.Hsapiens.UCSC.hg38
[grasp2db]: http://bioconductor.org/packages/release/grasp2db
[Gviz]: http://bioconductor.org/packages/release/Gviz
[Homo.sapiens]: http://bioconductor.org/packages/release/Homo.sapiens
[Rsamtools]: http://bioconductor.org/packages/Rsamtools
[TxDb.Hsapiens.UCSC.hg38.knownGene]: http://bioconductor.org/packages/TxDb.Hsapiens.UCSC.hg38.knownGene
[VariantAnnotation]: http://bioconductor.org/packages/VariantAnnotation
[VariantFiltering]: http://bioconductor.org/packages/VariantFiltering
[biomaRt]: http://bioconductor.org/packages/biomaRt
[org.Hs.eg.db]: http://bioconductor.org/packages/org.Hs.eg.db
[org.Sc.sgd.db]: http://bioconductor.org/packages/org.Sc.sgd.db
[rtracklayer]: http://bioconductor.org/packages/release/rtracklayer
[EnsDb.Hsapiens.v86]: http://bioconductor.org/packages/EnsDb.Hsapiens.v86
# End matter
## Session Info
```{r}
sessionInfo()
```
## Acknowledgements
Research reported in this tutorial was supported by the National Human
Genome Research Institute and the National Cancer Institute of the
National Institutes of Health under award numbers U41HG004059 and
U24CA180996.
This project has received funding from the European Research Council
(ERC) under the European Union's Horizon 2020 research and innovation
programme (grant agreement number 633974)