SGSeq
Leonard D Goldstein [1, 2]
[1] Department of Bioinformatics and Computational Biology,
[2] Department of Molecular Biology,
Genentech, Inc., South San Francisco, CA, USA.
The SGSeq package provides a framework for analyzing annotated and novel splice events from RNA-seq data. SGSeq predicts exons and splice junctions from reads aligned against a reference genome and assembles them into a genome-wide splice graph. Splice events are identified from the graph and quantified using reads spanning event boundaries. This workshop provides an introduction to SGSeq functionality, including splice event detection, quantification, annotation and visualization. The first part of the workshop illustrates a complete SGSeq workflow for analyzing a gene of interest starting from BAM files. The second part of the workshop covers exercises based on a genomewide data set previously processed with SGSeq.
library(SGSeq)
The first part of this workshop illustrates an analysis of paired-end RNA-seq data from four tumor and four normal colorectal samples, which are part of a data set published in [Seshagiri et al. 2012] (#seshagiri). For the purpose of this vignette, we created BAM files that only include reads mapping to a single gene of interest (FBXO31).
When starting a new project, SGSeq requires information on the samples to be analyzed. This information can be provided as a data.frame, which must include columns sample_name (specificying a unique name for each sample) and file_bam (specifying the location of the BAM file). Function getBamInfo can be used to extract additional library information from BAM files, including paired-end status, median read length, median insert size and the total number of read alignments. This information must be obtained once initially and can then be used for all subsequent analyses. It is essential that BAM files are generated using a splice-aware alignment program that generates the custom tag ‘XS’ indicating the direction of transcription for spliced reads. In the following, we work with a data.frame si generated from the original (complete) BAM files with function getBamInfo.
si
## sample_name file_bam paired_end read_length frag_length lib_size
## 1 N1 N1.bam TRUE 75 293 12405197
## 2 N2 N2.bam TRUE 75 197 13090179
## 3 N3 N3.bam TRUE 75 206 14983084
## 4 N4 N4.bam TRUE 75 207 15794088
## 5 T1 T1.bam TRUE 75 284 14345976
## 6 T2 T2.bam TRUE 75 235 15464168
## 7 T3 T3.bam TRUE 75 259 15485954
## 8 T4 T4.bam TRUE 75 247 15808356
The following code block sets the correct BAM file paths in the sample information for this vignette.
path <- system.file("extdata", package = "SGSeq")
si$file_bam <- file.path(path, "bams", si$file_bam)
We use the UCSC knownGene table as reference annotation, which is available as a Bioconductor annotation package TxDb.Hsapiens.UCSC.hg19.knownGene. We retain transcripts on chromosome 16, where the FBXO31 gene is located, and change chromosome names in the annotation to match chromosome names in the BAM files.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
txdb <- keepSeqlevels(txdb, "chr16")
seqlevelsStyle(txdb) <- "NCBI"
To work with transcript annotation in the SGSeq framework, we first extract exons and splice junctions from the TxDb object using function convertToTxFeatures. We only retain features overlapping the FBXO31 gene (genomic coordinates of the FBXO31 gene are stored in the GRanges object gr).
txf_ucsc <- convertToTxFeatures(txdb)
## Warning in convertToTxFeatures(txdb): Merged adjacent exons
txf_ucsc <- txf_ucsc[txf_ucsc %over% gr]
txf_ucsc
## TxFeatures object with 23 ranges and 0 metadata columns:
## seqnames ranges strand type
## <Rle> <IRanges> <Rle> <factor>
## [1] 16 [87362942, 87365116] - L
## [2] 16 [87365116, 87367492] - J
## [3] 16 [87367492, 87367892] - I
## [4] 16 [87367892, 87368910] - J
## [5] 16 [87368910, 87369063] - I
## ... ... ... ... ...
## [19] 16 [87417011, 87417394] - F
## [20] 16 [87417628, 87417700] - U
## [21] 16 [87423343, 87423454] - I
## [22] 16 [87423454, 87425689] - J
## [23] 16 [87425689, 87425708] - F
## txName geneName
## <CharacterList> <CharacterList>
## [1] uc002fjv.3,uc002fjw.3,uc010vot.2 79791
## [2] uc002fjv.3,uc002fjw.3,uc010vot.2 79791
## [3] uc002fjv.3,uc002fjw.3,uc010vot.2 79791
## [4] uc002fjv.3,uc002fjw.3,uc010vot.2 79791
## [5] uc002fjv.3,uc002fjw.3,uc010vot.2 79791
## ... ... ...
## [19] uc002fjw.3 79791
## [20] uc021tmi.1
## [21] uc010vot.2 79791
## [22] uc010vot.2 79791
## [23] uc010vot.2 79791
## -------
## seqinfo: 1 sequence from hg19 genome
SGSeq makes extensive use of the Bioconductor infrastructure for genomic ranges ([Lawrence et al. 2013] (#lawrence)). The TxFeatures class shown above extends the GRanges class with additional metadata columns. Column type can take values
Columns txName and geneName indicate the transcript and gene each feature derives from. Note that a feature can belong to more than one transcript. Accordingly these columns can store multiple values for each feature.
Metadata columns can be accessed using accessor functions named after the columns they access (e.g. use function type to obtain feature type).
If transcript annotation is not available as a TxDb object, function convertToTxFeatures can construct TxFeatures from a GRangesList of exons grouped by transcript (see Exercises (#exercises) below).
Exons stored as TxFeatures can be overlapping (e.g. due to alternative splice sites) resulting in ambiguities (e.g. when when trying to assign reads to individual exons). We therefore partition exonic regions into disjoint exon bins. Splice junctions and disjoint exon bins uniquely determine a genome-wide splice graph ([Heber et al. 2002] (#heber)). To store splice graph features, SGSeq implements the SGFeatures class.
sgf_ucsc <- convertToSGFeatures(txf_ucsc)
sgf_ucsc
## SGFeatures object with 42 ranges and 0 metadata columns:
## seqnames ranges strand type splice5p splice3p
## <Rle> <IRanges> <Rle> <factor> <logical> <logical>
## [1] 16 [87362942, 87365116] - E TRUE FALSE
## [2] 16 [87365116, 87365116] - A <NA> <NA>
## [3] 16 [87365116, 87367492] - J <NA> <NA>
## [4] 16 [87367492, 87367492] - D <NA> <NA>
## [5] 16 [87367492, 87367892] - E TRUE TRUE
## ... ... ... ... ... ... ...
## [38] 16 [87423343, 87423454] - E TRUE TRUE
## [39] 16 [87423454, 87423454] - A <NA> <NA>
## [40] 16 [87423454, 87425689] - J <NA> <NA>
## [41] 16 [87425689, 87425689] - D <NA> <NA>
## [42] 16 [87425689, 87425708] - E FALSE TRUE
## featureID geneID txName
## <integer> <integer> <CharacterList>
## [1] 1 1 uc002fjv.3,uc002fjw.3,uc010vot.2
## [2] 2 1 uc002fjv.3,uc002fjw.3,uc010vot.2
## [3] 3 1 uc002fjv.3,uc002fjw.3,uc010vot.2
## [4] 4 1 uc002fjv.3,uc002fjw.3,uc010vot.2
## [5] 5 1 uc002fjv.3,uc002fjw.3,uc010vot.2
## ... ... ... ...
## [38] 38 1 uc010vot.2
## [39] 39 1 uc010vot.2
## [40] 40 1 uc010vot.2
## [41] 41 1 uc010vot.2
## [42] 42 1 uc010vot.2
## geneName
## <CharacterList>
## [1] 79791
## [2] 79791
## [3] 79791
## [4] 79791
## [5] 79791
## ... ...
## [38] 79791
## [39] 79791
## [40] 79791
## [41] 79791
## [42] 79791
## -------
## seqinfo: 1 sequence from hg19 genome
Similar to TxFeatures, SGFeatures extends the GRanges class with additional metadata columns. Column type for an SGFeatures object takes values
By convention, splice donor and acceptor sites correspond to exonic positions immediately upstream and downstream of the intron, respectively. Note that splice sites are redundant in the sense that they are determined by the splice junctions included in the SGFeatures object. When assigning read counts to each feature (see below), counts for exons and splice junctions are based on structurally compatible reads. In the case of splice donor and acceptor sites, counts indicate the number of reads that extend across the spliced boundary (i.e. overlapping the splice site, as well as the flanking intronic position). Splice sites are included in the SGFeatures object since splice site counts are subsequently used for splice variant quantification.
SGFeatures includes additional metadata columns not included in TxFeatures. spliced5p and spliced3p indicate whether exon bins have a mandatory splice at the 5\(^\prime\) and 3\(^\prime\) boundary, respectively. This information is used to determine whether a read is structurally compatible with an exon bin, as well as to determine whether an exon bin is consistent with an annotated transcript.
Column featureID provides a unique identifier for each feature, while columnn geneID indicates the unique connected component of the splice graph a feature belongs to.
Both TxFeatures and SGFeatures objects can be exported to BED files using function exportFeatures.
We can now start analyzing the RNA-seq data at the FBXO31 gene locus. We first perform an analysis based on annotated transcripts. The following example converts the transcript features into splice graph features and obtains counts of compatible RNA-seq reads for each feature and each sample.
sgfc_ucsc <- analyzeFeatures(si, features = txf_ucsc)
## Process features...
## Obtain counts...
sgfc_ucsc
## class: SGFeatureCounts
## dim: 42 8
## metadata(0):
## assays(2): counts FPKM
## rownames: NULL
## rowRanges metadata column names(0):
## colnames(8): N1 N2 ... T3 T4
## colData names(6): sample_name file_bam ... frag_length lib_size
analyzeFeatures returns an object of class SGFeatureCounts, which extends the RangedSummarizedExperiment class from the SummarizedExperiment package. SGFeatureCounts contains the sample information as colData, splice graph features as rowRanges and assays counts and FPKM, which store structurally compatible counts and FPKMs, respectively. Accessor functions colData, rowRanges, counts and FPKM can be used to access the data.
Compatible FPKMs for splice graph features can be visualized with function plotFeatures. plotFeatures generates a two-panel figure with a splice graph shown in the top panel and a heatmap illustrating expression levels of individual features in the bottom panel. For customization of plotFeatures output, see section Visualization (#visualization). The plotting function invisibly returns a data.frame with information on splice graph features, including genomic coordinates.
df <- plotFeatures(sgfc_ucsc, geneID = 1)
df
## id name type featureID color
## 1 E1 E:16:87425689-87425708:- E 42 black
## 2 E2 E:16:87423343-87423454:- E 38 black
## 3 E3 E:16:87417011-87417394:- E 35 black
## 4 E4 E:16:87393973-87394561:- E 33 black
## 5 E5 E:16:87393901-87393972:- E 29 black
## 6 E6 E:16:87380780-87380856:- E 25 black
## 7 E7 E:16:87377204-87377371:- E 21 black
## 8 E8 E:16:87376483-87376557:- E 17 black
## 9 E9 E:16:87369761-87369870:- E 13 black
## 10 E10 E:16:87368910-87369063:- E 9 black
## 11 E11 E:16:87367492-87367892:- E 5 black
## 12 E12 E:16:87362942-87365116:- E 1 black
## 13 J1 J:16:87423454-87425689:- J 40 black
## 14 J2 J:16:87393972-87423343:- J 32 black
## 15 J3 J:16:87393972-87417011:- J 31 black
## 16 J4 J:16:87380856-87393901:- J 27 black
## 17 J5 J:16:87377371-87380780:- J 23 black
## 18 J6 J:16:87376557-87377204:- J 19 black
## 19 J7 J:16:87369870-87376483:- J 15 black
## 20 J8 J:16:87369063-87369761:- J 11 black
## 21 J9 J:16:87367892-87368910:- J 7 black
## 22 J10 J:16:87365116-87367492:- J 3 black
Note that the splice graph derived from annotated transcripts includes three alternative transcript start sites (TSSs). However, the heatmap indicates that the first TSS is not used in the samples in our data set.
Instead of relying on existing annotation, SGSeq can predict features from BAM files directly. The following code block predicts splice graph features with read evidence in our data set.
sgfc_pred <- analyzeFeatures(si, which = gr)
## Predict features...
## Process features...
## Obtain counts...
For interpretability, we annotate predicted features with respect to transcripts included in the UCSC knownGene table. The annotate function assigns compatible transcripts to each feature and stores them in metadata column txName. Metadata column geneName behaves transitively, meaning all features belonging to the same connected component of the splice graph (with identical geneID) have the same value for geneName. This behavior makes it easy to identify unannotated features (with empty txName) that belong to an annotated gene (non-empty geneName).
sgfc_pred <- annotate(sgfc_pred, txf_ucsc)
Predicted splice graph features and compatible FPKMs can be visualized as previously. Splice graph features with missing annotation can be highlighted using argument color_novel.
df <- plotFeatures(sgfc_pred, geneID = 1, color_novel = "red")
df
## id name type featureID color
## 1 E1 E:16:87417011-87417348:- E 38 black
## 2 E2 E:16:87393901-87393972:- E 34 black
## 3 E3 E:16:87392017-87392103:- E 30 black
## 4 E4 E:16:87380780-87380856:- E 25 black
## 5 E5 E:16:87377204-87377371:- E 21 black
## 6 E6 E:16:87376483-87376557:- E 17 black
## 7 E7 E:16:87369761-87369870:- E 13 black
## 8 E8 E:16:87368910-87369063:- E 9 black
## 9 E9 E:16:87367492-87367892:- E 5 black
## 10 E10 E:16:87362930-87365116:- E 1 black
## 11 J1 J:16:87393972-87417011:- J 36 black
## 12 J2 J:16:87392103-87393901:- J 32 black
## 13 J3 J:16:87380856-87393901:- J 28 black
## 14 J4 J:16:87380856-87392017:- J 27 black
## 15 J5 J:16:87377371-87380780:- J 23 black
## 16 J6 J:16:87376557-87377204:- J 19 black
## 17 J7 J:16:87369870-87376483:- J 15 black
## 18 J8 J:16:87369063-87369761:- J 11 black
## 19 J9 J:16:87367892-87368910:- J 7 black
## 20 J10 J:16:87365116-87367492:- J 3 black
Note that most exons and splice junctions predicted from the RNA-seq data are consistent with transcripts in the UCSC knownGene table (shown in gray). However, in contrast to the previous figure, the predicted gene model does not include parts of the splice graph that are not expressed in our data set. Also, an unannotated exon (E3, shown in red) was discovered from the RNA-seq data, which is expressed in three of the four normal colorectal samples (N2, N3, N4).
Instead of considering the complete splice graph of a gene, we can focus our analysis on individual splice events. In the SGSeq framework, the splice graph is a directed acyclic graph with nodes corresponding to transcript starts, ends and splice sites, and edges corresponding to disjoint exon bins and splice junctions, directed from 5\(^\prime\) to the 3\(^\prime\) end. A splice event is defined by a start node and an end node connected by two or more paths and no intervening nodes with all paths intersecting. SGSeq identifies splice events recursively from the graph, and estimates relative usage of splice variants based on compatible reads spanning the event boundaries. The following example identifies splice events from the splice graph and obtains representative counts for each splice variant.
sgvc_pred <- analyzeVariants(sgfc_pred)
## Find segments...
## Find variants...
## Annotate variants...
sgvc_pred
## class: SGVariantCounts
## dim: 2 8
## metadata(0):
## assays(5): countsVariant5p countsTotal5p countsVariant3p
## countsTotal3p variantFreq
## rownames(2): 1 2
## rowRanges metadata column names(16): from to ... variantType
## variantName
## colnames(8): N1 N2 ... T3 T4
## colData names(6): sample_name file_bam ... frag_length lib_size
analyzeVariants returns an SGVariantCounts object. Similar to SGFeatureCounts, SGVariantCounts extends the RangedSummarizedExperiment class. SGVariantCounts contains sample information as colData and SGVariants as rowRanges. Assay variantFreq stores estimates of relative usage for each splice variant and sample. Accessor functions colData, rowRanges and variantFreq can be used to access the data. Information on splice variants is stored in metadata columns in the SGVariants object and can be accessed as follows.
mcols(sgvc_pred)
## DataFrame with 2 rows and 16 columns
## from to type featureID segmentID
## <character> <character> <character> <character> <character>
## 1 D:16:87393901:- A:16:87380856:- J 28 4
## 2 D:16:87393901:- A:16:87380856:- JEJ 32,30,27 2
## closed3p closed5p geneID eventID variantID featureID5p
## <logical> <logical> <integer> <integer> <integer> <IntegerList>
## 1 TRUE TRUE 1 1 1 28
## 2 TRUE TRUE 1 1 2 32
## featureID3p txName geneName
## <IntegerList> <CharacterList> <CharacterList>
## 1 28 uc002fjv.3,uc002fjw.3,uc010vot.2 79791
## 2 27 79791
## variantType variantName
## <CharacterList> <character>
## 1 SE:S 79791_1_1/2_SE
## 2 SE:I 79791_1_2/2_SE
Splice variants and estimates of relative usage can be visualized with function plotVariants.
plotVariants(sgvc_pred, eventID = 1, color_novel = "red")
plotVariants generates a two-panel figure similar to plotFeatures. The splice graph in the top panel illustrates the selected splice event. In this example, the splice event consists of two splice variants that correspond to a skip or inclusion of the unannotated exon. The heatmap illustrates estimates of relative usage for each splice variant. We observe that samples N2, N3 and N4 show evidence for both transcripts that include the exon as well as transcripts that skip the exon. The remaining samples show little evidence for exon inclusion.
Functions plotFeatures and plotVariants support many options for customizing figures. Note that the splice graph in the top figure panel is plotted by function plotSpliceGraph, which can be called directly.
plotFeatures includes multiple alternative arguments for selecting features to be displayed. The following code illustrates three different ways of selecting and plotting the splice graph and expression levels for FBXO31 (Entrez ID 79791).
plotFeatures(sgfc_pred, geneID = 1)
plotFeatures(sgfc_pred, geneName = "79791")
plotFeatures(sgfc_pred, which = gr)
By default, the heatmap generated by plotFeatures displays splice junctions. Alternatively, exon bins, or both exon bins and splice junctions can be displayed.
plotFeatures(sgfc_pred, geneID = 1, include = "junctions")
plotFeatures(sgfc_pred, geneID = 1, include = "exons")
plotFeatures(sgfc_pred, geneID = 1, include = "both")
Argument toscale controls which parts of the gene model are drawn to scale.
plotFeatures(sgfc_pred, geneID = 1, toscale = "gene")
plotFeatures(sgfc_pred, geneID = 1, toscale = "exon")
plotFeatures(sgfc_pred, geneID = 1, toscale = "none")
Heatmaps allow the visualization of expression values summarized for splice junctions and exon bins. Alternatively, per-base read coverages and splice junction counts can be visualized with function plotCoverage.
par(mfrow = c(5, 1), mar = c(1, 3, 1, 1))
plotSpliceGraph(rowRanges(sgfc_pred), geneID = 1, toscale = "none", color_novel = "red")
for (j in 1:4) {
plotCoverage(sgfc_pred[, j], geneID = 1, toscale = "none")
}
Functions analyzeFeatures and analyzeVariants wrap multiple analysis steps for convenience. Alternatively, the functions performing individual steps can be called directly. For example, the previous analysis using de novo prediction can be performed as follows.
txf <- predictTxFeatures(si, gr)
sgf <- convertToSGFeatures(txf)
sgf <- annotate(sgf, txf_ucsc)
sgfc <- getSGFeatureCounts(si, sgf)
sgv <- findSGVariants(sgf)
## Find segments...
## Find variants...
## Annotate variants...
sgvc <- getSGVariantCounts(sgv, sgfc)
predictTxFeatures and getSGFeatureCounts can be run on individual samples (e.g. for distribution across a high-performance computing cluster). predictTxFeatures predicts features for each sample, merges features across samples and finally performs filtering and processing of predicted terminal exons. When using predictTxFeatures for individual samples, with predictions intended to be merged at a later point in time, run predictTxFeatures with argument min_overhang = NULL to suppress processing of terminal exons. Then predictions can subsequently be merged and processed with functions mergeTxFeatures and processTerminalExons, respectively.
Exercise 1 Construct a TxFeatures object for a transcript with three exons. What happens if you add a second transcript with exons that are shared or overlapping with exons in the first transcript? What happens if you convert the TxFeatures object to an SGFeatures object?
tx_1 <- GRangesList(tx_1 = GRanges("1", IRanges(c(101, 301, 501), c(200, 400, 600)), "+"))
tx_2 <- GRangesList(tx_2 = GRanges("1", IRanges(c(101, 351, 701), c(200, 400, 800)), "+"))
txf_1 <- convertToTxFeatures(tx_1)
txf_2 <- convertToTxFeatures(tx_2)
txf <- convertToTxFeatures(c(tx_1, tx_2))
sgf <- convertToSGFeatures(txf)
par(mfrow = c(1, 1))
plotSpliceGraph(sgf)
The following exercises are based on genome-wide predictions obtained from paired-end RNA-seq data generated as part of the Illumina Body Map 2.0 ([Farrell et al. 2014] (#farrell)). The data were processed as shown in the following code block (the code will not run as BAM files are not available).
sgfc_IBM <- analyzeFeatures(si_IBM, alpha = 5, psi = 0.2, beta = 0.2, gamma = 0.2)
sgvc_IBM <- analyzeVariants(sgfc_IBM, min_denominator = 20)
exclude <- eventID(sgvc_IBM)[!closed5p(sgvc_IBM) | !closed3p(sgvc_IBM)]
sgvc_IBM <- sgvc_IBM[!eventID(sgvc_IBM) %in% exclude, ]
We load the previously obtained SGSeq predictions sgfc_IBM and sgvc_IBM.
data(sgfc_IBM, sgvc_IBM, package = "SGSeqBioC2015")
Exercise 2 Annotate the SGFeatureCounts and SGVariantCounts objects with respect to transcripts included in the UCSC knownGene table (this may take a few minutes).
txdb <- restoreSeqlevels(txdb)
seqlevelsStyle(txdb) <- "NCBI"
txdb <- keepSeqlevels(txdb, c(1:22, "X", "Y"))
txf_ucsc <- convertToTxFeatures(txdb)
sgfc_IBM <- annotate(sgfc_IBM, txf_ucsc)
sgvc_IBM <- annotate(sgvc_IBM, txf_ucsc)
Exercise 3 Plot the gene model for gene KIFAP3 (Entrez ID 22920). Which parts of the predicted gene model are unannotated and what tissues are they expressed in? Inspect expression levels for both splice junctions and exons.
plotFeatures(sgfc_IBM, geneName = "22920", color_novel = "red")
plotFeatures(sgfc_IBM, geneName = "22920", color_novel = "red", include = "exons")
Exercise 4 Plot the unannotated splice event for gene KIFAP3.
mcols(sgvc_IBM)[any(geneName(sgvc_IBM) == "22920"), ]
plotVariants(sgvc_IBM, eventID = 552, color_novel = "red")
Exercise 5 (Difficult) Find genes with the most highly expressed unannotated cassette exons. Can you interpret the predicted splice graph for the top genes?
selected <- which(elementLengths(txName(sgvc_IBM)) == 0 & any(variantType(sgvc_IBM) == "SE:I"))
variants <- rowRanges(sgvc_IBM)[selected]
features <- unlist(variants)
exons <- features[type(features) == "E"]
exons_FPKM <- FPKM(sgfc_IBM)[match(featureID(exons), featureID(sgfc_IBM)), ]
exons_FPKM_max <- apply(exons_FPKM, 1, max)
geneIDs <- geneID(exons)[order(exons_FPKM_max, decreasing = TRUE)]
plotFeatures(sgfc_IBM, geneID = geneIDs[1], color_novel = "red")
Seshagiri S, Stawiski EW, Durinck S, Modrusan Z, Storm EE, Conboy CB, Chaudhuri S, Guan Y, Janakiraman V, Jaiswal BS, Guillory J, Ha C, Dijkgraaf GJP, Stinson J, Gnad F, Huntley MA, Degenhardt JD, Haverty PM, Bourgon R, Wang W, Koeppen H, Gentleman R, Starr TK, Zhang Z, Largaespada DA, Wu TD, de Sauvage FJ. “Recurrent R-spondin fusions in colon cancer.” Nature 488, 660–664, 2012.
Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. “Software for Computing and Annotating Genomic Ranges.” PLOS Computational Biology 9, e1003118, 2013.
Heber S, Alekseyev M, Sze S, Tang H, Pevzner PA. “Splicing graphs and EST assembly problem.” Bioinformatics 18 Suppl 1, S181–188, 2002.
Farrell CM, O’Leary NA, Harte RA, Loveland JE, Wilming LG, Wallin C, Diekhans M, Barrell D, Searle SM, Aken B, Hiatt SM, Frankish A, Suner MM, Rajput B, Steward CA, Brown GR, Bennett R, Murphy M, Wu W, Kay MP, Hart J, Rajan J, Weber J, Snow C, Riddick LD, Hunt T, Webb D, Thomas M, Tamez P, Rangwala SH, McGarvey KM, Pujar S, Shkeda A, Mudge JM, Gonzalez JM, Gilbert JG, Trevanion SJ, Baertsch R, Harrow JL, Hubbard T, Ostell JM, Haussler D, Pruitt KD. “Current status and new features of the Consensus Coding Sequence database”. Nucleic Acids Research 42(Database issue):D865-72, 2014.
sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] XVector_0.9.1
## [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.3
## [3] GenomicFeatures_1.21.13
## [4] AnnotationDbi_1.31.17
## [5] Biobase_2.29.1
## [6] SGSeq_1.3.14
## [7] GenomicRanges_1.21.16
## [8] GenomeInfoDb_1.5.8
## [9] IRanges_2.3.14
## [10] S4Vectors_0.7.10
## [11] BiocGenerics_0.15.3
## [12] knitr_1.10.5
## [13] BiocStyle_1.7.4
##
## loaded via a namespace (and not attached):
## [1] igraph_1.0.1 magrittr_1.5
## [3] zlibbioc_1.15.0 GenomicAlignments_1.5.11
## [5] BiocParallel_1.3.34 stringr_1.0.0
## [7] tools_3.2.1 SummarizedExperiment_0.3.2
## [9] DBI_0.3.1 lambda.r_1.1.7
## [11] futile.logger_1.4.1 htmltools_0.2.6
## [13] yaml_2.1.13 digest_0.6.8
## [15] rtracklayer_1.29.12 formatR_1.2
## [17] futile.options_1.0.0 bitops_1.0-6
## [19] biomaRt_2.25.1 RCurl_1.95-4.7
## [21] RSQLite_1.0.0 evaluate_0.7
## [23] rmarkdown_0.7 stringi_0.5-5
## [25] Biostrings_2.37.2 Rsamtools_1.21.14
## [27] XML_3.98-1.3