Example of importing tRNAdb output as GRanges
tRNA 1.22.0
The tRNA package provides access to tRNA feature information for subsetting
and visualization. Visualization functions are implemented to compare feature
parameters of multiple tRNA sets and to correlate them to additional data.
As input the package expects a GRanges object with certain metadata columns.
The following columns are required: tRNA_length, tRNA_type,
tRNA_anticodon, tRNA_seq, tRNA_str, tRNA_CCA.end. The tRNA_str column
must contain a valid dot bracket annotation. For more details please have a look
at the vignette of the Structstrings package.
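Before handing a GRanges object to the package, it can help to check its metadata columns. The following base R sketch uses a plain data.frame as a stand-in for mcols(gr). The helpers missingtRNAColumns() and isBalancedDotBracket() are hypothetical and not part of the tRNA package. The balance check only verifies that, for each bracket type, no pair is closed before it is opened and all pairs are closed. It assumes `<` opens a pair, which is the orientation after conversion to a DotBracketStringSet. It does not replace the validation performed by Structstrings.

```r
# Hypothetical helpers for illustration; not functions of the tRNA package.
# A plain data.frame stands in for mcols(gr) of a GRanges object.
requiredtRNAColumns <- c("tRNA_length", "tRNA_type", "tRNA_anticodon",
                         "tRNA_seq", "tRNA_str", "tRNA_CCA.end")

# Return the names of required columns that are absent from df
missingtRNAColumns <- function(df) {
  setdiff(requiredtRNAColumns, colnames(df))
}

# Check that a dot bracket string uses only .()[]{}<> and that, for each
# bracket type separately, no pair is closed before it is opened and all
# pairs are closed at the end. Assumes "<" opens a pair.
isBalancedDotBracket <- function(str) {
  chars <- strsplit(str, "")[[1L]]
  if (!all(chars %in% c(".", "(", ")", "[", "]", "{", "}", "<", ">"))) {
    return(FALSE)
  }
  opening <- c("(", "[", "{", "<")
  closing <- c(")", "]", "}", ">")
  for (k in seq_along(opening)) {
    # running depth for this bracket type: +1 on open, -1 on close
    depth <- cumsum((chars == opening[k]) - (chars == closing[k]))
    if (any(depth < 0L) ||
        (length(depth) > 0L && depth[length(depth)] != 0L)) {
      return(FALSE)
    }
  }
  TRUE
}

# Usage with a made-up single-row example (not real tRNA data)
df <- data.frame(tRNA_length = 5L, tRNA_type = "Ala", tRNA_anticodon = "AGC",
                 tRNA_seq = "GGAUU", tRNA_str = "((.))", tRNA_CCA.end = FALSE)
missingtRNAColumns(df)             # character(0): nothing missing
isBalancedDotBracket(df$tRNA_str)  # TRUE
```

Because each bracket type is tracked separately, crossing pairs of different types such as `([)]` are accepted, while `>>..<<` (the unconverted tRNAscan orientation) is rejected.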
To work with the tRNA package, tRNA information can be retrieved or loaded
into an R session in a number of ways:
- a GRanges object can be constructed manually, containing the required
columns mentioned above
- import.tRNAscanAsGRanges() from the tRNAscanImport package
- import.tRNAdb() from the tRNAdbImport package
For the examples in this vignette, a number of predefined GRanges objects are
loaded.
library(tRNA)
library(Structstrings)
data("gr", package = "tRNA")
To retrieve the sequences for individual tRNA structure elements, the functions
gettRNAstructureGRanges or gettRNAstructureSeqs can be used. Several
optional arguments can be used to modify the result (see
?gettRNAstructureSeqs).
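The coordinates returned by gettRNAstructureGRanges are positions within each tRNA sequence. In the output in this section, for example, the anticodon loop spans positions 31 to 37 with a width of 7. The base R sketch below uses a made-up sequence and made-up element boundaries to show how such 1-based, inclusive start/end coordinates map to element sequences. It is only an illustration and not how the package implements gettRNAstructureSeqs.

```r
# Made-up toy sequence and element boundaries (not real tRNA data).
# Coordinates are 1-based and inclusive, so width = end - start + 1.
toySeq <- "GGGAAAUUUCCCAAA"
toyElements <- data.frame(
  element = c("stem5", "loop", "stem3"),
  start   = c(1L, 4L, 10L),
  end     = c(3L, 9L, 12L)
)
toyElements$width <- toyElements$end - toyElements$start + 1L
# substring() is vectorised over start/end, giving one sequence per element
toyElements$seq <- substring(toySeq, toyElements$start, toyElements$end)
toyElements
```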
# just get the coordinates of the anticodon loop
gettRNAstructureGRanges(gr, structure = "anticodonLoop")
## $anticodonLoop
## IRanges object with 299 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   TGG        31        37         7
##   TGC        32        38         7
##   CAA        31        37         7
##   AGA        31        37         7
##   TAA        31        37         7
##   ...       ...       ...       ...
##   CAT        32        38         7
##   GAA        31        37         7
##   TTA        31        37         7
##   TAC        32        38         7
##   CAT        32        38         7
gettRNAstructureSeqs(gr, joinFeatures = TRUE, structure = "anticodonLoop")
## $anticodonLoop
## RNAStringSet object of length 299:
##       width seq                                             names               
##   [1]     7 UUUGGGU                                         TGG
##   [2]     7 CUUGCAA                                         TGC
##   [3]     7 UUCAAGC                                         CAA
##   [4]     7 UUAGAAA                                         AGA
##   [5]     7 CUUAAGA                                         TAA
##   ...   ... ...
## [295]     7 CUCAUAA                                         CAT
## [296]     7 UUGAAGA                                         GAA
## [297]     7 UUUUAGU                                         TTA
## [298]     7 UUUACAC                                         TAC
## [299]     7 GUCAUGA                                         CAT
In addition, the sequences can be returned already joined, yielding a set of
sequences padded with gap characters (-) so that each structure element spans
the same positions in every sequence. The boundaries of the individual
structures are returned as metadata of the RNAStringSet object.
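The idea behind a completely joined, gap-padded set can be sketched in base R: every structure element is padded to the widest instance of that element across all tRNAs, so the element boundaries become identical for all joined sequences. padAndJoin() is a hypothetical helper, not the package implementation. It appends gaps to the 3' end of each element, whereas the package may place gaps elsewhere within an element, as the dashes in the output in this section suggest.

```r
# Hypothetical helper, not the package implementation. Each tRNA is a named
# character vector of structure element sequences; all tRNAs share the same
# element names. Gaps are appended to the 3' end of each element here.
padAndJoin <- function(elementList) {
  elementNames <- names(elementList[[1L]])
  # widest instance of each element across all tRNAs
  maxWidth <- vapply(elementNames, function(el) {
    max(nchar(vapply(elementList, function(x) x[[el]], character(1))))
  }, integer(1))
  joined <- vapply(elementList, function(x) {
    padded <- mapply(function(s, w) paste0(s, strrep("-", w - nchar(s))),
                     x[elementNames], maxWidth)
    paste(padded, collapse = "")
  }, character(1))
  # element boundaries are shared by all joined sequences
  end <- cumsum(maxWidth)
  start <- end - maxWidth + 1L
  list(seqs = joined,
       boundaries = data.frame(start = unname(start), end = unname(end),
                               width = unname(maxWidth),
                               row.names = elementNames))
}

# Made-up toy input (not real tRNA data)
trnas <- list(
  t1 = c(stem5 = "GGG", loop = "AAAC", stem3 = "CCC"),
  t2 = c(stem5 = "GGG", loop = "AA",   stem3 = "CCC")
)
res <- padAndJoin(trnas)
res$seqs        # t1 "GGGAAACCCC", t2 "GGGAA--CCC"
res$boundaries  # stem5 1-3, loop 4-7, stem3 8-10
```

Storing the boundaries once, rather than per sequence, works precisely because the padding makes them identical for all sequences.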
seqs <- gettRNAstructureSeqs(gr[1L:10L], joinCompletely = TRUE)
seqs
## RNAStringSet object of length 10:
##      width seq
##  [1]    85 GGGCGUGUGGUC-UAGU-GGUAU-GAUUCUCGC...------GCCUGGGUUCAAUUCCCAGCUCGCCCC
##  [2]    85 GGGCACAUGGCGCAGUU-GGU-AGCGCGCUUCC...------GCAUCGGUUCGAUUCCGGUUGCGUCCA
##  [3]    85 GGUUGUUUGGCC-GAGC-GGUAA-GGCGCCUGA...AA-GAUGCAAGAGUUCGAAUCUCUUAGCAACCA
##  [4]    85 GGCAACUUGGCC-GAGU-GGUAA-GGCGAAAGA...U-GCCCGCGCAGGUUCGAGUCCUGCAGUUGUCG
##  [5]    85 GGAGGGUUGGCC-GAGU-GGUAA-GGCGGCAGA...UUGUCCGCGCGAGUUCGAACCUCGCAUCCUUCA
##  [6]    85 GCGGAUUUAGCUCAGUU-GGG-AGAGCGCCAGA...------GCCUGUGUUCGAUCCACAGAAUUCGCA
##  [7]    85 GGUCUCUUGGCC-CAGUUGGUAA-GGCACCGUG...------ACAGCGGUUCGAUCCCGCUAGAGACCA
##  [8]    85 GCGCAAGUGGUUUAGU--GGU-AAAAUCCAACG...-------CCCCGGUUCGAUUCCGGGCUUGCGCA
##  [9]    85 GGCAACUUGGCC-GAGU-GGUAA-GGCGAAAGA...U-GCCCGCGCAGGUUCGAGUCCUGCAGUUGUCG
## [10]    85 GCUUCUAUGGCC-AAGUUGGUAA-GGCGCCACA...------ACAUCGGUUCAAAUCCGAUUGGAAGCA
# getting the tRNA structure boundaries
metadata(seqs)[["tRNA_structures"]]
## IRanges object with 15 ranges and 0 metadata columns:
##                           start       end     width
##                       <integer> <integer> <integer>
##   acceptorStem.prime5         1         7         7
##               Dprime5         8         9         2
##          DStem.prime5        10        13         4
##                 Dloop        14        23        10
##          DStem.prime3        24        27         4
##                   ...       ...       ...       ...
##          TStem.prime5        61        65         5
##                 Tloop        66        72         7
##          TStem.prime3        73        77         5
##   acceptorStem.prime3        78        84         7
##         discriminator        85        85         1
Be aware that gettRNAstructureGRanges and gettRNAstructureSeqs might not work
as expected if the tRNA sequences in question are armless or deviate
drastically from the canonical tRNA model. The functions in the tRNA package
were thoroughly tested using human mitochondrial tRNAs and other tRNAs missing
certain features. However, results may differ for fringe cases. If you encounter
such a case, please report it with an example.
Structure information of the tRNA can be queried for subsetting using several
functions. For the following examples, the functions hasAcceptorStem and
hasDloop are used.
gr[hasAcceptorStem(gr, unpaired = TRUE)]
# mismatches and bulged are subsets of unpaired
gr[hasAcceptorStem(gr, mismatches = TRUE)]
gr[hasAcceptorStem(gr, bulged = TRUE)]
# combination of different structure parameters
gr[hasAcceptorStem(gr, mismatches = TRUE) &
     hasDloop(gr, length = 8L)]
Please have a look at the man page ?hasAcceptorStem for all available
subsetting functions.
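The subsetting above works by combining logical vectors of equal length with `&`. To illustrate the idea behind an "unpaired" check on dot bracket strings, here is a deliberately simplified base R sketch. acceptorStem5Unpaired() is hypothetical and does not reflect how hasAcceptorStem works. It only inspects the first seven positions, which it assumes to be the 5' strand of the acceptor stem, and it cannot distinguish mismatches from bulges.

```r
# Hypothetical, simplified check (not how hasAcceptorStem works): treat the
# first stemLength positions as the 5' strand of the acceptor stem and flag
# structures with any unpaired "." there. Vectorised over str.
acceptorStem5Unpaired <- function(str, stemLength = 7L) {
  grepl(".", substr(str, 1L, stemLength), fixed = TRUE)
}

# Made-up toy structures (not real tRNAs)
structures <- c(t1 = "<<<<<<<....>>>>>>>.",
                t2 = "<<<<<.<....>.>>>>>.")
unpaired <- acceptorStem5Unpaired(structures)   # FALSE TRUE
# logical vectors of equal length combine with &, as in the gr[...] examples
selected <- structures[unpaired & nchar(structures) == 19L]
names(selected)                                 # "t2"
```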
To get an overview of tRNA features and compare different datasets, the function
gettRNAFeaturePlots is used. It accepts a named GRangesList as input.
Internally, it calculates a list of feature values based on the functions
mentioned above and the data contained in the mcols of the GRanges objects.
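As a rough illustration of computing feature values per dataset, the base R sketch below derives the length and GC content of every sequence in a named list and summarises them per set. The named list of character vectors is a made-up stand-in for a named GRangesList, and gcContent() is a hypothetical helper. The sketch does not reproduce the internal feature list of gettRNAFeaturePlots or the output of gettRNASummary.

```r
# Made-up toy data: a named list of character vectors stands in for a named
# GRangesList whose elements carry tRNA sequences.
sets <- list(
  setA = c("GGCCAAUU", "GCGC"),
  setB = c("AAUU", "GGAAUUCC", "GCAU")
)

# fraction of G and C per sequence (hypothetical helper)
gcContent <- function(seqs) {
  vapply(strsplit(seqs, ""), function(x) mean(x %in% c("G", "C")), numeric(1))
}

# one row per tRNA with a column naming its set (long format)
features <- do.call(rbind, lapply(names(sets), function(nm) {
  data.frame(set = nm,
             seqLength = nchar(sets[[nm]]),
             gc = gcContent(sets[[nm]]))
}))

# per-set means of each feature
aggregate(cbind(seqLength, gc) ~ set, data = features, FUN = mean)
```

Keeping one row per tRNA with a set label is a convenient shape for comparing sets side by side, which is the kind of comparison the feature plots below show.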
# load tRNA data for E. coli and H. sapiens
data("gr_eco", package = "tRNA")
data("gr_human", package = "tRNA")
# get summary plots
grl <- GRangesList(Sce = gr,
                   Hsa = gr_human,
                   Eco = gr_eco)
plots <- gettRNAFeaturePlots(grl)
plots$length
Figure 1: tRNA length
plots$tRNAscan_score
Figure 2: tRNAscan-SE scores
plots$gc
Figure 3: tRNA GC content
plots$tRNAscan_intron
Figure 4: tRNAs with introns
plots$variableLoop_length
Figure 5: Length of the variable loop
To access the results without generating plots, use the function
gettRNASummary.
To check whether features correlate with additional scores, the scores can
either be passed to gettRNAFeaturePlots as an optional argument or be taken
from the score column of the GRanges objects. In the first case, a list
containing one numeric vector per GRanges object, each as long as that object,
has to be provided as the argument scores. In the latter case, just set the
argument plotScores = TRUE.
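The shape requirement for the scores argument can be sketched in base R: one score vector per set, each as long as its set, followed by a per-set correlation between a feature and the scores. The toy lengths are made up. The use of Pearson correlation via cor() is an assumption made for illustration and not a statement about how gettRNAFeaturePlots computes or displays correlations.

```r
set.seed(1)
# Made-up toy data: tRNA lengths per set, standing in for a named GRangesList
lengthsPerSet <- list(setA = c(72L, 73L, 85L, 72L),
                      setB = c(71L, 72L, 73L),
                      setC = c(76L, 77L, 87L, 76L, 76L))

# one score vector per set, each as long as the set it belongs to,
# mirroring the runif() construction used in the scores example
scores <- lapply(lengthsPerSet, function(x) runif(length(x), 0, 100))
stopifnot(identical(lengths(scores), lengths(lengthsPerSet)))

# per-set Pearson correlation between the feature and the scores
correlations <- mapply(cor, lengthsPerSet, scores)
correlations
```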
# score column will be used
plots <- gettRNAFeaturePlots(grl, plotScores = TRUE)
# a list of scores, one numeric vector per GRanges object, will be used
plots <- gettRNAFeaturePlots(grl,
                             scores = list(runif(length(grl[[1L]]), 0L, 100L),
                                           runif(length(grl[[2L]]), 0L, 100L),
                                           runif(length(grl[[3L]]), 0L, 100L)))
plots$length
Figure 6: tRNA length and score correlation
plots$variableLoop_length
Figure 7: variable loop length and score correlation
Since all plots returned by the functions mentioned above are ggplot2 objects,
they can be modified manually and changed to suit your needs.
plots$length$layers <- plots$length$layers[c(-1L,-2L)]
plots$length + ggplot2::geom_boxplot()
Figure 8: Customized plot replacing the point and violin layers with a boxplot
In addition, the data of the plots can be accessed directly.
head(plots$length$data)
The colours of the plots can be customized on creation with the following options.
options("tRNA_colour_palette")
## $tRNA_colour_palette
## [1] "Set1"
options("tRNA_colour_yes")
## $tRNA_colour_yes
## [1] "green"
options("tRNA_colour_no")
## $tRNA_colour_no
## [1] "red"
To retrieve detailed information on the base pairing, the function
gettRNABasePairing() is used. Internally this will construct a
DotBracketStringSet from the tRNA_str column, if this column does not
already contain a DotBracketStringSet. It is then passed on to the
Structstrings::getBasePairing function.
A valid DotBracket annotation is expected to contain only pairs of <>{}[]()
and the . character. (Please note the orientation: for <>, the orientation is
variable, since the tRNAscan files use the >< annotation by default. However,
upon creation of a DotBracketStringSet, this annotation is converted.)
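To make the base pairing information concrete, here is a small base R sketch that turns a dot bracket string into a partner table. It keeps one stack per bracket type, so pairs of different bracket types may cross. dotBracketToPairs() is hypothetical and only for illustration, since the package delegates this work to Structstrings::getBasePairing. It assumes `<` opens a pair, which is the orientation after conversion to a DotBracketStringSet. For the opening positions in the output in this section, partner corresponds to the reverse column.

```r
# Hypothetical base R illustration; the package uses
# Structstrings::getBasePairing instead. One stack per bracket type,
# so pairs of different types may cross (pseudoknots).
dotBracketToPairs <- function(str) {
  chars <- strsplit(str, "")[[1L]]
  opening <- c("(", "[", "{", "<")
  closing <- c(")", "]", "}", ">")
  partner <- integer(length(chars))          # 0 means unpaired
  stacks <- setNames(vector("list", length(opening)), opening)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (ch %in% opening) {
      # remember the position of this opening bracket
      stacks[[ch]] <- c(stacks[[ch]], i)
    } else if (ch %in% closing) {
      op <- opening[match(ch, closing)]
      n <- length(stacks[[op]])
      if (n == 0L) stop("unmatched closing bracket at position ", i)
      j <- stacks[[op]][n]                   # most recent opening of same type
      stacks[[op]] <- stacks[[op]][-n]
      partner[c(i, j)] <- c(j, i)
    } else if (ch != ".") {
      stop("invalid character '", ch, "' at position ", i)
    }
  }
  if (any(lengths(stacks) > 0L)) stop("unmatched opening bracket")
  data.frame(pos = seq_along(chars), partner = partner, character = chars)
}

dotBracketToPairs("((..))")$partner   # 6 5 0 0 2 1
dotBracketToPairs("([)]")$partner     # 3 4 1 2 (crossing pairs)
```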
head(gettRNABasePairing(gr)[[1L]])
## DotBracketDataFrame with 6 rows and 4 columns
##         pos   forward   reverse   character
##   <integer> <integer> <integer> <character>
## 1         1         1        70           <
## 2         2         2        69           <
## 3         3         3        68           <
## 4         4         4        67           <
## 5         5         5        66           <
## 6         6         0         0           .
head(getBasePairing(gr[1L]$tRNA_str)[[1L]])
## DotBracketDataFrame with 6 rows and 4 columns
##         pos   forward   reverse   character
##   <integer> <integer> <integer> <character>
## 1         1         1        70           <
## 2         2         2        69           <
## 3         3         3        68           <
## 4         4         4        67           <
## 5         5         5        66           <
## 6         6         0         0           .
The loop IDs for the structure elements can be retrieved with the
gettRNALoopIDs() function, which relies on the Structstrings::getLoopIndices
function. (For more details, please have a look at ?getLoopIndices.)
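The gettRNALoopIDs() output in this section suggests a simple rule. Each opening bracket starts a new ID. An unpaired position takes the ID of the innermost enclosing pair. A closing bracket repeats the ID of its partner. Positions outside all pairs get 0. The base R sketch below implements that inferred rule. loopIndices() is hypothetical, is not the Structstrings algorithm, and uses a single stack, so it assumes properly nested brackets without pseudoknots.

```r
# Hypothetical re-implementation of the rule inferred from the loop ID
# output; not the Structstrings algorithm. Assumes nested brackets only.
loopIndices <- function(str) {
  chars <- strsplit(str, "")[[1L]]
  ids <- integer(length(chars))
  stack <- integer(0)      # IDs of currently open pairs, innermost last
  counter <- 0L
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (ch %in% c("(", "[", "{", "<")) {
      # every opening bracket starts a new ID
      counter <- counter + 1L
      stack <- c(stack, counter)
      ids[i] <- counter
    } else if (ch %in% c(")", "]", "}", ">")) {
      if (length(stack) == 0L) stop("unmatched closing bracket at position ", i)
      # a closing bracket repeats the ID of its partner
      ids[i] <- stack[length(stack)]
      stack <- stack[-length(stack)]
    } else {
      # unpaired: innermost enclosing pair, or 0 outside all pairs
      ids[i] <- if (length(stack) > 0L) stack[length(stack)] else 0L
    }
  }
  ids
}

loopIndices("((..)).")   # 1 2 2 2 2 1 0
```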
gettRNALoopIDs(gr)[[1L]]
##  [1]  1  2  3  4  5  5  6  6  6  7  8  9  9  9  9  9  9  9  9  9  9  9  8  7  6
## [26] 10 11 12 13 14 14 14 14 14 14 14 14 14 13 12 11 10  6  6  6  6 15 16 17 18
## [51] 19 19 19 19 19 19 19 19 19 18 17 16 15  6  5  5  4  3  2  1  0
getLoopIndices(gr[1L]$tRNA_str)
## LoopIndexList of length 1
## [[""]] 1 2 3 4 5 5 6 6 6 7 8 9 9 9 9 9 ... 19 19 19 18 17 16 15 6 5 5 4 3 2 1 0
sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tRNA_1.22.0          Structstrings_1.20.0 Biostrings_2.72.0   
##  [4] XVector_0.44.0       GenomicRanges_1.56.0 GenomeInfoDb_1.40.0 
##  [7] IRanges_2.38.0       S4Vectors_0.42.0     BiocGenerics_0.50.0 
## [10] BiocStyle_2.32.0    
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9              utf8_1.2.4              generics_0.1.3         
##  [4] stringi_1.8.3           digest_0.6.35           magrittr_2.0.3         
##  [7] RColorBrewer_1.1-3      evaluate_0.23           grid_4.4.0             
## [10] bookdown_0.39           fastmap_1.1.1           jsonlite_1.8.8         
## [13] tinytex_0.50            BiocManager_1.30.22     httr_1.4.7             
## [16] fansi_1.0.6             UCSC.utils_1.0.0        scales_1.3.0           
## [19] jquerylib_0.1.4         cli_3.6.2               rlang_1.1.3            
## [22] crayon_1.5.2            munsell_0.5.1           withr_3.0.0            
## [25] cachem_1.0.8            yaml_2.3.8              tools_4.4.0            
## [28] dplyr_1.1.4             colorspace_2.1-0        ggplot2_3.5.1          
## [31] GenomeInfoDbData_1.2.12 vctrs_0.6.5             R6_2.5.1               
## [34] magick_2.8.3            lifecycle_1.0.4         zlibbioc_1.50.0        
## [37] stringr_1.5.1           Modstrings_1.20.0       pkgconfig_2.0.3        
## [40] bslib_0.7.0             pillar_1.9.0            gtable_0.3.5           
## [43] Rcpp_1.0.12             glue_1.7.0              highr_0.10             
## [46] xfun_0.43               tibble_3.2.1            tidyselect_1.2.1       
## [49] knitr_1.46              farver_2.1.1            htmltools_0.5.8.1      
## [52] rmarkdown_2.26          labeling_0.4.3          compiler_4.4.0