%\VignetteIndexEntry{Overlap encodings} %\VignetteDepends{pasillaBamSubset, GenomicRanges, Rsamtools, GenomicFeatures} %\VignetteKeywords{sequence, sequencing, alignments} %\VignettePackage{GenomicRanges} \documentclass[10pt]{article} \usepackage{times} \usepackage{hyperref} \usepackage[margin=0.65in]{geometry} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\software}[1]{\textsf{#1}} \newcommand{\R}{\software{R}} \newcommand{\Bioconductor}{\software{Bioconductor}} \SweaveOpts{keep.source=TRUE} \title{Overlap encodings} \author{Herv\'e Pag\`es} \date{Last modified: May 2012; Compiled: \today} \begin{document} \maketitle <>= options(width=100) @ \begin{center} \begin{minipage}{0.8\textwidth} WARNING (May 8, 2012): This vignette is incomplete and was still a WORK IN PROGRESS at the time \Rpackage{GenomicRanges} 1.8 was released (as part of Bioconductor 2.10, released in April 2012). All the work on {\it overlap encodings} is now happening in the {\bf devel} version of the \Rpackage{GenomicRanges} package (\Rpackage{GenomicRanges} 1.9, part of Bioconductor 2.11, at the time of this writing). To use {\it overlap encodings} and related tools, and to get an updated version of this vignette, it is {\bf strongly} recommended that you use Bioconductor 2.11. In particular please make sure that you always use the {\bf latest} version of the \Rpackage{GenomicRanges} package (version 1.9.14 at the time of this writing, expect frequent updates). In other words, this vignette is superseded by the vignette found in \Rpackage{GenomicRanges} 1.9 (part of Bioconductor 2.11). It won't be updated. See \Rcode{?useDevel} in the \Rpackage{BiocInstaller} package for how to use the devel version of Bioconductor (2.11 at the time of this writing). See \Rcode{?biocLite} in the \Rpackage{BiocInstaller} package for how to update all the installed packages. \end{minipage} \end{center} \tableofcontents %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% In the context of an RNA-seq experiment, encoding the overlaps between the aligned reads and the transcripts can be used for detecting those overlaps that are ``compatible'' with the splicing of the transcript. Various tools are provided in the \Rpackage{IRanges} and \Rpackage{GenomicRanges} packages for working with {\it overlap encodings}. In this vignette, we illustrate the use of these tools on real RNA-seq data. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Example 1: With single-end reads} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Load the reads and transcripts} We start by loading some aligned reads into a \Rclass{GappedAlignments} object. The reads are stored in a BAM file ({\tt untreated1\_chr4.bam}) located in the \Rpackage{pasillaBamSubset} data package. This file contains single-end reads from an RNA-seq experiment and aligned against the dm3 genome (see \Rcode{?untreated1\_chr4} in the \Rpackage{pasillaBamSubset} package for more information about those reads): <>= library(pasillaBamSubset) untreated1_chr4() @ We use the \Rfunction{readGappedAlignments} function defined in the \Rpackage{GenomicRanges} package to read the BAM file into a \Rclass{GappedAlignments} object. It's probably a good idea to get rid of the PCR or optical duplicate (flag bit 0x400 in the SAM format, see the SAM Spec \footnote{\url{http://samtools.sourceforge.net/}} for the details), as well as reads not passing quality controls (flag bit 0x200 in the SAM format). We do this by creating a \Rclass{ScanBamParam} object that we pass to \Rcode{readGappedAlignments} (see \Rcode{?ScanBamParam} in the \Rpackage{Rsamtools} package for the details). Note that we also use \Rcode{use.names=TRUE} in order to load the {\it query template names} (QNAME field in the SAM format) from the BAM file (\Rcode{readGappedAlignments} will use them to set the names of the returned object): <>= library(GenomicRanges) library(Rsamtools) flag0 <- scanBamFlag(isDuplicate=FALSE, isValidVendorRead=TRUE) param0 <- ScanBamParam(flag=flag0) gal14 <- readGappedAlignments(untreated1_chr4(), use.names=TRUE, param=param0) @ Our reads can have up to 2 gaps (a gap corresponds to an N operation in the CIGAR): <>= head(gal14) table(ngap(gal14)) @ We also need to retrieve the dm3 transcripts and their exons from UCSC, and extract the exons grouped by transcript in a \Rclass{GRangesList} object. IMPORTANT: The reference genome of the transcripts must be {\bf exactly} the same as the reference genome used to align the reads (note that this is a general rule, not only when working with overlap encodings): <>= library(GenomicFeatures) dm3_refGene_txdb <- makeTranscriptDbFromUCSC(genome="dm3", tablename="refGene") exbytx <- exonsBy(dm3_refGene_txdb, by="tx") @ We check that all the exons in any given transcript belong to the same chromosome and strand. Knowing that our set of transcripts is free of this kind of trans-splicing events will allow us some significant simplifications during the downstream analysis \footnote{Dealing with trans-splicing events is not covered in this document}. A quick and easy way to check this is to take advantage of the fact that \Rcode{seqnames} and \Rcode{strand} return \Rclass{RleList} objects. So we can extract the number of Rle runs for each transcript and make sure it's always 1: <>= table(elementLengths(runLength(seqnames(exbytx)))) table(elementLengths(runLength(strand(exbytx)))) @ Therefore the strand of any given transcript is unambiguously defined and can be extracted with: <>= exbytx_strand <- unlist(runValue(strand(exbytx)), use.names=FALSE) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Find and encode the overlaps} We are ready to compute the overlaps with the \Rfunction{findOverlaps} function. Note that the strand of the queries produced by the RNA-seq experiment is typically unknown so we use \Rcode{ignore.strand=TRUE}: <>= ov14 <- findOverlaps(gal14, exbytx, ignore.strand=TRUE) @ However, the {\it overlap encodings} are strand sensitive so we will compute them twice, once for the original alignments (i.e. the alignments of the original queries), and once again for the ``flipped alignments'' (i.e. the alignments of the flipped queries). We extract the ranges of the original and flipped alignments in 2 \Rclass{GRangesList} objects with: <>= grl14o <- grglist(gal14, order.as.in.query=TRUE) grl14f <- flipQuery(grl14o) @ and encode their overlaps with the transcripts: <>= ovenc14o <- encodeOverlaps(grl14o, exbytx, hits=ov14) ovenc14f <- encodeOverlaps(grl14f, exbytx, hits=ov14) @ \Rcode{ovenc14o} and \Rcode{ovenc14f} are 2 \Rclass{OverlapsEncodings} objects of the same length as \Rclass{Hits} object \Rcode{ov14}. For each hit in \Rcode{ov14}, we have 2 corresponding encodings, one in \Rcode{ovenc14o} and one in \Rcode{ovenc14f}, but only one of them encodes a hit between alignment ranges and exon ranges that are on the same strand. We use the \Rfunction{selectEncodingWithCompatibleStrand} function to merge them into a single \Rclass{OverlapsEncodings} of the same length. For each hit in \Rcode{ov14}, this selects the encoding corresponding to alignment ranges and exon ranges with compatible strand: <>= grl14o_strand <- unlist(runValue(strand(grl14o)), use.names=FALSE) ovenc14 <- selectEncodingWithCompatibleStrand(ovenc14o, ovenc14f, grl14o_strand, exbytx_strand, hits=ov14) ovenc14 @ As a convenience, the 2 above calls to \Rfunction{encodeOverlaps} + merging step can be replaced by a single call to \Rfunction{encodeOverlaps} on either \Rcode{grl14f} or \Rcode{grl14o} with \Rcode{flip.query.if.wrong.strand=TRUE}: <>= ovenc14_again <- encodeOverlaps(grl14o, exbytx, hits=ov14, flip.query.if.wrong.strand=TRUE) stopifnot(identical(ovenc14_again, ovenc14)) @ Unique encodings in \Rcode{ovenc14}: <>= unique_ovenc14 <- levels(encoding(ovenc14)) length(unique_ovenc14) head(unique_ovenc14) ovenc14_table <- table(encoding(ovenc14)) tail(sort(ovenc14_table)) @ Encodings are sort of cryptic but utilities are provided to extract specific meaning from them. Use of these utilities is covered in the next three subsections. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Detect overlaps showing ``compatibility'' with the transcript} We are interested in a particular type of overlap where the read overlaps the transcript in a ``compatible'' way, that is, in a way compatible with the splicing of the transcript. The \Rfunction{isCompatibleWithSplicing} function can be used on an \Rclass{OverlapEncodings} object to detect this type of overlap. Note that \Rfunction{isCompatibleWithSplicing} can also be used on a character vector or factor. \Rcode{ovenc14} contains 7 unique encodings ``compatible'' with the splicing of the transcript: <>= sort(ovenc14_table[isCompatibleWithSplicing(unique_ovenc14)]) @ Encodings \Rcode{"1:i:"} (403826 occurences in \Rcode{ovenc14}), \Rcode{"2:jm:af:"} (68914 occurences in \Rcode{ovenc14}), and \Rcode{"3:jmm:agm:aaf:"} (438 occurences in \Rcode{ovenc14}), correspond to the following overlaps: \begin{itemize} \item \Rcode{"1:i:"} \begin{verbatim} - read (no gap): oooooooo - transcript: ... >>>>>>>>>>>>>> ... \end{verbatim} \item \Rcode{"2:jm:af:"} \begin{verbatim} - read (1 gap): ooooo---ooo - transcript: ... >>>>>>>>> >>>>>>>>> ... \end{verbatim} \item \Rcode{"3:jmm:agm:aaf:"} \begin{verbatim} - read (2 gaps): oo---ooooo---o - transcript: ... >>>>>>>> >>>>> >>>>>>> ... \end{verbatim} \end{itemize} For clarity, only the exons involved in the overlap are represented. The transcript can of course have more upstream and downstream exons, which is denoted by the ... on the left side (5' end) and right side (3' end) of each drawing. Note that the exons represented in the 2nd and 3rd drawings are consecutive in the transcript. Encodings \Rcode{"1:f:"} and \Rcode{"1:j:"} are variations of the situation described by encoding \Rcode{"1:i:"}. For \Rcode{"1:f:"}, the first aligned base of the read (or flipped read) is aligned with the first base of the exon. For \Rcode{"1:j:"}, the last aligned base of the read (or flipped read) is aligned with the last base of the exon: \begin{itemize} \item \Rcode{"1:f:"} \begin{verbatim} - read (no gap): oooooooo - transcript: ... >>>>>>>>>>>>>> ... \end{verbatim} \item \Rcode{"1:j:"} \begin{verbatim} - read (no gap): oooooooo - transcript: ... >>>>>>>>>>>>>> ... \end{verbatim} \end{itemize} <>= ov14_is_compat <- isCompatibleWithSplicing(ovenc14) table(ov14_is_compat) # 476124 "compatible" overlaps @ Number of ``compatible'' overlaps per alignment in \Rcode{gal14}: <>= gal14_ncompat <- tabulate(queryHits(ov14)[ov14_is_compat], nbins=length(gal14)) elementMetadata(gal14)$ncompat <- gal14_ncompat head(gal14) table(gal14_ncompat) @ Number of alignments in \Rcode{gal14} with at least 1 ``compatible'' overlap: <>= sum(gal14_ncompat != 0) @ Number of ``compatible'' overlaps per transcript in \Rcode{exbytx}: <>= exbytx_ncompat14 <- tabulate(subjectHits(ov14)[ov14_is_compat], nbins=length(exbytx)) names(exbytx_ncompat14) <- names(exbytx) tail(table(exbytx_ncompat14)) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Detect overlaps showing ``almost compatibility'' with the transcript} In many aspects, ``compatible'' overlaps correspond to an ideal situation but in practise many overlaps don't fall into that category. Here we are interested in a less perfect type of overlap where the read overlaps the transcript in a way that {\it would} be ``compatible'' if 1 or more exons were removed from the transcript. In that case we say that the overlap is ``almost compatible'' with the transcript. The \Rfunction{isCompatibleWithSkippedExons} function can be used on an \Rclass{OverlapEncodings} object to detect this type of overlap. Note that \Rfunction{isCompatibleWithSkippedExons} can also be used on a character vector of factor. \Rcode{ovenc14} contains 7 unique encodings ``almost compatible'' with the splicing of the transcript: <>= sort(ovenc14_table[isCompatibleWithSkippedExons(unique_ovenc14)]) @ Encodings \Rcode{"2:jm:am:af:"} (696 occurences in \Rcode{ovenc14}), \Rcode{"2:jm:am:am:af:"} (114 occurences in \Rcode{ovenc14}), and \Rcode{"3:jmm:agm:aam:aaf:"} (18 occurences in \Rcode{ovenc14}), correspond to the following overlaps: \begin{itemize} \item \Rcode{"2:jm:am:af:"} \begin{verbatim} - read (1 gap): ooooo----------ooo - transcript: ... >>>>>>> >>>> >>>>>>>> ... \end{verbatim} \item \Rcode{"2:jm:am:am:af:"} \begin{verbatim} - read (1 gap): ooooo------------------ooo - transcript: ... >>>>>>> >>>> >>>>> >>>>>>>> ... \end{verbatim} \item \Rcode{"3:jmm:agm:aam:aaf:"} \begin{verbatim} - read (2 gaps): oo---oooo-----------oo - transcript: ... >>>>>>> >>>> >>>>> >>>>>>>> ... \end{verbatim} \end{itemize} <>= ov14_is_almostcompat <- isCompatibleWithSkippedExons(ovenc14) table(ov14_is_almostcompat) # 837 "almost compatible" overlaps @ Number of ``almost compatible'' overlaps per alignment in \Rcode{gal14}: <>= gal14_nalmostcompat <- tabulate(queryHits(ov14)[ov14_is_almostcompat], nbins=length(gal14)) elementMetadata(gal14)$nalmostcompat <- gal14_nalmostcompat head(gal14) table(gal14_nalmostcompat) @ Number of alignments in \Rcode{gal14} with at least 1 ``almost compatible'' overlap: <>= sum(gal14_nalmostcompat != 0) @ Number of ``almost compatible'' overlaps per transcript in \Rcode{exbytx}: <>= exbytx_nalmostcompat14 <- tabulate(subjectHits(ov14)[ov14_is_almostcompat], nbins=length(exbytx)) names(exbytx_nalmostcompat14) <- names(exbytx) table(exbytx_nalmostcompat14) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Combining results of \Rcode{isCompatibleWithSplicing()} and \Rcode{isCompatibleWithSkippedExons()} to detect novel splice junctions} An alignment in \Rcode{gal14} with ``almost compatible'' hits but no ``compatible'' hits suggests the presence of one or more transcripts that are not in our annotations. First we extract the index of those alignments: <>= aln_shows_nov_splice_jct <- gal14_nalmostcompat != 0L & gal14_ncompat == 0L head(which(aln_shows_nov_splice_jct)) @ We make this an index into \Rcode{ov14} (Hits object): <>= is_nov_splice_jct <- queryHits(ov14) %in% which(aln_shows_nov_splice_jct) @ We intersect with \Rcode{is\_almost\_compat} to keep only the overlaps of interest: <>= is_nov_splice_jct <- is_nov_splice_jct & ov14_is_almostcompat @ For each overlap of interest, we extract the ranks of the skipped exons (we use a list for this as there might be more than 1 skipped exon per overlap): <>= skpexrk <- extractSkippedExonRanks(ovenc14)[is_nov_splice_jct] table(elementLengths(skpexrk)) @ Finally, we split \Rcode{skpexrk} by transcript TxDb internal id: <>= names(skpexrk) <- queryHits(ov14)[is_nov_splice_jct] f <- names(exbytx)[subjectHits(ov14)[is_nov_splice_jct]] tx2skpexrk <- split(skpexrk, f) @ \Rcode{tx2skpexrk} is a named list of named lists of integer vectors. The first level of names (outer names) are transcript TxDb internal ids and the second level of names (inner names) are alignment indices into \Rcode{gal14}: <>= head(names(tx2skpexrk)) # transcript TxDb internal ids @ Transcript \Rcode{"10"} has 7 hits. All of them skip exons 9 and 10: <>= tx2skpexrk[["10"]] @ Transcript \Rcode{"58"} has 4 hits. Two of them skip exon 2, one of them skips exons 2 to 6, and one of them skips exon 10: <>= tx2skpexrk[["58"]] @ A few words about the interpretation of \Rcode{tx2skpexrk}: Because of how we've conducted this analysis, the aligments reported in \Rcode{tx2skpexrk} are guaranteed to NOT have any ``compatible'' overlaps with other known transcripts. All we can say, for example in the case of transcript \Rcode{"10"}, is that the 7 reported hits that skip exons 9 and 10 show evidence of one or more unknown transcripts with a splice junction that corresponds to the gap between exons 8 and 11. But without further analysis, we can't make any assumption about the exons structure of those unknown transcripts. In particular, we cannot assume the existence of an unknown transcript made of the same exons as transcript \Rcode{"10"} minus exons 9 and 10! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Example 2: With paired-end reads} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Load the reads and transcripts} We start by loading some aligned paired-end reads into a \Rclass{GappedAlignmentPairs} object. The reads are stored in a BAM file ({\tt untreated3\_chr4.bam}) located in the \Rpackage{pasillaBamSubset} data package. This file contains paired-end reads from an RNA-seq experiment and aligned against the dm3 genome (see \Rcode{?untreated3\_chr4} in the \Rpackage{pasillaBamSubset} package for more information about those reads): <>= untreated3_chr4() @ We use the \Rfunction{readGappedAlignmentPairs} function to read the BAM file into a \Rclass{GappedAlignmentPairs} object: <>= galp34 <- readGappedAlignmentPairs(untreated3_chr4(), use.names=TRUE, param=param0) head(galp34) @ The \Rcode{show} method for \Rclass{GappedAlignmentPairs} objects displays two {\tt ranges} columns, one for the {\it first} alignment in the pair (the left column), and one for the {\it last} alignment in the pair (the right column). The {\tt strand} column corresponds to the strand of the {\it first} alignment. <>= head(first(galp34)) head(last(galp34)) @ According to the SAM format specifications, the aligner is expected to mark each alignment pair as {\it proper} or not (flag bit 0x2 in the SAM format). The SAM Spec only says that a pair is {\it proper} if the {\it first} and {\it last} alignments in the pair are ``properly aligned according to the aligner''. So the exact criteria used for setting this flag is left to the aligner. We use \Rcode{isProperPair} to extract this flag from the \Rclass{GappedAlignmentPairs} object: <>= table(isProperPair(galp34)) @ Even though we could do {\it overlap encodings} with the full object, we keep only the {\it proper} pairs for our downstream analysis: <>= galp34 <- galp34[isProperPair(galp34)] @ For the transcript, we'll reuse the \Rcode{exbytx} object obtained previously. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Find and encode the overlaps} Let's compute the overlaps: <>= ov34 <- findOverlaps(galp34, exbytx, ignore.strand=TRUE) @ and encode them: <>= grl34 <- grglist(galp34, order.as.in.query=TRUE) ovenc34 <- encodeOverlaps(grl34, exbytx, hits=ov34, flip.query.if.wrong.strand=TRUE) ovenc34 @ Unique encodings in \Rcode{ovenc34}: <>= unique_ovenc34 <- levels(encoding(ovenc34)) length(unique_ovenc34) head(unique_ovenc34) ovenc34_table <- table(encoding(ovenc34)) tail(sort(ovenc34_table)) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Detect overlaps showing ``compatibility'' with the transcript} \Rcode{ovenc34} contains 13 unique encodings ``compatible'' with the splicing of the transcript: <>= sort(ovenc34_table[isCompatibleWithSplicing(unique_ovenc34)]) @ Encodings \Rcode{"1{-}{-}1:i{-}{-}i:"} (89039 occurences in \Rcode{ovenc34}), \Rcode{"2{-}{-}1:jm{-}{-}m:af{-}{-}i:"} (2825 occurences in \Rcode{ovenc34}), \Rcode{"1{-}{-}2:i{-}{-}jm:a{-}{-}af:"} (2339 occurences in \Rcode{ovenc34}), and \Rcode{"1{-}{-}1:i{-}{-}m:a{-}{-}i:"} (342 occurences in \Rcode{ovenc34}), correspond to the following overlaps: \begin{itemize} \item \Rcode{"1{-}{-}1:i{-}{-}i:"} \begin{verbatim} - paired-end read (no gap on the first end, no gap on the last end): oooo oooo - transcript: ... >>>>>>>>>>>>>>>> ... \end{verbatim} \item \Rcode{"2{-}{-}1:jm{-}{-}m:af{-}{-}i:"} \begin{verbatim} - paired-end read (1 gap on the first end, no gap on the last end): ooo---o oooo - transcript: ... >>>>>>>> >>>>>>>>>>> ... \end{verbatim} \item \Rcode{"1{-}{-}2:i{-}{-}jm:a{-}{-}af:"} \begin{verbatim} - paired-end read (no gap on the first end, 1 gap on the last end): oooo oo---oo - transcript: ... >>>>>>>>>>>>>> >>>>>>>>> ... \end{verbatim} \item \Rcode{"1{-}{-}1:i{-}{-}m:a{-}{-}i:"} \begin{verbatim} - paired-end read (no gap on the first end, no gap on the last end): oooo oooo - transcript: ... >>>>>>>>> >>>>>>> ... \end{verbatim} \end{itemize} Note: switch use of ``first'' and ``last'' above if the read was flipped. <>= ov34_is_compat <- isCompatibleWithSplicing(ovenc34) table(ov34_is_compat) # 95801 "compatible" overlaps @ Number of ``compatible'' overlaps per alignment pair in \Rcode{galp34}: <>= galp34_ncompat <- tabulate(queryHits(ov34)[ov34_is_compat], nbins=length(galp34)) elementMetadata(galp34)$ncompat <- galp34_ncompat head(galp34) table(galp34_ncompat) @ Number of alignment pairs in \Rcode{galp34} with at least 1 ``compatible'' overlap: <>= sum(galp34_ncompat != 0) @ Number of ``compatible'' overlaps per transcript in \Rcode{exbytx}: <>= exbytx_ncompat34 <- tabulate(subjectHits(ov34)[ov34_is_compat], nbins=length(exbytx)) names(exbytx_ncompat34) <- names(exbytx) tail(table(exbytx_ncompat34)) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{\Rcode{sessionInfo()}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= sessionInfo() @ \end{document}