For an overview of the design principles and use of Bioconductor sequence classes, see Lawrence et al. (2013), Software for Computing and Annotating Genomic Ranges, PLoS Comput Biol 9(8): e1003118, doi:10.1371/journal.pcbi.1003118.
For an overview of select high-throughput sequence packages in Bioconductor, see Intermediate Sequence Analysis 2013 section 3.3.
Classes
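IRanges(), GRanges(): ranges on integer and on genomic coordinates. Annotations can be attached to the object as a whole (metadata()), and to individual elements (e.g., mcols()).
Vector(): base class providing methods shared by the classes above, e.g., length(), subset, etc.
*List(), e.g., IntegerList(), GRangesList(): IntegerList() is a list where all elements are integer vectors; it is represented as a single integer() vector with a partitioning vector.
DataFrame(): like data.frame, but columns can be Bioconductor vector classes such as Rle().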
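A minimal sketch of the ideas above, assuming the Bioconductor package GenomicRanges is installed (it attaches IRanges and S4Vectors). The object names (ir, gr, il) and the toy coordinates and scores are illustrative only. It shows whole-object annotation with metadata(), per-element annotation with mcols(), two Vector methods (length() and subset()), and how an IntegerList is laid out as one flat integer vector plus a partitioning.

```r
## Assumes Bioconductor's GenomicRanges is installed; it also attaches IRanges and S4Vectors
suppressPackageStartupMessages(library(GenomicRanges))

## Three illustrative ranges on one chromosome; width, seqnames and strand are recycled
ir <- IRanges(start = c(1, 10, 20), width = 5)
gr <- GRanges(seqnames = "chr1", ranges = ir, strand = "+")

## Annotation on the object as a whole ...
metadata(gr)$source <- "toy example"
## ... and on individual elements (one value per range)
mcols(gr)$score <- c(0.1, 0.5, 0.9)

## Methods defined for Vector work on derived classes such as GRanges
n <- length(gr)                    # 3
high <- subset(gr, score > 0.4)    # keeps the last two ranges

## An IntegerList: every element is an integer vector (here one is empty)
il <- IntegerList(a = 1:3, b = integer(0), c = 4:5)
sizes <- elementNROWS(il)                # a = 3, b = 0, c = 2
flat <- unlist(il, use.names = FALSE)    # the single underlying integer vector: 1 2 3 4 5
ends <- end(PartitioningByEnd(il))       # where each element ends in 'flat': 3 3 5
```

Because the data live in one flat vector, operations over all elements (for example, sum(il) for per-element sums) can be vectorized rather than looping over a base R list.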
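Rle(): run-length encoding, a compact representation of vectors with long runs of repeated values (e.g., read coverage along a chromosome).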
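A short sketch of Rle() and of a DataFrame() holding an Rle column, assuming S4Vectors is installed; the coverage-like vector is made up for illustration. Only the run values and run lengths are stored, so the 2010-element vector below is held as three runs.

```r
## Assumes Bioconductor's S4Vectors is installed
suppressPackageStartupMessages(library(S4Vectors))

## Illustrative coverage-like vector dominated by long runs of the same value
x <- c(rep(0L, 1000), rep(5L, 10), rep(0L, 1000))
r <- Rle(x)

runValue(r)                  # 0 5 0
runLength(r)                 # 1000 10 1000
identical(as.vector(r), x)   # TRUE: the encoding is lossless

## Arithmetic and summaries operate on the runs, and stay run-length encoded
r2 <- r + 1L                 # runs with values 1 6 1
total <- sum(r)              # 50

## A DataFrame column can be an Rle rather than an ordinary vector
df <- DataFrame(pos = 1:3, cov = Rle(c(0L, 0L, 7L)))
```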
Methods
Users work with the DNAString() and DNAStringSet() classes, which specialize the more general XString() and XStringSet classes.
XStringSet is used as the basis for coordinating short reads and quality scores.
DNAString is used to represent whole genome sequences.
GAlignments(), GAlignmentsList(), GAlignmentPairs(): aligned reads.
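The DNAString() and DNAStringSet() classes described above can be tried out in a few lines. This is a sketch assuming Biostrings is installed; the read names and sequences are invented for illustration.

```r
## Assumes Bioconductor's Biostrings is installed
suppressPackageStartupMessages(library(Biostrings))

## A DNAStringSet holds many sequences, e.g., short reads (illustrative data)
reads <- DNAStringSet(c(read1 = "ACGTTT", read2 = "GGCA"))
width(reads)                    # 6 4
rc <- reverseComplement(reads)
as.character(rc)                # read1 "AAACGT", read2 "TGCC"

## Each element is a DNAString; both classes specialize XString / XStringSet
one <- reads[[1]]
is(one, "DNAString")            # TRUE
is(one, "XString")              # TRUE
is(reads, "XStringSet")         # TRUE

## Per-sequence nucleotide counts: one row per read, columns A C G T other
counts <- alphabetFrequency(reads, baseOnly = TRUE)
```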
SummarizedExperiment()
VCF()
ShortRead – FASTQ files
ShortReadQ() – reads and their quality scores
import(), so complexity is hidden from the user
VariantAnnotation – readVcf(), filterVcf(). Manage large data by:
- Restricting input to selected fields or genomic regions with ScanVcfParam().
- Reading individual fields with readInfo(), readGeno().
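- Iterating through the file in chunks, using TabixFile(<...>, yieldSize=10000) and a paradigm like:

    library(VariantAnnotation)             ## TabixFile() comes from Rsamtools, loaded by VariantAnnotation
    tbx <- TabixFile(fl, yieldSize=10000)  ## fl: path to a bgzip-compressed, tabix-indexed VCF
    open(tbx)
    repeat {
        vcf <- readVcf(tbx, "hg19")        ## up to 10000 records
        if (nrow(vcf) == 0)
            break                          ## all done
        ## do work
    }
    close(tbx)
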
- Filtering the file with filterVcf().
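A sketch of restricting VCF input, assuming VariantAnnotation is installed and that its example file extdata/chr22.vcf.gz contains an INFO field LDAF and a genotype field GT (these are assumptions about the example data, not requirements of the API).

```r
## Assumes Bioconductor's VariantAnnotation is installed
suppressPackageStartupMessages(library(VariantAnnotation))

## Example VCF assumed to ship with VariantAnnotation (bgzip-compressed, tabix-indexed)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

## Input only one INFO field and one genotype field, instead of everything
svp <- ScanVcfParam(info = "LDAF", geno = "GT")
vcf <- readVcf(fl, "hg19", param = svp)
names(geno(vcf))                # only the requested genotype field

## Or read a single genotype field directly, without building a VCF object
gt <- readGeno(fl, "GT")        # matrix: one row per variant, one column per sample
```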
Rsamtools – BamFile() and TabixFile() to open and iterate through BAM and Tabix files; ScanBamParam() to restrict input; iterate through large files using the yieldSize argument of BamFile(). readGAlignmentsFromBam(), readGAlignmentsListFromBam() read alignments into GAlignments and GAlignmentsList objects.
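A sketch of chunked BAM input with Rsamtools, assuming the package's example file extdata/ex1.bam; the chunk size of 1000 and the selected fields are arbitrary choices. It uses the same open / repeat / close pattern as the VCF paradigm, with scanBam() on a BamFile opened with yieldSize.

```r
## Assumes Bioconductor's Rsamtools is installed
suppressPackageStartupMessages(library(Rsamtools))

## Example BAM file assumed to ship with Rsamtools
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Input only the fields needed
param <- ScanBamParam(what = c("rname", "pos", "qwidth"))

## Open with a yieldSize, then read successive chunks until none remain
bf <- BamFile(fl, yieldSize = 1000)
open(bf)
n <- 0L
repeat {
    ## scanBam() returns one list element per range; with no 'which', one for the whole file
    chunk <- scanBam(bf, param = param)[[1]]
    if (length(chunk$pos) == 0L)
        break                              # all done
    n <- n + length(chunk$pos)             # "do work": here, just count records
}
close(bf)

## Cross-check against a whole-file count
total <- countBam(fl)$records
```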
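ShortRead – FastqStreamer(), FastqSampler(), readFastq(). Call yield() on an instance created by FastqStreamer() to iterate through a FASTQ file in successive chunks, or yield() on an instance created with FastqSampler() to draw a random sample of records.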
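A sketch of streaming and sampling FASTQ input, assuming ShortRead is installed. To stay self-contained it writes a tiny, made-up five-record FASTQ file to a temporary location; with real data the chunk size n would be much larger.

```r
## Assumes Bioconductor's ShortRead is installed
suppressPackageStartupMessages(library(ShortRead))

## Hypothetical five-record FASTQ file, written to a temporary path
fq <- tempfile(fileext = ".fastq")
writeLines(c(
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "GGGGCCCC", "+", "IIIIIIII",
    "@r3", "TTTTAAAA", "+", "IIIIIIII",
    "@r4", "CCCCGGGG", "+", "IIIIIIII",
    "@r5", "ATATATAT", "+", "IIIIIIII"), fq)

## Stream through the file two records at a time
strm <- FastqStreamer(fq, n = 2)
total <- 0L
repeat {
    fqchunk <- yield(strm)               # a ShortReadQ with up to 2 reads
    if (length(fqchunk) == 0L)
        break                            # all done
    total <- total + length(fqchunk)
}
close(strm)

## Draw a random sample of 3 records
samp <- FastqSampler(fq, n = 3)
sampled <- yield(samp)
close(samp)

## Small files can simply be read whole
all <- readFastq(fq)
seqs <- sread(all)                       # the reads, as a DNAStringSet
```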
Annotated
  o metadata
  -- Vector
     o many methods (showMethods(class="Vector", where=search()))
     -- Rle
     -- List
        -- SimpleList
           -- DataFrame
           -- Simple*List, e.g., SimpleNumericList
        -- CompressedList (IRanges package)
           -- Compressed*List, e.g., CompressedNumericList
        -- Ranges
           -- IRanges
           -- ...
        -- *StringSet, e.g., DNAStringSet
     -- GenomicRanges
        -- GRanges (GenomicRanges package)
     -- ...
     -- *String, e.g., DNAString (Biostrings package)
        o transcribe, reverseComplement, pairwiseAlignment
SummarizedExperiment (GenomicRanges package)
  -- VCF (VariantAnnotation package; readVcf)
ShortReadQ
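The hierarchy above can be checked interactively with is() and extends(). A sketch, assuming GenomicRanges and Biostrings are installed; the exact superclass lists vary between Bioconductor releases, because some classes have been reorganized since this diagram was drawn.

```r
## Assumes Bioconductor's GenomicRanges (attaches IRanges, S4Vectors) and Biostrings are installed
suppressPackageStartupMessages({
    library(GenomicRanges)
    library(Biostrings)
})

## Spot-check relationships shown in the diagram
is(GRanges(), "Vector")                  # TRUE
is(GRanges(), "Annotated")               # TRUE, hence metadata() works on GRanges
is(IRanges(), "Ranges")                  # TRUE
is(Rle(1:3), "Vector")                   # TRUE
is(DataFrame(), "List")                  # TRUE
is(DNAStringSet("ACGT"), "XStringSet")   # TRUE

## All superclasses of a class, nearest first (release-dependent)
extends("GRanges")
```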