Bioconductor: Analysis and comprehension of high-throughput genomic data
Packages, vignettes, workflows
Package installation and use
A package needs to be installed once, using the instructions on the package landing page (e.g., DESeq2).
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("DESeq2", "org.Hs.eg.db"))
BiocManager::install() installs Bioconductor, CRAN, and GitHub packages.
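Because a package only needs to be installed once, it can help to check what is already present before installing. The following is a minimal sketch in base R; the helper name missing_packages is hypothetical, introduced here for illustration only.

```r
# Hypothetical helper: return the subset of 'pkgs' that cannot be loaded.
# requireNamespace() returns TRUE/FALSE; as a side effect it loads the
# namespace of any package that is installed (without attaching it).
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

# Packages from the installation example; the result depends on your library
to_install <- missing_packages(c("DESeq2", "org.Hs.eg.db"))
to_install
```

Any names returned in to_install can then be passed to the installation command shown above.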
Once installed, the package can be loaded into an R session
library(GenomicRanges)
and the help system queried interactively, as outlined above:
help(package="GenomicRanges")
vignette(package="GenomicRanges")
vignette(package="GenomicRanges", "GenomicRangesHOWTOs")
?GRanges
Goals
What a few lines of R have to say
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X=x, Y=y)
plot(Y ~ X, df)
fit <- lm(Y ~ X, df)
anova(fit)
## Analysis of Variance Table
##
## Response: Y
##            Df  Sum Sq Mean Sq F value    Pr(>F)
## X           1 1001.14 1001.14    1013 < 2.2e-16 ***
## Residuals 998  986.27    0.99
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
abline(fit)
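The fitted model is itself an object that other functions can interrogate. As a small sketch, the simulation is repeated below with a fixed seed (the seed value is an arbitrary assumption, used only so the numbers are reproducible) and a few standard accessors are applied to the fit.

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)
fit <- lm(Y ~ X, df)

coef(fit)                          # intercept and slope; slope should be near 1
summary(fit)$r.squared             # proportion of variance explained by X
head(residuals(fit))               # first few residuals
```

Since Y is X plus noise of equal variance, roughly half of the variance in Y is expected to be explained by X, which agrees with the similar sums of squares in the ANOVA table.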
Classes and methods – “S3”
data.frame(): creates an instance or object
plot(), lm(), anova(), abline(): methods defined on generics to transform instances
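To make the S3 vocabulary concrete, here is a minimal sketch of a class, a generic, and two methods. The class name interval and the generic span are hypothetical, chosen only for illustration; they are not part of R or Bioconductor.

```r
# Constructor: an S3 instance is an ordinary R object with a 'class' attribute
interval <- function(start, end) {
    stopifnot(is.numeric(start), is.numeric(end),
              length(start) == 1L, length(end) == 1L, start <= end)
    structure(list(start = start, end = end), class = "interval")
}

# A new generic: UseMethod() dispatches on the class of the first argument
span <- function(x, ...) UseMethod("span")

# The method for class 'interval', named <generic>.<class>
span.interval <- function(x, ...) x$end - x$start + 1

# A method on an existing generic, print()
print.interval <- function(x, ...) {
    cat("<interval> [", x$start, ", ", x$end, "]\n", sep = "")
    invisible(x)
}

iv <- interval(10, 19)
class(iv)      # "interval"
print(iv)      # dispatches to print.interval()
span(iv)       # dispatches to span.interval()
```

The naming convention generic.class is what lets calls such as plot(Y ~ X, df) or anova(fit) select the right implementation for their first argument.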
Discovery and help
class(fit)
methods(class=class(fit))
methods(plot)
?"plot"
?"plot.formula"
tab completion!
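The discovery functions can also be used programmatically. The sketch below uses the cars data set that ships with R (an assumption made so that the example runs on its own) and retrieves the method that anova() selects for an lm object.

```r
fit <- lm(dist ~ speed, data = cars)   # 'cars' is a data set shipped with R

class(fit)                             # the class used for dispatch
inherits(fit, "lm")                    # TRUE if 'fit' is (or extends) an 'lm'

# Retrieve the S3 method that the generic anova() selects for class 'lm'
anova_lm <- getS3method("anova", "lm")

# Calling the generic and calling the method directly should agree
all.equal(anova(fit), anova_lm(fit))
```

In everyday use one calls the generic, anova(fit), and lets R find the method; looking up the method directly is mainly useful for reading its source or its help page.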
Bioconductor classes and methods – “S4”
Example: working with DNA sequences
library(Biostrings)
dna <- DNAStringSet(c("AACAT", "GGCGCCT"))
reverseComplement(dna)
##   A DNAStringSet instance of length 2
##     width seq
## [1]     5 ATGTT
## [2]     7 AGGCGCC
data(phiX174Phage)
phiX174Phage
##   A DNAStringSet instance of length 6
##     width seq                                                                   names
## [1]  5386 GAGTTTTATCGCTTCCATGACGCAGAAGTTAAC...TTCGATAAAAATGATTGGCGTATCCAACCTGCA Genbank
## [2]  5386 GAGTTTTATCGCTTCCATGACGCAGAAGTTAAC...TTCGATAAAAATGATTGGCGTATCCAACCTGCA RF70s
## [3]  5386 GAGTTTTATCGCTTCCATGACGCAGAAGTTAAC...TTCGATAAAAATGATTGGCGTATCCAACCTGCA SS78
## [4]  5386 GAGTTTTATCGCTTCCATGACGCAGAAGTTAAC...TTCGATAAAAATGATTGGCGTATCCAACCTGCA Bull
## [5]  5386 GAGTTTTATCGCTTCCATGACGCAGAAGTTAAC...TTCGATAAAAATGATTGGCGTATCCAACCTGCA G97
## [6]  5386 GAGTTTTATCGCTTCCATGACGCAGAAGTTAAC...TTCGATAAAAATGATTGGCGTATCCAACCTGCA NEB03
letterFrequency(phiX174Phage, "GC", as.prob=TRUE)
##            G|C
## [1,] 0.4476420
## [2,] 0.4472707
## [3,] 0.4472707
## [4,] 0.4470850
## [5,] 0.4472707
## [6,] 0.4470850
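The G|C column above reports, for each sequence, the fraction of letters that are either G or C. As a rough illustration of that calculation, here is a base R sketch that does not use Biostrings; the function name gc_fraction is hypothetical, and this is not how letterFrequency() is implemented.

```r
# Hypothetical helper: fraction of letters in each string that are G or C
gc_fraction <- function(x) {
    vapply(strsplit(toupper(x), ""),
           function(ch) mean(ch %in% c("G", "C")),
           numeric(1))
}

# The two short sequences from the DNAStringSet example
gc_fraction(c("AACAT", "GGCGCCT"))   # 1 of 5 and 6 of 7 letters are G or C
```

For genome-scale data the Biostrings functions are the better choice, since they operate on compact sequence representations rather than on vectors of single characters.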
Discovery and help
class(dna)
?"DNAStringSet-class"
?"reverseComplement,DNAStringSet-method"
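For readers new to S4, the following sketch shows the same ingredients (class, validity check, generic, method) using only the methods package that ships with R. The class SimpleDNA and the generic revComp are hypothetical stand-ins for illustration; they are far simpler than, and unrelated to the implementation of, DNAStringSet and reverseComplement().

```r
library(methods)   # attached by default in most sessions; explicit here

# Hypothetical S4 class with one slot and a validity check
setClass("SimpleDNA",
    representation(seq = "character"),
    validity = function(object) {
        ok <- grepl("^[ACGT]*$", object@seq)
        if (all(ok)) TRUE else "sequences must contain only A, C, G, T"
    })

# Constructor, by convention named after the class
SimpleDNA <- function(x) new("SimpleDNA", seq = toupper(x))

# Hypothetical generic, and its method for 'SimpleDNA'
setGeneric("revComp", function(x, ...) standardGeneric("revComp"))

setMethod("revComp", "SimpleDNA", function(x, ...) {
    comp <- chartr("ACGT", "TGCA", x@seq)              # complement
    rc <- vapply(strsplit(comp, ""),                   # then reverse
                 function(ch) paste(rev(ch), collapse = ""),
                 character(1))
    SimpleDNA(rc)
})

dna <- SimpleDNA(c("AACAT", "GGCGCCT"))
revComp(dna)@seq

# Discovery, in the spirit of the help queries above
is(dna, "SimpleDNA")
isGeneric("revComp")
existsMethod("revComp", "SimpleDNA")
showMethods("revComp")
```

Unlike S3, the class definition is formal: constructing an object with letters outside A, C, G, T fails the validity check, and methods are registered against a declared signature rather than found by a naming convention.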
Experimental design
Wet-lab sequence preparation (figure from http://rnaseq.uoregon.edu/)
(Illumina) Sequencing (Bentley et al., 2008, doi:10.1038/nature07517)