Martin Morgan
February 3, 2015
R is a programming language
Your own functions
fun <- function(n) {
    x <- rnorm(n)
    y <- x + rnorm(n, sd=.5)
    lm(y ~ x)
}
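As a quick illustration of calling such a function, the sketch below repeats the definition so that it runs on its own, then inspects the fitted model. The seed and the sample size of 100 are arbitrary choices for the example, not values from the course.

```r
## Same function as above, repeated so this chunk is self-contained
fun <- function(n) {
    x <- rnorm(n)
    y <- x + rnorm(n, sd=.5)
    lm(y ~ x)    # the last evaluated expression is the return value
}

set.seed(123)    # arbitrary seed, only to make the example reproducible
fit <- fun(100)  # simulate 100 points and fit a linear model
class(fit)       # an object of class "lm"
coef(fit)        # intercept and slope; y was simulated as x plus noise
```

Because `y` is simulated as `x` plus noise, the estimated slope should land close to 1 and the intercept close to 0.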
Iteration
## 'side effects'
for (i in 1:10)
    message(i)
## result for subsequent computation
sapply(1:10, function(i) {
    sum(rnorm(1000))
})
lapply(), apply(), mapply() / Map().
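A small sketch of how these iteration functions differ, using toy inputs made up for illustration: `lapply()` always returns a list, `apply()` works over the rows or columns of a matrix, and `mapply()` / `Map()` step through several vectors in parallel.

```r
## lapply(): one list element per input element
sums <- lapply(1:3, function(i) sum(seq_len(i)))  # list of 1, 3, 6

## apply(): iterate over rows (MARGIN = 1) or columns (MARGIN = 2)
m <- matrix(1:6, nrow = 2)      # filled column-wise: rows are 1,3,5 and 2,4,6
rowTotals <- apply(m, 1, sum)   # 9 12
colTotals <- apply(m, 2, sum)   # 3 7 11

## mapply(): iterate over several vectors in parallel, simplify the result
pairSums <- mapply(function(x, y) x + y, 1:3, 4:6)    # 5 7 9

## Map(): same idea, but the result is always a list
pairList <- Map(function(x, y) x + y, 1:3, 4:6)       # list of 5, 7, 9
```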
Conditional execution
sapply(1:10, function(i) {
    if (sum(rnorm(1000)) > 0) {
        ans <- "hi!"
    } else {
        ans <- "low!"
    }
    ans
})
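When the values being tested are already in a vector, the same labelling can be done without an explicit `if` inside a loop. The sketch below is one possible alternative, not part of the original example: it computes the sums first and then compares an element-wise `if` with the vectorized `ifelse()`. The seed is arbitrary.

```r
set.seed(42)    # arbitrary seed for reproducibility
totals <- sapply(1:10, function(i) sum(rnorm(1000)))

## if / else handles one value at a time, so it needs a loop
viaIf <- sapply(totals, function(total) {
    if (total > 0) "hi!" else "low!"
})

## ifelse() is vectorized: it tests every element in a single call
viaIfelse <- ifelse(totals > 0, "hi!", "low!")

identical(viaIf, viaIfelse)    # both approaches give the same labels
```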
Scripts contain a combination of data and transformations. Often the transformations are idiosyncratic, and rely heavily on functions provided by various packages. Sometimes the script contains a useful chunk of code that could be reused in different places. Examples we've encountered in this course include computing the GC content of DNA sequences (or is there a function for that already? Check out Biostrings!) and creating a simple 'map' from one type of annotation to another.
It is easy and beneficial to create a package.
What is an R package?
Simple directory structure of required files. Minimal:
MyPackage
|-- DESCRIPTION (title, author, version, etc.)
|-- NAMESPACE (imports / exports)
|-- R (R function definitions)
|   |-- gc.R
|   |-- annoHelper.R
|-- man (help pages)
    |-- MyPackage.Rd
    |-- gc.Rd
    |-- annoHelper.Rd
Check out the RStudio package wizard!
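Outside RStudio, base R can generate a similar layout with `utils::package.skeleton()`. The sketch below is only an illustration: it works in a temporary directory, and the two source files contain placeholder functions (`gcContent` and `ensemblToSymbol` are hypothetical names) standing in for real code.

```r
## Work in a temporary directory so the working directory is left untouched
work <- tempfile("pkgdemo")
dir.create(work)

## Stand-in source files; in practice these would be your own saved files
gcFile <- file.path(work, "gc.R")
annoFile <- file.path(work, "annoHelper.R")
writeLines("gcContent <- function(x) NULL  # placeholder body", gcFile)
writeLines("ensemblToSymbol <- function(ids) NULL  # placeholder body", annoFile)

## Create MyPackage/ with DESCRIPTION, NAMESPACE, R/ and man/ stubs
utils::package.skeleton(
    name = "MyPackage", path = work,
    code_files = c(gcFile, annoFile)
)

## Inspect what was generated
list.files(file.path(work, "MyPackage"), recursive = TRUE)
```

The generated DESCRIPTION and help-page stubs are templates that still need to be filled in by hand before the package is ready to check or share.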
Write a function that takes a DNAStringSet and returns the GC content. Modify the function using a conditional statement to work whether provided a DNAString or a DNAStringSet. Test the function.
Save the function in a file on your AMI.
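One possible approach to this exercise is sketched below; it is not the only solution. It assumes the Biostrings package is installed, the function name `gcContent` is made up, and the three short test sequences are toy data. Package functions are called with `Biostrings::` so the file can be sourced before the package is attached.

```r
gcContent <- function(x) {
    ## Conditional: promote a single DNAString to a set of length 1
    if (methods::is(x, "DNAString")) {
        x <- Biostrings::DNAStringSet(x)
    } else if (!methods::is(x, "DNAStringSet")) {
        stop("'x' must be a DNAString or a DNAStringSet")
    }
    ## One-column matrix: proportion of G or C in each sequence
    freq <- Biostrings::letterFrequency(x, letters = "GC", as.prob = TRUE)
    setNames(freq[, 1], names(x))
}

## Toy test, run only if Biostrings is available
if (requireNamespace("Biostrings", quietly = TRUE)) {
    dna <- Biostrings::DNAStringSet(c("ACGT", "GGCC", "AATT"))
    print(gcContent(dna))        # 0.5, 1.0, 0.0
    print(gcContent(dna[[1]]))   # a single DNAString: 0.5
}
```

Converting the single sequence to a set up front keeps the rest of the function identical for both input types.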
Write a function that takes as its argument Ensembl gene identifiers (like the rownames() of the SummarizedExperiment object in the RNASeq vignette yesterday) and uses the select() method and annotation package org.Hs.eg.db to return a named character vector, where the names of the vector are the Ensembl identifiers and the values are the corresponding gene SYMBOLs. Adopt some simple-to-implement policy for handling Ensembl identifiers that map to more than one gene symbol. Save this function to another file.
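A minimal sketch of one way to approach this exercise follows. It assumes org.Hs.eg.db and AnnotationDbi are installed, and the policy of keeping the first symbol reported for each identifier is just one simple choice. The function names `firstSymbol` and `ensemblToSymbol`, and the identifiers in the toy table, are made up for illustration.

```r
## Collapse an ENSEMBL / SYMBOL table to a named character vector.
## Policy (an assumption): keep only the first symbol per Ensembl identifier.
firstSymbol <- function(tbl) {
    tbl <- tbl[!duplicated(tbl$ENSEMBL), , drop = FALSE]
    setNames(as.character(tbl$SYMBOL), tbl$ENSEMBL)
}

## Look up symbols with select(), then apply the policy above.
## Packages are referenced with '::' so they load only when this is called.
ensemblToSymbol <- function(ids) {
    tbl <- AnnotationDbi::select(
        org.Hs.eg.db::org.Hs.eg.db, keys = unique(ids),
        columns = "SYMBOL", keytype = "ENSEMBL"
    )
    firstSymbol(tbl)[ids]    # return in the order of the input identifiers
}

## The policy can be checked on a hand-made table (made-up identifiers)
toy <- data.frame(
    ENSEMBL = c("ENSG_A", "ENSG_A", "ENSG_B"),
    SYMBOL = c("SYM1", "SYM2", "SYM3"),
    stringsAsFactors = FALSE
)
firstSymbol(toy)    # ENSG_A maps to "SYM1", ENSG_B maps to "SYM3"
```

Separating the policy from the database lookup makes it easy to test, or to swap in a different rule later, such as dropping ambiguous identifiers altogether.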
Use the RStudio wizard to create a package from the files containing your GC-content and annotation-helper functions.