Why bother?
GRanges
for describing regions of interest
in sequencing experimentsMotivation
'Is a' versus 'has a'
Starting our class definition
matrix()
with row- and column data.frame()
annotationsConstruct with setClass
; returns a simple generating function.
.AnnMat <- setClass("AnnMat",
representation(matData="matrix", rowData="data.frame",
colData="data.frame"))
contains=
argument;
multiple in inheritance possible.am0 <- .AnnMat()
am1 <- .AnnMat(
matData=matrix(1:10, 2, dimnames=list(letters[1:2], LETTERS[1:5])),
rowData=data.frame(roi_id=1:2),
colData=data.frame(
sample_id=1:5,
treatment=c("A", "A", "B", "B", "B")))
A simple method: accessors (“getters”)
Definition of generics
setGeneric("rowData", function(x, ...) standardGeneric("rowData"))
## [1] "rowData"
setGeneric("colData", function(x, ...) standardGeneric("colData"))
## [1] "colData"
setGeneric("matData", function(x, ...) standardGeneric("matData"))
## [1] "matData"
Then methods implemented of class-specific methods on the generics
setMethod("rowData", "AnnMat", function(x, ...) x@rowData)
## [1] "rowData"
setMethod("colData", "AnnMat", function(x, ...) x@colData)
## [1] "colData"
setMethod("matData", "AnnMat", function(x, ...) x@matData)
## [1] "matData"
A simple method: dim
and dimnames
Discover existing generic, for signature
getGeneric("dim")
## standardGeneric for "dim" defined from package "base"
##
## function (x)
## standardGeneric("dim", .Primitive("dim"))
## <bytecode: 0x103cb4558>
## <environment: 0x103c9ced8>
## Methods may be defined for arguments: x
## Use showMethods("dim") for currently available ones.
setMethod("dim", "AnnMat", function(x) dim(matData(x)))
## [1] "dim"
setMethod("dimnames", "AnnMat", function(x) dimnames(matData(x)))
## [1] "dimnames"
nrow()
and ncol()
(and rownames()
and
colnames()
) for free! Why is that?dim(am1)
## [1] 2 5
nrow(am1)
## [1] 2
A simple method: show
getGeneric("show")
showMethods("show", where=search())
Our implementation: brief summary, with an eye toward re-use by derived classes. Avoid direct slot access
setMethod("show", "AnnMat", function(object)
{
cat("class:", class(object), "\n")
cat("dim:", dim(object), "\n")
cat("rowData names():", names(rowData(object)), "\n")
cat("colData names():", names(colData(object)), "\n")
})
## [1] "show"
am1
## class: AnnMat
## dim: 2 5
## rowData names(): roi_id
## colData names(): sample_id treatment
Question Is there a better overall philosophy for show
?
A more complicated method: updating ('replacement', 'setter') methods
Provide the illusion and simple syntax for in-place modification
A familiar example: update the value of a column in a data.frame
df <- data.frame(x=1:5, y=5:1)
df[,"x"] <- log(df$x)
df[,"x"]
as
[.data.frame
, “the subset method for data.frame”, and df[,"x"]
<- value
as
[<-.data.frame
“the subset-replace method for data.frame”.[<-.data.frame
head(get("[<-.data.frame"))
##
## 1 function (x, i, j, value)
## 2 {
## 3 if (!all(names(sys.call()) %in% c("", "value")))
## 4 warning("named arguments are discouraged")
## 5 nA <- nargs()
## 6 if (nA == 4L) {
R's parser translates df[,"x"] <- value
to
[<-.data.frame(x, , “x”, value) and actually modifies (a copied,
if necessary) first argument.
Replacement methods, e.g., matData<-
, signature takes the object
to be updated, additional optional arguments, and the value to
update the argument with
setGeneric("matData<-", function(x, ..., value)
standardGeneric("matData<-"))
## [1] "matData<-"
Dispatch on one or both of x
, value
Impelement as a method that dispatches on both the object and value, updates the slot, and returns the updated object.
setReplaceMethod("matData", c("AnnMat", "matrix"),
function(x, ..., value)
{
x@matData <- value
x
})
## [1] "matData<-"
Exercise: walk through how that assignment in the body works.
Another replacement method, for dimnames<-
(the generic already
exists; what is it?)
setReplaceMethod("dimnames", c("AnnMat", "list"),
function(x, value)
{
dimnames(matData(x)) <- value
value
})
## [1] "dimnames<-"
Exercise: walk through how that assignment in the body of the method works
Hey neat, we get rownames<-
and colnames<-
for free!
A more complicated operation: validity
Constraints on row, column and matrix dimensions: all must be equal
validity
argument to setClass
, or setValidity()
function call.
Validity function is weird
Evaluated frequently, so needs to be efficient / light-weight
setValidity("AnnMat", function(object) {
msg <- NULL
if (nrow(rowData(object)) != nrow(matData(object)))
msg <- c(msg,
"number of rowData rows and matData rows differ")
if (nrow(colData(object)) != ncol(matData(object)))
msg <- c(msg,
"number of colData rows and matData columns differ")
if (is.null(msg)) TRUE else msg
})
## Class "AnnMat" [in ".GlobalEnv"]
##
## Slots:
##
## Name: matData rowData colData
## Class: matrix data.frame data.frame
In action:
.AnnMat(matData=matrix(1:10, 2),
rowData=data.frame(roi_id=1:2),
colData=data.frame(sample_id=1:5))
## class: AnnMat
## dim: 2 5
## rowData names(): roi_id
## colData names(): sample_id
cat(try({
.AnnMat(matData=matrix(1:10, 2),
rowData=data.frame(roi_id=1:5),
colData=data.frame(sample_id=1:2))
}))
## Error in validObject(.Object) :
## invalid class "AnnMat" object: 1: number of rowData rows and matData rows differ
## invalid class "AnnMat" object: 2: number of colData rows and matData columns differ
A more complicated method: subsetting
Why do we need this? Part of the informal matrix 'API' expected by a user
Discovery
getGeneric("[")
## standardGeneric for "[" defined from package "base"
##
## function (x, i, j, ..., drop = TRUE)
## standardGeneric("[", .Primitive("["))
## <bytecode: 0x102bac148>
## <environment: 0x102c2b190>
## Methods may be defined for arguments: x, i, j, drop
## Use showMethods("[") for currently available ones.
Possible methods multiply – x
times i
times j
; e.g., i
could be
integer, logical, character, …
One approach – facade of methods that do minimal work to translate into a common base function
Special variable classes: ANY
, missing
Exploit default initialize
function, which acts as a copy constructor
that updates slots in its first argument with values provided by named
arguments.
Pass '…' to allow derived classes to use this method
setMethod("[", c("AnnMat", "ANY", "ANY"),
function(x, i, j, ..., drop=TRUE)
{
## FIXME: warn user about ignoring 'drop'?
initialize(x, matData=matData(x)[i, j, drop=FALSE],
rowData=rowData(x)[i,,drop=FALSE],
colData=colData(x)[,j,drop=FALSE], ...)
})
## [1] "["
setMethod("[", c("AnnMat", "ANY", "missing"),
function(x, i, j, ..., drop=TRUE)
{
initialize(x, matData=matData(x)[i,,drop=FALSE],
rowData=rowData(x)[i,,drop=FALSE])
})
## [1] "["
setMethod("[", c("AnnMat", "missing", "ANY"),
function(x, i, j, ..., drop=TRUE)
{
initialize(x, matData=matData(x)[,j,drop=FALSE],
colData=colData(x)[j,,drop=FALSE])
})
## [1] "["
setMethod("[", c("AnnMat", "missing", "missing"),
function(x, i, j, ..., drop=TRUE)
{
initialize(x, ...)
})
## [1] "["
Exercise: create some simple unit tests for these methods.
Seems 'good enough' for numeric or logical indexes, what about character?
What else have we agreed to in the matrix API?