Motivation and relevance
“Objects” provide a way to encapsulate complex, inter-related data into manageable structures that are accessed in a consistent way. This reduces book-keeping errors while enabling advanced analysis.
Objects are pervasive in R (e.g., data.frame() and lm() return objects).
Using formally defined objects increases interoperability between packages, while making it relatively easy to apply established concepts to derived (new but similar) data types.
Bioconductor makes extensive use of formal objects. These are reviewed with an eye to understanding the design and principles of their implementation.
Example: stats::lm()
Objects
Construct objects by fitting a linear model. The constructor is the function lm().
x <- rnorm(1000)
df <- data.frame(x=x, y=x + rnorm(length(x), sd=.5))
fit <- lm(y ~ x, df)
fit
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Coefficients:
## (Intercept)            x
##      0.0126       0.9932
The class is usually based on a list() with a class attribute.
is.list(fit)
## [1] TRUE
names(fit)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
str(head(fit, 3))
## List of 3
## $ coefficients: Named num [1:2] 0.0126 0.9932
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
## $ residuals : Named num [1:1000] -0.7298 -0.0367 0.0978 0.337 -0.4586 ...
## ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
## $ effects : Named num [1:1000] -1.295 31.878 0.12 0.359 -0.437 ...
## ..- attr(*, "names")= chr [1:1000] "(Intercept)" "x" "" "" ...
Class inheritance is expressed via a vector of class names, class=c("derived", "base").
No independent class definition or guarantee of object structure.
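The points above can be illustrated with a short sketch. The class names "derived" and "base" and the objects obj and fake are hypothetical, chosen only for illustration.

```r
## An S3 object is an ordinary list (or other vector) with a class attribute
obj <- structure(list(value = 1:3), class = c("derived", "base"))

class(obj)              # both class names, most specific first
inherits(obj, "base")   # TRUE: 'derived' inherits from 'base'
is.list(obj)            # TRUE: the underlying data is still a list

## Nothing checks the structure: any object can claim to be an 'lm'
fake <- structure(list(), class = "lm")
inherits(fake, "lm")    # TRUE, although 'fake' has no coefficients
names(fake)             # NULL
```

Because the class attribute is only a label, code that receives an "lm" object has to trust that the expected elements are present.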
Generic and methods
The generic is defined by a plain old function whose body calls UseMethod(), e.g.,
print
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x102685788>
## <environment: namespace:base>
A method is defined by a function whose name is composed of the generic + “.” + class, e.g., the print method for objects of class lm:
head(print.lm)
##
## 1 function (x, digits = max(3L, getOption("digits") - 3L), ...)
## 2 {
## 3 cat("\\nCall:\\n", paste(deparse(x$call), sep = "\\n", collapse = "\\n"),
## 4 "\\n\\n", sep = "")
## 5 if (length(coef(x))) {
## 6 cat("Coefficients:\\n")
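The naming convention and dispatch rules can be tried out on a toy generic. This is a minimal sketch; the generic describe() and the classes "derived" and "base" are hypothetical names, not part of base R.

```r
## Generic: a plain function whose body calls UseMethod()
describe <- function(x, ...) UseMethod("describe")

## Methods: generic name + "." + class name
describe.default <- function(x, ...) "an object without a specific method"
describe.base <- function(x, ...) "a base object"
describe.derived <- function(x, ...) {
    ## NextMethod() passes control to the method for the next class
    paste("a derived object, which is also", NextMethod())
}

obj <- structure(list(), class = c("derived", "base"))
describe(obj)       # dispatches to describe.derived, then describe.base
describe(1:3)       # no method for integer vectors, so describe.default is used
```

Dispatch walks the class vector from left to right and falls back to the default method when no class matches.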
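Discovery
Methods defined for class lm: methods(class="lm")
Classes with a method for the generic anova: methods("anova")
Source of a method that is not exported: getAnywhere() or triple-colon resolution, e.g., stats:::anova.loess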
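A short session using these discovery tools might look as follows. getS3method() is not mentioned above; it is added here as one more way to retrieve a method by generic and class.

```r
## Methods available for objects of class 'lm'
methods(class = "lm")

## Classes for which the generic 'anova' has a method
methods("anova")

## Retrieve a method that is not exported from its namespace
getAnywhere("anova.loess")           # reports where the function was found
f1 <- getS3method("anova", "loess")  # looked up by generic and class
f2 <- stats:::anova.loess            # looked up via triple-colon
is.function(f1)
is.function(f2)
```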
Exercise: a minimal class. Implement a minimal class MyClass and a print method that provides a tidy summary of the object. Hints: structure() with argument class, list(), class(), cat() to print.
Exercise: a simple database. Implement a minimal class containing people and their occupations. Implement functions and methods to create and print the class, and to subset the class.
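One possible approach to the database exercise is sketched below. It is not the only reasonable solution; the class name People, the constructor People(), and the element names person and job are choices made for this illustration.

```r
## Constructor: check the input, then attach the class attribute
People <- function(person = character(), job = character()) {
    stopifnot(
        is.character(person), is.character(job),
        length(person) == length(job)
    )
    structure(list(person = person, job = job), class = "People")
}

## print method: a compact summary showing at most six entries
print.People <- function(x, ...) {
    n <- length(x$person)
    cat("class: People (n = ", n, ")\n", sep = "")
    cat("person:", head(x$person), if (n > 6) "...", "\n")
    cat("job:", head(x$job), if (n > 6) "...", "\n")
    invisible(x)
}

## Subset method: apply the index to every element and keep the class
"[.People" <- function(x, i, ...) {
    People(x$person[i], x$job[i])
}

staff <- People(c("Xavier", "Melanie", "Octavio"),
                c("Leader", "Innovator", "Doer"))
staff
staff[2:3]
```

Routing the subset through the constructor means the result is checked in the same way as a freshly created object.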
Exercise: a minimal class. Use setClass() to implement a simple class that represents people and their occupation. Use setMethod() to write a show method to display the class in a reasonable way, including when the class has a very large number of elements.
.Empl <- setClass("Empl",
    representation(person="character", job="character"))
setMethod(show, "Empl", function(object) {
    len <- length(object@person)
    cat("class: ", class(object), " (n =", len, ")\n", sep="")
    cat("person:", head(object@person), if (len > 6) "...", "\n")
    cat("job:", head(object@job), if (len > 6) "...", "\n")
})
## [1] "show"
## attr(,"package")
## [1] "methods"
.Empl()
## class: Empl (n =0)
## person:
## job:
.Empl(person=c("Xavier", "Melanie", "Octavio"),
      job=c("Leader", "Innovator", "Doer"))
## class: Empl (n =3)
## person: Xavier Melanie Octavio
## job: Leader Innovator Doer
.Empl(person=LETTERS, job=letters)
## class: Empl (n =26)
## person: A B C D E F ...
## job: a b c d e f ...
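In contrast to a list with a class attribute, a class created with setClass() has a recorded definition, and new objects are checked against it. The sketch below uses a hypothetical class Staff so that the Empl definition is left untouched; the validity rule is an assumption added for illustration and is not part of the exercise.

```r
library(methods)

## Class definition with typed slots and a validity rule
.Staff <- setClass("Staff",
    representation(person = "character", job = "character"),
    validity = function(object) {
        if (length(object@person) != length(object@job))
            "'person' and 'job' must have the same length"
        else
            TRUE
    })

ok <- .Staff(person = "Xavier", job = "Leader")
is(ok, "Staff")

## Slot types are enforced when the object is created
res1 <- tryCatch(.Staff(person = 1:3), error = conditionMessage)

## The validity rule is checked as well
res2 <- tryCatch(.Staff(person = c("A", "B"), job = "a"),
                 error = conditionMessage)
res1
res2
```

Both failed constructions produce an error message rather than a malformed object.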
Example: ShortRead::FastqStreamer()
library(ShortRead)
example(FastqStreamer)
Why a reference class?
Some design decisions:
Simple interface: construct with FastqStreamer(), invoke with yield().
Class methods are not meant for end-user access.
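These design decisions can be sketched with a toy reference class. Everything here (the class name ChunkStreamer, its fields, the method advance(), and the helper nextChunk()) is hypothetical and is not ShortRead's implementation. It only illustrates why mutable state suits a streamer, and how plain functions can keep class methods out of the user's way.

```r
library(methods)

## Reference class: fields are mutable, so the object remembers its position
.ChunkStreamer <- setRefClass("ChunkStreamer",
    fields = list(records = "character", size = "integer", pos = "integer"),
    methods = list(
        advance = function() {
            ## return the next chunk and update the position in place
            if (pos > length(records))
                return(character())
            idx <- seq(pos, min(pos + size - 1L, length(records)))
            pos <<- max(idx) + 1L
            records[idx]
        }
    ))

## User-facing interface: a constructor and a plain function
ChunkStreamer <- function(records, size = 2L) {
    .ChunkStreamer$new(records = records, size = as.integer(size), pos = 1L)
}
nextChunk <- function(streamer) streamer$advance()

s <- ChunkStreamer(letters[1:5], size = 2)
nextChunk(s)    # first two records
nextChunk(s)    # next two; 's' was modified in place
nextChunk(s)    # the remaining record
nextChunk(s)    # character(0) once the records are exhausted
```

With ordinary copy-on-modify objects, each call would have to return both the chunk and an updated streamer; a reference class lets the streamer keep track of its own position.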