Copy on change
It seems like R variables do not share any memory
x <- y <- 1:5
x
## [1] 1 2 3 4 5
y
## [1] 1 2 3 4 5
x[1] <- 2L
x
## [1] 2 2 3 4 5
y
## [1] 1 2 3 4 5
x <- 1:5
fun = function(z) { z[1] = 2L; z }
fun(x)
## [1] 2 2 3 4 5
x
## [1] 1 2 3 4 5
R uses a ‘named’ concept, rather than strict ‘reference counting’
In action
x = 1:5
.Internal(inspect(x))
## @480c138 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
@41de7d8
: memory address.13 INTSXP
: integer ‘S-expression’ (more later).[NAM(1)]
: One symbol (x
) references the location in memory.x = y = 1:5
.Internal(inspect(x))
## @48f2f18 13 INTSXP g0c3 [MARK,NAM(2)] (len=5, tl=0) 1,2,3,4,5
.Internal(inspect(y))
## @48f2f18 13 INTSXP g0c3 [MARK,NAM(2)] (len=5, tl=0) 1,2,3,4,5
x
, y
point to same location in memory[NAM(2)]
: (at least) two symbols reference the location‘Copy-on-change’: when updating a vector, make a copy if NAM() == 2
.
## copy on change
x = y = 1:5
.Internal(inspect(x))
## @3f94c98 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
x[1] = 2L
.Internal(inspect(x))
## @3cf5db0 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 2,2,3,4,5
.Internal(inspect(y))
## @3f94c98 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
## no copy necessary
x = 1:5
.Internal(inspect(x))
## @3bf3838 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
x[1] = 2L
.Internal(inspect(x))
## @3aef798 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 2,2,3,4,5
Garbage collection (approximate description)
SEXP
) allocated by the system to a pool of available S-expressionsx <- 1
moves the SEXP from allocated to ‘in use’ pool, level 0 (the 0 in g0c4
of .Internal(inspect(x))
)rm(x)
, a symbol no longer references the SEXP and it is a candidate for re-use.gc()
(manual, or more typically triggered automatically when R runs out of SEXP in the available pool)
Note: SEXP belong to ‘generations’ and classes
y = 1
fun = function() { x <- 1; 2 }
fun(); gc() # scans x, y; collects x, ages y
## [1] 2
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 503648 26.9 940480 50.3 940480 50.3
## Vcells 939506 7.2 1941543 14.9 1199408 9.2
fun(); gc() # scans & collects x, does not visit y
## [1] 2
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 503747 27.0 940480 50.3 940480 50.3
## Vcells 939776 7.2 1941543 14.9 1199408 9.2
Resources: R Internals manual e.g., the write barrier, Luke Tierney’s notes; memory.c.
Performance consequences
Some common use patterns involve a lot of copying!
c()
, rbind()
, cbind()
in a loopdata.frame()
, list()
, or S4 object.Use .Internal(inspect())
to follow memory allocated to x
in the following loop.
x <- integer()
n <- 5
for (i in 1:n)
x <- c(x, i)
Answer:
n = 5
x = integer(); .Internal(inspect(x))
for (i in 1:n) {
x <- c(x, i)
.Internal(inspect(x))
}
x = integer(); .Internal(inspect(x))
for (i in 1:n) {
x[i] = i
.Internal(inspect(x))
}
x
If an integer occupies about 8 bytes of memory, and each iteration in the loop copies the current allocation into the new allocation, how many bytes are copied in a loop with n
iterations?
Interpret the output of tracemem()
for memory allocations when updating a value in a data.frame()
. How much of the data.frame
is being copied?
df <- mtcars
tracemem(df) # The overall data.frame
## [1] "<0x45779e0>"
tracemem(df[[1]]) # The specific column
## [1] "<0x1766760>"
df[1, 1]
## [1] 21
df[1, 1] = 22.1
## tracemem[0x45779e0 -> 0x44fcd68]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x44fcd68 -> 0x447ce60]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x447ce60 -> 0x447d058]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x1766760 -> 0x264a210]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
df[2, 2] = 22.2
## tracemem[0x447d058 -> 0x4464e78]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x4464e78 -> 0x4464f20]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x4464f20 -> 0x4465070]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
Compare this to memory allocations when modifying a similarly sized matrix
m <- m1 <- matrix(0, nrow(mtcars), ncol(mtcars))
tracemem(m) # A matrix is a vector with dims attributes
## [1] "<0x477afd0>"
m[1, 1] = 22.1
## tracemem[0x477afd0 -> 0x45b3820]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
m[2, 1] = 22.2
And to a ‘vectorized’ update
df <- mtcars
tracemem(df); tracemem(df[[1]])
## [1] "<0x45779e0>"
## [1] "<0x1766760>"
df[1:2, 1] <- c(22.1, 22.2)
## tracemem[0x45779e0 -> 0x3f57e58]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x3f57e58 -> 0x3f57f00]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x3f57f00 -> 0x3f58050]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
## tracemem[0x1766760 -> 0x3d04bd0]: [<-.data.frame [<- eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
m <- m1 <- matrix(0, nrow(mtcars), ncol(mtcars))
tracemem(m) # A matrix is a vector with dims attributes
## [1] "<0x4615310>"
m[1:2, 1] <- c(22.1, 22.2)
## tracemem[0x4615310 -> 0x3cd6510]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> vweave_rmarkdown <Anonymous> <Anonymous> do.call <Anonymous>
Here’s a simple function, implemented poorly
f <- function(n) {
x <- integer()
for (i in 1:n)
x <- c(x, i)
x
}
Correct
identical()
– very strictall.equal()
– allows for numerical approximationUnit tests!
library(testthat)
test_that("f() is correct", {
expect_identical(f(5), 1:5)
})
Robust
Edge cases, e.g., length 0 iterations, NA
values, … Here we add some additional tests, and put them in a function for easy reuse.
tests <- function(f) {
test_that("f() is correct and robust", {
expect_identical(f(5), 1:5)
expect_identical(f(1), 1L)
expect_identical(f(0), integer(0))
})
}
Our code fails the unit tests, because 1:n
fails with n == 0
tryCatch(tests(f), error = conditionMessage)
## [1] "Test failed: 'f() is correct and robust'\n* f(0) not identical to integer(0).\nLengths differ: 2 vs 0\n"
Revised code, robust to common use case
f1 = function(n) {
x <- integer()
for (i in seq_len(n))
x = c(x, i)
x
}
tests(f1)
Fast
system.time()
, but can be considerable run-to-run variationlibrary(microbenchmark)
res <- microbenchmark(f1(100), f1(200), f1(400), f1(800), f1(1600))
plot(res)
Scales quadratically, because of copying. Solutions?
f2 <- function(n) {
sapply(seq_len(n), function(i) i)
}
tests(f2) # fails! n == 0
f2a <- function(n) {
if (n == 0) {
integer()
} else {
sapply(seq_len(n), function(i) i)
}
}
tests(f2a)
f3 <- function(n) {
vapply(seq_len(n), function(i) i, integer(1))
}
test_that("f() is correct", {
expect_identical(f3(5), 1:5)
expect_identical(f3(1), 1L)
expect_identical(f3(0), integer(0))
})
tests(f3)
f4 <- function(n) {
x <- integer(n)
for (i in seq_len(n))
x[[i]] = i
x
}
tests(f4)
f5 <- seq_len
tests(f5)
n <- 1000
microbenchmark(f1(n), f2a(n), f3(n), f4(n), f5(n))
Environments
Current environment
environment()
## <environment: R_GlobalEnv>
surprisingly central to R
env <- new.env()
env[["foo"]] <- 123
env[["foo"]]
## [1] 123
get("foo", env) # equivalent (not really!)
## [1] 123
ls(env)
## [1] "foo"
name
or symbol
in Rconstant-time lookup
‘reference’ semantics (almost all else in R is ‘copy-on-change’ semantics)
another <- env
env[["bar"]] <- 456
another[["bar"]]
## [1] 456
All environments have a ‘parent’ environment
parent.env(env)
## <environment: R_GlobalEnv>
emptyenv()
new.env()
)Environment can be chained together as a linked list
top <- new.env(parent = emptyenv())
mid <- new.env(parent = top)
bot <- new.env(parent = mid)
Symbol (key) look-up
top[["top"]] <- 1
mid[["mid"]] <- 2
bot[["bot"]] <- 3
Symbols not found in the reference enviroment are looked for in the parent environment
get("bot", bot)
## [1] 3
get("mid", bot)
## [1] 2
tryCatch({
get("mid", bot, inherits=FALSE) # restricts search to current env
}, error = conditionMessage)
## [1] "object 'mid' not found"
bot[["mid"]] # NULL; `[[` restricts to current env
## NULL
Assignment is to the current environment
bot[["mid"]] <- 5
bot[["mid"]]
## [1] 5
mid[["mid"]]
## [1] 2
Functions are intimately connected to environments
fun <- function() {
environment()
}
fun() # an environment defines the body of a function
## <environment: 0x593b1a8>
fun() # new function, so new environment
## <environment: 0x59a57f0>
What happens to the environment in a function?
If a function body has an environment, there must be a parent environment
fun <- function() {
env <- environment()
list(env, parent.env(env))
}
fun()
## [[1]]
## <environment: 0x4bb9690>
##
## [[2]]
## <environment: R_GlobalEnv>
fun <- function() {
fun1 <- function() {
env1 <- environment() # envrionment created by fun1
par1 <- parent.env(env1) # parent environment
list(env1 = env1, par1 = par1)
}
env <- environment() # environment created by fun
par <- parent.env(env)
list(env = env, par = par, fun1=fun1)
}
fun()
## $env
## <environment: 0x3d44050>
##
## $par
## <environment: R_GlobalEnv>
##
## $fun1
## function ()
## {
## env1 <- environment()
## par1 <- parent.env(env1)
## list(env1 = env1, par1 = par1)
## }
## <environment: 0x3d44050>
The parent of a function’s environment is the environment in which it was defined (different from the environment from which it was called)
Lookup, assignment (<-
) and assignment to the parent(s) environment (<<-
)
fun = function() {
x0 <- 0
x1 <- 0
fun1 <- function() {
y <- 1 # create local variable y
x1 <<- 2 # assign to first occurrence of 'x' in parent env(s)
z <<- 3 # no z in parent env(s), so created in top env
c(x0 = x0, x1 = x1, y = y, z = z)
}
result <- fun1()
list(x0 = x0, x1 = x1, result = result)
}
This is called lexical scope
Package namespaces, imports, and search()
The `search() path
search()
## [1] ".GlobalEnv" "package:microbenchmark"
## [3] "package:testthat" "package:BiocStyle"
## [5] "package:stats" "package:graphics"
## [7] "package:grDevices" "package:utils"
## [9] "package:datasets" "package:methods"
## [11] "Autoloads" "package:base"
library(BiocGenerics)
: loading and attaching a package
Package NAMESPACE is loaded into the current R session
An environment containing exported symbols from the package is attached to the R search path
‘Attaching’ a package means that the .GlobalEnv
’s parent points to the new package exported symbol environment. The new package exported symbol environment points to the package previously pointed to by .GlobalEnv
A package consists of a NAMESPACE
.GlobalEnv
.Symbol resolution in packages follows lexical scope
.GlobalEnv
, then …Evaluation, lazy and non-standard evaluation
Exercise: bank account
Congratulations, you own a bank! Here’s a function that creates an account, and provides access to it
account <- function() {
balance <- 0
available <- function() {}
deposit <- function(amt) {}
withdraw <- function(amt) {}
list(available = available, deposit = deposit, withdraw = withdraw)
}
Answer:
account <- function() {
balance <- 0
available <- function() {
balance
}
deposit <- function(amt) {
balance <<- balance + amt
balance
}
withdraw <- function(amt) {
if (amt > balance)
stop("insufficient funds")
deposit(-amt)
}
list(available = available, deposit = deposit, withdraw = withdraw)
}
Implement for the following test case, using the concept of lexical scope:
library(testthat)
test_that("I understand lexical scope", {
acct1 <- account()
expect_equal(acct1$deposit(10), 10)
expect_equal(acct1$deposit(12), 22)
expect_equal(acct1$available(), 22)
expect_equal(acct1$withdraw(20), 2)
expect_error(acct1$withdraw(20), "insufficient funds")
})
What happens when you get a second customer?
test_that("functions create local scope", {
acct1 <- account(); acct1$deposit(10)
acct2 <- account(); acct2$deposit(20)
expect_equal(acct1$available(), 10)
expect_equal(acct2$available(), 20)
})
Answer:
account <- function() {
balance <- 0
available <- function() {
balance
}
deposit <- function(amt) {
balance <<- balance + amt
funs
}
withdraw <- function(amt) {
if (amt > balance)
stop("insufficient funds")
deposit(-amt)
}
funs = list(
available = available, deposit = deposit, withdraw = withdraw
)
funs
}
Exercise: bank account 2
What changes to implementation would be require to allow
test_that("endomorphisms rock", {
expect_equal(account()$deposit(10)$withdraw(5)$available(), 5)
})
Exercise: package-local options
Implement ‘package-local’ options that the user can set via functions. Do this by using local()
to create and populate an environment that contains the package-local variables as well as accessor functions get()
and set()
to retrieve / assign variables, e.g., tolerance
. Here’s a start, and some unit tests
.myoptions <- local({
tolerance = 1e-4
get = function() {
tolerance
}
set = function(value) {
}
## ...
})
## setTolerance = ...
Answer:
.myoptions <- local({
tolerance = 1e-4
get = function() tolerance
list(
get = get,
set = function(value) {
ovalue <- get()
tolerance <<- value
ovalue
}
)
})
getTolerance <- .myoptions$get
setTolerance <- .myoptions$set
Here are some unit tests
test_that("local options can be set", {
expect_equal(getTolerance(), 1e-4)
## setTolerance() returns previous value, to ease resetting
expect_equal(otol <- setTolerance(1e-5), 1e-4)
expect_equal(setTolerance(otol), 1e-5)
})
f <- function(x) {
## ...
g()
}
g <- function(x) {
## ...
if (log(x) < 0) {
TRUE
} else {
FALSE
}
}
Invoke traceback()
after error.
f(-1)
traceback()
debugonce(g)
: enter the ‘browser’ when the function is invoked. See ?browser
for what can be done. Variant: debug(g)
… undebug(g)
.
debugonce(g)
f(-1)
## Browser[2]>
## ...
browser()
. Edit the source code (not very practical for complex or package code)
g <- function(x) {
## ...
browser()
if (log(x) < 0) {
TRUE
} else {
FALSE
}
}
options(error=recover)
. When done, options(error=NULL)
. Actually, any function accepting 0 arguments is possible for recover, e.g., options(error=traceback)
.
f(-1)
## Error: all(x >= 0) is not TRUE
##
## Enter a frame number, or 0 to exit
##
## 1: f(-1)
## 2: #2: stopifnot(all(x >= 0))
##
## Selection: 2
## Called from: f(-1)
## Browse[1]> c
##
## Enter a frame number, or 0 to exit
##
## 1: f(-1)
## 2: #2: stopifnot(all(x >= 0))
##
## Selection:
trace()
– especially useful for tracing execution (e.g., tracer = quote(print(argname))
), and for S4 methods (argument signature=
)
## like debug
trace(g, tracer=browser)
f(-1)
trace(g, quote({ print(x); print(log(x)) }))
for (i in runif(1000, -1, 100)) f(i)
trace(g, quote(if (x < 0) browser()))
for (i in runif(1000, -1, 100)) f(i)
library(GenomicRanges)
showMethods("findOverlaps")
selectMethod("findOverlaps", c("GRanges", "GRanges"))
trace(findOverlaps, browser, signature=c("GenomicRanges", "GenomicRanges"))
Found a bug, now what?
Recover more gracefully. (Somehow not a good solution – would rather avoid the problem than treat the symptom).
g <- function(x) {
## ...
tryCatch({
if (log(x) < 0)
TRUE
else
FALSE
}, error = function(e) {
NA
})
Program more cleverly / robustly – in our trivial example, the if()
is irrelevant, and introduces a cryptic error message!
g <- function(x) {
## ...
log(x) < 0 # NA, instead of error from if (NA) (!)
Best: assert pre-conditions (and program more cleverly).
f <- function(x) {
stopifnot(is.numeric(x), length(x) == 1, !is.na(x), x >= 0)
## ...
g(x)
}
Structure: A simple directory with specific files and structure
Creation
devtools::create("MyPackage")
Development
devtools::load_all()
to reduce the edit-install-run loop.devtools::use_testthat()
and devtools::test()
for testthat unit tests.devtools::document()
(roxygen2) for documentation.devtools::test()
(run unit tests); devtools::check()
devtools::install()
N.B.
Final arbiter:
R CMD build MyPackage # source for version 0.99.1 of package
R CMD check MyPackage_0.99.1.tar.gz
Mangement
S-expressions
numeric()
, logical()
, etc), lists, environments, functions, etc., are represented by C struct
sstruct
is called an ‘S-expression’, represented by the symbol SEXP
(which I pronounce ‘S-exp’)SEXP
is defined in Rinternals.hSEXP
is polymorhpic – the interpretation of the struct depends on type information contained in the SEXP
itself. Types are enumerated as INTSXP
, REALSXP
, …
STRSXP
: character()
, each element of a character vector is a CHARSXP
.VECSXP
: list()
; LISTSXP
is a ‘pair-list’ primarily used to represent function arguments.EXTPTRSXP
: an R reference to arbitrary data that is not represented in R, e.g., an C++ class.SEXP
contain sexpinfo, with named, garbage collect, marked, and other status, as well as actual data (e.g., the integers of an integer()
vector.SEXP
data structure. Several styles
LENGTH()
, Rf_length()
Rf_*
style.From R to C
.Internal()
, .External()
, .C()
, .Call()
, ….C()
: R does some work to coerce R types, e.g., integer()
to C types, int *
; useful for many light-weight purposes.
character()
and more complicated objects difficult to deal with..Call()
: R arguments are represented as SEXP
SEXP
in C++ idioms.The public API
When to worry?
.C()
.Call()
Create a package and ‘src’ directory. Add the LinkingTo:
tag to indicate that the package is going to link to (C++) header files provided by the BH (’Boost Headers’) package.
devtools::create("/tmp/uuid", list(LinkingTo = "BH"))
## Creating package 'uuid' in '/tmp'
## No DESCRIPTION found. Creating with values:
## Package: uuid
## Title: What the Package Does (one line, title case)
## Version: 0.0.0.9000
## Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
## Description: What the package does (one paragraph).
## Depends: R (>= 3.4.0)
## License: What license is it under?
## Encoding: UTF-8
## LazyData: true
## LinkingTo: BH
## * Creating `uuid.Rproj` from template.
## * Adding `.Rproj.user`, `.Rhistory`, `.RData` to ./.gitignore
dir.create("/tmp/uuid/src")
Write unit tests
devtools::use_testthat("/tmp/uuid")
## * Adding testthat to Suggests
## * Creating `tests/testthat`.
## * Creating `tests/testthat.R` from template.
cat('
context("uuid")
test_that("uuid returns different values", {
expect_true(is.character(uuid()))
expect_true(length(uuid()) == 1L)
uu <- unique(unlist(replicate(100, uuid())))
expect_identical(length(uu), 100L)
})',
file = "/tmp/uuid/tests/testthat/test_uuid.R"
)
Implement the ‘C++’ layer
cat('
#include <boost/uuid/uuid_generators.hpp>
#include <boost/uuid/uuid_io.hpp>
static boost::uuids::random_generator uuid_generator =
boost::uuids::random_generator();
std::string uuid_generate()
{
return boost::uuids::to_string(uuid_generator());
}
',
file = "/tmp/uuid/src/uuid.cpp"
)
Implement the interface from C++ to R’s C layer.
cat('
#include <Rinternals.h>
SEXP uuid()
{
return mkString(uuid_generate().c_str());
}
',
file = "/tmp/uuid/src/uuid.cpp",
append = TRUE
)
Implement the interface from C to R.
cat('
#include <R_ext/Rdynload.h>
extern "C" {
static const R_CallMethodDef callMethods[] = {
{".uuid", (DL_FUNC) &uuid, 0},
{NULL, NULL, 0}
};
void R_init_uuid(DllInfo *info)
{
R_registerRoutines(info, NULL, callMethods, NULL, NULL);
}
}',
file = "/tmp/uuid/src/uuid.cpp",
append = TRUE
)
Implement the end-user API in R.
cat("
#' @useDynLib uuid, .registration = TRUE
#' @export
uuid <- function()
.Call(.uuid)
",
file = "/tmp/uuid/R/uuid.R"
)
Document, test, install, use!
devtools::document("/tmp/uuid")
## Updating uuid documentation
## Loading uuid
## Re-compiling uuid
## '/home/mtmorgan/bin/R-3-4-branch/bin/R' --no-site-file --no-environ \
## --no-save --no-restore --quiet CMD INSTALL '/tmp/uuid' \
## --library='/tmp/RtmpPZvmxO/devtools_install_20c366d77c59' --no-R \
## --no-data --no-help --no-demo --no-inst --no-docs --no-exec \
## --no-multiarch --no-test-load
##
## Updating roxygen version in /tmp/uuid/DESCRIPTION
## Writing NAMESPACE
devtools::test("/tmp/uuid")
## Loading uuid
## Testing uuid
## uuid: ...
##
## DONE ========================================================================
devtools::install("/tmp/uuid")
## Installing uuid
## '/home/mtmorgan/bin/R-3-4-branch/bin/R' --no-site-file --no-environ \
## --no-save --no-restore --quiet CMD INSTALL '/tmp/uuid' \
## --library='/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-3.6' \
## --install-tests
##
## Reloading installed uuid
Add the following unit test and implement appropriate functionality
test_that("uuid returns multiple values", {
n <- 5
result <- uuid(5)
expect_true(is.character(result))
expect_identical(length(result), 5L)
expect_identical(length(unique(result)), 5L)
})
Do this in several stages, from ‘easy’ to harder
Implement the R infrastructure by adding an argument n
to the definition of the R-level uuid()
. Provide n
with a default value so that the original unit tests continue to work, i.e., that devtools::test("/tmp/uuid")
still passes the original tests.
Pass the argument n
to C. Do this by adding n
to the R-level .Call()
. Then modify the C-level uuid()
to accept an SEXP as argument, so the signature is SEXP uuid(SEXP n_sexp)
. Update the callMethods[]
array so that the uuid
function takes 1 argument. Test that the source code compiles using, e.g., devtools::install("/tmp/uuid")
or by running the unit tests again.
In C, modify uuid()
to check that the argument is in fact a scalar integer. Do this by using the macro IS_SCALAR()
and function Rf_asInteger()
and the constant R_NaInt
; see the header file ‘Rinternals.h’ in R.home("include")
. Signal an error with Rf_error()
. You’ll add code that looks like
bool test = IS_SCALAR(n_sexp, INTSXP) && (R_NaInt != Rf_asInteger(n_sexp));
if (!test)
Rf_error("'n' cannot be coerced to integer(1) and not NA");
int n = Rf_asInteger(n_sexp);
Modify uuid()
to allocate an SEXP to contain the result – a STRSXP of length n
. This allocated SEXP needs to be protected from garbage collection, because it is an R memory allocation not associated with a symbol. Protection is provided by the PROTECT()
macro, which is traditional wrapped around functions that return SEXP.
SEXP result = PROTECT(Rf_allocVector(n, STRSXP));
Fill the return vector using a C loop, constructing a CHARSXP for each element using mkChar()
and setting the i
th element of result
using the function SET_STRING_ELT()
. Although at the R level the first element of a vector is element number 1, in C the first element is 0.
for (int i = 0; i < n; ++i)
SET_STRING_ELT(result, i, mkChar(uuid_generate().c_str()));
Our function is done; use UNPROTECT()
to remove the protection on the allocated SEXP (the calling function, in this case R itself, is resposible for PROTECTing any SEXP it recieves) and return the result to R.
UNPROTECT(1);
return result;
(Optional) As a nicety for our users, coerce their R-level argument n
to an integer (so, e.g., they can provide 5
rather than 5L
as the argument). This can be done at the R level using as.integer()
, or at the C level with the following. Consider the merits of each approach.
n_sexp = PROTECT(Rf_coerceVector(n_sexp, INTSXP));
Create a new package using Rcpp::Rcpp.package.skeleton()
. Edit the DESCRIPTION file to include BH
in the LinkingTo:
field.
Add a file src/rcpp_uuid.cpp
with the boost-related code
#include <boost/uuid/uuid_generators.hpp>
#include <boost/uuid/uuid_io.hpp>
static boost::uuids::random_generator uuid_generator =
boost::uuids::random_generator();
std::string uuid_generate()
{
return boost::uuids::to_string(uuid_generator());
}
Include the Rcpp.h
header and use the Rcpp namespace
#include <Rcpp.h>
using namespace Rcpp;
Implement the function using Rcpp structures like CharacterVector
; no need to worry about coercing the argument, PROTECTion, etc. Decorate the function with the // [[Rcpp::export]]
attribute.
// [[Rcpp::export]]
CharacterVector rcpp_uuid(int n) {
CharacterVector uuids(n);
for (int i = 0; i < uuids.size(); ++i)
uuids[i] = uuid_generate();
return uuids;
}
In R, make Rcpp process the attributes; check out src/RcppExports.cpp
and R/RcppExports.R
> Rcpp::compileAttributes()
Install and use!
Rcppuuid$ R CMD INSTALL .
[...]
Rcppuuid$ R --vanilla -e "Rcppuuid::rcpp_uuid(3)"
> Rcppuuid::rcpp_uuid(3)
[1] "cb9cc7e8-bfcb-4262-b05b-2f1c0a4bc196"
[2] "7b592e1f-8341-4da0-af04-b387726af8ea"
[3] "7b2ade09-6ff2-4e76-b564-31b1a626e864"
Setup
gdb
on linux, lldb
on Mac; [gdb-to-lldb][gdb-lldb] command map.One approach:
$ cat ~/.R/Makevars
CFLAGS = -g -O0
CXXFLAGS = -g -O0
CXX11FLAGS = -g -O0
Alternative: configure R with appropriate CFLAGS
, etc.
Overall
Get and install a copy of our uuid package from GitHub
$ git clone https://github.com/Bioconductor/BiocAdvanced
$ cd BiocAdvanced
$ git checkout Zurich-2017
$ cd inst/uuid
$ R CMD INSTALL .
Start R and launch debugger
$ R -d gdb
(gdb)
Start the R process, load packages, etc., break into the debugger
(gdb) run
> library(uuid)
> uuid()
Error in uuid() : invalid type/length (symbol/16) in vector allocation
Break into the debugger, set a breakpoint at the point where R throws an error, and continue
> ^C
(gdb) break Rf_error
(gdb) continue
>
Trigger the error
> uuid()
Breakpoint 1, Rf_error (
format=0x643020 "invalid type/length (%s/%d) in vector allocation")
at /home/mtmorgan/src/R-3-4-branch/src/main/errors.c:821
821 {
(gdb)
Discover where
you are in the call stack. Mostly we’re in R’s source code where bugs are unlikely (though not impossible!) three up is our own code
(gdb) where
#0 Rf_error (
format=0x643020 "invalid type/length (%s/%d) in vector allocation")
at /home/mtmorgan/src/R-3-4-branch/src/main/errors.c:821
#1 0x0000000000525dbc in Rf_allocVector3 (type=1, length=16, allocator=0x0)
at /home/mtmorgan/src/R-3-4-branch/src/main/memory.c:2612
#2 0x000000000050e5c0 in Rf_allocVector (type=1, length=16)
at /home/mtmorgan/src/R-3-4-branch/src/include/Rinlinedfuns.h:196
#3 0x00007fffed786780 in uuid (n_sexp=0x21a16a8) at uuid.cpp:26
#4 0x000000000048b7e3 in R_doDotCall (ofun=0x7fffed7866ac <uuid(SEXPREC*)>,
nargs=1, cargs=0x7fffffffbdd0, call=0x19f7d48)
at /home/mtmorgan/src/R-3-4-branch/src/main/dotcode.c:570
Navigate and look around
Research reported in this course was supported by the National Human Genome Research Institute and the National Cancer Institute of the National Institutes of Health under award numbers U41HG004059 and U24CA180996.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 633974)