Suppose we represent heads as ‘0’ and tails as ‘1’. Let p be the
probability that a coin that is currently a head is tossed and remains
a head, and similarly for tails. If p == 0.5, then the tosses would
seem to be uncorrelated. We implement this idea as a simple function
f1 <- function(n, p) {
    current <- rbinom(1, 1, 0.5)
    outcome <- NULL
    for (i in 1:n) {
        if (runif(1) > p)
            current <- (current + 1) %% 2
        outcome <- c(outcome, current)
    }
    outcome
}Is it correct?
f1(10, .5)##  [1] 1 1 0 0 0 0 0 0 1 0table(f1(1000, .5))## 
##   0   1 
## 496 504res <- f1(1000, .9)
mean(rle(res)$length) # expectation: 1 / (1 - p)## [1] 10.75269Is it understandable? Pretty much
Is it robust? We’ll come back to this.
Is it fast? Seems so initially, but it scales poorly!
system.time(f1(5000, .5))##    user  system elapsed 
##   0.141   0.019   0.161system.time(f1(10000, .5))##    user  system elapsed 
##   0.455   0.047   0.503system.time(f1(20000, .5))##    user  system elapsed 
##   1.796   0.261   2.062Doubling the problem size (n) causes a 4-fold increase in execution time.
What’s the problem? c(outcome, current) causes outcome to be
copied each time through the loop!
Pre-allocate and fill the outcome vector; outome updated in place,
without copying.
f2 <- function(n, p) {
    current <- rbinom(1, 1, 0.5)
    outcome <- numeric(n)
    for (i in 1:n) {
        if (runif(1) > p)
            current <- (current + 1) %% 2
        outcome[i] <- current
    }
    outcome
}Is it correct?
set.seed(123)
res1 <- f1(100, .9)
set.seed(123)
res2 <- f2(100, .9)
identical(res1, res2)## [1] TRUEIs it understandable? As understandable as f1().
Is it robust? In a minute…
Is it fast?
system.time(f2(5000, .5))##    user  system elapsed 
##   0.029   0.000   0.029system.time(f2(10000, .5))##    user  system elapsed 
##   0.055   0.001   0.057system.time(f2(20000, .5))##    user  system elapsed 
##   0.106   0.003   0.111f2() seems to scale linearly, so performs increasingly well compared
to f1().
Note that runif() is called n times, but the program could be
modified, and still be understandable, if it were called just once –
hoist runif(1) > p out of the loop.
f3 <- function(n, p) {
    current <- rbinom(1, 1, 0.5)
    outcome <- numeric(n)
    test <- runif(n) > p
    for (i in 1:n) {
        if (test[i])
            current <- (current + 1) %% 2
        outcome[i] <- current
    }
    outcome
}set.seed(123)
res1 <- f1(100, .9)
set.seed(123)
res3 <- f3(100, .9)
identical(res1, res3)## [1] TRUEUnderstandable? Yes.
Robust? None of them have been, and f3() really isn’t!
set.seed(123)
f1(0, .5)## [1] 1 1set.seed(123)
try(f3(0, .5))## Error in if (test[i]) current <- (current + 1)%%2 : 
##   missing value where TRUE/FALSE neededf3()n <- 100000
system.time(f2(n, .5))##    user  system elapsed 
##   0.551   0.033   0.587system.time(f3(n, .5))##    user  system elapsed 
##   0.043   0.001   0.044… with linear scaling.
system.time(f3(n * 10, .5))##    user  system elapsed 
##   0.424   0.007   0.434The problem is that 1:n is not robust, especially 1:0 generates
the sequence c(1, 0), whereas we were expecting a zero-length
sequence!
Solution: use seq_len(n)
lapply(3:1, seq_len)## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] 1f4 <- function(n, p) {
    current <- rbinom(1, 1, 0.5)
    outcome <- numeric(n)
    test <- runif(n) > p
    for (i in seq_len(n)) {
        if (test[i])
            current <- (current + 1) %% 2
        outcome[i] <- current
    }
    outcome
}Correct? Yes
Understandable? Yes
Robust? Yes
set.seed(123)
f4(3, .5)## [1] 1 1 0f4(2, .5)## [1] 1 0f4(1, .5)## [1] 0f4(0, .5)## numeric(0)Use cumsum() (cummulative sum) to produce sequential groups that
have the same head or tail status. Use %% on the cummulative sum to
translate those groups into heads (cummsum() %% 2 == 0 or tails
(cummsum() %% 2 == 1).
f5 <- function(n, p) {
    current <- rbinom(1, 1, 0.5)
    test <- runif(n) > p
    cumsum(current + test) %% 2
}set.seed(123); res1 <- f1(10, .8)
set.seed(123); res5 <- f5(10, .8)
identical(res1, res5)## [1] TRUEUnderstandable? Harder to understand…
Robust? Yes
f5(0, .8)## numeric(0)n <- 1000000
system.time(f4(n, .5))##    user  system elapsed 
##   0.416   0.002   0.420system.time(f5(n, .5))##    user  system elapsed 
##   0.087   0.002   0.088system.time(f5(n * 10, .5))##    user  system elapsed 
##   1.086   0.063   1.151About 4x faster than f4(), scales linearly, fast even for very large n.
Could be used to generate a large data set for developing methods about correlated samples, along the lines of
correlated_tosses_expts <- function(m, n, p) {
    ## m tosses (rows) in each of n experiments
    start0 <- rep(rbinom(m, 1, .5), each = m)
    group0 <- cumsum(runif(m * n) > p)
    toss <- (start0 + group0) %% 2
    matrix(toss, m)
}
system.time({
    expt <- correlated_tosses_expts(1000, 10000, .8)
})##    user  system elapsed 
##   0.977   0.058   1.035hist(colSums(expt))Probably we have reached the point of diminishing gains, we’ve already
spent far more time developing f5() than we’ll ever save by further
investigation… However,
current to cumsum() vector.rgeom() to generate change points.Other tools
sessionInfo()## R version 3.6.1 Patched (2019-07-16 r76845)
## Platform: x86_64-apple-darwin17.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Users/ma38727/bin/R-3-6-branch/lib/libRblas.dylib
## LAPACK: /Users/ma38727/bin/R-3-6-branch/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.13.2
## 
## loaded via a namespace (and not attached):
##  [1] BiocManager_1.30.4 compiler_3.6.1     magrittr_1.5      
##  [4] bookdown_0.12      htmltools_0.3.6    tools_3.6.1       
##  [7] yaml_2.2.0         Rcpp_1.0.1         codetools_0.2-16  
## [10] stringi_1.4.3      rmarkdown_1.14     knitr_1.23        
## [13] stringr_1.4.0      digest_0.6.20      xfun_0.8          
## [16] evaluate_0.14