Contents

1 Motivation

The chihaya package saves DelayedArray objects for efficient, portable and stable reproduction of delayed operations in a new R session or other programming frameworks.

Check out the specification for more details.

2 Quick start

Make a DelayedArray object with some operations:

library(DelayedArray)
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5) 
x <- log2(x + 1)
x
## <5 x 10> DelayedMatrix object of type "double":
##            [,1]       [,2]       [,3] ...      [,9]     [,10]
## [1,] 0.06635354 1.00361773 0.12479046   . 0.9588107 0.8895361
## [2,] 1.22448192 0.64536950 0.32726298   . 1.0825564 0.3992236
## [3,] 3.67748693 4.13306605 6.10048866   . 1.9703431 5.1670070
## [4,] 4.09612283 3.85363957 0.06637671   . 1.7857269 2.8049865
## [5,] 1.02527039 0.72091090 0.64850529   . 0.1934733 0.8436823
showtree(x)
## 5x10 double: DelayedMatrix object
## └─ 5x10 double: Stack of 2 unary iso op(s)
##    └─ 5x10 double: Unary iso op with args
##       └─ 5x10 double: Subset
##          └─ 100x10 double: [seed] matrix object

Save it into a HDF5 file with saveDelayed():

library(chihaya)
tmp <- tempfile(fileext=".h5")
saveDelayed(x, tmp)
rhdf5::h5ls(tmp)
##                            group    name       otype  dclass      dim
## 0                              / delayed   H5I_GROUP                 
## 1                       /delayed    base H5I_DATASET   FLOAT    ( 0 )
## 2                       /delayed  method H5I_DATASET  STRING    ( 0 )
## 3                       /delayed    seed   H5I_GROUP                 
## 4                  /delayed/seed  method H5I_DATASET  STRING    ( 0 )
## 5                  /delayed/seed    seed   H5I_GROUP                 
## 6             /delayed/seed/seed   along H5I_DATASET INTEGER    ( 0 )
## 7             /delayed/seed/seed  method H5I_DATASET  STRING    ( 0 )
## 8             /delayed/seed/seed    seed   H5I_GROUP                 
## 9        /delayed/seed/seed/seed   index   H5I_GROUP                 
## 10 /delayed/seed/seed/seed/index       0 H5I_DATASET INTEGER        5
## 11       /delayed/seed/seed/seed    seed   H5I_GROUP                 
## 12  /delayed/seed/seed/seed/seed    data H5I_DATASET   FLOAT 100 x 10
## 13  /delayed/seed/seed/seed/seed  native H5I_DATASET INTEGER    ( 0 )
## 14            /delayed/seed/seed    side H5I_DATASET  STRING    ( 0 )
## 15            /delayed/seed/seed   value H5I_DATASET   FLOAT        5
## 16                 /delayed/seed    side H5I_DATASET  STRING    ( 0 )
## 17                 /delayed/seed   value H5I_DATASET   FLOAT    ( 0 )

And then load it back in later:

y <- loadDelayed(tmp)
y
## <5 x 10> DelayedMatrix object of type "double":
##            [,1]       [,2]       [,3] ...      [,9]     [,10]
## [1,] 0.06635354 1.00361773 0.12479046   . 0.9588107 0.8895361
## [2,] 1.22448192 0.64536950 0.32726298   . 1.0825564 0.3992236
## [3,] 3.67748693 4.13306605 6.10048866   . 1.9703431 5.1670070
## [4,] 4.09612283 3.85363957 0.06637671   . 1.7857269 2.8049865
## [5,] 1.02527039 0.72091090 0.64850529   . 0.1934733 0.8436823

Of course, this is not a particularly interesting case as we end up saving the original array inside our HDF5 file anyway. The real fun begins when you have some more interesting seeds.

3 More interesting seeds

We can use the delayed nature of the operations to avoid breaking sparsity. For example:

library(Matrix)
x <- rsparsematrix(1000, 1000, density=0.01)
x <- DelayedArray(x) + runif(1000)

tmp <- tempfile(fileext=".h5")
saveDelayed(x, tmp)
rhdf5::h5ls(tmp)
##            group     name       otype  dclass   dim
## 0              /  delayed   H5I_GROUP              
## 1       /delayed    along H5I_DATASET INTEGER ( 0 )
## 2       /delayed   method H5I_DATASET  STRING ( 0 )
## 3       /delayed     seed   H5I_GROUP              
## 4  /delayed/seed     data H5I_DATASET   FLOAT 10000
## 5  /delayed/seed dimnames   H5I_GROUP              
## 6  /delayed/seed  indices H5I_DATASET INTEGER 10000
## 7  /delayed/seed   indptr H5I_DATASET INTEGER  1001
## 8  /delayed/seed    shape H5I_DATASET INTEGER     2
## 9       /delayed     side H5I_DATASET  STRING ( 0 )
## 10      /delayed    value H5I_DATASET   FLOAT  1000
file.info(tmp)[["size"]]
## [1] 101908
# Compared to a dense array.
tmp2 <- tempfile(fileext=".h5")
out <- HDF5Array::writeHDF5Array(x, tmp2, "data")
file.info(tmp2)[["size"]]
## [1] 280905
# Loading it back in.
y <- loadDelayed(tmp)
showtree(y)
## 1000x1000 double: DelayedMatrix object
## └─ 1000x1000 double: Unary iso op with args
##    └─ 1000x1000 double, sparse: [seed] dgCMatrix object

We can also store references to external files, thus avoiding data duplication:

library(HDF5Array)
test <- HDF5Array(tmp2, "data")
stuff <- log2(test + 1)
stuff
## <1000 x 1000> DelayedMatrix object of type "double":
##               [,1]       [,2]       [,3] ...     [,999]    [,1000]
##    [1,]  0.2411104  0.2411104  0.2411104   .  0.2411104  0.2411104
##    [2,]  0.7992442  0.7992442  0.7992442   .  0.7992442  0.7992442
##    [3,]  0.4219574  0.4219574  0.4219574   .  0.4219574  0.4219574
##    [4,]  0.4022458  0.4022458  0.4022458   .  0.4022458  0.4022458
##    [5,]  0.5976521  0.5976521  0.5976521   .  0.5976521  0.5976521
##     ...          .          .          .   .          .          .
##  [996,] 0.18583283 0.18583283 0.18583283   . 0.18583283 0.18583283
##  [997,] 0.03333831 0.03333831 0.03333831   . 0.03333831 0.03333831
##  [998,] 0.90286426 0.90286426 0.90286426   . 0.90286426 0.90286426
##  [999,] 0.33744242 0.33744242 0.33744242   . 0.33744242 0.33744242
## [1000,] 0.91038186 0.91038186 0.91038186   . 0.91038186 0.91038186
tmp <- tempfile(fileext=".h5")
saveDelayed(stuff, tmp)
rhdf5::h5ls(tmp)
##                 group       name       otype  dclass   dim
## 0                   /    delayed   H5I_GROUP              
## 1            /delayed       base H5I_DATASET   FLOAT ( 0 )
## 2            /delayed     method H5I_DATASET  STRING ( 0 )
## 3            /delayed       seed   H5I_GROUP              
## 4       /delayed/seed     method H5I_DATASET  STRING ( 0 )
## 5       /delayed/seed       seed   H5I_GROUP              
## 6  /delayed/seed/seed dimensions H5I_DATASET INTEGER     2
## 7  /delayed/seed/seed       file H5I_DATASET  STRING ( 0 )
## 8  /delayed/seed/seed       name H5I_DATASET  STRING ( 0 )
## 9  /delayed/seed/seed     sparse H5I_DATASET INTEGER ( 0 )
## 10 /delayed/seed/seed       type H5I_DATASET  STRING ( 0 )
## 11      /delayed/seed       side H5I_DATASET  STRING ( 0 )
## 12      /delayed/seed      value H5I_DATASET   FLOAT ( 0 )
file.info(tmp)[["size"]] # size of the delayed operations + pointer to the actual file
## [1] 49642
y <- loadDelayed(tmp)
y
## <1000 x 1000> DelayedMatrix object of type "double":
##               [,1]       [,2]       [,3] ...     [,999]    [,1000]
##    [1,]  0.2411104  0.2411104  0.2411104   .  0.2411104  0.2411104
##    [2,]  0.7992442  0.7992442  0.7992442   .  0.7992442  0.7992442
##    [3,]  0.4219574  0.4219574  0.4219574   .  0.4219574  0.4219574
##    [4,]  0.4022458  0.4022458  0.4022458   .  0.4022458  0.4022458
##    [5,]  0.5976521  0.5976521  0.5976521   .  0.5976521  0.5976521
##     ...          .          .          .   .          .          .
##  [996,] 0.18583283 0.18583283 0.18583283   . 0.18583283 0.18583283
##  [997,] 0.03333831 0.03333831 0.03333831   . 0.03333831 0.03333831
##  [998,] 0.90286426 0.90286426 0.90286426   . 0.90286426 0.90286426
##  [999,] 0.33744242 0.33744242 0.33744242   . 0.33744242 0.33744242
## [1000,] 0.91038186 0.91038186 0.91038186   . 0.91038186 0.91038186

Session information

sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] HDF5Array_1.36.0      h5mread_1.0.0         rhdf5_2.52.0         
##  [4] chihaya_1.8.0         DelayedArray_0.34.0   SparseArray_1.8.0    
##  [7] S4Arrays_1.8.0        abind_1.4-8           IRanges_2.42.0       
## [10] S4Vectors_0.46.0      MatrixGenerics_1.20.0 matrixStats_1.5.0    
## [13] BiocGenerics_0.54.0   generics_0.1.3        Matrix_1.7-3         
## [16] BiocStyle_2.36.0     
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_2.0.0      compiler_4.5.0      BiocManager_1.30.25
##  [4] crayon_1.5.3        Rcpp_1.0.14         rhdf5filters_1.20.0
##  [7] jquerylib_0.1.4     yaml_2.3.10         fastmap_1.2.0      
## [10] lattice_0.22-7      R6_2.6.1            XVector_0.48.0     
## [13] knitr_1.50          bookdown_0.43       bslib_0.9.0        
## [16] rlang_1.1.6         cachem_1.1.0        xfun_0.52          
## [19] sass_0.4.10         cli_3.6.4           Rhdf5lib_1.30.0    
## [22] digest_0.6.37       grid_4.5.0          lifecycle_1.0.4    
## [25] evaluate_1.0.3      rmarkdown_2.29      tools_4.5.0        
## [28] htmltools_0.5.8.1