TileDBArray 1.6.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.108554446 2.076550068 -0.508858463 . 1.06353153 0.47808085
## [2,] 0.180025065 1.543709077 -0.040441809 . 0.87571728 -0.38488472
## [3,] -3.061381381 2.710613120 0.008302286 . -0.31069578 -1.45415127
## [4,] -0.877971883 1.288709910 1.699114981 . 1.58282350 -0.04387971
## [5,] 0.631475306 -0.025359824 0.031903090 . -1.02012212 -0.10400266
## ... . . . . . .
## [96,] 2.23864986 0.29998375 -0.33228238 . 0.6947149 -0.2457617
## [97,] -0.62193696 -1.65450363 0.08514534 . 0.3998314 -0.8720570
## [98,] -1.29443660 0.43950896 -0.86116447 . -0.8829689 -1.1414756
## [99,] 0.03587434 2.01832686 -0.04798745 . -1.4401491 0.9085430
## [100,] 0.84655098 1.18948637 -0.30941242 . -3.0378641 -0.8540375
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.108554446 2.076550068 -0.508858463 . 1.06353153 0.47808085
## [2,] 0.180025065 1.543709077 -0.040441809 . 0.87571728 -0.38488472
## [3,] -3.061381381 2.710613120 0.008302286 . -0.31069578 -1.45415127
## [4,] -0.877971883 1.288709910 1.699114981 . 1.58282350 -0.04387971
## [5,] 0.631475306 -0.025359824 0.031903090 . -1.02012212 -0.10400266
## ... . . . . . .
## [96,] 2.23864986 0.29998375 -0.33228238 . 0.6947149 -0.2457617
## [97,] -0.62193696 -1.65450363 0.08514534 . 0.3998314 -0.8720570
## [98,] -1.29443660 0.43950896 -0.86116447 . -0.8829689 -1.1414756
## [99,] 0.03587434 2.01832686 -0.04798745 . -1.4401491 0.9085430
## [100,] 0.84655098 1.18948637 -0.30941242 . -3.0378641 -0.8540375
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
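Only the logical case is shown above. As a minimal sketch of the integer case, the snippet below writes an integer matrix and checks the element type of the result. The names `Z` and `out_int` are illustrative, and `type()` is assumed to be the accessor made available when DelayedArray is attached alongside TileDBArray.

```r
library(TileDBArray)

# rpois() returns integers, so Z is an integer matrix.
Z <- matrix(rpois(1000, lambda=5), ncol=10)
out_int <- writeTileDBArray(Z)

# The TileDB-backed matrix should report the same element type as the input.
type(out_int)
dim(out_int)
```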
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.108554446 2.076550068 -0.508858463 . 1.06353153 0.47808085
## GENE_2 0.180025065 1.543709077 -0.040441809 . 0.87571728 -0.38488472
## GENE_3 -3.061381381 2.710613120 0.008302286 . -0.31069578 -1.45415127
## GENE_4 -0.877971883 1.288709910 1.699114981 . 1.58282350 -0.04387971
## GENE_5 0.631475306 -0.025359824 0.031903090 . -1.02012212 -0.10400266
## ... . . . . . .
## GENE_96 2.23864986 0.29998375 -0.33228238 . 0.6947149 -0.2457617
## GENE_97 -0.62193696 -1.65450363 0.08514534 . 0.3998314 -0.8720570
## GENE_98 -1.29443660 0.43950896 -0.86116447 . -0.8829689 -1.1414756
## GENE_99 0.03587434 2.01832686 -0.04798745 . -1.4401491 0.9085430
## GENE_100 0.84655098 1.18948637 -0.30941242 . -3.0378641 -0.8540375
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.1085544 0.1800251 -3.0613814 -0.8779719 0.6314753 -1.0269813
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.108554446 2.076550068 -0.508858463 -2.580484556 -3.132536404
## GENE_2 0.180025065 1.543709077 -0.040441809 -0.804807101 0.920265780
## GENE_3 -3.061381381 2.710613120 0.008302286 0.118200534 1.155515236
## GENE_4 -0.877971883 1.288709910 1.699114981 0.154886020 1.108612759
## GENE_5 0.631475306 -0.025359824 0.031903090 -0.919595110 0.675470116
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.21710889 4.15310014 -1.01771693 . 2.12706306 0.95616170
## GENE_2 0.36005013 3.08741815 -0.08088362 . 1.75143456 -0.76976944
## GENE_3 -6.12276276 5.42122624 0.01660457 . -0.62139155 -2.90830254
## GENE_4 -1.75594377 2.57741982 3.39822996 . 3.16564700 -0.08775942
## GENE_5 1.26295061 -0.05071965 0.06380618 . -2.04024425 -0.20800533
## ... . . . . . .
## GENE_96 4.47729973 0.59996750 -0.66456476 . 1.3894297 -0.4915235
## GENE_97 -1.24387391 -3.30900727 0.17029068 . 0.7996628 -1.7441139
## GENE_98 -2.58887320 0.87901791 -1.72232895 . -1.7659378 -2.2829512
## GENE_99 0.07174868 4.03665371 -0.09597490 . -2.8802982 1.8170860
## GENE_100 1.69310196 2.37897275 -0.61882484 . -6.0757281 -1.7080750
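To make the delayed behaviour concrete, the following sketch realizes a delayed operation with `as.matrix()` and checks that the TileDB-backed values themselves are untouched. The names `X0`, `tdb`, `doubled` and `realized` are illustrative only.

```r
library(TileDBArray)

X0 <- matrix(rnorm(1000), ncol=10)
tdb <- as(X0, "TileDBArray")

# Arithmetic returns a DelayedMatrix; the multiplication is recorded, not executed.
doubled <- tdb * 2
is(doubled, "DelayedMatrix")

# as.matrix() realizes the delayed operation into an ordinary in-memory matrix.
realized <- as.matrix(doubled)
all.equal(realized, X0 * 2)

# Reading the original object back still gives the values that were written.
all.equal(as.matrix(tdb), X0)
```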
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 0.5948649 4.8561078 -3.0219660 -14.4868673 6.7752320 13.7779343
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -16.8386874 -0.2816726 2.5485602 3.4471625
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 -3.6032894
## GENE_2 0.5722510
## GENE_3 -3.4310307
## GENE_4 -0.3311992
## GENE_5 -1.0433024
## ... .
## GENE_96 2.2484581
## GENE_97 -2.3729420
## GENE_98 -2.2890656
## GENE_99 -1.1035654
## GENE_100 0.5541941
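If the result of a delayed operation needs to be kept, one option is to write it out as a new TileDB array. The sketch below assumes that `writeTileDBArray()` accepts a DelayedMatrix as input in the same way as an ordinary matrix; the names `X0`, `tdb`, `shifted` and `stored` are illustrative.

```r
library(TileDBArray)

X0 <- matrix(rnorm(1000), ncol=10)
tdb <- as(X0, "TileDBArray")

# A delayed result built from two chained operations.
shifted <- (tdb + 1) * 2

# Writing realizes the delayed operations into a new TileDB backend.
stored <- writeTileDBArray(shifted)
class(stored)

# The stored values should match the same computation done in memory.
all.equal(as.matrix(stored), (X0 + 1) * 2)
```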
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.22744160 2.09286523 -0.81365777 . 0.53430292 0.57013586
## [2,] -1.03898765 0.38763959 0.05907349 . 1.91425681 -0.06250155
## [3,] 0.08741989 -0.04355127 0.49890518 . 0.35933085 1.04365364
## [4,] -1.13885254 -0.25239443 -0.23568715 . 0.64777389 -0.01483898
## [5,] -1.21827688 -1.99311545 0.19766829 . 1.20911867 -0.78243657
## ... . . . . . .
## [96,] 0.99642911 -0.66941397 0.44431261 . -0.3321988 0.1404574
## [97,] 0.65108325 0.27514485 0.33451977 . 0.9369587 0.8898373
## [98,] -0.27575662 0.06339306 1.51787760 . -0.2784484 -0.2888549
## [99,] -1.03196113 0.43434634 0.43658611 . 0.5851806 -1.4323041
## [100,] 1.30628037 0.10134075 0.16032729 . 0.8320322 -0.8044427
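Knowing the path is mainly useful for reopening the array later. A minimal sketch, assuming the `TileDBArray()` constructor takes the path and an `attr` argument naming the attribute to read (the names `X0`, `path0` and `reopened` are illustrative):

```r
library(TileDBArray)

X0 <- matrix(rnorm(1000), ncol=10)
path0 <- tempfile()
writeTileDBArray(X0, path=path0, attr="WHEE")

# Open the existing array by path, naming the attribute that holds the data.
reopened <- TileDBArray(path0, attr="WHEE")
dim(reopened)

# The reopened array should return the values that were originally written.
all.equal(as.matrix(reopened), X0)
```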
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.22744160 2.09286523 -0.81365777 . 0.53430292 0.57013586
## [2,] -1.03898765 0.38763959 0.05907349 . 1.91425681 -0.06250155
## [3,] 0.08741989 -0.04355127 0.49890518 . 0.35933085 1.04365364
## [4,] -1.13885254 -0.25239443 -0.23568715 . 0.64777389 -0.01483898
## [5,] -1.21827688 -1.99311545 0.19766829 . 1.20911867 -0.78243657
## ... . . . . . .
## [96,] 0.99642911 -0.66941397 0.44431261 . -0.3321988 0.1404574
## [97,] 0.65108325 0.27514485 0.33451977 . 0.9369587 0.8898373
## [98,] -0.27575662 0.06339306 1.51787760 . -0.2784484 -0.2888549
## [99,] -1.03196113 0.43434634 0.43658611 . 0.5851806 -1.4323041
## [100,] 1.30628037 0.10134075 0.16032729 . 0.8320322 -0.8044427
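Because the setting is global, it is usually worth unsetting it once it is no longer needed. The sketch below assumes that `getTileDBPath()` reports the current setting and that `setTileDBPath(NULL)` restores the default behaviour; the names `X0`, `path3` and `out3` are illustrative.

```r
library(TileDBArray)

X0 <- matrix(rnorm(1000), ncol=10)
path3 <- tempfile()

# Set the global path, then coerce; the backend is created at path3.
setTileDBPath(path3)
getTileDBPath()
out3 <- as(X0, "TileDBArray")
file.exists(path3)

# Unset the global so that later coercions do not target the same location.
setTileDBPath(NULL)
```

Leaving the path set would direct a later coercion to a location that already holds an array, so unsetting it is the safer habit.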
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] TileDBArray_1.6.0 DelayedArray_0.22.0 IRanges_2.30.0
## [4] S4Vectors_0.34.0 MatrixGenerics_1.8.0 matrixStats_0.62.0
## [7] BiocGenerics_0.42.0 Matrix_1.4-1 BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8.3 bslib_0.3.1 compiler_4.2.0
## [4] BiocManager_1.30.17 jquerylib_0.1.4 tools_4.2.0
## [7] digest_0.6.29 bit_4.0.4 jsonlite_1.8.0
## [10] evaluate_0.15 lattice_0.20-45 nanotime_0.3.6
## [13] rlang_1.0.2 cli_3.3.0 RcppCCTZ_0.2.10
## [16] yaml_2.3.5 xfun_0.30 fastmap_1.1.0
## [19] stringr_1.4.0 knitr_1.38 sass_0.4.1
## [22] bit64_4.0.5 grid_4.2.0 data.table_1.14.2
## [25] R6_2.5.1 rmarkdown_2.14 bookdown_0.26
## [28] tiledb_0.12.0 magrittr_2.0.3 htmltools_0.5.2
## [31] stringi_1.7.6 zoo_1.8-10