TileDBArray 1.15.4
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.7271183 -1.6164748 1.0041222 . 0.8452394 -0.9820552
## [2,] -0.4408054 0.8100485 -0.2841858 . -2.4125496 0.6592228
## [3,] -0.9407056 -0.9498341 0.7106158 . 1.6135753 1.3237905
## [4,] 1.8011876 1.1519282 -1.7042402 . -0.6495744 -2.3580916
## [5,] -1.1054726 -1.4732836 -0.9648889 . 1.2557612 0.9632785
## ... . . . . . .
## [96,] 2.21041916 -1.22992413 -0.17635437 . -0.147261205 -0.105094612
## [97,] 0.06234023 -0.47041667 -1.86291303 . 1.382416737 -1.250331960
## [98,] -0.01958748 -0.67047765 -0.41136027 . 1.810164899 0.130821442
## [99,] -0.71913372 -2.42109389 -1.43492143 . -1.296466234 0.297642276
## [100,] -0.53832900 -1.22262031 -0.25701175 . -0.670256375 -0.001650841
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.7271183 -1.6164748 1.0041222 . 0.8452394 -0.9820552
## [2,] -0.4408054 0.8100485 -0.2841858 . -2.4125496 0.6592228
## [3,] -0.9407056 -0.9498341 0.7106158 . 1.6135753 1.3237905
## [4,] 1.8011876 1.1519282 -1.7042402 . -0.6495744 -2.3580916
## [5,] -1.1054726 -1.4732836 -0.9648889 . 1.2557612 0.9632785
## ... . . . . . .
## [96,] 2.21041916 -1.22992413 -0.17635437 . -0.147261205 -0.105094612
## [97,] 0.06234023 -0.47041667 -1.86291303 . 1.382416737 -1.250331960
## [98,] -0.01958748 -0.67047765 -0.41136027 . 1.810164899 0.130821442
## [99,] -0.71913372 -2.42109389 -1.43492143 . -1.296466234 0.297642276
## [100,] -0.53832900 -1.22262031 -0.25701175 . -0.670256375 -0.001650841
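As a quick sanity check, we can confirm that the values stored in the backend match the original matrix. This is only a minimal sketch: the names `mat`, `arr` and `roundtrip` are chosen for illustration, and it assumes that `as.matrix()` is an acceptable way to pull a small array back into memory.

```r
library(TileDBArray)

# Illustrative data; 'mat', 'arr' and 'roundtrip' are names chosen for this sketch.
set.seed(42)
mat <- matrix(rnorm(1000), ncol=10)

# Write the matrix to a TileDB backend and get a TileDBMatrix back.
arr <- writeTileDBArray(mat)

# Realize the values in memory and compare them with the input.
roundtrip <- as.matrix(arr)
all.equal(roundtrip, mat, check.attributes=FALSE)
```

Realizing the full array like this is only sensible for small inputs; for large arrays, the delayed operations described later are usually the better choice.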
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.0 1.6
## [997,] 0 0 0 . 0.0 0.0
## [998,] 0 0 0 . 0.0 0.0
## [999,] 0 0 0 . 0.0 0.0
## [1000,] 0 0 0 . 0.0 0.0
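To check that sparsity is retained by the backend, something like the following could be used. It is a sketch under a few assumptions: `is_sparse()` is taken to be available once TileDBArray and its dependencies are attached, and the names `sp` and `sp_arr` are illustrative.

```r
library(TileDBArray)

# A small sparse matrix; 'sp' and 'sp_arr' are names chosen for this sketch.
set.seed(1)
sp <- Matrix::rsparsematrix(100, 50, density=0.05)
sp_arr <- writeTileDBArray(sp)

# Ask the array whether it considers itself sparse.
is_sparse(sp_arr)

# Compare the stored values with the input after densifying both.
all.equal(as.matrix(sp_arr), as.matrix(sp), check.attributes=FALSE)
```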
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE TRUE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
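The same idea applies to integer data. The sketch below writes a small dense integer matrix and inspects the reported type with `type()`; the names `int_mat` and `int_arr` are illustrative.

```r
library(TileDBArray)

# A small dense integer matrix; names are chosen for this sketch.
int_mat <- matrix(1:20, ncol=4)
int_arr <- writeTileDBArray(int_mat)

# Report the type of the values held by the array.
type(int_arr)
```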
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.7271183 -1.6164748 1.0041222 . 0.8452394 -0.9820552
## GENE_2 -0.4408054 0.8100485 -0.2841858 . -2.4125496 0.6592228
## GENE_3 -0.9407056 -0.9498341 0.7106158 . 1.6135753 1.3237905
## GENE_4 1.8011876 1.1519282 -1.7042402 . -0.6495744 -2.3580916
## GENE_5 -1.1054726 -1.4732836 -0.9648889 . 1.2557612 0.9632785
## ... . . . . . .
## GENE_96 2.21041916 -1.22992413 -0.17635437 . -0.147261205 -0.105094612
## GENE_97 0.06234023 -0.47041667 -1.86291303 . 1.382416737 -1.250331960
## GENE_98 -0.01958748 -0.67047765 -0.41136027 . 1.810164899 0.130821442
## GENE_99 -0.71913372 -2.42109389 -1.43492143 . -1.296466234 0.297642276
## GENE_100 -0.53832900 -1.22262031 -0.25701175 . -0.670256375 -0.001650841
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.7271183 -0.4408054 -0.9407056 1.8011876 -1.1054726 -0.1154446
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.7271183 -1.6164748 1.0041222 -0.4355904 0.4713761
## GENE_2 -0.4408054 0.8100485 -0.2841858 0.8700101 -0.6645667
## GENE_3 -0.9407056 -0.9498341 0.7106158 -0.6061874 0.4869128
## GENE_4 1.8011876 1.1519282 -1.7042402 -0.9466682 0.6669956
## GENE_5 -1.1054726 -1.4732836 -0.9648889 1.3927064 -1.5758783
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.4542365 -3.2329496 2.0082444 . 1.690479 -1.964110
## GENE_2 -0.8816107 1.6200970 -0.5683716 . -4.825099 1.318446
## GENE_3 -1.8814111 -1.8996683 1.4212316 . 3.227151 2.647581
## GENE_4 3.6023753 2.3038564 -3.4084803 . -1.299149 -4.716183
## GENE_5 -2.2109452 -2.9465672 -1.9297778 . 2.511522 1.926557
## ... . . . . . .
## GENE_96 4.42083833 -2.45984827 -0.35270874 . -0.294522410 -0.210189223
## GENE_97 0.12468046 -0.94083333 -3.72582606 . 2.764833474 -2.500663921
## GENE_98 -0.03917497 -1.34095530 -0.82272054 . 3.620329797 0.261642884
## GENE_99 -1.43826743 -4.84218779 -2.86984286 . -2.592932469 0.595284552
## GENE_100 -1.07665799 -2.44524063 -0.51402350 . -1.340512750 -0.003301683
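To make the delayed behaviour concrete, the sketch below chains a subset and a multiplication, then realizes the result in memory with `as.matrix()`. The names `mat`, `arr`, `doubled` and `realized` are illustrative, and `as.matrix()` is just one way to force evaluation.

```r
library(TileDBArray)

# Illustrative data; all names here are chosen for this sketch.
set.seed(2)
mat <- matrix(rnorm(200), ncol=10)
arr <- as(mat, "TileDBArray")

# Subsetting and arithmetic are recorded but not yet evaluated.
doubled <- arr[1:5, 1:5] * 2
is(doubled, "DelayedMatrix")

# Values are only computed when explicitly requested.
realized <- as.matrix(doubled)
all.equal(realized, mat[1:5, 1:5] * 2, check.attributes=FALSE)

# The data held by the original array are unchanged.
all.equal(as.matrix(arr), mat, check.attributes=FALSE)
```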
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 1.5534049 -6.1970087 -11.7269805 -9.1396662 -2.2144115 17.6695637
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -3.1995177 -0.8336283 8.9039360 -0.6477332
out %*% runif(ncol(out))
## [,1]
## GENE_1 1.15406722
## GENE_2 -0.99784627
## GENE_3 0.75107928
## GENE_4 0.34455061
## GENE_5 0.87562715
## GENE_6 -1.01677269
## GENE_7 -1.41636391
## GENE_8 -0.05398246
## GENE_9 -2.10549076
## GENE_10 -0.68831071
## GENE_11 3.15539671
## GENE_12 -2.10288390
## GENE_13 2.45693666
## GENE_14 -0.31766480
## GENE_15 -0.62277271
## GENE_16 -1.38246292
## GENE_17 0.70548298
## GENE_18 -0.81749106
## GENE_19 0.50144363
## GENE_20 -0.20496148
## GENE_21 -1.72617743
## GENE_22 -0.19860174
## GENE_23 -0.80994550
## GENE_24 1.33447724
## GENE_25 -0.17824063
## GENE_26 0.83256893
## GENE_27 -2.50125968
## GENE_28 -2.24969561
## GENE_29 3.48340073
## GENE_30 1.14610773
## GENE_31 0.39196762
## GENE_32 -1.95577531
## GENE_33 -2.32904060
## GENE_34 0.24598840
## GENE_35 1.78745805
## GENE_36 -0.08539968
## GENE_37 -0.97967376
## GENE_38 1.65431993
## GENE_39 0.18069860
## GENE_40 2.54502138
## GENE_41 2.82877270
## GENE_42 -2.87813835
## GENE_43 -1.04716442
## GENE_44 1.23951073
## GENE_45 0.82926184
## GENE_46 -0.37201978
## GENE_47 -1.45158132
## GENE_48 0.05856130
## GENE_49 -0.70595647
## GENE_50 -1.34678221
## GENE_51 0.47346436
## GENE_52 0.34122571
## GENE_53 0.50450648
## GENE_54 0.99202007
## GENE_55 2.03608987
## GENE_56 2.97229349
## GENE_57 -1.05447742
## GENE_58 -1.11519701
## GENE_59 1.82826122
## GENE_60 -1.05278704
## GENE_61 5.74607784
## GENE_62 -1.54737015
## GENE_63 0.79433572
## GENE_64 -1.10992847
## GENE_65 0.89132811
## GENE_66 1.41509542
## GENE_67 -1.54175038
## GENE_68 -1.08933698
## GENE_69 -0.23211275
## GENE_70 0.84280464
## GENE_71 0.84955032
## GENE_72 -2.19774229
## GENE_73 1.06281738
## GENE_74 -0.70123618
## GENE_75 1.02422988
## GENE_76 2.19058851
## GENE_77 1.04383875
## GENE_78 -0.04323432
## GENE_79 1.03516371
## GENE_80 -0.72150214
## GENE_81 1.36984554
## GENE_82 -1.92335804
## GENE_83 -2.66999289
## GENE_84 -0.89597370
## GENE_85 -1.70426711
## GENE_86 -0.59689406
## GENE_87 -0.76103884
## GENE_88 0.39215578
## GENE_89 0.29233575
## GENE_90 2.53804307
## GENE_91 0.40038620
## GENE_92 -0.53377676
## GENE_93 1.15674576
## GENE_94 -0.75127908
## GENE_95 -1.05958229
## GENE_96 1.89760083
## GENE_97 0.84447473
## GENE_98 -0.11463914
## GENE_99 -3.65403300
## GENE_100 -1.78761546
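One way to gain confidence in these operations is to compare them against the same calculations on an ordinary in-memory matrix. This is a minimal sketch with illustrative names (`mat`, `arr`, `vec`); it assumes the results are small enough to hold in memory.

```r
library(TileDBArray)

# Illustrative data; all names here are chosen for this sketch.
set.seed(3)
mat <- matrix(rnorm(200), ncol=10)
arr <- as(mat, "TileDBArray")
vec <- runif(ncol(mat))

# Column sums from the TileDB-backed array versus the ordinary matrix.
all.equal(colSums(arr), colSums(mat), check.attributes=FALSE)

# Matrix-vector product from both representations.
prod_arr <- as.matrix(arr %*% vec)
prod_mat <- mat %*% vec
all.equal(prod_arr, prod_mat, check.attributes=FALSE)
```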
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.19728752 1.75889865 0.56430556 . -0.31483141 1.73711661
## [2,] 0.52392979 -1.40077660 0.67191925 . -1.69742138 -0.17573475
## [3,] -0.27054326 0.65013156 -0.81632862 . -0.78121400 0.09114674
## [4,] -2.20355068 -0.52249518 -1.42900889 . -0.34435232 0.48619982
## [5,] -3.23053349 -1.26163933 -0.06379842 . 2.41282975 -0.48659758
## ... . . . . . .
## [96,] 1.1519187 -0.3961445 0.1516432 . 0.7436532 0.4290699
## [97,] -0.1149952 1.6995658 -1.0360925 . -0.9962945 -0.8269280
## [98,] 1.2975572 -0.2289509 -0.3054609 . -0.1443665 1.4693949
## [99,] -0.6443878 -2.3074435 1.4848689 . 0.5189309 -1.5467511
## [100,] -0.4435969 0.2011427 -0.1339109 . -1.5722015 -0.4032379
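Knowing the path and attribute name is useful when we want to open the same backend again later. The sketch below assumes that the `TileDBArray()` constructor accepts a path plus an `attr` argument naming the attribute to read; the names `loc`, `written` and `reopened` are illustrative.

```r
library(TileDBArray)

# Illustrative data and location; all names here are chosen for this sketch.
set.seed(4)
mat <- matrix(rnorm(60), ncol=6)
loc <- tempfile()

# Write to a known location under a custom attribute name.
written <- writeTileDBArray(mat, path=loc, attr="WHEE")

# Assumption: the constructor can reopen the backend from its path,
# with 'attr' selecting the attribute that holds the data.
reopened <- TileDBArray(loc, attr="WHEE")
all.equal(as.matrix(reopened), mat, check.attributes=FALSE)
```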
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.19728752 1.75889865 0.56430556 . -0.31483141 1.73711661
## [2,] 0.52392979 -1.40077660 0.67191925 . -1.69742138 -0.17573475
## [3,] -0.27054326 0.65013156 -0.81632862 . -0.78121400 0.09114674
## [4,] -2.20355068 -0.52249518 -1.42900889 . -0.34435232 0.48619982
## [5,] -3.23053349 -1.26163933 -0.06379842 . 2.41282975 -0.48659758
## ... . . . . . .
## [96,] 1.1519187 -0.3961445 0.1516432 . 0.7436532 0.4290699
## [97,] -0.1149952 1.6995658 -1.0360925 . -0.9962945 -0.8269280
## [98,] 1.2975572 -0.2289509 -0.3054609 . -0.1443665 1.4693949
## [99,] -0.6443878 -2.3074435 1.4848689 . 0.5189309 -1.5467511
## [100,] -0.4435969 0.2011427 -0.1339109 . -1.5722015 -0.4032379
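Because these settings are global, it is usually wise to restore the previous value once we are done. The sketch below assumes that `getTileDBPath()` returns the current setting and that passing that value (possibly `NULL`) back to `setTileDBPath()` restores it; the names `old`, `path3` and `arr` are illustrative.

```r
library(TileDBArray)

# Remember the current global setting so it can be restored afterwards.
old <- getTileDBPath()

# Point the next coercion at a location of our choosing.
path3 <- tempfile()
setTileDBPath(path3)
arr <- as(matrix(runif(20), ncol=4), "TileDBArray")

# A local TileDB array is stored as a directory at the chosen path.
dir.exists(path3)

# Restore the previous setting (assumed to be NULL when unset).
setTileDBPath(old)
```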
sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.15.4 DelayedArray_0.31.14
## [4] SparseArray_1.5.45 S4Arrays_1.5.11 IRanges_2.39.2
## [7] abind_1.4-8 S4Vectors_0.43.2 MatrixGenerics_1.17.0
## [10] matrixStats_1.4.1 BiocGenerics_0.51.3 Matrix_1.7-1
## [13] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.45.0 tiledb_0.30.2
## [16] knitr_1.48 bookdown_0.41 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.48
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.51.2 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.1 lifecycle_1.0.4 data.table_1.16.2
## [31] evaluate_1.0.1 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.4.1 htmltools_0.5.8.1