TileDBArray 1.15.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.1623874 -1.4136235 -1.9579720 . -1.31064272 -1.14361192
## [2,] 2.7212192 2.0856784 1.6581024 . 0.44846762 -1.48444291
## [3,] -0.2220294 -0.1685889 -1.0712842 . 0.09553039 1.49296271
## [4,] -2.1820205 -0.8278725 -0.2126109 . -0.77911356 0.55887770
## [5,] -0.6815283 0.1083721 1.9467992 . -0.34883666 -0.46190664
## ... . . . . . .
## [96,] -0.3631141 -0.9232330 1.2460829 . 1.30795784 -1.51921503
## [97,] -1.3044134 -1.5919438 0.3197891 . -3.11396498 0.70181089
## [98,] -0.7887772 -1.1546893 0.7032725 . 0.12371770 -1.21224686
## [99,] 1.8873992 2.6577647 -0.6068436 . -1.11699042 -0.23951110
## [100,] 0.4784338 0.7995840 0.8122253 . -1.33298266 0.03101772
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.1623874 -1.4136235 -1.9579720 . -1.31064272 -1.14361192
## [2,] 2.7212192 2.0856784 1.6581024 . 0.44846762 -1.48444291
## [3,] -0.2220294 -0.1685889 -1.0712842 . 0.09553039 1.49296271
## [4,] -2.1820205 -0.8278725 -0.2126109 . -0.77911356 0.55887770
## [5,] -0.6815283 0.1083721 1.9467992 . -0.34883666 -0.46190664
## ... . . . . . .
## [96,] -0.3631141 -0.9232330 1.2460829 . 1.30795784 -1.51921503
## [97,] -1.3044134 -1.5919438 0.3197891 . -3.11396498 0.70181089
## [98,] -0.7887772 -1.1546893 0.7032725 . 0.12371770 -1.21224686
## [99,] 1.8873992 2.6577647 -0.6068436 . -1.11699042 -0.23951110
## [100,] 0.4784338 0.7995840 0.8122253 . -1.33298266 0.03101772
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0.0 0.0 0.0 . 0 0
## [997,] 0.0 -0.9 0.0 . 0 0
## [998,] 0.0 0.0 0.0 . 0 0
## [999,] 0.0 0.0 0.0 . 0 0
## [1000,] 0.0 0.0 0.0 . 0 0
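As a quick check that sparsity is preserved by the backend, the sketch below compares a dense and a sparse write. It assumes the `is_sparse()` generic made available through DelayedArray reports the storage mode of a TileDBMatrix; the variable names are illustrative only.

```r
library(TileDBArray)

# A dense matrix and a sparse matrix with 1% non-zero entries.
dense_in <- matrix(rnorm(1000), ncol=10)
sparse_in <- Matrix::rsparsematrix(1000, 1000, density=0.01)

dense_out <- writeTileDBArray(dense_in)
sparse_out <- writeTileDBArray(sparse_in)

# Query whether each backend is treated as sparse.
is_sparse(dense_out)
is_sparse(sparse_out)

# The dimensions of the input are carried over unchanged.
dim(sparse_out)
```

Downstream DelayedArray operations can use this flag to choose sparse-aware code paths, so it is worth confirming before running large computations.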
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.1623874 -1.4136235 -1.9579720 . -1.31064272 -1.14361192
## GENE_2 2.7212192 2.0856784 1.6581024 . 0.44846762 -1.48444291
## GENE_3 -0.2220294 -0.1685889 -1.0712842 . 0.09553039 1.49296271
## GENE_4 -2.1820205 -0.8278725 -0.2126109 . -0.77911356 0.55887770
## GENE_5 -0.6815283 0.1083721 1.9467992 . -0.34883666 -0.46190664
## ... . . . . . .
## GENE_96 -0.3631141 -0.9232330 1.2460829 . 1.30795784 -1.51921503
## GENE_97 -1.3044134 -1.5919438 0.3197891 . -3.11396498 0.70181089
## GENE_98 -0.7887772 -1.1546893 0.7032725 . 0.12371770 -1.21224686
## GENE_99 1.8873992 2.6577647 -0.6068436 . -1.11699042 -0.23951110
## GENE_100 0.4784338 0.7995840 0.8122253 . -1.33298266 0.03101772
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.1623874 2.7212192 -0.2220294 -2.1820205 -0.6815283 2.0000997
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.1623874 -1.4136235 -1.9579720 0.5811687 -0.6487649
## GENE_2 2.7212192 2.0856784 1.6581024 1.2455121 -0.1995401
## GENE_3 -0.2220294 -0.1685889 -1.0712842 -2.2250848 1.2254899
## GENE_4 -2.1820205 -0.8278725 -0.2126109 1.4948633 2.1543206
## GENE_5 -0.6815283 0.1083721 1.9467992 0.5612598 -1.1660616
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.3247749 -2.8272471 -3.9159440 . -2.6212854 -2.2872238
## GENE_2 5.4424384 4.1713569 3.3162047 . 0.8969352 -2.9688858
## GENE_3 -0.4440588 -0.3371778 -2.1425684 . 0.1910608 2.9859254
## GENE_4 -4.3640410 -1.6557451 -0.4252218 . -1.5582271 1.1177554
## GENE_5 -1.3630566 0.2167441 3.8935983 . -0.6976733 -0.9238133
## ... . . . . . .
## GENE_96 -0.7262281 -1.8464661 2.4921659 . 2.61591567 -3.03843006
## GENE_97 -2.6088267 -3.1838876 0.6395782 . -6.22792997 1.40362179
## GENE_98 -1.5775545 -2.3093787 1.4065450 . 0.24743540 -2.42449373
## GENE_99 3.7747984 5.3155294 -1.2136873 . -2.23398083 -0.47902221
## GENE_100 0.9568676 1.5991680 1.6244506 . -2.66596533 0.06203544
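To make the delayed behaviour concrete, the following sketch builds a delayed operation and then forces its evaluation with `as.matrix()`. It is a minimal illustration that uses a fresh matrix rather than the objects above, and it makes no claim about how the package schedules reads internally.

```r
library(TileDBArray)

mat <- matrix(rnorm(1000), ncol=10)
arr <- as(mat, "TileDBArray")

# No values are computed here; the multiplication is only recorded.
doubled <- arr * 2
is(doubled, "DelayedMatrix")

# Realizing the object in memory triggers the actual computation.
realized <- as.matrix(doubled)
all.equal(realized, mat * 2)

# The data held in the TileDB backend is left untouched.
all.equal(as.matrix(arr), mat)
```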
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 3.76981791 8.54924513 -9.91247778 -0.04969908 1.22078964 8.65044487
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 15.04816284 5.17679695 -4.85041366 -2.04896236
out %*% runif(ncol(out))
## [,1]
## GENE_1 -0.554014691
## GENE_2 4.342067424
## GENE_3 -3.158641163
## GENE_4 -2.083023771
## GENE_5 3.697514658
## GENE_6 2.125593565
## GENE_7 1.675126434
## GENE_8 1.852222351
## GENE_9 2.081876041
## GENE_10 0.987143292
## GENE_11 2.274319330
## GENE_12 -0.536464219
## GENE_13 0.858263042
## GENE_14 -0.150148211
## GENE_15 1.649221299
## GENE_16 2.620735420
## GENE_17 -2.032055266
## GENE_18 -0.510438814
## GENE_19 1.477674712
## GENE_20 -1.831335913
## GENE_21 0.412637325
## GENE_22 -0.905293808
## GENE_23 -0.689746176
## GENE_24 -4.121990855
## GENE_25 0.081730635
## GENE_26 1.782521495
## GENE_27 -1.377939508
## GENE_28 1.018372777
## GENE_29 -1.372009610
## GENE_30 1.113047519
## GENE_31 -1.828338895
## GENE_32 -0.310100002
## GENE_33 -0.987238076
## GENE_34 -0.776367132
## GENE_35 -2.059027055
## GENE_36 1.671681345
## GENE_37 0.428454864
## GENE_38 -1.413087693
## GENE_39 -0.045607469
## GENE_40 -0.655354318
## GENE_41 -0.003544567
## GENE_42 -1.392328709
## GENE_43 -0.884644759
## GENE_44 1.225537510
## GENE_45 0.861795168
## GENE_46 -1.612402068
## GENE_47 -1.477379981
## GENE_48 0.514154179
## GENE_49 -1.497258407
## GENE_50 -0.598442051
## GENE_51 -1.367532341
## GENE_52 -2.083390786
## GENE_53 -1.193722228
## GENE_54 1.432166719
## GENE_55 -0.189419616
## GENE_56 1.893407636
## GENE_57 0.842357724
## GENE_58 0.135744007
## GENE_59 -0.888347626
## GENE_60 -0.478087069
## GENE_61 2.386331089
## GENE_62 3.072716866
## GENE_63 0.410415747
## GENE_64 -0.895083679
## GENE_65 -3.486657702
## GENE_66 1.016371012
## GENE_67 -0.535725799
## GENE_68 -0.412921428
## GENE_69 0.581755859
## GENE_70 -0.540726017
## GENE_71 -0.586610749
## GENE_72 0.604340402
## GENE_73 3.178051888
## GENE_74 1.199962313
## GENE_75 -0.838314131
## GENE_76 -1.819778664
## GENE_77 -1.444921363
## GENE_78 -2.418999239
## GENE_79 1.048262864
## GENE_80 0.606742477
## GENE_81 1.198027198
## GENE_82 0.845130839
## GENE_83 3.612549487
## GENE_84 0.456660730
## GENE_85 0.349822726
## GENE_86 2.629186156
## GENE_87 -2.269341232
## GENE_88 3.481890494
## GENE_89 -1.946517730
## GENE_90 -1.353845254
## GENE_91 3.226524312
## GENE_92 2.843725847
## GENE_93 0.668944512
## GENE_94 1.312171829
## GENE_95 0.298969337
## GENE_96 -0.658873617
## GENE_97 -0.672181421
## GENE_98 -0.304986410
## GENE_99 3.096888642
## GENE_100 0.198093012
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.923050389 -0.407474823 0.098902021 . -0.6693487 0.5163817
## [2,] -0.454127655 0.657957250 -2.161997626 . -0.8049095 1.4348427
## [3,] -0.082321474 0.952338661 -1.996110068 . -1.0163406 0.0757759
## [4,] -0.005028966 -0.297455914 0.745078553 . 0.4133133 1.2291054
## [5,] -2.466097406 -1.384651442 1.363127142 . -0.2570601 0.9932415
## ... . . . . . .
## [96,] 2.42756288 0.27619481 -0.15101499 . 1.9079761 -0.8912770
## [97,] -1.78263020 -2.25021654 1.60322846 . -1.0132337 -0.7355569
## [98,] 1.13658325 -0.23245659 -1.56636229 . -0.6793951 -1.1319342
## [99,] -0.03987627 -0.28884155 -0.05537571 . 0.4653629 -0.5013195
## [100,] -1.05020677 0.03421316 0.54919843 . 0.1743776 1.6661391
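Knowing the path is useful because the array persists on disk and can be reopened later. The sketch below assumes the `TileDBArray()` constructor accepts a path together with an `attr` argument naming the attribute to read; the variable names are illustrative.

```r
library(TileDBArray)

mat <- matrix(rnorm(1000), ncol=10)
store <- tempfile()  # illustrative location for the backend

# Write the matrix to a known path under a custom attribute name.
written <- writeTileDBArray(mat, path=store, attr="WHEE")

# Reopen the same array from disk, naming the attribute explicitly.
reopened <- TileDBArray(store, attr="WHEE")

dim(reopened)
all.equal(as.matrix(reopened), mat)
```

In a real workflow the path would point to a persistent location rather than a temporary file, so that the array outlives the R session.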
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.923050389 -0.407474823 0.098902021 . -0.6693487 0.5163817
## [2,] -0.454127655 0.657957250 -2.161997626 . -0.8049095 1.4348427
## [3,] -0.082321474 0.952338661 -1.996110068 . -1.0163406 0.0757759
## [4,] -0.005028966 -0.297455914 0.745078553 . 0.4133133 1.2291054
## [5,] -2.466097406 -1.384651442 1.363127142 . -0.2570601 0.9932415
## ... . . . . . .
## [96,] 2.42756288 0.27619481 -0.15101499 . 1.9079761 -0.8912770
## [97,] -1.78263020 -2.25021654 1.60322846 . -1.0132337 -0.7355569
## [98,] 1.13658325 -0.23245659 -1.56636229 . -0.6793951 -1.1319342
## [99,] -0.03987627 -0.28884155 -0.05537571 . 0.4653629 -0.5013195
## [100,] -1.05020677 0.03421316 0.54919843 . 0.1743776 1.6661391
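The global settings can be inspected and reset as well. This sketch assumes that `getTileDBPath()` is the getter counterpart of `setTileDBPath()`, that `setTileDBAttr()` plays the same role for the attribute name, and that passing `NULL` to a setter restores its default. Treat these as assumptions to check against the package documentation.

```r
library(TileDBArray)

mat <- matrix(rnorm(1000), ncol=10)
target <- tempfile()  # illustrative location for the backend

# Set the globals that coercion will pick up.
setTileDBPath(target)
setTileDBAttr("WHEE")
getTileDBPath()

# Coercion now writes to 'target' rather than a random temporary path.
arr <- as(mat, "TileDBArray")
dir.exists(target)

# Reset the globals so that later coercions are unaffected.
setTileDBPath(NULL)
setTileDBAttr(NULL)
```

Resetting matters because a second write to a path that already holds an array is likely to fail.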
sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.17 TileDBArray_1.15.0 DelayedArray_0.31.0
## [4] SparseArray_1.5.0 S4Arrays_1.5.0 abind_1.4-5
## [7] IRanges_2.39.0 S4Vectors_0.43.0 MatrixGenerics_1.17.0
## [10] matrixStats_1.3.0 BiocGenerics_0.51.0 Matrix_1.7-0
## [13] BiocStyle_2.33.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.12
## [7] nanoarrow_0.4.0.1 jquerylib_0.1.4 yaml_2.3.8
## [10] fastmap_1.1.1 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.45.0 tiledb_0.26.0
## [16] knitr_1.46 bookdown_0.39 bslib_0.7.0
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.2
## [25] zlibbioc_1.51.0 spdl_0.0.5 digest_0.6.35
## [28] grid_4.4.0 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.23 nanotime_0.3.7 zoo_1.8-12
## [34] rmarkdown_2.26 tools_4.4.0 htmltools_0.5.8.1