HDF5Array 1.35.11
The aim of this document is to measure the performance of the HDF5Array package for normalization and PCA (Principal Component Analysis) of on-disk single cell data, two computationally intensive operations at the core of single cell analysis.
The benchmarks presented in this document were specifically designed to observe the impact of two critical parameters on performance: the format used to store the data on disk (sparse vs dense), and the block size used for the block-processed operations.
Hopefully these benchmarks will also facilitate comparing the performance of single cell analysis workflows based on HDF5Array with workflows based on other tools like Seurat or Scanpy.
Let’s install and load HDF5Array as well as the other packages used in this vignette:
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
pkgs <- c("HDF5Array", "ExperimentHub", "DelayedMatrixStats", "RSpectra")
BiocManager::install(pkgs)
Load the packages:
library(HDF5Array)
library(ExperimentHub)
library(DelayedMatrixStats)
library(RSpectra)
The datasets that we will use for our benchmarks are subsets of the 1.3 Million Brain Cell Dataset from 10x Genomics.
The 1.3 Million Brain Cell Dataset is a 27,998 x 1,306,127 matrix of counts with one gene per row and one cell per column. It’s available via the ExperimentHub package in two forms, one that uses a sparse representation and one that uses a dense representation:
hub <- ExperimentHub()
hub["EH1039"]$description # sparse representation
## [1] "Single-cell RNA-seq data for 1.3 million brain cells from E18 mice. 'HDF5-based 10X Genomics' format originally provided by TENx Genomics"
hub["EH1040"]$description # dense representation
## [1] "Single-cell RNA-seq data for 1.3 million brain cells from E18 mice. Full rectangular, block-compressed format, 1GB block size."
The two datasets are big HDF5 files stored remotely. Let's download them to the local ExperimentHub cache if they are not there yet:
## Note that this will be quick if the HDF5 files are already in the
## local ExperimentHub cache. Otherwise, it will take a while!
full_sparse_h5 <- hub[["EH1039"]]
full_dense_h5 <- hub[["EH1040"]]
We use the TENxMatrix() and HDF5Array() constructors to bring the sparse and dense datasets into R as DelayedArray derivatives. Note that this does not load the matrix data into memory.
Bring the sparse dataset into R:
## Use 'h5ls(full_sparse_h5)' to find out the group.
full_sparse <- TENxMatrix(full_sparse_h5, group="mm10")
Note that full_sparse
is a 27,998 x 1,306,127 TENxMatrix object:
class(full_sparse)
## [1] "TENxMatrix"
## attr(,"package")
## [1] "HDF5Array"
dim(full_sparse)
## [1] 27998 1306127
See ?TENxMatrix
in the HDF5Array package for more
information about TENxMatrix objects.
Bring the dense dataset into R:
## Use 'h5ls(full_dense_h5)' to find out the name of the dataset.
full_dense <- HDF5Array(full_dense_h5, name="counts")
Note that full_dense
is a 27,998 x 1,306,127 HDF5Matrix object that
contains the same data as full_sparse
:
class(full_dense)
## [1] "HDF5Matrix"
## attr(,"package")
## [1] "HDF5Array"
dim(full_dense)
## [1] 27998 1306127
See ?HDF5Matrix
in the HDF5Array package for more
information about HDF5Matrix objects.
Finally note that the dense HDF5 file does not contain the dimnames of the matrix, so we manually add them to full_dense:
dimnames(full_dense) <- dimnames(full_sparse)
For our benchmarks below, we create subsets of the 1.3 Million Brain
Cell Dataset of increasing sizes: subsets with 12,500 cells, 25,000 cells,
50,000 cells, 100,000 cells, and 200,000 cells. Note that subsetting a
TENxMatrix or HDF5Matrix object with [
is a delayed operation, so it has virtually no cost:
sparse1 <- full_sparse[ , 1:12500]
dense1 <- full_dense[ , 1:12500]
sparse2 <- full_sparse[ , 1:25000]
dense2 <- full_dense[ , 1:25000]
sparse3 <- full_sparse[ , 1:50000]
dense3 <- full_dense[ , 1:50000]
sparse4 <- full_sparse[ , 1:100000]
dense4 <- full_dense[ , 1:100000]
sparse5 <- full_sparse[ , 1:200000]
dense5 <- full_dense[ , 1:200000]
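As a quick illustration (not part of the benchmarks themselves), each subset is a DelayedMatrix that simply records the subsetting as a delayed operation on top of the original seed, so no HDF5 data gets read at this point:
## Quick check (nothing is read from the HDF5 file here):
class(sparse1)
showtree(sparse1)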
We’ll use the following code for normalization:
## Also selects the most variable genes (1000 by default).
simple_normalize <- function(mat, num_var_genes=1000)
{
    stopifnot(length(dim(mat)) == 2, !is.null(rownames(mat)))
    ## Drop genes with no counts, then scale each cell to 10,000 counts:
    mat <- mat[rowSums(mat) > 0, ]
    mat <- t(t(mat) * 10000 / colSums(mat))
    ## Keep the 'num_var_genes' most variable genes:
    row_vars <- rowVars(mat)
    rv_order <- order(row_vars, decreasing=TRUE)
    variable_idx <- head(rv_order, n=num_var_genes)
    ## Log-transform, then scale each gene to unit standard deviation:
    mat <- log1p(mat[variable_idx, ])
    mat / rowSds(mat)
}
and the following code for PCA:
simple_PCA <- function(mat, k=25)
{
    stopifnot(length(dim(mat)) == 2)
    row_means <- rowMeans(mat)
    ## Products with the implicitly row-centered matrix 'mat - row_means':
    Ax <- function(x, args)
        (as.numeric(mat %*% x) - row_means * sum(x))
    Atx <- function(x, args)
        (as.numeric(x %*% mat) - as.vector(row_means %*% x))
    RSpectra::svds(Ax, Atrans=Atx, k=k, dim=dim(mat))
}
Note that the implementations of simple_normalize() and simple_PCA() are expected to work on any matrix-like object, regardless of its exact type/representation: an ordinary matrix, a SparseMatrix object from the SparseArray package, a dgCMatrix object from the Matrix package, a DelayedMatrix derivative (TENxMatrix, HDF5Matrix, TileDBMatrix), etc.
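For example, here is a quick toy run on an ordinary matrix of simulated counts (hypothetical data, only meant to illustrate the point above; the toy/toy_norm/toy_pca names are arbitrary):
## rowVars() and rowSds() (from MatrixGenerics, attached with the packages
## loaded earlier) also work on an ordinary matrix.
set.seed(123)
toy <- matrix(rpois(3000 * 200, lambda=0.4), nrow=3000,
              dimnames=list(paste0("gene", 1:3000), paste0("cell", 1:200)))
toy_norm <- simple_normalize(toy, num_var_genes=50)  # a 50 x 200 ordinary matrix
toy_pca <- simple_PCA(toy_norm, k=5)
length(toy_pca$d)                                    # the k=5 singular values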
However, when the input is a DelayedMatrix object or derivative, it's important to be aware that:
Summarization methods like sum(), colSums(), rowVars(), or rowSds(), as well as matrix multiplication (%*%), are block-processed operations.
The block size is 100 Mb by default. Increasing or decreasing the block size will typically increase or decrease the memory usage of block-processed operations. It will also impact performance, but sometimes in unexpected or counter-intuitive ways.
The block size can be controlled with DelayedArray::getAutoBlockSize() and DelayedArray::setAutoBlockSize().
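For example (the 250 Mb value below is just for illustration):
DelayedArray::getAutoBlockSize()        # current automatic block size, in bytes
DelayedArray::setAutoBlockSize(2.5e8)   # switch to 250 Mb blocks
DelayedArray::setAutoBlockSize()        # reset to the default block size (100 Mb)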
For our benchmarks below, we'll use the following block sizes: 250 Mb (sparse) and 40 Mb (dense) for normalization; 100 Mb (sparse) and 250 Mb (dense) for on-disk realization of the normalized datasets; and 40 Mb (sparse) and 100 Mb (dense, the default) for PCA.
While manually running our benchmarks below on a Linux or macOS system, we will also monitor memory usage at the command line in a terminal with:
(while true; do ps u -p <PID>; sleep 1; done) >ps.log 2>&1 &
where <PID>
is the process id of our R session. This will allow us
to measure the maximum amount of memory used by the calls
to simple_normalize()
or simple_PCA()
.
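To extract the maximum resident set size recorded in ps.log, one possibility is a small helper along these lines (a sketch; max_rss_gb() is a hypothetical name, and the field position assumes the standard 'ps u' output layout where RSS is the 6th field, in Kb):
## Hypothetical helper: max RSS (in Gb) recorded in 'ps.log'.
max_rss_gb <- function(ps_log="ps.log")
{
    lines <- readLines(ps_log)
    lines <- lines[!grepl("^USER", lines) & nzchar(trimws(lines))]  # drop headers/blanks
    fields <- strsplit(trimws(lines), "\\s+")
    fields <- fields[lengths(fields) >= 6]
    rss_kb <- suppressWarnings(as.numeric(vapply(fields, `[[`, character(1), 6)))
    max(rss_kb, na.rm=TRUE) / 2^20  # Kb -> Gb
}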
In this section we run simple_normalize()
on the two smaller test
datasets only (27,998 x 12,500 and 27,998 x 25,000, sparse and dense),
and we report timings and memory usage.
See the Comprehensive timings obtained on various systems section at the end of this document for simple_normalize() and simple_PCA() timings obtained on various systems, on all our test datasets, and using three different block sizes: 40 Mb, 100 Mb, and 250 Mb.
Set block size to 250 Mb:
DelayedArray::setAutoBlockSize(2.5e8)
## automatic block size set to 2.5e+08 bytes (was 1e+08)
dim(sparse1)
## [1] 27998 12500
system.time(sparse1n <- simple_normalize(sparse1))
## user system elapsed
## 98.571 8.092 106.637
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9919641 529.8 17466960 932.9 12903527 689.2
## Vcells 25723651 196.3 90452252 690.1 119073743 908.5
dim(sparse1n)
## [1] 1000 12500
dim(sparse2)
## [1] 27998 25000
system.time(sparse2n <- simple_normalize(sparse2))
## user system elapsed
## 170.148 10.935 182.908
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9919769 529.8 17466960 932.9 12903527 689.2
## Vcells 25837321 197.2 147549432 1125.8 184436537 1407.2
dim(sparse2n)
## [1] 1000 25000
With this block size (250 Mb), memory usage (as reported by Unix
command ps u -p <PID>
, see Monitoring memory usage above in
this document) remained <= 2.6 Gb at all times.
Set block size to 40 Mb:
DelayedArray::setAutoBlockSize(4e7)
## automatic block size set to 4e+07 bytes (was 2.5e+08)
dim(dense1)
## [1] 27998 12500
system.time(dense1n <- simple_normalize(dense1))
## user system elapsed
## 92.132 5.936 100.818
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9920168 529.8 17466960 932.9 12903527 689.2
## Vcells 25911659 197.7 76827731 586.2 184436537 1407.2
dim(dense1n)
## [1] 1000 12500
dim(dense2)
## [1] 27998 25000
system.time(dense2n <- simple_normalize(dense2))
## user system elapsed
## 198.184 14.445 215.174
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9920333 529.9 17466960 932.9 12903527 689.2
## Vcells 26000328 198.4 83905016 640.2 184436537 1407.2
dim(dense2n)
## [1] 1000 25000
With this block size (40 Mb), memory usage (as reported by Unix
command ps u -p <PID>
, see Monitoring memory usage above in
this document) remained <= 1.8 Gb at all times.
Note that the normalized datasets obtained in the previous section
are DelayedMatrix objects that carry delayed operations. These can
be displayed with showtree()
:
class(sparse1n)
## [1] "DelayedMatrix"
## attr(,"package")
## [1] "DelayedArray"
showtree(sparse1n)
## 1000x12500 double, sparse: DelayedMatrix object
## └─ 1000x12500 double, sparse: Unary iso op with args
## └─ 1000x12500 double, sparse: Stack of 1 unary iso op(s)
## └─ 1000x12500 double, sparse: Aperm (perm=c(2,1))
## └─ 12500x1000 double, sparse: Unary iso op with args
## └─ 12500x1000 double, sparse: Stack of 1 unary iso op(s)
## └─ 12500x1000 integer, sparse: Aperm (perm=c(2,1))
## └─ 1000x12500 integer, sparse: Subset
## └─ 27998x1306127 integer, sparse: [seed] TENxMatrixSeed object
class(dense1n)
## [1] "DelayedMatrix"
## attr(,"package")
## [1] "DelayedArray"
showtree(dense1n)
## 1000x12500 double: DelayedMatrix object
## └─ 1000x12500 double: Set dimnames
## └─ 1000x12500 double: Unary iso op with args
## └─ 1000x12500 double: Stack of 1 unary iso op(s)
## └─ 1000x12500 double: Aperm (perm=c(2,1))
## └─ 12500x1000 double: Unary iso op with args
## └─ 12500x1000 double: Stack of 1 unary iso op(s)
## └─ 12500x1000 integer: Aperm (perm=c(2,1))
## └─ 1000x12500 integer: Subset
## └─ 27998x1306127 integer: [seed] HDF5ArraySeed object
Before we proceed with PCA, we're going to write the normalized datasets to new HDF5 files. This has a significant cost, but it also has the benefit of triggering on-disk realization of the objects: all the delayed operations carried by an object get realized on-the-fly before the matrix data lands on disk, making the new object (TENxMatrix or HDF5Matrix) more efficient for PCA or whatever block-processed operations come next.
Set block size to 100 Mb:
DelayedArray::setAutoBlockSize(1e8)
## automatic block size set to 1e+08 bytes (was 4e+07)
dim(sparse1n)
## [1] 1000 12500
sparse1n_path <- tempfile()
system.time(
sparse1n <- writeTENxMatrix(sparse1n, sparse1n_path, group="matrix", level=0)
)
## user system elapsed
## 13.093 0.808 17.212
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9933503 530.6 17466960 932.9 12903527 689.2
## Vcells 25977046 198.2 110853456 845.8 184436537 1407.2
The new sparse1n
object is a pristine TENxMatrix object:
class(sparse1n)
## [1] "TENxMatrix"
## attr(,"package")
## [1] "HDF5Array"
showtree(sparse1n) # "pristine" object (i.e. no more delayed operations)
## 1000x12500 double, sparse: TENxMatrix object
## └─ 1000x12500 double, sparse: [seed] TENxMatrixSeed object
dim(sparse2n)
## [1] 1000 25000
sparse2n_path <- tempfile()
system.time(
sparse2n <- writeTENxMatrix(sparse2n, sparse2n_path, group="matrix", level=0)
)
## user system elapsed
## 20.795 0.923 21.744
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9933496 530.6 17466960 932.9 12903527 689.2
## Vcells 25927085 197.9 98026769 747.9 184436537 1407.2
The new sparse2n
object is a pristine TENxMatrix object:
class(sparse2n)
## [1] "TENxMatrix"
## attr(,"package")
## [1] "HDF5Array"
showtree(sparse2n) # "pristine" object (i.e. no more delayed operations)
## 1000x25000 double, sparse: TENxMatrix object
## └─ 1000x25000 double, sparse: [seed] TENxMatrixSeed object
With this block size (100 Mb), memory usage (as reported by Unix
command ps u -p <PID>
, see Monitoring memory usage above in
this document) remained <= 2.5 Gb at all times.
Set block size to 250 Mb:
DelayedArray::setAutoBlockSize(2.5e8)
## automatic block size set to 2.5e+08 bytes (was 1e+08)
dim(dense1n)
## [1] 1000 12500
dense1n_path <- tempfile()
system.time(
dense1n <- writeHDF5Array(dense1n, dense1n_path, name="dense1n", level=0)
)
## user system elapsed
## 4.240 0.365 4.611
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9936037 530.7 17466960 932.9 12903527 689.2
## Vcells 25860474 197.3 78421416 598.4 184436537 1407.2
The new dense1n
object is a pristine HDF5Matrix object:
class(dense1n)
## [1] "HDF5Matrix"
## attr(,"package")
## [1] "HDF5Array"
showtree(dense1n) # "pristine" object (i.e. no more delayed operations)
## 1000x12500 double: HDF5Matrix object
## └─ 1000x12500 double: [seed] HDF5ArraySeed object
dim(dense2n)
## [1] 1000 25000
dense2n_path <- tempfile()
system.time(
dense2n <- writeHDF5Array(dense2n, dense2n_path, name="dense2n", level=0)
)
## user system elapsed
## 11.258 1.284 12.544
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9936007 530.7 17466960 932.9 12903527 689.2
## Vcells 25772004 196.7 100701990 768.3 184436537 1407.2
The new dense2n
object is a pristine HDF5Matrix object:
class(dense2n)
## [1] "HDF5Matrix"
## attr(,"package")
## [1] "HDF5Array"
showtree(dense2n) # "pristine" object (i.e. no more delayed operations)
## 1000x25000 double: HDF5Matrix object
## └─ 1000x25000 double: [seed] HDF5ArraySeed object
With this block size (250 Mb), memory usage (as reported by Unix
command ps u -p <PID>
, see Monitoring memory usage above in
this document) remained <= 1.6 Gb at all times.
In this section we run simple_PCA() on the two normalized datasets obtained in the previous section (1000 x 12,500 and 1000 x 25,000, sparse and dense), and we report timings and memory usage.
See the Comprehensive timings obtained on various systems section at the end of this document for simple_normalize() and simple_PCA() timings obtained on various systems, on all our test datasets, and using three different block sizes: 40 Mb, 100 Mb, and 250 Mb.
Set block size to 40 Mb:
DelayedArray::setAutoBlockSize(4e7)
## automatic block size set to 4e+07 bytes (was 2.5e+08)
sparse1n <- TENxMatrix(sparse1n_path)
dim(sparse1n)
## [1] 1000 12500
system.time(pca1s <- simple_PCA(sparse1n))
## user system elapsed
## 86.736 3.678 82.916
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9942444 531.0 17466960 932.9 12903527 689.2
## Vcells 26122561 199.3 80561592 614.7 184436537 1407.2
sparse2n <- TENxMatrix(sparse2n_path)
dim(sparse2n)
## [1] 1000 25000
system.time(pca2s <- simple_PCA(sparse2n))
## user system elapsed
## 179.510 7.053 172.868
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9942521 531.0 17466960 932.9 12903527 689.2
## Vcells 26773063 204.3 80563102 614.7 184436537 1407.2
With this block size (40 Mb), memory usage (as reported by Unix
command ps u -p <PID>
, see Monitoring memory usage above in
this document) remained <= 1.8 Gb at all times.
Set block size to 100 Mb (the default):
DelayedArray::setAutoBlockSize()
## automatic block size set to 1e+08 bytes (was 4e+07)
dense1n <- HDF5Array(dense1n_path, name="dense1n")
dim(dense1n)
## [1] 1000 12500
system.time(pca1d <- simple_PCA(dense1n))
## user system elapsed
## 77.304 15.596 92.906
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9943562 531.1 17466960 932.9 12903527 689.2
## Vcells 27111594 206.9 96063531 733.0 184436537 1407.2
dense2n <- HDF5Array(dense2n_path, name="dense2n")
dim(dense2n)
## [1] 1000 25000
system.time(pca2d <- simple_PCA(dense2n))
## user system elapsed
## 131.230 28.598 159.839
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 9943055 531.1 17466960 932.9 12903527 689.2
## Vcells 27761515 211.9 86654446 661.2 184436537 1407.2
With this block size (100 Mb), memory usage (as reported by Unix
command ps u -p <PID>
, see Monitoring memory usage above in
this document) remained <= 2.0 Gb at all times.
stopifnot(all.equal(pca1s, pca1d))
stopifnot(all.equal(pca2s, pca2d))
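The two stopifnot() calls above verify that the sparse and dense normalized datasets produce the same PCA results. If per-cell embeddings are needed, they can be extracted from the svds() output with something along these lines (a sketch; cell_embeddings is a hypothetical name, and this follows the usual convention of scaling the right singular vectors by the singular values):
## One row per cell, one column per principal component:
cell_embeddings <- sweep(pca1s$v, 2, pca1s$d, "*")
dim(cell_embeddings)  # expected: 12500 25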
Here we report timings obtained on various systems. For each system, the results are summarized in a table that shows the normalization & realization & PCA timings obtained on all our test datasets and using three different block sizes: 40 Mb, 100 Mb, and 250 Mb. For each operation, the best time across the three different block sizes is displayed in bold.
Disk performance, as reported by 'sudo hdparm -Tt <device>': cached reads: 35188 MB in 2.00 seconds = 17620.75 MB/sec; buffered disk reads: 7842 MB in 3.00 seconds = 2613.57 MB/sec.
Test dataset (# genes x # cells) | Format | Normalization, 40 Mb: time (s) | max. mem. | Normalization, 100 Mb: time (s) | max. mem. | Normalization, 250 Mb: time (s) | max. mem. | Normalized dataset (# sel. genes x # cells) | Realization, 40 Mb: time (s) | max. mem. | Realization, 100 Mb: time (s) | max. mem. | Realization, 250 Mb: time (s) | max. mem. | PCA, 40 Mb: time (s) | max. mem. | PCA, 100 Mb: time (s) | max. mem. | PCA, 250 Mb: time (s) | max. mem. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Normalization & selection of the 1000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 70 | 1.8Gb | 53 | 2.0Gb | 44 | 2.3Gb | 1000 x 12500 | 5 | 1.8Gb | 6 | 2.1Gb | 6 | 2.2Gb | 36 | 1.7Gb | 47 | 1.8Gb | 45 | 2.0Gb |
dense | 53 | 1.8Gb | 55 | 1.9Gb | 49 | 2.4Gb | 2 | 1.6Gb | 2 | 1.5Gb | 2 | 1.4Gb | 41 | 1.8Gb | 42 | 2.0Gb | 38 | 1.6Gb | ||
27998 x 25000 | sparse | 144 | 1.9Gb | 110 | 2.4Gb | 85 | 2.6Gb | 1000 x 25000 | 11 | 1.9Gb | 10 | 2.5Gb | 11 | 2.8Gb | 75 | 1.8Gb | 72 | 2.0Gb | 133 | 2.3Gb |
dense | 103 | 1.8Gb | 116 | 2.0Gb | 103 | 2.5Gb | 5 | 1.7Gb | 5 | 1.6Gb | 5 | 1.6Gb | 85 | 1.8Gb | 82 | 2.0Gb | 128 | 1.6Gb | ||
27998 x 50000 | sparse | 277 | 1.9Gb | 215 | 2.6Gb | 179 | 3.5Gb | 1000 x 50000 | 22 | 1.8Gb | 20 | 2.3Gb | 19 | 3.3Gb | 143 | 1.8Gb | 188 | 2.1Gb | 169 | 2.6Gb |
dense | 211 | 1.9Gb | 230 | 2.1Gb | 228 | 2.5Gb | 13 | 1.7Gb | 10 | 1.7Gb | 10 | 1.9Gb | 208 | 1.8Gb | 188 | 1.9Gb | 275 | 1.7Gb | ||
27998 x 100000 | sparse | 543 | 2.0Gb | 436 | 2.9Gb | 360 | 3.7Gb | 1000 x 100000 | 41 | 2.0Gb | 38 | 2.6Gb | 39 | 3.7Gb | 289 | 1.9Gb | 313 | 2.3Gb | 323 | 2.8Gb |
dense | 419 | 1.9Gb | 467 | 2.1Gb | 456 | 2.6Gb | 25 | 1.6Gb | 20 | 1.7Gb | 18 | 2.1Gb | 417 | 1.9Gb | 311 | 2.1Gb | 309 | 2.5Gb | ||
27998 x 200000 | sparse | 1130 | 2.2Gb | 874 | 2.8Gb | 744 | 4.4Gb | 1000 x 200000 | 73 | 2.2Gb | 78 | 2.7Gb | 71 | 4.2Gb | 526 | 2.0Gb | 619 | 2.4Gb | 599 | 3.4Gb |
dense | 865 | 2.0Gb | 966 | 2.1Gb | 920 | 3.2Gb | 62 | 2.0Gb | 51 | 1.8Gb | 36 | 2.9Gb | 1089 | 2.3Gb | 892 | 2.1Gb | 943 | 2.3Gb | ||
1. Normalization & selection of the 2000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 74 | 1.8Gb | 61 | 2.2Gb | 45 | 2.5Gb | 2000 x 12500 | 7 | 1.8Gb | 8 | 2.1Gb | 7 | 2.5Gb | 55 | 1.7Gb | 59 | 1.8Gb | 61 | 2.2Gb |
dense | 54 | 1.8Gb | 59 | 2.0Gb | 50 | 2.4Gb | 3 | 1.6Gb | 3 | 1.6Gb | 3 | 1.8Gb | 82 | 1.9Gb | 87 | 1.9Gb | 129 | 1.6Gb | ||
27998 x 25000 | sparse | 151 | 1.8Gb | 118 | 2.3Gb | 100 | 2.7Gb | 2000 x 25000 | 14 | 1.9Gb | 15 | 2.3Gb | 14 | 2.9Gb | 110 | 1.7Gb | 167 | 2.0Gb | 159 | 2.3Gb |
dense | 117 | 1.8Gb | 121 | 2.0Gb | 113 | 2.5Gb | 9 | 1.6Gb | 8 | 1.6Gb | 7 | 1.9Gb | 219 | 1.8Gb | 167 | 1.9Gb | 151 | 2.5Gb | ||
27998 x 50000 | sparse | 288 | 1.9Gb | 229 | 2.4Gb | 193 | 3.8Gb | 2000 x 50000 | 28 | 1.9Gb | 30 | 2.4Gb | 26 | 3.4Gb | 195 | 1.8Gb | 229 | 2.0Gb | 223 | 2.9Gb |
dense | 219 | 1.9Gb | 240 | 2.1Gb | 237 | 2.5Gb | 19 | 1.7Gb | 15 | 1.7Gb | 14 | 2.3Gb | 342 | 1.8Gb | 275 | 2.0Gb | 381 | 1.8Gb | ||
27998 x 100000 | sparse | 562 | 2.0Gb | 450 | 2.7Gb | 375 | 3.8Gb | 2000 x 100000 | 52 | 2.0Gb | 55 | 2.4Gb | 52 | 3.5Gb | 351 | 1.8Gb | 453 | 2.1Gb | 408 | 3.0Gb |
dense | 440 | 1.9Gb | 484 | 2.1Gb | 475 | 2.6Gb | 47 | 1.7Gb | 38 | 1.7Gb | 27 | 2.3Gb | 809 | 1.9Gb | 688 | 2.1Gb | 854 | 1.7Gb | ||
27998 x 200000 | sparse | 1170 | 2.1Gb | 922 | 2.8Gb | 798 | 3.6Gb | 2000 x 200000 | 107 | 2.0Gb | 114 | 2.5Gb | 106 | 3.7Gb | 672 | 1.9Gb | 1160 | 2.2Gb | 932 | 2.9Gb |
dense | 893 | 2.1Gb | 989 | 2.0Gb | 971 | 3.2Gb | 99 | 2.0Gb | 77 | 1.9Gb | 52 | 3.0Gb | 1713 | 2.3Gb | 1366 | 2.2Gb | 2047 | 2.2Gb | ||
Table 1: Timings for DELL XPS 15 laptop. For each operation, the best time across the three different block sizes is displayed in bold. In addition, if it's also the best time across the sparse and dense formats, then we box it (only for Normalization and PCA). Memory usage > 4Gb is displayed in light red.
Disk performance, as reported by 'sudo hdparm -Tt <device>': cached reads: 20592 MB in 1.99 seconds = 10361.41 MB/sec; buffered disk reads: 1440 MB in 3.00 seconds = 479.66 MB/sec.
Test dataset (# genes x # cells) | Format | Normalization, 40 Mb: time (s) | max. mem. | Normalization, 100 Mb: time (s) | max. mem. | Normalization, 250 Mb: time (s) | max. mem. | Normalized dataset (# sel. genes x # cells) | Realization, 40 Mb: time (s) | max. mem. | Realization, 100 Mb: time (s) | max. mem. | Realization, 250 Mb: time (s) | max. mem. | PCA, 40 Mb: time (s) | max. mem. | PCA, 100 Mb: time (s) | max. mem. | PCA, 250 Mb: time (s) | max. mem. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Normalization & selection of the 1000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 109 | NA | 85 | NA | 74 | NA | 1000 x 12500 | 9 | NA | 11 | NA | 9 | NA | 67 | NA | 89 | NA | 90 | NA |
dense | 85 | NA | 90 | NA | 80 | NA | 3 | NA | 3 | NA | 3 | NA | 63 | NA | 71 | NA | 66 | NA | ||
27998 x 25000 | sparse | 235 | NA | 192 | NA | 144 | NA | 1000 x 25000 | 18 | NA | 18 | NA | 19 | NA | 150 | NA | 172 | NA | 142 | NA |
dense | 178 | NA | 184 | NA | 170 | NA | 7 | NA | 8 | NA | 8 | NA | 142 | NA | 145 | NA | 215 | NA | ||
27998 x 50000 | sparse | 439 | NA | 370 | NA | 303 | NA | 1000 x 50000 | 35 | NA | 36 | NA | 34 | NA | 285 | NA | 308 | NA | 344 | NA |
dense | 352 | NA | 379 | NA | 354 | NA | 19 | NA | 15 | NA | 15 | NA | 336 | NA | 287 | NA | 462 | NA | ||
27998 x 100000 | sparse | 881 | NA | 741 | NA | 598 | NA | 1000 x 100000 | 67 | NA | 69 | NA | 67 | NA | 603 | NA | 572 | NA | 590 | NA |
dense | 706 | NA | 746 | NA | 723 | NA | 37 | NA | 30 | NA | 28 | NA | 627 | NA | 681 | NA | 546 | NA | ||
27998 x 200000 | sparse | 1851 | NA | 1487 | NA | 1283 | NA | 1000 x 200000 | 139 | NA | 132 | NA | 127 | NA | 1097 | NA | 1112 | NA | 1306 | NA |
dense | 1457 | NA | 1569 | NA | 1512 | NA | 95 | NA | 79 | NA | 58 | NA | 1749 | NA | 1505 | NA | 1747 | NA | ||
1. Normalization & selection of the 2000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 114 | NA | 98 | NA | 78 | NA | 2000 x 12500 | 12 | NA | 13 | NA | 13 | NA | 129 | NA | 141 | NA | 122 | NA |
dense | 88 | NA | 97 | NA | 83 | NA | 5 | NA | 6 | NA | 6 | NA | 128 | NA | 145 | NA | 215 | NA | ||
27998 x 25000 | sparse | 246 | NA | 201 | NA | 169 | NA | 2000 x 25000 | 25 | NA | 28 | NA | 26 | NA | 228 | NA | 276 | NA | 290 | NA |
dense | 183 | NA | 192 | NA | 184 | NA | 14 | NA | 13 | NA | 11 | NA | 301 | NA | 289 | NA | 448 | NA | ||
27998 x 50000 | sparse | 471 | NA | 391 | NA | 326 | NA | 2000 x 50000 | 51 | NA | 48 | NA | 50 | NA | 375 | NA | 424 | NA | 464 | NA |
dense | 368 | NA | 394 | NA | 369 | NA | 30 | NA | 24 | NA | 21 | NA | 570 | NA | 645 | NA | 650 | NA | ||
27998 x 100000 | sparse | 921 | NA | 781 | NA | 639 | NA | 2000 x 100000 | 92 | NA | 98 | NA | 94 | NA | 777 | NA | 1112 | NA | 988 | NA |
dense | 736 | NA | 764 | NA | 752 | NA | 70 | NA | 57 | NA | 45 | NA | 1296 | NA | 1049 | NA | 1432 | NA | ||
27998 x 200000 | sparse | 1913 | NA | 1558 | NA | 1376 | NA | 2000 x 200000 | 193 | NA | 190 | NA | 198 | NA | 1517 | NA | 1778 | NA | 1786 | NA |
dense | 1497 | NA | 1648 | NA | 1554 | NA | 156 | NA | 114 | NA | 88 | NA | 2813 | NA | 2326 | NA | 3058 | NA | ||
Table 2: Timings for Supermicro SuperServer 1029GQ-TRT. For each operation, the best time across the three different block sizes is displayed in bold. In addition, if it's also the best time across the sparse and dense formats, then we box it (only for Normalization and PCA). Memory usage > 4Gb is displayed in light red.
Test dataset (# genes x # cells) | Format | Normalization, 40 Mb: time (s) | max. mem. | Normalization, 100 Mb: time (s) | max. mem. | Normalization, 250 Mb: time (s) | max. mem. | Normalized dataset (# sel. genes x # cells) | Realization, 40 Mb: time (s) | max. mem. | Realization, 100 Mb: time (s) | max. mem. | Realization, 250 Mb: time (s) | max. mem. | PCA, 40 Mb: time (s) | max. mem. | PCA, 100 Mb: time (s) | max. mem. | PCA, 250 Mb: time (s) | max. mem. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Normalization & selection of the 1000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 100 | NA | 76 | NA | 63 | NA | 1000 x 12500 | 8 | NA | 10 | NA | 10 | NA | 56 | NA | 80 | NA | 79 | NA |
dense | 68 | NA | 68 | NA | 60 | NA | 3 | NA | 3 | NA | 2 | NA | 68 | NA | 59 | NA | 56 | NA | ||
27998 x 25000 | sparse | 206 | NA | 156 | NA | 121 | NA | 1000 x 25000 | 17 | NA | 15 | NA | 17 | NA | 110 | NA | 141 | NA | 202 | NA |
dense | 129 | NA | 139 | NA | 122 | NA | 5 | NA | 6 | NA | 5 | NA | 104 | NA | 112 | NA | 180 | NA | ||
27998 x 50000 | sparse | 404 | NA | 379 | NA | 294 | NA | 1000 x 50000 | 38 | NA | 38 | NA | 34 | NA | 286 | NA | 418 | NA | 329 | NA |
dense | 286 | NA | 290 | NA | 278 | NA | 17 | NA | 12 | NA | 12 | NA | 327 | NA | 274 | NA | 229 | NA | ||
27998 x 100000 | sparse | 839 | NA | 727 | NA | 543 | NA | 1000 x 100000 | 68 | NA | 71 | NA | 68 | NA | 495 | NA | 688 | NA | 564 | NA |
dense | 515 | NA | 553 | NA | 538 | NA | 27 | NA | 23 | NA | 20 | NA | 565 | NA | 649 | NA | 627 | NA | ||
27998 x 200000 | sparse | 1749 | NA | 1343 | NA | 1065 | NA | 1000 x 200000 | 135 | NA | 138 | NA | 135 | NA | 958 | NA | 1318 | NA | 1105 | NA |
dense | 1058 | NA | 1153 | NA | 1132 | NA | 68 | NA | 56 | NA | 42 | NA | 1498 | NA | 1197 | NA | 1437 | NA | ||
1. Normalization & selection of the 2000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 106 | NA | 88 | NA | 66 | NA | 2000 x 12500 | 11 | NA | 12 | NA | 15 | NA | 88 | NA | 101 | NA | 111 | NA |
dense | 70 | NA | 74 | NA | 62 | NA | 4 | NA | 4 | NA | 4 | NA | 110 | NA | 116 | NA | 184 | NA | ||
27998 x 25000 | sparse | 216 | NA | 169 | NA | 142 | NA | 2000 x 25000 | 25 | NA | 26 | NA | 27 | NA | 198 | NA | 283 | NA | 242 | NA |
dense | 133 | NA | 148 | NA | 134 | NA | 10 | NA | 10 | NA | 9 | NA | 286 | NA | 251 | NA | 193 | NA | ||
27998 x 50000 | sparse | 467 | NA | 360 | NA | 290 | NA | 2000 x 50000 | 49 | NA | 47 | NA | 51 | NA | 373 | NA | 462 | NA | 487 | NA |
dense | 293 | NA | 301 | NA | 289 | NA | 23 | NA | 18 | NA | 16 | NA | 505 | NA | 377 | NA | 557 | NA | ||
27998 x 100000 | sparse | 847 | NA | 728 | NA | 549 | NA | 2000 x 100000 | 91 | NA | 102 | NA | 88 | NA | 641 | NA | 789 | NA | 715 | NA |
dense | 534 | NA | 581 | NA | 560 | NA | 52 | NA | 46 | NA | 31 | NA | 1141 | NA | 968 | NA | 1252 | NA | ||
27998 x 200000 | sparse | 1808 | NA | 1429 | NA | 1145 | NA | 2000 x 200000 | 192 | NA | 200 | NA | 179 | NA | 1308 | NA | 2030 | NA | 1713 | NA |
dense | 1109 | NA | 1186 | NA | 1169 | NA | 119 | NA | 86 | NA | 72 | NA | 2326 | NA | 1766 | NA | 3058 | NA | ||
Table 3: Timings for Intel Mac Pro. For each operation, the best time across the three different block sizes is displayed in bold. In addition, if it's also the best time across the sparse and dense formats, then we box it (only for Normalization and PCA). Memory usage > 4Gb is displayed in light red.
Test dataset (# genes x # cells) | Format | Normalization, 40 Mb: time (s) | max. mem. | Normalization, 100 Mb: time (s) | max. mem. | Normalization, 250 Mb: time (s) | max. mem. | Normalized dataset (# sel. genes x # cells) | Realization, 40 Mb: time (s) | max. mem. | Realization, 100 Mb: time (s) | max. mem. | Realization, 250 Mb: time (s) | max. mem. | PCA, 40 Mb: time (s) | max. mem. | PCA, 100 Mb: time (s) | max. mem. | PCA, 250 Mb: time (s) | max. mem. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Normalization & selection of the 1000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 58 | NA | 44 | NA | 36 | NA | 1000 x 12500 | 4 | NA | 5 | NA | 5 | NA | 28 | NA | 39 | NA | 39 | NA |
dense | 38 | NA | 37 | NA | 34 | NA | 2 | NA | 2 | NA | 2 | NA | 33 | NA | 30 | NA | 29 | NA | ||
27998 x 25000 | sparse | 121 | NA | 92 | NA | 70 | NA | 1000 x 25000 | 9 | NA | 9 | NA | 9 | NA | 57 | NA | 72 | NA | 103 | NA |
dense | 71 | NA | 77 | NA | 71 | NA | 4 | NA | 4 | NA | 4 | NA | 55 | NA | 57 | NA | 96 | NA | ||
27998 x 50000 | sparse | 242 | NA | 189 | NA | 146 | NA | 1000 x 50000 | 19 | NA | 17 | NA | 15 | NA | 115 | NA | 156 | NA | 140 | NA |
dense | 149 | NA | 152 | NA | 151 | NA | 10 | NA | 7 | NA | 8 | NA | 149 | NA | 139 | NA | 114 | NA | ||
27998 x 100000 | sparse | 460 | NA | 377 | NA | 299 | NA | 1000 x 100000 | 34 | NA | 34 | NA | 30 | NA | 244 | NA | 326 | NA | 276 | NA |
dense | 291 | NA | 309 | NA | 311 | NA | 20 | NA | 16 | NA | 14 | NA | 296 | NA | 373 | NA | 334 | NA | ||
27998 x 200000 | sparse | 989 | NA | 752 | NA | 621 | NA | 1000 x 200000 | 67 | NA | 66 | NA | 62 | NA | 500 | NA | 686 | NA | 557 | NA |
dense | 596 | NA | 644 | NA | 651 | NA | 51 | NA | 40 | NA | 29 | NA | 846 | NA | 669 | NA | 790 | NA | ||
1. Normalization & selection of the 2000 most variable genes / 2. On-disk realization of the normalized dataset / 3. PCA of the normalized dataset
27998 x 12500 | sparse | 62 | NA | 50 | NA | 38 | NA | 2000 x 12500 | 6 | NA | 7 | NA | 7 | NA | 47 | NA | 50 | NA | 51 | NA |
dense | 39 | NA | 40 | NA | 35 | NA | 2 | NA | 3 | NA | 3 | NA | 58 | NA | 59 | NA | 98 | NA | ||
27998 x 25000 | sparse | 127 | NA | 97 | NA | 81 | NA | 2000 x 25000 | 13 | NA | 12 | NA | 12 | NA | 90 | NA | 138 | NA | 116 | NA |
dense | 74 | NA | 80 | NA | 76 | NA | 7 | NA | 6 | NA | 5 | NA | 145 | NA | 124 | NA | 100 | NA | ||
27998 x 50000 | sparse | 253 | NA | 199 | NA | 158 | NA | 2000 x 50000 | 25 | NA | 24 | NA | 24 | NA | 165 | NA | 227 | NA | 205 | NA |
dense | 154 | NA | 157 | NA | 157 | NA | 14 | NA | 11 | NA | 10 | NA | 251 | NA | 197 | NA | 279 | NA | ||
27998 x 100000 | sparse | 493 | NA | 400 | NA | 322 | NA | 2000 x 100000 | 50 | NA | 52 | NA | 46 | NA | 293 | NA | 378 | NA | 358 | NA |
dense | 301 | NA | 324 | NA | 321 | NA | 36 | NA | 28 | NA | 20 | NA | 631 | NA | 517 | NA | 646 | NA | ||
27998 x 200000 | sparse | 1048 | NA | 801 | NA | 671 | NA | 2000 x 200000 | 93 | NA | 98 | NA | 91 | NA | 598 | NA | 1010 | NA | 818 | NA |
dense | 619 | NA | 666 | NA | 674 | NA | 82 | NA | 60 | NA | 44 | NA | 1256 | NA | 1010 | NA | 1634 | NA | ||
Table 4: Timings for Apple Silicon Mac Pro. For each operation, the best time across the three different block sizes is displayed in bold. In addition, if it's also the best time across the sparse and dense formats, then we box it (only for Normalization and PCA). Memory usage > 4Gb is displayed in light red.
The Supermicro SuperServer 1029GQ-TRT machine is significantly slower than the other machines. This is most likely due to its slower disk (479.66 MB/sec buffered disk reads, vs for example 2613.57 MB/sec for the DELL XPS 15 laptop).
For PCA, choosing the sparse representation (TENxMatrix) and using small block sizes (40 Mb) is a clear winner.
For normalization, there's no clear best choice between the sparse and dense representations. More precisely, for this operation, the sparse representation tends to give the best times when using bigger blocks (250 Mb), whereas the dense representation tends to give the best times when using smaller blocks (40 Mb). However, based on the above benchmarks, neither "sparse with big blocks" nor "dense with small blocks" clearly wins in terms of speed. Maybe extending the benchmarks to more extreme block sizes (e.g. 20 Mb and 500 Mb) could help break the tie.
Normalization and PCA both scale roughly linearly in time with the number of cells, regardless of representation (sparse or dense) or block size.
For small block sizes (<= 100 Mb), memory usage doesn't seem to be affected by the number of cells in the dataset; that is, all operations seem to run in near-constant memory, regardless of representation (sparse or dense). However, for bigger block sizes (e.g. 250 Mb), the memory used to process the sparse datasets tends to increase slightly with the number of cells in the dataset.
[TODO] Add some plots to help visualize the above assertions.
sessionInfo()
## R Under development (unstable) (2025-01-20 r87609)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] TENxBrainData_1.27.0 SingleCellExperiment_1.29.1
## [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [5] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
## [7] RSpectra_0.16-2 DelayedMatrixStats_1.29.1
## [9] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [11] BiocFileCache_2.15.1 dbplyr_2.5.0
## [13] HDF5Array_1.35.11 h5mread_0.99.4
## [15] rhdf5_2.51.2 DelayedArray_0.33.4
## [17] SparseArray_1.7.4 S4Arrays_1.7.1
## [19] IRanges_2.41.2 abind_1.4-8
## [21] S4Vectors_0.45.2 MatrixGenerics_1.19.1
## [23] matrixStats_1.5.0 BiocGenerics_0.53.6
## [25] generics_0.1.3 Matrix_1.7-2
## [27] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.47.0 xfun_0.50 bslib_0.8.0
## [4] lattice_0.22-6 rhdf5filters_1.19.0 vctrs_0.6.5
## [7] tools_4.5.0 curl_6.2.0 tibble_3.2.1
## [10] AnnotationDbi_1.69.0 RSQLite_2.3.9 blob_1.2.4
## [13] pkgconfig_2.0.3 sparseMatrixStats_1.19.0 lifecycle_1.0.4
## [16] GenomeInfoDbData_1.2.13 compiler_4.5.0 Biostrings_2.75.3
## [19] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [22] pillar_1.10.1 crayon_1.5.3 jquerylib_0.1.4
## [25] cachem_1.1.0 mime_0.12 tidyselect_1.2.1
## [28] digest_0.6.37 purrr_1.0.2 dplyr_1.1.4
## [31] bookdown_0.42 BiocVersion_3.21.1 fastmap_1.2.0
## [34] grid_4.5.0 cli_3.6.3 magrittr_2.0.3
## [37] withr_3.0.2 filelock_1.0.3 UCSC.utils_1.3.1
## [40] rappdirs_0.3.3 bit64_4.6.0-1 rmarkdown_2.29
## [43] XVector_0.47.2 httr_1.4.7 bit_4.5.0.1
## [46] png_0.1-8 memoise_2.0.1 evaluate_1.0.3
## [49] knitr_1.49 rlang_1.1.5 Rcpp_1.0.14
## [52] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.25
## [55] jsonlite_1.8.9 R6_2.5.1 Rhdf5lib_1.29.0