5.5.2 Main takeaways
Using the TENxMatrix representation (sparse format), one can perform normalization and PCA of the bigger dataset (200,000 cells) on an average consumer-grade laptop such as the DELL XPS 15 (model 9520) in less than 25 minutes and with less than 4Gb of memory, as shown in this table:
Machine | NORMALIZATION time (s), block size = 250 Mb | REALIZATION time (s), block size = 250 Mb | PCA time (s), block size = 40 Mb | TOTAL time (s) | Max. mem. used |
---|---|---|---|---|---|
DELL XPS 15 laptop | 643 | 69 | 692 | 1404 | 2.9Gb |
Supermicro SuperServer 1029GQ-TRT | NA | NA | NA | NA | NA |
Apple Silicon Mac Pro | NA | NA | NA | NA | NA |

Comparing times across machines. For each machine, we show the normalization, realization, and PCA times (plus total time) obtained on the 27998 x 200000 dataset, using the “[s] TENxMatrix (sparse)” format, and selecting the 2000 most variable genes during the normalization step. All times are in seconds.
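
As a quick sanity check on the table, the following sketch recomputes the total for the DELL XPS 15 laptop and converts it to minutes. The numbers are copied from the table; the variable names are illustrative only.

```python
# Timings (in seconds) for the DELL XPS 15 laptop, copied from the table.
timings = {"normalization": 643, "realization": 69, "pca": 692}
reported_total = 1404  # TOTAL time column, in seconds

total_seconds = sum(timings.values())
total_minutes = total_seconds / 60

# Fraction of the total run time spent in each step.
shares = {step: secs / total_seconds for step, secs in timings.items()}

print(f"total: {total_seconds} s ({total_minutes:.1f} min)")
for step, share in shares.items():
    print(f"{step}: {share:.0%} of total")
```

The recomputed total matches the reported 1404 seconds, which is about 23.4 minutes and therefore under the 25-minute mark. Normalization and PCA each account for a little under half of the run time, while realization takes about 5%.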
Normalization and PCA are roughly linear in time with respect to the number of cells in the dataset, regardless of representation (sparse or dense) or block size.
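
To illustrate what roughly linear scaling implies, here is a back-of-the-envelope sketch that scales the laptop's total time with the number of cells. It assumes strict proportionality (a line through the origin), which is a simplification of "roughly linear", and the helper `estimate_total_seconds` is hypothetical. Its output is an estimate, not a measurement.

```python
# Reference point taken from the table: 200,000 cells in 1404 seconds
# (DELL XPS 15 laptop, sparse TENxMatrix format).
REF_CELLS = 200_000
REF_SECONDS = 1404


def estimate_total_seconds(n_cells: int) -> float:
    """Estimate total time assuming time is proportional to cell count.

    This is an assumption for illustration only: real timings may include
    a fixed overhead and will vary with block size and hardware.
    """
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return REF_SECONDS * n_cells / REF_CELLS


# Example: a dataset with half as many cells as the reference.
print(estimate_total_seconds(100_000))
```

Under this assumption, halving the number of cells halves the estimated time. Actual measurements should be preferred wherever they are available.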
Block size matters. When using the TENxMatrix representation (sparse format), the bigger the blocks, the faster normalization will be (at the cost of increased memory usage). PCA, on the other hand, prefers small blocks.
Disk performance is, of course, important, as attested by the lower performance of the Supermicro SuperServer 1029GQ-TRT machine, likely due to its slower disk.