Non-linear dimensionality reduction techniques such as t-SNE (Maaten and Hinton 2008) and UMAP (McInnes et al. 2020) produce a low-dimensional
embedding that summarises the global structure of high-dimensional data.
These techniques can be particularly useful when visualising
high-dimensional data in a biological setting. However, these embeddings
may not accurately represent the local density of data in the original
space, resulting in misleading visualisations where the space given to
clusters of data does not represent the fraction of the high-dimensional
space that they occupy. densvis implements the
density-preserving objective function described by Narayan et al. (2020), which aims to address this
deficiency by including a density-preserving term in the t-SNE and UMAP
optimisation procedures. This can enable the creation of visualisations
that accurately capture differing degrees of transcriptional
heterogeneity within different cell subpopulations in scRNAseq
experiments, for example.
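To make the notion of local density concrete, the sketch below compares a simple proxy for local radius between a diffuse and a compact cluster. The helper `local_radius`, the choice of `k = 15` and the simulated matrix `x` are assumptions made for this example. The mean distance to the k nearest neighbours is a simplification, not the exact definition used by Narayan et al. (2020) or by densvis.

```r
# Hypothetical helper (not part of densvis): mean distance from each point
# to its k nearest neighbours, used as a crude proxy for local radius.
local_radius <- function(x, k = 15) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # exclude each point's zero distance to itself
  apply(d, 1, function(row) mean(sort(row)[seq_len(k)]))
}

# Simulated data (assumed for illustration): one diffuse and one compact
# Gaussian cluster of 200 points each.
set.seed(14)
x <- rbind(
  matrix(rnorm(400, 5, 1), ncol = 2),
  matrix(rnorm(400, 0, 0.2), ncol = 2)
)
class <- rep(c("Class 1", "Class 2"), each = 200)

# The diffuse cluster should show a larger typical radius, i.e. lower density.
radius <- local_radius(x)
med <- tapply(radius, class, median)
med
```

Applying the same helper to an embedding and correlating the log radii with those from the original data would give a rough indication of how well local density has been preserved.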
We will illustrate the use of densvis using simulated data: two Gaussian
clusters of 1000 points each, one diffuse (standard deviation 1) and one
compact (standard deviation 0.2). We will
first load the densvis, Rtsne, uwot and ggplot2 libraries and
set a random seed to ensure the visualisations are reproducible
(note: it is good practice to check that a t-SNE embedding is robust by
running the algorithm multiple times).
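The robustness check mentioned in the note above could be sketched as follows. This is a minimal illustration rather than part of densvis: the simulated matrix `x`, the seeds and the helper `run_tsne` are assumptions made for this example, and comparing pairwise distances between runs is only one rough way to judge stability.

```r
library("Rtsne")

# Small simulated data set (assumed for illustration): two Gaussian clusters
# with different spreads, 200 points each.
set.seed(14)
x <- rbind(
  matrix(rnorm(400, 5, 1), ncol = 2),
  matrix(rnorm(400, 0, 0.2), ncol = 2)
)

# Hypothetical helper: run t-SNE with a given seed and return the co-ordinates.
run_tsne <- function(x, seed) {
  set.seed(seed)
  Rtsne(x)$Y
}

seeds <- c(1, 2, 3)
fits <- lapply(seeds, run_tsne, x = x)

# Compare runs through their pairwise distances, which are unaffected by the
# arbitrary rotation or reflection of each embedding.
d <- sapply(fits, function(y) as.vector(dist(y)))
agreement <- cor(d, method = "spearman")
round(agreement, 2)
```

Rank correlations close to 1 between runs would suggest that the overall layout is stable across seeds; markedly lower values would be a reason to inspect the individual embeddings more closely.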
library("densvis")
library("Rtsne")
library("uwot")
library("ggplot2")
theme_set(theme_bw())
set.seed(14)
data <- data.frame(
  x = c(rnorm(1000, 5), rnorm(1000, 0, 0.2)),
  y = c(rnorm(1000, 5), rnorm(1000, 0, 0.2)),
  class = c(rep("Class 1", 1000), rep("Class 2", 1000))
)
ggplot() +
  aes(data[, 1], data[, 2], colour = data$class) +
  geom_point(pch = 19) +
  scale_colour_discrete(name = "Class") +
  ggtitle("Original co-ordinates")

Density-preserving t-SNE can be generated using the
densne function. This function returns a matrix of t-SNE
co-ordinates. We set dens_frac (the fraction of
optimisation steps that consider the density preservation) and
dens_lambda (the weight given to density preservation
relative to the standard t-SNE objective) each to 0.5.
fit1 <- densne(data[, 1:2], dens_frac = 0.5, dens_lambda = 0.5)
ggplot() +
  aes(fit1[, 1], fit1[, 2], colour = data$class) +
  geom_point(pch = 19) +
  scale_colour_discrete(name = "Class") +
  ggtitle("Density-preserving t-SNE") +
  labs(x = "t-SNE 1", y = "t-SNE 2")

If we run standard t-SNE on the same data, we can see that the density-preserving objective better represents the density of the data:
fit2 <- Rtsne(data[, 1:2])
ggplot() +
  aes(fit2$Y[, 1], fit2$Y[, 2], colour = data$class) +
  geom_point(pch = 19) +
  scale_colour_discrete(name = "Class") +
  ggtitle("Standard t-SNE") +
  labs(x = "t-SNE 1", y = "t-SNE 2")

A density-preserving UMAP embedding can be generated using the
densmap function. This function returns a matrix of UMAP
co-ordinates. As with t-SNE, we set dens_frac (the fraction
of optimisation steps that consider the density preservation) and
dens_lambda (the weight given to density preservation
relative to the standard UMAP objective) each to 0.5.
fit1 <- densmap(data[, 1:2], dens_frac = 0.5, dens_lambda = 0.5)
ggplot() +
  aes(fit1[, 1], fit1[, 2], colour = data$class) +
  geom_point(pch = 19) +
  scale_colour_discrete(name = "Class") +
  ggtitle("Density-preserving UMAP") +
  labs(x = "UMAP 1", y = "UMAP 2")

If we run standard UMAP on the same data, we can see that the density-preserving objective better represents the density of the data:
fit2 <- umap(data[, 1:2])
ggplot() +
  aes(fit2[, 1], fit2[, 2], colour = data$class) +
  geom_point(pch = 19) +
  scale_colour_discrete(name = "Class") +
  ggtitle("Standard UMAP") +
  labs(x = "UMAP 1", y = "UMAP 2")

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.3 uwot_0.2.4 Matrix_1.7-5 Rtsne_0.17
#> [5] densvis_1.22.0 BiocStyle_2.40.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.6.0
#> [4] BiocManager_1.30.27 filelock_1.0.3 Rcpp_1.1.1-1.1
#> [7] FNN_1.1.4.1 parallel_4.6.0 assertthat_0.2.1
#> [10] jquerylib_0.1.4 scales_1.4.0 png_0.1-9
#> [13] yaml_2.3.12 fastmap_1.2.0 reticulate_1.46.0
#> [16] lattice_0.22-9 R6_2.6.1 labeling_0.4.3
#> [19] knitr_1.51 maketools_1.3.2 RColorBrewer_1.1-3
#> [22] bslib_0.10.0 rlang_1.2.0 cachem_1.1.0
#> [25] dir.expiry_1.20.0 xfun_0.57 S7_0.2.2
#> [28] sass_0.4.10 sys_3.4.3 cli_3.6.6
#> [31] withr_3.0.2 digest_0.6.39 grid_4.6.0
#> [34] rappdirs_0.3.4 basilisk_1.24.0 lifecycle_1.0.5
#> [37] vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1
#> [40] farver_2.1.2 buildtools_1.0.0 rmarkdown_2.31
#> [43] tools_4.6.0 htmltools_0.5.9