DenoIST 0.99.2
DenoIST (Denoising Image-based Spatial Transcriptomics) is a method for identifying and removing contamination artefacts in image-based single-cell transcriptomics (IST) data. This vignette shows how to use it with a SpatialExperiment object or a matrix with coordinates as a separate input.
Why does IST data need denoising? As cell segmentation is imperfect, sometimes segmented cells present gene expression profiles that are contradictory and highly improbable. DenoIST can identify these cases and remove likely `contaminating’ gene transcripts. If you are interested in more details, please refer to the preprint.
You can install DenoIST via Bioconductor:
if(!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("DenoIST")
For demonstration, we will use a small Xenium sample from a lung fibrosis study (Vannan & Lyu et. al, 2025). It can be downloaded at GSE250346. We downsampled this dataset to 300 cells for demonstration purpose.
For this vignette, we will use a pre-made SpatialExperiment object.
suppressPackageStartupMessages({
library(DenoIST)
library(SpatialExperiment)
library(ggplot2)
library(patchwork)
})
spe <- readRDS(system.file("extdata", "example_spe.rds", package = "DenoIST"))
spe
#> class: SpatialExperiment
#> dim: 343 300
#> metadata(4): experiment.xenium transcripts cell_boundaries
#> nucleus_boundaries
#> assays(1): counts
#> rownames(343): ABCC2 ACKR1 ... YAP1 ZEB1
#> rowData names(3): ID Symbol Type
#> colnames(300): aaaaapjn-1 aaaablkg-1 ... aaaadbon-1 aaaabpib-1
#> colData names(10): cell_id transcript_counts ... nucleus_area sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(3): NegControlProbe UnassignedCodeword NegControlCodeword
#> spatialCoords names(2) : x_centroid y_centroid
#> imgData names(0):
Since the transcript file is too big to upload, for demonstration we will read in a very small subset to show the what the format is like:
tx <- readRDS(system.file("extdata", "tx_sub.rds", package = "DenoIST"))
tx
#> X transcript_id cell_id overlaps_nucleus
#> 1 1 VUHD116A_281474976711003 VUHD116A_UNASSIGNED 0
#> 2 2 VUHD116A_281474976711398 VUHD116A_UNASSIGNED 0
#> 3 3 VUHD116A_281474976711406 VUHD116A_UNASSIGNED 0
#> 4 4 VUHD116A_281474976711409 VUHD116A_UNASSIGNED 0
#> 5 5 VUHD116A_281474976711415 VUHD116A_UNASSIGNED 0
#> 6 6 VUHD116A_281474976711418 VUHD116A_UNASSIGNED 0
#> 7 7 VUHD116A_281474976711421 VUHD116A_UNASSIGNED 0
#> 8 8 VUHD116A_281474976712029 VUHD116A_UNASSIGNED 0
#> 9 9 VUHD116A_281474976712842 VUHD116A_UNASSIGNED 0
#> 10 10 VUHD116A_281474976712849 VUHD116A_UNASSIGNED 0
#> feature_name x_location y_location z_location qv fov_name
#> 1 NegControlCodeword_0517 202.896320 184.33257 17.63912 2.910288 Q15
#> 2 COL1A1 7.082226 110.41018 12.68721 40.000000 Q15
#> 3 LYZ 76.882706 196.56784 12.74130 40.000000 Q15
#> 4 LAMP3 101.830030 107.37660 13.04323 40.000000 Q15
#> 5 LYZ 149.067670 231.72057 12.96732 10.863165 Q15
#> 6 LYZ 175.447590 171.59155 13.25221 3.295905 Q15
#> 7 NUCB2 179.364910 233.80177 13.01727 10.531877 Q15
#> 8 SFTPC 201.708700 90.84219 12.75298 24.772476 Q15
#> 9 COL1A2 73.322300 147.30553 14.10094 7.225322 Q15
#> 10 FGFBP2 127.075035 61.95987 21.58544 3.041668 Q15
#> nucleus_distance
#> 1 329.0720
#> 2 537.2145
#> 3 439.8191
#> 4 455.0005
#> 5 360.2229
#> 6 359.2819
#> 7 330.7972
#> 8 383.3843
#> 9 461.5779
#> 10 459.5809
These 2 files would be the input for denoising.
You should only need to use 1 function most of the time, unless you are trying to debug or understand the process. The main function is denoist(), which takes a SpatialExperiment object (or a matrix with coordinates), plus the transcript data frame as input. It will return a list containing the memberships, adjusted counts, and parameters for each gene.
The distance parameter specifies the maximum distance to consider for local background estimation. The nbins parameter specifies the number of bins to use for hexagonal binning, which is used for calculating background transcript contamination. They have default values but you can adjust them based on your data. For example if your data is very small in size then perhaps a lower distance and nbins would be better.
You should also check whether transcript data frame has the correct columns. The tx_x and tx_y parameters specify the column names for the x and y coordinates, respectively. The feature_label parameter specifies the column name for the gene of each transcript. In this example, they are called x_location, y_location and feature_name. You can also speed up the process with more cpus via the cl option (which is highly recommended).
Lastly, you can specify an output directory with out_dir to save the results automatically. If you don’t want to, just leave it empty.
# DenoIST
result <- denoist(mat = spe, tx = tx, coords = NULL, tx_x = "x_location", tx_y = "y_location", feature_label = "feature_name", distance = 50, nbins = 200, cl = 1)
#> QV column found! Filtering qv for high quality transcripts...
#> flexmix failed during GMM fit: 2 Log-likelihood: NaN
#> Setting global background contamination to 1...
#> Warning in sweep(res_mat, 1, bg_offset, "+"): STATS does not recycle exactly
#> across MARGIN
Not that this is exactly the same as extracting the matrix and coordinates out manually:
count_mat <- assay(spe)
coords <- spatialCoords(spe)
result <- denoist(mat = count_mat, tx = tx, coords = coords, tx_x = "x_location", tx_y = "y_location", feature_label = "feature_name", distance = 50, nbins = 200, cl = 1)
This is useful if you want to run DenoIST on a matrix that is not a SpatialExperiment object.
The most useful output in result should be the adjusted_counts. Here we read in a pre-made result with more cells to better demonstrate the intended effect.
result <- readRDS(system.file("extdata", "result.rds", package = "DenoIST"))
result$adjusted_counts[1:5, 1:5]
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#> aaaaciko-1 aaaaapjj-1 aaaabogd-1 aaaacmej-1 aaaaaiak-1
#> ABCC2 . . . . .
#> ACKR1 2 . . . .
#> ACTA2 13 . . . 6
#> AGER . . . . 2
#> AGR3 . . . . .
The memberships and params are useful for debugging if the adjusted counts are not what you expect.
result$memberships[1:5, 1:5]
#> aaaaciko-1 aaaaapjj-1 aaaabogd-1 aaaacmej-1 aaaaaiak-1
#> ABCC2 0 0 0 0 0
#> ACKR1 1 0 0 0 0
#> ACTA2 1 0 0 0 1
#> AGER 0 0 0 0 1
#> AGR3 0 0 0 0 0
result$params[[42]]
#> $memberships
#> [1] 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0
#> [38] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> [75] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#> [112] 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
#> [149] 0 1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0
#> [186] 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [223] 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0
#> [260] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1
#> [297] 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
#> [334] 1 0 0 0 0 0 1 1 0 0
#>
#> $posterior
#> [1] 3.203132e-01 3.203132e-01 9.866641e-01 1.080555e-01 9.996291e-01
#> [6] 7.593669e-02 3.203132e-01 9.999997e-01 9.999943e-01 2.422369e-01
#> [11] 3.203132e-01 3.203132e-01 3.203132e-01 2.422369e-01 4.452649e-03
#> [16] 3.203132e-01 2.084134e-01 2.796106e-01 2.084134e-01 2.796106e-01
#> [21] 3.203132e-01 3.203132e-01 2.422369e-01 7.842516e-01 2.796106e-01
#> [26] 3.203132e-01 1.080555e-01 9.965375e-01 2.796106e-01 9.998594e-01
#> [31] 2.796106e-01 3.203132e-01 2.084134e-01 9.209250e-01 3.203132e-01
#> [36] 3.203132e-01 3.203132e-01 9.974234e-01 3.203132e-01 3.203132e-01
#> [41] 3.203132e-01 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [46] 3.203132e-01 3.203132e-01 3.203132e-01 2.796106e-01 3.203132e-01
#> [51] 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01 2.084134e-01
#> [56] 2.804996e-06 3.203132e-01 1.412716e-02 3.203132e-01 3.203132e-01
#> [61] 3.203132e-01 2.796106e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [66] 2.796106e-01 3.203132e-01 2.422369e-01 9.995497e-01 3.203132e-01
#> [71] 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01 2.796106e-01
#> [76] 3.203132e-01 6.339115e-02 7.942044e-03 4.452649e-03 3.203132e-01
#> [81] 9.209250e-01 3.203132e-01 2.422369e-01 2.796106e-01 2.068779e-02
#> [86] 3.203132e-01 3.203132e-01 9.999976e-01 3.203132e-01 3.203132e-01
#> [91] 3.203132e-01 2.796106e-01 3.203132e-01 1.080555e-01 2.796106e-01
#> [96] 2.422369e-01 3.203132e-01 3.203132e-01 2.422369e-01 1.080555e-01
#> [101] 2.084134e-01 4.389517e-02 1.782021e-01 5.279984e-02 3.203132e-01
#> [106] 3.203132e-01 2.422369e-01 2.310471e-02 9.838541e-01 3.203132e-01
#> [111] 5.279984e-02 3.203132e-01 1.166430e-02 2.796106e-01 8.667819e-01
#> [116] 3.203132e-01 3.203132e-01 4.389517e-02 9.999998e-01 1.282302e-01
#> [121] 2.422369e-01 2.796106e-01 2.422369e-01 2.796106e-01 2.796106e-01
#> [126] 3.203132e-01 2.422369e-01 3.495848e-05 3.203132e-01 3.203132e-01
#> [131] 3.203132e-01 7.842516e-01 3.203132e-01 1.282302e-01 3.203132e-01
#> [136] 6.339115e-02 3.203132e-01 2.796106e-01 8.427374e-01 3.203132e-01
#> [141] 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01 2.422369e-01
#> [146] 4.452649e-03 9.925054e-01 2.796106e-01 3.879516e-01 9.996945e-01
#> [151] 3.203132e-01 2.422369e-01 8.974495e-01 5.315211e-01 1.782021e-01
#> [156] 9.999604e-01 9.714637e-01 4.389517e-02 6.700552e-01 3.203132e-01
#> [161] 3.203132e-01 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [166] 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [171] 2.796106e-01 1.515318e-01 2.796106e-01 3.203132e-01 3.203132e-01
#> [176] 8.152781e-01 3.203132e-01 2.422369e-01 7.114620e-01 2.422369e-01
#> [181] 9.976735e-01 2.796106e-01 9.209250e-01 3.203132e-01 1.282302e-01
#> [186] 2.796106e-01 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [191] 3.203132e-01 2.796106e-01 9.997484e-01 3.203132e-01 3.203132e-01
#> [196] 8.152781e-01 3.203132e-01 9.460391e-04 3.203132e-01 3.203132e-01
#> [201] 1.515318e-01 3.203132e-01 3.203132e-01 3.203132e-01 8.152781e-01
#> [206] 9.055884e-01 2.422369e-01 3.203132e-01 3.203132e-01 1.166430e-02
#> [211] 3.203132e-01 1.047879e-03 3.203132e-01 1.782021e-01 3.203132e-01
#> [216] 2.422369e-01 3.203132e-01 3.203132e-01 9.999992e-01 2.796106e-01
#> [221] 3.203132e-01 3.203132e-01 3.203132e-01 1.080555e-01 3.203132e-01
#> [226] 3.670119e-03 3.203132e-01 9.209250e-01 1.000000e+00 2.084134e-01
#> [231] 9.072457e-02 6.419218e-04 9.994534e-01 2.796106e-01 3.203132e-01
#> [236] 1.412716e-02 2.796106e-01 1.000000e+00 9.763784e-01 2.796106e-01
#> [241] 2.422369e-01 3.203132e-01 2.084134e-01 3.203132e-01 3.203132e-01
#> [246] 1.080555e-01 9.958433e-01 9.688081e-01 9.994534e-01 1.148420e-03
#> [251] 7.401641e-06 3.203132e-01 2.422369e-01 3.670119e-03 2.068779e-02
#> [256] 9.209250e-01 3.203132e-01 2.422369e-01 1.282302e-01 2.796106e-01
#> [261] 7.593669e-02 7.593669e-02 3.203132e-01 2.084134e-01 2.796106e-01
#> [266] 9.925840e-01 3.024697e-03 3.203132e-01 6.339115e-02 2.796106e-01
#> [271] 3.203132e-01 3.203132e-01 3.643451e-02 6.339115e-02 9.055884e-01
#> [276] 3.203132e-01 1.000000e+00 3.203132e-01 2.796106e-01 8.427374e-01
#> [281] 3.203132e-01 3.203132e-01 2.084134e-01 4.642031e-09 2.371364e-05
#> [286] 2.084134e-01 2.422369e-01 8.876402e-01 3.203132e-01 3.203132e-01
#> [291] 7.842516e-01 2.796106e-01 3.203132e-01 2.422369e-01 9.999995e-01
#> [296] 9.209250e-01 2.422369e-01 9.209250e-01 1.257362e-07 2.068779e-02
#> [301] 9.889906e-01 3.203132e-01 2.615142e-01 1.282302e-01 9.072457e-02
#> [306] 1.515318e-01 9.993364e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [311] 4.389517e-02 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [316] 3.203132e-01 3.203132e-01 8.152781e-01 3.203132e-01 3.203132e-01
#> [321] 3.203132e-01 2.796106e-01 3.203132e-01 2.796106e-01 2.796106e-01
#> [326] 5.315211e-01 3.203132e-01 9.938192e-01 2.796106e-01 2.796106e-01
#> [331] 7.496143e-01 4.817920e-15 3.203132e-01 1.000000e+00 2.796106e-01
#> [336] 3.203132e-01 2.422369e-01 2.796106e-01 3.203132e-01 8.667819e-01
#> [341] 1.000000e+00 4.839647e-02 9.072457e-02
#>
#> $lambda1
#> [1] 0.2007557
#>
#> $lambda2
#> [1] 0.006691134
#>
#> $pi
#> [1] 0.363928
#>
#> $log_lik
#> [1] -248.0243
We can also quickly visualise the difference.
# a custom helper function for plotting
plot_feature_scatter <- function(coords, mat, feature, size = 0.3, output_filename = NULL) {
# Create a data frame from the coordinates and features
plot_data <- data.frame(x = coords[, 1], y = coords[, 2], feature = mat[feature,])
# Create the scatterplot
p <- ggplot(plot_data, aes(x = x, y = y, color = feature)) +
geom_point(size = size, alpha = 0.5) + # Make the dots smaller
theme_minimal() +
labs(title = feature, x = "X Coordinate", y = "Y Coordinate", color = "Feature") +
theme(legend.position = "right") +
scale_color_viridis_c() # Use a color palette with higher contrast
# Save the plot if an output filename is provided
if (!is.null(output_filename)) {
ggsave(output_filename, p, width = 10, height = 8, units = "in", dpi = 300)
}
# Return the plot
return(p)
}
orig <- plot_feature_scatter(coords = spatialCoords(spe), log(assay(spe)+1), feature = "EPAS1") + ggtitle('Original')
adj <- plot_feature_scatter(coords = spatialCoords(spe), log(result$adjusted_counts+1), feature = "EPAS1") + ggtitle('DenoIST adjusted')
#> Warning in data.frame(x = coords[, 1], y = coords[, 2], feature = mat[feature,
#> : row names were found from a short variable and have been discarded
(orig + adj) + plot_layout(guides = "collect")
sessionInfo()
#> R Under development (unstable) (2026-01-15 r89304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] patchwork_1.3.2 ggplot2_4.0.2
#> [3] SpatialExperiment_1.21.0 SingleCellExperiment_1.33.0
#> [5] SummarizedExperiment_1.41.1 Biobase_2.71.0
#> [7] GenomicRanges_1.63.1 Seqinfo_1.1.0
#> [9] IRanges_2.45.0 S4Vectors_0.49.0
#> [11] BiocGenerics_0.57.0 generics_0.1.4
#> [13] MatrixGenerics_1.23.0 matrixStats_1.5.0
#> [15] DenoIST_0.99.2 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 rjson_0.2.23 xfun_0.56
#> [4] bslib_0.10.0 lattice_0.22-9 vctrs_0.7.1
#> [7] tools_4.6.0 parallel_4.6.0 flexmix_2.3-20
#> [10] tibble_3.3.1 pkgconfig_2.0.3 Matrix_1.7-4
#> [13] RColorBrewer_1.1-3 S7_0.2.1 sparseMatrixStats_1.23.0
#> [16] assertthat_0.2.1 lifecycle_1.0.5 compiler_4.6.0
#> [19] farver_2.1.2 tinytex_0.58 htmltools_0.5.9
#> [22] sass_0.4.10 yaml_2.3.12 hexbin_1.28.5
#> [25] pillar_1.11.1 jquerylib_0.1.4 DelayedArray_0.37.0
#> [28] cachem_1.1.0 magick_2.9.0 abind_1.4-8
#> [31] tidyselect_1.2.1 digest_0.6.39 purrr_1.2.1
#> [34] dplyr_1.2.0 bookdown_0.46 labeling_0.4.3
#> [37] arrow_23.0.0.1 fastmap_1.2.0 grid_4.6.0
#> [40] cli_3.6.5 SparseArray_1.11.10 magrittr_2.0.4
#> [43] S4Arrays_1.11.1 dichromat_2.0-0.1 withr_3.0.2
#> [46] scales_1.4.0 bit64_4.6.0-1 rmarkdown_2.30
#> [49] XVector_0.51.0 bit_4.6.0 otel_0.2.0
#> [52] nnet_7.3-20 modeltools_0.2-24 pbapply_1.7-4
#> [55] evaluate_1.0.5 knitr_1.51 viridisLite_0.4.3
#> [58] rlang_1.1.7 Rcpp_1.1.1 glue_1.8.0
#> [61] BiocManager_1.30.27 jsonlite_2.0.0 R6_2.6.1