Contents

1 Introduction

DenoIST (Denoising Image-based Spatial Transcriptomics) is a method for identifying and removing contamination artefacts in image-based single-cell transcriptomics (IST) data. This vignette shows how to use it with a SpatialExperiment object or a matrix with coordinates as a separate input.

Why does IST data need denoising? As cell segmentation is imperfect, sometimes segmented cells present gene expression profiles that are contradictory and highly improbable. DenoIST can identify these cases and remove likely `contaminating’ gene transcripts. If you are interested in more details, please refer to the preprint.

2 Installation

You can install DenoIST via Bioconductor:

if(!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("DenoIST")

3 Load data

For demonstration, we will use a small Xenium sample from a lung fibrosis study (Vannan & Lyu et. al, 2025). It can be downloaded at GSE250346. We downsampled this dataset to 300 cells for demonstration purpose.

For this vignette, we will use a pre-made SpatialExperiment object.

suppressPackageStartupMessages({
  library(DenoIST)
  library(SpatialExperiment)
  library(ggplot2)
  library(patchwork)
})
spe <- readRDS(system.file("extdata", "example_spe.rds", package = "DenoIST"))
spe
#> class: SpatialExperiment 
#> dim: 343 300 
#> metadata(4): experiment.xenium transcripts cell_boundaries
#>   nucleus_boundaries
#> assays(1): counts
#> rownames(343): ABCC2 ACKR1 ... YAP1 ZEB1
#> rowData names(3): ID Symbol Type
#> colnames(300): aaaaapjn-1 aaaablkg-1 ... aaaadbon-1 aaaabpib-1
#> colData names(10): cell_id transcript_counts ... nucleus_area sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(3): NegControlProbe UnassignedCodeword NegControlCodeword
#> spatialCoords names(2) : x_centroid y_centroid
#> imgData names(0):

Since the transcript file is too big to upload, for demonstration we will read in a very small subset to show the what the format is like:

tx <- readRDS(system.file("extdata", "tx_sub.rds", package = "DenoIST"))
tx
#>     X            transcript_id             cell_id overlaps_nucleus
#> 1   1 VUHD116A_281474976711003 VUHD116A_UNASSIGNED                0
#> 2   2 VUHD116A_281474976711398 VUHD116A_UNASSIGNED                0
#> 3   3 VUHD116A_281474976711406 VUHD116A_UNASSIGNED                0
#> 4   4 VUHD116A_281474976711409 VUHD116A_UNASSIGNED                0
#> 5   5 VUHD116A_281474976711415 VUHD116A_UNASSIGNED                0
#> 6   6 VUHD116A_281474976711418 VUHD116A_UNASSIGNED                0
#> 7   7 VUHD116A_281474976711421 VUHD116A_UNASSIGNED                0
#> 8   8 VUHD116A_281474976712029 VUHD116A_UNASSIGNED                0
#> 9   9 VUHD116A_281474976712842 VUHD116A_UNASSIGNED                0
#> 10 10 VUHD116A_281474976712849 VUHD116A_UNASSIGNED                0
#>               feature_name x_location y_location z_location        qv fov_name
#> 1  NegControlCodeword_0517 202.896320  184.33257   17.63912  2.910288      Q15
#> 2                   COL1A1   7.082226  110.41018   12.68721 40.000000      Q15
#> 3                      LYZ  76.882706  196.56784   12.74130 40.000000      Q15
#> 4                    LAMP3 101.830030  107.37660   13.04323 40.000000      Q15
#> 5                      LYZ 149.067670  231.72057   12.96732 10.863165      Q15
#> 6                      LYZ 175.447590  171.59155   13.25221  3.295905      Q15
#> 7                    NUCB2 179.364910  233.80177   13.01727 10.531877      Q15
#> 8                    SFTPC 201.708700   90.84219   12.75298 24.772476      Q15
#> 9                   COL1A2  73.322300  147.30553   14.10094  7.225322      Q15
#> 10                  FGFBP2 127.075035   61.95987   21.58544  3.041668      Q15
#>    nucleus_distance
#> 1          329.0720
#> 2          537.2145
#> 3          439.8191
#> 4          455.0005
#> 5          360.2229
#> 6          359.2819
#> 7          330.7972
#> 8          383.3843
#> 9          461.5779
#> 10         459.5809

These 2 files would be the input for denoising.

4 Denoising the data

You should only need to use 1 function most of the time, unless you are trying to debug or understand the process. The main function is denoist(), which takes a SpatialExperiment object (or a matrix with coordinates), plus the transcript data frame as input. It will return a list containing the memberships, adjusted counts, and parameters for each gene.

The distance parameter specifies the maximum distance to consider for local background estimation. The nbins parameter specifies the number of bins to use for hexagonal binning, which is used for calculating background transcript contamination. They have default values but you can adjust them based on your data. For example if your data is very small in size then perhaps a lower distance and nbins would be better.

You should also check whether transcript data frame has the correct columns. The tx_x and tx_y parameters specify the column names for the x and y coordinates, respectively. The feature_label parameter specifies the column name for the gene of each transcript. In this example, they are called x_location, y_location and feature_name. You can also speed up the process with more cpus via the cl option (which is highly recommended).

Lastly, you can specify an output directory with out_dir to save the results automatically. If you don’t want to, just leave it empty.

# DenoIST
result <- denoist(mat = spe, tx = tx, coords = NULL, tx_x = "x_location", tx_y = "y_location", feature_label = "feature_name", distance = 50, nbins = 200, cl = 1)
#> QV column found! Filtering qv for high quality transcripts...
#> flexmix failed during GMM fit:    2 Log-likelihood: NaN
#> Setting global background contamination to 1...
#> Warning in sweep(res_mat, 1, bg_offset, "+"): STATS does not recycle exactly
#> across MARGIN

Not that this is exactly the same as extracting the matrix and coordinates out manually:

count_mat <- assay(spe)
coords <- spatialCoords(spe)

result <- denoist(mat = count_mat, tx = tx, coords = coords, tx_x = "x_location", tx_y = "y_location", feature_label = "feature_name", distance = 50, nbins = 200, cl = 1)

This is useful if you want to run DenoIST on a matrix that is not a SpatialExperiment object.

5 Check results

The most useful output in result should be the adjusted_counts. Here we read in a pre-made result with more cells to better demonstrate the intended effect.

result <- readRDS(system.file("extdata", "result.rds", package = "DenoIST"))
result$adjusted_counts[1:5, 1:5]
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#>       aaaaciko-1 aaaaapjj-1 aaaabogd-1 aaaacmej-1 aaaaaiak-1
#> ABCC2          .          .          .          .          .
#> ACKR1          2          .          .          .          .
#> ACTA2         13          .          .          .          6
#> AGER           .          .          .          .          2
#> AGR3           .          .          .          .          .

The memberships and params are useful for debugging if the adjusted counts are not what you expect.

result$memberships[1:5, 1:5]
#>       aaaaciko-1 aaaaapjj-1 aaaabogd-1 aaaacmej-1 aaaaaiak-1
#> ABCC2          0          0          0          0          0
#> ACKR1          1          0          0          0          0
#> ACTA2          1          0          0          0          1
#> AGER           0          0          0          0          1
#> AGR3           0          0          0          0          0
result$params[[42]]
#> $memberships
#>   [1] 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0
#>  [38] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#>  [75] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#> [112] 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
#> [149] 0 1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0
#> [186] 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [223] 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0
#> [260] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1
#> [297] 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
#> [334] 1 0 0 0 0 0 1 1 0 0
#> 
#> $posterior
#>   [1] 3.203132e-01 3.203132e-01 9.866641e-01 1.080555e-01 9.996291e-01
#>   [6] 7.593669e-02 3.203132e-01 9.999997e-01 9.999943e-01 2.422369e-01
#>  [11] 3.203132e-01 3.203132e-01 3.203132e-01 2.422369e-01 4.452649e-03
#>  [16] 3.203132e-01 2.084134e-01 2.796106e-01 2.084134e-01 2.796106e-01
#>  [21] 3.203132e-01 3.203132e-01 2.422369e-01 7.842516e-01 2.796106e-01
#>  [26] 3.203132e-01 1.080555e-01 9.965375e-01 2.796106e-01 9.998594e-01
#>  [31] 2.796106e-01 3.203132e-01 2.084134e-01 9.209250e-01 3.203132e-01
#>  [36] 3.203132e-01 3.203132e-01 9.974234e-01 3.203132e-01 3.203132e-01
#>  [41] 3.203132e-01 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01
#>  [46] 3.203132e-01 3.203132e-01 3.203132e-01 2.796106e-01 3.203132e-01
#>  [51] 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01 2.084134e-01
#>  [56] 2.804996e-06 3.203132e-01 1.412716e-02 3.203132e-01 3.203132e-01
#>  [61] 3.203132e-01 2.796106e-01 3.203132e-01 3.203132e-01 3.203132e-01
#>  [66] 2.796106e-01 3.203132e-01 2.422369e-01 9.995497e-01 3.203132e-01
#>  [71] 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01 2.796106e-01
#>  [76] 3.203132e-01 6.339115e-02 7.942044e-03 4.452649e-03 3.203132e-01
#>  [81] 9.209250e-01 3.203132e-01 2.422369e-01 2.796106e-01 2.068779e-02
#>  [86] 3.203132e-01 3.203132e-01 9.999976e-01 3.203132e-01 3.203132e-01
#>  [91] 3.203132e-01 2.796106e-01 3.203132e-01 1.080555e-01 2.796106e-01
#>  [96] 2.422369e-01 3.203132e-01 3.203132e-01 2.422369e-01 1.080555e-01
#> [101] 2.084134e-01 4.389517e-02 1.782021e-01 5.279984e-02 3.203132e-01
#> [106] 3.203132e-01 2.422369e-01 2.310471e-02 9.838541e-01 3.203132e-01
#> [111] 5.279984e-02 3.203132e-01 1.166430e-02 2.796106e-01 8.667819e-01
#> [116] 3.203132e-01 3.203132e-01 4.389517e-02 9.999998e-01 1.282302e-01
#> [121] 2.422369e-01 2.796106e-01 2.422369e-01 2.796106e-01 2.796106e-01
#> [126] 3.203132e-01 2.422369e-01 3.495848e-05 3.203132e-01 3.203132e-01
#> [131] 3.203132e-01 7.842516e-01 3.203132e-01 1.282302e-01 3.203132e-01
#> [136] 6.339115e-02 3.203132e-01 2.796106e-01 8.427374e-01 3.203132e-01
#> [141] 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01 2.422369e-01
#> [146] 4.452649e-03 9.925054e-01 2.796106e-01 3.879516e-01 9.996945e-01
#> [151] 3.203132e-01 2.422369e-01 8.974495e-01 5.315211e-01 1.782021e-01
#> [156] 9.999604e-01 9.714637e-01 4.389517e-02 6.700552e-01 3.203132e-01
#> [161] 3.203132e-01 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [166] 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [171] 2.796106e-01 1.515318e-01 2.796106e-01 3.203132e-01 3.203132e-01
#> [176] 8.152781e-01 3.203132e-01 2.422369e-01 7.114620e-01 2.422369e-01
#> [181] 9.976735e-01 2.796106e-01 9.209250e-01 3.203132e-01 1.282302e-01
#> [186] 2.796106e-01 2.422369e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [191] 3.203132e-01 2.796106e-01 9.997484e-01 3.203132e-01 3.203132e-01
#> [196] 8.152781e-01 3.203132e-01 9.460391e-04 3.203132e-01 3.203132e-01
#> [201] 1.515318e-01 3.203132e-01 3.203132e-01 3.203132e-01 8.152781e-01
#> [206] 9.055884e-01 2.422369e-01 3.203132e-01 3.203132e-01 1.166430e-02
#> [211] 3.203132e-01 1.047879e-03 3.203132e-01 1.782021e-01 3.203132e-01
#> [216] 2.422369e-01 3.203132e-01 3.203132e-01 9.999992e-01 2.796106e-01
#> [221] 3.203132e-01 3.203132e-01 3.203132e-01 1.080555e-01 3.203132e-01
#> [226] 3.670119e-03 3.203132e-01 9.209250e-01 1.000000e+00 2.084134e-01
#> [231] 9.072457e-02 6.419218e-04 9.994534e-01 2.796106e-01 3.203132e-01
#> [236] 1.412716e-02 2.796106e-01 1.000000e+00 9.763784e-01 2.796106e-01
#> [241] 2.422369e-01 3.203132e-01 2.084134e-01 3.203132e-01 3.203132e-01
#> [246] 1.080555e-01 9.958433e-01 9.688081e-01 9.994534e-01 1.148420e-03
#> [251] 7.401641e-06 3.203132e-01 2.422369e-01 3.670119e-03 2.068779e-02
#> [256] 9.209250e-01 3.203132e-01 2.422369e-01 1.282302e-01 2.796106e-01
#> [261] 7.593669e-02 7.593669e-02 3.203132e-01 2.084134e-01 2.796106e-01
#> [266] 9.925840e-01 3.024697e-03 3.203132e-01 6.339115e-02 2.796106e-01
#> [271] 3.203132e-01 3.203132e-01 3.643451e-02 6.339115e-02 9.055884e-01
#> [276] 3.203132e-01 1.000000e+00 3.203132e-01 2.796106e-01 8.427374e-01
#> [281] 3.203132e-01 3.203132e-01 2.084134e-01 4.642031e-09 2.371364e-05
#> [286] 2.084134e-01 2.422369e-01 8.876402e-01 3.203132e-01 3.203132e-01
#> [291] 7.842516e-01 2.796106e-01 3.203132e-01 2.422369e-01 9.999995e-01
#> [296] 9.209250e-01 2.422369e-01 9.209250e-01 1.257362e-07 2.068779e-02
#> [301] 9.889906e-01 3.203132e-01 2.615142e-01 1.282302e-01 9.072457e-02
#> [306] 1.515318e-01 9.993364e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [311] 4.389517e-02 3.203132e-01 3.203132e-01 3.203132e-01 3.203132e-01
#> [316] 3.203132e-01 3.203132e-01 8.152781e-01 3.203132e-01 3.203132e-01
#> [321] 3.203132e-01 2.796106e-01 3.203132e-01 2.796106e-01 2.796106e-01
#> [326] 5.315211e-01 3.203132e-01 9.938192e-01 2.796106e-01 2.796106e-01
#> [331] 7.496143e-01 4.817920e-15 3.203132e-01 1.000000e+00 2.796106e-01
#> [336] 3.203132e-01 2.422369e-01 2.796106e-01 3.203132e-01 8.667819e-01
#> [341] 1.000000e+00 4.839647e-02 9.072457e-02
#> 
#> $lambda1
#> [1] 0.2007557
#> 
#> $lambda2
#> [1] 0.006691134
#> 
#> $pi
#> [1] 0.363928
#> 
#> $log_lik
#> [1] -248.0243

We can also quickly visualise the difference.

# a custom helper function for plotting
plot_feature_scatter <- function(coords, mat, feature, size = 0.3, output_filename = NULL) {
  # Create a data frame from the coordinates and features
  plot_data <- data.frame(x = coords[, 1], y = coords[, 2], feature = mat[feature,])
  
  # Create the scatterplot
  p <- ggplot(plot_data, aes(x = x, y = y, color = feature)) +
    geom_point(size = size, alpha = 0.5) +  # Make the dots smaller
    theme_minimal() +
    labs(title = feature, x = "X Coordinate", y = "Y Coordinate", color = "Feature") +
    theme(legend.position = "right") +
    scale_color_viridis_c()  # Use a color palette with higher contrast
  
  # Save the plot if an output filename is provided
  if (!is.null(output_filename)) {
    ggsave(output_filename, p, width = 10, height = 8, units = "in", dpi = 300)
  }
  
  # Return the plot
  return(p)
}
orig <- plot_feature_scatter(coords = spatialCoords(spe), log(assay(spe)+1), feature = "EPAS1") + ggtitle('Original')
adj <- plot_feature_scatter(coords = spatialCoords(spe), log(result$adjusted_counts+1), feature = "EPAS1") + ggtitle('DenoIST adjusted')
#> Warning in data.frame(x = coords[, 1], y = coords[, 2], feature = mat[feature,
#> : row names were found from a short variable and have been discarded
(orig + adj) + plot_layout(guides = "collect")

Session information

sessionInfo()
#> R Under development (unstable) (2026-01-15 r89304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] patchwork_1.3.2             ggplot2_4.0.2              
#>  [3] SpatialExperiment_1.21.0    SingleCellExperiment_1.33.0
#>  [5] SummarizedExperiment_1.41.1 Biobase_2.71.0             
#>  [7] GenomicRanges_1.63.1        Seqinfo_1.1.0              
#>  [9] IRanges_2.45.0              S4Vectors_0.49.0           
#> [11] BiocGenerics_0.57.0         generics_0.1.4             
#> [13] MatrixGenerics_1.23.0       matrixStats_1.5.0          
#> [15] DenoIST_0.99.2              BiocStyle_2.39.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6             rjson_0.2.23             xfun_0.56               
#>  [4] bslib_0.10.0             lattice_0.22-9           vctrs_0.7.1             
#>  [7] tools_4.6.0              parallel_4.6.0           flexmix_2.3-20          
#> [10] tibble_3.3.1             pkgconfig_2.0.3          Matrix_1.7-4            
#> [13] RColorBrewer_1.1-3       S7_0.2.1                 sparseMatrixStats_1.23.0
#> [16] assertthat_0.2.1         lifecycle_1.0.5          compiler_4.6.0          
#> [19] farver_2.1.2             tinytex_0.58             htmltools_0.5.9         
#> [22] sass_0.4.10              yaml_2.3.12              hexbin_1.28.5           
#> [25] pillar_1.11.1            jquerylib_0.1.4          DelayedArray_0.37.0     
#> [28] cachem_1.1.0             magick_2.9.0             abind_1.4-8             
#> [31] tidyselect_1.2.1         digest_0.6.39            purrr_1.2.1             
#> [34] dplyr_1.2.0              bookdown_0.46            labeling_0.4.3          
#> [37] arrow_23.0.0.1           fastmap_1.2.0            grid_4.6.0              
#> [40] cli_3.6.5                SparseArray_1.11.10      magrittr_2.0.4          
#> [43] S4Arrays_1.11.1          dichromat_2.0-0.1        withr_3.0.2             
#> [46] scales_1.4.0             bit64_4.6.0-1            rmarkdown_2.30          
#> [49] XVector_0.51.0           bit_4.6.0                otel_0.2.0              
#> [52] nnet_7.3-20              modeltools_0.2-24        pbapply_1.7-4           
#> [55] evaluate_1.0.5           knitr_1.51               viridisLite_0.4.3       
#> [58] rlang_1.1.7              Rcpp_1.1.1               glue_1.8.0              
#> [61] BiocManager_1.30.27      jsonlite_2.0.0           R6_2.6.1