--- title: "SpatialArtifacts Tutorial" author: - name: Harriet Jiali He affiliation: Johns Hopkins University, Baltimore, MD, United States - name: Stephanie C. Hicks affiliation: Johns Hopkins University, Baltimore, MD, United States date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{SpatialArtifacts Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(SummarizedExperiment) library(SpatialExperiment) library(SpatialArtifacts) library(ggplot2) library(patchwork) library(dplyr) ``` # Introduction `SpatialArtifacts` is an R package that provides a data-driven a two-step workflow to identify, classify, and handle spatial artifacts in spatial transcriptomics data from multiple platforms including **10x Visium** (Standard and HD). Broadly, the idea behind the package is we that combine median absolute deviation (MAD)-based outlier detection with morphological image processing to identify the artifacts. These artifacts, often appearing as areas of low gene/UMI counts or high mitochondrial ratio at tissue edges (**edge artifacts**) or in the interior (**interior artifacts**), can negatively impact downstream analyses. The methods are implemented as an R package within the Bioconductor framework, and is available via `r Biocpkg("SpatialArtifacts")`. In the following, we provide an overview of the functionality in the package and we demonstrate how to apply the package on real-world datasets across different spatial transcriptomics platforms. More details describing the method will be available in our upcoming preprint. # Installation The latest development version can also be installed from the `devel` version of Bioconductor: ```{r, eval=FALSE} BiocManager::install("SpatialArtifacts", version="devel") ``` ```{r, eval=FALSE} install.packages("BiocManager") BiocManager::install("SpatialArtifacts") ``` The latest development version can also be installed from the `devel` version of Bioconductor or from [GitHub](https://github.com/CambridgeCat13/SpatialArtifacts). # Input data format In the examples below, we assume the input data are provided as a `r Biocpkg("SpatialExperiment")` Bioconductor object. In this case, the outputs are stored in the `rowData` of the `SpatialExperiment` object. ## Platform Support `SpatialArtifacts` is designed to work across multiple spatial transcriptomics platforms: - Standard Visium (55µm bins, hexagonal grid): ~5,000 spots per capture area - VisiumHD 16µm (16µm bins, square grid): ~480,000 bins per capture area - VisiumHD 8µm (8µm bins, square grid): ~1,920,000 bins per capture area > **IMPORTANT:** The morphological detection framework automatically adapts to different grid arrangements, but **parameter scaling is critical** for optimal performance across platforms. # Two key steps The core philosophy is a two-step process: **detect, and then classify**. This separates the sensitive task of identifying all potential problem spots from the more nuanced task of deciding what to do with them. ## The detection phase In the first step, we use the `detectEdgeArtifacts()` function. Here, the goal is to identify all spots that could potentially be part of an artifact. How does it work: - **Outlier identification**: Find spots with abnormally low QC metrics (e.g., `sum_gene`) using a Median Absolute Deviation (MAD) threshold. - **Morphological cleaning**: Apply sequential raster-based focal operations through `focal_transformations()`: 1. **3×3 fill** (`my_fill`): Fill spots completely surrounded by outliers 2. **5×5 outline** (`my_outline`): Fill spots outlined by outliers in a larger 16-pixel perimeter 3. **Star pattern** (`my_fill_star`): Fill spots with outliers in all four cardinal directions (N, S, E, W) 4. **Small cluster removal**: Remove isolated normal regions below `min_cluster_size` threshold (default: 40 spots) - Use 8-directional connectivity for connected component analysis - **Cluster detection**: Group these outliers into contiguous "problem areas" (`problemAreas`) - **Edge identification**: Evaluate whether clusters touch tissue boundaries using `clumpEdges()`. For each of the four borders (north, south, east, west), calculates the proportion of boundary spots belonging to each cluster. A cluster is classified as an edge artifact if this proportion meets or exceeds `edge_threshold` (default: 0.75, meaning ≥75% border coverage) on any single border direction The output from this phase will add three *raw* columns to your `spe` object: `_edge`, `_problem_id`, and `_problem_size` ## The classification phase In the second step, we use the `classifyEdgeArtifacts()` function. Here the goal is to take the raw detections from the previous step and apply a clear, hierarchical logic to assign final labels. How does it work: - **Input**: Requires the `spe` object processed by `detectEdgeArtifacts()` - **Hierarchical Classification**: Apply a 2×2 logic system based on **Location** and **Size**: 1. **Location**: Is the artifact an `_edge_artifact` (`TRUE` or `FALSE`) based on the detection step? 2. **Size**: Is the artifact's `_problem_size` larger than `min_spots` (default: `20`)? - **Label Assignment**: This logic produces five intuitive categories: - `"not_artifact"` — High-quality spots - `"large_edge_artifact"` — Large artifact cluster (`> min_spots`) touching the tissue edge - `"small_edge_artifact"` — Small artifact cluster (`≤ min_spots`) touching the tissue edge - `"large_interior_artifact"` — Large artifact cluster (`> min_spots`) located inside the tissue - `"small_interior_artifact"` — Small artifact cluster (`≤ min_spots`) located inside the tissue The output from this phase will add one classification column named `_classification`. # Helpful information on parameters Tuning the parameters lets you adapt the workflow to different tissue types, data quality, and spatial transcriptomics platforms. The package uses a wrapper function that routes to platform-specific implementations. ## Platform selection **CRITICAL FIRST STEP:** Specify your platform using the `platform` parameter in `detectEdgeArtifacts()` function: | Platform | Function Call | Required Parameters | |----------|--------------|---------------------| | **Standard Visium** | `detectEdgeArtifacts(spe, platform="visium", ...)` | (none required) | | **VisiumHD** | `detectEdgeArtifacts(spe, platform="visiumhd", resolution="16um", ...)` | `resolution` ("8um" or "16um") | ## Example use cases ```{r, eval=FALSE} # Standard Visium (55µm hexagonal grid) spe <- detectEdgeArtifacts(spe, platform = "visium", ...) # VisiumHD 16µm (square grid) spe <- detectEdgeArtifacts(spe, platform = "visiumhd", resolution = "16um", ...) # VisiumHD 8µm (square grid) spe <- detectEdgeArtifacts(spe, platform = "visiumhd", resolution = "8um", ...) ``` ## Parameters for `detectEdgeArtifacts()` The wrapper function accepts platform-specific parameters that are routed to the appropriate implementation. ### For all platforms - `platform` (**REQUIRED**) – Character string: `"visium"` or `"visiumhd"` (case insensitive) - Determines which platform-specific function to use - No default value; must be explicitly specified for clarity - `qc_metric` (Default: `"sum_gene"`) – Column name for QC metric used in outlier detection - Common alternatives: `"sum_umi"`, `"detected"`, `"nFeature"` - The function will auto-detect some common variants - `samples` (Default: `"sample_id"`) – Column name for sample identifiers - Each sample is processed independently - `mad_threshold` (Default: 3) – Sensitivity for detecting outliers - Lower values (1.5–2) are more sensitive - Higher values (3–4) are more conservative - `name` (Default: `"edge_artifact"`) – Prefix for output column names - Outputs: `[name]_edge`, `[name]_problem_id`, `[name]_problem_size` - `verbose` (Default: `TRUE`) – Print progress messages - `keep_intermediate` (Default: `FALSE`) – Retain intermediate outlier detection columns ### For standard Visium When `platform = "visium"`, use: - `edge_threshold` (Default: 0.75) – Minimum proportion of a tissue boundary that must be occupied by outlier clusters (collectively) for those clusters to be classified as edge artifacts. **Important Behavior:** The threshold is applied to the **total coverage** of all outlier clusters on each boundary direction (North, South, East, West). If multiple clusters collectively cover ≥ `edge_threshold` of a boundary, **all clusters touching that boundary** are classified as edge artifacts, even if no single cluster meets the threshold individually. **Example Scenario:** - North boundary contains 100 spots total - Cluster A occupies 30 boundary spots (30% coverage) - Cluster B occupies 50 boundary spots (50% coverage) - Combined coverage: 80% ≥ 75% threshold - **Result:** Both Cluster A and Cluster B are classified as edge artifacts **Rationale:** Edge drying artifacts typically affect large, continuous regions along tissue boundaries. Multiple clusters on the same boundary often result from a single underlying technical failure (incomplete permeabilization) and should be treated as a unified artifact rather than independent events. **Tuning Guidance:** - Higher values (0.75–0.90): More conservative, captures only large-scale boundary failures - Lower values (0.40–0.60): More sensitive, may flag smaller boundary-adjacent regions - Very low values (<0.30): Aggressive, may misclassify biological low-expression zones as artifacts - `min_cluster_size` (Default: 40) – Minimum cluster size (in spots) for morphological cleaning during focal transformation steps - Isolated "normal" regions smaller than this threshold within outlier areas will be filled in to create contiguous artifact regions - **For Standard Visium:** 40 spots ≈ 0.12 mm² physical area - `shifted` (Default: `FALSE`) – Apply coordinate adjustment for hexagonal grid alignment - Keep the default FALSE when using array_row/array_col coordinates from Space Ranger, as rasterFromXYZ handles the hexagonal grid spacing automatically. - Only set to TRUE if using a custom coordinate system where odd columns require manual offset correction. - `batch_var` (Default: `"both"`) – Determines grouping for MAD calculation - Options: `"sample_id"`, `"slide"`, or `"both"` - `"both"`: Spots flagged as outliers if below threshold in either sample or slide grouping ### For VisiumHD When `platform = "visiumhd"`, use: - `resolution` (**REQUIRED**) – Character string: `"8um"` or `"16um"` - Specifies the bin size of your VisiumHD data - This is **mandatory** for VisiumHD; the function will error if not provided - Determines conversion from physical units (µm) to bins - `buffer_width_um` (Default: 80) – Buffer zone width in micrometers (physical units) - Defines the edge region where artifacts are expected - Automatically converted to bins based on `resolution`: - At 16µm resolution: 80 µm → 5 bins - At 8µm resolution: 80 µm → 10 bins - **Tuning guidance:** - Increase (100-150 µm) for tissues with larger edge artifacts - Decrease (50-60 µm) for precise edge detection - `min_cluster_area_um2` (Default: 1280) – Minimum cluster area in square micrometers (physical units) - Clusters smaller than this will be filtered out during morphological cleaning - Automatically converted to bins based on `resolution`: - At 16µm resolution: 1280 µm² → 5 bins (16×16 µm per bin) - At 8µm resolution: 1280 µm² → 20 bins (8×8 µm per bin) - **Physical consistency:** Same area threshold gives different bin counts at different resolutions - Default (1280 µm²) was calibrated for 16µm resolution - `col_x` and `col_y` (Default: `"array_col"`, `"array_row"`) – Column names for bin coordinates - **Important:** These should be **bin indices**, not pixel coordinates - Using bin indices is much more memory-efficient than pixel coordinates **Key Difference from Visium**: VisiumHD parameters are specified in **physical units (µm, µm²)** rather than bin counts. This ensures consistency across resolutions while the algorithm handles the bin conversion internally. ## Parameters for `classifyEdgeArtifacts()` The classification step is **platform-independent** but requires appropriate parameter scaling. - `min_spots` (Default: 20) – **CRITICAL PARAMETER:** Threshold (in number of spots/bins) to distinguish `"large"` from `"small"` artifacts **Platform-Specific Scaling Required:** This parameter must be adjusted based on spatial resolution to represent equivalent **physical artifact sizes**: | Platform | Recommended `min_spots` | Physical Area | Scaling Factor | |----------|------------------------|---------------|----------------| | **Standard Visium (55µm)** | `20-40` | ~0.06-0.12 mm² | Baseline (1×) | | **VisiumHD 16µm bins** | `100-200` | ~0.026-0.051 mm² | ~6-10× Visium | | **VisiumHD 8µm bins** | `400-800` | ~0.026-0.051 mm² | ~20-40× Visium | **Automatic Scaling Formula:** ```r min_spots_HD <- min_spots_visium × (55 / bin_size_µm)² # Example: For min_spots = 30 on Standard Visium # VisiumHD 16µm: 30 × (55/16)² ≈ 354 bins # VisiumHD 8µm: 30 × (55/8)² ≈ 1,420 bins ``` **Why scaling matters:** The same physical artifact (e.g., 0.1 mm² edge dryspot) will cover: - Standard Visium: ~33 spots - VisiumHD 16µm: ~390 bins - VisiumHD 8µm: ~1,560 bins Without scaling, large VisiumHD artifacts would be incorrectly classified as "small." - `qc_metric` (Default: `"sum_umi"`) – QC metric column for validation (must exist but not directly used in classification logic) - `samples` (Default: `"sample_id"`) – Sample ID column name - `exclude_slides` (Default: `NULL`) – Vector of slide IDs to exclude from edge classification - Spots on these slides will have edge artifact status set to `FALSE` - `name` (Default: `"edge_artifact"`) – Must match the name used in `detectEdgeArtifacts()` ## **Platform Comparison Summary** | Feature | Standard Visium | VisiumHD | |---------|----------------|----------| | **Grid Type** | Hexagonal | Square | | **Requires `shifted`?** | No (default FALSE) | No (not used) | | **Resolution Parameter** | Not applicable | **Required** (`"8um"` or `"16um"`) | | **Edge Detection Method** | Morphological + boundary coverage | Buffer zone + morphological | | **Parameter Units** | Spot counts | Physical units (µm, µm²) | | **Default `min_spots` (classify)** | 20-40 | 100-200 (16µm), 400-800 (8µm) | | **Typical Dataset Size** | ~5,000 spots | ~480k bins (16µm), ~1.9M bins (8µm) | ## Understanding the output columns After both functions, several columns are added to `colData(spe)`: - **`*_edge`** – Raw detection: Is the spot in a cluster touching the tissue border? (TRUE/FALSE) - **`*_problem_id`** – Raw detection: ID of the problem area. - **`*_problem_size`** – Raw detection: Size (number of spots) of the problem area. - `*_true_edges` — **Intermediate:** Edge status after applying `exclude_slides` (used by `classifyEdgeArtifacts()`). - `*_classification` — **Final classification:** One of `"not_artifact"`, `"large_edge_artifact"`, `"small_edge_artifact"`, `"large_interior_artifact"`, or `"small_interior_artifact"`. # Example: Standard Visium workflow This package includes `spe_vignette`, a lightweight `SpatialExperiment` object derived from a human hippocampus Visium sample. **This vignette will load this raw-like object and run the full `SpatialArtifacts` workflow on it live.** **Note:** To meet package size requirements (< 5MB), this object has been subset (e.g., to coding genes) and sparsified, but **no artifact detection has been run.** We will perform those steps now. ## Data preparation: converting to dense matrix The underlying spatial clustering functions in this package currently require a **dense matrix** to perform coordinate-based calculations. We must first convert the sparse `counts` assay in our `spe_vignette` object to a standard (dense) matrix. ```{r run_workflow, message=TRUE, warning=FALSE} data(spe_vignette) # Loaded data dimensions: dim(spe_vignette) assay(spe_vignette, "counts") <- as.matrix(assay(spe_vignette, "counts")) names(colData(spe_vignette))[names(colData(spe_vignette)) == "sum"] <- "sum_umi" spe_detected <- detectEdgeArtifacts( spe_vignette, platform = "visium", # IMPORTANT: Specify Standard Visium platform qc_metric = "sum_umi", samples = "sample_id", batch_var = "sample_id", mad_threshold = 3, edge_threshold = 0.75, name = "edge_artifact" ) # === RESULTS === table(Edge_Detected = spe_detected$edge_artifact_edge) # Classification with Standard Visium parameters spe_classified <- classifyEdgeArtifacts( spe_detected, min_spots = 20, name = "edge_artifact" ) # === Classification Results === table(spe_classified$edge_artifact_classification) ``` ## VisiumHD Workflow Example For VisiumHD data, the workflow is identical **except for parameter scaling**. Here's a complete example showing how to adapt parameters for VisiumHD: ### VisiumHD 16µm Resolution Example ```{r visiumhd_16um_example, eval=FALSE} # This is a pseudo-example demonstrating VisiumHD 16µm workflow # Assumes you have loaded a VisiumHD SpatialExperiment object as 'spe_hd16' # Step 1: Ensure required QC metrics are calculated library(scuttle) spe_hd16 <- addPerCellQCMetrics(spe_hd16) # Step 2: Detection Phase - VisiumHD uses square grid (no 'shifted' needed) spe_hd16_detected <- detectEdgeArtifacts( spe_hd16, platform = "visiumhd", # Specify VisiumHD platform resolution = "16um", # REQUIRED for VisiumHD qc_metric = "sum_umi", # or "sum" depending on your colData samples = "sample_id", buffer_width_um = 100, # VisiumHD-specific parameter mad_threshold = 2.5, edge_threshold = 0.75, name = "edge_artifact" ) # Step 3: Classification Phase - CRITICAL: Scale min_spots for VisiumHD resolution # For 16µm bins, use ~6-10× the Standard Visium threshold min_spots_16um <- 30 * (55 / 16)^2 # ≈ 354 bins spe_hd16_classified <- classifyEdgeArtifacts( spe_hd16_detected, qc_metric = "sum_umi", min_spots = round(min_spots_16um), # ~350 bins name = "edge_artifact" ) # Visualization (same approach as Standard Visium) table(spe_hd16_classified$edge_artifact_classification) ``` ### VisiumHD 8µm Resolution Example ```{r visiumhd_8um_example, eval=FALSE} # This is a pseudo-example demonstrating VisiumHD 8µm workflow # Assumes you have loaded a VisiumHD 8µm SpatialExperiment object as 'spe_hd8' # Step 1: QC metrics spe_hd8 <- addPerCellQCMetrics(spe_hd8) # Step 2: Detection Phase spe_hd8_detected <- detectEdgeArtifacts( spe_hd8, platform = "visiumhd", # Specify VisiumHD platform resolution = "8um", # REQUIRED: Specify 8µm resolution qc_metric = "sum_umi", samples = "sample_id", buffer_width_um = 100, # Buffer zone in micrometers mad_threshold = 2.5, edge_threshold = 0.75, name = "edge_artifact" ) # Step 3: Classification with 8µm-appropriate threshold # For 8µm bins, use ~20-40× the Standard Visium threshold min_spots_8um <- 30 * (55 / 8)^2 # ≈ 1,420 bins spe_hd8_classified <- classifyEdgeArtifacts( spe_hd8_detected, qc_metric = "sum_umi", min_spots = round(min_spots_8um), # ~1,400 bins name = "edge_artifact" ) table(spe_hd8_classified$edge_artifact_classification) ``` ### Key VisiumHD Considerations **Platform-Specific Function Calls:** | Platform | Function Call | Required Parameters | |----------|--------------|---------------------| | **Standard Visium** | `detectEdgeArtifacts(..., platform="visium")` | (none required) | | **VisiumHD 16µm** | `detectEdgeArtifacts(..., platform="visiumhd", resolution="16um")` | `resolution` | | **VisiumHD 8µm** | `detectEdgeArtifacts(..., platform="visiumhd", resolution="8um")` | `resolution` | **Parameter Recommendations by Platform:** | Parameter | Standard Visium | VisiumHD 16µm | VisiumHD 8µm | |-----------|----------------|---------------|--------------| | `platform` | `"visium"` | `"visiumhd"` | `"visiumhd"` | | `resolution` | N/A (not used) | `"16um"` | `"8um"` | | `shifted` | FALSE (default) | N/A (handled internally) | N/A | | `buffer_width_um` | N/A | `100` (default) | `100` (default) | | `mad_threshold` | 1.5-3.0 | 2.0-3.0 | 2.0-3.0 | | `min_spots` (classify) | 20-40 | 100-200 | 400-800 | | Grid Type | Hexagonal | Square | Square | ## Visualization: QC Metrics and Detection Results We'll create a comprehensive visualization showing QC metrics, detection results, and detailed cluster information: ```{r plot, fig.width=12, fig.height=16} library(SpatialExperiment) library(patchwork) plot_data <- as.data.frame(colData(spe_classified)) plot_data <- cbind(plot_data, as.data.frame(spatialCoords(spe_classified))) plot_data_in_tissue <- plot_data[plot_data$in_tissue, ] base_theme <- theme_void() + theme(plot.title = element_text(size = 12, hjust = 0.5), legend.position = "right") .plt <- \(df, col, fun = \(.) .) { ggplot(df, aes(x = pxl_col_in_fullres, y = pxl_row_in_fullres, col = fun(.data[[col]]))) + geom_point(size = 0.5) + base_theme + coord_fixed() } plot_data_in_tissue$raw_problem <- !is.na(plot_data_in_tissue$edge_artifact_problem_id) plot_data_in_tissue$cluster_display <- plot_data_in_tissue$edge_artifact_problem_id plot_data_in_tissue$artifact_type <- "Normal" plot_data_in_tissue$artifact_type[plot_data_in_tissue$edge_artifact_edge] <- "Edge Artifact" plot_data_in_tissue$artifact_type[!is.na(plot_data_in_tissue$edge_artifact_problem_id) & !plot_data_in_tissue$edge_artifact_edge] <- "Interior Problem" p1 <- .plt(plot_data_in_tissue, "sum_umi", \(.) log10(.+1)) + scale_color_viridis_c(name = "log10(UMI)") + ggtitle("A. UMI Counts") p2 <- .plt(plot_data_in_tissue, "detected") + scale_color_viridis_c(name = "Genes", option = "plasma") + ggtitle("B. Detected Genes") p3 <- .plt(plot_data_in_tissue, "raw_problem") + scale_color_manual(values = c("FALSE" = "lightgray", "TRUE" = "red"), name = "Problem?") + ggtitle("C. Raw Detection") p4 <- .plt(plot_data_in_tissue, "cluster_display") + scale_color_discrete(name = "Cluster ID", na.value = "lightgray") + ggtitle("D. Problem Area Clusters") + theme(legend.key.size = unit(0.3, "cm"), legend.text = element_text(size = 8)) p5 <- .plt(plot_data_in_tissue, "artifact_type") + scale_color_manual(values = c("Normal" = "lightgray", "Edge Artifact" = "red", "Interior Problem" = "blue"), name = "Type") + ggtitle("E. Edge vs Interior") p6 <- .plt(plot_data_in_tissue, "edge_artifact_classification") + scale_color_manual(values = c("not_artifact" = "lightgray", "large_edge_artifact" = "red", "small_edge_artifact" = "orange", "large_interior_artifact" = "blue", "small_interior_artifact" = "cyan"), name = "Final Class", na.value = "grey50") + ggtitle("F. Hierarchical Classification") (p1 | p2) / (p3 | p4) / (p5 | p6) ``` ## Classification Summary Let's examine the enhanced classification system: ### Final Classification Summary ```{r classification, echo=FALSE} # Use the 'spe_classified' object we created in the step above final_summary <- table(spe_classified$edge_artifact_classification) final_pct <- round(100 * final_summary / sum(final_summary), 2) final_df <- data.frame( Classification = names(final_summary), Count = as.numeric(final_summary), Percentage = paste0(as.numeric(final_pct), "%") ) knitr::kable(final_df, caption = "Classification Breakdown") ``` ### Raw Edge Detection Summary ```{r raw_edge, echo=FALSE} # Use the 'spe_classified' object and the new column name edge_summary <- table(spe_classified$edge_artifact_edge) edge_pct <- round(100 * edge_summary / sum(edge_summary), 2) edge_df <- data.frame( Flagged_As_Edge = names(edge_summary), Count = as.numeric(edge_summary), Percentage = paste0(as.numeric(edge_pct), "%") ) knitr::kable(edge_df, caption = "Raw Detection Breakdown") ``` ## Quality Control Validation Finally, let's validate that flagged spots have lower quality metrics: ```{r validation, echo=FALSE, warning=FALSE} in_tissue_data <- spe_classified[, spe_classified$in_tissue] qc_data <- data.frame( sum_umi = in_tissue_data$sum_umi, detected_genes = in_tissue_data$detected, flagged = in_tissue_data$edge_artifact_edge ) flagged_umi <- median(qc_data$sum_umi[qc_data$flagged], na.rm = TRUE) nonflagged_umi <- median(qc_data$sum_umi[!qc_data$flagged], na.rm = TRUE) flagged_gene <- median(qc_data$detected_genes[qc_data$flagged], na.rm = TRUE) nonflagged_gene <- median(qc_data$detected_genes[!qc_data$flagged], na.rm = TRUE) qc_summary <- data.frame( Metric = c("Median UMI", "Median Detected Genes"), `Flagged (Edge)` = c(round(flagged_umi), round(flagged_gene)), `Non-flagged` = c(round(nonflagged_umi), round(nonflagged_gene)), Difference = c(round(nonflagged_umi - flagged_umi), round(nonflagged_gene - flagged_gene)), check.names = FALSE ) knitr::kable(qc_summary, caption = "QC Validation: Flagged vs Non-flagged Spots") qc_data$flag_status <- ifelse(qc_data$flagged, "Flagged (Raw Edge)", "Non-Flagged") validation_plot <- ggplot(qc_data, aes(x = flag_status, y = log10(sum_umi + 1), fill = flag_status)) + geom_boxplot() + scale_fill_manual(values = c("Flagged (Raw Edge)" = "lightcoral", "Non-Flagged" = "lightblue")) + labs( title = "QC Validation: UMI Counts in Raw Edge vs Non-Edge Spots", x = "Raw Edge Detection Status", y = "log10(UMI Count + 1)" ) + theme_minimal() print(validation_plot) ``` ## Filtering Out Problematic Spots (Optional) Based on the new classifications, users can make informed decisions about filtering. For example, you might decide to remove all **edge artifacts** (both large and small) while keeping **interior artifacts** for further review. Here's how you can filter the `SpatialExperiment` object to remove all spots classified as both `"large_edge_artifact"` or `"small_edge_artifact"`: ```{r filtering} if ("edge_artifact_classification" %in% names(colData(spe_classified))) { spots_to_keep <- !spe_classified$edge_artifact_classification %in% c("large_edge_artifact", "small_edge_artifact") spe_filtered <- spe_classified[, spots_to_keep] message("Original number of spots: ", ncol(spe_classified)) message("Number of spots after filtering: ", ncol(spe_filtered)) } else { message("Classification column not found. Filtering step skipped.") } ``` ```{r plotFiltering, fig.width=12, fig.height=6, message=FALSE, warning=FALSE} plot_data_before <- as.data.frame(colData(spe_classified)) plot_data_before <- cbind(plot_data_before, as.data.frame(spatialCoords(spe_classified))) plot_data_before_in_tissue <- plot_data_before[plot_data_before$in_tissue, ] plot_data_after <- as.data.frame(colData(spe_filtered)) if (ncol(spe_filtered) > 0) { plot_data_after <- cbind(plot_data_after, as.data.frame(spatialCoords(spe_filtered))) } p1_umi_before <- .plt(plot_data_before_in_tissue, "sum_umi", \(.) log10(.+1)) + scale_color_viridis_c(name = "log10(UMI)") + ggtitle("UMI Counts (Before Filtering)") p3_class_before <- .plt(plot_data_before_in_tissue, "edge_artifact_classification") + scale_color_manual(values = c("not_artifact" = "lightgray", "large_edge_artifact" = "red", "small_edge_artifact" = "orange", "large_interior_artifact" = "blue", "small_interior_artifact" = "cyan"), name = "Final Class", na.value = "grey50", drop = FALSE) + ggtitle("Classification (Before Filtering)") if (ncol(spe_filtered) > 0) { p2_umi_after <- .plt(plot_data_after, "sum_umi", \(.) log10(.+1)) + scale_color_viridis_c(name = "log10(UMI)") + ggtitle("UMI Counts (After Filtering)") p4_class_after <- .plt(plot_data_after, "edge_artifact_classification") + scale_color_manual(values = c("not_artifact" = "lightgray", "large_edge_artifact" = "red", "small_edge_artifact" = "orange", "large_interior_artifact" = "blue", "small_interior_artifact" = "cyan"), name = "Final Class", na.value = "grey50", drop = FALSE) + ggtitle("Classification (After Filtering)") } else { p2_umi_after <- ggplot() + theme_void() + ggtitle("UMI Counts (After Filtering - No Spots)") p4_class_after <- ggplot() + theme_void() + ggtitle("Classification (After Filtering - No Spots)") } combined_filtering_plot_2x2 <- (p1_umi_before | p2_umi_after) / (p3_class_before | p4_class_after) print(combined_filtering_plot_2x2) ``` ## Conclusion This vignette demonstrated the **SpatialArtifacts** workflow across multiple spatial transcriptomics platforms. Specifically, it showed: - **Standard Visium workflow:** Complete example using the included hippocampus dataset - **VisiumHD adaptation:** How to properly scale parameters for 16µm and 8µm resolution data - **Platform-specific considerations:** Grid arrangement, parameter scaling formulas, and typical artifact sizes - **Visualization:** Display hierarchical classification results alongside QC metrics - **Classification logic:** Distinguish between edge vs. interior and large vs. small artifacts - **Validation:** Flagged spots show significantly lower molecular capture **Key Takeaways for Multi-Platform Usage:** 1. **Grid Structure:** Standard Visium uses hexagonal grids (shifted = FALSE by default), while VisiumHD uses square grids (shifted parameter not used) 2. **Critical Parameter Scaling:** The `min_spots` threshold in `classifyEdgeArtifacts()` must scale with spatial resolution: - Use the formula: `min_spots_HD = min_spots_visium × (55 / bin_size)²` - Without scaling, artifacts will be incorrectly classified by size 3. **Morphological Framework:** The same detection logic works across platforms, automatically adapting to different grid arrangements 4. **Physical Consistency:** Scaled parameters ensure that "large" and "small" artifact classifications represent equivalent physical sizes regardless of platform Overall, **SpatialArtifacts** provides a unified, platform-agnostic framework for detecting and classifying spatial artifacts, enabling consistent quality control across the evolving spatial transcriptomics technology landscape. ## Session Information ```{r session_info} sessionInfo() ```