---
title: "SpatialArtifacts Tutorial"
author: 
  - name: Harriet Jiali He
    affiliation: Johns Hopkins University, Baltimore, MD, United States
  - name: Stephanie C. Hicks
    affiliation: Johns Hopkins University, Baltimore, MD, United States
date: "`r Sys.Date()`"
output: 
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{SpatialArtifacts Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(SummarizedExperiment)
library(SpatialExperiment)
library(SpatialArtifacts)
library(ggplot2)
library(patchwork)
library(dplyr)
```

# Introduction

`SpatialArtifacts` is an R package that provides a data-driven a two-step workflow to identify, classify, and handle spatial artifacts in spatial transcriptomics data from multiple platforms including **10x Visium** (Standard and HD). Broadly, the idea behind the package is we that combine median absolute deviation (MAD)-based outlier detection with morphological image processing to identify the artifacts. These artifacts, often appearing as areas of low gene/UMI counts or high mitochondrial ratio at tissue edges (**edge artifacts**) or in the interior (**interior artifacts**), can negatively impact downstream analyses. The methods are implemented as an R package within the Bioconductor framework, and is available via `r Biocpkg("SpatialArtifacts")`.

In the following, we provide an overview of the functionality in the package and we demonstrate how to apply the package on real-world datasets across different spatial transcriptomics platforms.

More details describing the method will be available in our upcoming preprint.

# Installation
The latest development version can also be installed from the `devel` 
version of Bioconductor:
```{r, eval=FALSE}
BiocManager::install("SpatialArtifacts", version="devel")
```

```{r, eval=FALSE}
install.packages("BiocManager")
BiocManager::install("SpatialArtifacts")
```

The latest development version can also be installed from the `devel` version of Bioconductor or from [GitHub](https://github.com/CambridgeCat13/SpatialArtifacts).


# Input data format

In the examples below, we assume the input data are provided as a `r Biocpkg("SpatialExperiment")` Bioconductor object. In this case, the outputs are stored in the `rowData` of the `SpatialExperiment` object.

## Platform Support

`SpatialArtifacts` is designed to work across multiple spatial transcriptomics platforms:

- Standard Visium (55µm bins, hexagonal grid): ~5,000 spots per capture area
- VisiumHD 16µm (16µm bins, square grid): ~480,000 bins per capture area  
- VisiumHD 8µm (8µm bins, square grid): ~1,920,000 bins per capture area

> **IMPORTANT:** The morphological detection framework automatically adapts to different grid arrangements, but **parameter scaling is critical** for optimal performance across platforms.

# Two key steps 

The core philosophy is a two-step process: **detect, and then classify**. This separates the sensitive task of identifying all potential problem spots from the more nuanced task of deciding what to do with them.

## The detection phase

In the first step, we use the `detectEdgeArtifacts()` function. Here, the goal is to identify all spots that could potentially be part of an artifact.

How does it work:

- **Outlier identification**: Find spots with abnormally low QC metrics (e.g., `sum_gene`) using a Median Absolute Deviation (MAD) threshold.  
- **Morphological cleaning**: Apply sequential raster-based focal operations through `focal_transformations()`:
  1. **3×3 fill** (`my_fill`): Fill spots completely surrounded by outliers
  2. **5×5 outline** (`my_outline`): Fill spots outlined by outliers in a larger 16-pixel perimeter
  3. **Star pattern** (`my_fill_star`): Fill spots with outliers in all four cardinal directions (N, S, E, W)
  4. **Small cluster removal**: Remove isolated normal regions below `min_cluster_size` threshold (default: 40 spots)
  - Use 8-directional connectivity for connected component analysis
- **Cluster detection**: Group these outliers into contiguous "problem areas" (`problemAreas`)
- **Edge identification**: Evaluate whether clusters touch tissue boundaries using `clumpEdges()`. For each of the four borders (north, south, east, west), calculates the proportion of boundary spots belonging to each cluster. A cluster is classified as an edge artifact if this proportion meets or exceeds `edge_threshold` (default: 0.75, meaning ≥75% border coverage) on any single border direction

The output from this phase will add three *raw* columns to your `spe` object: `_edge`, `_problem_id`, and `_problem_size`


## The classification phase

In the second step, we use the `classifyEdgeArtifacts()` function. Here the goal is to take the raw detections from the previous step and apply a clear, hierarchical logic to assign final labels.

How does it work:

- **Input**: Requires the `spe` object processed by `detectEdgeArtifacts()`  
- **Hierarchical Classification**: Apply a 2×2 logic system based on **Location** and **Size**:
  1. **Location**: Is the artifact an `_edge_artifact` (`TRUE` or `FALSE`) based on the detection step?  
  2. **Size**: Is the artifact's `_problem_size` larger than `min_spots` (default: `20`)?  
- **Label Assignment**: This logic produces five intuitive categories:
  - `"not_artifact"` — High-quality spots  
  - `"large_edge_artifact"` — Large artifact cluster (`> min_spots`) touching the tissue edge  
  - `"small_edge_artifact"` — Small artifact cluster (`≤ min_spots`) touching the tissue edge  
  - `"large_interior_artifact"` — Large artifact cluster (`> min_spots`) located inside the tissue  
  - `"small_interior_artifact"` — Small artifact cluster (`≤ min_spots`) located inside the tissue  

The output from this phase will add one classification column named `_classification`.


# Helpful information on parameters

Tuning the parameters lets you adapt the workflow to different tissue types, data quality, and spatial transcriptomics platforms. The package uses a wrapper function that routes to platform-specific implementations.

## Platform selection

**CRITICAL FIRST STEP:** Specify your platform using the `platform` parameter in `detectEdgeArtifacts()` function:

| Platform | Function Call | Required Parameters |
|----------|--------------|---------------------|
| **Standard Visium** | `detectEdgeArtifacts(spe, platform="visium", ...)` | (none required) |
| **VisiumHD** | `detectEdgeArtifacts(spe, platform="visiumhd", resolution="16um", ...)` | `resolution` ("8um" or "16um") |

## Example use cases 

```{r, eval=FALSE}
# Standard Visium (55µm hexagonal grid)
spe <- detectEdgeArtifacts(spe, platform = "visium", ...)

# VisiumHD 16µm (square grid)
spe <- detectEdgeArtifacts(spe, platform = "visiumhd", resolution = "16um", ...)

# VisiumHD 8µm (square grid)
spe <- detectEdgeArtifacts(spe, platform = "visiumhd", resolution = "8um", ...)
```


## Parameters for `detectEdgeArtifacts()` 

The wrapper function accepts platform-specific parameters that are routed to the appropriate implementation.

### For all platforms

- `platform` (**REQUIRED**) – Character string: `"visium"` or `"visiumhd"` (case insensitive)
  - Determines which platform-specific function to use
  - No default value; must be explicitly specified for clarity

- `qc_metric` (Default: `"sum_gene"`) – Column name for QC metric used in outlier detection
  - Common alternatives: `"sum_umi"`, `"detected"`, `"nFeature"`
  - The function will auto-detect some common variants

- `samples` (Default: `"sample_id"`) – Column name for sample identifiers
  - Each sample is processed independently

- `mad_threshold` (Default: 3) – Sensitivity for detecting outliers
  - Lower values (1.5–2) are more sensitive
  - Higher values (3–4) are more conservative

- `name` (Default: `"edge_artifact"`) – Prefix for output column names
  - Outputs: `[name]_edge`, `[name]_problem_id`, `[name]_problem_size`

- `verbose` (Default: `TRUE`) – Print progress messages

- `keep_intermediate` (Default: `FALSE`) – Retain intermediate outlier detection columns


### For standard Visium

When `platform = "visium"`, use: 

- `edge_threshold` (Default: 0.75) – Minimum proportion of a tissue boundary that must be occupied by outlier clusters (collectively) for those clusters to be classified as edge artifacts.

  **Important Behavior:** The threshold is applied to the **total coverage** of all outlier clusters on each boundary direction (North, South, East, West). If multiple clusters collectively cover ≥ `edge_threshold` of a boundary, **all clusters touching that boundary** are classified as edge artifacts, even if no single cluster meets the threshold individually.
  
  **Example Scenario:**
  - North boundary contains 100 spots total
  - Cluster A occupies 30 boundary spots (30% coverage)
  - Cluster B occupies 50 boundary spots (50% coverage)
  - Combined coverage: 80% ≥ 75% threshold
  - **Result:** Both Cluster A and Cluster B are classified as edge artifacts
  
  **Rationale:** Edge drying artifacts typically affect large, continuous regions along tissue boundaries. Multiple clusters on the same boundary often result from a single underlying technical failure (incomplete permeabilization) and should be treated as a unified artifact rather than independent events.
  
  **Tuning Guidance:**
  - Higher values (0.75–0.90): More conservative, captures only large-scale boundary failures
  - Lower values (0.40–0.60): More sensitive, may flag smaller boundary-adjacent regions
  - Very low values (<0.30): Aggressive, may misclassify biological low-expression zones as artifacts

- `min_cluster_size` (Default: 40) – Minimum cluster size (in spots) for morphological cleaning during focal transformation steps
  - Isolated "normal" regions smaller than this threshold within outlier areas will be filled in to create contiguous artifact regions
  - **For Standard Visium:** 40 spots ≈ 0.12 mm² physical area

- `shifted` (Default: `FALSE`) – Apply coordinate adjustment for hexagonal grid alignment
  - Keep the default FALSE when using array_row/array_col coordinates from Space Ranger, as rasterFromXYZ handles the hexagonal grid spacing automatically.
  - Only set to TRUE if using a custom coordinate system where odd columns require manual offset correction.

  
- `batch_var` (Default: `"both"`) – Determines grouping for MAD calculation
  - Options: `"sample_id"`, `"slide"`, or `"both"`
  - `"both"`: Spots flagged as outliers if below threshold in either sample or slide grouping

 
### For VisiumHD

When `platform = "visiumhd"`, use: 

- `resolution` (**REQUIRED**) – Character string: `"8um"` or `"16um"`
  - Specifies the bin size of your VisiumHD data
  - This is **mandatory** for VisiumHD; the function will error if not provided
  - Determines conversion from physical units (µm) to bins

- `buffer_width_um` (Default: 80) – Buffer zone width in micrometers (physical units)
  - Defines the edge region where artifacts are expected
  - Automatically converted to bins based on `resolution`:
    - At 16µm resolution: 80 µm → 5 bins
    - At 8µm resolution: 80 µm → 10 bins
  - **Tuning guidance:**
    - Increase (100-150 µm) for tissues with larger edge artifacts
    - Decrease (50-60 µm) for precise edge detection

- `min_cluster_area_um2` (Default: 1280) – Minimum cluster area in square micrometers (physical units)
  - Clusters smaller than this will be filtered out during morphological cleaning
  - Automatically converted to bins based on `resolution`:
    - At 16µm resolution: 1280 µm² → 5 bins (16×16 µm per bin)
    - At 8µm resolution: 1280 µm² → 20 bins (8×8 µm per bin)
  - **Physical consistency:** Same area threshold gives different bin counts at different resolutions
  - Default (1280 µm²) was calibrated for 16µm resolution

- `col_x` and `col_y` (Default: `"array_col"`, `"array_row"`) – Column names for bin coordinates
  - **Important:** These should be **bin indices**, not pixel coordinates
  - Using bin indices is much more memory-efficient than pixel coordinates


**Key Difference from Visium**: VisiumHD parameters are specified in **physical units (µm, µm²)** rather than bin counts. This ensures consistency across resolutions while the algorithm handles the bin conversion internally.


## Parameters for `classifyEdgeArtifacts()`

The classification step is **platform-independent** but requires appropriate parameter scaling.

- `min_spots` (Default: 20) – **CRITICAL PARAMETER:** Threshold (in number of spots/bins) to distinguish `"large"` from `"small"` artifacts

  **Platform-Specific Scaling Required:**
  
  This parameter must be adjusted based on spatial resolution to represent equivalent **physical artifact sizes**:
  
  | Platform | Recommended `min_spots` | Physical Area | Scaling Factor |
  |----------|------------------------|---------------|----------------|
  | **Standard Visium (55µm)** | `20-40` | ~0.06-0.12 mm² | Baseline (1×) |
  | **VisiumHD 16µm bins** | `100-200` | ~0.026-0.051 mm² | ~6-10× Visium |
  | **VisiumHD 8µm bins** | `400-800` | ~0.026-0.051 mm² | ~20-40× Visium |
  
  **Automatic Scaling Formula:**
```r
  min_spots_HD <- min_spots_visium × (55 / bin_size_µm)²
  
  # Example: For min_spots = 30 on Standard Visium
  # VisiumHD 16µm: 30 × (55/16)² ≈ 354 bins
  # VisiumHD 8µm:  30 × (55/8)²  ≈ 1,420 bins
```
  
  **Why scaling matters:** The same physical artifact (e.g., 0.1 mm² edge dryspot) will cover:
  - Standard Visium: ~33 spots
  - VisiumHD 16µm: ~390 bins  
  - VisiumHD 8µm: ~1,560 bins
  
  Without scaling, large VisiumHD artifacts would be incorrectly classified as "small."

- `qc_metric` (Default: `"sum_umi"`) – QC metric column for validation (must exist but not directly used in classification logic)

- `samples` (Default: `"sample_id"`) – Sample ID column name

- `exclude_slides` (Default: `NULL`) – Vector of slide IDs to exclude from edge classification
  - Spots on these slides will have edge artifact status set to `FALSE`

- `name` (Default: `"edge_artifact"`) – Must match the name used in `detectEdgeArtifacts()`


## **Platform Comparison Summary**

| Feature | Standard Visium | VisiumHD |
|---------|----------------|----------|
| **Grid Type** | Hexagonal | Square |
| **Requires `shifted`?** | No (default FALSE) | No (not used) |
| **Resolution Parameter** | Not applicable | **Required** (`"8um"` or `"16um"`) |
| **Edge Detection Method** | Morphological + boundary coverage | Buffer zone + morphological |
| **Parameter Units** | Spot counts | Physical units (µm, µm²) |
| **Default `min_spots` (classify)** | 20-40 | 100-200 (16µm), 400-800 (8µm) |
| **Typical Dataset Size** | ~5,000 spots | ~480k bins (16µm), ~1.9M bins (8µm) |


## Understanding the output columns

After both functions, several columns are added to `colData(spe)`:

- **`*_edge`** – Raw detection: Is the spot in a cluster touching the tissue border? (TRUE/FALSE)  
- **`*_problem_id`** – Raw detection: ID of the problem area.  
- **`*_problem_size`** – Raw detection: Size (number of spots) of the problem area.  
- `*_true_edges` — **Intermediate:** Edge status after applying `exclude_slides` (used by `classifyEdgeArtifacts()`).
- `*_classification` — **Final classification:** One of `"not_artifact"`, `"large_edge_artifact"`, `"small_edge_artifact"`, `"large_interior_artifact"`, or `"small_interior_artifact"`.


# Example: Standard Visium workflow

This package includes `spe_vignette`, a lightweight `SpatialExperiment` object derived from a human hippocampus Visium sample.

**This vignette will load this raw-like object and run the full `SpatialArtifacts` workflow on it live.**

**Note:** To meet package size requirements (< 5MB), this object has been subset (e.g., to coding genes) and sparsified, but **no artifact detection has been run.** We will perform those steps now.

## Data preparation: converting to dense matrix

The underlying spatial clustering functions in this package currently require a **dense matrix** to perform coordinate-based calculations. We must first convert the sparse `counts` assay in our `spe_vignette` object to a standard (dense) matrix.
```{r run_workflow, message=TRUE, warning=FALSE}
data(spe_vignette)
# Loaded data dimensions:
dim(spe_vignette)

assay(spe_vignette, "counts") <- as.matrix(assay(spe_vignette, "counts"))
names(colData(spe_vignette))[names(colData(spe_vignette)) == "sum"] <- "sum_umi"

spe_detected <- detectEdgeArtifacts(
  spe_vignette,
  platform = "visium", # IMPORTANT: Specify Standard Visium platform
  qc_metric = "sum_umi",
  samples = "sample_id",
  batch_var = "sample_id",
  mad_threshold = 3,
  edge_threshold = 0.75,
  name = "edge_artifact"
)

# === RESULTS ===
table(Edge_Detected = spe_detected$edge_artifact_edge)

# Classification with Standard Visium parameters
spe_classified <- classifyEdgeArtifacts(
  spe_detected,
  min_spots = 20,
  name = "edge_artifact"
)

# === Classification Results ===
table(spe_classified$edge_artifact_classification)
```

## VisiumHD Workflow Example

For VisiumHD data, the workflow is identical **except for parameter scaling**. Here's a complete example showing how to adapt parameters for VisiumHD:

### VisiumHD 16µm Resolution Example
```{r visiumhd_16um_example, eval=FALSE}
# This is a pseudo-example demonstrating VisiumHD 16µm workflow
# Assumes you have loaded a VisiumHD SpatialExperiment object as 'spe_hd16'

# Step 1: Ensure required QC metrics are calculated
library(scuttle)
spe_hd16 <- addPerCellQCMetrics(spe_hd16)

# Step 2: Detection Phase - VisiumHD uses square grid (no 'shifted' needed)
spe_hd16_detected <- detectEdgeArtifacts(
  spe_hd16,
  platform = "visiumhd", # Specify VisiumHD platform
  resolution = "16um", # REQUIRED for VisiumHD
  qc_metric = "sum_umi", # or "sum" depending on your colData
  samples = "sample_id",
  buffer_width_um = 100, # VisiumHD-specific parameter
  mad_threshold = 2.5,
  edge_threshold = 0.75,
  name = "edge_artifact"
)

# Step 3: Classification Phase - CRITICAL: Scale min_spots for VisiumHD resolution
# For 16µm bins, use ~6-10× the Standard Visium threshold
min_spots_16um <- 30 * (55 / 16)^2 # ≈ 354 bins

spe_hd16_classified <- classifyEdgeArtifacts(
  spe_hd16_detected,
  qc_metric = "sum_umi",
  min_spots = round(min_spots_16um), # ~350 bins
  name = "edge_artifact"
)

# Visualization (same approach as Standard Visium)
table(spe_hd16_classified$edge_artifact_classification)
```

### VisiumHD 8µm Resolution Example
```{r visiumhd_8um_example, eval=FALSE}
# This is a pseudo-example demonstrating VisiumHD 8µm workflow
# Assumes you have loaded a VisiumHD 8µm SpatialExperiment object as 'spe_hd8'

# Step 1: QC metrics
spe_hd8 <- addPerCellQCMetrics(spe_hd8)

# Step 2: Detection Phase
spe_hd8_detected <- detectEdgeArtifacts(
  spe_hd8,
  platform = "visiumhd", # Specify VisiumHD platform
  resolution = "8um", # REQUIRED: Specify 8µm resolution
  qc_metric = "sum_umi",
  samples = "sample_id",
  buffer_width_um = 100, # Buffer zone in micrometers
  mad_threshold = 2.5,
  edge_threshold = 0.75,
  name = "edge_artifact"
)

# Step 3: Classification with 8µm-appropriate threshold
# For 8µm bins, use ~20-40× the Standard Visium threshold
min_spots_8um <- 30 * (55 / 8)^2 # ≈ 1,420 bins

spe_hd8_classified <- classifyEdgeArtifacts(
  spe_hd8_detected,
  qc_metric = "sum_umi",
  min_spots = round(min_spots_8um), # ~1,400 bins
  name = "edge_artifact"
)

table(spe_hd8_classified$edge_artifact_classification)
```

### Key VisiumHD Considerations

**Platform-Specific Function Calls:**

| Platform | Function Call | Required Parameters |
|----------|--------------|---------------------|
| **Standard Visium** | `detectEdgeArtifacts(..., platform="visium")` | (none required) |
| **VisiumHD 16µm** | `detectEdgeArtifacts(..., platform="visiumhd", resolution="16um")` | `resolution` |
| **VisiumHD 8µm** | `detectEdgeArtifacts(..., platform="visiumhd", resolution="8um")` | `resolution` |

**Parameter Recommendations by Platform:**

| Parameter | Standard Visium | VisiumHD 16µm | VisiumHD 8µm |
|-----------|----------------|---------------|--------------|
| `platform` | `"visium"` | `"visiumhd"` | `"visiumhd"` |
| `resolution` | N/A (not used) | `"16um"` | `"8um"` |
| `shifted` |  FALSE (default) | N/A (handled internally) | N/A |
| `buffer_width_um` | N/A | `100` (default) | `100` (default) |
| `mad_threshold` | 1.5-3.0 | 2.0-3.0 | 2.0-3.0 |
| `min_spots` (classify) | 20-40 | 100-200 | 400-800 |
| Grid Type | Hexagonal | Square | Square |


## Visualization: QC Metrics and Detection Results

We'll create a comprehensive visualization showing QC metrics, detection results, and detailed cluster information:
```{r plot, fig.width=12, fig.height=16}
library(SpatialExperiment)
library(patchwork)

plot_data <- as.data.frame(colData(spe_classified))
plot_data <- cbind(plot_data, as.data.frame(spatialCoords(spe_classified)))
plot_data_in_tissue <- plot_data[plot_data$in_tissue, ]

base_theme <- theme_void() +
  theme(plot.title = element_text(size = 12, hjust = 0.5), legend.position = "right")

.plt <- \(df, col, fun = \(.) .) {
  ggplot(df, aes(x = pxl_col_in_fullres, y = pxl_row_in_fullres, col = fun(.data[[col]]))) +
    geom_point(size = 0.5) +
    base_theme +
    coord_fixed()
}

plot_data_in_tissue$raw_problem <- !is.na(plot_data_in_tissue$edge_artifact_problem_id)
plot_data_in_tissue$cluster_display <- plot_data_in_tissue$edge_artifact_problem_id
plot_data_in_tissue$artifact_type <- "Normal"
plot_data_in_tissue$artifact_type[plot_data_in_tissue$edge_artifact_edge] <- "Edge Artifact"
plot_data_in_tissue$artifact_type[!is.na(plot_data_in_tissue$edge_artifact_problem_id) & !plot_data_in_tissue$edge_artifact_edge] <- "Interior Problem"

p1 <- .plt(plot_data_in_tissue, "sum_umi", \(.) log10(.+1)) + 
  scale_color_viridis_c(name = "log10(UMI)") + ggtitle("A. UMI Counts")

p2 <- .plt(plot_data_in_tissue, "detected") + 
  scale_color_viridis_c(name = "Genes", option = "plasma") + ggtitle("B. Detected Genes")

p3 <- .plt(plot_data_in_tissue, "raw_problem") + 
  scale_color_manual(values = c("FALSE" = "lightgray", "TRUE" = "red"), name = "Problem?") + ggtitle("C. Raw Detection")

p4 <- .plt(plot_data_in_tissue, "cluster_display") + 
  scale_color_discrete(name = "Cluster ID", na.value = "lightgray") + ggtitle("D. Problem Area Clusters") + theme(legend.key.size = unit(0.3, "cm"), legend.text = element_text(size = 8))

p5 <- .plt(plot_data_in_tissue, "artifact_type") + 
  scale_color_manual(values = c("Normal" = "lightgray", "Edge Artifact" = "red", "Interior Problem" = "blue"), name = "Type") + ggtitle("E. Edge vs Interior")

p6 <- .plt(plot_data_in_tissue, "edge_artifact_classification") + 
  scale_color_manual(values = c("not_artifact" = "lightgray", "large_edge_artifact" = "red", "small_edge_artifact" = "orange", "large_interior_artifact" = "blue", "small_interior_artifact" = "cyan"), name = "Final Class", na.value = "grey50") + ggtitle("F. Hierarchical Classification")

(p1 | p2) / (p3 | p4) / (p5 | p6)
```

## Classification Summary

Let's examine the enhanced classification system:

### Final Classification Summary

```{r classification, echo=FALSE}
# Use the 'spe_classified' object we created in the step above
final_summary <- table(spe_classified$edge_artifact_classification)
final_pct <- round(100 * final_summary / sum(final_summary), 2)
final_df <- data.frame(
  Classification = names(final_summary),
  Count = as.numeric(final_summary),
  Percentage = paste0(as.numeric(final_pct), "%")
)
knitr::kable(final_df, caption = "Classification Breakdown")
```

### Raw Edge Detection Summary

```{r raw_edge, echo=FALSE}
# Use the 'spe_classified' object and the new column name
edge_summary <- table(spe_classified$edge_artifact_edge)
edge_pct <- round(100 * edge_summary / sum(edge_summary), 2)
edge_df <- data.frame(
  Flagged_As_Edge = names(edge_summary),
  Count = as.numeric(edge_summary),
  Percentage = paste0(as.numeric(edge_pct), "%")
)
knitr::kable(edge_df, caption = "Raw Detection Breakdown")
```

## Quality Control Validation

Finally, let's validate that flagged spots have lower quality metrics:

```{r validation, echo=FALSE, warning=FALSE}
in_tissue_data <- spe_classified[, spe_classified$in_tissue]
qc_data <- data.frame(
  sum_umi = in_tissue_data$sum_umi,
  detected_genes = in_tissue_data$detected,
  flagged = in_tissue_data$edge_artifact_edge
)

flagged_umi <- median(qc_data$sum_umi[qc_data$flagged], na.rm = TRUE)
nonflagged_umi <- median(qc_data$sum_umi[!qc_data$flagged], na.rm = TRUE)

flagged_gene <- median(qc_data$detected_genes[qc_data$flagged], na.rm = TRUE)
nonflagged_gene <- median(qc_data$detected_genes[!qc_data$flagged], na.rm = TRUE)

qc_summary <- data.frame(
  Metric = c("Median UMI", "Median Detected Genes"),
  `Flagged (Edge)` = c(round(flagged_umi), round(flagged_gene)),
  `Non-flagged` = c(round(nonflagged_umi), round(nonflagged_gene)),
  Difference = c(round(nonflagged_umi - flagged_umi), round(nonflagged_gene - flagged_gene)),
  check.names = FALSE
)

knitr::kable(qc_summary, caption = "QC Validation: Flagged vs Non-flagged Spots")

qc_data$flag_status <- ifelse(qc_data$flagged, "Flagged (Raw Edge)", "Non-Flagged")

validation_plot <- ggplot(qc_data, aes(x = flag_status, y = log10(sum_umi + 1), fill = flag_status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("Flagged (Raw Edge)" = "lightcoral", "Non-Flagged" = "lightblue")) +
  labs(
    title = "QC Validation: UMI Counts in Raw Edge vs Non-Edge Spots",
    x = "Raw Edge Detection Status", y = "log10(UMI Count + 1)"
  ) +
  theme_minimal()

print(validation_plot)
```


## Filtering Out Problematic Spots (Optional)

Based on the new classifications, users can make informed decisions about filtering. For example, you might decide to remove all **edge artifacts** (both large and small) while keeping **interior artifacts** for further review.

Here's how you can filter the `SpatialExperiment` object to remove all spots classified as both `"large_edge_artifact"` or `"small_edge_artifact"`:
```{r filtering}
if ("edge_artifact_classification" %in% names(colData(spe_classified))) {
  spots_to_keep <- !spe_classified$edge_artifact_classification %in% 
    c("large_edge_artifact", "small_edge_artifact")
  spe_filtered <- spe_classified[, spots_to_keep]
  message("Original number of spots: ", ncol(spe_classified))
  message("Number of spots after filtering: ", ncol(spe_filtered))
} else {
  message("Classification column not found. Filtering step skipped.")
}
```

```{r plotFiltering, fig.width=12, fig.height=6, message=FALSE, warning=FALSE}
plot_data_before <- as.data.frame(colData(spe_classified))
plot_data_before <- cbind(plot_data_before, as.data.frame(spatialCoords(spe_classified)))
plot_data_before_in_tissue <- plot_data_before[plot_data_before$in_tissue, ]

plot_data_after <- as.data.frame(colData(spe_filtered))
if (ncol(spe_filtered) > 0) {
  plot_data_after <- cbind(plot_data_after, as.data.frame(spatialCoords(spe_filtered)))
}

p1_umi_before <- .plt(plot_data_before_in_tissue, "sum_umi", \(.) log10(.+1)) +
  scale_color_viridis_c(name = "log10(UMI)") + ggtitle("UMI Counts (Before Filtering)")

p3_class_before <- .plt(plot_data_before_in_tissue, "edge_artifact_classification") +
  scale_color_manual(values = c("not_artifact" = "lightgray", "large_edge_artifact" = "red", "small_edge_artifact" = "orange", "large_interior_artifact" = "blue", "small_interior_artifact" = "cyan"), name = "Final Class", na.value = "grey50", drop = FALSE) + ggtitle("Classification (Before Filtering)")

if (ncol(spe_filtered) > 0) {
  p2_umi_after <- .plt(plot_data_after, "sum_umi", \(.) log10(.+1)) +
    scale_color_viridis_c(name = "log10(UMI)") + ggtitle("UMI Counts (After Filtering)")
    
  p4_class_after <- .plt(plot_data_after, "edge_artifact_classification") +
    scale_color_manual(values = c("not_artifact" = "lightgray", "large_edge_artifact" = "red", "small_edge_artifact" = "orange", "large_interior_artifact" = "blue", "small_interior_artifact" = "cyan"), name = "Final Class", na.value = "grey50", drop = FALSE) + ggtitle("Classification (After Filtering)")
} else {
  p2_umi_after <- ggplot() + theme_void() + ggtitle("UMI Counts (After Filtering - No Spots)")
  p4_class_after <- ggplot() + theme_void() + ggtitle("Classification (After Filtering - No Spots)")
}

combined_filtering_plot_2x2 <- (p1_umi_before | p2_umi_after) / (p3_class_before | p4_class_after)
print(combined_filtering_plot_2x2)
```


## Conclusion

This vignette demonstrated the **SpatialArtifacts** workflow across multiple spatial transcriptomics platforms. Specifically, it showed:

- **Standard Visium workflow:** Complete example using the included hippocampus dataset
- **VisiumHD adaptation:** How to properly scale parameters for 16µm and 8µm resolution data  
- **Platform-specific considerations:** Grid arrangement, parameter scaling formulas, and typical artifact sizes
- **Visualization:** Display hierarchical classification results alongside QC metrics  
- **Classification logic:** Distinguish between edge vs. interior and large vs. small artifacts
- **Validation:** Flagged spots show significantly lower molecular capture

**Key Takeaways for Multi-Platform Usage:**

1. **Grid Structure:** Standard Visium uses hexagonal grids (shifted = FALSE by default), while VisiumHD uses square grids (shifted parameter not used)

2. **Critical Parameter Scaling:** The `min_spots` threshold in `classifyEdgeArtifacts()` must scale with spatial resolution:
   - Use the formula: `min_spots_HD = min_spots_visium × (55 / bin_size)²`
   - Without scaling, artifacts will be incorrectly classified by size

3. **Morphological Framework:** The same detection logic works across platforms, automatically adapting to different grid arrangements

4. **Physical Consistency:** Scaled parameters ensure that "large" and "small" artifact classifications represent equivalent physical sizes regardless of platform

Overall, **SpatialArtifacts** provides a unified, platform-agnostic framework for detecting and classifying spatial artifacts, enabling consistent quality control across the evolving spatial transcriptomics technology landscape.

## Session Information
```{r session_info}
sessionInfo()
```