--- title: "Battlefield" author: - name: Jean-Philippe Villemin affiliation: - Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France email: jean-philippe.villemin@inserm.fr date: "`r format(Sys.Date(), '%m/%d/%Y')`" abstract: >

`Battlefield` is a **Swiss-army toolkit designed to define and extract spatial spots from specific regions in spatial transcriptomics data.**

It provides low-level, modular utilities to delineate spatial regions of interest, including **interfaces between clusters**, **intra-cluster layers**, and **inter-cluster trajectories**.

These utilities are intended to be reused and composed within higher-level analytical workflows and packages.

`Battlefield` supports sequencing-based spatial transcriptomics platforms such as **10x Genomics Visium**, across multiple resolutions, including Visium HD (binned).

Battlefield package version: `r packageVersion("Battlefield")` output: BiocStyle::html_document: self_contained: true toc: true toc_depth: 4 highlight: pygments fig_height: 3 fig_width: 3 fig_caption: no code_folding: show package: Battlefield vignette: > %\VignetteIndexEntry{Battlefield-Main} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::knit_hooks$set(optipng = knitr::hook_optipng) ``` ```{r load-libs, message = FALSE, warning = FALSE, results = FALSE} library(Battlefield) library(SpatialExperiment) library(ggplot2) library(dplyr) library(tidyr) library(pheatmap) library(pals) library(grid) ``` # Introduction `Battlefield` is a **Swiss-army toolkit designed to define and extract spatial spots from specific regions in spatial transcriptomics data.**

It provides low-level, modular utilities to delineate spatial regions of interest, including **interfaces between clusters**, **intra-cluster layers**, and **inter-cluster trajectories**.

These utilities are intended to be reused and composed within higher-level analytical workflows and packages.

`Battlefield` supports sequencing-based spatial transcriptomics platforms such as **10x Genomics Visium**, across multiple resolutions, including Visium HD (binned).

# What is it for? `Battlefield` provides four core functionalities to define and extract spatial transcriptomics spots from specific tissue regions, enabling the study of spatial organization under different biological contexts: * **Interfaces between clusters** — identify and analyze boundary regions where distinct tissue or cell populations interact. * **Intra-cluster layers** — characterize spatial layers or gradients within a single cluster. * **Inter/Intra-cluster trajectories** — model spatial transitions across multiple clusters. * **Spatial neighborhood** — report the cluster composition of a spatial neighborhood defined by k nearest spots, constrained by a distance threshold. Finally, you can integrate `Battlefield` results back into a `r Biocpkg("SpatialExperiment")` object for further analysis. # Starting point We start from a `r Biocpkg("SpatialExperiment")` object that contains **(i)** spatial coordinates for each spot and **(ii)** a precomputed clustering stored in `colData()`. Battlefield operates on a simple spot-level table, so we first export a `data.frame` with four columns: * `spot_id`: spot/barcode identifier (`colnames(spe)`) * `x`, `y`: spatial coordinates (`spatialCoords(spe)`) * `cluster`: cluster label provided by the user (a column in `colData(spe)`) Because the cluster annotation can have different names depending on the workflow (e.g., `cluster`, `seurat_clusters`, `r Biocpkg("BayesSpace")`, etc.), the user can specify which `colData()` column should be used. We load a simulated Visium dataset (hexagonal grid) to illustrate the different functionalities that Battlefield provides. ```{r interfaces1, fig.dim = c(7, 3)} # Load Visium data data("visium_simulated_spe") df <- data.frame( spot_id = colnames(visium_simulated_spe), x = spatialCoords(visium_simulated_spe)[, 1], y = spatialCoords(visium_simulated_spe)[, 2], cluster = colData(visium_simulated_spe)$cluster ) # Plot cluster distribution ggplot(df, aes(x, y, fill = cluster)) + geom_point(size = 3.2,colour = "grey", shape = 21) + coord_equal() + theme_minimal()+ labs(x = "", y = "") + theme(axis.text = element_blank()) ``` You can have access to some basic information about the dataset you are using (grid type, distance threshold (radius) advised...) ```{r interfaces2, results='asis'} # Detect grid type and get parameters res <- detect_grid_type(df, verbose = FALSE) params <- get_neighborhood_params(df, verbose = FALSE) ``` ## Interfaces between clusters Battlefield allows you to identify and extract **spatial interfaces** between clusters. Here we demonstrate how to: 1. Detect interfaces between two specific clusters (according to three modes of selection called `inner`, `outer` or `both`) 2. Identify multi-interface spots all at once 3. Detect a same number of control spots relative to the cluster of interest `Battlefield` offers several functionalities related to interfaces (aka borders) via the following fonctions metionned below: * ` select_border_spots`, between two clusters according to a specified mode * ` build_all_borders` * ` select_core_spots` , to select control spots within a cluster relative to its border * ` build_all_cores` You must have first defined borders before selecting core spots. Here we show an example to detect interfaces between clusters 4 and 5. We identify border spots from cluster 4 that are adjacent to cluster 5, referenced as interface for the adjactent cluster. We used the `inner` mode to select spots, meaning that only spots from cluster 4 that are located in the inside of the cluster towards cluster 5 (the interface) are selected. Note : we used a `max_dist = 60` microns to define the spatial neighborhood to remove spots that would have been too far away to be considered as neighbors. ```{r interfaces3, fig.dim = c(7, 3)} border_in <- select_border_spots(df, cluster = 4, interface = 5, mode = "inner", max_dist = 60) # A similar approach would have been # select_border_spots(df, # cluster = 5, # interface = 4, # mode = "outer", # max_dist = 60) knitr::kable(head(border_in)) # Structre data df_in <- df |> mutate(is_border = spot_id %in% border_in$spot_id) |> left_join(border_in |> select(spot_id, is_border_multiple),by = "spot_id") |> mutate(is_border_multiple = coalesce(is_border_multiple, FALSE)) # Plot interface 5 → 4 ggplot(df_in, aes(x, y, fill = cluster)) + geom_point(size = 3.2, shape = 21, colour = "grey") + geom_point(data = subset(df_in, is_border & !is_border_multiple), size = 3.2, colour = "black", fill = NA) + geom_point(data = subset(df_in, is_border & is_border_multiple), size = 3.2, colour = "red", fill = NA) + coord_equal() + theme_minimal()+ labs(x = "", y = "") + theme(axis.text = element_blank()) ``` We can also identify all interfaces at once and visualize spots that are part of multiple interfaces. Here we used again a `max_dist = 60` microns and `k = 6` neighbors to define the spatial neighborhood. `is_border_multiple` indicates whether a spot is part of multiple interfaces and is plotted with a different alpha in the example. This approach needs further manual selection of the cluster pair of interest as it brings some redundacy. (for example, border spots for both directions 3→4 and 4→3 are identified twice due to sequencial call of `select_border_spots` with the two modes (inner/outer)). ```{r interfaces4, fig.dim = c(7, 3)} all_borders <- build_all_borders(df,max_dist = 60 , k = 6) knitr::kable(head(all_borders)) # Structure data df2 <- df |> left_join(all_borders |> select(spot_id, undirected_pair, is_border_multiple),by = "spot_id") |> mutate(is_border = !is.na(undirected_pair), is_border_multiple = coalesce(is_border_multiple, FALSE)) # Plot all interfaces ggplot(df2, aes(x, y)) + geom_point(data = subset(df2, !is_border), fill="grey" ,colour = "white", size = 3.2,shape = 21) + geom_point(data = subset(df2, is_border & is_border_multiple), aes(color = undirected_pair), size = 3.2, alpha = 0.4) + geom_point(data = subset(df2, is_border & !is_border_multiple), aes(color = undirected_pair), size = 3.2) + coord_equal() + theme_minimal()+ labs(x = "", y = "",color = "") + theme(axis.text = element_blank()) ``` Next, we plot the different types of spots identified for the interface between clusters 3 and 4. We identify border spots for both directions (3→4 and 4→3) and a relative selection of core spots within cluster 3 that can be used as control spots for further analysis. ```{r interfaces5, fig.dim = c(7, 3)} # Get all pairs and detect grid pairs_visium <- directed_cluster_interface_pairs(df$cluster) params_visium <- get_neighborhood_params(df, verbose = FALSE) # Define cluster pair of interest cluster_A <- 3 cluster_B <- 4 # Build all borders to identify inner spots all_borders_visium <- build_all_borders(df, k = 6, pairs = pairs_visium) # Select border spots for both directions border_A_to_B <- subset(all_borders_visium, cluster == cluster_A & interface == cluster_B ) border_B_to_A <- subset(all_borders_visium, cluster == cluster_B & interface == cluster_A ) # Select inner spots for cluster A inner_A <- select_core_spots(df, all_borders_visium, cluster = cluster_A, interface = cluster_B, mode = "both") # Create visualization dataframe with spot classification df_final <- mutate(df, spot_type=case_when( spot_id %in% border_A_to_B$spot_id ~ "border_A_to_B", spot_id %in% border_B_to_A$spot_id ~ "border_B_to_A", spot_id %in% inner_A$spot_id ~ "inner_A", TRUE ~ "background")) |> as.data.frame() # Plot: clusters with borders and core control spots ggplot(df_final, aes(x, y)) + # 1) Base clusters fill="gray" ,colour = "grey", size = 3.2,shape = 21) + # nolint: line_length_linter. geom_point(aes(color = cluster), size = 2.5) + # 2) Draw outlines (slightly larger) so they "surround" without hiding geom_point(data = subset(df_final, spot_type == "border_A_to_B"), color = "blue", size = 2.5, shape = 1, stroke = 1.25) + geom_point(data = subset(df_final, spot_type == "border_B_to_A"), color = "red", size = 2.5, shape = 1, stroke = 1.25) + geom_point(data = subset(df_final, spot_type == "inner_A"), color = "black", size = 2.5, shape = 1, stroke = 1.25) + # 3) Re-draw the colored points ON TOP for the highlighted subset geom_point( data = subset(df_final, spot_type %in% c("border_A_to_B", "border_B_to_A", "inner_A")), aes(color = cluster), size = 2.5 ) + coord_equal() + theme_minimal() + labs(x = "", y = "",color = "") + theme(axis.text = element_blank()) ``` ## Intra-cluster layers You can directly work on a specific cluster to define three layers as core, intermediate and border. `intermediate_quantile` allows to control the thickness of the intermediate layer. ```{r layers1,fig.dim = c(7,3)} target_cluster <- 3 # Narrow intermediate layer layers_df <- create_cluster_layers(df, target_cluster = target_cluster, intermediate_quantile = 0.33) # Create visualization dataframe df_viz <- df |> mutate(layer = NA_character_) |> as.data.frame() # Map layers to the full dataset idx_match <- match(layers_df$spot_id, df_viz$spot_id) valid_idx <- !is.na(idx_match) df_viz$layer[idx_match[valid_idx]] <- layers_df$layer[valid_idx] # Create combined plot: clusters in background + layers overlay with circles ggplot(df_viz, aes(x, y)) + # base clusters geom_point(aes(color = cluster), size = 2.5) + # 2) Draw outlines (slightly larger) so they "surround" without hiding geom_point( data = subset(df_viz, layer == "core"), shape = 1, color = "#DDDDDD", size = 3.2, stroke = 1.25 ) + geom_point( data = subset(df_viz, layer == "intermediate"), shape = 1, color = "#666666", size = 3.2, stroke = 1.25 ) + geom_point( data = subset(df_viz, layer == "border"), shape = 1, color ="black", size = 3.2, stroke = 1.25 ) + # 3) Re-draw the colored points ON TOP for the highlighted subset geom_point( data = subset(df_viz, layer %in% c("core", "intermediate", "border")), aes(color = cluster), size = 2.5 ) + coord_equal() + theme_minimal()+ labs( x = "", y = "", color = "" ) + theme( axis.text = element_blank() ) ``` ## Inter/Intra-cluster trajectories We can also identify spatial trajectories between two spots from two cluster or inside a single cluster. Here we demonstrate how to build a trajectory between clusters 3 and 4 using `build_one_trajectory()`. ```{r trajectories1,fig.dim = c(7,3)} # We retrieve expression for later visualization expr <- as.numeric(assay(visium_simulated_spe, "counts")["FAKE_GENE", ]) test <- cbind(as.data.frame(colData(visium_simulated_spe)), df[, c("x", "y"), drop = FALSE]) test$expr <- expr start_cluster <- "3" end_cluster <- "4" top_n <- 8 centroids <- compute_centroids(df) A <- centroids[centroids$cluster == start_cluster, c("x","y")] B <- centroids[centroids$cluster == end_cluster, c("x","y")] res <- build_one_trajectory(df, A, B, top_n = top_n, max_dist = NULL) knitr::kable(head(res)) # Plot cluster distribution ggplot(test, aes(x, y, fill = cluster)) + geom_point(size = 3.2,colour="grey", shape = 21) + geom_path(data=res, aes(x=x, y=y, group=trajectory_id), color="black",linewidth=2) + geom_point(data = res, aes(x, y), size = 3.2, fill="red",colour="black", shape = 21) + coord_equal() + theme_minimal() + labs(x = "", y = "") + theme(axis.text = element_blank()) ``` But the real interest here is to identify multiple similar trajectories for a fixed number of spots and trajectories to explore possible gene expression gradients using `build_similar_trajectories()`. Here we build two extra trajectories on each side of the main trajectory. `lane_factor` allows to control the spacing between trajectories. The number of trajectories returned is `n_extra` according to the `side` parameter setting. ```{r trajectories2, fig.dim = c(7,3)} out <- build_similar_trajectories(df, A, B, top_n = top_n, n_extra = 2, lane_width_factor = 2.5, side = "both") knitr::kable(head(out)) ggplot(test, aes(x, y)) + geom_point(aes(color = expr),size = 1.6, alpha = 0.85) + scale_color_gradient(low = "blue",high = "red") + geom_path(data=out, aes(x=x, y=y, group=trajectory_id), inherit.aes = FALSE, color="black",linewidth=1, arrow = grid::arrow(type = "closed", length = grid::unit(2, "mm"))) + geom_point(data = out, aes(x, y), inherit.aes = FALSE, size = 2.2, fill="white",color="black") + coord_equal() + theme_minimal() + labs(x = "", y = "") + theme(axis.text = element_blank()) ``` We can now explore quickly the expression of this fake gene via an heatmap. ```{r trajectories3, fig.dim = c(7,3)} meta <- out|> transmute( spot_id = as.character(spot_id), trajectory = as.character(trajectory_id), progress = as.numeric(pos_on_seg) ) |> group_by(trajectory) |> arrange(progress, .by_group = TRUE) |> mutate( index = seq_len(n()) # <- index 1..length ordered according to t ) |> ungroup() gene <- c("FAKE_GENE") expr <- assay(visium_simulated_spe, "counts")[gene, meta$spot_id] meta$expr <- expr mat <- meta |> select(trajectory, index, expr) |> tidyr::pivot_wider(names_from = index, values_from = expr) |> as.data.frame() rownames(mat) <- mat$trajectory mat$trajectory <- NULL pheatmap( mat, cluster_rows = FALSE, cluster_cols = FALSE, border_color = "white", main = gene,angle_col=0, col=pals::coolwarm(), scale="row", cellwidth=25, cellheight=25 , na_col = "grey90" ) ``` ## Spatial neighborhood `Battlefield` also provides a utility to report the cluster composition of a spatial neighborhood defined by k nearest spots, constrained by a distance threshold. It can used a cluster or a spot of interest. There is three major functions : * ` get_neighborhood_spots ` : get the spots in the neighborhood of a cluster * ` count_neighborhood ` : count the composition of a neighborhood for a cluster * ` count_all_neighborhoods ` : count the composition of neighborhoods for all clusters By default, it will report clusters present in the neighborhood but you can provide another column name with `inlaid_col` parameter to report other annotations. ```{r neighborhood1} # Getting neighborhoods... neighborhood_spots_1 <- get_neighborhood_spots(df, cluster = 1, k = 200) head(neighborhood_spots_1) # Counting the clusters in the neighborhoods... neighb_vis <- count_neighborhood(df, cluster = 1, k = 100) head(neighb_vis) ``` We simulated another another annotation to create inlaid black spots that we will count in the neighborhood of cluster 1 just after the plot. ```{r neighborhood2, fig.dim = c(7,3)} neighb_vis <- count_all_neighborhoods(df, k = 100) # Add inlaid column with random values (5 categories) df$inlaid <- sample(paste0("inlaid", 1:5), nrow(df), replace = TRUE) df_neighborhood_viz <- df |> mutate( spot_type = "background", spot_type = ifelse(cluster == 1, "source", spot_type), spot_type = ifelse(spot_id %in% neighborhood_spots_1$spot_id, "neighborhood", spot_type) ) |> as.data.frame() # Apply the same cluster factor levels as df_vis df_neighborhood_viz$cluster <- factor(df_neighborhood_viz$cluster, levels = sort(unique(df$cluster))) ggplot(df_neighborhood_viz, aes(x, y)) + geom_point( data = subset(df_neighborhood_viz, spot_type == "neighborhood"), shape = 1, color = "grey", size = 3.2, stroke = 1.25 ) + geom_point( aes(color = cluster), size = 2.5 ) + geom_point( data = subset(df_neighborhood_viz, inlaid == "inlaid1"), fill = "black", size = 2.5 ) + coord_equal() + theme_minimal() + labs(x = "", y = "") + theme(axis.text = element_blank() ) # Counting the black point -inlaid spots- in the neighborhood of cluster 1 neighb_vis <- count_neighborhood(df, cluster = 1, inlaid_col = "inlaid",k = 100) head(neighb_vis) ``` We previously count these point in the neighborhood of cluster 1. We can count the black spots directly inside cluster 1. For this purpose, three major functions are available: * ` get_inlaid_spots ` : get the inlaid spots inside a cluster * ` count_inlaid ` : count the composition of inlaid spots for a cluster * ` count_all_inlaids ` : count the composition of inlaid spots for all clusters ```{r neighborhood3,eval=TRUE} # === Testing get_inlaid_spots inlaid_spots_1 <- get_inlaid_spots(df, cluster = 1, inlaid_col = "inlaid") # === Testing count_inlaid === inlaid_1 <- count_inlaid(df, cluster = 1, inlaid_col = "inlaid") # === Testing count_all_inlaids === all_inlaids <- count_all_inlaids(df, inlaid_col = "inlaid") knitr::kable(head(all_inlaids, 15)) ``` # Ending point You can finaly integrate `Battlefield` results back into a `r Biocpkg("SpatialExperiment")` object for further analysis : * `add_borders_to_spe()` — adds `interface`, `directed_pair`, `is_border`, and `is_core` columns to `colData()` * `add_layers_to_spe()` — adds `layer` column ("border", "intermediate", or "core") to `colData()` * `add_trajectories_to_spe()` — adds `trajectory_id`, `pos_on_seg`, and trajectory metadata columns to `colData()` ```{r integration1} visium_simulated_spe <- add_borders_to_spe(visium_simulated_spe, border = all_borders) visium_simulated_spe <- add_layers_to_spe(visium_simulated_spe, layer = layers_df) visium_simulated_spe <- add_trajectories_to_spe(visium_simulated_spe, trajectory = out) # View the annotated colData head(colData(visium_simulated_spe)) ``` # Session Information ```{r session-info} sessionInfo() ```