---
title: "Building SpatialExperiment objects"
author: "Dario Righelli"
date: "`r format(Sys.Date(), '%b %m, %Y')`"
output:
BiocStyle::html_document:
toc: true
vignette: >
%\VignetteIndexEntry{Building SpatialExperiment object}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE, cache.lazy = FALSE)
```
# Installation
The `SpatialExperiment` package is available via Bioconductor.
```{r, eval=FALSE}
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SpatialExperiment")
```
Load the package as follows:
```{r message = FALSE, warning = FALSE}
library(SpatialExperiment)
```
# The SpatialExperiment class
The `SpatialExperiment` class is designed to represent spatially
resolved transcriptomics (ST) data. It inherits from the
`SingleCellExperiment` class and is used in the same manner. In
addition, the class supports storage of spatial information via
`spatialData` and `spatialCoords`, and storage of
images via `imgData`.
# spatialData and spatialCoords
The `SpatialExperiment` class constructor is defined with several arguments to
provide maximum flexibility to the user.
In particular, we distinguish between `spatialData` and `spatialCoords` as
follows:
- `spatialData` expects a `DataFrame` with all the data associated with the
spatial information (optionally including spatial coordinates from `spatialCoords`).
- `spatialCoords` is a numeric `matrix` containing only the defined
(via `spatialCoordsNames`) spatial coordinates.
When building a `SpatialExperiment` object, the columns of spatial coordinates from the
`spatialData` `DFrame` are identified via the `spatialCoordsNames` argument and
stored separately as a numeric `matrix` within `spatialCoords`.
Following is an example of `spatialCoords` built via `spatialData` and
`spatialCoordsNames`.
```{r}
cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assay=mat,
spatialData=cd,
spatialCoordsNames=c("x", "y"))
head(spatialCoords(spe))
spatialCoordsNames(spe)
head(spatialData(spe))
spatialDataNames(spe)
head(colData(spe))
```
It is also possible to display the combined spatial information with
`spatialData()` using the `spatialCoords=TRUE` argument:
```{r}
spatialData(spe, spatialCoords=TRUE)
```
Alternatively, it is possible to define the `spatialDataNames` to define
the `spatialData` `DFrame` from the columns of `colData`.
```{r}
cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(
assay=mat, colData=cd, spatialCoordsNames=c("x", "y"), spatialDataNames="z")
head(spatialData(spe))
head(spatialCoords(spe))
head(colData(spe))
```
Also, it is possible to load a numeric `matrix` of coordinates with the
`spatialCoords` argument.
```{r}
y <- diag(n <- 10)
mat <- matrix(0, n, m <- 2)
spe <- SpatialExperiment(assays = y, spatialCoords = mat)
```
Finally, it is possible to set `spatialData`, `spatialCoords`, and `colData`
separately.
```{r}
mat <- as.matrix(cd[,1:2])
colnames(mat) <- c("ecs","uai")
spad <- DataFrame(a=1:26, b=1:26, z=letters)
asy <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assays = asy, spatialCoords = mat,
spatialData=spad, colData=cd)
head(spatialData(spe))
head(spatialCoords(spe))
head(colData(spe))
```
# Working with multiple samples
To work with multiple samples, the `SpatialExperiment` class provides the `cbind`
method, which assumes unique `sample_id`(s) are provided for each sample.
In case the `sample_id`(s) are duplicated across multiple samples, the `cbind`
method takes care of this by appending indices to create unique sample identifiers.
```{r}
spe1 <- spe2 <- spe
spe3 <- cbind(spe1, spe2)
unique(spe3$sample_id)
```
Otherwise, it is possible to create unique `sample_id`(s) as follows.
```{r}
# make sample identifiers unique
spe1 <- spe2 <- spe
spe1$sample_id <- paste(spe1$sample_id, "sample1", sep = ".")
spe2$sample_id <- paste(spe2$sample_id, "sample2", sep = ".")
# combine into single object
spe3 <- cbind(spe1, spe2)
spe3
```
# Subsetting a SpatialExperiment object
Subsetting objects is automatically defined to synchronize across all attributes of
the objects, as for any other Bioconductor *Experiment* class.
For example, it is possible to `subset` by `sample_id` as follows:
```{r}
spe3[, colData(spe)$sample_id=="sample1.sample1"]
```
# sample_id requires one-to-one mapping replacement
In particular, when trying to replace the `sample_id`(s) of a `SpatialExperiment`
object, these must map uniquely with the already existing ones, otherwise an
error is returned.
```{r, error=TRUE}
new <- spe3$sample_id;
new[1] <- "sample1.sample2"
spe3$sample_id <- new
new[1] <- "third.one.of.two"
spe3$sample_id <- new
```
# Spot-based ST data (e.g. 10x Genomics Visium)
When working with spot-based ST data, such as *10x Genomics Visium* or other
platforms providing images, it is possible to store the image information in the
dedicated `imgData` structure.
Also, the `SpatialExperiment` class stores a `sample_id` value in the
`spatialData` structure, which is possible to set with the `sample_id` argument
(default is "sample_01").
Here we show how to load the default *Space Ranger* data files from a 10x Genomics
Visium experiment, and build a `SpatialExperiment` object.
In particular, the `readImgData()` function is used to build an `imgData`
`DataFrame` to be passed to the `SpatialExperiment` constructor.
The `sample_id` used to build the `imgData` object must be the
same one used to build the `SpatialExperiment` object, otherwise an error is
returned.
```{r}
dir <- system.file(
file.path("extdata", "10xVisium", "section1"),
package = "SpatialExperiment")
# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)
# read in image data
img <- readImgData(
path = file.path(dir, "spatial"),
sample_id="foo")
# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
col.names = c(
"barcode", "in_tissue", "array_row", "array_col",
"pxl_row_in_fullres", "pxl_col_in_fullres"))
# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
symbol = rowData(sce)$Symbol)
# construct 'SpatialExperiment'
(spe <- SpatialExperiment(
assays = list(counts = assay(sce)),
colData = colData(sce), rowData = rd, imgData = img,
spatialData=DataFrame(xyz),
spatialCoordsNames=c("pxl_col_in_fullres", "pxl_row_in_fullres"),
sample_id="foo"))
```
Alternatively, the `read10xVisium()` function facilitates the import of
*10x Genomics Visium* data to handle one or more samples organized in
folders reflecting the default *Space Ranger* folder tree organization:
sample
|—outs
· · |—raw/filtered_feature_bc_matrix.h5
· · |—raw/filtered_feature_bc_matrix
· · · · |—barcodes.tsv
· · · · |—features.tsv
· · · · |—matrix.mtx
· · |—spatial
· · · · |—scalefactors_json.json
· · · · |—tissue_lowres_image.png
· · · · |—tissue_positions_list.csv
```{r}
dir <- system.file(
file.path("extdata", "10xVisium"),
package = "SpatialExperiment")
sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids)
(spe <- read10xVisium(samples, sample_ids,
type = "sparse", data = "raw",
images = "lowres", load = FALSE))
```
# Molecule-based ST data
To demonstrate how to accommodate molecule-based ST data (e.g. *seqFISH* platform)
inside a `SpatialExperiment` object, we generate some mock data of
`r n = 1000` molecule coordinates across `r ng = 50` and `r nc = 20` cells.
These should be formatted into a `data.frame` where each row corresponds to a
molecule, and columns specify the xy-position as well as which gene/cell the
molecule has been assigned to:
```{r message = FALSE, warning = FALSE}
# sample xy-coordinates in [0,1]
x <- runif(n)
y <- runif(n)
# assign each molecule to some gene-cell pair
gs <- paste0("gene", seq(ng))
cs <- paste0("cell", seq(nc))
gene <- sample(gs, n, TRUE)
cell <- sample(cs, n, TRUE)
# construct data.frame of molecule coodinates
df <- data.frame(gene, cell, x, y)
head(df)
```
Next, it is possible to re-shape the above table into a
`r BiocStyle::Biocpkg("BumpyMatrix")` using `splitAsBumpyMatrix()`, which takes
as input the xy-coordinates, as well as arguments specifying the row and column
index of each observation:
```{r message = FALSE, warning = FALSE}
# (assure gene & cell are factor so that
# missing observations aren't dropped)
df$gene <- factor(df$gene, gs)
df$cell <- factor(df$cell, cs)
# construct BumpyMatrix
library(BumpyMatrix)
mol <- splitAsBumpyMatrix(
df[, c("x", "y")],
row = gs, col = cs)
```
Finally, it is possible to construct a `SpatialExperiment` object with two data
slots:
- The `counts` assay stores the number of molecules per gene and cell
(equivalent to transcript counts in spot-based data).
- The `molecules` assay holds the spatial molecule positions (xy-coordinates).
Here, each entry is a `DFrame` that contains the positions of all molecules
from a given gene that have been assigned to a given cell.
```{r message = FALSE, warning = FALSE}
# get count matrix
y <- with(df, table(gene, cell))
y <- as.matrix(unclass(y))
y[1:5, 1:5]
# construct SpatialExperiment
spe <- SpatialExperiment(
assays = list(
counts = y,
molecules = mol))
spe
```
The `BumpyMatrix` of molecule locations can be accessed using the dedicated
`molecules()` accessor:
```{r message = FALSE, warning = FALSE}
molecules(spe)
```
# Session Info
```{r tidy=TRUE}
sessionInfo()
```