---
title: "Introduction to the SpatialExperiment class"
author: "Dario Righelli, Helena L. Crowell, Lukas M. Weber"
date: "`r format(Sys.Date(), '%b %d, %Y')`"
output:
BiocStyle::html_document:
toc: true
number_sections: true
toc_depth: 3
toc_float:
collapsed: true
vignette: >
%\VignetteIndexEntry{Introduction to the SpatialExperiment class}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: inline
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE, cache.lazy = FALSE)
```
# Class structure
## Introduction
The `SpatialExperiment` class is an R/Bioconductor S4 class for storing
data from spatially resolved transcriptomics (ST) experiments. The class
extends the `SingleCellExperiment` class for single-cell data to support
storage and retrieval of additional information from spot-based and
molecule-based ST platforms, including spatial coordinates, images, and
image metadata. A specialized constructor function is included for data
from the 10x Genomics Visium platform.
The following schematic illustrates the `SpatialExperiment` class
structure.
```{r, echo=FALSE, out.width = "100%", fig.cap="Overview of the SpatialExperiment class structure."}
knitr::include_graphics("SPE.png")
```
As shown, an object consists of: (i) `assays` containing expression counts,
(ii) `rowData` containing information on features, i.e. genes, (iii)
`colData` containing information on spots or cells, including nonspatial
and spatial metadata, (iv) `spatialCoords` containing spatial coordinates,
and (v) `imgData` containing image data. For spot-based ST data (e.g. 10x
Genomics Visium), a single `assay` named `counts` is used. For molecule-based
ST data (e.g. seqFISH), two `assays` named `counts` and `molecules` can be used.
Additional details on the class structure are provided in our
[preprint](https://www.biorxiv.org/content/10.1101/2021.01.27.428431v3).
## Load data
For demonstration of the general class structure, we load an example
`SpatialExperiment` (abbreviated as SPE) object (variable `spe`):
```{r message = FALSE}
library(SpatialExperiment)
example(read10xVisium, echo = FALSE)
spe
```
## `spatialCoords`
In addition to observation metadata stored inside the `colData` slot,
the `SpatialExperiment` class stores spatial coordinates as:
- `spatialCoords`, a numeric matrix of spatial coordinates (e.g. `x` and `y`)
`spatialCoords` are stored inside the `int_colData`, and are directly
accessible via the corresponding accessor:
```{r}
head(spatialCoords(spe))
```
The corresponding column names can be also accessed with `spatialCoordsNames()`:
```{r}
spatialCoordsNames(spe)
```
## `imgData`
All image related data are stored inside the `int_metadata`'s
`imgData` field as a `DataFrame` of the following structure:
* each row corresponds to one image for a given sample
and with a given unique image identifier (e.g. its resolutions)
* for each image, columns specify:
* which `sample_id` the image belongs to
* a unique `image_id` in order to accommodate multiple images
for a given sample (e.g. of different resolutions)
* the image's `data` (a `SpatialImage` object)
* the `scaleFactor` that adjusts pixel positions of the original,
full-resolution image to pixel positions in the image
The `imgData()` accessor can be used to retrieve
the image data stored within the object:
```{r}
imgData(spe)
```
### The `SpatialImage` class
Images are stored inside the `data` field of the `imgData` as a list of
`SpatialImage`s. Each image may be of one of the following sub-classes:
* `LoadedSpatialImage`
* represents an image that is fully realized into memory as a `raster` object
* `@image` contains a `raster` object: a matrix
of RGB colors for each pixel in the image
* `StoredSpatialImage`
* represents an image that is stored in a local file (e.g., as
.png, .jpg or .tif), and loaded into memory only on request
* `@path` specifies a local file from which to retrieve the image
* `RemoteSpatialImage`
* represents an image that is remotely hosted
(under some URL), and retrieved only on request
* `@url` specifies where to retrieve the image from
A `SpatialImage` can be accessed using `getImg()`,
or retrieved directly from the `imgData()`:
```{r}
(spi <- getImg(spe))
identical(spi, imgData(spe)$data[[1]])
```
Data available in an object of class `SpatialImage` may be
accessed via the `imgRaster()` and `imgSource()` accessors:
```{r fig.small = TRUE}
plot(imgRaster(spe))
```
### Adding or removing images
Image entries may be added or removed from a `SpatialExperiment`'s
`imgData` `DataFrame` using `addImg()` and `rmvImg()`, respectively.
Besides a path or URL to source the image from and a numeric scale factor,
`addImg()` requires specification of the `sample_id` the new image belongs to,
and an `image_id` that is not yet in use for that sample:
```{r fig.small = TRUE, eval = TRUE}
url <- "https://i.redd.it/3pw5uah7xo041.jpg"
spe <- addImg(spe,
sample_id = "section1",
image_id = "pomeranian",
imageSource = url,
scaleFactor = NA_real_,
load = TRUE)
img <- imgRaster(spe,
sample_id = "section1",
image_id = "pomeranian")
plot(img)
```
The `rmvImg()` function is more flexible in the specification
of the `sample_id` and `image_id` arguments. Specifically:
- `TRUE` is equivalent to *all*, e.g.
`sample_id = ""`, `image_id = TRUE`
will drop all images for a given sample
- `NULL` defaults to the first entry available, e.g.
`sample_id = ""`, `image_id = NULL`
will drop the first image for a given sample
For example, `sample_id = TRUE`, `image_id = TRUE` will specify all images;
`sample_id = NULL`, `image_id = NULL` corresponds to the first image entry in the `imgData`;
`sample_id = TRUE`, `image_id = NULL` equals the first image for all samples; and
`sample_id = NULL`, `image_id = TRUE` matches all images for the first sample.
Here, we remove `section1`'s `pomeranian` image added in the previous
code chunk; the image is now completely gone from the `imgData`:
```{r}
imgData(spe <- rmvImg(spe, "section1", "pomeranian"))
```
# Object construction
## Manually
The `SpatialExperiment` constructor provides several arguments
to give maximum flexibility to the user.
In particular, these include:
- `spatialCoords`, a numeric `matrix` containing spatial coordinates
- `spatialCoordsNames`, a character vector specifying which
`colData` fields correspond to spatial coordinates
`spatialCoords` can be supplied via `colData`
by specifying the column names that correspond to spatial coordinates
with `spatialCoordsNames`:
```{r}
n <- length(z <- letters)
y <- matrix(nrow = n, ncol = n)
cd <- DataFrame(x = seq(n), y = seq(n), z)
spe1 <- SpatialExperiment(
assay = y,
colData = cd,
spatialCoordsNames = c("x", "y"))
```
Alternatively, `spatialCoords` may be supplied separately
as a `matrix`, e.g.:
```{r}
xy <- as.matrix(cd[, c("x", "y")])
spe2 <- SpatialExperiment(
assay = y,
colData = cd["z"],
spatialCoords = xy)
```
Importantly, both of the above `SpatialExperiment()` function calls
lead to construction of the exact same object:
```{r}
identical(spe1, spe2)
```
Finally, `spatialCoords(Names)` are optional, i.e.,
we can construct a SPE using only a subset of the above arguments:
```{r}
spe <- SpatialExperiment(
assays = y)
isEmpty(spatialCoords(spe))
```
In general, `spatialCoordsNames` takes precedence over `spatialCoords`,
i.e., if both are supplied, the latter will be ignored. In other words,
`spatialCoords` are preferentially extracted from the `DataFrame`
provided via `colData`. E.g., in the following function call,
`spatialCoords` will be ignored, and xy-coordinates are instead extracted
from the input `colData` according to the specified `spatialCoordsNames`.
In this case, a message is also provided:
```{r results = "hide"}
n <- 10; m <- 20
y <- matrix(nrow = n, ncol = m)
cd <- DataFrame(x = seq(m), y = seq(m))
xy <- matrix(nrow = m, ncol = 2)
colnames(xy) <- c("x", "y")
SpatialExperiment(
assay = y,
colData = cd,
spatialCoordsNames = c("x", "y"),
spatialCoords = xy)
```
## Spot-based
When working with spot-based ST data, such as *10x Genomics Visium* or other
platforms providing images, it is possible to store the image information
in the dedicated `imgData` structure.
Also, the `SpatialExperiment` class stores a `sample_id` value in the
`colData` structure, which is possible to set with the `sample_id`
argument (default is "sample_01").
Here we show how to load the default *Space Ranger* data files from a
10x Genomics Visium experiment, and build a `SpatialExperiment` object.
In particular, the `readImgData()` function is used to build an `imgData`
`DataFrame` to be passed to the `SpatialExperiment` constructor.
The `sample_id` used to build the `imgData` object must be the same one
used to build the `SpatialExperiment` object, otherwise an error is returned.
```{r}
dir <- system.file(
file.path("extdata", "10xVisium", "section1", "outs"),
package = "SpatialExperiment")
# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)
# read in image data
img <- readImgData(
path = file.path(dir, "spatial"),
sample_id = "foo")
# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
col.names = c(
"barcode", "in_tissue", "array_row", "array_col",
"pxl_row_in_fullres", "pxl_col_in_fullres"))
# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
symbol = rowData(sce)$Symbol)
# construct 'SpatialExperiment'
(spe <- SpatialExperiment(
assays = list(counts = assay(sce)),
rowData = rd,
colData = DataFrame(xyz),
spatialCoordsNames = c("pxl_col_in_fullres", "pxl_row_in_fullres"),
imgData = img,
sample_id = "foo"))
```
Alternatively, the `read10xVisium()` function facilitates the import of
*10x Genomics Visium* data to handle one or more samples organized in
folders reflecting the default *Space Ranger* folder tree organization,
as illustrated below (where "raw/filtered" refers to either "raw" or
"filtered" to match the `data` argument). Note that the base directory
"outs/" from Space Ranger can either be included manually in the paths
provided in the `samples` argument, or can be ignored; if ignored, it will
be added automatically. The `.h5` files are used if `type = "HDF5"`. (Note
that `tissue_positions.csv` was renamed in Space Ranger v2.0.0.)
```{bash, eval = FALSE}
sample
. | — outs
· · | — raw/filtered_feature_bc_matrix.h5
· · | — raw/filtered_feature_bc_matrix
· · · | — barcodes.tsv.gz
· · · | — features.tsv.gz
· · · | — matrix.mtx.gz
· · | — spatial
· · · | — scalefactors_json.json
· · · | — tissue_lowres_image.png
· · · | — tissue_positions.csv
```
Using `read10xVisium()`, the above code to construct the same SPE is reduced to:
```{r}
dir <- system.file(
file.path("extdata", "10xVisium"),
package = "SpatialExperiment")
sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids, "outs")
(spe10x <- read10xVisium(samples, sample_ids,
type = "sparse", data = "raw",
images = "lowres", load = FALSE))
```
Or alternatively, omitting the base directory `outs/` from Space Ranger:
```{r}
samples2 <- file.path(dir, sample_ids)
(spe10x2 <- read10xVisium(samples2, sample_ids,
type = "sparse", data = "raw",
images = "lowres", load = FALSE))
```
## Molecule-based
To demonstrate how to accommodate molecule-based ST data
(e.g. *seqFISH* platform) inside a `SpatialExperiment` object,
we generate some mock data of 1000 molecule coordinates across
50 genes and 20 cells. These should be formatted into a `data.frame`
where each row corresponds to a molecule, and columns specify the
xy-positions as well as which gene/cell the molecule has been assigned to:
```{r message = FALSE, warning = FALSE}
n <- 1e3 # number of molecules
ng <- 50 # number of genes
nc <- 20 # number of cells
# sample xy-coordinates in [0, 1]
x <- runif(n)
y <- runif(n)
# assign each molecule to some gene-cell pair
gs <- paste0("gene", seq(ng))
cs <- paste0("cell", seq(nc))
gene <- sample(gs, n, TRUE)
cell <- sample(cs, n, TRUE)
# assure gene & cell are factors so that
# missing observations aren't dropped
gene <- factor(gene, gs)
cell <- factor(cell, cs)
# construct data.frame of molecule coordinates
df <- data.frame(gene, cell, x, y)
head(df)
```
Next, it is possible to re-shape the above table into a
`r BiocStyle::Biocpkg("BumpyMatrix")` using `splitAsBumpyMatrix()`, which takes
as input the xy-coordinates, as well as arguments specifying the row and column
index of each observation:
```{r message = FALSE, warning = FALSE}
# construct 'BumpyMatrix'
library(BumpyMatrix)
mol <- splitAsBumpyMatrix(
df[, c("x", "y")],
row = gene, col = cell)
```
Finally, it is possible to construct a `SpatialExperiment` object with two data
slots:
- The `counts` assay stores the number of molecules per gene and cell
(equivalent to transcript counts in spot-based data)
- The `molecules` assay holds the spatial molecule positions (xy-coordinates)
Each entry in the `molecules` assay is a `DFrame` that contains the positions
of all molecules from a given gene that have been assigned to a given cell.
```{r message = FALSE, warning = FALSE}
# get count matrix
y <- with(df, table(gene, cell))
y <- as.matrix(unclass(y))
y[1:5, 1:5]
# construct SpatialExperiment
spe <- SpatialExperiment(
assays = list(
counts = y,
molecules = mol))
spe
```
The `BumpyMatrix` of molecule locations can be accessed
using the dedicated `molecules()` accessor:
```{r message = FALSE, warning = FALSE}
molecules(spe)
```
# Common operations
## Subsetting
Subsetting objects is automatically defined to synchronize across
all attributes, as for any other Bioconductor *Experiment* class.
For example, it is possible to `subset` by `sample_id` as follows:
```{r}
sub <- spe10x[, spe10x$sample_id == "section1"]
```
Or to retain only observations that map to tissue via:
```{r}
sub <- spe10x[, colData(spe10x)$in_tissue]
sum(colData(spe10x)$in_tissue) == ncol(sub)
```
## Combining samples
To work with multiple samples, the `SpatialExperiment` class provides the `cbind`
method, which assumes unique `sample_id`(s) are provided for each sample.
In case the `sample_id`(s) are duplicated across multiple samples, the `cbind`
method takes care of this by appending indices to create unique sample identifiers.
```{r}
spe1 <- spe2 <- spe
spe3 <- cbind(spe1, spe2)
unique(spe3$sample_id)
```
Alternatively (and preferentially), we can create unique
`sample_id`(s) prior to `cbind`ing as follows:
```{r}
# make sample identifiers unique
spe1 <- spe2 <- spe
spe1$sample_id <- paste(spe1$sample_id, "A", sep = ".")
spe2$sample_id <- paste(spe2$sample_id, "B", sep = ".")
# combine into single object
spe3 <- cbind(spe1, spe2)
```
## Sample ID replacement
In particular, when trying to replace the `sample_id`(s) of a `SpatialExperiment`
object, these must map uniquely with the already existing ones, otherwise an
error is returned.
```{r, error=TRUE}
new <- spe3$sample_id
new[1] <- "section2.A"
spe3$sample_id <- new
new[1] <- "third.one.of.two"
spe3$sample_id <- new
```
Importantly, the `sample_id` `colData` field is *protected*, i.e.,
it will be retained upon attempted removal (= replacement by `NULL`):
```{r}
# backup original sample IDs
tmp <- spe$sample_id
# try to remove sample IDs
spe$sample_id <- NULL
# sample IDs remain unchanged
identical(tmp, spe$sample_id)
```
## Image transformations
Both the `SpatialImage` (SpI) and `SpatialExperiment` (SpE) class currently support
two basic image transformations, namely, rotation (via `rotateImg()`) and
mirroring (via `mirrorImg()`). Specifically, for a SpI/E `x`:
* `rotateImg(x, degrees)` expects as `degrees` a single numeric in +/-[0,90,...,360].
Here, a (negative) positive value corresponds to (counter-)clockwise rotation.
* `mirrorImg(x, axis)` expects as `axis` a character string specifying
whether to mirror horizontally (`"h"`) or vertically (`"v"`).
Here, we apply various transformations using both a SpI (`spi`) and SpE (`spe`)
as input, and compare the resulting images to the original:
### Rotation
```{r}
# extract first image
spi <- getImg(spe10x)
# apply counter-/clockwise rotation
spi1 <- rotateImg(spi, -90)
spi2 <- rotateImg(spi, +90)
# visual comparison
par(mfrow = c(1, 3))
plot(as.raster(spi))
plot(as.raster(spi1))
plot(as.raster(spi2))
```
```{r}
# specify sample & image identifier
sid <- "section1"
iid <- "lowres"
# counter-clockwise rotation
tmp <- rotateImg(spe10x,
sample_id = sid,
image_id = iid,
degrees = -90)
# visual comparison
par(mfrow = c(1, 2))
plot(imgRaster(spe10x, sid, iid))
plot(imgRaster(tmp, sid, iid))
```
### Mirroring
```{r}
# extract first image
spi <- getImg(spe10x)
# mirror horizontally/vertically
spi1 <- mirrorImg(spi, "h")
spi2 <- mirrorImg(spi, "v")
# visual comparison
par(mfrow = c(1, 3))
plot(as.raster(spi))
plot(as.raster(spi1))
plot(as.raster(spi2))
```
```{r}
# specify sample & image identifier
sid <- "section2"
iid <- "lowres"
# mirror horizontally
tmp <- mirrorImg(spe10x,
sample_id = sid,
image_id = iid,
axis = "h")
# visual comparison
par(mfrow = c(1, 2))
plot(imgRaster(spe10x, sid, iid))
plot(imgRaster(tmp, sid, iid))
```
# Session Info {.smaller}
```{r tidy = TRUE}
sessionInfo()
```