---
title: "Segmenting and normalizing multiplexed imaging data with simpleSeg"
date: "`r BiocStyle::doc_date()`"
params:
test: FALSE
author:
- name: Alexander Nicholls
affiliation:
- School of Mathematics and Statistics, University of Sydney, Australia
- name: Ellis Patrick
affiliation:
- &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
- School of Mathematics and Statistics, University of Sydney, Australia
- name: Nicolas Canete
affiliation:
- &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
vignette: >
%\VignetteIndexEntry{"Introduction to simpleSeg"}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
output:
BiocStyle::html_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(BiocStyle)
```
```{r warning=FALSE, message=FALSE}
# load required packages
library(simpleSeg)
library(ggplot2)
library(EBImage)
library(cytomapper)
```
# Installation
```{r, eval = FALSE}
# Install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("simpleSeg")
```
# Overview
The `simpleSeg` package extends existing bioconductor packages such as `cytomapper` and `EBImage` by providing a structured pipeline for creating segmentation masks from multiplexed cellular images in the form of tiff stacks. This allows for the single cell information of these images to be extracted in R, without the need for external segmentation programs. `simpleSeg` also facilitates the normalisation of cellular features after these features have been extracted from the image, priming cells for classification / clustering. These functions leverage the functionality of the [`EBImage`](https://bioconductor.org/packages/release/bioc/vignettes/EBImage/inst/doc/EBImage-introduction.html) package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the `EBimage` package. A key strength of `simpleSeg` is that we have coded multiple ways to perform some simple segmentation operations as well as incorporating multiple automatic procedures to optimise key parameters when these aren't specified.
# Load example data
In the following we will reanalyse two MIBI-TOF images from [(Risom et al., 2022)](https://www.sciencedirect.com/science/article/pii/S0092867421014860?via%3Dihub#!) profiling the spatial landscape of ductal carcinoma in situ (DCIS), which is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). These images are stored in the "extdata" folder in the package. When the path to this folder is identified, we can read these images into R using `readImage` from `EBImage` and store these as a `CytoImageList` using the `cytomapper` package.
```{r}
# Get path to image directory
pathToImages <- system.file("extdata", package = "simpleSeg")
# Get directories of images
imageDirs <- dir(pathToImages, "Point", full.names = TRUE)
names(imageDirs) <- dir(pathToImages, "Point", full.names = FALSE)
# Get files in each directory
files <- files <- lapply(
imageDirs,
list.files,
pattern = "tif",
full.names = TRUE
)
# Read files with readImage from EBImage
images <- lapply(files, EBImage::readImage, as.is = TRUE)
# Convert to cytoImageList
images <- cytomapper::CytoImageList(images)
mcols(images)$imageID <- names(images)
```
# Segmentation
`simpleSeg` accepts an `Image`, `list` of `Image`'s, or `CytoImageList` as input and generates a `CytoImageList` of masks as output. Here we will use the histone H3 channel in the image as a nuclei marker for segmentation. By default, `simpleseg` will isolate individual nuclei by watershedding using a combination of the intensity of this marker and a distance map. Nuclei are dilated out by 3 pixels to capture the cytoplasm. The user may also specify simple image transformations using the `transform` argument.
```{r}
masks <- simpleSeg::simpleSeg(images,
nucleus = "HH3",
transform = "sqrt")
```
## Visualise separation
The `display` and `colorLabels` functions in `EBImage` make it very easy to examine the performance of the cell segmentation. The great thing about `display` is that if used in an interactive session it is very easy to zoom in and out of the image.
```{r}
# Visualise segmentation performance one way.
EBImage::display(colorLabels(masks[[1]]))
```
## Visualise outlines
The `plotPixels` function in `cytomapper` make it easy to overlay the masks on top of the intensities of 6 markers. Here we can see that the segmentation appears to be performing reasonably.
```{r}
# Visualise segmentation performance another way.
cytomapper::plotPixels(image = images[1],
mask = masks[1],
img_id = "imageID",
colour_by = c("PanKRT", "GLUT1", "HH3", "CD3", "CD20"),
display = "single",
colour = list(HH3 = c("black","blue"),
CD3 = c("black","purple"),
CD20 = c("black","green"),
GLUT1 = c("black", "red"),
PanKRT = c("black", "yellow")),
bcg = list(HH3 = c(0, 1, 1.5),
CD3 = c(0, 1, 1.5),
CD20 = c(0, 1, 1.5),
GLUT1 = c(0, 1, 1.5),
PanKRT = c(0, 1, 1.5)),
legend = NULL)
```
## Methods of Watershedding
Watershedding is a method which treats images as topographical maps in order to identify individual objects and the borders between them.
The user may specify how watershedding is to be performed by using the `watershed` argument in `simpleSeg`.
Method | Description
----|:----:|:----:
"distance" | Performs watershedding on a distance map of the thresholded nuclei signal. With a pixels distance being defined as the distance from the closest background signal.
"intensity" | Performs watershedding using the intensity of the nuclei marker.
"combine" | Combines the previous two methods by multiplying the distance map by the nuclei marker intensity.
## Methods of cell body identification
The cell body can also be identified in `simpleSeg` using models of varying complexity, specified with the `cellBody` argument.
Method | Description
----|:----:|:----:
"dilation" | Dilates the nuclei by an amount defined by the user. The size of the dilatation in pixels may be specified with the `discDize` argument.
"discModel" | Uses all the markers to predict the presence of dilated 'discs' around the nuclei. The model therefore learns which markers are typically present in the cell cytoplasm and generates a mask based on this.
"marker" | The user may specify one or multiple dedicated cytoplasm markers to predict the cytoplasm. This can be done using `cellBody = "marker name"/"index"`
"None" | The nuclei mask is returned directly.
## Parallel Processing
`simpleSeg` also supports parallel processing, with the `cores` argument being used to specify how many cores should be used.
```{r parallel example}
masks <- simpleSeg::simpleSeg(images,
nucleus = "HH3",
cores = 1)
```
# Summarise cell features
In order to characterise the phenotypes of each of the segmented cells, `measureObjects` from `cytomapper` will calculate the average intensity of each channel within each cell as well as a few morphological features. The channel intensities will be stored in the `counts assay` in a `SingleCellExperiment`. Information on the spatial location of each cell is stored in `colData` in the `m.cx` and `m.cy` columns. In addition to this, it will propagate the information we have store in the `mcols` of our `CytoImageList` in the `colData` of the resulting `SingleCellExperiment`.
```{r, out.width = "400px"}
cellSCE <- cytomapper::measureObjects(masks, images, img_id = "imageID")
```
# Normalising cells
Once cellular features have been extracted into a SingleCellExperement or dataframe, these features may then be normalised using the `normalizeCells`function, transformed by any number of transformations (e.g., `asinh`, `sqrt`) and normalisation methods.
`mean`(Divides the marker cellular marker intensities by their mean), `minMax` (Subtracts the minimum value and scales markers between 0 and 1.), `trim99` (Sets the highest 1% of values to the value of the 99th percentile.), `PC1` (Removes the 1st principal component) can be performed with one call of the function, in the order specified by the user.
Method | Description
----|:----:|:----:
"mean" | Divides the marker cellular marker intensities by their mean.
"minMax" | Subtracts the minimum value and scales markers between 0 and 1.
"trim99" | Sets the highest 1% of values to the value of the 99th percentile.`
"PC1" | Removes the 1st principal component) can be performed with one call of the function, in the order specified by the user.
```{r}
# Transform and normalise the marker expression of each cell type.
# Use a square root transform, then trimmed the 99 quantile
cellSCE <- normalizeCells(cellSCE,
assayIn = "counts",
assayOut = "norm",
imageID = "imageID",
transformation = "sqrt",
method = c("trim99", "minMax"))
```
## QC normalisation
We could check to see if the marker intensities of each cell require some form of transformation or normalisation. Here we extract the intensities from the `counts` assay. Looking at PanKRT which should be expressed in the majority of the tumour cells, the intensities are clearly very skewed.
```{r, fig.width=5, fig.height=5}
# Extract marker data and bind with information about images
df <- as.data.frame(cbind(colData(cellSCE), t(assay(cellSCE, "counts"))))
# Plots densities of PanKRT for each image.
ggplot(df, aes(x = PanKRT, colour = imageID)) +
geom_density() +
labs(x = "PanKRT expression") +
theme_minimal()
```
We can see that the normalised data stored in the norm assay appears more bimodal, not perfect, but likely sufficient for clustering.
```{r, fig.width=5, fig.height=5}
# Extract normalised marker information.
df <- as.data.frame(cbind(colData(cellSCE), t(assay(cellSCE, "norm"))))
# Plots densities of normalised PanKRT for each image.
ggplot(df, aes(x = PanKRT, colour = imageID)) +
geom_density() +
labs(x = "PanKRT expression") +
theme_minimal()
```
## Session Info
```{r}
sessionInfo()
```