imcRtools 1.2.3
This vignette gives an introduction to handling and analyzing imaging mass
cytometry (IMC) and other highly-multiplexed imaging data in R. The imcRtools
package relies on expression and morphological features extracted from
multi-channel images using corresponding segmentation masks. A description
of data types and segmentation approaches can be found below
(data types, segmentation). However, due to
shared data structures, the functionalities of the imcRtools
package are
applicable to most highly multiplexed imaging modalities.
The imcRtools
package exports functions and example data to perform the
following analyses:
To highlight the usability of these function, the imcRtools
package also
exports a number of test data files.
Highly multiplexed imaging techniques (Bodenmiller 2016) such as imaging mass
cytometry (IMC) (Giesen et al. 2014), multiplexed ion beam imaging (MIBI) (Angelo et al. 2014)
and cyclic immunofluorescence techniques (Lin et al. 2018, @Gut2018)
acquire read-outs of the expression of tens of protein in a spatially resolved
manner. After technology-dependent data pre-processing, the raw data files are
comparable: multi-channel images for the dimensions x
, y
, and c
, where x
and y
define the number of pixels (x * y
) per image and c
the number
of proteins (also refered to as “markers”) measured per image. More info on the
data types and further pre-processing can be
found below.
Increased multiplexity compared to epitope-based techniques is achieved using single-cell resolved spatial transcriptomics techniques including MERFISH (Chen et al. 2015) and seqFISH (Lubeck et al. 2014). However, data produced by these techniques required different pre-processing steps. Nevertheless, analysis performed on the single-cell level is equally applicable.
IMC (Giesen et al. 2014) is a highly multiplexed imaging approach to measure spatial protein abundance. In IMC, tissue sections on slides are stained with a mix of around 40 metal-conjugated antibodies prior to laser ablation with \(1\mu{}m\) resolution. The ablated material is transferred to a mass cytometer for time-of-flight detection of the metal ions (Giesen et al. 2014). At an ablation frequency of 200Hz, a 1mm x 1mm region can be acquired within 1.5 hours.
In the case of IMC, the raw data output are .mcd files containing all acquired regions per slide. During image pre-prcoessing these files are converted into individual multi-channel .tiff and OME-TIFF files. These file formats are supported by most open-source and commercial image analysis software, such as Fiji, QuPath or napari.
In R, these .tiff files can be read into individual Image objects or combined into a CytoImageList object supported by the cytomapper package.
The pixel resolution of most highly multiplexed imaging technologies including IMC support the detection of cellular structures. Therefore, a common processing step using multi-channel images is object segmentation. In this vignette, objects are defined as cells; however, also larger scale structures could be segmented.
There are multiple different image segmentation approaches available. However,
imcRtools
only supports direct reader functions for two segmentation strategies
developed for highly multiplexed imaging technologies:
The ImcSegmentationPipeline has been developed to give the user flexibility in how to perform channel selection and image segmentation. The underlying principle is to train a pixel classifier (using ilastik) on a number of selected channels to perform probabilistic classification of each pixel into: background, cytoplasm and nucleus. Based on these classification probabilities a CellProfiler pipeline performs cell segmentation and feature extraction.
A containerized version of this pipeline is implemented in
steinbock. steinbock
further
supports segmentation via the use of Mesmer
from the DeepCell
library (Greenwald et al. 2021).
While the output of both approaches is structured differently, the exported features are comparable:
By combining these with availabel channel information, the data can be read into a SpatialExperiment or SingleCellExperiment object (see below).
The imcRtools
package contains a number of example data generated by the
Hyperion imaging system for different purposes. The following section gives an
overview of these files.
To highlight the use of the imcRtools
package for spillover correction,
we provide four .txt files containing pixel intensities of four spotted metals.
Please refer to the spillover correction section for full details.
These files are accessible via:
path <- system.file("extdata/spillover", package = "imcRtools")
list.files(path, recursive = TRUE)
## [1] "Dy161.txt" "Dy162.txt" "Dy163.txt" "Dy164.txt"
IMC generates .mcd files storing the raw data or all acquired regions of interest (ROI). In addition, the raw pixel values are also stored in individual .txt files per ROI.
To highlight reading in raw data in form of .txt files, the imcRtools
contains
3 sample acquisitions:
txt_files <- list.files(system.file("extdata/mockData/raw",
package = "imcRtools"))
txt_files
## [1] "20210305_NE_mockData2_ROI_001_1.txt" "20210305_NE_mockData2_ROI_002_2.txt"
## [3] "20210305_NE_mockData2_ROI_003_3.txt"
IMC data preprocessing and segmentation can be performed using the
ImcSegmentationPipeline.
It generates a number of .csv
files containing object/cell-specific
and image-specific metadata.
The imcRtools
package exports the read_cpout
function as convenient reader
function for outputs generated by the ImcSegmentationPipeline
. For
demonstration purposes, imcRtools
contains the output of running the pipeline
on a small example dataset:
path <- system.file("extdata/mockData/cpout", package = "imcRtools")
list.files(path, recursive = TRUE)
## [1] "Experiment.csv" "Image.csv"
## [3] "Object_relationships.csv" "cell.csv"
## [5] "panel.csv" "var_Image.csv"
## [7] "var_cell.csv"
The steinbock pipeline can be used to process, segment and extract features from IMC data. For more information, please refer to its documentation.
To highlight the functionality of imcRtools
to read in single-cell data
generated by steinbock
, we provide a small toy dataset available at:
path <- system.file("extdata/mockData/steinbock", package = "imcRtools")
list.files(path, recursive = TRUE)
## [1] "cells.csv"
## [2] "images.csv"
## [3] "intensities/20210305_NE_mockData1_1.csv"
## [4] "intensities/20210305_NE_mockData1_2.csv"
## [5] "intensities/20210305_NE_mockData1_3.csv"
## [6] "intensities/20210305_NE_mockData2_1.csv"
## [7] "intensities/20210305_NE_mockData2_2.csv"
## [8] "intensities/20210305_NE_mockData2_3.csv"
## [9] "intensities/20210305_NE_mockData3_1.csv"
## [10] "intensities/20210305_NE_mockData3_2.csv"
## [11] "intensities/20210305_NE_mockData3_3.csv"
## [12] "intensities/20210305_NE_mockData4_1.csv"
## [13] "intensities/20210305_NE_mockData4_2.csv"
## [14] "intensities/20210305_NE_mockData4_3.csv"
## [15] "intensities/20210305_NE_mockData5_1.csv"
## [16] "intensities/20210305_NE_mockData5_2.csv"
## [17] "intensities/20210305_NE_mockData5_3.csv"
## [18] "neighbors/20210305_NE_mockData1_1.csv"
## [19] "neighbors/20210305_NE_mockData1_2.csv"
## [20] "neighbors/20210305_NE_mockData1_3.csv"
## [21] "neighbors/20210305_NE_mockData2_1.csv"
## [22] "neighbors/20210305_NE_mockData2_2.csv"
## [23] "neighbors/20210305_NE_mockData2_3.csv"
## [24] "neighbors/20210305_NE_mockData3_1.csv"
## [25] "neighbors/20210305_NE_mockData3_2.csv"
## [26] "neighbors/20210305_NE_mockData3_3.csv"
## [27] "neighbors/20210305_NE_mockData4_1.csv"
## [28] "neighbors/20210305_NE_mockData4_2.csv"
## [29] "neighbors/20210305_NE_mockData4_3.csv"
## [30] "neighbors/20210305_NE_mockData5_1.csv"
## [31] "neighbors/20210305_NE_mockData5_2.csv"
## [32] "neighbors/20210305_NE_mockData5_3.csv"
## [33] "panel.csv"
## [34] "regionprops/20210305_NE_mockData1_1.csv"
## [35] "regionprops/20210305_NE_mockData1_2.csv"
## [36] "regionprops/20210305_NE_mockData1_3.csv"
## [37] "regionprops/20210305_NE_mockData2_1.csv"
## [38] "regionprops/20210305_NE_mockData2_2.csv"
## [39] "regionprops/20210305_NE_mockData2_3.csv"
## [40] "regionprops/20210305_NE_mockData3_1.csv"
## [41] "regionprops/20210305_NE_mockData3_2.csv"
## [42] "regionprops/20210305_NE_mockData3_3.csv"
## [43] "regionprops/20210305_NE_mockData4_1.csv"
## [44] "regionprops/20210305_NE_mockData4_2.csv"
## [45] "regionprops/20210305_NE_mockData4_3.csv"
## [46] "regionprops/20210305_NE_mockData5_1.csv"
## [47] "regionprops/20210305_NE_mockData5_2.csv"
## [48] "regionprops/20210305_NE_mockData5_3.csv"
## [49] "steinbock.sh"
The example data related to the ImcSegmentationPipeline
and steinbock
can also be accessed online.
The imcRtools
package supports reading in data generated by the
ImcSegmentationPipeline
or steinbock pipeline.
To read in the outpout data into a SpatialExperiment
or SingleCellExperiment, the imcRtools
package
exports the read_cpout
function.
By default, the single-cell data is read into a
SpatialExperiment object. Here, the extracted channel-
and cell-specific intensities are stored in the counts(spe)
slot. All
morphological features are stored in colData(spe)
and the spatial locations of
the cells are stored in spatialCoords(spe)
. The interaction graph is stored in
colPair(spe, "neighbourhood")
.
Alternatively, the data can be read into a
SingleCellExperiment object. The only difference is
the lack of spatialCoords(sce)
. Here, the spatial coordinates are stored in colData(spe)$Pos_X
and colData(spe)$Pos_Y
.
The ImcSegmentationPipeline
produces a number of output files. By default, all single-cell features are
measured and exported. To browse and select the features of interest, the
imcRtools
package provides the show_cpout_features
function:
path <- system.file("extdata/mockData/cpout", package = "imcRtools")
show_cpout_features(path)
By default, read_cpout
will read in the mean intensity per channel and cell
from “hot pixel” filtered image stacks specified via
intensities = "Intensity_MeanIntensity_FullStackFiltered"
. Please refer to
?read_cpout
for the full documentation.
cur_path <- system.file("extdata/mockData/cpout", package = "imcRtools")
# Read as SpatialExperiment
(spe <- read_cpout(cur_path, graph_file = "Object_relationships.csv"))
## class: SpatialExperiment
## dim: 5 209
## metadata(0):
## assays(1): counts
## rownames(5): Sm147 Yb172 Pr141 Eu153 Ag107
## rowData names(5): Tube.Number Metal.Tag Target ilastik full
## colnames: NULL
## colData names(11): sample_id ObjectNumber ... Metadata_acid
## Metadata_description
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : Pos_X Pos_Y
## imgData names(1): sample_id
# Read as SingleCellExperiment
(sce <- read_cpout(cur_path, graph_file = "Object_relationships.csv",
return_as = "sce"))
## class: SingleCellExperiment
## dim: 5 209
## metadata(0):
## assays(1): counts
## rownames(5): Sm147 Yb172 Pr141 Eu153 Ag107
## rowData names(5): Tube.Number Metal.Tag Target ilastik full
## colnames: NULL
## colData names(13): sample_id ObjectNumber ... Metadata_acid
## Metadata_description
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Single-cell data and all associated metadata (e.g. spatial location, morphology
and interaction graphs) as produced by the
steinbock pipeline can be read
in using the read_steinbock
function:
cur_path <- system.file("extdata/mockData/steinbock", package = "imcRtools")
# Read as SpatialExperiment
(spe <- read_steinbock(cur_path))
## class: SpatialExperiment
## dim: 5 218
## metadata(0):
## assays(1): counts
## rownames(5): Ag107 Cytokeratin 5 Laminin YBX1 H3K27Ac
## rowData names(6): channel name ... deepcell Tube.Number
## colnames: NULL
## colData names(8): sample_id ObjectNumber ... width_px height_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : Pos_X Pos_Y
## imgData names(1): sample_id
# Read as SingleCellExperiment
(sce <- read_steinbock(cur_path, return_as = "sce"))
## class: SingleCellExperiment
## dim: 5 218
## metadata(0):
## assays(1): counts
## rownames(5): Ag107 Cytokeratin 5 Laminin YBX1 H3K27Ac
## rowData names(6): channel name ... deepcell Tube.Number
## colnames: NULL
## colData names(10): sample_id ObjectNumber ... width_px height_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
For more information, please refer to ?read_steinbock
.
For reading in and visualization of multi-channel images and segmentation masks,
please refer to the cytomapper package. The
imcRtools
package however supports reading in raw .txt files generated by the
Hyperion imaging system into a CytoImageList
object; a data container exported
by cytomapper
.
The user needs to provide a path from which all .txt files will be read in:
path <- system.file("extdata/mockData/raw", package = "imcRtools")
cur_CytoImageList <- readImagefromTXT(path)
cur_CytoImageList
## CytoImageList containing 3 image(s)
## names(3): 20210305_NE_mockData2_ROI_001_1 20210305_NE_mockData2_ROI_002_2 20210305_NE_mockData2_ROI_003_3
## Each image contains 5 channel(s)
## channelNames(5): Ag107Di Pr141Di Sm147Di Eu153Di Yb172Di
By specifying the pattern
argument, individual or a subset of files can be
read in. For more information, please refer to ?readImagefromTXT
.
When acquiring IMC images, pixel intensities can be influenced by spillover from neighboring channels. To correct for this, Chevrier et al. have developed a staining protocol to acquire individually spotted metal isotopes (Chevrier et al. 2017). Based on these measurements, spillover into neighboring channels can be quantified to correct pixel intensities.
The imcRtools
package provides helper functions that facilitate the correction
of spillover for IMC data. For a full tutorial, please refer to the
IMC data analysis book.
In the first step, the pixel intensities of individually spotted metals need to
be read into a SingleCellExperiment
container for downstream use with the
CATALYST package. For this, the readSCEfromTXT
function can be used:
path <- system.file("extdata/spillover", package = "imcRtools")
sce <- readSCEfromTXT(path)
## Spotted channels: Dy161, Dy162, Dy163, Dy164
## Acquired channels: Dy161, Dy162, Dy163, Dy164
## Channels spotted but not acquired:
## Channels acquired but not spotted:
sce
## class: SingleCellExperiment
## dim: 4 400
## metadata(0):
## assays(1): counts
## rownames(4): 161Dy(Dy161Di) 162Dy(Dy162Di) 163Dy(Dy163Di)
## 164Dy(Dy164Di)
## rowData names(2): channel_name marker_name
## colnames(400): Dy161.1 Dy161.2 ... Dy164.99 Dy164.100
## colData names(9): Start_push End_push ... sample_metal sample_mass
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Here, the example metal spot files are read in. The spot information are stored
in the colData(sce)
slot and channel information are stored in rowData(sce)
.
Each column represents a single pixel.
In the next step, it is crucial to identify potentially mislabeled spots or
spots with low pixel intensities. The imcRtools
package exports the
plotSpotHeatmap
function, which visualizes the aggregated (default median
)
pixel intensities per spot and per metal:
plotSpotHeatmap(sce)
Here, high median pixel intensities can be observed in each spot and their
corresponding channels (visualized on the log10
scale by default).
To quickly identify spot/channel combinations with low signal, the threshold
parameter can be set:
plotSpotHeatmap(sce, log = FALSE, threshold = 200)
If pixel intensities are low, spillover estimation might not be robust.
Therefore, the binAcrossPixels
function can be used to sum consecutive pixels
and enhance the acquired signal. This step is optional for spillover estimation.
sce2 <- binAcrossPixels(sce, bin_size = 5)
plotSpotHeatmap(sce2, log = FALSE, threshold = 200)
Prior to spillover estimation, the CATALYST package
provides the assignPrelim
, estCutoffs
and applyCutoffs
functions to
estimate the spotted mass for each pixel based on their channel intensities.
For more information on the spillover estimation and correction, please refer
to the
CATALYST vignette.
This estimation can be used to identify pixels that cannot be easily assigned
to their spotted mass, potentially indicating pixels with weak signal. To remove
these pixels, the filterPixels
function can be used. This function further
removes pixels assigned to masses, which only contain very few pixels.
library(CATALYST)
bc_key <- as.numeric(unique(sce$sample_mass))
assay(sce, "exprs") <- asinh(counts(sce)/5)
sce <- assignPrelim(sce, bc_key = bc_key)
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)
# Filter out mislabeled pixels
sce <- filterPixels(sce)
table(sce$bc_id, sce$sample_mass)
##
## 161 162 163 164
## 0 0 0 0 2
## 161 100 0 0 0
## 162 0 100 0 0
## 163 0 0 100 0
## 164 0 0 0 98
Finally, the pre-processed SiingleCellExperiment
object can be used to
generate the spillover matrix using the CATALYST::computeSpillmat
function:
sce <- computeSpillmat(sce)
metadata(sce)$spillover_matrix
## Dy161Di Dy162Di Dy163Di Dy164Di
## Dy161Di 1.000000000 0.031443129 0.009734712 0.006518048
## Dy162Di 0.015715159 1.000000000 0.048116187 0.008250039
## Dy163Di 0.003809504 0.012159704 1.000000000 0.020214651
## Dy164Di 0.005058069 0.008457546 0.028912343 1.000000000
This spillover matrix can be directly applied to compensate the summarized pixel intensities per cell and per channel as described here.
The following section will highlight functions for spatial analyses of the data.
When following the ImcSegmentationPipeline
or steinbock
and reading in the
data using the corresponding functions, the generated graphs are automatically
stored in the colPair(spe, "neighborhood")
slot. Alternatively, the
buildSpatialGraph
function in the imcRtools
package constructs interaction
graphs using either (i) cell-centroid expansion, (ii) k-nearest neighbor search
or (iii) delaunay triangulation.
library(cytomapper)
data("pancreasSCE")
pancreasSCE <- buildSpatialGraph(pancreasSCE, img_id = "ImageNb",
type = "expansion",
threshold = 20)
pancreasSCE <- buildSpatialGraph(pancreasSCE, img_id = "ImageNb",
type = "knn",
k = 5)
pancreasSCE <- buildSpatialGraph(pancreasSCE, img_id = "ImageNb",
type = "delaunay")
colPairNames(pancreasSCE)
## [1] "expansion_interaction_graph" "knn_interaction_graph"
## [3] "delaunay_interaction_graph"
When setting type = "knn"
, by default a directional graph will be build.
Setting directed = FALSE
will create bi-directional edges for each pair of
cells that are connected by at least one edge in the directed setting.
The cells’ locations and constructed graphs can be visualized using the
plotSpatial
function. Here, cells are referred to as “nodes” and cell-cell
interactions are referred to as “edges”. All visual attributes of the nodes
and edges can be set. Either by specifying a variable in colData(spe)
, a
marker name or a single entry using the *_fix
parameters.
library(ggplot2)
library(ggraph)
plotSpatial(pancreasSCE,
img_id = "ImageNb",
node_color_by = "CellType",
node_shape_by = "ImageNb",
node_size_by = "Area",
draw_edges = TRUE,
colPairName = "knn_interaction_graph",
directed = FALSE)
# Colored by expression and with arrows
plotSpatial(pancreasSCE,
img_id = "ImageNb",
node_color_by = "PIN",
assay_type = "exprs",
node_size_fix = 3,
edge_width_fix = 0.2,
draw_edges = TRUE,
colPairName = "knn_interaction_graph",
directed = TRUE,
arrow = grid::arrow(length = grid::unit(0.1, "inch")),
end_cap = ggraph::circle(0.05, "cm"))
# Subsetting the SingleCellExperiment
plotSpatial(pancreasSCE[,pancreasSCE$Pattern],
img_id = "ImageNb",
node_color_by = "CellType",
node_size_fix = 1,
draw_edges = TRUE,
colPairName = "knn_interaction_graph",
directed = TRUE,
scales = "fixed")
The returned object can be further modified using the ggplot2
logic. This
includes changing the node color, shape and size using scale_color_*
,
scale_shape_*
and scale_size_*
. Edge attributes can be altered using the
scale_edge_*
function exported by ggraph
,
The aggregateNeighbors
function can be used to aggregate features of all
neighboring cells for each individual cell. This function operates in two
settings.
metadata
: when aggregating by cell-specific metadata, the
function computes the relative frequencies of all entries to
colData(sce)[[count_by]]
within the direct neighborhood of each cell.expression
: the expression counts of neighboring cells are aggregated
using the specified statistic
(defaults to mean
).Each cell’s neighborhood is defined as endpoints of edges stored in
colPair(sce, colPairName)
.
pancreasSCE <- aggregateNeighbors(pancreasSCE,
colPairName = "knn_interaction_graph",
aggregate_by = "metadata",
count_by = "CellType")
head(pancreasSCE$aggregatedNeighbors)
## DataFrame with 6 rows and 3 columns
## celltype_A celltype_B celltype_C
## <numeric> <numeric> <numeric>
## 1 0 0.0 1.0
## 2 0 0.2 0.8
## 3 0 0.0 1.0
## 4 0 0.0 1.0
## 5 0 0.0 1.0
## 6 0 0.0 1.0
pancreasSCE <- aggregateNeighbors(pancreasSCE,
colPairName = "knn_interaction_graph",
aggregate_by = "expression",
assay_type = "exprs")
head(pancreasSCE$mean_aggregatedExpression)
## DataFrame with 6 rows and 5 columns
## H3 CD99 PIN CD8a CDH
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 2.32500 0.860329 0.092871 0.725000 2.51264
## 2 2.88022 1.629762 0.319527 0.207873 2.46486
## 3 3.10829 0.735389 0.190616 0.255515 1.89484
## 4 2.55842 0.773342 0.124545 0.188629 2.51084
## 5 2.44287 1.126240 0.252129 0.200261 2.61336
## 6 2.65059 0.903869 0.181792 0.196691 2.16434
The returned entries can now be used for clustering to group cells based on their environment (either by aggregated categorical features or expression).
cur_cluster <- kmeans(pancreasSCE$aggregatedNeighbors, centers = 3)
pancreasSCE$clustered_neighbors <- factor(cur_cluster$cluster)
plotSpatial(pancreasSCE,
img_id = "ImageNb",
node_color_by = "CellType",
node_size_fix = 4,
edge_width_fix = 2,
edge_color_by = "clustered_neighbors",
draw_edges = TRUE,
colPairName = "knn_interaction_graph",
directed = FALSE,
nodes_first = FALSE) +
scale_color_brewer(palette = "Set2") +
scale_edge_color_brewer(palette = "Set1")
To exclude cells that are close to the image border, the imcRtools
package
exports the findBorderCells
function.
pancreasSCE <- findBorderCells(pancreasSCE,
img_id = "ImageNb",
border_dist = 10)
plotSpatial(pancreasSCE[,!pancreasSCE$border_cells],
img_id = "ImageNb",
node_color_by = "CellType",
node_size_fix = 4,
edge_width_fix = 2,
edge_color_by = "clustered_neighbors",
draw_edges = TRUE,
colPairName = "knn_interaction_graph",
directed = FALSE,
nodes_first = FALSE) +
scale_color_brewer(palette = "Set2") +
scale_edge_color_brewer(palette = "Set1")
Excluding border cells can be useful when incorrectly connected cells are observed at image borders.
An alternative and supervised way of detecting regions with similar types of
cells is available via the patchDetection
function. Here, the user defines
which cells should be used for patch detection via the patch_cells
parameter.
A patch is defined as a set of cells as defined by patch_cells
which are
weakly connected in the graph in colPair(object, colPairname)
.
Below, the patchDetection
function is demonstrated using the previously
computed expansion
graph and defining cells of celltype_B
as the cells of
interest. Here, the function additionally draws a concave hull around the
detected patch, expands the hull by 20\(\mu{}m\) and defines all cells within this
expanded hulls as patch cells.
pancreasSCE <- patchDetection(pancreasSCE,
patch_cells = pancreasSCE$CellType == "celltype_B",
colPairName = "expansion_interaction_graph",
expand_by = 20,
img_id = "ImageNb")
plotSpatial(pancreasSCE, img_id = "ImageNb", node_color_by = "patch_id")
Patches that only consist of 1 or 2 cells cannot be expanded.
The following section describes how to observe and test the average number of interactions between cell labels (e.g. cell-types) within grouping levels (e.g. images). For full descriptions of the testing approaches, please refer to Shapiro et al., Nature Methods (Schapiro et al. 2017) and Schulz et al., Cell Systems (Schulz et al. 2018)
The imcRtools
package exports the countInteractions
and
testInteractions
functions, which summarize all cell-cell interactions per
grouping level (e.g. image). As a result, a table is returned where each row
represents one of all possible cell-type/cell-type interactions among all
grouping levels. Missing entries or NA
s indicate missing cell-type labels for
this grouping level. The next section gives details on how interactions are
summarized.
The countInteractions
function counts the number of edges (interactions)
between each set of unique cell labels per grouping level.
Simplified, it counts for each cell of type A the number of neighbors of type B.
This count is averaged within each unique grouping level (e.g. image)
in three different ways:
method = "classic"
: The count is divided by the total number of
cells of type A. The final count can be interpreted as “How many neighbors
of type B does a cell of type A have on average?”
method = "histocat"
: The count is divided by the number of cells
of type A that have at least one neighbor of type B. The final count can be
interpreted as “How many many neighbors of type B has a cell of type A on
average, given it has at least one neighbor of type B?”
method = "patch"
: For each cell, the count is binarized to 0
(less than patch_size
neighbors of type B) or 1 (more or equal to
patch_size
neighbors of type B). The binarized counts are averaged
across all cells of type A. The final count can be interpreted as “What
fraction of cells of type A have at least a given number of neighbors of
type B?”
The countInteractions
returns a DataFrame
containing the summarized
counts (ct
) for all combinations of from_label
, to_label
and group_by
.
out <- countInteractions(pancreasSCE,
group_by = "ImageNb",
label = "CellType",
method = "classic",
colPairName = "knn_interaction_graph")
out
## DataFrame with 27 rows and 4 columns
## group_by from_label to_label ct
## <integer> <factor> <factor> <numeric>
## 1 1 celltype_A celltype_A 2.823529
## 2 1 celltype_A celltype_B 0.823529
## 3 1 celltype_A celltype_C 1.352941
## 4 1 celltype_B celltype_A 2.000000
## 5 1 celltype_B celltype_B 0.625000
## ... ... ... ... ...
## 23 3 celltype_B celltype_B 4.00000
## 24 3 celltype_B celltype_C 1.00000
## 25 3 celltype_C celltype_A NA
## 26 3 celltype_C celltype_B 1.13115
## 27 3 celltype_C celltype_C 3.86885
In the next instance, one can test if the obtained count is larger or smaller
compared to what is expected from a random distribution of cell labels. For
this, the testInteractions
function permutes the cell labels iter
times
and counts interactions as described above. This approach generates a
distribution of the interaction count under a random distribution of cell
labels. The observed interaction count is compared against this Null
distribution to derive empirical p-values:
p_gt
: fraction of perturbations equal or greater than the observed count
p_lt
: fraction of perturbations equal or less than the observed count
Based on these empirical p-values, the interaction
score (attraction
or avoidance), overall p
value and significance by comparison to
p_treshold
(sig
and sigval
) are derived. All results are returned in form
of a DataFrame
.
out <- testInteractions(pancreasSCE,
group_by = "ImageNb",
label = "CellType",
method = "classic",
colPairName = "knn_interaction_graph")
out
## DataFrame with 27 rows and 10 columns
## group_by from_label to_label ct p_gt p_lt
## <integer> <factor> <factor> <numeric> <numeric> <numeric>
## 1 1 celltype_A celltype_A 2.823529 0.000999001 1.000000000
## 2 1 celltype_A celltype_B 0.823529 0.000999001 1.000000000
## 3 1 celltype_A celltype_C 1.352941 1.000000000 0.000999001
## 4 1 celltype_B celltype_A 2.000000 0.000999001 1.000000000
## 5 1 celltype_B celltype_B 0.625000 0.133866134 0.921078921
## ... ... ... ... ... ... ...
## 23 3 celltype_B celltype_B 4.00000 0.000999001 1.000000000
## 24 3 celltype_B celltype_C 1.00000 1.000000000 0.000999001
## 25 3 celltype_C celltype_A NA NA NA
## 26 3 celltype_C celltype_B 1.13115 1.000000000 0.000999001
## 27 3 celltype_C celltype_C 3.86885 0.000999001 1.000000000
## interaction p sig sigval
## <logical> <numeric> <logical> <numeric>
## 1 TRUE 0.000999001 TRUE 1
## 2 TRUE 0.000999001 TRUE 1
## 3 FALSE 0.000999001 TRUE -1
## 4 TRUE 0.000999001 TRUE 1
## 5 TRUE 0.133866134 FALSE 0
## ... ... ... ... ...
## 23 TRUE 0.000999001 TRUE 1
## 24 FALSE 0.000999001 TRUE -1
## 25 NA NA NA NA
## 26 FALSE 0.000999001 TRUE -1
## 27 TRUE 0.000999001 TRUE 1
Large chunks of the code of the imcRtools
package is based on the work
of Vito Zanotelli. Vito has co-developed the
spillover correction approach and translated the interaction testing approach
developed by Denis Shapiro and
Jana Fischer into R (formerly
known as the neighbouRhood
R package). Jana has furthermore added the “patch” method for interaction
counting and testing. Tobias Hoch has
written the first version of reading in the ImcSegmentationPipeline
output
and the patchDetection
function.
Daniel Schulz has build the aggregateNeighbors
function and contributed to developing the spatial clustering approach.
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggraph_2.0.5 ggplot2_3.3.6
## [3] cytomapper_1.8.0 EBImage_4.38.0
## [5] CATALYST_1.20.1 imcRtools_1.2.3
## [7] SpatialExperiment_1.6.0 SingleCellExperiment_1.18.0
## [9] SummarizedExperiment_1.26.1 Biobase_2.56.0
## [11] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
## [13] IRanges_2.30.0 S4Vectors_0.34.0
## [15] BiocGenerics_0.42.0 MatrixGenerics_1.8.0
## [17] matrixStats_0.62.0 BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] scattermore_0.8 flowWorkspace_4.8.0
## [3] R.methodsS3_1.8.1 tidyr_1.2.0
## [5] bit64_4.0.5 knitr_1.39
## [7] irlba_2.3.5 multcomp_1.4-19
## [9] DelayedArray_0.22.0 R.utils_2.11.0
## [11] data.table_1.14.2 RCurl_1.98-1.6
## [13] doParallel_1.0.17 generics_0.1.2
## [15] flowCore_2.8.0 ScaledMatrix_1.4.0
## [17] TH.data_1.1-1 terra_1.5-21
## [19] cowplot_1.1.1 proxy_0.4-26
## [21] ggpointdensity_0.1.0 bit_4.0.4
## [23] tzdb_0.3.0 xml2_1.3.3
## [25] httpuv_1.6.5 assertthat_0.2.1
## [27] viridis_0.6.2 xfun_0.31
## [29] hms_1.1.1 jquerylib_0.1.4
## [31] evaluate_0.15 promises_1.2.0.1
## [33] fansi_1.0.3 Rgraphviz_2.40.0
## [35] igraph_1.3.1 DBI_1.1.2
## [37] htmlwidgets_1.5.4 purrr_0.3.4
## [39] ellipsis_0.3.2 crosstalk_1.2.0
## [41] dplyr_1.0.9 ggcyto_1.24.0
## [43] ggnewscale_0.4.7 ggpubr_0.4.0
## [45] backports_1.4.1 V8_4.2.0
## [47] cytolib_2.8.0 bookdown_0.26
## [49] svgPanZoom_0.3.4 RcppParallel_5.1.5
## [51] sparseMatrixStats_1.8.0 vctrs_0.4.1
## [53] abind_1.4-5 withr_2.5.0
## [55] ggforce_0.3.3 aws.signature_0.6.0
## [57] vroom_1.5.7 svglite_2.1.0
## [59] cluster_2.1.3 crayon_1.5.1
## [61] drc_3.0-1 labeling_0.4.2
## [63] edgeR_3.38.1 pkgconfig_2.0.3
## [65] units_0.8-0 tweenr_1.0.2
## [67] vipor_0.4.5 rlang_1.0.2
## [69] lifecycle_1.0.1 sandwich_3.0-1
## [71] rsvd_1.0.5 polyclip_1.10-0
## [73] graph_1.74.0 tiff_0.1-11
## [75] Matrix_1.4-1 raster_3.5-15
## [77] carData_3.0-5 zoo_1.8-10
## [79] Rhdf5lib_1.18.2 base64enc_0.1-3
## [81] beeswarm_0.4.0 RTriangle_1.6-0.10
## [83] ggridges_0.5.3 GlobalOptions_0.1.2
## [85] pheatmap_1.0.12 png_0.1-7
## [87] viridisLite_0.4.0 rjson_0.2.21
## [89] bitops_1.0-7 shinydashboard_0.7.2
## [91] R.oo_1.24.0 ConsensusClusterPlus_1.60.0
## [93] KernSmooth_2.23-20 rhdf5filters_1.8.0
## [95] DelayedMatrixStats_1.18.0 shape_1.4.6
## [97] classInt_0.4-3 stringr_1.4.0
## [99] readr_2.1.2 jpeg_0.1-9
## [101] rstatix_0.7.0 ggsignif_0.6.3
## [103] aws.s3_0.3.21 beachmat_2.12.0
## [105] scales_1.2.0 magrittr_2.0.3
## [107] plyr_1.8.7 hexbin_1.28.2
## [109] zlibbioc_1.42.0 compiler_4.2.0
## [111] dqrng_0.3.0 concaveman_1.1.0
## [113] RColorBrewer_1.1-3 plotrix_3.8-2
## [115] clue_0.3-61 cli_3.3.0
## [117] XVector_0.36.0 ncdfFlow_2.42.0
## [119] FlowSOM_2.4.0 MASS_7.3-57
## [121] tidyselect_1.1.2 stringi_1.7.6
## [123] RProtoBufLib_2.8.0 highr_0.9
## [125] yaml_2.3.5 BiocSingular_1.12.0
## [127] locfit_1.5-9.5 latticeExtra_0.6-29
## [129] ggrepel_0.9.1 grid_4.2.0
## [131] sass_0.4.1 tools_4.2.0
## [133] parallel_4.2.0 CytoML_2.8.0
## [135] circlize_0.4.15 foreach_1.5.2
## [137] gridExtra_2.3 farver_2.1.0
## [139] Rtsne_0.16 DropletUtils_1.16.0
## [141] digest_0.6.29 BiocManager_1.30.18
## [143] shiny_1.7.1 Rcpp_1.0.8.3
## [145] car_3.0-13 broom_0.8.0
## [147] scuttle_1.6.2 later_1.3.0
## [149] httr_1.4.3 sf_1.0-7
## [151] ComplexHeatmap_2.12.0 colorspace_2.0-3
## [153] XML_3.99-0.9 splines_4.2.0
## [155] RBGL_1.72.0 scater_1.24.0
## [157] graphlayouts_0.8.0 sp_1.5-0
## [159] systemfonts_1.0.4 xtable_1.8-4
## [161] jsonlite_1.8.0 tidygraph_1.2.1
## [163] R6_2.5.1 pillar_1.7.0
## [165] htmltools_0.5.2 mime_0.12
## [167] nnls_1.4 glue_1.6.2
## [169] fastmap_1.1.0 DT_0.23
## [171] BiocParallel_1.30.3 BiocNeighbors_1.14.0
## [173] fftwtools_0.9-11 class_7.3-20
## [175] codetools_0.2-18 mvtnorm_1.1-3
## [177] utf8_1.2.2 lattice_0.20-45
## [179] bslib_0.3.1 tibble_3.1.7
## [181] curl_4.3.2 ggbeeswarm_0.6.0
## [183] colorRamps_2.3.1 gtools_3.9.2.1
## [185] magick_2.7.3 survival_3.3-1
## [187] limma_3.52.1 rmarkdown_2.14
## [189] munsell_0.5.0 e1071_1.7-9
## [191] GetoptLong_1.0.5 rhdf5_2.40.0
## [193] GenomeInfoDbData_1.2.8 iterators_1.0.14
## [195] HDF5Array_1.24.1 reshape2_1.4.4
## [197] gtable_0.3.0
Angelo, Michael, Sean C. Bendall, Rachel Finck, Matthew B. Hale, Chuck Hitzman, Alexander D. Borowsky, Richard M. Levenson, et al. 2014. “Multiplexed Ion Beam Imaging of Human Breast Tumors.” Nature Medicine 20 (4): 436–42.
Bodenmiller, Bernd. 2016. “Multiplexed Epitope-Based Tissue Imaging for Discovery and Healthcare Applications.” Cell Systems 2: 225–38.
Chen, Kok Hao, Alistair N. Boettiger, Jeffrey R. Moffitt, Siyuan Wang, and Xiaowei Zhuang. 2015. “Spatially Resolved, Highly Multiplexed Rna Profiling in Single Cells.” Science 348: aaa6090.
Chevrier, Stéphane, Helena L. Crowell, Vito R.T. Zanotelli, Stefanie Engler, Mark D. Robinson, and Bernd Bodenmiller. 2017. “Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry.” Cell Systems 6: 612–20.
Giesen, Charlotte, Hao A.O. Wang, Denis Schapiro, Nevena Zivanovic, Andrea Jacobs, Bodo Hattendorf, Peter J. Schüffler, et al. 2014. “Highly Multiplexed Imaging of Tumor Tissues with Subcellular Resolution by Mass Cytometry.” Nature Methods 11 (4): 417–22.
Greenwald, Noah F, Geneva Miller, Erick Moen, Alex Kong, Adam Kagel, Christine Camacho, Brianna J Mcintosh, et al. 2021. “Whole-Cell Segmentation of Tissue Images with Human-Level Performance Using Large-Scale Data Annotation and Deep Learning.” bioRxiv, 1–29.
Gut, Gabriele, Markus D Herrmann, and Lucas Pelkmans. 2018. “Multiplexed Protein Maps Link Subcellular Organization to Cellular States.” Science 361: 1–13.
Lin, Jia-Ren, Benjamin Izar, Shu Wang, Clarence Yapp, Shaolin Mei, Parin M. Shah, Sandro Santagata, and Peter K. Sorger. 2018. “Highly Multiplexed Immunofluorescence Imaging of Human Tissues and Tumors Using T-Cycif and Conventional Optical Microscopes.” eLife 7: 1–46.
Lubeck, Eric, Ahmet F Coskun, Timur Zhiyentayev, Mubhij Ahmad, and Long Cai. 2014. “Single-Cell in Situ Rna Profiling by Sequential Hybridization.” Nature Methods 11: 360–61.
Schapiro, Denis, Hartland W Jackson, Swetha Raghuraman, Jana R Fischer, Vito RT Zanotelli, Daniel Schulz, Charlotte Giesen, Raúl Catena, Zsuzsanna Varga, and Bernd Bodenmiller. 2017. “HistoCAT: Analysis of Cell Phenotypes and Interactions in Multiplex Image Cytometry Data.” Nature Methods 14: 873–76.
Schulz, Daniel, Vito RT Zanotelli, Rana R Fischer, Denis Schapiro, Stefanie Engler, Xiao-Kang Lun, Hartland W Jackson, and Bernd Bodenmiller. 2018. “Simultaneous Multiplexed Imaging of mRNA and Proteins with Subcellular Resolution in Breast Cancer Tissue Samples by Mass Cytometry.” Cell Systems 6: 25–36.e5.