---
title: "Introduction to Clustering of Local Indicators of Spatial Association (LISA) curves"
date: "`r BiocStyle::doc_date()`"
author:
- name: Nicolas Canete
  affiliation:  
  - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
  email: nicolas.canete@sydney.edu.au
- name: Ellis Patrick
  affiliation:
  - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
  email: ellis.patrick@sydney.edu.au
- name: Alex Run Qin
  affiliation:
  - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
  email: alex.qin@sydney.edu.au
package: "`r BiocStyle::pkg_ver('lisaClust')`"
abstract: > 
  Identify and visualise regions of cell type colocalization in multiplexed imaging data that has been segmented at a single-cell resolution.
vignette: >
  %\VignetteIndexEntry{"Introduction to lisaClust"}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
output: 
  BiocStyle::html_document
bibliography: REFERENCES.bib
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, message = FALSE, warning = FALSE
)
library(BiocStyle)
```

# Installation

```{r, eval = FALSE}
if (!require("BiocManager")) {
  install.packages("BiocManager")
}
BiocManager::install("lisaClust")
```

```{r message=FALSE, warning=FALSE}
# load required packages
library(lisaClust)
library(spicyR)
library(ggplot2)
library(SingleCellExperiment)
library(SpatialDatasets)
```

# Overview

Clustering local indicators of spatial association (LISA) functions is a
methodology for identifying consistent spatial organisation of multiple
cell-types in an unsupervised way. This can be used to enable the
characterization of interactions between multiple cell-types
simultaneously and can complement traditional pairwise analysis. In our
implementation our LISA curves are a localised summary of an L-function
from a Poisson point process model. Our framework `lisaClust` can be
used to provide a high-level summary of cell-type colocalization in
high-parameter spatial cytometry data, facilitating the identification
of distinct tissue compartments or identification of complex cellular
microenvironments.
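
For readers unfamiliar with the L-function, the standard textbook
definitions are summarised below. This is a sketch of the general idea
only; the exact estimator, edge correction and localisation used inside
`lisaClust` may differ. For a point process with intensity $\lambda$,
Ripley's K-function and the L-function are

$$
K(r) = \frac{1}{\lambda}\,
\mathbb{E}\left[\text{number of further points within distance } r
\text{ of a typical point}\right],
\qquad
L(r) = \sqrt{\frac{K(r)}{\pi}}.
$$

Under a homogeneous Poisson process (complete spatial randomness),
$K(r) = \pi r^2$ and so $L(r) = r$. Values of $L(r)$ above $r$ suggest
clustering at scale $r$, and values below $r$ suggest dispersion. A LISA
curve localises this summary by evaluating it around each individual cell
rather than averaging over all cells.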

# Quick start

## Generate toy data

To illustrate our `lisaClust` framework, we consider a very simple
toy example where two cell-types are completely separated spatially. We
simulate data for two different images.

```{r eval=TRUE}
set.seed(51773)
x <- round(c(
  runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3,
  runif(200) + 3, runif(200) + 2, runif(200) + 1, runif(200)
), 4) * 100
y <- round(c(
  runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3,
  runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3
), 4) * 100
cellType <- factor(paste("c", rep(rep(c(1:2), rep(200, 2)), 4), sep = ""))
imageID <- rep(c("s1", "s2"), c(800, 800))

cells <- data.frame(x, y, cellType, imageID)

ggplot(cells, aes(x, y, colour = cellType)) +
  geom_point() +
  facet_wrap(~imageID) +
  theme_minimal()
```

## Create Single Cell Experiment object

First we store our data in a `SingleCellExperiment` object.

```{r}
SCE <- SingleCellExperiment(colData = cells)
SCE
```

## Running lisaClust

We can then use the convenience function `lisaClust` to simultaneously
calculate local indicators of spatial association (LISA) functions and
perform k-means clustering. The number of clusters is set with the `k`
argument; in the example below we choose `k = 2`, giving 2 clusters. The
cell type column is set with the `cellType` argument. By default,
`lisaClust` uses the column named `cellType`.

The clusters identified by `lisaClust` are stored in `colData` of the
`SingleCellExperiment` object as a new column called `region`.

```{r}
SCE <- lisaClust(SCE, k = 2)
colData(SCE) |> head()
```

## Plot identified regions

`lisaClust` also provides the convenient `hatchingPlot` function to
visualise the different regions that have been demarcated by the
clustering. `hatchingPlot` outputs a `ggplot` object where the regions
are marked by different hatching patterns. In a real biological dataset,
this allows us to plot both regions and cell-types on the same
visualization.

In the example below, we can visualise our simulated data where our 2
cell types have been separated neatly into 2 distinct regions based on
which cell type each region is dominated by. `region_2` is dominated by
the red cell type `c1`, and `region_1` is dominated by the blue cell
type `c2`.

```{r}
hatchingPlot(SCE, useImages = c("s1", "s2"))
```

## Using other clustering methods

While the `lisaClust` function is convenient, we have not implemented an exhaustive
suite of clustering methods as it is very easy to do this yourself. There are 
just two simple steps.

### Generate LISA curves

We can calculate local indicators of spatial association (LISA) functions 
using the `lisa` function. Here the LISA curves are a 
localised summary of an L-function from a Poisson point process model. The radii 
that will be calculated over can be set with `Rs`.

```{r}
lisaCurves <- lisa(SCE, Rs = c(20, 50, 100))

head(lisaCurves)
```
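
To give some intuition for what a localised L-function measures, the
chunk below is a deliberately naive sketch in base R. It is not the
estimator used by `lisa`: it ignores edge correction and any variance
stabilisation, and the names `toyCells`, `naiveLocalL` and `windowArea`
are hypothetical and introduced only for illustration. For each cell it
counts neighbours of each type within radius `r`, rescales by that type's
intensity, and subtracts `r`, the value expected under complete spatial
randomness.

```{r}
# Deterministic toy layout on a 200 x 100 window:
# c1 cells fill the left half, c2 cells fill the right half.
toyCells <- expand.grid(x = seq(5, 195, by = 10), y = seq(5, 95, by = 10))
toyCells$cellType <- ifelse(toyCells$x < 100, "c1", "c2")
windowArea <- 200 * 100

# Naive local L for every cell (rows) against every cell type (columns).
# Assumption: no edge correction, so values near the border are biased down.
naiveLocalL <- function(pts, r, area) {
  d <- as.matrix(dist(pts[, c("x", "y")]))
  diag(d) <- Inf # a cell is not its own neighbour
  types <- sort(unique(pts$cellType))
  sapply(types, function(type) {
    isType <- pts$cellType == type
    lambda <- sum(isType) / area # intensity of this cell type
    counts <- rowSums(d[, isType, drop = FALSE] <= r)
    sqrt(counts / (lambda * pi)) - r # centred so that 0 means "as expected"
  })
}

localL <- naiveLocalL(toyCells, r = 25, area = windowArea)

# Average local L of each cell type (rows) towards each cell type (columns)
aggregate(localL, by = list(cellType = toyCells$cellType), FUN = mean)
```

In this toy layout, cells have positive values towards their own type and
negative values towards the other type, which is the kind of contrast
that the clustering step picks up on.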

### Perform some clustering

The LISA curves can then be used to cluster the cells. Here we use k-means 
clustering, but other clustering methods, such as self-organising maps (SOM), could also be used. We can store these 
cell clusters or cell "regions" in our `SingleCellExperiment` object.

```{r}
# Custom clustering algorithm
kM <- kmeans(lisaCurves, 2)

# Storing clusters into colData
colData(SCE)$custom_region <- paste("region", kM$cluster, sep = "_")
colData(SCE) |> head()
```
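
Any method that clusters the rows of a numeric matrix can be swapped in
for k-means. As one possible alternative, the sketch below uses
hierarchical clustering from base R (`hclust` and `cutree`). To keep it
self-contained it runs on `standInCurves`, a hypothetical stand-in matrix
built here purely for illustration, with one row per cell and one column
per curve value; it is not output from `lisa`.

```{r}
# Hypothetical stand-in for a matrix of LISA curves:
# two groups of 50 cells with opposite curve profiles plus a small offset.
profileA <- c(3, 2, 1, -3, -2, -1)
offsets <- seq(-0.5, 0.5, length.out = 50)
standInCurves <- rbind(
  matrix(profileA, 50, 6, byrow = TRUE) + offsets,
  matrix(-profileA, 50, 6, byrow = TRUE) + offsets
)

# Ward hierarchical clustering on Euclidean distances, cut into 2 clusters
hc <- hclust(dist(standInCurves), method = "ward.D2")
hcRegion <- paste("region", cutree(hc, k = 2), sep = "_")

table(hcRegion)
```

With real data, `lisaCurves` would take the place of `standInCurves`, and
the resulting labels could be stored in `colData(SCE)` in the same way as
`custom_region` above.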


# Keren et al. breast cancer data

Next, we apply our `lisaClust` framework to two images of breast cancer obtained by @keren2018.

## Read in data

We will start by reading in the data from the `SpatialDatasets` package
as a `SingleCellExperiment` object. Here the data is in a format consistent with
CellProfiler output.

```{r}
kerenSPE <- SpatialDatasets::spe_Keren_2018()
```

## Generate LISA curves

This data includes annotation of the cell type of each cell. Hence, we can
move directly to performing k-means clustering on the local indicators of
spatial association (LISA) functions using the `lisaClust` function, remembering
to specify the `imageID`, `cellType`, and `spatialCoords` 
columns if they differ from the defaults. For the purpose of demonstration, we will be using
only images 5 and 6 of the `kerenSPE` dataset.

```{r}
kerenSPE <- kerenSPE[,kerenSPE$imageID %in% c("5", "6")]

kerenSPE <- lisaClust(kerenSPE,
  k = 5
)
```

These regions are stored in `colData` and can be extracted.

```{r}
colData(kerenSPE)[, c("imageID", "region")] |>
  head(20)
```

## Examine cell type enrichment

`lisaClust` also provides a convenient function, `regionMap`, for examining which 
cell types are located in which regions. In this example, we use this to check
which cell types appear more frequently in each region than expected by chance.

Here, we clearly see that healthy epithelial and mesenchymal tissue are highly
concentrated in region 1, immune cells are concentrated in regions 2 and 4, 
whilst tumour cells are concentrated in region 3.

We can further segregate these cells by increasing the number of clusters, i.e.,
increasing the parameter `k = ` in the `lisaClust()` function. For the purposes
of demonstration, let's take a look at the `hatchingPlot` of these regions.

```{r}
regionMap(kerenSPE,
  type = "bubble"
)
```
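
The idea of a cell type appearing in a region "more frequently than
expected by chance" can be illustrated with a simple observed-over-expected
ratio. The sketch below is an assumption about the general idea, not a
description of the statistic that `regionMap` computes. The vectors
`toyRegion` and `toyCellType` are hypothetical stand-ins for the `region`
and `cellType` columns of `colData`.

```{r}
# Hypothetical labels for 200 cells: region_1 is mostly c1, region_2 mostly c2
toyRegion <- rep(c("region_1", "region_2"), each = 100)
toyCellType <- c(rep("c1", 90), rep("c2", 10), rep("c1", 20), rep("c2", 80))

# Observed counts of each cell type in each region
observed <- table(toyRegion, toyCellType)

# Counts expected if region and cell type were independent
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Ratios above 1 indicate enrichment, ratios below 1 indicate depletion
enrichment <- observed / expected
round(enrichment, 2)
```

A ratio well above 1 for a region and cell type pair corresponds to the
kind of enrichment that a bubble plot makes visible at a glance.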

## Plot identified regions

Finally, we can use `hatchingPlot` to construct a `ggplot` object where
the regions are marked by different hatching patterns. This allows us to
visualize the 5 regions and 17 cell-types simultaneously.

```{r fig.height=7, fig.width=9}
hatchingPlot(kerenSPE, nbp = 300)
```

# References


# sessionInfo()

```{r}
sessionInfo()
```