---
title: "Smoothclust Tutorial"
author: 
  - name: Lukas M. Weber
    affiliation: "Boston University"
package: smoothclust
output: 
  BiocStyle::html_document: 
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Smoothclust Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


# Introduction

`smoothclust` is a method for segmentation of spatial domains and spatially-aware clustering in spatial transcriptomics data. The method generates spatial domains with smooth boundaries by smoothing gene expression profiles across neighboring spatial locations, followed by unsupervised clustering. Spatial domains consisting of consistent mixtures of cell types may then be further investigated by applying cell type compositional analyses or differential analyses.


# Installation

The `smoothclust` package can be installed from Bioconductor as follows (using R version 4.4 and Bioconductor version 3.19, available from approximately 2024-05-01 onwards). Additional details are shown on the [Bioconductor](https://bioconductor.org/packages/smoothclust) package landing page.

```{r, eval=FALSE}
install.packages("BiocManager")
BiocManager::install("smoothclust")
```

The latest development version of the package can be installed from the [devel](https://contributions.bioconductor.org/use-devel.html) version of Bioconductor or from [GitHub](https://github.com/lmweber/smoothclust).


# Input data format

Input data can be provided either as a [SpatialExperiment](https://bioconductor.org/packages/SpatialExperiment) object within the Bioconductor framework, or as numeric matrices of expression values and spatial coordinates. See help file (`?smoothclust`) for details.

In the example workflow below, we assume the input is in `SpatialExperiment` format.


# Tutorial

The example workflow in this section demonstrates how to run `smoothclust` to generate spatial domains with smooth boundaries for a dataset from the 10x Genomics Visium platform.

```{r, message=FALSE}
library(smoothclust)
library(STexampleData)
library(scuttle)
library(scran)
library(scater)
library(ggspavis)
```


Load dataset from `STexampleData` package:

```{r}
# load data
spe <- Visium_humanDLPFC()

# keep spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]

dim(spe)
assayNames(spe)
```


Run `smoothclust` using default parameter settings, which have been selected to be appropriate for Visium data from human tissue.

The method for smoothing can be specified by providing the `method` argument, with available options `uniform`, `kernel`, and `knn`. Additional arguments can be used to set parameter values, including `bandwidth`, `k,` and `truncate`, depending on the choice of method. For more details, see the function documentation (`?smoothclust`) or the paper (in preparation).

```{r, results="hide"}
# run smoothclust
spe <- smoothclust(spe)
```


The smoothed expression counts are stored in a new assay named `counts_smooth`:

```{r}
# check output object
assayNames(spe)
```


Calculate log-transformed normalized counts (logcounts) on the smoothed expression counts. Here, the argument `assay.type = "counts_smooth"` specifies that we want to calculate logcounts using the smoothed counts from `smoothclust`.

```{r}
# calculate logcounts
spe <- logNormCounts(spe, assay.type = "counts_smooth")

assayNames(spe)
```


Run clustering. We use a standard clustering workflow from single-cell data, consisting of k-means clustering on the top principal components (PCs) calculated on the set of top highly variable genes (HVGs) with logcounts as the input.

We use a relatively small number of clusters for demonstration purposes in this example:

```{r}
# preprocessing steps for clustering

# remove mitochondrial genes
is_mito <- grepl("(^mt-)", rowData(spe)$gene_name, ignore.case = TRUE)
table(is_mito)
spe <- spe[!is_mito, ]
dim(spe)

# select top highly variable genes (HVGs)
dec <- modelGeneVar(spe)
top_hvgs <- getTopHVGs(dec, prop = 0.1)
length(top_hvgs)
spe <- spe[top_hvgs, ]
dim(spe)
```

```{r}
# dimensionality reduction

# compute PCA on top HVGs
set.seed(123)
spe <- runPCA(spe)
```

```{r}
# run k-means clustering
set.seed(123)
k <- 5
clust <- kmeans(reducedDim(spe, "PCA"), centers = k)$cluster
table(clust)
colLabels(spe) <- factor(clust)
```


Plot clusters / spatial domains generated by the smoothclust workflow:

```{r}
# color palettes
pal8 <- "libd_layer_colors"
pal36 <- unname(palette.colors(36, "Polychrome 36"))

# plot clusters / spatial domains
plotSpots(spe, annotate = "label", pal = pal8)
```


Plot manually annotated reference labels, which can be used to evaluate the performance of the clustering in this dataset:

```{r}
# plot reference labels
plotSpots(spe, annotate = "ground_truth", pal = pal8)
```


# Session information

```{r}
sessionInfo()
```