---
title: "Introduction to spatialDE"
author: 
  - name: Davide Corso
    affiliation:
    - University of Padova
    email: davide.corso.2@phd.unipd.it
  - name: Milan Malfait
    affiliation:
    - Ghent University
    email: milan.malfait94@gmail.com
  - name: Lambda Moses
    affiliation:
    - California Institute of Technology
    email: dlu2@caltech.edu
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
    df_print: paged
date: "`r BiocStyle::doc_date()`"
vignette: >
  %\VignetteIndexEntry{Introduction to spatialDE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

[SpatialDE](https://github.com/Teichlab/SpatialDE) by [Svensson et al., 2018](https://www.nature.com/articles/nmeth.4636), is a method to identify spatially variable genes (SVGs) in spatially resolved transcriptomics data.


# Installation

You can install `r BiocStyle::Biocpkg("spatialDE")` from *Bioconductor* with the
following code:

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("spatialDE")
```


# Example: [Mouse Olfactory Bulb](https://github.com/Teichlab/SpatialDE/tree/master/Analysis/MouseOB)

Reproducing the
[example analysis](https://github.com/Teichlab/SpatialDE#spatialde-significance-test-example-use) from the original `r basilisk::PyPiLink("SpatialDE")` Python package.

```{r setup}
library(spatialDE)
library(ggplot2)
```


## Load data
Files originally retrieved from SpatialDE GitHub repository from the following links:
https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/data/Rep11_MOB_0.csv
https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/MOB_sample_info.csv

```{r load-data}
# Expression file used in python SpatialDE. 
data("Rep11_MOB_0")

# Sample Info file used in python SpatialDE
data("MOB_sample_info")
```

The `Rep11_MOB_0` object contains spatial expression data for
`r nrow(Rep11_MOB_0)` genes on `r ncol(Rep11_MOB_0)` spots, with genes as rows
and spots as columns.

```{r}
Rep11_MOB_0[1:5, 1:5]
dim(Rep11_MOB_0)
```

The `MOB_sample_info` object contains a `data.frame` with coordinates for each
spot.

```{r}
head(MOB_sample_info)
```

### Filter out pratically unobserved genes

```{r}
Rep11_MOB_0 <- Rep11_MOB_0[rowSums(Rep11_MOB_0) >= 3, ]
```

### Get total_counts for every spot

```{r}
Rep11_MOB_0 <- Rep11_MOB_0[, row.names(MOB_sample_info)]
MOB_sample_info$total_counts <- colSums(Rep11_MOB_0)
head(MOB_sample_info)
```

### Get coordinates from `MOB_sample_info`

```{r}
X <- MOB_sample_info[, c("x", "y")]
head(X)
```

## `stabilize`

The SpatialDE method assumes normally distributed data, so we stabilize the variance of the negative binomial distributed counts data using Anscombe's approximation.
The `stabilize()` function takes as input a `data.frame` of expression values with samples in columns and genes in rows. Thus, in this case, we have to transpose the data.

```{r stabilize}
norm_expr <- stabilize(Rep11_MOB_0)
norm_expr[1:5, 1:5]
```

## `regress_out`

Next, we account for differences in library size between the samples by regressing out the effect of the total counts for each gene using linear regression.

```{r regres_out}
resid_expr <- regress_out(norm_expr, sample_info = MOB_sample_info)
resid_expr[1:5, 1:5]
```

## `run`

To reduce running time, the SpatialDE test is run on a subset of 1000 genes.
Running it on the complete data set takes about 10 minutes.

```{r run-spatialDE}
# For this example, run spatialDE on the first 1000 genes
sample_resid_expr <- head(resid_expr, 1000)

results <- spatialDE::run(sample_resid_expr, coordinates = X)
head(results[order(results$qval), ])
```

## `model_search`

Finally, we can classify the DE genes to interpetable DE classes using the `model_search` function.
We apply the model search on filtered DE results, using a threshold of 0.05 for the Q-values.

```{r model_search}
de_results <- results[results$qval < 0.05, ]

ms_results <- model_search(
    sample_resid_expr,
    coordinates = X,
    de_results = de_results
)

# To show ms_results sorted on qvalue, uncomment the following line
# head(ms_results[order(ms_results$qval), ])

head(ms_results)
```

## `spatial_patterns`

<!-- TO DO: maybe add some more explanation of what the `n_patterns` and `length` parameters are? -->

Furthermore, we can group spatially variable genes (SVGs) into spatial patterns using automatic expression histology (AEH).

```{r spatial_patterns}
sp <- spatial_patterns(
    sample_resid_expr,
    coordinates = X,
    de_results = de_results,
    n_patterns = 4L, length = 1.5
)
sp$pattern_results
```

## Plots

Visualizing one of the most significant genes.

```{r fig1}
gene <- "Pcp4"

ggplot(data = MOB_sample_info, aes(x = x, y = y, color = norm_expr[gene, ])) +
    geom_point(size = 7) +
    ggtitle(gene) +
    scale_color_viridis_c() +
    labs(color = gene)
```

### Plot Spatial Patterns of Multiple Genes

As an alternative to the previous figure, we can plot multiple genes using the
normalized expression values.

```{r fig2, fig.height = 10, fig.width = 10}
ordered_de_results <- de_results[order(de_results$qval), ]

multiGenePlots(norm_expr,
    coordinates = X,
    ordered_de_results[1:6, ]$g,
    point_size = 4,
    viridis_option = "D",
    dark_theme = FALSE
)
```

## Plot Fraction Spatial Variance vs Q-value

```{r}
FSV_sig(results, ms_results)
```


# SpatialExperiment integration

The SpatialDE workflow can also be executed with a
`r BiocStyle::Biocpkg("SpatialExperiment")` object as input. 

```{r, message=FALSE}
library(SpatialExperiment)

# For SpatialExperiment object, we neeed to transpose the counts matrix in order
# to have genes on rows and spot on columns. 
# For this example, run spatialDE on the first 1000 genes

partial_counts <- head(Rep11_MOB_0, 1000)

spe <- SpatialExperiment(
  assays = list(counts = partial_counts),
  spatialData = DataFrame(MOB_sample_info[, c("x", "y")]),
  spatialCoordsNames = c("x", "y")
)

out <- spatialDE(spe, assay_type = "counts", verbose = FALSE)
head(out[order(out$qval), ])
```

## Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)

We can plot spatial patterns of multiples genes using the `spe` object.

```{r fig3, fig.height = 10, fig.width = 10}
spe_results <- out[out$qval < 0.05, ]

ordered_spe_results <- spe_results[order(spe_results$qval), ]

multiGenePlots(spe,
    assay_type = "counts",
    ordered_spe_results[1:6, ]$g,
    point_size = 4,
    viridis_option = "D",
    dark_theme = FALSE
)
```

## Classify spatially variable genes with `model_search` and `spatial_patterns`

```{r}
msearch <- modelSearch(spe,
    de_results = out, qval_thresh = 0.05,
    verbose = FALSE
)

head(msearch)
```

```{r}
spatterns <- spatialPatterns(spe,
    de_results = de_results, qval_thresh = 0.05,
    n_patterns = 4L, length = 1.5, verbose = FALSE
)

spatterns$pattern_results
```


# `sessionInfo` {-}

<details><summary>Session info</summary>

```{r session_info, echo=FALSE, cache=FALSE}
Sys.time()
sessionInfo()
```

</details>