--- title: "Overview Functional Data Analysis of Spatial Metrics" author: - name: "Martin Emons" affiliation: - &DMLS Department of Molecular Life Sciences, University of Zurich, Switzerland - &SIB SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland email: "martin.emons@uzh.ch" - name: Fabian Scheipl affiliation: - &LMU Department of Statistics, Ludwig-Maximilians-Universität München, Germany - name: Mark D. Robinson affiliation: - *DMLS - *SIB package: "`r BiocStyle::Biocpkg('spatialFDA')`" output: BiocStyle::html_document abstract: > `spatialFDA` is a package to calculate spatial statistics metrics and compare them with methods from functional data analysis. Here, we show how to perform a standard spatial analysis using `spatialFDA`. vignette: > %\VignetteIndexEntry{Overview Functional Data Analysis of Spatial Metrics} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} bibliography: spatialFDA.bib editor_options: chunk_output_type: console --- ```{r v1, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = TRUE ) ``` # Introduction This vignette serves as an overview how to use `r BiocStyle::Biocpkg('spatialFDA')` to perform functional data analysis on spatial statistics metrics. The main aim of this package is to detect differential spatial arrangements between cells in multi-sample/condition experiments. It combines the the estimation with the `r BiocStyle::CRANpkg('spatstat')` package with the inference of `r BiocStyle::CRANpkg('refund')` [@spatstat2005; @baddeleySpatialPointPatterns; @refund2024]. This vignette is a brief overview version of the "Detailed Functional Data Analysis of Spatial Metrics" vignette. The content and code is in parts overlapping. The use case is a dataset from @damondMapHumanType2019 which contains images from 12 human donors. The raw data is published under a `CC-BY-4.0` License on [Mendeley](https://data.mendeley.com/datasets/cydmwsfztj/2). # Installation `r BiocStyle::Biocpkg('spatialFDA')` can be installed and loaded from Bioconductor as follows ```{r installation, include = TRUE, eval = FALSE} if (!requireNamespace("BiocManager")) { install.packages("BiocManager") } BiocManager::install("spatialFDA") ``` ```{r setup, warning = FALSE, message = FALSE} library("spatialFDA") library("dplyr") library("ggplot2") library("tidyr") library("stringr") library("dplyr") library("patchwork") library("SpatialExperiment") set.seed(1234) ``` # Getting started In this vignette we will analyse a diabetes dataset acquired by imaging mass cytometry (IMC) as acquired by @damondMapHumanType2019. The dataset contains images from 12 human donors, 4 healthy and 8 with type 1 diabetes (T1D). With IMC, 35 markers were measured at single cell resolution [@damondMapHumanType2019]. ## Loading the data The @damondMapHumanType2019 dataset is easily loaded from `ExperimentHub` via a small reader function `.loadExample()`. The entire dataset can be loaded by setting `full = TRUE`. For computational reasons, one can reduce to three patients as well by setting this flag to `FALSE`. We will subset the entire dataset to two samples per condition in order to have a multi-condition/multi-sample setting. The package offers multiple datatypes, we will use the `r BiocStyle::Biocpkg('SpatialExperiment')` (SPE) object [@righelli2022spatialexperiment]. ```{r loading, warning = FALSE, message = FALSE} # retrieve example data from Damond et al. (2019) spe <- .loadExample(full = TRUE) spe <- subset(spe, ,patient_id %in% c(6089,6180,6126,6134,6228,6414)) # set cell types as factors colData(spe)$cell_type <- as.factor(colData(spe)$cell_type) ``` ## Visualising the raw data We can look at the fields of view (FOVs) of the diabetes dataset. To do so we extract the spatial coordinates, store them as a dataframe and add the colData from the SPE to this. We will look only at the first four FOVs of the healthy sample. We plot both the cell categories of all cells and then the cell types of secretory cells ($\alpha, \beta$ and $\delta$ cells) and T-cells (CD8+ and CD4+ T-cells). ```{r plotting fovs, warning = FALSE, fig.width=8, fig.height=15} df <- data.frame(spatialCoords(spe), colData(spe)) dfSub <- df %>% subset(image_name %in% c("E02", "E03", "E04", "E05")) p <- ggplot(dfSub, aes(x = cell_x, y = cell_y, color = cell_category)) + geom_point(size= 0.5) + facet_wrap(~image_name) + theme(legend.title.size = 20, legend.text.size = 20) + xlab("x") + ylab("y") + labs(color = "cell category")+ coord_equal() + theme_light() dfSub <- dfSub %>% subset(cell_type %in% c("alpha", "beta", "delta", "Th", "Tc")) q <- ggplot(dfSub, aes(x = cell_x, y = cell_y, color = cell_type)) + geom_point(size= 0.5) + facet_wrap(~image_name) + theme(legend.title.size = 20, legend.text.size = 20) + xlab("x") + ylab("y") + labs(color = "cell type") + coord_equal() + theme_light() wrap_plots(list(p,q), widths = c(1,1), heights = c(1,1), nrow = 2, ncol = 1) ``` # Spatial Inference `r BiocStyle::Biocpkg('spatialFDA')` consists of two main steps, the calculation of the spatial statistics function per individual image and the comparison of these functions via functional data analysis (FDA). `r BiocStyle::Biocpkg('spatialFDA')` contains the convenience function `spatialInference` which streamlines the estimation explained in the "Detailed Functional Data Analysis of Spatial Metrics" vignette. In order to perform the spatial inference, we have to pass the `r BiocStyle::Biocpkg('SpatialExperiment')` object, the cell types we want to analyse (`selection`), how we want to subset the data (`subsetby`), the spatial statistic function to compute (`fun`), which columns are the marks/cell types (`marks`), the radius domain to compute on (`rSeq`) and the edge correction to perform (`correction`). Furthermore, for the functional data analysis part we can provide the sample_id if there are replicates (`sample_id`, will lead to a mixed effects model), the transformation to apply to the output of the spatial statistics metric (here, we apply a square root transformation to stabilise the variance), the image ID (`image_id`) as well as the conditions (`conditions`) ```{r, spatialInference} colData(spe)[["patient_stage"]] <- factor(colData(spe)[["patient_stage"]]) #relevel to have non-diabetic as the reference category colData(spe)[["patient_stage"]] <- relevel(colData(spe)[["patient_stage"]], "Non-diabetic") #run the spatial statistics inference res <- spatialInference( spe, selection = c("alpha", "Tc"), subsetby = "image_number", fun = "Gcross", marks = "cell_type", rSeq = seq(0, 50, length.out = 50), correction = "rs", sample_id = "patient_id", transformation = 0.5, eps = 0, delta = 10, family = mgcv::scat(link = "log"), image_id = "image_number", condition = "patient_stage", ncores = 1 ) names(res) ``` The output is a list of three objects: The dataframe with the calculated spatial statistics curves from `r BiocStyle::CRANpkg('spatstat')`, the design matrix for the statistical inference, as well as the output of the `pffr` function from `r BiocStyle::CRANpkg('refund')` [@baddeleySpatialPointPatterns; @spatstat2005; @refund2024; @scheiplFunctionalAdditiveMixed2015]. ## Spatial Statistics Curves We can visualise the spatial statistics curves per image with the function `plotMetricPerFov`. Note that these curves are square-root transformed, since we added a transformation parameter above. ```{r, plotFunction} metricRes <- res$metricRes # create a unique plotting ID metricRes$ID <- paste0( metricRes$patient_stage, "|", metricRes$patient_id ) # change levels for plotting metricRes$ID <- factor(metricRes$ID, levels = c("Non-diabetic|6126", "Non-diabetic|6134", "Onset|6228","Onset|6414", "Long-duration|6089", "Long-duration|6180")) # plot metrics plotMetricPerFov(metricRes, correction = "rs", x = "r", imageId = "image_number", ID = "ID", ncol = 2) ``` We note that there is pronounced variability between the three conditions, between the patients as well as between individual images. ## Functional Boxplot In order to summarise the variability of the curves calculated above, we can use the functional boxplot. The functional boxplot is a generalisation of a standard boxplot, giving information about the median curve (black solid line), the 50% central region (area coloured in magenta), the minimum and maximum envelopes (blue solid lines) and outlier curves (red dashed lines). Here, the `fbplot` function from the `r BiocStyle::CRANpkg('fda')` package is used [@sun2011functional; @ramsay2024fda]. ```{r funcBoxPlot, warning = FALSE, results='hide'} # create a unique ID per row in the dataframe metricRes$ID <- paste0( metricRes$patient_stage, "x", metricRes$patient_id, "x", metricRes$image_number ) collector <- plotFbPlot(metricRes, "r", "rs", "patient_stage") ``` The functional boxplot provides a convenient summary of the spatial statistics curves. We see that the median curve for onset samples plateaus at a higher level and that non-diabetic and long-duration functional boxplots are in general very similar. There are some outlier curves but since they are not far off the envelopes, we do not filter them out. ## Functional GAM In the functional boxplot we got an overview of the variability of the spatial statistic curves and could describe differences qualitatively. In order to test these differences, we use inference via functional data analysis. The functional general additive model (GAM) provides a way to perform statistical inference on the spatial statistics functions and is implemented in the function `pffr` from `r BiocStyle::CRANpkg('refund')` [@scheiplFunctionalAdditiveMixed2015; @scheiplGeneralizedFunctionalAdditive2016; @refund2024]. ```{r, funcGam} mdl <- res$mdl mm <- res$designmat summary(mdl) ``` The summary of the functional GAM provides information on the constant parameters and the functional coefficients. For the functional coefficients, an $F$-test over the entire domain is performed which answers the question if there is any difference between the reference curves (here "Non-diabetic") and the conditions over the entire domain. We note that there is a small non-significant difference in the $G$ function between non-diabetic and long-duration T1D samples, but a strong difference between non-diabetic and onset T1D according to the model summary. As the functional coefficients are functions of the domain $r$ themselves, we can plot them too ```{r, plotFuncGam} plotLs <- lapply(colnames(mm), plotMdl, mdl = mdl, shift = mdl$coefficients[["(Intercept)"]]) wrap_plots(plotLs, nrow = 3, axes = 'collect') ``` From the functional coefficients we see that the difference betwenn Non-diabetic and Onset diabetic curves is at short distances and that the difference becomes less pronounced at higher distances. This means that the differential co-localisation between cytotoxic T-cells and alpha cells in the islet happens at shorter distances. The point wise confidence bands are a limitation of this method and could be improved with either bootstrapping or continuous confidence bands [@liebl2023fast]. ## Model evaluation The functional GAMs provide coefficients that we could plot above. In order to judge the confidence we can have in these coefficients, we look to quantify the model fit. First, we look at the correlation and check the qq plot of the model residuals. ```{r contour, warning = FALSE} resid(mdl) |> cor() |> filled.contour(levels = seq(-1, 1, l = 40)) resid(mdl) |> cov() |> filled.contour() qqnorm(resid(mdl), pch = 16) qqline(resid(mdl)) ``` In these model diagnostics, we note that there is still some variability in the residuals that is not considered by the model. The Q-Q plot indicates a good but not perfect model fit. The residuals show a considerable structure that is in line with the structure in the auto-covariance / correlation plots. ```{r sessionInfo} sessionInfo() ```