--- title: "mistyR and SpatialExperiment/SingleCellExperiment" author: - name: Ricardo Omar Ramirez Flores affiliation: - Heidelberg University and Heidelberg University Hospital, Heidelberg, Germany - name: Jovan Tanevski affiliation: - Heidelberg University and Heidelberg University Hospital, Heidelberg, Germany - Jožef Stefan Institute, Ljubljana, Slovenia email: jovan.tanevski@uni-heidelberg.de date: "`r Sys.Date()`" package: mistyR output: rmarkdown::pdf_document: df_print: kable extra_dependencies: nowidow: ["defaultlines=3", "all"] vignette: > %\VignetteIndexEntry{mistyR and SpatialExperiment} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction `r Githubpkg("saezlab/mistyR")` can be used to analyze spatial omics data sets stored in `r Biocpkg("SpatialExperiment")` object with just a couple of functions. In this vignette we demonstrate how to build a user friendly workflow starting from data preprocessing, through running `r Githubpkg("saezlab/mistyR")`, to analysis of results, i.e., the spatial interactions between markers stored in alternative assays and specific locations. The functions provided in this notebook can be adapted to the user preference but the main objective is to exploit as much as possible the flexibility of workflow creation from `r Githubpkg("saezlab/mistyR")` and object manipulation from `r Biocpkg("SpatialExperiment")` and `Biocpkg("SingleCellExperiment")`. ```{r setup, message = FALSE} # MISTy library(mistyR) library(future) # SpatialExperiment library(SpatialExperiment) library(SingleCellExperiment) library(SummarizedExperiment) # data manipulation library(Matrix) library(tibble) library(dplyr) library(purrr) # normalization library(sctransform) # resource library(progeny) # plotting library(ggplot2) # setup parallel execution plan(multisession) ``` ## The skeleton of mistyR pipelines For user convenience and to facilitate the use of `r Githubpkg("saezlab/mistyR")` and `r Biocpkg("SpatialExperiment")`, `run_misty_spe()` is a function describing a general skeleton of a mistyR workflow for analysing a 10x Visium slide given in a SpatialExperiment object. The function allows for: 1) Defining a number of assays/views to be used in the model. 2) Defining the type of spatial context for each view and their parameters. 3) Defining the specific assay and features to be used for the view creation of each view. 4) Defining the specific spots where the model will be built. ```{r} run_misty_spe <- function(slide, # SpatialExperiment object with spatial omics data. view.assays, # Named list of assays for each view. view.features = NULL, # Named list of features/markers to use. # Use all by default. view.types, # Named list of the type of view to construct # from the assay. view.params, # Named list with parameters (NULL or value) # for each view. spot.ids = NULL, # spot IDs to use. Use all by default. out.alias = "results" # folder name for output ) { # Extracting geometry geometry <- as.data.frame(spatialData(slide)) %>% select(array_row, array_col) # Extracting data view.data <- map(view.assays, extract_spe_data, geometry = geometry, slide = slide ) # Constructing and running a workflow build_misty_pipeline( view.data = view.data, view.features = view.features, view.types = view.types, view.params = view.params, geometry = geometry, spot.ids = spot.ids, out.alias = out.alias ) } ``` ## Extracting specific information from SpatialExperiment objects These are helper functions that allow to extract from `r Biocpkg("SpatialExperiment")` objects specific assays and transform them into `tibble` which is a preferred format for `r Githubpkg("saezlab/mistyR")`. ```{r} # Extracts data from an specific assay from a SpatialExperiment object # and aligns the IDs to the geometry extract_spe_data <- function(slide, assay, geometry) { data <- altExp(slide, assay) %>% assay() %>% t() %>% as_tibble(rownames = NA) return(data %>% dplyr::slice(match(rownames(.), rownames(geometry)))) } # Filters data to contain only features of interest filter_data_features <- function(data, features) { if (is.null(features)) features <- colnames(data) return(data %>% rownames_to_column() %>% select(rowname, all_of(features)) %>% rename_with(make.names) %>% column_to_rownames()) } ``` ## View creation This helper function wraps the three options available by default for view creation in `r Githubpkg("saezlab/mistyR")` with additional features that allow for creating views for specific spots. ```{r} # Builds views depending on the paramaters defined create_default_views <- function(data, view.type, view.param, view.name, spot.ids, geometry) { view.data.init <- create_initial_view(data) if (!(view.type %in% c("intra", "para", "juxta"))) { view.type <- "intra" } if (view.type == "intra") { data.red <- view.data.tmp$data %>% rownames_to_column() %>% filter(rowname %in% spot.ids) %>% select(-rowname) } else if (view.type == "para") { view.data.tmp <- view.data.init %>% add_paraview(geometry, l = view.param) data.ix <- paste0("paraview.", view.param) data.red <- view.data.tmp[[data.ix]]$data %>% mutate(rowname = rownames(data)) %>% filter(rowname %in% spot.ids) %>% select(-rowname) } else if (view.type == "juxta") { view.data.tmp <- view.data.init %>% add_juxtaview( positions = geometry, neighbor.thr = view.param ) data.ix <- paste0("juxtaview.", view.param) data.red <- view.data.tmp[[data.ix]]$data %>% mutate(rowname = rownames(data)) %>% filter(rowname %in% spot.ids) %>% select(-rowname) } if (is.null(view.param) == TRUE) { misty.view <- create_view( paste0(view.name), data.red ) } else { misty.view <- create_view( paste0(view.name, "_", view.param), data.red ) } return(misty.view) } ``` ## Building a mistyR pipeline and running the model This wrapper function `build_misty_pipeline()` allows for building automatically a `r Githubpkg("saezlab/mistyR")` workflow from a list of data frames, with specified spatial context, parameters, features and locations. ```{r} # Builds automatic MISTy workflow and runs it build_misty_pipeline <- function(view.data, view.features, view.types, view.params, geometry, spot.ids = NULL, out.alias = "default") { # Adding all spots ids in case they are not defined if (is.null(spot.ids)) { spot.ids <- rownames(view.data[[1]]) } # First filter the features from the data view.data.filt <- map2(view.data, view.features, filter_data_features) # Create initial view views.main <- create_initial_view(view.data.filt[[1]] %>% rownames_to_column() %>% filter(rowname %in% spot.ids) %>% select(-rowname)) # Create other views view.names <- names(view.data.filt) all.views <- pmap(list( view.data.filt[-1], view.types[-1], view.params[-1], view.names[-1] ), create_default_views, spot.ids = spot.ids, geometry = geometry ) pline.views <- add_views( views.main, unlist(all.views, recursive = FALSE) ) # Run MISTy run_misty(pline.views, out.alias) } ``` ## Basic visualization function Finally, we adapted the function `plotMolecules()` from `r Githubpkg("lmweber/ggspavis")` to visualize features from custom assays for the purposes of the use case. ```{r} plotMolecules_adapted <- function(spe, molecule = NULL, x_coord = "array_col", y_coord = "array_row", palette = NULL, alt_assay = "lognorm") { if (is.null(palette)) { palette <- "yellow" } if (!is.null(palette) && length(palette) == 1) { palette <- c("black", palette) } df_plot <- spatialData(spe)[, c(x_coord, y_coord), drop = FALSE] mRNA_counts <- as.numeric(assay(altExp(spe, alt_assay))[molecule, , drop = FALSE]) stopifnot(length(mRNA_counts) == nrow(df_plot)) df_plot <- cbind(df_plot, expression = mRNA_counts) df_plot <- as.data.frame(df_plot) %>% mutate(array_row = array_row * -1) p <- ggplot( df_plot, aes_string(x = x_coord, y = y_coord, color = "expression") ) + geom_point(size = 2.5) + scale_color_gradient( low = palette[1], high = palette[2], trans = "sqrt" ) + coord_fixed() + ggtitle(molecule) + theme_void() p } ``` # Use case As an example, we will analyze a 10X Visium spatial gene expression dataset of one breast cancer section (Block A Section 1) available here []. We assume that the required files [Feature/cell matrix HDF5 (filtered)]{.ul} and the [Spatial imaging data]{.ul} (extracted) are in a folder named 'breast_A\_1' in the current working directory. ```{r get_data, include=FALSE} cleanup <- FALSE if (!dir.exists("breast_A_1")) { download.file("https://www.dropbox.com/s/igby4csbt9u4uuf/breast_A_1.tgz?dl=1", destfile = "breast_A_1.tgz", mode ="wb", quiet = TRUE ) untar("breast_A_1.tgz", tar = "internal") cleanup <- TRUE } ``` We will explore the spatial interactions of the Hypoxia pathway responsive genes with the Estrogen pathway responsive genes. To this end we will use the model matrix with top significant genes for each pathway from the package `r Biocpkg("progeny")` and the previously described functions. Please note that the `r Biocpkg("SpatialExperiment")` function `read10xVisium()` requires a fixed file structure that we create programmatically for this e xample. Furthermore, in this example we dropped all repeated symbols, however, the user must define what's the best solution for their analysis. ## Loading and normalizing the data sets ```{r, warning=FALSE} # Load and normalize using SCT folder <- "breast_A_1" # rename to create the required file structure file.rename( "breast_A_1/V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5", "breast_A_1/filtered_feature_bc_matrix.h5" ) spe <- read10xVisium(folder, type = "HDF5", data = "filtered", images = "lowres") # normalize data # counts are of class DelayedMatrix which is incompatible with vst sct.data <- vst(as(counts(spe), "dgCMatrix"), verbosity = 0)$y # Dropping duplicates spe <- spe[!duplicated(rowData(spe)), ] # Getting relevant genes gene.dict <- as_tibble(rowData(spe), rownames = NA) %>% rownames_to_column("ENSEMBL") %>% filter(ENSEMBL %in% rownames(sct.data)) # Re-naming normalized data with gene symbols sct.data <- sct.data[gene.dict %>% pull("ENSEMBL"), colnames(spe)] rownames(sct.data) <- gene.dict %>% pull(symbol) # undo file rename file.rename( "breast_A_1/filtered_feature_bc_matrix.h5", "breast_A_1/V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5" ) ``` ## Filtering genes that are expressed in at least 5% of the spots ```{r} coverage <- rowSums(sct.data > 0) / ncol(sct.data) slide.markers <- names(coverage[coverage >= 0.05]) ``` ## Defining Hypoxia and Estrogen responsive genes For this simple example we will pick the top 15 most significantly responsive genes of each pathway from the model matrix from the package `progeny`. ```{r} estrogen.footprints <- getModel(top = 15) %>% rownames_to_column("gene") %>% filter(Estrogen != 0, gene %in% slide.markers) %>% pull(gene) hypoxia.footprints <- getModel(top = 15) %>% rownames_to_column("gene") %>% filter(Hypoxia != 0, gene %in% slide.markers) %>% pull(gene) ``` Note that for this use case we assume that all normalizations and assays used by `mistyR` are defined as **alternative experiments with identical names in the assay and the experiment**. ```{r} sct.exp <- SummarizedExperiment(sct.data[slide.markers, ]) assayNames(sct.exp) <- "SCT" altExp(spe, "SCT") <- sct.exp ``` ## Defining the parameters of the workflow In this example we will explain the expression of hypoxia responsive genes in terms of three different views: 1) Main (intrinsic) view (containing genes to be predicted): intrinsic expression of `hypoxia.footprints` 2) Paraview - hypoxia genes: expression of hypoxia markers in a significance radius of 10 spots 3) Paraview - estrogen genes: expression of estrogen markers in a significance radius of 10 spots ```{r} # Define assay for each view view.assays <- list( "main" = "SCT", "para.hypoxia" = "SCT", "para.estrogen" = "SCT" ) # Define features for each view view.features <- list( "main" = hypoxia.footprints, "para.hypoxia" = hypoxia.footprints, "para.estrogen" = estrogen.footprints ) # Define spatial context for each view view.types <- list( "main" = "intra", "para.hypoxia" = "para", "para.estrogen" = "para" ) # Define additional parameters (l in the case of paraview) view.params <- list( "main" = NULL, "para.hypoxia" = 10, "para.estrogen" = 10 ) misty.out <- "vignette_model_spe" ``` ## Run MISTy pipeline and collect results Now that we have preprocessed the data and have decided on a question to analyze, we can create and run a `r Githubpkg("saezlab/mistyR")` workflow. ```{r, warning=FALSE} misty.results <- run_misty_spe( slide = spe, view.assays = view.assays, view.features = view.features, view.types = view.types, view.params = view.params, spot.ids = NULL, # Using the whole slide out.alias = misty.out ) %>% collect_results() ``` ## Interpretation and downstream analysis MISTy gives explanatory answers to three general questions: **1. How much can the broader spatial context explain the expression of markers (in contrast to the intraview)?** This can be observed in the gain in R2 (or RMSE) of using the multiview model in contrast to the single `main` view model. ```{r, warning=FALSE} misty.results %>% plot_improvement_stats("gain.R2") %>% plot_improvement_stats("gain.RMSE") ``` In this example, PGK1 is a marker whose expression can be explained better by modeling the broader spatial context around each spot. We can further inspect the significance of the gain in variance explained, by the assigned p-value of improvement based on cross-validation. ```{r} misty.results$improvements %>% filter(measure == "p.R2") %>% arrange(value) ``` In general, the significant gain in R2 can be interpreted as the following: "We can better explain the expression of marker X, when we consider additional views, other than the intrinsic view." **2.How much do different view components contribute to explaining the expression?** ```{r} misty.results %>% plot_view_contributions() misty.results$contributions.stats %>% filter(target == "PGK1") ``` In the case of PGK1, we observe that around 37% of the contribution in the final model comes from the expression of other markers of hypoxia intrinsically or from the broader tissue structure. The rest (63%) comes from the expression of estrogen and hypoxia responsive genes from the broader tissue structure. **3.What are the specific relations that can explain the contributions?** To explain the contributions, we can visualize the importances of markers coming from each view separately as predictors of the expression of the intrinsic markers of hypoxia. First, the intrinsic importances of the hypoxia markers. ```{r} misty.results %>% plot_interaction_heatmap(view = "intra") ``` These importances are associated to the relationship between markers in the same spot. Let's pick the best predictor of PGK1 to confirm this: ```{r} misty.results$importances.aggregated[["intra"]] %>% select(Predictor, PGK1) %>% arrange(-PGK1) ``` ```{r, warning=FALSE, dev='jpeg'} plotMolecules_adapted(spe, molecule = "PGK1", x_coord = "array_col", y_coord = "array_row", alt_assay = "SCT" ) ``` ```{r, warning=FALSE, dev='jpeg'} plotMolecules_adapted(spe, molecule = "NDRG1", x_coord = "array_col", y_coord = "array_row", alt_assay = "SCT" ) ``` Second, the paraview importances of the hypoxia markers. ```{r} misty.results %>% plot_interaction_heatmap(view = "para.hypoxia_10") ``` These importances are associated to the relationship between markers in the spot and markers in the neighborhood (controlled by our parameter l). ```{r, warning=FALSE, dev='jpeg'} plotMolecules_adapted(spe, molecule = "PGK1", x_coord = "array_col", y_coord = "array_row", alt_assay = "SCT" ) ``` ```{r, warning=FALSE, dev='jpeg'} plotMolecules_adapted(spe, molecule = "PFKFB4", x_coord = "array_col", y_coord = "array_row", alt_assay = "SCT" ) ``` As expected, the expression of PFKFB4 (the best predictor from this view) in the neighborhood of each spot allows to explain the expression of PGK1. Finally, the paraview importances of the estrogen markers. We will inspect the best predictor in this view. ```{r} misty.results %>% plot_interaction_heatmap(view = "para.estrogen_10") ``` ```{r, warning=FALSE, dev='jpeg'} plotMolecules_adapted(spe, molecule = "PGK1", x_coord = "array_col", y_coord = "array_row", alt_assay = "SCT" ) ``` ```{r, warning=FALSE, dev='jpeg'} plotMolecules_adapted(spe, molecule = "TPD52L1", x_coord = "array_col", y_coord = "array_row", alt_assay = "SCT" ) ``` It is visible that in some areas the local expression of TPD52L1 overlaps with the areas with the highest expression of PGK1. ## Important notes - The relationships captured in the importances are not to assumed or interpreted as linear or casual. - 1-to-1 importances between predictor and markers should always be interpreted in the context of the other predictors, since training MISTy models is multivariate predictive task. # Other use cases The shown example is not the only way to use mistyR to analyze spatial transcriptomics data. Similar and complementary workflows can be constructed to describe different aspects of biology, for example: - Spatial interactions between pathway activities and putative ligands, as shown [here](https://doi.org/10.1101/2020.05.08.084145). - Spatial interactions between cell-state lineage markers and putative ligands, as shown [here](https://doi.org/10.1101/2020.12.08.411686). - Spatial interactions between cell-type abundances leveraging deconvolution methods and creating descriptions of cell colocalization and tissue architecture. Additionally, `r Githubpkg("saezlab/mistyR")` through the function `collect_results()` allows you to group the results of multiple slides, allowing for a more robust, integrative or comparative analysis of spatial interactions. --- # See also {-} ## More examples {-} `browseVignettes("mistyR")` [Online articles](https://saezlab.github.io/mistyR/articles/) ## Publication {-} *`r format(citation("mistyR"), "textVersion")`* # Session info Here is the output of `sessionInfo()` at the point when this document was compiled: ```{r info, echo=FALSE} sessionInfo() ``` ```{r cleanup, include=FALSE} if (cleanup) { unlink(c("breast_A_1.tgz", "breast_A_1/"), recursive = TRUE ) } unlink("vignette_model_spe", recursive = TRUE) ```