---
title: "The SpatialDatasets package"
author:
- name: Nicholas Robertson
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
- name: Farhan Ameen
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
- name: Alex Qin
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
  - Westmead Institute for Medical Research, University of Sydney, Australia
- name: Ellis Patrick
  affiliation:
  - Sydney Precision Data Science Centre, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
  - Westmead Institute for Medical Research, University of Sydney, Australia
package: SpatialDatasets
output:
  BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{The SpatialDatasets package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

The `SpatialDatasets` package contains a collection of spatially-resolved omics datasets formatted into the [SpatialExperiment](https://bioconductor.org/packages/SpatialExperiment), [MoleculeExperiment](https://bioconductor.org/packages/MoleculeExperiment) or [CytoImageList](https://bioconductor.org/packages/cytomapper) Bioconductor classes, for use in examples, demonstrations, and tutorials. The datasets come from several platforms, including IMC, MIBI-TOF, Xenium, CosMx and MERFISH, and were obtained from publicly available sources.

# Installation

To install the `SpatialDatasets` package from Bioconductor:

```{r, eval=FALSE}
# install BiocManager from CRAN if it is not already available
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpatialDatasets")
```

# Datasets

The package contains the following datasets:

- `spe_Keren_2018`: A study on triple negative breast cancer containing 40 samples measured using MIBI-TOF, published by [Keren et al. (2018)](https://doi.org/10.1016/j.cell.2018.08.039).
- `Ferguson_Images`: A `zip` file of the images from a study on head and neck cutaneous squamous cell carcinoma containing 44 samples measured using IMC, published by [Ferguson et al. (2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332).
- `spe_Ferguson_2022`: A study on head and neck cutaneous squamous cell carcinoma containing 44 samples measured using IMC, published by [Ferguson et al. (2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332).
- `spe_Schurch_2020`: A study on advanced colorectal cancer containing 140 samples measured using CODEX, published by [Schurch et al. (2020)](https://doi.org/10.1016/j.cell.2020.07.005).
- `spe_Ali_2020`: A study on breast cancer containing 483 samples measured using IMC, published by [Ali et al. (2020)](https://doi.org/10.1038/s43018-020-0026-6).
- `spe_Amancherla_2025`: A study on heart transplant rejection containing 62 (49 adult and 13 pediatric) samples measured using Xenium, published by [Amancherla et al. (2025)](https://doi.org/10.1101/2025.02.28.640852).
- `spe_Vannan_2025`: A study on pulmonary fibrosis containing 35 lung samples measured using Xenium, published by [Vannan et al. (2025)](https://doi.org/10.1038/s41588-025-02080-x).

# Load data

The following examples show how to load the example datasets as `SpatialExperiment` objects in an R session. There are two options for loading the datasets: using named accessor functions or querying the ExperimentHub database.

## Load using named accessors

```{r, message=FALSE}
library(SpatialExperiment)
library(SpatialDatasets)
```

### Keren et al. (2018)

A study on triple negative breast cancer containing 40 samples measured using MIBI-TOF, published by [Keren et al. (2018)](https://doi.org/10.1016/j.cell.2018.08.039).

```{r, message=FALSE}
# load object
spe <- spe_Keren_2018()

# check object
spe
```

### Ferguson et al. (2022)

A study on head and neck cutaneous squamous cell carcinoma containing 44 samples measured using IMC, published by [Ferguson et al. (2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332).

#### Ferguson Images

The chunk below generates a `CytoImageList` object from the images `zip` file returned by the `Ferguson_Images()` function.

```{r, message=FALSE}
# get the path to the images zip file
zip <- Ferguson_Images()

# extract the images into a temporary directory
tmp <- tempfile()
unzip(zip, exdir = tmp)

# read the images as an on-disk CytoImageList backed by HDF5 files
images <- cytomapper::loadImages(
    tmp,
    single_channel = TRUE,
    on_disk = TRUE,
    h5FilesPath = HDF5Array::getHDF5DumpDir()
)

# check object
images
```

#### Ferguson SpatialExperiment Object

```{r, message=FALSE}
# load object
spe <- spe_Ferguson_2022()

# check object
spe
```

### Schurch et al. (2020)

A study on advanced colorectal cancer containing 140 samples measured using CODEX, published by [Schurch et al. (2020)](https://doi.org/10.1016/j.cell.2020.07.005).

```{r, message=FALSE}
# load object
spe <- spe_Schurch_2020()

# check object
spe
```

### Ali et al. (2020)

A study on breast cancer containing 483 samples measured using IMC, published by [Ali et al. (2020)](https://doi.org/10.1038/s43018-020-0026-6).

```{r, message=FALSE}
# load object
spe <- spe_Ali_2020()

# check object
spe
```

### Amancherla et al. (2025)

A study on heart transplant rejection containing 62 (49 adult and 13 pediatric) samples measured using Xenium, published by [Amancherla et al. (2025)](https://doi.org/10.1101/2025.02.28.640852).

```{r, message = FALSE}
# load object
spe <- spe_Amancherla_2025()

# check object
spe
```

### Vannan et al. (2025)

A study on pulmonary fibrosis containing 35 lung samples measured using Xenium, published by [Vannan et al.
(2025)](https://doi.org/10.1038/s41588-025-02080-x).

```{r, message = FALSE}
# load object
spe <- spe_Vannan_2025()

# check object
spe
```

# Generating objects from raw data files

For reference, we include the scripts used to generate the `SpatialExperiment`, `MoleculeExperiment` or `CytoImageList` objects from the raw data files. These scripts are saved in `/inst/scripts/` in the source code of the `SpatialDatasets` package. Each script includes references and links to the original data files for its dataset.

# Session information

```{r}
sessionInfo()
```
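
# Appendix: querying ExperimentHub

The "Load data" section names querying the ExperimentHub database as the second way to load the datasets. The chunk below is a minimal sketch of that route, not a definitive recipe. It assumes that the records are indexed under the package name `SpatialDatasets`, and that an internet connection is available (hence `eval=FALSE`). The variable names `eh` and `sd_records` are illustrative, and no ExperimentHub ID is hard-coded because none are listed in this vignette.

```{r, eval=FALSE}
library(ExperimentHub)

# create an ExperimentHub instance; this fetches the hub metadata index
eh <- ExperimentHub()

# search the hub for records matching the package name (assumed search term)
sd_records <- query(eh, "SpatialDatasets")

# inspect the matching records and their titles to find the one you need
sd_records
sd_records$title

# download (and cache) a record by its position in the query result;
# position 1 is only a placeholder, so pick the index from the titles above
obj <- sd_records[[1]]

# check object
obj
```

Records retrieved this way are the same resources that the named accessor functions return, so the accessors remain the more convenient choice when the dataset name is already known.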