--- title: "SpatialExperimentIO - Data Reader Package Overview" author: - name: Yixing E. Dong affiliation: "University of Lausanne, Lausanne, Switzerland" output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show date: "`r format(Sys.Date(), '%b %d, %Y')`" vignette: > %\VignetteIndexEntry{SpatialExperimentIO Reader Package Overview} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{=html} ``` ```{r setup, include = FALSE} knitr::opts_chunk$set(cache = TRUE, autodep = TRUE, cache.lazy = FALSE) ``` This package allows the user to decide to load data from single-cell level spatial transcriptomics technologies, such as Xenium, CosMx, MERSCOPE, STARmapPLUS, or seqFISH as either `SpatialExperiment (SPE)` or `SingleCellExperiment (SCE)` object. The only difference between the two object types are where to store the spatial coordinates. For the current version of `SpatialExperiment`, the `spatialCoords(spe)` are stored in a separate slot other than `colData(spe)`. On the other hand, `SingleCellExperiment` stores the `spatialCoords()` inside of `colData(spe)`. After reading in the data, we need to look at the landscape of other downstream analysis tools. For example, `library(BayesSpace)` is a clustering tool developed for spatial transcriptomics data, but it only takes a `SCE` object and looks for the spatial coordinates in the `colData(sce)`. Other spatial visualization packages, such as `library(ggspavis)` in its newest version, is compatible with both `SPE` and `SCE` objects. Therefore, to avoid the pain of object conversion, we give the flexibility to let the user decide what object type to return. # Setup ```{r, warning=FALSE, message=FALSE} suppressMessages({ library(SpatialExperimentIO) library(SpatialExperiment) }) ``` # Quick Start - For Xenium data: ```{r, eval=FALSE} ### DO NOT RUN. Example code. xepath <- "/path/to/folder" # Xenium as SPE, or SCE by setting `returnType = "SCE"`. xe_spe <- readXeniumSXE(dirName = xepath) ``` - For CosMx data: ```{r, eval=FALSE} ### DO NOT RUN. Example code. cospath <- "/path/to/folder" # CosMx as SPE, or SCE by setting `returnType = "SCE"`. cos_spe <- readCosmxSXE(dirName = cospath) ``` - For MERSCOPE data: ```{r, eval=FALSE} ### DO NOT RUN. Example code. merpath <- "/path/to/folder" # MERSCOPE as SPE, or SCE by setting `returnType = "SCE"`. mer_spe <- readMerscopeSXE(dirName = merpath) ``` - For STARmap PLUS data: ```{r, eval=FALSE} ### DO NOT RUN. Example code. starpath <- "/path/to/folder" # STARmapPLUS as SPE, or SCE by setting `returnType = "SCE"`. star_spe <- readStarmapplusSXE(dirName = starpath) ``` - For seqFISH data: ```{r, eval=FALSE} ### DO NOT RUN. Example code. seqfpath <- "/path/to/folder" # seqFISH as SPE, or SCE by setting `returnType = "SCE"`. seqf_spe <- readSeqfishSXE(dirName = merpath) ``` That is pretty much all you need. To learn more details, please read the below sections for each technology. # Xenium Xenium is an imaging-based in-situ sequencing technology developed by 10x Genomics. Compared to the full transcriptome coverage sequencing-based technology Visium, Xenium allows for transcript-level resolution count detection but with less genes. The transcripts are segmented into single cells and `SpatialExperimentIO` returns the cell-level `SPE` or `SCE` object. To read more about Xenium technology workflow, please refer to the [Xenium technology overview](https://www.10xgenomics.com/blog/spatially-resolved-transcriptomics-an-introductory-overview-of-spatial-gene-expression-profiling-methods). For more publicly available Xenium data, please refer to [Xenium data download](https://www.10xgenomics.com/datasets?query=Xenium&page=1&configure%5BhitsPerPage%5D=50&configure%5BmaxValuesPerFacet%5D=1000). The object constructor assumes the downloaded unzipped Xenium Output Bundle contains the mandatory file of `cells.parquet` and either a folder `/cell_feature_matrix` or a .h5 file `cell_feature_matrix.h5`. ```{r} xepath <- system.file( file.path("extdata", "Xenium_small"), package = "SpatialExperimentIO") list.files(xepath) ``` ## Read Xenium as a `SpatialExperiment` object We display the default specification of each variable in `readXeniumSXE()`. To read in Xenium as a `SpatialExperiment` object with count matrix, column data, and all other common raw download files, you would only need to provide a valid directory name. ```{r, eval=FALSE} # read the count matrix .h5 file - automatically DropletUtils::read10xCounts(type = "HDF5") xe_spe <- readXeniumSXE(dirName = xepath, returnType = "SPE", countMatPattern = "cell_feature_matrix.h5", metaDataPattern = "cells.parquet", coordNames = c("x_centroid", "y_centroid"), addExperimentXenium = TRUE, # set to TRUE to load experiment.xenium altExps = c("NegControlProbe", "UnassignedCodeword", "NegControlCodeword", "antisense", "BLANK"), addParquetPaths = TRUE #, # set TRUE to load all .parquet below # ... # takes arguments as below ) xe_spe <- addParquetPathsXenium(sxe = xe_spe, dirName = xepath, addTx = TRUE, txMetaNames = "transcripts", txPattern = "transcripts.parquet", addCellBound = TRUE, cellBoundMetaNames = "cell_boundaries", cellBoundPattern = "cell_boundaries.parquet", addNucBound = TRUE, NucBoundMetaNames = "nucleus_boundaries", NucBoundPattern = "nucleus_boundaries.parquet") ``` Note that `experiment.xenium` is by default added to `metaata()`, and you can disable it by setting `addExperimentXenium = FALSE`. ```{r} xe_spe <- readXeniumSXE(dirName = xepath) xe_spe ``` Additionally, Xenium gives different control genes in their gene panel (check with `altExpNames(xe_spe)` to see). In this mini example, we obtain a Xenium dataset with 4 genes all with type of `"Gene Expression"` in their `rowData(xe_spe)` and 6 cells just for illustration. In a real Xenium data download, possible `altExp(xe_spe)` for Xenium can have the default categories, depending on the experiment setup. For example, below we manually specified gene names contains "TestGene" will be put into `altExp()`. We add any other .parquet files (`cell_boundaries.parquet`, `nucleus_boundaries.parquet`), except `transcripts.parquet`, to the `metadata()`. You can control what path to add with additional boolean arguments - `addTx`, `addCellBound`, `addNucBound`. See the documentation for the helper function `addParquetPathsXenium()`. ```{r} xe_spe <- readXeniumSXE(xepath, altExps = c("TestGene"), addParquetPaths = TRUE, addTx = FALSE) xe_spe ``` If you do not have `cell_feature_matrix.h5` but the folder `/cell_feature_matrix` instead, it should contain the following files. ```{r, eval=FALSE} list.files(file.path(xepath, "cell_feature_matrix")) # "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz" ``` For this example, we only provide `cell_feature_matrix.h5` for demonstration. However, alternatively you can read in Xenium by specifying `countMatPattern` as the folder `"cell_feature_matrix"`. You should also subset to `"Gene Expression"` gene type like previously. ```{r, eval=FALSE} # or read the count matrix folder - automatically DropletUtils::read10xCounts(type = "sparse") xe_spe <- readXeniumSXE(dirName = xepath, countMatPattern = "cell_feature_matrix") ``` ## Read Xenium as a `SingleCellExperiment` object Instead, if you are interested in storing the `spatialCoords()` columns in `colData` and read Xenium in as a `SingleCellExperiment`, you need to change `readXeniumSXE(returnType = )` to `"SCE"`. It is also required to subset to `"Gene Expression"` gene type. We end up with an `SCE` object with 248 genes. ```{r} xe_sce <- readXeniumSXE(dirName = xepath, returnType = "SCE") xe_sce ``` This is a mock Xenium datasets with 4 genes by 6 cells. Some Xenium data set can have a dimension of 313 genes and around 110,000 cells in the [Xenium human breast cancer data](https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast). For more visualization tools for spatial transcriptomics downstream data analysis, including helpers for QC, marker gene expression level and clustering results on reduced dimensions or its spatial distribution, please refer to `BiocManager::install("ggspavis")`. # CosMx CosMx is an imaging-based in-situ sequencing technology by Nanostring. To read more about the CosMx technology workflow, please refer to the [technology overview](https://nanostring.com/products/cosmx-spatial-molecular-imager/single-cell-imaging-overview/). For more publicly available data sets, please refer to the CosMx data download [website](https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/?utm_source=google&utm_medium=paidsearch&utm_campaign=dynamic&utm_id=NSTG_DynamicSearch&utm_source=google&utm_medium=cpc&utm_campaign=1765548394&utm_agid=132844584222&utm_term=&creative=592729435866&device=c&placement=&network=g&gad_source=1&gclid=EAIaIQobChMI5M-sztjIggMVZZFoCR1MLgFiEAAYASAAEgJ1L_D_BwE). The object constructor assumes the data download folder contains two mandatory files with `exprMat_file.csv` and `metadata_file.csv` in the names. ```{r} cospath <- system.file( file.path("extdata", "CosMx_small"), package = "SpatialExperimentIO") list.files(cospath) ``` ## Read CosMx as a `SpatialExperiment` object We commented out the default specification of each variable in `readCosmxSXE()`. To read in CosMx as a `SpatialExperiment` object, you would only need to provide a valid directory name. ```{r, eval=FALSE} cos_spe <- readCosmxSXE(dirName = cospath, returnType = "SPE", countMatPattern = "exprMat_file.csv", metaDataPattern = "metadata_file.csv", coordNames = c("CenterX_global_px", "CenterY_global_px"), addFovPos = TRUE, # set to TRUE to add fov position columns to colData() fovPosPattern = "fov_positions_file.csv", altExps = c("NegPrb", "Negative", "SystemControl", "FalseCode"), addParquetPaths = TRUE # , # set TRUE to add all .parquet below # ... # takes arguments as below ) cos_spe <- addParquetPathsCosmx(sxe = cos_spe, dirName = cospath, addTx = TRUE, txMetaNames = "transcripts", txPattern = "tx_file.csv", addPolygon = TRUE, polygonMetaNames = "polygons", polygonPattern = "polygons.csv") ``` With this example dataset, we obtained a CosMx `SPE` object with 8 genes and 9 cells. Here is a demonstration of adding all recommended files as path to parquet in the `metadata()`, except there is no polygon file in this example data. ```{r} cos_spe <- readCosmxSXE(cospath, addPolygon = FALSE) cos_spe ``` ## Read CosMx as a `SingleCellExperiment` object Alternatively, you can also read CosMx in as a `SCE`. ```{r} cos_sce <- readCosmxSXE(dirName = cospath, addPolygon = FALSE, returnType = "SCE") cos_sce ``` In reality, a CosMx data set can have a dimension of 980 genes and around 100,000 cells for the [human lung cancer data](https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/nsclc-ffpe-dataset/?utm_source=google&utm_medium=paidsearch&utm_campaign=dynamic&utm_id=NSTG_DynamicSearch&utm_source=google&utm_medium=cpc&utm_campaign=1765548394&utm_agid=132844584222&utm_term=&creative=592729435866&device=c&placement=&network=g&gad_source=1&gclid=EAIaIQobChMIv-DvtO_IggMVlotoCR0qtgdxEAAYASAAEgKi0vD_BwE). # MERSCOPE MERSCOPE integrated MERFISH spatial transcriptomics technology with high resolution spatial imaging, fluidics, image processing, and is a product by Vizgen. To understand more about the MERFISH technology behind MERSCOPE, please refer to the [MERFISH Technology Overview](https://vizgen.com/technology/#merfish). For more publicly available MERSCOPE data, please see [MERSCOPE data download page](https://info.vizgen.com/ffpe-showcase). The object constructor assumes the data download folder contains two mandatory files with `cell_by_gene.csv` and `cell_metadata.csv` in the names. ```{r} merpath <- system.file( file.path("extdata", "MERSCOPE_small"), package = "SpatialExperimentIO") list.files(merpath) ``` ## Read MERSCOPE as a `SpatialExperiment` object We commented out the default specification of each variable in `readMerscopeSXE()`. To read in MERSCOPE as a `SpatialExperiment` object, you would only need to provide a valid directory name. With this example dataset, we obtained a MERSCOPE `SPE` object with 9 genes and 8 cells. ```{r} # mer_spe <- readMerscopeSXE(dirName = merpath, # returnType = "SPE", # countMatPattern = "cell_by_gene.csv", # metaDataPattern = "cell_metadata.csv", # coordNames = c("center_x", "center_y")) mer_spe <- readMerscopeSXE(dirName = merpath) mer_spe ``` ## Read MERSCOPE as a `SingleCellExperiment` object Alternatively, you can also read MERSCOPE in as a `SCE`. ```{r} mer_sce <- readMerscopeSXE(dirName = merpath, returnType = "SCE") mer_sce ``` In reality, a MERSCOPE data set can have a dimension of 550 genes and around 250,000 cells for the [human ovarian cancer data](https://console.cloud.google.com/storage/browser/vz-ffpe-showcase/HumanOvarianCancerPatient2Slice1?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))). # STARmap PLUS STARmap PLUS is an imaging-based in-situ sequencing technology that has been introduced by [Zeng et al.](https://pubmed.ncbi.nlm.nih.gov/36732642/). The object constructor assumes the data download folder contains two mandatory files with `raw_expression_pd.csv` and `spatial.csv` in the names. ```{r} starpath <- system.file( file.path("extdata", "STARmapPLUS_small"), package = "SpatialExperimentIO") list.files(starpath) ``` ## Read STARmap PLUS as a `SpatialExperiment` object We comment out the default parameters for your reference. In this example dataset, we provide a sample with 8 genes and 9 cells just for illustration. ```{r} # readStarmapplusSXE <- function(dirName = dirName, # returnType = "SPE", # countMatPattern = "raw_expression_pd.csv", # metaDataPattern = "spatial.csv", # coordNames = c("X", "Y", "Z")) star_spe <- readStarmapplusSXE(dirName = starpath) star_spe ``` ## Read STARmap PLUS as a `SingleCellExperiment` object Alternatively, you can also return a `SingleCellExperiment` object. ```{r} star_sce <- readStarmapplusSXE(dirName = starpath, returnType = "SCE") star_sce ``` STARmap PLUS has a gene panel of around 1000 with up to millions of cells depending on the size of the tissue. There are 20 sample on mouse brain with tissue region annotated published by [Shi et al.](https://www.nature.com/articles/s41586-023-06569-5). Their data is avaiable to downloaded from [Zenodo](https://zenodo.org/records/8327576). # seqFISH seqFISH is a product by Spatial Genomics, generated by the GenePS device. Like Xenium, it is claimed to work on intact cells and tissues. To understand more about the seqFISH technology, please refer to their website [seqFISH Technology Overview](https://spatialgenomics.com/). For more publicly available seqFISH data, please see [seqFISH data download page](https://spatialgenomics.com/data/). The object constructor assumes the data download folder contains two mandatory files with `CellxGene.csv` and `CellCoordinates.csv` in the names. ```{r} seqfpath <- system.file( file.path("extdata", "seqFISH_small"), package = "SpatialExperimentIO") list.files(seqfpath) ``` ## Read seqFISH as a `SpatialExperiment` object We commented out the default specification of each variable in `readSeqfishSXE()`. To read in seqFISH as a `SpatialExperiment` object, you would only need to provide a valid directory name. With this example dataset, we obtained a seqFISH `SPE` object with 9 genes and 14 cells. ```{r} # seqf_spe <- readSeqfishSXE(dirName = seqfpath, # returnType = "SPE", # countMatPattern = "cell_by_gene.csv", # metaDataPattern = "cell_metadata.csv", # coordNames = c("center_x", "center_y")) seqf_spe <- readSeqfishSXE(dirName = seqfpath) seqf_spe ``` ## Read seqFISH as a `SingleCellExperiment` object Alternatively, you can also read seqFISH in as a `SCE`. ```{r} seqf_sce <- readSeqfishSXE(dirName = seqfpath, returnType = "SCE") seqf_sce ``` So far, publicly available seqFISH data sets have 220 or 1092 genes in a panel for mouse kidney, for about 669,842 cells. There are other mouse indications, such as brain, liver, and intestine. There might be human colorectal and gastric cancer samples in the upcoming future. # Session Info {.smaller} ```{r tidy = TRUE} sessionInfo() ```