--- title: "Using anndataR" author: - name: Robrecht Cannoodt - name: Luke Zappia - name: Martin Morgan - name: Louise Deconinck package: anndataR output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Using anndataR to read and convert} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r options, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) # already load SeuratObject and SingleCellExperiment # so warnings generated by these packages are not shown library(SeuratObject) library(SingleCellExperiment) ``` # Introduction `r Biocpkg("anndataR")` allows users to work with `.h5ad` files, interact with `AnnData` objects and convert to/from `SingleCellExperiment` or `Seurat` objects. This enables users to move data easily between the different programming languages and analysis ecosystems needed to perform single-cell data analysis. This package builds on our experience developing and using other interoperability packages and aims to provide a first-class R `AnnData` experience. ## Relationship to other packages Existing packages provide similar functionality to `r Biocpkg("anndataR")` but there are some important differences: - `r Biocpkg("zellkonverter")` provides conversion of `SingleCellExperiment` objects to/from `AnnData` and reading/writing of `.h5ad` files. This is facilitated via `r CRANpkg("reticulate")` using `r Biocpkg("basilisk")` to manage Python environments (native reading of `.h5ad` files is also possible). In contrast, `r Biocpkg("anndataR")` provides a native R H5AD interface, removing the need for Python dependencies. Conversion to/from `Seurat` objects is also supported. - `r CRANpkg("anndata")` (on CRAN) is a wrapper around the Python _anndata_ package. It provides a nicer interface from within R but still requires a Python environment. `r Biocpkg("anndataR")` provides a pure R implementation of the `AnnData` data structure with different back ends and conversion to more common R objects. - Other interoperability packages typically also require Python dependencies, have limited features or flexibility and/or are not available from a central package repository There is significant overlap in functionality between these packages. We anticipate that over time they will become more specialised and work better together or be deprecated in favour of `r Biocpkg("anndataR")`. # Installation Install `r Biocpkg("anndataR")` using `r CRANpkg("BiocManager")`: ```{r install, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("anndataR") ``` # Usage These sections briefly show how to use `r Biocpkg("anndataR")`. ## Read from disk First, we fetch an example `.h5ad` file included in the package: ```{r h5ad-path} library(anndataR) h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR") ``` By default, a H5AD is read to an in-memory `AnnData` object: ```{r read-in-memory} adata <- read_h5ad(h5ad_path) ``` It can also be read as a `SingleCellExperiment` object (see `vignette("usage_singlecellexperiment")`): ```{r read-as-SingleCellExperiment} sce <- read_h5ad(h5ad_path, as = "SingleCellExperiment") ``` Or as a `Seurat` object (see `vignette("usage_seurat")`): ```{r read-as-Seurat} obj <- read_h5ad(h5ad_path, as = "Seurat") ``` There is also a HDF5-backed `AnnData` object: ```{r read-on-disk} adata <- read_h5ad(h5ad_path, as = "HDF5AnnData") ``` See `r vignette("usage_python")` for interacting with a Python `AnnData` via `r CRANpkg("reticulate")`. ## Using `AnnData` objects View structure: ```{r show-structure} adata ``` Access `AnnData` slots: ```{r access-slots} dim(adata$X) adata$obs[1:5, 1:6] adata$var[1:5, 1:6] ``` [Subsetting](#subsetting) `AnnData` objects is covered below. See ```?`AnnData-usage` ```for more details on how to work with `AnnData` objects. ## Interoperability Convert the `AnnData` object to a `SingleCellExperiment` object (see `vignette("usage_singlecellexperiment")`): ```{r as-SingleCellExperiment} sce <- adata$as_SingleCellExperiment() sce ``` Convert the `AnnData` object to a `Seurat` object (see `vignette("usage_seurat")`): ```{r as-Seurat} obj <- adata$as_Seurat() obj ``` Convert a `SingleCellExperiment` object to an `AnnData` object (see `vignette("usage_singlecellexperiment")`): ```{r as-AnnData-from-SingleCellExperiment} adata <- as_AnnData(sce) adata ``` Convert a `Seurat` object to an `AnnData` object (see `vignette("usage_seurat")`): ```{r as-AnnData-from-Seurat} adata <- as_AnnData(obj) adata ``` ## Manually create an `AnnData` object ```{r manually-create-object} adata <- AnnData( X = matrix(rnorm(100), nrow = 10), obs = data.frame( cell_type = factor(rep(c("A", "B"), each = 5)) ), var = data.frame( gene_name = paste0("gene_", 1:10) ) ) adata ``` ## Write to disk Write an `AnnData` object to disk: ```{r write-to-disk} tmpfile <- tempfile(fileext = ".h5ad") adata$write_h5ad(tmpfile) # Alternatively, write_h5ad(adata, tmpfile) ``` Write a `SingleCellExperiment` object to disk: ```{r write-SingleCellExperiment-to-disk} tmpfile <- tempfile(fileext = ".h5ad") write_h5ad(sce, tmpfile) ``` Write a `Seurat` object to disk: ```{r write-Seurat-to-disk} tmpfile <- tempfile(fileext = ".h5ad") write_h5ad(obj, tmpfile) ``` ## Subsetting `AnnData` objects {#subsetting} `r Biocpkg("anndataR")` provides standard R subsetting methods that work with familiar bracket notation. These methods return `AnnDataView` objects that provide lazy evaluation for efficient memory usage. ### Basic subsetting Subset observations (rows) using logical conditions: ```{r subset-obs} # Create some sample data adata <- AnnData( X = matrix(rnorm(50), nrow = 10, ncol = 5), obs = data.frame( cell_type = factor(rep(c("A", "B", "C"), c(3, 4, 3))), score = runif(10) ), var = data.frame( gene_name = paste0("gene_", 1:5), highly_variable = c(TRUE, FALSE, TRUE, TRUE, FALSE) ) ) # Subset to cell type "A" view1 <- adata[adata$obs$cell_type == "A", ] view1 ``` Subset variables (columns) using logical conditions: ```{r subset-vars} # Subset to highly variable genes view2 <- adata[, adata$var$highly_variable] view2 ``` ### Combined subsetting Subset both observations and variables simultaneously: ```{r subset-both} # Subset to cell type "A" and highly variable genes view3 <- adata[adata$obs$cell_type == "A", adata$var$highly_variable] view3 ``` ### Using different index types ```{r subset-types} # Numeric indices view4 <- adata[1:5, 1:3] view4 # Character names (if available) rownames(adata) <- paste0("cell_", 1:10) colnames(adata) <- paste0("gene_", 1:5) view5 <- adata[c("cell_1", "cell_2"), c("gene_1", "gene_3")] view5 ``` ### Working with views Views maintain access to all original data slots: ```{r view-properties} # Access dimensions dim(view3) nrow(view3) ncol(view3) # Access names rownames(view3) colnames(view3) # Access data matrices and metadata view3$X view3$obs view3$var # Views can be converted to concrete implementations concrete <- view3$as_InMemoryAnnData() concrete ``` # Citing _anndataR_ If you use `r Biocpkg("anndataR")` in your work, please cite [_"anndataR improves interoperability between R and Python in single-cell transcriptomics"_](https://doi.org/10.1101/2025.08.18.669052): ```{r citation} citation("anndataR") ``` # Session info ```{r sessioninfo} sessionInfo() ```