---
title: "Working With Human Cell Atlas Manifests"
author: "Maya Reed McDaniel"
date: "September 2nd, 2021"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Working With Human Cell Atlas Manifests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE
)
```

# Motivation & Introduction

The purpose of this vignette is to explore the file manifests
available from the [Human Cell Atlas][] project.

These files provide a metadata summary for a collection of files in a
tabular format, including but not limited to information about the
process and workflow used to generate each file, information about the
specimens the file data were derived from, and identifiers that connect
specific projects, files, and specimens.

The [WARP][] (WDL Analysis Research Pipelines) repository contains
information on a variety of pipelines, and can be used alongside a
manifest to better understand the metadata.

[Human Cell Atlas]: https://data.humancellatlas.org/
[WARP]: https://broadinstitute.github.io/warp/docs/get-started

## Installation and getting started

Evaluate the following code chunk to install packages required for
this vignette.

```{r install, eval = FALSE}
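## install from Bioconductor if you haven't already
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
pkgs <- c("LoomExperiment", "hca", "SummarizedExperiment", "dplyr")
pkgs_needed <- pkgs[!pkgs %in% rownames(installed.packages())]
BiocManager::install(pkgs_needed)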
```

Load the packages into your _R_ session.

```{r setup, message = FALSE}
library(dplyr)
library(SummarizedExperiment)
library(LoomExperiment)
library(hca)
```

# Example: manifests

The manifest for all available files can be obtained with the
following call (this can take several minutes to complete).

```{r, eval = FALSE}
default_manifest_tbl <- hca::manifest()
default_manifest_tbl
```

This is seldom useful; instead, create a filter identifying the files
of interest.

```{r}
manifest_filter <- hca::filters(
    projectId = list(is = "4a95101c-9ffc-4f30-a809-f04518a23803"),
    fileFormat = list(is = "loom"),
    workflow = list(is = c("optimus_v4.2.2", "optimus_v4.2.3"))
)
```

Retrieve the manifest for the files matching the filter.

```{r}
manifest_tibble <- hca::manifest(filters = manifest_filter)
manifest_tibble
```

Then summarize the manifest, e.g., by counting the specimen organs
represented in the files.

```{r}
manifest_tibble |>
    dplyr::count(specimen_from_organism.organ)
```
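
The same pattern extends to other manifest columns. The sketch below
uses a small hand-made table rather than a real manifest: only the
column name `specimen_from_organism.organ` comes from the manifest
above, while `file_name` and all values are made up for illustration.
It counts files per organ and then keeps the rows for one organ.

```{r}
## toy stand-in for a manifest; values are illustrative, not real HCA data
toy_manifest <- dplyr::tibble(
    file_name = c("a.loom", "b.loom", "c.loom", "d.loom"),
    specimen_from_organism.organ = c("liver", "kidney", "liver", "liver")
)

## one row per distinct organ, with the number of files in column `n`
organ_counts <-
    toy_manifest |>
    dplyr::count(specimen_from_organism.organ, sort = TRUE)
organ_counts

## keep only the files derived from a single organ of interest
liver_files <-
    toy_manifest |>
    dplyr::filter(specimen_from_organism.organ == "liver")
liver_files
```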

# Example: Using manifest data to select files

- view the files described in `manifest_tibble` and select one for download

```{r}
manifest_tibble
```

- select a file to which more than one specimen contributes

```{r}
file_uuid <- "24a8a323-7ecd-504e-a253-b0e0892dd730"
```
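
One way to find such a file is to look for manifest fields that hold
more than one value. The sketch below assumes that multi-valued fields
are stored as a single string with values separated by `||`. It uses a
made-up table with hypothetical column names (`file_uuid`,
`specimen_id`) and placeholder values, so check the columns of
`manifest_tibble` before adapting it.

```{r}
## toy stand-in for a manifest; identifiers are placeholders, not real UUIDs
toy_files <- data.frame(
    file_uuid = c("file-1", "file-2", "file-3"),
    specimen_id = c("spec-A", "spec-B || spec-C", "spec-D || spec-E || spec-F"),
    stringsAsFactors = FALSE
)

## split each field on the assumed `||` separator and count the pieces
specimen_list <- strsplit(toy_files$specimen_id, "||", fixed = TRUE)
toy_files$n_specimens <- lengths(specimen_list)

## files to which more than one specimen contributes
multi_specimen <- toy_files[toy_files$n_specimens > 1, ]
multi_specimen
```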

- obtain `file_tbl`, a table describing the file, based on its UUID

```{r}
file_filter <- hca::filters(
    fileId = list(is = file_uuid)
)

file_tbl <- hca::files(filters = file_filter)

file_tbl
```

- download the file and obtain its file path

```{r}
file_location <-
    file_tbl |>
    hca::files_download()
file_location
```

- import the file as a `LoomExperiment` object

```{r}
loom <- LoomExperiment::import(file_location)
metadata(loom) |>
    dplyr::glimpse()
colData(loom) |>
    dplyr::as_tibble() |>
    dplyr::glimpse()
```

# Example: Using manifest data to annotate a `.loom` file

The function `optimus_loom_annotation()` takes the path to a `.loom`
file generated by the [Optimus pipeline][] and returns a
`LoomExperiment` object whose `colData` has been annotated with
additional specimen data extracted from a manifest.

[Optimus pipeline]: https://broadinstitute.github.io/warp/docs/Pipelines/Optimus_Pipeline/README

```{r}
annotated_loom <- optimus_loom_annotation(file_location)
annotated_loom


## new metadata
setdiff(
    names(metadata(annotated_loom)),
    names(metadata(loom))
)
metadata(annotated_loom)$manifest

## new colData columns
setdiff(
    names(colData(annotated_loom)),
    names(colData(loom))
)
```
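
Conceptually, annotation of this kind amounts to joining per-cell data
with per-specimen manifest rows on a shared key. The sketch below
illustrates that idea on toy tables. It is not the implementation of
`optimus_loom_annotation()`, and the table and column names (`cell`,
`input_id`, `organ`) are hypothetical.

```{r}
## toy per-cell table, standing in for colData(); names are hypothetical
toy_cells <- dplyr::tibble(
    cell = c("c1", "c2", "c3"),
    input_id = c("s1", "s1", "s2")
)

## toy per-specimen table, standing in for manifest rows
toy_specimens <- dplyr::tibble(
    input_id = c("s1", "s2"),
    organ = c("liver", "kidney")
)

## a left join keeps every cell and adds the matching specimen columns
toy_annotated <- dplyr::left_join(toy_cells, toy_specimens, by = "input_id")
toy_annotated

## columns added by the join, mirroring the setdiff() calls above
setdiff(names(toy_annotated), names(toy_cells))
```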

# Session info

```{r sessionInfo}
sessionInfo()
```