---
title: "Using UCell with SingleCellExperiment"
author:
- name: Massimo Andreatta
  affiliation: Ludwig Institute for Cancer Research, Lausanne Branch, and Department of Oncology, CHUV and University of Lausanne, Epalinges 1066, Switzerland; and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- name: Santiago J. Carmona
  affiliation: Ludwig Institute for Cancer Research, Lausanne Branch, and Department of Oncology, CHUV and University of Lausanne, Epalinges 1066, Switzerland; and Swiss Institute of Bioinformatics, Lausanne, Switzerland
output:
  BiocStyle::html_document:
    toc_float: true
  BiocStyle::pdf_document: default
package: UCell
vignette: |
  %\VignetteIndexEntry{2. Using UCell with SingleCellExperiment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

[SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) is Bioconductor's data structure of choice for storing single-cell experiment data. The function `ScoreSignatures_UCell()` allows performing signature scoring with UCell directly on `sce` objects. UCell scores are returned in a altExp object: `altExp(sce, 'UCell')`

# Get some testing data

For this demo, we will download a single-cell dataset of lung cancer ([Zilionis et al. (2019) Immunity](https://pubmed.ncbi.nlm.nih.gov/30979687/)) through the [scRNA-seq](https://bioconductor.org/packages/3.15/data/experiment/html/scRNAseq.html) package. This dataset contains >170,000 single cells; for the sake of simplicity, in this demo will we focus on immune cells, according to the annotations by the authors, and downsample to 5000 cells.

```{r message=F, warning=F, results=F}
library(scRNAseq)
lung <- ZilionisLungData()
immune <- lung$Used & lung$used_in_NSCLC_immune
lung <- lung[,immune]
lung <- lung[,1:5000]

exp.mat <- Matrix::Matrix(counts(lung),sparse = TRUE)
colnames(exp.mat) <- paste0(colnames(exp.mat), seq(1,ncol(exp.mat)))
```

# Define gene signatures

Here we define some simple gene sets based on the "Human Cell Landscape" signatures [Han et al. (2020) Nature](https://www.nature.com/articles/s41586-020-2157-4). You may edit existing signatures, or add new one as elements in a list.
```{r}
signatures <- list(
    Tcell = c("CD3D","CD3E","CD3G","CD2","TRAC"),
    Myeloid = c("CD14","LYZ","CSF1R","FCER1G","SPI1","LCK-"),
    NK = c("KLRD1","NCR1","NKG7","CD3D-","CD3E-"),
    Plasma_cell = c("MZB1","DERL3","CD19-")
)
```

# Run UCell on sce object

```{r message=F, warning=F}
library(UCell)
library(SingleCellExperiment)

sce <- SingleCellExperiment(list(counts=exp.mat))
sce <- ScoreSignatures_UCell(sce, features=signatures, 
                                 assay = 'counts', name=NULL)
altExp(sce, 'UCell')
```

Dimensionality reduction and visualization
```{r message=F, warning=F}
library(scater)
library(patchwork)
#PCA
sce <- logNormCounts(sce)
sce <- runPCA(sce, scale=TRUE, ncomponents=10)

#UMAP
set.seed(1234)
sce <- runUMAP(sce, dimred="PCA")
```

Visualize UCell scores on low-dimensional representation (UMAP)
```{r fig.width=7, fig.height=5, dpi=60}
pll <- lapply(names(signatures), function(x) {
    plotUMAP(sce, colour_by = x, by_exprs_values = "UCell",
             point_size=0.2) + theme(aspect.ratio = 1)
})
wrap_plots(pll)
```

# Signature smoothing

Single-cell data are sparse. It can be useful to 'impute' scores by neighboring cells and partially correct this sparsity. The function `SmoothKNN` performs smoothing of single-cell scores by weighted average of the k-nearest neighbors in a given dimensionality reduction. It can be applied directly on SingleCellExperiment objects to smooth UCell scores:

```{r}
sce <- SmoothKNN(sce, signature.names = names(signatures), reduction="PCA")
```

```{r fig.wide=TRUE, dpi=60}
a <- plotUMAP(sce, colour_by="Myeloid", by_exprs_values="UCell",
         point_size=0.2) + ggtitle("UCell") + theme(aspect.ratio = 1)

b <- plotUMAP(sce, colour_by="Myeloid_kNN", by_exprs_values="UCell_kNN",
         point_size=0.2) + ggtitle("Smoothed UCell") + theme(aspect.ratio = 1)

a | b
```

# Resources

Please report any issues at the [UCell GitHub repository](https://github.com/carmonalab/UCell).

More demos available on the [Bioc landing page](https://bioconductor.org/packages/release/bioc/html/UCell.html) and at the [UCell demo repository](https://github.com/carmonalab/UCell_demo).

If you find UCell useful, you may also check out the [scGate package](https://github.com/carmonalab/scGate), which relies on UCell scores to automatically purify populations of interest based on gene signatures.

See also [SignatuR](https://github.com/carmonalab/SignatuR) for easy storing and retrieval of gene signatures.

# References

* Andreatta, M., Carmona, S. J. (2021) *UCell: Robust and scalable single-cell gene signature scoring* Computational and Structural Biotechnology Journal
* Zilionis, R., Engblom, C., ..., Klein, A. M. (2019) *Single-Cell Transcriptomics of Human and Mouse Lung Cancers Reveals Conserved Myeloid Populations across Individuals and Species* Immunity
* Amezquita, Robert A., et al. (2020) *"Orchestrating single-cell analysis with Bioconductor."* Nature methods

# Session Info

```{r}
sessionInfo()
```