---
title: "Quality Control"
date: "`r BiocStyle::doc_date()`"
package: sesame
output: BiocStyle::html_document
fig_width: 6
fig_height: 5
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{1. Quality Control}
  %\VignetteEncoding{UTF-8}
---

```{r message=FALSE, warning=FALSE, results="hide"}
library(sesame)
sesameDataCache()
```

# Calculate Quality Metrics

The main function to calculate the quality metrics is `sesameQC_calcStats`.
This function takes a SigDF, calculates the QC statistics, and returns a single
S4 `sesameQC` object, which can be printed directly to the console. To calculate
QC metrics on a given list of samples or all IDATs in a folder, one can use
`sesameQC_calcStats` within the standard `openSesame` pipeline. When used with
`openSesame`, a list of `sesameQC`s will be returned. Note that one should turn
off preprocessing using `prep=""`:

```{r qc1, eval=FALSE}
## calculate metrics on all IDATs in a specific folder
sesameQCtoDF(openSesame(idat_dir, prep="", func=sesameQC_calcStats))
## or a list of prefixes, with parallel processing
sesameQCtoDF(openSesame(sprintf("%s/%s", idat_dir, idat_prefixes), prep="",
    func=sesameQC_calcStats, BPPARAM=BiocParallel::MulticoreParam(24)))
```

The results display `frac_dt_cg`, `RGratio`, and `RGdistort` by default. Beyond
these, SeSAMe divides sample quality metrics into multiple groups, listed below,
each of which can be referred to by a short key. For example, the short key
`"intensity"` generates signal intensity-related quality metrics.

```{r echo=FALSE}
library(knitr)
kable(data.frame(
    "Short Key" = c(
        "detection",
        "numProbes",
        "intensity",
        "channel",
        "dyeBias",
        "betas"),
    "Description" = c(
        "Signal Detection",
        "Number of Probes",
        "Signal Intensity",
        "Color Channel",
        "Dye Bias",
        "Beta Value")))
```

By default, `sesameQC_calcStats` calculates all QC groups. To save time, one
can compute a specific QC group by specifying one or multiple short keys in
the `funs=` argument:

```{r qc2}
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] # get two examples
## only compute signal detection stats
qcs = openSesame(sdfs, prep="", func=sesameQC_calcStats, funs="detection")
qcs[[1]]
```
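
Several short keys can be passed at once. The chunk below is a minimal sketch
of that usage, assuming `funs=` accepts a character vector of the short keys
listed in the table above; the name `qcs_multi` is introduced here only for
illustration.

```{r qc2-multi, eval=FALSE}
library(sesame)
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] # same two examples as above
## compute the detection and intensity groups in one pass
## (qcs_multi is a hypothetical variable name used for this sketch)
qcs_multi <- openSesame(sdfs, prep="", func=sesameQC_calcStats,
    funs=c("detection", "intensity"))
qcs_multi[[1]]
```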

> We consider signal detection the most important QC metric.

One can retrieve the actual stat numbers from a `sesameQC` object using
`sesameQC_getStats`. The following returns the fraction of probes with
detection success:

```{r qc3}
sesameQC_getStats(qcs[[1]], "frac_dt")
```
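
To pull the same statistic from every sample in a list, one can loop over the
`sesameQC` objects. This is a sketch rather than a dedicated SeSAMe helper: it
assumes `sesameQC_getStats` returns a single number when given one stat name,
as in the call above.

```{r qc3-all, eval=FALSE}
library(sesame)
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2]
qcs <- openSesame(sdfs, prep="", func=sesameQC_calcStats, funs="detection")
## one detection-success fraction per sample;
## vapply() stops with an error if any result is not a single number
frac_dt <- vapply(qcs, sesameQC_getStats, numeric(1), "frac_dt")
frac_dt
```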

After computing the QCs, one can optionally combine the `sesameQC` objects into
a data frame for easy comparison.

```{r qc4}
## combine a list of sesameQC into a data frame
head(do.call(rbind, lapply(qcs, as.data.frame)))
```

Note that when the input is a `SigDF` object, calling `sesameQC_calcStats`
within `openSesame` is equivalent to calling it as a standalone function.

```{r qc5, message=FALSE}
sdf <- sesameDataGet('EPIC.1.SigDF')
qc = openSesame(sdf, prep="", func=sesameQC_calcStats, funs=c("detection"))
## equivalent direct call
qc = sesameQC_calcStats(sdf, c("detection"))
qc
```

# Rank Quality Metrics

```{r qc6, echo=FALSE}
options(rmarkdown.html_vignette.check_title = FALSE)
```

SeSAMe can compare your sample with public datasets. The
`sesameQC_rankStats()` function ranks the input `sesameQC` object against
`sesameQC` statistics calculated from public datasets. It shows the rank
percentage of the input sample as well as the number of datasets compared.

```{r qc7}
sdf <- sesameDataGet('EPIC.1.SigDF')
qc <- sesameQC_calcStats(sdf, "intensity")
qc
sesameQC_rankStats(qc, platform="EPIC")
```

# Quality Control Plots

SeSAMe provides functions to create QC plots. Some functions take `sesameQC`
objects as input while others plot the `SigDF` objects directly. Here are some
examples:

- `sesameQC_plotBar()` takes a list of `sesameQC` objects and creates a bar
  plot for each metric calculated.

- `sesameQC_plotRedGrnQQ()` graphs the dye bias between the two color channels.

- `sesameQC_plotIntensVsBetas()` plots the relationship between $\beta$ values
  and signal intensity and can be used to diagnose artificial readout and
  influence of signal background.

- `sesameQC_plotHeatSNPs()` plots SNP probes and can be used to detect sample
  swaps.
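
The calls below sketch how these functions might be invoked on the example
data used earlier in this vignette. The inputs follow the descriptions above
(a list of `sesameQC` objects for the bar plot, `SigDF` objects for the rest);
passing a list of `SigDF`s to `sesameQC_plotHeatSNPs()` is an assumption here,
so consult each function's help page for the exact arguments.

```{r qc-plots, eval=FALSE}
library(sesame)
sdf <- sesameDataGet("EPIC.1.SigDF")
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")
## bar plots from a list of sesameQC objects
sesameQC_plotBar(lapply(sdfs, sesameQC_calcStats, "detection"))
## dye bias between the two color channels, from a single SigDF
sesameQC_plotRedGrnQQ(sdf)
## beta values versus signal intensity, from a single SigDF
sesameQC_plotIntensVsBetas(sdf)
## SNP probe heatmap across samples (assumed to take a list of SigDFs)
sesameQC_plotHeatSNPs(sdfs)
```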

More about quality control plots can be found in the [Supplemental
Vignette](https://zhou-lab.github.io/sesame/v1.16/supplemental.html#qc).

# Session Info

```{r}
sessionInfo()
```