---
title: "ASICS User's Guide"
author: "Gaëlle Lefort, Rémi Servien and Nathalie Vialaneix"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{ASICS}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---


This user's guide provides an overview of the package `ASICS`. `ASICS` is a 
fully automated procedure to identify and quantify metabolites in $^1$H 1D-NMR 
spectra of biological mixtures (Tardivel *et al.*, 2017). It will enable 
empowering NMR-based metabolomics by quickly and accurately helping experts to 
obtain metabolic profiles. In addition to the quantification method, several 
functions allowing spectrum preprocessing or statistical analyses of quantified 
metabolites are available.

```{r ASICSload, warning=FALSE}
library(ASICS)
library(ASICSdata)
```

# Dataset

In this user's guide, a subset of the public datasets from Salek *et al.* (2007)
is used. The experiment has been designed to improve the understanding of early 
stage of type 2 diabetes mellitus (T2DM) development. 
In the dataset used, $^1$H-NMR human metabolome was obtained from 25 healthy 
volunteers and 25 T2DM patients. Raw 1D Bruker spectral data files were found in 
the MetaboLights database (https://www.ebi.ac.uk/metabolights/, study MTBLS1). 

# Parallel environment

For most time consumming functions, a parallel implementation is available for
unix-like OS using the BiocParallel package of Bioconductor. The number of used 
cores is set with the option `ncores` of the corresponding functions (default 
to `1`, no parallel environment).

# Library of pure NMR metabolite spectrum

An object of class `PureLibrary` with spectra of pure metabolites is required to
perform the quantification. Such a reference library is provided in `ASICS` with
191 pure metabolite spectra. These spectra are metabolite spectra used as 
references for quantification: only metabolites that are present in the library 
object can be identified and quantified with `ASICS`.

The default library is automatically loaded at package start. Available 
metabolites are displayed with:
```{r lib}
head(getSampleName(pure_library), n = 8)
```

This library can be complemented or another library can be created with new 
spectra of pure metabolites. These spectra are imported from Bruker files and
a new library can be created with:
```{r import_create_lib, results='hide'}
pure_spectra <- importSpectraBruker(system.file("extdata", "example_library", 
                                                package = "ASICS"))
new_pure_library <- createPureLibrary(pure_spectra, 
                                        nb.protons = c(5, 4))
```
A new library can also be created from txt or csv files, with samples in columns 
and chemical shifts in rows (see help page of `createPureLibrary` function for
all details).


The newly created library can be used for quantification or merged with another 
one:
```{r select_merge_lib}
merged_pure_library <- c(pure_library[1:10], new_pure_library)
```
The `PureLibrary` `merged_pure_library` contains the first ten spectra of the
default library and the two newly imported spectra.



# Identification and quantification of metabolites with ASICS

First, data are imported in a data frame from Bruker files with the 
`importSpectraBruker` function. These spectra are baseline corrected 
(Wang *et al*, 2013) and normalised by the area under the curve.
```{r import_from_Bruker, results='hide'}
spectra_data <- importSpectraBruker(system.file("extdata", 
                                                "Human_diabetes_example", 
                                                package = "ASICSdata"))
```

Data can also be imported from other file types with `importSpectra` function. 
The only constraint is to have a data frame with spectra in columns 
(column names are sample names) and chemical shifts in rows (row names 
correspond to the ppm grid).
```{r import_from_txt, results='hide'}
diabetes <- system.file("extdata", package = "ASICSdata")
spectra_data_txt <- importSpectra(name.dir = diabetes, 
                                  name.file = "spectra_diabetes_example.txt",
                                  type = "txt")
```

Several functions for the preprocessing of spectra are also available: 
normalisation and alignment on a reference spectrum 
(based on Vu et al. (2011)). 

Many types of normalisation are available. By default, spectra are normalised
to a constant sum (`type.norm = "CS"`). Otherwise, a normalisation method 
implemented in the `PepsNMR` package could be used. For example:
```{r normalisation}
spectra_norm <- normaliseSpectra(spectra_data_txt, type.norm = "pqn")
```

The alignment algorithm is based on Vu et al. (2011). To find the reference 
spectrum, the FFT cross-correlation is used. Then the 
alignment is performed using the FFT cross-correlation and a hierarchical
classification.
```{r alignment, results='hide', cache=TRUE}
spectra_align <- alignSpectra(spectra_norm)
```

Finally, from the data frame, a `Spectra` object is created. This is a required 
step for the quantification.
```{r create_spectra, results='hide'}
spectra_obj <- createSpectra(spectra_align)
```

Identification and quantification of metabolites can now be carried out using 
only the function `ASICS`. All the steps described in the following figure are
included:

![Steps of the quantification workflow](./workflow_quantif.png)

Recently, new methods for reference library alignment and metabolite 
quantification were added. Thus, multiple scenarios can be performed:

![Scenarios available in ASICS](./scenarii.png)
The method provided in the first version of the package is given in red. It can
now be used by setting `joint.align = FALSE` and `quantif.method = "FWER"`. To
perform a joint alignment (blue, green and yellow scenarios), `joint.align`
needs to be set to `TRUE`. The yellow scenario that performs joint 
quantification based on a simple joint alignment is obtained by additionally 
setting `quantif.method = "Lasso"`. Finally, the green scenario performs a joint
quantification using metabolites identified with a first step consisting of 
independent quantification. It is obtained by setting `quantif.method = "both"`.

With `quantif.method = "both"`, the number of identified metabolites can be
controlled using `clean.thres`. If `clean.thres = 10`, only the metabolites 
identified in at least 10% of the complex spectra (during the first independant
quantification step) are used in the joint quantification. 

More details on these new algorithms can be found in Lefort *et al.* (2020).

`ASICS` function takes approximately 2 minutes per 
spectrum to run. To control randomness in the algorithm (used in the estimation 
of the significativity of a given metabolite concentration), the `set.seed` 
parameter can be used.

```{r ASICS, results='hide', cache=TRUE}
# part of the spectrum to exclude (water and urea)
to_exclude <- matrix(c(4.5, 5.1, 5.5, 6.5), ncol = 2, byrow = TRUE)
ASICS_results <- ASICS(spectra_obj, exclusion.areas = to_exclude)
```

Summary of ASICS results:
```{r summary_res}
ASICS_results
```

The quality of the results can be assessed by stacking the original and 
the reconstructed spectra on one plot. A pure metabolite spectrum can also be 
added for visual comparison. For example, the first spectrum with Creatinine:
```{r plot_spectrum, warning=FALSE, fig.width=12, fig.height=8}
plot(ASICS_results, idx = 1, xlim = c(2.8, 3.3), add.metab = "Creatinine")
```

Relative concentrations of identified metabolites are saved in a data frame
accessible via the `get_quantification` function:
```{r rel_conc}
head(getQuantification(ASICS_results), 10)[, 1:2]
```



# Analysis on relative quantifications

Some analysis functions are also available in `ASICS`.

First, a design data frame is imported. In this data frame, the first column 
needs to correspond to sample names of all spectra. 
```{r design}
design <- read.table(system.file("extdata", "design_diabete_example.txt", 
                                 package = "ASICSdata"), header = TRUE)
```

Then, a preprocessing is performed on relative quantifications: metabolites with
more than 75% of null quantifications are removed as well as two samples that 
are considered as outliers.
```{r analyses_obj}
analysis_data <- formatForAnalysis(getQuantification(ASICS_results),
                                   design = design, zero.threshold = 75,
                                   zero.group = "condition", 
                                   outliers = c("ADG10003u_007", 
                                                "ADG19007u_163"))
```

To explore results of ASICS quantification, a PCA can be performed on results of
preprocessing with: 
```{r pca, fig.width=10}
resPCA <- pca(analysis_data)
plot(resPCA, graph = "ind", col.ind = "condition")
plot(resPCA, graph = "var")
```


It is also possible to find differences between two conditions with an OPLS-DA 
(Thevenot *et al*, 2015) or with Kruskall-Wallis tests:
```{r oplsda}
resOPLSDA <- oplsda(analysis_data, condition = "condition", orthoI = 1)
resOPLSDA
```

```{r oplsda_plot, fig.width=10, results='hide'}
plot(resOPLSDA)
```

Results of Kruskall-Wallis tests and Benjamini-Hochberg correction:
```{r perform_tests}
resTests <- kruskalWallis(analysis_data, "condition")
resTests
```

```{r test_plot, fig.width=10, fig.height=10, results='hide'}
plot(resTests)
```

# Analysis on buckets

An analysis on buckets can also be performed. An alignment is required before 
the spectrum bucketing:
```{r align_and_binning, results='hide', cache=TRUE}
spectra_align <- alignSpectra(spectra_norm)
spectra_bucket <- binning(spectra_align)
```

Alignment visualization:
```{r plot_align}
spectra_obj_align <- createSpectra(spectra_align)

plotAlignment(spectra_obj, xlim = c(3.5,4))
plotAlignment(spectra_obj_align, xlim = c(3.5,4))
```

Then, a `SummarizedExperiment` object is created with the `formatForAnalysis` 
function as for quantification:
```{r create_ana_obj}
analysis_data_bucket <- formatForAnalysis(spectra_bucket, design = design,
                                          zero.threshold = 75)
```

Finally, all analyses can be carried out on this object with the parameter 
`type.data` set to `buckets`. For example, the OPLS-DA is performed with:
```{r oplsda_buckets}
resOPLSDA_buckets <- oplsda(analysis_data_bucket, condition = "condition",
                            type.data = "buckets")
resOPLSDA_buckets
```

Moreover, another plot with the median spectrum and OPLS-DA results can be 
produced with the option `graph = "buckets"`:
```{r oplsda_buckets_plot, fig.width=10}
plot(resOPLSDA_buckets, graph = "buckets")
```



# References 

Lefort G., Liaubet L., Marty-Gasset N., Canlet C., Vialaneix N., Servien R. \emph{Joint automatic metabolite identification and quantification of a set of $^1$H~NMR spectra}. 2020. Pre-print, https://www.biorxiv.org/content/10.1101/2020.10.08.331090v1.


Tardivel P., Canlet C., Lefort G., Tremblay-Franco M., Debrauwer L., Concordet 
D., Servien R. (2017). ASICS: an automatic method for identification and 
quantification of metabolites in complex 1D 1H NMR spectra. *Metabolomics*,
**13**(10), 109. https://doi.org/10.1007/s11306-017-1244-5

Salek, R. M., Maguire, M. L., Bentley, E., Rubtsov, D. V., Hough, T., 
Cheeseman, M., ... & Connor, S. C. (2007). A metabolomic comparison of urinary 
changes in type 2 diabetes in mouse, rat, and human. *Physiological genomics*, 
**29**(2), 99-108.

Wang, K. C., Wang, S. Y., Kuo, C. H., Tseng, Y. J. (2013). Distribution-based 
classification method for baseline correction of metabolomic 1D proton nuclear 
magnetic resonance spectra. *Analytical Chemistry*, **85**(2), 1231–1239.


Vu, T. N., Valkenborg, D., Smets, K., Verwaest, K. A., Dommisse, R., Lemiere, 
F., ... & Laukens, K. (2011). An integrated workflow for robust alignment and 
simplified quantitative analysis of NMR spectrometry data. 
*BMC bioinformatics*, **12**(1), 405.

Thevenot, E.A., Roux, A., Xu, Y., Ezan, E., Junot, C. 2015. Analysis of the 
human adult urinary metabolome variations with age, body mass index and gender 
by implementing a comprehensive workflow for univariate and OPLS statistical 
analyses. *Journal of Proteome Research*. **14**, 3322-3335.

# Session information

This user's guide has been created with the following system configuration:
```{r sysinfo}
sessionInfo()
```