--- title: "Real Time Annotation" date: "2020-10-11" package: peakPantheR output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{Real Time Annotation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteDepends{peakPantheR,faahKO,pander,BiocStyle} %\VignettePackage{peakPantheR} %\VignetteKeywords{mass spectrometry, metabolomics} --- ```{r biocstyle, echo = FALSE, results = "asis" } BiocStyle::markdown() ``` ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` **Package**: `r Biocpkg("peakPantheR")`
**Authors**: Arnaud Wolfer
```{r init, message = FALSE, echo = FALSE, results = "hide" } ## Silently loading all packages library(BiocStyle) library(peakPantheR) library(faahKO) library(pander) ``` # Introduction The `peakPantheR` package is designed for the detection, integration and reporting of pre-defined features in MS files (_e.g. compounds, fragments, adducts, ..._). The **Real Time Annotation** is set to detect and integrate **multiple** compounds in **one** file at a time. It therefore can be deployed on a LC-MS instrument to integrate a set of pre-defined features (_e.g. spiked standards_) as soon as the acquisition of a sample is completed. Using the `r Biocpkg("faahKO")` raw MS dataset as an example, this vignette will: * Detail the **Real Time Annotation** concept * Apply the **Real Time Annotation** to a subset of pre-defined features in the `r Biocpkg("faahKO")` dataset ## Abbreviations - **ROI**: _Regions Of Interest_ * reference _RT_ / _m/z_ windows in which to search for a feature - **uROI**: _updated Regions Of Interest_ * modifed ROI adapted to the current dataset which override the reference ROI - **FIR**: _Fallback Integration Regions_ * _RT_ / _m/z_ window to integrate if no peak is found - **TIC**: _Total Ion Chromatogram_ * the intensities summed across all masses for each scan - **EIC**: _Extracted Ion Chromatogram_ * the intensities summed over a mass range, for each scan # Real Time Annotation Concept Real time compound integration is set to process **multiple** compounds in **one** file at a time. To achieve this, `peakPantheR` will: * load a list of expected _RT_ / _m/z_ regions of interest (**ROI**) * detect features in each ROI and keep the highest intensity one * determine peak statistics for each feature * return: + TIC + a table with all detected compounds for that file (_row: compound, col: statistic_) + EIC for each ROI + sample acquisition date-time from the mzML metadata (_if available_) + save EIC plots to disk # Real Time Annotation Example In the following example we will target two pre-defined features in a single raw MS spectra file from the `r Biocpkg("faahKO")` package. For more details on the installation and input data employed, please consult the [Getting Started with peakPantheR](getting-started.html) vignette. ## Input Data The path to a MS file from the `r Biocpkg("faahKO")` is located and used as input spectra: ```{r} library(faahKO) ## file paths input_spectraPath <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO")) input_spectraPath ``` Two targeted features (_e.g. compounds, fragments, adducts, ..._) are defined and stored in a table with as columns: * `cpdID` (numeric) * `cpdName` (character) * `rtMin` (sec) * `rtMax` (sec) * `rt` (sec, optional / `NA`) * `mzMin` (m/z) * `mzMax` (m/z) * `mz` (m/z, optional / `NA`) ```{r, eval = FALSE} # targetFeatTable input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", "mz", "mzMax"))), stringsAsFactors=FALSE) input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 522.194778, 522.2, 522.205222) input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 496.195038, 496.2, 496.204962) input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], as.numeric) ``` ```{r, results = "asis", echo = FALSE} # use pandoc for improved readability input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", "mz", "mzMax"))), stringsAsFactors=FALSE) input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 522.194778, 522.2, 522.205222) input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 496.195038, 496.2, 496.204962) input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], as.numeric) rownames(input_targetFeatTable) <- NULL pander::pandoc.table(input_targetFeatTable, digits = 9) ``` ## Run Single File Annotation `peakPantheR_singleFileSearch()` takes as input a `singleSpectraDataPath` pointing to the file to process and `targetFeatTable` defining the features to integrate. The resulting annotation contains all the fitting and integration properties: ```{r} library(peakPantheR) annotation <- peakPantheR_singleFileSearch( singleSpectraDataPath = input_spectraPath, targetFeatTable = input_targetFeatTable, peakStatistic = TRUE, curveModel = 'skewedGaussian', verbose = TRUE) ``` ```{r} annotation$TIC ``` ```{r} ## acquisition time cannot be extracted from NetCDF files annotation$acquTime ``` ```{r, eval = FALSE} annotation$peakTable ``` ```{r, results = "asis", echo = FALSE} # use pandoc for improved readability pander::pandoc.table(annotation$peakTable, digits = 7) ``` ```{r} annotation$curveFit ``` ```{r} annotation$ROIsDataPoint ``` `peakPantheR_singleFileSearch()` takes multiple parameters that can alter the file annotation: * `peakStatistic` if `TRUE` calculates additional peak statistics: _'ppm_error'_, _'rt_dev_sec'_, _'tailing factor'_ and _'asymmetry factor'_ * `plotEICsPath` if not `NA` will save a `.png` of all ROI EICs at the path provided (expects `'filepath/filename.png'` for example). If `NA` no plot is saved * `getAcquTime` if `TRUE` the sample acquisition date-time is extracted from the `mzML` metadata. Acquisition time cannot be extracted from other file formats. The additional file access will impact run time * `FIR` if not `NULL`, defines the Fallback Integration Regions (**FIR**) to integrate when a feature is not found. * `curveModel`, defines the peak-shape model to fit to each EIC. By default, a _'skewedGaussian'_ model is used. The other alternative is the exponentially modified gaussian _'emgGaussian'_ model. * `verbose` if `TRUE` messages calculation progress, time taken and number of features found (_total and matched to targets_) * `...` passes arguments to `findTargetFeatures` to alter peak-picking parameters (e.g. the curveModel, the sampling or fitting parameters) The summary plot generated by `plotEICsPath`, corresponding to the EICs of each integrated regions of interest is as follow: ```{r, out.width = "700px", echo = FALSE} knitr::include_graphics("../man/figures/singleFileSearch_EICsPlot.png") ``` > EICs plot: Each panel correspond to a targeted feature, with the EIC extracted on the `mzMin`, `mzMax` range found. The red dot marks the RT peak apex, and the red line highlights the RT peakwidth range found (`rtMin`, `rtMax`) # See Also * [Getting Started with peakPantheR](getting-started.html) * [Parallel Annotation](parallel-annotation.html) * [Graphical user interface use](peakPantheR-GUI.html) # Session Information ```{r, echo = FALSE} devtools::session_info() ```