---
title: "Getting Started with the peakPantheR package"
date: "2019-10-01"
package: peakPantheR
output:
    BiocStyle::html_document:
        toc_float: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{Getting Started with the peakPantheR package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{peakPantheR,faahKO,pander,BiocStyle}
  %\VignettePackage{peakPantheR}
  %\VignetteKeywords{mass spectrometry, metabolomics}
---
```{r biocstyle, echo = FALSE, results = "asis" }
BiocStyle::markdown()
```
```{r, echo = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```
**Package**: `r Biocpkg("peakPantheR")`
**Authors**: Arnaud Wolfer
```{r init, message = FALSE, echo = FALSE, results = "hide" }
## Silently loading all packages
library(BiocStyle)
library(peakPantheR)
library(faahKO)
library(pander)
```
Package for _Peak Picking and ANnoTation of High resolution Experiments in R_,
implemented in `R` and `Shiny`.
# Overview
`peakPantheR` implements functions to detect, integrate and report pre-defined
features in MS files (_e.g. compounds, fragments, adducts, ..._).
It is designed for:
* **Real time** feature detection and integration (see
[Real Time Annotation](real-time-annotation.html))
    + process `multiple` compounds in `one` file at a time
* **Post-acquisition** feature detection, integration and reporting (see
[Parallel Annotation](parallel-annotation.html))
    + process `multiple` compounds in `multiple` files in `parallel`, store
    results in a `single` object
`peakPantheR` can process LC/MS data files in _NetCDF_, _mzML_/_mzXML_ and
_mzData_ formats, because data import relies on Bioconductor's
`r Biocpkg("mzR")` package.
# Installation
To install `peakPantheR` from Bioconductor:
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("peakPantheR")
```
Install the development version of `peakPantheR` directly from GitHub with:
```{r, eval = FALSE}
# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("phenomecentre/peakPantheR")
```
# Input Data
Both real time and parallel compound integration require a common set of
information:
* Path(s) to `netCDF` / `mzML` MS file(s)
* An expected region of interest (`RT` / `m/z` window) for each compound.
## MS files
For demonstration purposes we can annotate a set of raw MS spectra (in
_NetCDF_ format) provided by the `r Biocpkg("faahKO")` package. Briefly, this
subset of the data from [@Saghatelian04] investigates the metabolic consequences
of knocking out the fatty acid amide hydrolase (FAAH) gene in mice. The dataset
consists of samples from the spinal cords of 6 knock-out and 6 wild-type mice.
Each file contains data in centroid mode acquired in positive ion mode from
200-600 m/z and 2500-4500 seconds.
Below we install the `r Biocpkg("faahKO")` package and locate raw CDF files of
interest:
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("faahKO")
```
```{r}
library(faahKO)
## file paths
input_spectraPaths <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
```
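`system.file()` returns an empty string when a file cannot be found, so it can
be worth confirming the paths before starting an annotation. The chunk below is
a minimal sketch using base `R` only. `check_spectra_paths()` and its list of
file extensions are hypothetical, introduced here for illustration, and are not
part of `peakPantheR`:

```{r, eval = FALSE}
# Hypothetical helper (not a peakPantheR function): report, for each path,
# whether the file exists and whether its extension looks like a supported
# MS format. The extension list is an assumption based on the formats named
# in the Overview.
check_spectra_paths <- function(paths,
                                known_ext = c("cdf", "mzml", "mzxml",
                                            "mzdata")) {
    ext <- tolower(tools::file_ext(paths))
    data.frame(
        path     = paths,
        exists   = file.exists(paths),
        knownExt = ext %in% known_ext,
        stringsAsFactors = FALSE
    )
}

# Only run the check if the paths defined above are available in the session
if (exists("input_spectraPaths")) {
    print(check_spectra_paths(input_spectraPaths))
}
```

A row with `exists = FALSE` usually means the `faahKO` package is not installed
or the relative path inside the package is misspelled.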
## Expected regions of interest
Expected regions of interest (targeted features) are specified using the
following information:
* `cpdID` (numeric)
* `cpdName` (character)
* `rtMin` (sec)
* `rtMax` (sec)
* `rt` (sec, optional / `NA`)
* `mzMin` (m/z)
* `mzMax` (m/z)
* `mz` (m/z, optional / `NA`)
Below we define 2 features of interest that are present in the
`r Biocpkg("faahKO")` dataset and can be employed in subsequent vignettes:
```{r, eval=FALSE}
# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(),
c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin",
"mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c(1, "Cpd 1", 3310., 3344.888, 3390., 522.194778,
522.2, 522.205222)
input_targetFeatTable[2,] <- c(2, "Cpd 2", 3280., 3385.577, 3440., 496.195038,
496.2, 496.204962)
input_targetFeatTable[,c(1,3:8)] <- sapply(input_targetFeatTable[,c(1,3:8)],
as.numeric)
```
```{r, results = "asis", echo = FALSE}
# use pandoc for improved readability
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(),
c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin",
"mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c(1, "Cpd 1", 3310., 3344.888, 3390., 522.194778,
522.2, 522.205222)
input_targetFeatTable[2,] <- c(2, "Cpd 2", 3280., 3385.577, 3440., 496.195038,
496.2, 496.204962)
input_targetFeatTable[,c(1,3:8)] <- sapply(input_targetFeatTable[,c(1,3:8)],
as.numeric)
rownames(input_targetFeatTable) <- NULL
pander::pandoc.table(input_targetFeatTable, digits = 9)
```
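As an alternative, the same table can be built column by column, which keeps
each column's type without a later conversion step. The sketch below makes two
assumptions that are not stated elsewhere in this vignette: the `mzMin` /
`mzMax` values are taken to be `mz` plus or minus 10 ppm (this reproduces the
numbers above), and `check_target_windows()` is a hypothetical helper, not a
`peakPantheR` function:

```{r, eval = FALSE}
# Assumption: a +/- 10 ppm tolerance around mz reproduces the mzMin / mzMax
# values of the table above
ppm <- 10
mz  <- c(522.2, 496.2)

# Build the table with typed columns (numeric and character) directly
alt_targetFeatTable <- data.frame(
    cpdID   = c(1, 2),
    cpdName = c("Cpd 1", "Cpd 2"),
    rtMin   = c(3310, 3280),
    rt      = c(3344.888, 3385.577),
    rtMax   = c(3390, 3440),
    mzMin   = mz * (1 - ppm / 1e6),
    mz      = mz,
    mzMax   = mz * (1 + ppm / 1e6),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row whose windows are ordered
# (min <= max) and whose optional rt / mz value, when not NA, lies inside
# its window
check_target_windows <- function(tbl) {
    in_window <- function(lo, x, hi) {
        lo <= hi & (is.na(x) | (x >= lo & x <= hi))
    }
    in_window(tbl$rtMin, tbl$rt, tbl$rtMax) &
        in_window(tbl$mzMin, tbl$mz, tbl$mzMax)
}

check_target_windows(alt_targetFeatTable)
```

Both rows pass the check. A row with `rt` or `mz` set to `NA` also passes, as
these two columns are optional.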
# See Also
* [Real Time Annotation](real-time-annotation.html)
* [Parallel Annotation](parallel-annotation.html)
# References