---
title: "LC-MS/MS data analysis with xcms"
package: xcms
output:
BiocStyle::html_document:
toc_float: true
includes:
in_header: xcms-lcms-ms.bioschemas.html
vignette: >
%\VignetteIndexEntry{LC-MS/MS data analysis with xcms}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteDepends{xcms,msdata,BiocStyle,pander,Spectra,MsBackendMgf,MetaboCoreUtils}
%\VignettePackage{xcms}
%\VignetteKeywords{mass spectrometry, metabolomics}
bibliography: references.bib
csl: biomed-central.csl
---
```{r biocstyle, echo = FALSE, results = "asis"}
BiocStyle::markdown()
```
**Package**: `r Biocpkg("xcms")`
**Authors**: Johannes Rainer, Michael Witting
**Modified**: `r file.info("xcms-lcms-ms.Rmd")$mtime`
**Compiled**: `r date()`
```{r init, message = FALSE, echo = FALSE, results = "hide"}
## Silently loading all packages
library(BiocStyle)
library(xcms)
library(Spectra)
library(pander)
register(SerialParam())
```
# Introduction
Metabolite identification is an important step in non-targeted metabolomics and
requires different steps. One involves the use of tandem mass spectrometry to
generate fragmentation spectra of detected metabolites (LC-MS/MS), which are
then compared to fragmentation spectra of known metabolites. Different
approaches exist for the generation of these fragmentation spectra, whereas the
most used is data dependent acquisition (DDA) also known as the top-n method. In
this method the top N most intense ions (m/z values) from a MS1 scan are
selected for fragmentation in the next N scans before the cycle starts
again. This method allows to generate clean MS2 fragmentation spectra on the fly
during acquisition without the need for further experiments, but suffers from
poor coverage of the detected metabolites (since only a limited number of ions
are fragmented) and less accurate quantification of the compounds (since fewer
MS1 scans are generated).
Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters
MSe don't use such a preselection, but rather fragment all detected molecules at
once. They are using alternating schemes with scan of low and high collision
energy to collect MS1 and MS2 data. Using this approach, there is no problem in
coverage, but the relation between the precursor and fragment masses is lost
leading to chimeric spectra. Sequential Window Acquisition of all Theoretical
Mass Spectra (or SWATH [@Ludwig:2018hv]) combines both approaches through a
middle-way approach. There is no precursor selection and acquisition is
independent of acquired data, but rather than isolating all precusors at once,
defined windows (i.e. ranges of m/z values) are used and scanned. This reduces
the overlap of fragment spectra while still keeping a high coverage.
This document showcases the analysis of two small LC-MS/MS data sets using
`r Biocpkg("xcms")`. The data files used are reversed-phase LC-MS/MS runs from the
Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH
acquisition mode. For comparison a DDA file from the same sample is included.
# Analysis of DDA data
Below we load the example DDA data set and create a total ion chromatogram of
its MS1 data.
```{r load-dda-data, message = FALSE}
library(xcms)
library(MsExperiment)
dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
package = "msdata")
dda_data <- readMsExperiment(dda_file)
chr <- chromatogram(dda_data, aggregationFun = "sum", msLevel = 1L)
```
According to the TIC most of the signal is measured between ~ 200 and 600
seconds (see plot below). We thus filter the DDA data to this retention time
range.
```{r, fig.caption = "Total ion chromatogram (MS1) of the DDA data."}
plot(chr)
abline(v = c(230, 610))
## filter the data
dda_data <- filterRt(dda_data, rt = c(230, 610))
```
The variable `dda_data` contains now all MS1 and MS2 spectra from the specified
mzML file. The number of spectra for each MS level is listed below. Note that we
subset the experiment to the first data file (using `[1]`) and then access
directly the spectra within this sample with the `spectra()` function (which
returns a `Spectra` object from the `r Biocpkg("Spectra")` package). Note that
we use the pipe operator `|>` for better readability.
```{r dda-table-mslevel}
dda_data[1] |>
spectra() |>
msLevel() |>
table()
```
For the MS2 spectra we can get the m/z of the precursor ion with the
`precursorMz()` function. Below we first subset the data set again to a single
sample and filter to spectra from MS level 2 extracting then their precursor m/z
values.
```{r precursor}
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorMz() |>
head()
```
With the `precursorIntensity()` function it is also possible to extract the
intensity of the precursor ion.
```{r precursor-intensity}
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
```
Some manufacturers (like Sciex) don't define/export the precursor intensity and
thus either `NA` or `0` is reported. We can however use the
`estimatePrecursorIntensity()` function from the `r Biocpkg("Spectra")` package
to determine the precursor intensity for a MS 2 spectrum based on the intensity
of the respective ion in the previous MS1 scan (note that with `method =
"interpolation"` the precursor intensity would be defined based on interpolation
between the intensity in the previous and subsequent MS1 scan). Below we
estimate the precursor intensities, on the full data (for MS1 spectra a `NA`
value is reported).
```{r estimate-precursor}
prec_int <- estimatePrecursorIntensity(spectra(dda_data))
```
We next set the precursor intensity in the spectrum metadata of `dda_data`. So
that it can be extracted later with the `precursorIntensity()` function.
```{r set-precursor-intensity}
spectra(dda_data)$precursorIntensity <- prec_int
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
```
Next we perform the chromatographic peak detection on the MS level 1 data with
the `findChromPeaks()` method. Below we define the settings for a
*centWave*-based peak detection and perform the analysis.
```{r dda-find-chrom-peaks-ms1, message = FALSE}
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp, msLevel = 1L)
```
In total `r nrow(chromPeaks(dda_data))` peaks were identified in the present
data set.
The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the
corresponding MS2 spectra measured. Thus, for some of the ions (identified as
MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the
annotation of the respective MS1 chromatographic peaks (or MS1 features after a
correspondence analysis). Spectra for identified chromatographic peaks can be
extracted with the `chromPeakSpectra()` method. MS2 spectra with their precursor
m/z and retention time within the rt and m/z range of the chromatographic peak
are returned.
```{r dda-spectra, message = FALSE}
library(Spectra)
dda_spectra <- chromPeakSpectra(dda_data, msLevel = 2L)
dda_spectra
```
By default `chromPeakSpectra()` returns all spectra associated with a MS1
chromatographic peak, but parameter `method` allows to choose and return only
one spectrum per peak (have a look at the `?chromPeakSpectra` help page for more
details). Also, it would be possible to extract MS1 spectra for each peak by
specifying `msLevel = 1L` in the call above (e.g. to evaluate the full MS1
signal at the peak's apex position).
The returned `Spectra` contains also the reference to the respective
chromatographic peak as additional *spectra variable* `"peak_id"` that contains
the identifier for the chromatographic peak (i.e. its row name in the
`chromPeaks` matrix).
```{r peak_id}
dda_spectra$peak_id
```
Note also that with `return.type = "List"` a list parallel to the `chromPeaks`
matrix would be returned, i.e. each element in that list would contain the
spectra for the chromatographic peak with the same index. Such data
representation might eventually simplify further processing.
We next use the MS2 information to aid in the annotation of a chromatographic
peak. As an example we use a chromatographic peak of an ion with an m/z of
304.1131 which we extract in the code block below.
```{r dda-ms2-example, message = FALSE}
ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)
```
A search of potential ions with a similar m/z in a reference database
(e.g. [Metlin](https://metlin.scripps.edu)) returned a large list of potential
hits, most with a very small ppm. For two of the hits,
[Flumazenil](https://en.wikipedia.org/wiki/Flumazenil) (Metlin ID 2724) and
[Fenamiphos](https://en.wikipedia.org/wiki/Fenamiphos) (Metlin ID 72445)
experimental MS2 spectra are available. Thus, we could match the MS2 spectrum
for the identified chromatographic peak against these to annotate our ion. Below
we extract all MS2 spectra that were associated with the candidate
chromatographic peak using the ID of the peak in the present data set.
```{r dda-ms2-get-ms2, message = FALSE}
ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$peak_id == ex_id]
ex_spectra
```
There are 5 MS2 spectra representing fragmentation of the ion(s) measured
in our candidate chromatographic peak. We next reduce this to a single MS2
spectrum using the `combineSpectra()` method employing the `combinePeaks()`
function to determine which peaks to keep in the resulting spectrum (have a look
at the `?combinePeaks` help page for details). Parameter `f` allows to specify
which spectra in the input object should be combined into one. Note that this
combination of multiple fragment spectra into a single spectrum might not be
generally the best approach or suggested for all types of data.
```{r dda-ms2-consensus, message = FALSE}
ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
peaks = "intersect", minProp = 0.8,
intensityFun = median, mzFun = median,
f = ex_spectra$peak_id)
ex_spectrum
```
Mass peaks from all input spectra with a difference in m/z smaller 20 ppm
(parameter `ppm`) were combined into one peak and the median m/z and intensity
is reported for these. Due to parameter `minProp = 0.8`, the resulting MS2
spectrum contains only peaks that were present in 80% of the input spectra.
A plot of this *consensus* spectrum is shown below.
```{r dda-ms2-consensus-plot, message = FALSE, fig.cap = "Consensus MS2 spectrum created from all measured MS2 spectra for ions of chromatographic peak CP53.", fig.width = 8, fig.height = 8}
plotSpectra(ex_spectrum)
```
We could now match the consensus spectrum against a database of MS2 spectra. In
our example we simply load MS2 spectra for the two compounds with matching m/z
exported from Metlin. For each of the compounds MS2 spectra created with
collision energies of 0V, 10V, 20V and 40V are available. Below we import the
respective data and plot our candidate spectrum against the MS2 spectra of
Flumanezil and Fenamiphos (from a collision energy of 20V). To import files in
MGF format we have to load the `r Biocpkg("MsBackendMgf")` Bioconductor package
which adds MGF file support to the *Spectra* package.
Prior plotting we *scale* our experimental spectra to replace all peak
intensities with values relative to the maximum peak intensity (which is set to
a value of 100).
```{r normalize}
scale_fun <- function(z, ...) {
z[, "intensity"] <- z[, "intensity"] /
max(z[, "intensity"], na.rm = TRUE) * 100
z
}
ex_spectrum <- addProcessing(ex_spectrum, FUN = scale_fun)
```
```{r dda-ms2-metlin-match, fig.cap = "Mirror plots for the candidate MS2 spectrum against Flumanezil (left) and Fenamiphos (right). The upper panel represents the candidate MS2 spectrum, the lower the target MS2 spectrum. Matching peaks are indicated with a dot.", fig.width = 12, fig.height = 6}
library(MsBackendMgf)
flumanezil <- Spectra(
system.file("mgf", "metlin-2724.mgf", package = "xcms"),
source = MsBackendMgf())
fenamiphos <- Spectra(
system.file("mgf", "metlin-72445.mgf", package = "xcms"),
source = MsBackendMgf())
par(mfrow = c(1, 2))
plotSpectraMirror(ex_spectrum, flumanezil[3], main = "against Flumanezil",
ppm = 40)
plotSpectraMirror(ex_spectrum, fenamiphos[3], main = "against Fenamiphos",
ppm = 40)
```
Our candidate spectrum matches Fenamiphos, thus, our example chromatographic
peak represents signal measured for this compound. In addition to plotting the
spectra, we can also calculate similarities between them with the
`compareSpectra()` method (which uses by default the normalized dot-product to
calculate the similarity).
```{r dda-ms2-dotproduct}
compareSpectra(ex_spectrum, flumanezil, ppm = 40)
compareSpectra(ex_spectrum, fenamiphos, ppm = 40)
```
Clearly, the candidate spectrum does not match Flumanezil, while it has a high
similarity to Fenamiphos. While we performed here the MS2-based annotation on a
single chromatographic peak, this could be easily extended to the full list of
MS2 spectra (returned by `chromPeakSpectra()`) for all chromatographic peaks in
an experiment. See also [here](https://jorainer.github.io/SpectraTutorials/) or
[here](https://jorainer.github.io/MetaboAnnotationTutorials) for alternative
tutorials on matching experimental fragment spectra against a reference.
In the present example we used only a single data file and we did thus not need
to perform a sample alignment and correspondence analysis. These tasks could
however be performed similarly to *plain* LC-MS data, retention times of
recorded MS2 spectra would however also be adjusted during alignment based on
the MS1 data. After correspondence analysis (peak grouping) MS2 spectra for
*features* can be extracted with the `featureSpectra()` function which returns
all MS2 spectra associated with any chromatographic peak of a feature.
Note also that this workflow can be included into the *Feature-Based
Molecular Networking*
[FBMN](https://ccms-ucsd.github.io/GNPSDocumentation/featurebasedmolecularnetworking/)
to match MS2 spectra against [GNPS](https://gnps.ucsd.edu/). See
[here](https://ccms-ucsd.github.io/GNPSDocumentation/featurebasedmolecularnetworking-with-xcms3/)
for more details and examples.
# DIA (SWATH) data analysis
In this section we analyze a small SWATH data set consisting of a single mzML
file with data from the same sample analyzed in the previous section but
recorded in SWATH mode. We again read the data with the `readMsExperiment()`
function. The resulting object will contain all recorded MS1 and MS2
spectra in the specified file. Similar to the previous data file, we filter the
file to signal between 230 and 610 seconds.
```{r load-swath-data, message = FALSE}
swath_file <- system.file("TripleTOF-SWATH",
"PestMix1_SWATH.mzML",
package = "msdata")
swath_data <- readMsExperiment(swath_file)
swath_data <- filterRt(swath_data, rt = c(230, 610))
```
Below we determine the number of MS level 1 and 2 spectra in the present data
set.
```{r swath-table-mslevel}
spectra(swath_data) |>
msLevel() |>
table()
```
As described in the introduction, in SWATH mode all ions within pre-defined
isolation windows are fragmented and MS2 spectra measured. The definition of
these isolation windows (SWATH pockets) is imported from the mzML files and
available as additional *spectra variables*. Below we inspect the respective
information for the first few spectra. The upper and lower isolation window m/z
is available with spectra variables `"isolationWindowLowerMz"` and
`"isolationWindowUpperMz"` respectively and the *target* m/z of the isolation
window with `"isolationWindowTargetMz"`. We can use the `spectraData()` function
to extract this information from the spectra within our `swath_data` object.
```{r fdata-isolationwindow}
spectra(swath_data) |>
spectraData(c("isolationWindowTargetMz", "isolationWindowLowerMz",
"isolationWindowUpperMz", "msLevel", "rtime")) |>
head()
```
We could also access these variables directly with the dedicated
`isolationWindowLowerMz()` and `isolationWindowUpperMz()` functions.
```{r}
head(isolationWindowLowerMz(spectra(swath_data)))
head(isolationWindowUpperMz(spectra(swath_data)))
```
In the present data set we use the value of the *isolation window target m/z* to
define the individual SWATH pockets. Below we list the number of spectra that
are recorded in each pocket/isolation window.
```{r}
table(isolationWindowTargetMz(spectra(swath_data)))
```
We have thus between 422 and 423 MS2 spectra measured in each isolation window.
To inspect the data we can also extract chromatograms from both the measured MS1
as well as MS2 data. For MS2 data we have to set parameter `msLevel = 2L` and,
for SWATH data, in addition also specify the isolation window from which we want
to extract the data. Below we extract the TIC of the MS1 data and of one of the
isolation windows (isolation window target m/z of 270.85) and plot these.
```{r, fig.cap = "TIC for MS1 (upper panel) and MS2 data from the isolation window with target m/z 270.85 (lower panel)."}
tic_ms1 <- chromatogram(swath_data, msLevel = 1L, aggregationFun = "sum")
tic_ms2 <- chromatogram(swath_data, msLevel = 2L, aggregationFun = "sum",
isolationWindowTargetMz = 270.85)
par(mfrow = c(2, 1))
plot(tic_ms1, main = "MS1")
plot(tic_ms2, main = "MS2, isolation window m/z 270.85")
```
Without specifying the `isolationWindowTargetMz` parameter, all MS2 spectra
would be considered in the chromatogram extraction which would result in a
*chimeric* chromatogram such as the one shown below:
```{r, fig.cap = "TIC considering **all** MS2 spectra (from all isolation windows)."}
tic_all_ms2 <- chromatogram(swath_data, msLevel = 2L, aggregationFun = "sum")
plot(tic_all_ms2, main = "MS2, all isolation windows")
```
For MS2 data without specific, **different**, m/z isolation windows (such as
e.g. Waters MSe data) parameter `isolationWindowTargetMz` can be omitted in the
`chromatograms()` call in which case, as already stated above, all MS2 spectra
are considered in the chromatogram calculation. Alternatively, if the isolation
window is not provided or specified in the original data files, it would be
possible to manually define a value for this spectra variable, such as in the
example below (from which the code is however not evaluated) were we assign the
value of the precursor m/z to the spectra's isolation window target m/z.
```{r, eval = FALSE}
spectra(swath_data)$isolationWindowTargetMz <- precursorMz(spectra(swath_data))
```
## Chromatographic peak detection in MS1 and MS2 data
Similar to a *conventional* LC-MS analysis, we perform first a chromatographic
peak detection (on the MS level 1 data) with the `findChromPeaks()`
method. Below we define the settings for a *centWave*-based peak detection and
perform the analysis.
```{r find-chrom-peaks-ms1, message = FALSE}
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
swath_data <- findChromPeaks(swath_data, param = cwp)
swath_data
```
Next we perform a chromatographic peak detection in MS level 2 data separately
for each individual isolation window. We use the
`findChromPeaksIsolationWindow()` function employing the same peak detection
algorithm reducing however the required signal-to-noise ratio. The
`isolationWindow` parameter allows to specify which MS2 spectra belong to which
isolation window and hence defines in which set of MS2 spectra chromatographic
peak detection should be performed. As a default the `"isolationWindowTargetMz"`
variable of the object's spectra is used.
```{r find-chrom-peaks-ms2, message = FALSE}
cwp <- CentWaveParam(snthresh = 3, noise = 10, ppm = 10,
peakwidth = c(3, 30))
swath_data <- findChromPeaksIsolationWindow(swath_data, param = cwp)
swath_data
```
The `findChromPeaksIsolationWindow()` function added all peaks identified in the
individual isolation windows to the `chromPeaks` matrix containing already the
MS1 chromatographic peaks. These newly added peaks can be identified through the
`"isolationWindow"` column in the object's `chromPeakData`.
```{r}
chromPeakData(swath_data)
```
Below we count the number of chromatographic peaks identified within each
isolation window (the number of chromatographic peaks identified in MS1 is
`r sum(chromPeakData(swath_data)$ms_level == 1)`).
```{r}
table(chromPeakData(swath_data)$isolationWindow)
```
We thus successfully identified chromatographic peaks in the different MS levels
and isolation windows. As a next step we have to identify which of the measured
signals represents data from the same original compound to *reconstruct*
fragment spectra for each MS1 signal (chromatographic peak).
## Reconstruction of MS2 spectra
Identifying the signal of the fragment ions for the precursor measured by each
MS1 chromatographic peak is a non-trivial task. The MS2 spectrum of the fragment
ion for each MS1 chromatographic peak has to be reconstructed from the available
MS2 signal (i.e. the chromatographic peaks identified in MS level 2). For SWATH
data, fragment ion signal should be present in the same isolation window that
contains the m/z of the precursor ion and the chromatographic peak shape of the
MS2 chromatographic peaks of fragment ions of a specific precursor should have a
similar retention time and peak shape than the precursor's MS1 chromatographic
peak.
After detection of MS1 and MS2 chromatographic peaks has been performed, we can
reconstruct the MS2 spectra using the `reconstructChromPeakSpectra()`
function. This function defines an MS2 spectrum for each MS1 chromatographic
peak based on the following approach:
- Identify MS2 chromatographic peaks in the isolation window containing the m/z
of the ion (the MS1 chromatographic peak) that have approximately the same
retention time than the MS1 chromatographic peak (the accepted difference in
retention time can be defined with the `diffRt` parameter).
- Extract the MS1 chromatographic peak and all MS2 chromatographic peaks
identified by the previous step and correlate the peak shapes of the candidate
MS2 chromatographic peaks with the shape of the MS1 peak. MS2 chromatographic
peaks with a correlation coefficient larger than `minCor` are retained.
- Reconstruct the MS2 spectrum using the m/z of all above selected MS2
chromatographic peaks and their intensity; each MS2 chromatographic peak
selected for an MS1 peak will thus represent one **mass peak** in the
reconstructed spectrum.
To illustrate this process we perform the individual steps on the example of
fenamiphos (exact mass 303.105800777 and m/z of [M+H]+ adduct 304.113077). As
a first step we extract the chromatographic peak for this ion.
```{r fena-extract-peak}
fenamiphos_mz <- 304.113077
fenamiphos_ms1_peak <- chromPeaks(swath_data, mz = fenamiphos_mz, ppm = 2)
fenamiphos_ms1_peak
```
Next we identify all MS2 chromatographic peaks that were identified in the
isolation window containing the m/z of fenamiphos. The information on the
isolation window in which a chromatographic peak was identified is available in
the `chromPeakData`.
```{r fena-identify-ms2}
keep <- chromPeakData(swath_data)$isolationWindowLowerMz < fenamiphos_mz &
chromPeakData(swath_data)$isolationWindowUpperMz > fenamiphos_mz
```
We also require the retention time of the MS2 chromatographic peaks to be
similar to the retention time of the MS1 peak and extract the corresponding peak
information. We thus below select all chromatographic peaks for which the
retention time range contains the retention time of the apex position of the MS1
chromatographic peak.
```{r fena-check-rt}
keep <- keep &
chromPeaks(swath_data)[, "rtmin"] < fenamiphos_ms1_peak[, "rt"] &
chromPeaks(swath_data)[, "rtmax"] > fenamiphos_ms1_peak[, "rt"]
fenamiphos_ms2_peak <- chromPeaks(swath_data)[which(keep), ]
```
In total `r sum(keep, na.rm = TRUE)` MS2 chromatographic peaks match all the
above conditions. Next we extract the ion chromatogram of the MS1 peak and of
all selected candidate MS2 signals. To ensure chromatograms are extracted from
spectra in the correct isolation window we need to specify the respective
isolation window by passing its isolation window target m/z to the
`chromatogram()` function (in addition to setting `msLevel = 2`). This can be
done by either getting the `isolationWindowTargetMz` of the spectra after the
data was subset using `filterIsolationWindow()` (as done below) or by selecting
the `isolationWindowTargetMz` closest to the m/z of the compound of interest.
```{r fena-eic-extract, warning = FALSE}
rtr <- fenamiphos_ms1_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms1_peak[, c("mzmin", "mzmax")]
fenamiphos_ms1_chr <- chromatogram(swath_data, rt = rtr, mz = mzr)
rtr <- fenamiphos_ms2_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms2_peak[, c("mzmin", "mzmax")]
## Get the isolationWindowTargetMz for spectra containing the m/z of the
## compound of interest
swath_data |>
filterIsolationWindow(mz = fenamiphos_mz) |>
spectra() |>
isolationWindowTargetMz() |>
table()
```
The target m/z of the isolation window containing the m/z of interest is thus
299.1 and we can use this in the `chromatogram()` call below to extract the data
from the correct (MS2) spectra.
```{r}
fenamiphos_ms2_chr <- chromatogram(
swath_data, rt = rtr, mz = mzr, msLevel = 2L,
isolationWindowTargetMz = rep(299.1, nrow(rtr)))
```
We can now plot the extracted ion chromatogram of the MS1 and the extracted MS2
data.
```{r fena-eic-plot, fig.width = 10, fig.height = 5, fig.cap = "Extracted ion chromatograms for Fenamiphos from MS1 (blue) and potentially related signal in MS2 (grey)."}
plot(rtime(fenamiphos_ms1_chr[1, 1]),
intensity(fenamiphos_ms1_chr[1, 1]),
xlab = "retention time [s]", ylab = "intensity", pch = 16,
ylim = c(0, 5000), col = "blue", type = "b", lwd = 2)
#' Add data from all MS2 peaks
tmp <- lapply(fenamiphos_ms2_chr@.Data,
function(z) points(rtime(z), intensity(z),
col = "#00000080",
type = "b", pch = 16))
```
Next we can calculate correlations between the peak shapes of each MS2
chromatogram with the MS1 peak. We illustrate this process on the example of one
MS2 chromatographic peaks. Note that, because MS1 and MS2 spectra are recorded
consecutively, the retention times of the individual data points will differ
between the MS2 and MS1 chromatographic data and data points have thus to be
matched (aligned) before performing the correlation analysis. This is done
automatically by the `correlate()` function. See the help for the `align` method
for more information on alignment options.
```{r fena-cor}
compareChromatograms(fenamiphos_ms2_chr[1, 1],
fenamiphos_ms1_chr[1, 1],
ALIGNFUNARGS = list(method = "approx"))
```
After identifying the MS2 chromatographic peaks with shapes of enough high
similarity to the MS1 chromatographic peaks, an MS2 spectrum could be
*reconstructed* based on the m/z and intensities of the MS2 chromatographic
peaks (i.e., using their `"mz"` and `"maxo"` or `"into"` values).
Instead of performing this assignment of MS2 signal to MS1 chromatographic peaks
manually as above, we can use the `reconstructChromPeakSpectra()` function that
performs the exact same steps for all MS1 chromatographic peaks in a DIA data
set. Below we use this function to reconstruct MS2 spectra for our example data
requiring a peak shape correlation higher than `0.9` between the candidate MS2
chromatographic peak and the target MS1 chromatographic peak.
```{r reconstruct-ms2, message = FALSE}
swath_spectra <- reconstructChromPeakSpectra(swath_data, minCor = 0.9)
swath_spectra
```
As a result we got a `Spectra` object of length equal to the number of MS1 peaks
in our data. The length of a spectrum represents the number of peaks it
contains. Thus, a length of 0 indicates that no matching peak (MS2 signal) could
be found for the respective MS1 chromatographic peak.
```{r}
lengths(swath_spectra)
```
For reconstructed spectra additional annotations are available such as the IDs
of the MS2 chromatographic peaks from which the spectrum was reconstructed
(`"ms2_peak_id"`) as well as the correlation coefficient of their
chromatographic peak shape with the precursor's shape
(`"ms2_peak_cor"`). Metadata column `"peak_id"` contains the ID of the MS1
chromatographic peak:
```{r}
spectraData(swath_spectra, c("peak_id", "ms2_peak_id", "ms2_peak_cor"))
```
We next extract the MS2 spectrum for our example peak most likely representing
[M+H]+ ions of Fenamiphos using its chromatographic peak ID:
```{r fena-swath-peak}
fenamiphos_swath_spectrum <- swath_spectra[
swath_spectra$peak_id == rownames(fenamiphos_ms1_peak)]
```
We can now compare the reconstructed spectrum to the example consensus spectrum
from the DDA experiment in the previous section (variable `ex_spectrum`) as well
as to the MS2 spectrum for Fenamiphos from Metlin (with a collision energy of
10V). For better visualization we *normalize* also the peak intensities of the
reconstructed SWATH spectrum with the same function we used for the experimental
DDA spectrum.
```{r}
fenamiphos_swath_spectrum <- addProcessing(fenamiphos_swath_spectrum,
scale_fun)
```
```{r fena-swath-plot, fig.cap = "Mirror plot comparing the reconstructed MS2 spectrum for Fenamiphos (upper panel) against the measured spectrum from the DDA data and the Fenamiphhos spectrum from Metlin.", fig.width = 12, fig.height = 6}
par(mfrow = c(1, 2))
plotSpectraMirror(fenamiphos_swath_spectrum, ex_spectrum,
ppm = 50, main = "against DDA")
plotSpectraMirror(fenamiphos_swath_spectrum, fenamiphos[2],
ppm = 50, main = "against Metlin")
```
If we wanted to get the EICs for the MS2 chromatographic peaks used to generate
this MS2 spectrum we can use the IDs of these peaks which are provided with
`$ms2_peak_id` of the result spectrum.
```{r}
pk_ids <- fenamiphos_swath_spectrum$ms2_peak_id[[1]]
pk_ids
```
With these peak IDs available we can extract their retention time window and m/z
ranges from the `chromPeaks` matrix and use the `chromatogram()` function to
extract their EIC. Note however that for SWATH data we have MS2 signal from
different isolation windows. Thus we have to first filter the `swath_data`
object by the isolation window containing the precursor m/z with the
`filterIsolationWindow()` to subset the data to MS2 spectra related to the ion
of interest. In addition, we have to use `msLevel = 2L` in the `chromatogram()`
call because `chromatogram()` extracts by default only data from MS1 spectra and
we need to specify the target m/z of the isolation window containing the
fragment data from the compound of interest.
```{r}
rt_range <- chromPeaks(swath_data)[pk_ids, c("rtmin", "rtmax")]
mz_range <- chromPeaks(swath_data)[pk_ids, c("mzmin", "mzmax")]
pmz <- precursorMz(fenamiphos_swath_spectrum)[1]
## Determine the isolation window target m/z
tmz <- swath_data |>
filterIsolationWindow(mz = pmz) |>
spectra() |>
isolationWindowTargetMz() |>
unique()
ms2_eics <- chromatogram(
swath_data, rt = rt_range, mz = mz_range, msLevel = 2L,
isolationWindowTargetMz = rep(tmz, nrow(rt_range)))
```
Each row of this `ms2_eics` contains now the EIC of one of the MS2
chromatographic peaks. We can also plot these in an *overlay plot*.
```{r, fig.cap = "Overlay of EICs of chromatographic peaks used to reconstruct the MS2 spectrum for fenamiphos."}
plotChromatogramsOverlay(ms2_eics)
```
As a second example we analyze the signal from an [M+H]+ ion with an m/z of
376.0381 (which would match
[Prochloraz](https://en.wikipedia.org/wiki/Prochloraz)). We first identify the
MS1 chromatographic peak for that m/z and retrieve the reconstructed MS2
spectrum for that peak.
```{r pro-swath}
prochloraz_mz <- 376.0381
prochloraz_ms1_peak <- chromPeaks(swath_data, msLevel = 1L,
mz = prochloraz_mz, ppm = 5)
prochloraz_ms1_peak
prochloraz_swath_spectrum <- swath_spectra[
swath_spectra$peak_id == rownames(prochloraz_ms1_peak)]
lengths(prochloraz_swath_spectrum)
```
The MS2 spectrum for the (tentative) MS1 signal for prochloraz reconstructed
from the SWATH MS2 data has thus 9 peaks.
In addition we identify the corresponding MS1 peak in the DDA data set, extract
all measured MS2 chromatographic peaks and build the consensus spectrum from
these.
```{r pro-dda}
prochloraz_dda_peak <- chromPeaks(dda_data, msLevel = 1L,
mz = prochloraz_mz, ppm = 5)
prochloraz_dda_peak
```
The retention times for the chromatographic peaks from the DDA and SWATH data
match almost perfectly. Next we get the MS2 spectra for this peak.
```{r pro-dda-ms2}
prochloraz_dda_spectra <- dda_spectra[
dda_spectra$peak_id == rownames(prochloraz_dda_peak)]
prochloraz_dda_spectra
```
In total 5 spectra were measured, some with a relatively high number of
peaks. Next we combine them into a consensus spectrum.
```{r pro-dda-consensus}
prochloraz_dda_spectrum <- combineSpectra(
prochloraz_dda_spectra, FUN = combinePeaks, ppm = 20,
peaks = "intersect", minProp = 0.8, intensityFun = median, mzFun = median,
f = prochloraz_dda_spectra$peak_id)
```
At last we load also the Prochloraz MS2 spectra (for different collision
energies) from Metlin.
```{r prochloraz-metlin}
prochloraz <- Spectra(
system.file("mgf", "metlin-68898.mgf", package = "xcms"),
source = MsBackendMgf())
```
To validate the reconstructed spectrum we plot it against the corresponding DDA
spectrum and the MS2 spectrum for Prochloraz (for a collision energy of 10V)
from Metlin.
```{r pro-swath-plot, fig.cap = "Mirror plot comparing the reconstructed MS2 spectrum for Prochloraz (upper panel) against the measured spectrum from the DDA data and the Prochloraz spectrum from Metlin.", fig.width = 12, fig.height = 6}
prochloraz_swath_spectrum <- addProcessing(prochloraz_swath_spectrum, scale_fun)
prochloraz_dda_spectrum <- addProcessing(prochloraz_dda_spectrum, scale_fun)
par(mfrow = c(1, 2))
plotSpectraMirror(prochloraz_swath_spectrum, prochloraz_dda_spectrum,
ppm = 40, main = "against DDA")
plotSpectraMirror(prochloraz_swath_spectrum, prochloraz[2],
ppm = 40, main = "against Metlin")
```
The spectra fit relatively well. Interestingly, the peak representing the
precursor (the right-most peak) seems to have a slightly shifted m/z value in
the reconstructed spectrum. Also, by closer inspecting the spectrum two groups
of peaks with small differences in m/z can be observed (see plot below).
```{r, fig.cap = "SWATH-derived MS2 spectrum for prochloraz."}
plotSpectra(prochloraz_swath_spectrum)
```
These could represent fragments from isotopes of the original compound. DIA MS2
data, since all ions at a given retention time are fragmented, can contain
fragments from isotopes. We thus below use the `isotopologues()` function from
the `r Biocpkg("MetaboCoreUtils")` package to check for presence of potential
isotope peaks in the reconstructed MS2 spectrum for prochloraz.
```{r}
library(MetaboCoreUtils)
isotopologues(peaksData(prochloraz_swath_spectrum)[[1]])
```
Indeed, peaks 3, 4 and 5 as well as 6 and 7 have been assigned to a group of
potential isotope peaks. While this is no proof that the peaks are indeed
fragment isotopes of prochloraz it is highly likely (given their difference in
m/z and relative intensity differences). Below we thus define a function that
keeps only the monoisotopic peak for each isotope group in a spectrum.
```{r}
## Function to keep only the first (monoisotopic) peak for potential
## isotopologue peak groups.
rem_iso <- function(x, ...) {
idx <- isotopologues(x)
idx <- unlist(lapply(idx, function(z) z[-1]), use.names = FALSE)
if (length(idx))
x[-idx, , drop = FALSE]
else x
}
prochloraz_swath_spectrum2 <- addProcessing(prochloraz_swath_spectrum,
rem_iso)
```
```{r, fig.cap = "SWATH MS2 spectrum for prochloraz before (left) and after deisotoping (right)."}
par(mfrow = c(1, 2))
plotSpectra(prochloraz_swath_spectrum)
plotSpectra(prochloraz_swath_spectrum2)
```
Removing the isotope peaks from the SWATH MS2 spectrum increases also the
spectra similarity score (since reference spectra generally will contain only
fragments of the ion of interest, but not of any of its isotopes).
```{r}
compareSpectra(prochloraz_swath_spectrum, prochloraz_dda_spectrum)
compareSpectra(prochloraz_swath_spectrum2, prochloraz_dda_spectrum)
```
Similar to the DDA data, the reconstructed MS2 spectra from SWATH data could be
used in the annotation of the MS1 chromatographic peaks.
# Outlook
Currently, spectra data representation, handling and processing is being
re-implemented as part of the
[RforMassSpectrometry](https://rformassspectrometry.org) initiative aiming at
increasing the performance of methods and simplifying their use. Thus, parts of
the workflow described here will be changed (improved) in future.
Along with these developments, improved matching strategies for larger data sets
will be implemented as well as functionality to compare `Spectra` directly to
reference MS2 spectra from public annotation resources (e.g. Massbank or
HMDB). See for example [here](https://jorainer.github.io/SpectraTutorials) for
more information.
Regarding SWATH data analysis, future development will involve improved
selection of the correct MS2 chromatographic peaks considering also correlation
with intensity values across several samples.
# Session information
```{r sessionInfo}
sessionInfo()
```
# References