xcms 3.22.0
Package: xcms
Authors: Johannes Rainer
Modified: 2023-04-25 14:01:32.005268
Compiled: Tue Apr 25 19:14:17 2023
In a typical LC-MS-based metabolomics experiment compounds eluting from the
chromatography are first ionized before being measured by mass spectrometry
(MS). During the ionization different (multiple) ions can be generated from the
same compound which all will be measured by MS. In general, the resulting data
is then pre-processed to identify chromatographic peaks in the data and to group
these across samples in the correspondence analysis. The results are distinct
LC-MS features, each characterized by its specific m/z and retention time
range. Different ions generated during ionization will be detected as different
features. Compounding aims to group such features, which presumably
represent signal from the same originating compound, to reduce data set
complexity (and to aid in subsequent annotation steps). General MS feature
grouping functionality is defined by the MsFeatures package, with
additional functionality implemented in the xcms package to enable the
compounding of LC-MS data.
This document provides a simple compounding workflow using xcms. Note that the
present functionality does not (yet) aggregate or combine the values of the
grouped features, but only defines the feature groups (one per compound).
We demonstrate the compounding (feature grouping) functionality on the simple
toy data set that is also used in the xcms package and is provided through
the faahKO package. This data set consists of samples from 4 mice with a
knock-out of the fatty acid amide hydrolase (FAAH) and 4 wild type
mice. Pre-processing of this data set is described in detail in the xcms
vignette of the xcms package. Below we load all required packages and the
result from this pre-processing, and also update the location of the respective
raw data files on the current machine.
library(xcms)
library(faahKO)
library(MsFeatures)
data("xdata")
## Update the path to the files for the local system
dirname(xdata) <- c(rep(system.file("cdf", "KO", package = "faahKO"), 4),
                    rep(system.file("cdf", "WT", package = "faahKO"), 4))
Before performing the feature grouping we inspect the result object. With
featureDefinitions we can extract the results from the correspondence
analysis.
featureDefinitions(xdata)
## DataFrame with 225 rows and 11 columns
##           mzmed     mzmin     mzmax     rtmed     rtmin     rtmax    npeaks
##       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001     200.1     200.1     200.1   2901.63   2880.73   2922.53         2
## FT002     205.0     205.0     205.0   2789.39   2782.30   2795.36         8
## FT003     206.0     206.0     206.0   2788.73   2780.73   2792.86         7
## FT004     207.1     207.1     207.1   2718.12   2713.21   2726.70         7
## FT005     219.1     219.1     219.1   2518.82   2517.40   2520.81         3
## ...         ...       ...       ...       ...       ...       ...       ...
## FT221    591.30     591.3     591.3   3005.03   2992.87   3006.05         5
## FT222    592.15     592.1     592.3   3022.11   2981.91   3107.59         6
## FT223    594.20     594.2     594.2   3418.16   3359.10   3427.90         3
## FT224    595.25     595.2     595.3   3010.15   2992.87   3013.77         6
## FT225    596.20     596.2     596.2   2997.91   2992.87   3002.95         2
##              KO        WT           peakidx  ms_level
##       <numeric> <numeric>            <list> <integer>
## FT001         2         0 287, 679,1577,...         1
## FT002         4         4    47,272,542,...         1
## FT003         3         4    32,259,663,...         1
## FT004         4         3    19,249,525,...         1
## FT005         1         2 639, 788,1376,...         1
## ...         ...       ...               ...       ...
## FT221         2         3   349,684,880,...         1
## FT222         1         3    86,861,862,...         1
## FT223         1         2 604, 985,1543,...         1
## FT224         2         3    67,353,876,...         1
## FT225         0         2 866,1447,1643,...         1
Each row in this data frame represents the definition of one feature, with its
average and range of m/z and retention time. Column "peakidx" provides the
index, in the chromPeaks matrix of the result object, of each chromatographic
peak assigned to the feature. The featureValues function extracts
feature values, i.e. a matrix with feature abundances, with one row per
feature and one column per sample of the present data set.
Below we extract the feature values with and without filled-in peak data. Without the gap-filled data only abundances from detected chromatographic peaks are reported. In the gap-filled data, for samples in which no chromatographic peak for a feature was detected, all signal from the m/z - retention time range defined based on the detected chromatographic peaks was integrated.
head(featureValues(xdata, filled = FALSE))
##        ko15.CDF  ko16.CDF  ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001        NA  506848.9        NA  169955.6        NA        NA        NA
## FT002 1924712.0 1757151.0 1383416.7 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.7        NA  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.7  343897.8  208002.8  364609.8  360908.9        NA
## FT005        NA        NA        NA  107348.5  223951.8        NA        NA
## FT006  286221.4        NA  164009.0  149097.6  255697.7  311296.8  366441.5
##         wt22.CDF
## FT001         NA
## FT002 1354004.93
## FT003  185399.47
## FT004  221937.53
## FT005   84772.92
## FT006  271128.02
head(featureValues(xdata, filled = TRUE))
##        ko15.CDF   ko16.CDF   ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001  159738.1  506848.88  113441.08  169955.6  216096.6  145509.7  230477.9
## FT002 1924712.0 1757150.96 1383416.72 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.67  162897.19  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.66  343897.76  208002.8  364609.8  360908.9  223322.5
## FT005  135978.5   25524.79   71530.84  107348.5  223951.8  134398.9  190203.8
## FT006  286221.4  289908.23  164008.97  149097.6  255697.7  311296.8  366441.5
##         wt22.CDF
## FT001  140551.30
## FT002 1354004.93
## FT003  185399.47
## FT004  221937.53
## FT005   84772.92
## FT006  271128.02
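To make the difference between detected and gap-filled values concrete, the following base R sketch counts, per feature, the samples without a detected chromatographic peak. It uses a small matrix typed in by hand from the non-filled output above (features FT001 to FT003) rather than the result object itself; with the real data, `vals` would be replaced by the matrix returned by `featureValues(xdata, filled = FALSE)`.

```r
# Values copied by hand from the non-filled output above (FT001 to FT003).
vals <- rbind(
  FT001 = c(NA, 506848.9, NA, 169955.6, NA, NA, NA, NA),
  FT002 = c(1924712.0, 1757151.0, 1383416.7, 1180288.2,
            2129885.1, 1634342.0, 1623589.2, 1354004.93),
  FT003 = c(213659.3, 289500.7, NA, 178285.7,
            253825.6, 241844.4, 240606.0, 185399.47)
)
colnames(vals) <- c("ko15.CDF", "ko16.CDF", "ko21.CDF", "ko22.CDF",
                    "wt15.CDF", "wt16.CDF", "wt21.CDF", "wt22.CDF")

# Number of samples in which no chromatographic peak was detected.
n_missing <- rowSums(is.na(vals))

# Number of samples with a detected chromatographic peak.
n_detected <- ncol(vals) - n_missing

print(rbind(n_missing, n_detected))
```

For these three features the number of samples with a detected peak (2, 8 and 7) agrees with the "npeaks" column of the feature definitions shown earlier. Missing values such as those of FT001 are the ones replaced by integrated signal in the gap-filled matrix.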
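In total 225 features have been defined in the present data set, many of which most likely represent signal from different ions (adducts or isotopes) of the same compound. The aim of the grouping functions is to define which features most likely come from the same original compound. The feature grouping functions build on properties of LC-MS data that correspond to the grouping approaches supported by xcms: features from the same compound are expected to have similar retention times, correlated abundances across samples and correlated extracted ion chromatograms (EICs).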
The main method to perform the feature grouping is called groupFeatures. It
takes an XCMSnExp object (the result object from the xcms pre-processing) as
input, as well as a parameter object to choose the grouping algorithm and specify
its settings. xcms provides and supports the following grouping approaches:
- SimilarRtimeParam: perform an initial grouping based on similar retention
  time.
- AbundanceSimilarityParam: perform a feature grouping based on correlation
  of feature abundances (values) across samples.
- EicSimilarityParam: perform a feature grouping based on correlation of
  EICs.
Calling groupFeatures on an xcms result object will perform a feature
grouping, assigning each feature in the data set to a feature group. These
feature groups are stored as an additional column called "feature_group" in
the featureDefinitions data frame of the result object and can be accessed with
the featureGroups function. Any subsequent groupFeatures call will
sub-group (refine) the identified feature groups further. It is thus possible
to use a single grouping approach, or to combine several of them to generate
the desired feature grouping. While the individual feature grouping algorithms
can be called in any order, it is advisable to use EicSimilarityParam as the
last refinement step, because it is computationally the most expensive one,
especially if applied to a result object without any pre-defined feature groups
or if the feature groups are very large. In the subsequent sections we will
apply the various feature grouping approaches one after another.
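The refinement idea, in which a later grouping step only splits groups defined by an earlier one, can be illustrated with a small base R sketch. Everything in it is hypothetical: the helper `refine_groups`, the toy abundances, the correlation threshold of 0.9, the greedy assignment to the first matching sub-group and the sub-group label format are assumptions made for illustration and are not the algorithms implemented in xcms or MsFeatures.

```r
# Toy initial grouping (e.g. from a retention time based step).
groups <- c(FT1 = "FG.001", FT2 = "FG.001", FT3 = "FG.001",
            FT4 = "FG.002", FT5 = "FG.002")

# Toy abundances: one row per feature, one column per sample.
values <- rbind(
  FT1 = c(10, 20, 30, 40),
  FT2 = c(21, 39, 62, 80),
  FT3 = c(40, 30, 20, 10),
  FT4 = c(5, 5, 9, 1),
  FT5 = c(50, 52, 88, 12)
)

# Split each existing group into sub-groups of correlated features.
# A feature joins the first sub-group whose seed feature it correlates
# with above `threshold`; otherwise it seeds a new sub-group.
refine_groups <- function(groups, values, threshold = 0.9) {
  refined <- groups
  for (g in unique(groups)) {
    members <- names(groups)[groups == g]
    sub <- integer(length(members))
    seeds <- character()
    for (i in seq_along(members)) {
      cors <- vapply(seeds, function(s) {
        cor(values[members[i], ], values[s, ])
      }, numeric(1))
      hit <- which(cors > threshold)
      if (length(hit)) {
        sub[i] <- unname(hit[1])
      } else {
        seeds <- c(seeds, members[i])
        sub[i] <- length(seeds)
      }
    }
    refined[members] <- sprintf("%s.%03d", g, sub)
  }
  refined
}

refined <- refine_groups(groups, values)
print(refined)
```

In this toy example FT3 is separated from FT1 and FT2 because its abundances are anti-correlated with theirs, while FT4 and FT5 stay together. Features from different initial groups are never compared, which is why a restrictive first step keeps later, more expensive steps cheap.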
Note also that we perform here a grouping of all defined features, but it would also be possible to just group a subset of interesting features (e.g. features found significant by a statistical analysis of the data set). This is described in the last section of this vignette.
The most intuitive and simple way to group features is based on their retention time. Before we perform this initial grouping we evaluate retention times and m/z of all features in the present data set.
plot(featureDefinitions(xdata)$rtmed, featureDefinitions(xdata)$mzmed,
     xlab = "retention time", ylab = "m/z", main = "features",
     col = "#00000080")
grid()
Several features with about the same retention time (but different m/z) can be seen, especially at the beginning of the LC. Below we therefore group features within a retention time window of 10 seconds into feature groups.
xdata <- groupFeatures(xdata, param = SimilarRtimeParam(10))
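The principle of grouping by retention time can be sketched in base R. The helper `group_by_rt` below is hypothetical and deliberately simple: it sorts features by retention time and starts a new group whenever the gap to the previous feature exceeds `diff_rt`. This is an assumption made for illustration and not necessarily how SimilarRtimeParam assigns features. The retention times are the "rtmed" values of a few features from the feature definitions shown earlier.

```r
# Median retention times (in seconds) copied from the output above.
rtmed <- c(FT001 = 2901.63, FT002 = 2789.39, FT003 = 2788.73,
           FT004 = 2718.12, FT005 = 2518.82, FT221 = 3005.03,
           FT224 = 3010.15, FT225 = 2997.91)

# Group features whose sorted retention times differ by at most `diff_rt`
# from their predecessor. Because groups are chained, the total span of a
# group can exceed `diff_rt`.
group_by_rt <- function(rt, diff_rt = 10) {
  ord <- order(rt)
  grp_sorted <- cumsum(c(TRUE, diff(rt[ord]) > diff_rt))
  grp <- integer(length(rt))
  grp[ord] <- grp_sorted
  setNames(sprintf("FG.%03d", grp), names(rt))
}

fg <- group_by_rt(rtmed, diff_rt = 10)
print(fg)
print(table(fg))
```

With these values FT002 and FT003 end up in one group, and FT221, FT224 and FT225 in another, although FT225 and FT224 are more than 10 seconds apart. Such chaining is one reason why different grouping functions can give different results for the same retention time window.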
The results from the feature grouping can be accessed with the featureGroups
function. Below we determine the size of each of these feature groups (i.e. how
many features are grouped together).
table(featureGroups(xdata))
## 
## FG.001 FG.002 FG.003 FG.004 FG.005 FG.006 FG.007 FG.008 FG.009 FG.010 FG.011
##      3      3      3      3      2      4      5      6      4      2      5
## FG.012 FG.013 FG.014 FG.015 FG.016 FG.017 FG.018 FG.019 FG.020 FG.021 FG.022
##      3      4      3      5      3      3      5      3      3      3      3
## FG.023 FG.024 FG.025 FG.026 FG.027 FG.028 FG.029 FG.030 FG.031 FG.032 FG.033
##      3      3      6      3      3      3      3      2      3      3      4
## FG.034 FG.035 FG.036 FG.037 FG.038 FG.039 FG.040 FG.041 FG.042 FG.043 FG.044
##      3      2      2      3      2      2      4      2      2      2      3
## FG.045 FG.046 FG.047 FG.048 FG.049 FG.050 FG.051 FG.052 FG.053 FG.054 FG.055
##      4      2      3      3      3      2      2      3      4      2      3
## FG.056 FG.057 FG.058 FG.059 FG.060 FG.061 FG.062 FG.063 FG.064 FG.065 FG.066
##      2      2      2      3      2      3      2      2      2      3      2
## FG.067 FG.068 FG.069 FG.070 FG.071 FG.072 FG.073 FG.074 FG.075 FG.076 FG.077
##      2      3      2      2      2      3      2      2      1      1      1
## FG.078 FG.079 FG.080 FG.081 FG.082 FG.083 FG.084
##      1      1      1      1      1      1      1
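Beyond counting group sizes, it is often useful to list which features belong to each group. The base R sketch below does this for a hypothetical assignment vector named by feature ID. With the real data, such a vector could presumably be built from `featureGroups(xdata)` together with the row names of `featureDefinitions(xdata)`, which is an assumption not shown in this document.

```r
# Hypothetical feature group assignment, named by feature ID.
feature_group <- c(FT001 = "FG.002", FT002 = "FG.001", FT003 = "FG.001",
                   FT004 = "FG.003", FT005 = "FG.002", FT006 = "FG.001")

# List the member features of each group.
members <- split(names(feature_group), feature_group)

# Size of each group, the largest group, and groups with a single feature.
group_size <- lengths(members)
largest <- names(group_size)[which.max(group_size)]
singletons <- names(group_size)[group_size == 1]

print(members)
print(largest)
print(singletons)
```

Groups with a single feature, like FG.075 to FG.084 in the table above, contain features that were not grouped with any other feature.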
In addition we visualize these feature groups with the plotFeatureGroups
function, which shows all features in the m/z - retention time space with grouped
features being connected with a line.
plotFeatureGroups(xdata)
grid()