MetaboDynamics 0.99.22
This package was developed to facilitate the analysis of longitudinal metabolomics data. Most tools only allow the comparison between two time points or experimental conditions and are using frequentist statistical methods.
As metabolomics data is often noisy and we generally have few replicates due to high costs, a robust method is needed for the estimation of mean concentrations at every time point. For this we employ a Bayesian hierarchical model that assumes normal distributions of log-transformed metabolite concentrations.
Here we want to show a complete workflow to analyze concentration tables with probabilistic models.
MetaboDynamics can be installed from the devel branch of Bioconductor. Open R 4.4 and execute the following code:
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")
BiocManager::install("MetaboDynamics")
library(MetaboDynamics)
library(SummarizedExperiment)
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:GenomicRanges':
##
## intersect, setdiff, union
## The following object is masked from 'package:GenomeInfoDb':
##
## intersect
## The following objects are masked from 'package:IRanges':
##
## collapse, desc, intersect, setdiff, slice, union
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, setequal, union
## The following object is masked from 'package:generics':
##
## explain
## The following object is masked from 'package:matrixStats':
##
## count
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:S4Vectors':
##
## expand
We have a simulated data (called “longitudinalMetabolomics”) set of 98 metabolites with three measurement replicates at four time points (1-4) across 3 experimental conditions (A-B). In the first step in this workflow we estimate the dynamics of every single metabolite at every experimental condition (p.e. radiation dose).
The simulated data is represented as SummarizedExperiment object where concentration tables of each experimental condition are stored in assays (raw concentrations in “concentrations”, log-transformed transformations in “log_con” and scaled log-transformed concentrations" in “scaled_log”) and metabolite names, KEGG IDs, experimental conditions and clustering solutions per experimental condition are stored in colData and timepoint specifications in rowData.
The next plot shows the raw data.
data("longitudinalMetabolomics")
# convert to dataframe
concentrations <- as.data.frame(cbind(
as.data.frame(t(assays(longitudinalMetabolomics)$concentrations)),
as.data.frame(colData(longitudinalMetabolomics))
))
concentrations <- concentrations %>% pivot_longer(
cols = seq_len(4), names_to = "time",
values_to = "concentration"
)
ggplot(concentrations, aes(x = concentration)) +
geom_density() +
theme_bw() +
facet_grid(cols = vars(time), rows = vars(condition)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
ggtitle("raw data", "raw measurements")
The raw data is not distributed normally. So let’s log-transform the values. In the integrated simulated data set this is already done in the column “log_m”.
ggplot(concentrations, aes(x = log(concentration))) +
geom_density() +
theme_bw() +
facet_grid(cols = vars(time), rows = vars(condition)) +
ggtitle("data", "log-transformed values")
The next plot shows the raw dynamics of single metabolites.
ggplot(concentrations) +
geom_line(aes(
x = time, y = log(concentration), col = metabolite,
group = interaction(metabolite, replicate)
)) +
theme_bw() +
xlab("timepoint") +
theme(legend.position = "none") +
facet_grid(rows = vars(condition)) +
ggtitle("raw metabolite dynamics", "color=metabolite")
We define dynamics as deviations at the observed time points from the metabolite’s mean concentration. As the raw concentrations of metabolites can differ by orders of magnitude from each other, and we want to be able to compare dynamics of metabolites with each other, we standardize each metabolite at each experimental condition to a mean of zero and a standard deviation of one.
In the simulated data set the scaled measurements are in column “m_scaled”.
data("longitudinalMetabolomics")
# convert to dataframe
concentrations <- as.data.frame(cbind(
as.data.frame(t(assays(longitudinalMetabolomics)$scaled_log)),
as.data.frame(colData(longitudinalMetabolomics))
))
concentrations <- concentrations %>% pivot_longer(
cols = seq_len(4), names_to = "time",
values_to = "concentration"
)
ggplot(concentrations) +
geom_line(aes(
x = time,
y = concentration, col = metabolite,
group = interaction(metabolite, replicate)
)) +
theme_bw() +
xlab("timepoint") +
theme(legend.position = "none") +
facet_grid(rows = vars(condition)) +
ggtitle("standardized dynamics", "color=metabolite")
Now we can finally model the dynamics. This might take of the order of 10 minutes per experimental condition for 98 metabolites.
We employ a Bayesian hierarchical model with con = metabolite concentrations, m = metabolite, c = experimental condition and t = time point ID:
\[\begin{align*} \log(con_{m,c,t})&\sim {\sf normal}(\mu_{m,c,t},\sigma_{m,c,t}) \\ \mu_{m,c,t}&\sim {\sf normal}(0,2) \\ \sigma_{m,c,t}&\sim {\sf exponential}(\lambda_{m,c}) \\ \lambda_{m,c}&\sim {\sf exponential}(2) \end{align*}\]
The hierarchical Bayesian model provides a balance between over- and under fitting of the model to the experimental data points.
The code below shows how to fit the model and how to extract the diagnostic criteria from the model fits.
# we can hand a SummarizedExperiment object to the function
data("longitudinalMetabolomics")
# we only use a subsection of the simulated data (1 condition and subsample of
# the whole dataset) for demonstration purposes
samples <- c(
"UMP", "Taurine", "Succinate", "Phosphocreatine", "PEP",
"Malic acid", "L-Cystine", "CMP", "Citramalic acid", "2-Aminomuconate"
)
# only one condition
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
longitudinalMetabolomics$metabolite %in% samples]
# fit model
data <- fit_dynamics_model(
data = data, scaled_measurement = "m_scaled",
assay = "scaled_log", time = "time",
condition = "condition", max_treedepth = 10,
adapt_delta = 0.95, # default 0.95
iter = 5000,
cores = 1,
chains = 2 # only set to 2 for vignette, default = 4
)
## Is your data normalized and standardized?
## We recommend normalization by log-transformation.
## Scaling and centering (mean=0, sd=1) should be metabolite and condition specific.
##
## SAMPLING FOR MODEL 'm_ANOVA_partial_pooling' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 5.5e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.55 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 5000 [ 0%] (Warmup)
## Chain 1: Iteration: 500 / 5000 [ 10%] (Warmup)
## Chain 1: Iteration: 1000 / 5000 [ 20%] (Warmup)
## Chain 1: Iteration: 1251 / 5000 [ 25%] (Sampling)
## Chain 1: Iteration: 1750 / 5000 [ 35%] (Sampling)
## Chain 1: Iteration: 2250 / 5000 [ 45%] (Sampling)
## Chain 1: Iteration: 2750 / 5000 [ 55%] (Sampling)
## Chain 1: Iteration: 3250 / 5000 [ 65%] (Sampling)
## Chain 1: Iteration: 3750 / 5000 [ 75%] (Sampling)
## Chain 1: Iteration: 4250 / 5000 [ 85%] (Sampling)
## Chain 1: Iteration: 4750 / 5000 [ 95%] (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 1.405 seconds (Warm-up)
## Chain 1: 3.201 seconds (Sampling)
## Chain 1: 4.606 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'm_ANOVA_partial_pooling' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 3.2e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.32 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 5000 [ 0%] (Warmup)
## Chain 2: Iteration: 500 / 5000 [ 10%] (Warmup)
## Chain 2: Iteration: 1000 / 5000 [ 20%] (Warmup)
## Chain 2: Iteration: 1251 / 5000 [ 25%] (Sampling)
## Chain 2: Iteration: 1750 / 5000 [ 35%] (Sampling)
## Chain 2: Iteration: 2250 / 5000 [ 45%] (Sampling)
## Chain 2: Iteration: 2750 / 5000 [ 55%] (Sampling)
## Chain 2: Iteration: 3250 / 5000 [ 65%] (Sampling)
## Chain 2: Iteration: 3750 / 5000 [ 75%] (Sampling)
## Chain 2: Iteration: 4250 / 5000 [ 85%] (Sampling)
## Chain 2: Iteration: 4750 / 5000 [ 95%] (Sampling)
## Chain 2: Iteration: 5000 / 5000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 1.306 seconds (Warm-up)
## Chain 2: 3.361 seconds (Sampling)
## Chain 2: 4.667 seconds (Total)
## Chain 2:
This returns a list of model fits that are named by the experimental condition (“A”,“B”,“C”). If data is a summarizedExperiment the fits are stored in metadata(data)[[“dynamic_fits”]].
With diagnostics_dynamics() we can extract all the diagnostic criteria of MCMC runs to fit a Bayesian model (rhat, neff, divergences, max_treedepth) and visualize them. If data is a SummarizedExperiment the diagnostics are stored in metadata(data)[[“diagnostics_dynamics”]]. Additionally data frames for visual Posterior predictive checks (PPC) are prepared and plots for the PPCs and diagnostic criteria can be generated with plot_PPC() and plot_diagnostics().
# extract diagnostics
data <- diagnostics_dynamics(
data = data,
iter = 5000, # number of iterations used for model fitting
# the dynamic model
chains = 2 # number of chains used for model fitting
)
plot_diagnostics(
data = data
)
## $divergences
##
## $max_treedepth
##
## $Rhat
##
## $n_eff
There should be no divergent transitions (number of divergent transitions=0) and the maximum treedepth should not be exceeded (number of exceeded treedepth=0).
# PPCs can be accessed with
plot_PPC(
data = data
)
## $posterior_A
## Warning: Removed 18261 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
In the posterior predictive check all data points (black points) should lie within the posterior of the Bayesian model (violin).
After checking the diagnostic criteria and the PPC we can extract the estimates: If data is a SummarizedExperiment the estimates are stored in metadata(data)[[“estimates_dynamics”]]
# #extract estimates
data <- estimates_dynamics(
data = data, iter = 5000,
chains = 2, condition = "condition"
)
We get two major outputs: 1. the estimation of concentration differences between two subsequent time points of each metabolite at each experimental condition 2. the dynamic profiles of each metabolites at each experimental condition
# 1) the differences between two timepoints
plot_estimates(
data = data,
dynamics = FALSE
)
## $plot_timepoint_differences
If the 95% highest density interval (Credible Interval(CrI)) of the posterior does not include zero we can rather credibly state that there is a difference in mean concentrations between two time points. If the 95% HDI lies below zero we likely have a decrease in concentrations between the two time points, if it is above zero we likely have an increase in concentrations between time points.
# 2) dynamic profiles
plot_estimates(
data = data,
delta_t = FALSE
)
## $plot_dynamics
So we now have dynamic profiles of many metabolites at each radiation dose.
We could now cluster these metabolite specific dynamics vectors
(estimates[,c(“mu1.mean”:"mut.mean)]) to determine if groups of metabolites
have similar dynamics.
For the sake of demonstration we use from here on a clustering result (data(“cluster”)) on the full simulated data set (data(“longitudinalMetabolomics”)). In a real life example the optimal number of clusters (“k”) should be determined by optimal clustering criteria such as Gap statistics and average silhouette. The code below shows an example how the estimated dynamics profiles can be used for clustering.
# get distances between vectors
clust_A <- metadata(data)[["estimates_dynamics"]][["A"]]
clust_A <- clust_A %>%
select(metabolite.ID, condition, time.ID, mu_mean) %>%
pivot_wider(id_cols = c(metabolite.ID, condition), names_from = time.ID, values_from = mu_mean)
dist <- clust_A %>% select(-c(metabolite.ID, condition))
dd_A <- dist(dist,
method = "euclidean"
)
# hierarchical clustering
clust <- hclust(dd_A, method = "ward.D2")
clust_cut <- cutree(clust, k = 8)
# adding cluster ID to estimates
clust_A$cluster <- clust_cut
clust_A
## # A tibble: 10 × 7
## metabolite.ID condition `1` `2` `3` `4` cluster
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1 A 0.306 0.617 -0.243 -0.895 1
## 2 2 A -0.140 0.901 -0.204 -0.413 2
## 3 3 A -0.834 0.0719 -0.810 0.0164 3
## 4 4 A 1.03 0.415 0.438 -0.703 4
## 5 5 A -0.110 0.589 0.0893 -0.658 2
## 6 6 A 0.208 -0.764 -0.667 0.162 5
## 7 7 A -0.0585 -0.113 -0.572 -0.0486 6
## 8 8 A 0.0619 0.495 0.177 0.0759 7
## 9 9 A -0.277 0.291 -0.284 -0.0553 6
## 10 10 A 0.842 -0.218 0.766 0.204 8
The metadata of the SummarizedExperiment object “longitudinalMetabolomics” holds the clustering results of the whole simulated dataset data (“longitudinalMetabolomics”).
data <- metadata(longitudinalMetabolomics)[["cluster"]]
temp <- data
temp <- temp %>% pivot_longer(
cols = 4:7,
names_to = "timepoint", values_to = "mu_mean"
)
ggplot(temp, aes(
x = as.factor(as.numeric(as.factor(timepoint))),
y = mu_mean, group = metabolite
)) +
geom_line() +
xlab("timepoint") +
ylab("estimated mean concentration") +
theme_bw() +
ylim(-2,2)+
theme(legend.position = "none") +
facet_grid(rows = vars(condition), cols = vars(cluster)) +
ggtitle("clustered dynamics", "panels=cluster ID")
As we can see metabolites show different dynamics in different experimental conditions. Can we quantify the biological function of these dynamics clusters?
To quantify the possible biological function of these dynamics clusters we retrieved from the KEGG-database the following information with package KEGGREST: 1. to which functional modules our experimental metabolites are annotated and 2. which metabolites are annotated to functional modules in general.
The functional modules of the KEGG-database are organised in three hierarchies: upper, middle and lower. Here we will conduct functional analysis on the middle hierarchy. To facilitate analysis the data frames “metabolite_modules”, which holds the information about experimental metabolites, and “modules_compounds”, which holds the information about which metabolites are in general annotated to functional modules, were prepared. The needed data frame for different sets of experimental metabolites can be retrieved with the function get_ORA_annotations. This function also allows to update the background KEGG information stored in “modules_compounds”.
We load both data sets and can inspect the documentation.
data("metabolite_modules")
help("metabolite_modules")
head(metabolite_modules)
## # A tibble: 6 × 8
## ...1 metabolite KEGG module_id module_name upper_hierarchy middle_hierarchy
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 1-Aminocyc… C012… M00368 Ethylene b… Pathway modules Amino acid meta…
## 2 2 2-Aminomuc… C022… M00038 Tryptophan… Pathway modules Amino acid meta…
## 3 3 2-Phosphog… C006… M00001 Glycolysis… Pathway modules Carbohydrate me…
## 4 4 2-Phosphog… C006… M00002 Glycolysis… Pathway modules Carbohydrate me…
## 5 5 2-Phosphog… C006… M00003 Gluconeoge… Pathway modules Carbohydrate me…
## 6 6 2-Phosphog… C006… M00346 Formaldehy… Pathway modules Energy metaboli…
## # ℹ 1 more variable: lower_hierarchy <chr>
data("modules_compounds")
help("modules_compounds")
head(modules_compounds)
## module_id module_name KEGG
## 2 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00267
## 3 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00668
## 4 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00085
## 5 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00354
## 6 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00111
## 7 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00118
## upper_hierarchy middle_hierarchy lower_hierarchy
## 2 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 3 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 4 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 5 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 6 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 7 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
Here we have to keep in mind that not all KEGG modules are suitable for testing on every observed organism and experimental condition. For example the modules “Xenobiotics biodegradation”,“Biosynthesis of other secondary metabolites” and “Biosynthesis of terpenoids and polyketides” should not be analyzed in a human lung cancer cell line.
For the functional analysis we employ a hypergeometric model. We consider a functional module as over-represented in a cluster if the 95% inter-quantile range (ICR) of the log-transformed probabilities of OvEs (observed vs expected) lies above zero. OvE refers to the ratio of observed metabolites in a cluster being mapped to a functional module over the number of expected metabolites in a cluster being mapped to a module under the assumption of a hypergeometric distribution (=drawing without replacement). If data is a SummarizedExperiment object where the clustering solution is stored in metadata(data)[[“cluster”]] the ORA results are stored in metadata(data)[[“ORA_tested_column”]] We apply the functional analysis to the middle and lower hierarchy of functional modules.
data <- ORA_hypergeometric(
background = modules_compounds,
annotations = metabolite_modules,
data = longitudinalMetabolomics, tested_column = "middle_hierarchy"
)
plot_ORA(data, tested_column = "middle_hierarchy")
data <- ORA_hypergeometric(
background = modules_compounds,
annotations = metabolite_modules,
data = longitudinalMetabolomics,
tested_column = "lower_hierarchy"
)
plot_ORA(data, tested_column = "lower_hierarchy")
Great, we can now see which functional module is over- (green points and error-bars) or under-represented (none in this example) in which dynamics cluster! For instance in cluster 3 at condition A and C the modules “Energy metabolism” and “Carbohydrate metabolism” are over-represented.
We can not only conduct over-representation analysis of KEGG-functional modules but also compare dynamics clusters across different experimental conditions. For this we employ a Bayesian model that estimates the mean difference as well as the standard deviation of differences between dynamics clusters. If data is a SummarizedExperiment object the results of compare_metabolites are stored in metadata(data)[[“comparison_dynamics”]]
dist = vector of pairwise euclidean distances between each dynamic vector of cluster a and every dynamic vector of cluster b, ID = cluster pair ID \[\begin{align*} dist_{ID}&\sim {\sf normal}(\mu_{ID},\sigma_{ID}) \\ \mu_{ID}&\sim {\sf normal^+}(0,2) \\ \sigma_{ID}&\sim {\sf exponential}(1) \end{align*}\]
data("longitudinalMetabolomics")
longitudinalMetabolomics <- compare_dynamics(
data = longitudinalMetabolomics,
dynamics = c("mu1_mean", "mu2_mean", "mu3_mean", "mu4_mean"),
cores = 1
)
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000427 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 4.27 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 1: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 4.041 seconds (Warm-up)
## Chain 1: 8.489 seconds (Sampling)
## Chain 1: 12.53 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.000352 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 3.52 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 2: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 4.114 seconds (Warm-up)
## Chain 2: 8.482 seconds (Sampling)
## Chain 2: 12.596 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000409 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 4.09 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 3: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 3: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 4.088 seconds (Warm-up)
## Chain 3: 8.62 seconds (Sampling)
## Chain 3: 12.708 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.000423 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 4.23 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 4: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 4: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 4.124 seconds (Warm-up)
## Chain 4: 8.615 seconds (Sampling)
## Chain 4: 12.739 seconds (Total)
## Chain 4:
We can visualize the posterior estimates of the model with:
heatmap_dynamics(data = longitudinalMetabolomics)
The bigger and brighter a point, the smaller is the mean distance between dynamics clusters and the smaller is the standard deviation. That means big bright points indicate high dynamic similarity which small spread. Here B_8 and A_4 have high similarity in dynamics.
We can also compare metabolite composition of clusters. For this we employ the Jaccard Index which is a metric for similarity of two vectors. Values near 0 indicate low similarity, values near 1 high similarity. If data is a SummarizedExperiment object the results of compare_metabolites are stored in metadata(data)[[“comparison_metabolites”]].
longitudinalMetabolomics <- compare_metabolites(
data = longitudinalMetabolomics
)
heatmap_metabolites(data = longitudinalMetabolomics)
The brighter a tile, the higher is the similarity of two clusters in regard
to their metabolite composition.
We have two clusters that are similar in their metabolite composition:
C_6 and A_5. If we compare that to the dynamics profiles and ORA analysis
we see that similar functional modules are over-expressed as expected BUT
the dynamics differ between the two radiation doses.
Can we facilitate visualization?
dynamics <- metadata(longitudinalMetabolomics)[["comparison_dynamics"]][["estimates"]]
metabolites <- metadata(longitudinalMetabolomics)[["comparison_metabolites"]]
temp <- left_join(dynamics, metabolites, join_by("cluster_a", "cluster_b"))
x <- unique(c(temp[, "cluster_a"], temp[, "cluster_b"]))
temp <- temp %>% mutate(scale_Jaccard = scale(Jaccard))
ggplot(temp, aes(x = cluster_b, y = cluster_a)) +
geom_point(aes(size = Jaccard, col = mu_mean)) +
theme_bw() +
scale_color_viridis_c(option = "magma") +
scale_x_discrete(limits = x) +
xlab("") +
ylab("") +
scale_y_discrete(limits = x) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(col = "dynamics distance", size = "metabolite similarity") +
ggtitle("comparison of clusters")
We can find a cluster pair that is pretty similar in regards to their composing metabolites but dissimilar in regards to their dynamics. Their ORA profiles are quite similar as expected from the similar metabolite compositions but they show different dynamics between experimental conditions: B_7 and A_4
sessionInfo()
## R Under development (unstable) (2025-03-13 r87965)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidyr_1.3.1 dplyr_1.1.4
## [3] ggplot2_3.5.1 SummarizedExperiment_1.37.0
## [5] Biobase_2.67.0 GenomicRanges_1.59.1
## [7] GenomeInfoDb_1.43.4 IRanges_2.41.3
## [9] S4Vectors_0.45.4 BiocGenerics_0.53.6
## [11] generics_0.1.3 MatrixGenerics_1.19.1
## [13] matrixStats_1.5.0 MetaboDynamics_0.99.22
## [15] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.47.0 gtable_0.3.6 xfun_0.51
## [4] bslib_0.9.0 QuickJSR_1.6.0 inline_0.3.21
## [7] lattice_0.22-6 vctrs_0.6.5 tools_4.6.0
## [10] curl_6.2.1 parallel_4.6.0 tibble_3.2.1
## [13] pkgconfig_2.0.3 Matrix_1.7-3 RcppParallel_5.1.10
## [16] lifecycle_1.0.4 GenomeInfoDbData_1.2.14 farver_2.1.2
## [19] compiler_4.6.0 stringr_1.5.1 Biostrings_2.75.4
## [22] tinytex_0.56 munsell_0.5.1 codetools_0.2-20
## [25] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [28] pillar_1.10.1 crayon_1.5.3 jquerylib_0.1.4
## [31] DelayedArray_0.33.6 cachem_1.1.0 magick_2.8.5
## [34] StanHeaders_2.32.10 abind_1.4-8 rstan_2.32.7
## [37] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
## [40] purrr_1.0.4 bookdown_0.42 labeling_0.4.3
## [43] fastmap_1.2.0 grid_4.6.0 colorspace_2.1-1
## [46] cli_3.6.4 SparseArray_1.7.6 magrittr_2.0.3
## [49] loo_2.8.0 S4Arrays_1.7.3 utf8_1.2.4
## [52] pkgbuild_1.4.6 withr_3.0.2 UCSC.utils_1.3.1
## [55] scales_1.3.0 rmarkdown_2.29 XVector_0.47.2
## [58] httr_1.4.7 gridExtra_2.3 png_0.1-8
## [61] evaluate_1.0.3 knitr_1.50 V8_6.0.2
## [64] viridisLite_0.4.2 rstantools_2.4.0 rlang_1.1.5
## [67] Rcpp_1.0.14 glue_1.8.0 BiocManager_1.30.25
## [70] jsonlite_1.9.1 R6_2.6.1