1 Background

This package was developed to facilitate the analysis of longitudinal metabolomics data. Most tools only allow the comparison between two time points or experimental conditions and use frequentist statistical methods.

Here we want to show a complete workflow to analyze concentration tables.

As an example we have a data set of irradiated cancer cell lines that were observed over four time points.

2 Installation

MetaboDynamics can be installed from the devel branch of Bioconductor.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("MetaboDynamics")

2.1 Setup: load required packages

library(MetaboDynamics)
library(SummarizedExperiment)
library(ggplot2)
library(dplyr)
library(tidyr)

3 Load data and plot data overview

We have a simulated data set of 98 metabolites with three measurement replicates at four time points (1-4) across three experimental conditions (A-C). In the first step of this workflow we estimate the dynamics of every single metabolite at every experimental condition (here: radiation dose).

The simulated data is represented as a SummarizedExperiment object.

As metabolomics data is often noisy and we generally have few replicates due to high costs, a robust method is needed for the estimation of mean concentrations at every time point. For this we employ a Bayesian hierarchical model that assumes normal distributions of log-transformed metabolite concentrations. The next plot shows the raw data.

data("longitudinalMetabolomics")
# convert to dataframe
longitudinalMetabolomics <- as.data.frame(SummarizedExperiment::colData(longitudinalMetabolomics))
ggplot(longitudinalMetabolomics, aes(x = measurement)) +
  geom_density() +
  theme_bw() +
  facet_grid(cols = vars(time), rows = vars(condition)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("raw data", "raw measurements")

The raw data is not distributed normally. So let’s log-transform the values. In the integrated simulated dataset this is already done in the column “log_m”.

# density of the log-transformed values (column "log_m")
ggplot(longitudinalMetabolomics, aes(x = log_m)) +
  geom_density() +
  theme_bw() +
  facet_grid(cols = vars(time), rows = vars(condition)) +
  ggtitle("data", "log-transformed values")

The next plot shows the raw dynamics of single metabolites.

ggplot(longitudinalMetabolomics) +
  geom_line(aes(x = time, y = log_m, col = metabolite, 
                group = interaction(metabolite, replicate))) +
  theme_bw() +
  xlab("timepoint") +
  theme(legend.position = "none") +
  facet_grid(rows = vars(condition)) +
  ggtitle("raw metabolite dynamics", "color=metabolite")

We define dynamics as deviations at the observed time points from the metabolite’s mean concentration. As the raw concentrations of metabolites can differ by orders of magnitude from each other, and we want to be able to compare dynamics of metabolites with each other, we standardize each metabolite at each radiation dose to a mean of zero and a standard deviation of one. In the simulated data set the scaled measurements are in column “m_scaled”.

ggplot(longitudinalMetabolomics) +
  geom_line(aes(
    x = time,
    y = m_scaled, col = metabolite,
    group = interaction(metabolite, replicate)
  )) +
  theme_bw() +
  xlab("timepoint") +
  theme(legend.position = "none") +
  facet_grid(rows = vars(condition)) +
  ggtitle("standardized dynamics", "color=metabolite")
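
The standardization described above can be reproduced on one's own data with base R. The following is a minimal sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration; only the column names mirror the simulated data set): we log-transform the measurements and then scale within every metabolite and condition.

```r
# hypothetical toy data: two metabolites, one condition,
# four time points, three replicates
set.seed(1)
toy <- expand.grid(
  metabolite = c("met1", "met2"),
  condition = "A",
  time = 1:4,
  replicate = 1:3,
  stringsAsFactors = FALSE
)
# raw concentrations that differ by orders of magnitude between metabolites
toy$measurement <- rlnorm(
  nrow(toy),
  meanlog = ifelse(toy$metabolite == "met1", 2, 8),
  sdlog = 0.5
)
# log-transform
toy$log_m <- log(toy$measurement)
# standardize within every metabolite and condition:
# mean of zero and standard deviation of one
toy$m_scaled <- ave(
  toy$log_m, toy$metabolite, toy$condition,
  FUN = function(x) (x - mean(x)) / sd(x)
)
# check group-wise mean and standard deviation
group_means <- tapply(toy$m_scaled, toy$metabolite, mean)
group_sds <- tapply(toy$m_scaled, toy$metabolite, sd)
```

After this step both metabolites live on the same scale, so their time courses can be compared even though the raw concentrations differ strongly.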

Now we can finally model the dynamics. This might take on the order of 10 minutes per experimental condition.

We employ a Bayesian hierarchical model with con = metabolite concentrations, m = metabolite, c = experimental condition and t = time point ID:

\[\begin{align*} \log(con_{m,c,t})&\sim {\sf normal}(\mu_{m,c,t},\sigma_{m,c,t}) \\ \mu_{m,c,t}&\sim {\sf normal}(0,2) \\ \sigma_{m,c,t}&\sim {\sf exponential}(\lambda_{m,c}) \\ \lambda_{m,c}&\sim {\sf exponential}(2) \end{align*}\]
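
To get a feeling for what these priors imply, the generative structure can be simulated in base R for a single metabolite at a single condition. This is a sketch only and not the Stan model shipped with the package; it assumes that the second parameter of the normal distribution is a standard deviation and that the exponential distributions are parameterized by their rate. The object names (`lambda`, `sigma`, `mu`, `sim`) are chosen for illustration.

```r
set.seed(42)
n_time <- 4 # time points
n_rep <- 3  # replicates per time point

# lambda_{m,c} ~ exponential(2): one rate per metabolite and condition
lambda <- rexp(1, rate = 2)
# sigma_{m,c,t} ~ exponential(lambda_{m,c}): one spread per time point,
# all time points share the same lambda (partial pooling)
sigma <- rexp(n_time, rate = lambda)
# mu_{m,c,t} ~ normal(0, 2): one mean per time point
mu <- rnorm(n_time, mean = 0, sd = 2)

# simulated measurements on the modelled scale
sim <- data.frame(
  time = rep(seq_len(n_time), each = n_rep),
  replicate = rep(seq_len(n_rep), times = n_time)
)
sim$y <- rnorm(nrow(sim), mean = mu[sim$time], sd = sigma[sim$time])
sim
```

Repeating the simulation with different seeds shows which ranges of means and replicate spreads the priors consider plausible before any data is seen.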

The code below shows how to fit the model and how to extract the diagnostic criteria from the model fits.

4 Model dynamics

# we can hand a SummarizedExperiment object to the function
data(longitudinalMetabolomics)
# we only use a subsection of the simulated data (1 condition and subsample of
# the whole dataset) for demonstration purposes
samples <- sample(unique(longitudinalMetabolomics$metabolite),10)
longitudinalMetabolomics <- as.data.frame(SummarizedExperiment::colData(longitudinalMetabolomics))
longitudinalMetabolomics <- longitudinalMetabolomics[longitudinalMetabolomics$condition=="A",]
longitudinalMetabolomics <- longitudinalMetabolomics[longitudinalMetabolomics$metabolite%in%samples,]

# fit model
fits_dynamics <- fit_dynamics_model(
  data = longitudinalMetabolomics, scaled_measurement = "m_scaled", time = "time",
  condition = "condition", max_treedepth = 10,
  adapt_delta = 0.95, # default 0.95
  iter = 5000, 
  cores = 1, 
  chains = 2 # only set to 2 for vignette, default = 4
)
## 
## SAMPLING FOR MODEL 'm_ANOVA_partial_pooling' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.7 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 1: Iteration: 1251 / 5000 [ 25%]  (Sampling)
## Chain 1: Iteration: 1750 / 5000 [ 35%]  (Sampling)
## Chain 1: Iteration: 2250 / 5000 [ 45%]  (Sampling)
## Chain 1: Iteration: 2750 / 5000 [ 55%]  (Sampling)
## Chain 1: Iteration: 3250 / 5000 [ 65%]  (Sampling)
## Chain 1: Iteration: 3750 / 5000 [ 75%]  (Sampling)
## Chain 1: Iteration: 4250 / 5000 [ 85%]  (Sampling)
## Chain 1: Iteration: 4750 / 5000 [ 95%]  (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 1.163 seconds (Warm-up)
## Chain 1:                2.651 seconds (Sampling)
## Chain 1:                3.814 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'm_ANOVA_partial_pooling' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 2.4e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 2: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 2: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 2: Iteration: 1251 / 5000 [ 25%]  (Sampling)
## Chain 2: Iteration: 1750 / 5000 [ 35%]  (Sampling)
## Chain 2: Iteration: 2250 / 5000 [ 45%]  (Sampling)
## Chain 2: Iteration: 2750 / 5000 [ 55%]  (Sampling)
## Chain 2: Iteration: 3250 / 5000 [ 65%]  (Sampling)
## Chain 2: Iteration: 3750 / 5000 [ 75%]  (Sampling)
## Chain 2: Iteration: 4250 / 5000 [ 85%]  (Sampling)
## Chain 2: Iteration: 4750 / 5000 [ 95%]  (Sampling)
## Chain 2: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 1.207 seconds (Warm-up)
## Chain 2:                2.641 seconds (Sampling)
## Chain 2:                3.848 seconds (Total)
## Chain 2:

This returns a list of model fits that are named by the experimental condition (“A”,“B”,“C”). With diagnostics_dynamics() we can extract the diagnostic criteria of the MCMC runs used to fit the Bayesian model (rhat, neff, divergences, max_treedepth) and visualize them. In addition, data frames for visual posterior predictive checks (PPC) are prepared, and plots are generated for the PPCs and the diagnostic criteria.

# extract diagnostics
diagnostics_dynamics <- diagnostics_dynamics(
  data = longitudinalMetabolomics, 
  iter = 5000, # number of iterations used for model fitting
  # the dynamic model
  scaled_measurement = "m_scaled",
  fits = fits_dynamics, 
  chains = 2 # number of chains used for model fitting 
)

diagnostics_dynamics[["plot_divergences"]]

diagnostics_dynamics[["plot_treedepth_error"]]

diagnostics_dynamics[["plot_rhat"]]

diagnostics_dynamics[["plot_neff"]]

# PPCs can be accessed with
diagnostics_dynamics[["plot_PCC_A"]]
## Warning: Removed 16920 rows containing non-finite outside the scale range
## (`stat_ydensity()`).

After checking the diagnostic criteria and the PPC we can extract the estimates:

# extract estimates
estimates_dynamics <- estimates_dynamics(
  condition = "condition",
  data = longitudinalMetabolomics, fits = fits_dynamics, samples = 1,
  iter = 5000, # number of iterations used for model fitting
  chains = 2 # number of chains used for model fitting
) 

We get two major outputs: 1) the estimated concentration differences between two subsequent time points of each metabolite at each experimental condition and 2) the dynamic profile of each metabolite at each experimental condition.

4.1 Differences between two timepoints

# 1) the differences between two timepoints
estimates_dynamics[["plot_timepoint_differences"]]

If the 95% highest density interval (HDI) of the posterior does not include zero, we can rather credibly state that there is a difference in mean concentrations between two time points. If the 95% HDI lies below zero, we likely have a decrease in concentrations between the two time points; if it lies above zero, we likely have an increase.
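
The decision rule above can be illustrated with a small base R sketch. The helper `hdi()` and the simulated draws `delta_draws` are hypothetical and only stand in for the posterior draws of a time point difference; how the package computes its intervals internally is not shown here.

```r
# shortest interval that contains `prob` of the draws
hdi <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  window <- ceiling(prob * n) # number of draws inside the interval
  starts <- seq_len(n - window + 1)
  widths <- sorted[starts + window - 1] - sorted[starts]
  best <- which.min(widths)
  c(lower = sorted[best], upper = sorted[best + window - 1])
}

# hypothetical posterior draws of a difference between two time points
set.seed(7)
delta_draws <- rnorm(4000, mean = 0.8, sd = 0.3)

interval <- hdi(delta_draws)
# a credible difference: the interval lies entirely on one side of zero
excludes_zero <- interval[["lower"]] > 0 || interval[["upper"]] < 0
interval
excludes_zero
```

For symmetric posteriors the HDI is close to the equal-tailed quantile interval; for skewed posteriors the two can differ noticeably.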

4.2 Dynamic profiles

# 2) dynamic profiles
estimates_dynamics[["plot_dynamics"]]

So we now have dynamic profiles of many metabolites at each radiation dose. We could now cluster these metabolite-specific dynamics vectors (columns “mu1_mean” to “mu4_mean” of estimates_dynamics[["A"]]) to see if groups of metabolites have similar dynamics.

5 Cluster dynamics

For the sake of demonstration we use from here on a clustering result (data(“cluster”)) on the full simulated data set (data(“longitudinalMetabolomics”)). In a real-life example the optimal number of clusters (“k”) should be determined by clustering criteria such as the gap statistic and the average silhouette width. The code below shows an example of how the estimated dynamics profiles can be used for clustering.

# get distances between vectors
dd_A <- dist(
  estimates_dynamics[["A"]][, c(
    "mu1_mean", "mu2_mean",
    "mu3_mean", "mu4_mean"
  )],
  method = "euclidean"
)
# hierarchical clustering
clust <- hclust(dd_A, method = "ward.D2")
clust_cut <- cutree(clust, k = 8)
# adding cluster ID to estimates
clust_A <- estimates_dynamics[["A"]][, c(
  "metabolite", "condition", "mu1_mean", "mu2_mean",
  "mu3_mean", "mu4_mean"
)]
clust_A$cluster <- clust_cut
clust_A
##                metabolite condition    mu1_mean   mu2_mean    mu3_mean
## 1  2-Phosphoglyceric acid         A -0.47850910  0.8139021 -0.22551156
## 2   3-Hydroxybutyric acid         A -0.28717106  0.1257087  0.05441993
## 3   Argininosuccinic acid         A -0.35977180 -0.5391749  0.70081612
## 4              D-Sorbitol         A -0.23759578  0.1389519  0.61911634
## 5            L-Tryptophan         A  0.08818096 -0.3589640  0.08676367
## 6                L-Valine         A -0.43253834 -0.5973267 -0.26087298
## 7               Succinate         A -0.41871472  0.5730558 -0.02778282
## 8               Thymidine         A -0.14773574  0.4355281 -0.04209166
## 9                  Uracil         A  0.62075446 -0.3776548  0.27021761
## 10                   dCTP         A  1.05799368 -0.2972256 -0.13187296
##      mu4_mean cluster
## 1  -0.1939125       1
## 2   0.1529142       2
## 3   0.2733801       3
## 4  -0.6645267       4
## 5   0.2969577       5
## 6   1.0888562       6
## 7  -0.2181755       1
## 8  -0.2304880       1
## 9  -0.4063779       7
## 10 -0.7792338       8
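
The value k = 8 above is set for demonstration only. As a sketch of how k could be chosen by average silhouette width, the following example uses the function silhouette() from the cluster package (assumed to be installed; it is called with the `cluster::` prefix so that it does not interfere with the data set of the same name). The dynamics profiles are simulated here and purely hypothetical.

```r
# hypothetical dynamics profiles: three groups of ten metabolites,
# each group scattered around its own four-time-point profile
set.seed(123)
centers <- rbind(
  c(-1, 0, 1, 0),
  c(1, 0, -1, 0),
  c(0, 1, 0, -1)
)
profiles <- centers[rep(1:3, each = 10), ] +
  matrix(rnorm(30 * 4, sd = 0.1), ncol = 4)

# same distance and linkage as in the clustering code of this section
dd <- dist(profiles, method = "euclidean")
tree <- hclust(dd, method = "ward.D2")

# average silhouette width for a range of candidate k
ks <- 2:6
avg_sil <- vapply(ks, function(k) {
  sil <- cluster::silhouette(cutree(tree, k = k), dd)
  mean(sil[, "sil_width"])
}, numeric(1))

# candidate k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
data.frame(k = ks, avg_silhouette = avg_sil)
best_k
```

With real estimates, `profiles` would be replaced by the “mu1_mean” to “mu4_mean” columns; the gap statistic is a common second criterion to compare against.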

The dataset “cluster” holds the clustering results of the whole simulated dataset data(“longitudinalMetabolomics”), with columns “metabolite”, “condition”, “mu1_mean” to “mu4_mean” and “cluster”. The column “cluster” holds the cluster ID of each metabolite.

data("cluster")
temp <- cluster
temp <- temp %>% pivot_longer(
  cols = c(mu1_mean, mu2_mean, mu3_mean, mu4_mean),
  names_to = "timepoint", values_to = "mu_mean"
)
ggplot(temp, aes(
  x = as.factor(as.numeric(as.factor(timepoint))),
  y = mu_mean, group = metabolite
)) +
  geom_line() +
  xlab("timepoint") +
  ylab("estimated mean concentration") +
  theme_bw() +
  theme(legend.position = "none") +
  facet_grid(rows = vars(condition), cols = vars(cluster)) +
  ggtitle("clustered dynamics", "panels=cluster ID")

As we can see metabolites show different dynamics in different experimental conditions. Can we quantify the biological function of these dynamics clusters?

6 Over-representation analysis of functional modules in dynamics clusters

To quantify the possible biological function of these dynamics clusters we retrieved from the KEGG-database the following information with package KEGGREST: 1) to which functional modules our experimental metabolites are annotated and 2) which metabolites are annotated to functional modules in general.

The functional modules of the KEGG-database are organised in three hierarchies: upper, middle and lower. Here we will do functional analysis on the middle hierarchy. To facilitate analysis the data frames “metabolite_modules”, which holds the information about experimental metabolites, and “modules_compounds”, which holds the information about which metabolites are in general annotated to functional modules, were prepared. We load both data sets and can inspect the documentation.

data("metabolite_modules")
help("metabolite_modules")
head(metabolite_modules)
## # A tibble: 6 × 8
##    ...1 metabolite  KEGG  module_id module_name upper_hierarchy middle_hierarchy
##   <dbl> <chr>       <chr> <chr>     <chr>       <chr>           <chr>           
## 1     1 1-Aminocyc… C012… M00368    Ethylene b… Pathway modules Amino acid meta…
## 2     2 2-Aminomuc… C022… M00038    Tryptophan… Pathway modules Amino acid meta…
## 3     3 2-Phosphog… C006… M00001    Glycolysis… Pathway modules Carbohydrate me…
## 4     4 2-Phosphog… C006… M00002    Glycolysis… Pathway modules Carbohydrate me…
## 5     5 2-Phosphog… C006… M00003    Gluconeoge… Pathway modules Carbohydrate me…
## 6     6 2-Phosphog… C006… M00346    Formaldehy… Pathway modules Energy metaboli…
## # ℹ 1 more variable: lower_hierarchy <chr>
data("modules_compounds")
help("modules_compounds")
head(modules_compounds)
## # A tibble: 6 × 6
##    ...1 module_id kegg_id upper_hierarchy middle_hierarchy       lower_hierarchy
##   <dbl> <chr>     <chr>   <chr>           <chr>                  <chr>          
## 1     2 M00001    C00267  Pathway modules Carbohydrate metaboli… Central carboh…
## 2     3 M00001    C00668  Pathway modules Carbohydrate metaboli… Central carboh…
## 3     4 M00001    C05345  Pathway modules Carbohydrate metaboli… Central carboh…
## 4     5 M00001    C05378  Pathway modules Carbohydrate metaboli… Central carboh…
## 5     6 M00001    C00111  Pathway modules Carbohydrate metaboli… Central carboh…
## 6     7 M00001    C00118  Pathway modules Carbohydrate metaboli… Central carboh…

Here we have to keep in mind that not all KEGG modules are suitable for testing on every observed organism and experimental condition. For example the modules “Xenobiotics biodegradation”,“Biosynthesis of other secondary metabolites” and “Biosynthesis of terpenoids and polyketides” should not be analyzed in a human lung cancer cell line.

For the functional analysis we employ a hypergeometric model. We consider a functional module as over-represented in a cluster if the 95% inter-quantile range (ICR) of the log-transformed probabilities of OvEs (observed vs expected) lies above zero. OvE refers to the ratio of observed metabolites in a cluster being mapped to a functional module over the number of expected metabolites in a cluster being mapped to a module under the assumption of a hypergeometric distribution (=drawing without replacement). We apply the functional analysis to the middle and lower hierarchy of functional modules.
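
As a back-of-the-envelope illustration of the OvE ratio, consider one cluster and one functional module. The counts below are made up, and the sketch only covers the point estimate and a tail probability from base R's hypergeometric functions; it is not the implementation of ORA_hypergeometric() and does not reproduce its ICR.

```r
# hypothetical counts
n_background <- 200 # metabolites annotated to any module (background)
n_module <- 20      # background metabolites annotated to the module
n_cluster <- 15     # metabolites in the dynamics cluster
n_observed <- 5     # cluster metabolites annotated to the module

# expected hits when drawing the cluster without replacement
expected <- n_cluster * n_module / n_background

# observed versus expected
OvE <- n_observed / expected
log_OvE <- log(OvE) # above zero: more hits than expected

# probability of at least n_observed hits under the hypergeometric model
p_at_least <- phyper(
  n_observed - 1,
  m = n_module,
  n = n_background - n_module,
  k = n_cluster,
  lower.tail = FALSE
)
c(expected = expected, OvE = OvE, log_OvE = log_OvE, p = p_at_least)
```

Here 1.5 annotated metabolites are expected by chance, so five observed hits correspond to an OvE well above one.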

data("cluster")
ORA <- ORA_hypergeometric(
  background = modules_compounds,
  annotations = metabolite_modules,
  clusters = cluster, tested_column = "middle_hierarchy"
)
ORA[["plot_ORA"]]

ORA_lower <- ORA_hypergeometric(
  background = modules_compounds,
  annotations = metabolite_modules,
  clusters = cluster[cluster$condition == "A", ],
  tested_column = "lower_hierarchy"
)
ORA_lower[["plot_ORA"]]

Great, we can now see which functional module is over- (green points and error-bars) or under-represented (none in this example) in which dynamics cluster! For instance, in cluster 3 at conditions A and C the modules “Energy metabolism” and “Carbohydrate metabolism” are over-represented.

7 Comparison of clusters of different experimental conditions

7.1 Dynamics

We can not only do over-representation analysis of KEGG-functional modules but also compare dynamics clusters across different experimental conditions. For this we employ a Bayesian model that estimates the mean difference as well as the standard deviation of differences between dynamics clusters.

With dist = vector of pairwise Euclidean distances between every dynamics vector of cluster a and every dynamics vector of cluster b, and ID = cluster pair ID:

\[\begin{align*} dist_{ID}&\sim {\sf normal}(\mu_{ID},\sigma_{ID}) \\ \mu_{ID}&\sim {\sf normal^+}(0,2) \\ \sigma_{ID}&\sim {\sf exponential}(1) \end{align*}\]
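
The input to this model, the vector of pairwise distances for one cluster pair, can be sketched in base R. The two small matrices below are hypothetical dynamics vectors (rows = metabolites, columns = time points); the sketch shows only how such a distance vector can be built, not how compare_dynamics() prepares its data.

```r
# hypothetical dynamics vectors of two clusters
cluster_a <- rbind(
  c(-0.5, 0.8, -0.2, -0.2),
  c(-0.4, 0.6, 0.0, -0.2),
  c(-0.1, 0.4, 0.0, -0.2)
)
cluster_b <- rbind(
  c(1.0, -0.3, -0.1, -0.8),
  c(0.6, -0.4, 0.3, -0.4)
)

# all pairwise Euclidean distances between the stacked rows
all_d <- as.matrix(dist(rbind(cluster_a, cluster_b), method = "euclidean"))

# keep only distances between a member of a and a member of b
idx_a <- seq_len(nrow(cluster_a))
idx_b <- nrow(cluster_a) + seq_len(nrow(cluster_b))
pair_dist <- as.vector(all_d[idx_a, idx_b])

# descriptive counterparts of mu_ID and sigma_ID
c(mean = mean(pair_dist), sd = sd(pair_dist))
```

A cluster pair with three and two members yields six distances; the Bayesian model then estimates the mean and spread of this vector for every cluster pair.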

comparison_dynamics <- compare_dynamics(
  clusters = cluster,
  dynamics = c(
    "mu1_mean", "mu2_mean",
    "mu3_mean", "mu4_mean"
  ),
  cores = 1 # only set to 1 for vignette
)
## 
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000413 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 4.13 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 1: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 1: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 3.033 seconds (Warm-up)
## Chain 1:                6.546 seconds (Sampling)
## Chain 1:                9.579 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000285 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.85 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 2: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 2: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 3.134 seconds (Warm-up)
## Chain 2:                6.612 seconds (Sampling)
## Chain 2:                9.746 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0.000286 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.86 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 3: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 3: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 3.116 seconds (Warm-up)
## Chain 3:                6.586 seconds (Sampling)
## Chain 3:                9.702 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0.000282 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.82 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 4: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 4: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 3.13 seconds (Warm-up)
## Chain 4:                6.625 seconds (Sampling)
## Chain 4:                9.755 seconds (Total)
## Chain 4:
comparison_dynamics[["plot_dynamic_comparison"]]

The bigger and brighter a point, the smaller the mean distance between dynamics clusters and the smaller the standard deviation. That means big, bright points indicate high dynamic similarity with small spread. Here B_8 and A_4 have high similarity in dynamics.

7.2 Metabolites

comparison_metabolites <- compare_metabolites(clusters = cluster)
comparison_metabolites[["plot_metabolite_comparison"]]

We have two clusters that are very similar in their metabolite composition: C_6 and A_5. If we compare that to the dynamics profiles and the ORA results, we see that similar functional modules are over-represented, as expected, but the dynamics differ between the two radiation doses.
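
The metabolite similarity is reported under the name “Jaccard”. Assuming this refers to the standard Jaccard index (size of the intersection divided by size of the union of two sets), the idea can be sketched as follows; the helper `jaccard()` and the two metabolite sets are hypothetical.

```r
# Jaccard index of two sets: shared elements / all distinct elements
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# hypothetical metabolite sets of two clusters
cluster_x <- c("Succinate", "Thymidine", "Uracil", "L-Valine")
cluster_y <- c("Succinate", "Thymidine", "dCTP")

# two shared metabolites out of five distinct ones
jaccard(cluster_x, cluster_y)
```

A value of 1 means identical metabolite composition and 0 means no shared metabolite.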

Can we facilitate visualization?

7.3 Combine both

dynamics <- comparison_dynamics[["estimates"]]
metabolites <- comparison_metabolites[["Jaccard"]]
temp <- left_join(dynamics, metabolites, by = c("cluster_a", "cluster_b"))
x <- unique(c(temp[,"cluster_a"],temp[, "cluster_b"]))
temp <- temp %>% mutate(scale_Jaccard = scale(Jaccard))
ggplot(temp, aes(x = cluster_b, y = cluster_a)) +
  geom_point(aes(size = Jaccard, col = mu_mean)) +
  theme_bw() +
  scale_color_viridis_c(option = "magma") +
  scale_x_discrete(limits = x) +
  xlab("") +
  ylab("") +
  scale_y_discrete(limits = x) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(col = "dynamics distance", size = "metabolite similarity") +
  ggtitle("comparison of clusters")

We can find a cluster pair that is quite similar with regard to its metabolite composition but dissimilar with regard to its dynamics: B_7 and A_4. Their ORA profiles are quite similar, as expected from the similar metabolite compositions, but they show different dynamics between experimental conditions.

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /media/volume/teran2_disk/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tidyr_1.3.1                 dplyr_1.1.4                
##  [3] ggplot2_3.5.1               SummarizedExperiment_1.35.4
##  [5] Biobase_2.65.1              GenomicRanges_1.57.2       
##  [7] GenomeInfoDb_1.41.2         IRanges_2.39.2             
##  [9] S4Vectors_0.43.2            BiocGenerics_0.51.3        
## [11] MatrixGenerics_1.17.0       matrixStats_1.4.1          
## [13] MetaboDynamics_0.99.2       BiocStyle_2.33.1           
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.45.1         gtable_0.3.5            QuickJSR_1.4.0         
##  [4] xfun_0.48               bslib_0.8.0             inline_0.3.19          
##  [7] lattice_0.22-6          vctrs_0.6.5             tools_4.4.1            
## [10] generics_0.1.3          curl_5.2.3              parallel_4.4.1         
## [13] tibble_3.2.1            fansi_1.0.6             highr_0.11             
## [16] pkgconfig_2.0.3         Matrix_1.7-1            RcppParallel_5.1.9     
## [19] lifecycle_1.0.4         GenomeInfoDbData_1.2.13 farver_2.1.2           
## [22] compiler_4.4.1          stringr_1.5.1           Biostrings_2.73.2      
## [25] tinytex_0.53            munsell_0.5.1           codetools_0.2-20       
## [28] htmltools_0.5.8.1       sass_0.4.9              yaml_2.3.10            
## [31] pillar_1.9.0            crayon_1.5.3            jquerylib_0.1.4        
## [34] cachem_1.1.0            DelayedArray_0.31.14    magick_2.8.5           
## [37] StanHeaders_2.32.10     abind_1.4-8             rstan_2.32.6           
## [40] tidyselect_1.2.1        digest_0.6.37           stringi_1.8.4          
## [43] purrr_1.0.2             bookdown_0.41           labeling_0.4.3         
## [46] fastmap_1.2.0           grid_4.4.1              colorspace_2.1-1       
## [49] cli_3.6.3               SparseArray_1.5.45      magrittr_2.0.3         
## [52] loo_2.8.0               S4Arrays_1.5.11         pkgbuild_1.4.4         
## [55] utf8_1.2.4              withr_3.0.1             scales_1.3.0           
## [58] UCSC.utils_1.1.0        rmarkdown_2.28          XVector_0.45.0         
## [61] httr_1.4.7              gridExtra_2.3           png_0.1-8              
## [64] evaluate_1.0.1          knitr_1.48              V8_6.0.0               
## [67] viridisLite_0.4.2       rstantools_2.4.0        rlang_1.1.4            
## [70] Rcpp_1.0.13             glue_1.8.0              BiocManager_1.30.25    
## [73] jsonlite_1.8.9          R6_2.5.1                zlibbioc_1.51.2