---
title: "ExpoRiskR: Exposure-aware multi-omics integration in Bioconductor"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{ExpoRiskR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
library(ExpoRiskR)
library(BiocStyle)
```

# Abstract

Environmental and lifestyle exposures play a central role in shaping host-associated biological systems, including the microbiome and metabolome. Integrative analysis of multi-omics data together with exposure information is therefore essential for understanding disease risk and environmental health mechanisms. However, many existing multi-omics integration tools focus on correlation or latent factor discovery without explicitly accounting for exposure variables.

**ExpoRiskR** is a Bioconductor package designed for exposure-aware multi-omics integration, providing standardized workflows for aligning, preprocessing, and analyzing microbiome, metabolomics, and exposure data. The package emphasizes interoperability with Bioconductor data structures, particularly the `SummarizedExperiment` class, enabling seamless integration with existing Bioconductor workflows. ExpoRiskR supports reproducible and interpretable analysis of exposure-adjusted cross-omics associations, making it especially suitable for exposome, environmental epidemiology, and disease onset studies.

# Introduction and Motivation

High-throughput profiling technologies have enabled the simultaneous measurement of multiple molecular layers, such as the microbiome and metabolome, across large cohorts. In parallel, advances in exposure science have made it possible to quantify environmental and lifestyle factors that influence biological systems. Integrating these heterogeneous data types remains challenging due to differences in scale, structure, and confounding by exposures.

Most multi-omics integration approaches aim to identify shared variation or discriminative features across omics layers.
While powerful, these methods often do not explicitly incorporate exposure variables into the integration framework, making it difficult to disentangle intrinsic biological relationships from exposure-driven effects.

ExpoRiskR was developed to address this gap by providing a coherent workflow to align samples across multiple omics layers and exposure metadata, preprocess heterogeneous data in a consistent manner, and enable downstream analysis of cross-omics associations in the context of measured exposures.

# Why Bioconductor?

ExpoRiskR is implemented within the Bioconductor ecosystem to leverage its mature infrastructure for high-throughput biological data analysis. Bioconductor provides standardized data containers, rigorous software review, and strong guarantees of interoperability and reproducibility.

A key design principle of ExpoRiskR is native support for the `SummarizedExperiment` class. By operating directly on `SummarizedExperiment` objects, ExpoRiskR integrates naturally with existing Bioconductor workflows and can be combined with other Bioconductor packages without ad-hoc data transformations.

# Related Bioconductor Packages and Comparison

Several Bioconductor packages address multi-omics data integration, each with a distinct methodological focus.

- **mixOmics / DIABLO** focuses on supervised and unsupervised multiblock integration for feature selection and classification but does not explicitly adjust for exposure variables.
- **MOFA2** uses latent factor models to capture shared sources of variation across omics datasets, providing powerful dimensionality reduction but limited direct interpretability in exposure-aware contexts.
- **MultiAssayExperiment** provides a flexible data container for managing multiple assays but does not perform integration or association analysis itself.

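
To make concrete what "adjusting for exposure variables" means in the comparison above, the following is a minimal base R sketch on simulated data. It is not ExpoRiskR's implementation: the variable names (`exposure_sim`, `microbe_sim`, `metabolite_sim`) and the residual-based partial correlation are assumptions chosen purely for illustration. A single exposure drives both a microbial feature and a metabolite, so their marginal correlation is inflated, while the correlation of the residuals after regressing out the exposure is not.

```{r exposure-adjust-sketch}
# Hypothetical simulated data (illustration only, not package output):
# one exposure influences both a microbial feature and a metabolite.
set.seed(1)
n_sim <- 500
exposure_sim   <- rnorm(n_sim)
microbe_sim    <- exposure_sim + rnorm(n_sim)
metabolite_sim <- exposure_sim + rnorm(n_sim)

# Marginal correlation: includes the signal induced by the shared exposure.
raw_cor <- cor(microbe_sim, metabolite_sim)

# Exposure-adjusted (partial) correlation: regress the exposure out of each
# feature with lm(), then correlate the residuals.
res_microbe    <- resid(lm(microbe_sim ~ exposure_sim))
res_metabolite <- resid(lm(metabolite_sim ~ exposure_sim))
adj_cor <- cor(res_microbe, res_metabolite)

c(raw = raw_cor, adjusted = adj_cor)
```

In this simulation the two features have no direct relationship, so any association that remains after adjustment reflects sampling noise rather than a genuine microbe–metabolite link.
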

In contrast, ExpoRiskR emphasizes exposure-aware and interpretable integration, aiming to identify biologically meaningful cross-omics relationships that persist after accounting for measured exposures.

# Using ExpoRiskR with SummarizedExperiment Objects

ExpoRiskR natively supports `SummarizedExperiment` objects through dedicated helper functions, enabling direct integration with Bioconductor workflows.

## Creating and Aligning SummarizedExperiment Inputs

```{r se-align}
set.seed(7)
d <- generate_dummy_exporisk(
  n = 12,
  p_micro = 5,
  p_metab = 6,
  p_expo = 3
)

aligned <- align_omics_se(
  microbiome = d$microbiome,
  metabolome = d$metabolome,
  exposures = d$exposures,
  meta = d$meta,
  id_col = "sample_id",
  strict = TRUE
)

aligned$se_microbiome
aligned$se_metabolome
```

## Preprocessing SummarizedExperiment-based Data

```{r se-prep}
prepped <- prep_omics_se(aligned)
prepped
```

## Network construction and visualization

Below we construct an exposure-adjusted microbe–metabolite network from the preprocessed matrices and visualize the resulting graph.

```{r se-network, message=FALSE, warning=FALSE, fig.width=7, fig.height=5}
X <- prepped$X
Y <- prepped$Y
E <- prepped$E

net <- build_exposure_network(
  X = X,
  Y = Y,
  E = E,
  fdr = 0.8, # relaxed for vignette speed / non-empty illustration
  max_pairs = 1500,
  seed = 1
)

plot_exposure_network(net)
```

## Exposure perturbation ranking

ExpoRiskR summarizes how strongly each exposure perturbs the estimated cross-omics associations and ranks exposures accordingly.

```{r se-ranking, message=FALSE, warning=FALSE, fig.width=7, fig.height=4}
scores <- exposure_perturbation_score(
  X = X,
  Y = Y,
  E = E,
  fdr = 0.8,
  max_pairs = 1500,
  seed = 1
)

print(plot_exposure_ranking(scores, top_n = 15))
```

## Exposure feature importance

This example uses the simulated outcome included in the dummy metadata to illustrate simple exposure feature importance summaries.

```{r se-importance, message=FALSE, warning=FALSE, fig.width=7, fig.height=4}
outcome <- d$meta$outcome
names(outcome) <- d$meta$sample_id

print(plot_feature_importance(
  E = E,
  outcome = outcome,
  top_n = 15
))
```

## Risk ROC curve

As an illustrative end-to-end example, ExpoRiskR can visualize discrimination performance for a simple risk model derived from the exposure-adjusted network.

```{r se-roc, message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
print(plot_risk_roc(
  X = X,
  Y = Y,
  E = E,
  outcome = outcome,
  edges = net$edges,
  top_edges = 80
))
```

## Integration with Bioconductor Workflows

Because ExpoRiskR operates on `SummarizedExperiment` objects, users can seamlessly integrate it with other Bioconductor packages for downstream analysis, including differential testing, visualization, or additional statistical modeling.

# Summary

ExpoRiskR provides a Bioconductor-native framework for exposure-aware multi-omics integration with a strong emphasis on interpretability and reproducibility. By supporting `SummarizedExperiment` objects and aligning with existing Bioconductor workflows, the package enables robust investigation of exposure-driven biological mechanisms in environmental health and disease studies.

# Session information

```{r}
sessionInfo()
```