--- title: "LimROTS: Survival Analysis with Cox and Competing Risks Models" bibliography: references.bib output: BiocStyle::html_document: fig_height: 7 fig_width: 7 toc: true toc_float: true toc_depth: 3 number_sections: true vignette: > %\VignetteIndexEntry{LimROTS: Survival Analysis with Cox and Competing Risks Models} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction LimROTS extends its reproducibility-optimized testing framework to survival analysis via the `LimROTS_survival()` function. It fits a Cox proportional hazards model @survival — or, optionally, a competing risks model via the Fine–Gray subdistribution approach @cmprsk — per feature across bootstrap resamples. The reproducibility-optimization step then selects the parameters $\alpha_1$ and $\alpha_2$ that maximize the overlap of top-ranked features over those resamples, yielding: $$t_{\alpha(p)} = \frac{\beta_{(p)}}{\alpha_1 + \alpha_2 \times SE_{(p)}}$$ where $\beta_{(p)}$ is the Cox model coefficient (log hazard ratio) and $SE_{(p)}$ is its standard error for feature $p$. P-values are derived from permutation-based null distributions using the `qvalue` package @qvalue. This vignette demonstrates both models using a Crohn's disease gut microbiome dataset. # Citation If you use `LimROTS` in your research, please cite: > Anwar, A. M., Jeba, A., Lahti, L., & Coffey, E. (2025). LimROTS: A > Hybrid Method Integrating Empirical Bayes and > Reproducibility-Optimized Statistics for Robust Differential > Expression Analysis. *Bioinformatics*, btaf570. > # Dataset Description We use the `crohn_survival` dataset from the `miaTime` package @miaTime. This is a simulated dataset based on a real Crohn's disease microbiome study [@Gevers2014; @coda4microbiome]. It contains **150 individuals** and **48 gut microbial taxa** stored as a `TreeSummarizedExperiment` object. The relevant `colData` columns are: | Column | Description | |-----------------|-------------------------------------------------------| | `diagnosis` | Group label (`"CD"` = Crohn's disease, `"Control"`) | | `event` | Binary event indicator (1 = Crohn's disease, 0 = censored) | | `event_time` | Time to event or censoring (unitless) | The analysis goal is to identify taxa whose abundance is associated with time to Crohn's disease diagnosis. ## Loading the Data ```{r load-packages, message=FALSE} library(LimROTS, quietly = TRUE) library(mia, quietly = TRUE) library(miaTime, quietly = TRUE) library(TreeSummarizedExperiment, quietly = TRUE) library(BiocParallel, quietly = TRUE) library(SummarizedExperiment, quietly = TRUE) library(ggplot2, quietly = TRUE) ``` ```{r load-dataset} data("crohn_survival") print(crohn_survival) ``` `crohn_survival` is a `TreeSummarizedExperiment`. We first apply the centered log-ratio (CLR) transformation using `mia::transformAssay()`, which stores the result as a named assay (`"clr"`) alongside the original counts. We then coerce to a plain `SummarizedExperiment` as required by `LimROTS_survival()`: ```{r clr-transform} crohn_survival <- transformAssay( crohn_survival, assay.type = "counts", method = "clr", pseudocount = TRUE, name = "clr" ) assayNames(crohn_survival) ``` ```{r convert-to-se} taxa_names <- rownames(crohn_survival) crohn_survival <- as(crohn_survival, "SummarizedExperiment") # as() coercion can drop rownames; restore them if (is.null(rownames(crohn_survival))) { rownames(crohn_survival) <- taxa_names } print(crohn_survival) ``` Sample names in `crohn_survival` contain `.` characters, which can cause issues in formula parsing. We replace them with `_` in both the assay column names and the `colData` row names: ```{r fix-sample-names} clean_names <- gsub("\\.", "_", colnames(crohn_survival)) colnames(crohn_survival) <- clean_names rownames(colData(crohn_survival)) <- clean_names # Rename event_time -> time as required by LimROTS_survival() colnames(colData(crohn_survival))[ colnames(colData(crohn_survival)) == "event_time"] <- "time" ``` Inspect the survival outcome distribution: ```{r inspect-coldata} table(colData(crohn_survival)$diagnosis, colData(crohn_survival)$event) ``` # Cox Proportional Hazards Analysis The standard Cox model tests each taxon for an association with time-to-event independently of any other covariates. ## Running `LimROTS_survival` ```{r set-seed-cox} set.seed(1234, kind = "default", sample.kind = "default") ``` ```{r run-cox} crohn_cox <- LimROTS_survival( x = crohn_survival, assay.type = "clr", niter = 30, # use >= 1000 for real analyses K = 10, # number of top taxa meta.info = c("time", "event"), formula.str = "~ Surv(time, event)", competing_risks = FALSE, verbose = FALSE ) ``` > **Note:** `niter = 30`is used here to reduce build time. For > production analyses, use `niter = 1000` or higher and increase the > number of parallel workers via the `BPPARAM` argument: > > ``` r > # Windows > BPPARAM <- SnowParam(workers = 4) > # Linux / macOS > BPPARAM <- MulticoreParam(workers = 4) > # Pass to LimROTS_survival(..., BPPARAM = BPPARAM) > ``` ## Results The results are appended to the `rowData` and `metadata` slots of the returned `SummarizedExperiment`. ```{r cox-results} cox_res <- data.frame( rowData(crohn_cox), row.names = rownames(crohn_cox) ) head(cox_res[order(cox_res$pvalue), ]) ``` Volcano plot of log hazard ratios versus $-\log_{10}$(q-value): ```{r volcano-cox} cox_res$significant <- cox_res$FDR < 0.05 ggplot(cox_res, aes( x = log(exp_coef), y = -log10(pvalue), color = significant )) + geom_point(alpha = 0.7, size = 2) + geom_hline( yintercept = -log10(0.05), linetype = "dashed", color = "steelblue" ) + scale_color_manual( values = c("FALSE" = "grey60", "TRUE" = "firebrick"), labels = c("FALSE" = "FDR \u2265 0.05", "TRUE" = "FDR < 0.05") ) + labs( title = "LimROTS Survival \u2014 Cox Model", x = "Log Hazard Ratio", y = expression(-log[10](pvalue)), color = NULL ) + theme_bw(base_size = 12) + theme(legend.position = "top") ``` # Competing Risks Analysis When a competing event (e.g., death from another cause, remission) prevents the event of interest from occurring, a competing risks model is more appropriate than the standard Cox model. `LimROTS_survival()` supports the Fine–Gray subdistribution hazard model @cmprsk when `competing_risks = TRUE`. The event variable must be coded as: - **0** — censored - **1** — event of interest - **2** — competing event ## Preparing the Data The `crohn_survival` dataset has a binary event column (0/1). For illustration, we recode half of the censored observations as having experienced a competing event: ```{r prepare-competing} crohn_cr <- crohn_survival event_vec <- colData(crohn_cr)$event censored_idx <- which(event_vec == 0) set.seed(42) competing_idx <- sample(censored_idx, floor(length(censored_idx) / 2)) event_vec[competing_idx] <- 2L colData(crohn_cr)$event <- event_vec table(colData(crohn_cr)$event) ``` > **Note:** In a real study, the competing event column should reflect a > genuine competing risk recorded in the original data (e.g., surgical > intervention, withdrawal). The recoding above is purely for > demonstration purposes. ## Running `LimROTS_survival` ```{r set-seed-cr} set.seed(1234, kind = "default", sample.kind = "default") ``` ```{r run-competing} crohn_cr <- LimROTS_survival( x = crohn_cr, assay.type = "clr", niter = 30, K = 10, meta.info = c("time", "event"), formula.str = "~ Surv(time, event)", competing_risks = TRUE, verbose = FALSE ) ``` ## Results ```{r cr-results} cr_res <- data.frame( rowData(crohn_cr), row.names = rownames(crohn_cr) ) head(cr_res[order(cr_res$pvalue), ]) ``` Volcano plot for the competing risks model: ```{r volcano-cr} cr_res$significant <- cr_res$FDR < 0.05 ggplot(cr_res, aes( x = log(exp_coef), y = -log10(pvalue), color = significant )) + geom_point(alpha = 0.7, size = 2) + geom_hline( yintercept = -log10(0.05), linetype = "dashed", color = "steelblue" ) + scale_color_manual( values = c("FALSE" = "grey60", "TRUE" = "firebrick"), labels = c("FALSE" = "FDR \u2265 0.05", "TRUE" = "FDR < 0.05") ) + labs( title = "LimROTS Survival \u2014 Competing Risks Model", x = "Log Subdistribution Hazard Ratio", y = expression(-log[10](pvalue)), color = NULL ) + theme_bw(base_size = 12) + theme(legend.position = "top") ``` # Quality Control ```{r qc-cox, results="hide", message=FALSE, warning=FALSE} plot(metadata(crohn_cr)[["q_values"]]) hist(metadata(crohn_cr)[["q_values"]]) print(summary(metadata(crohn_cr)[["q_values"]])) ``` ```{r session-info} sessionInfo() ``` # References