--- title: "Statistical Methods in bbssr" author: "Gosuke Homma" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{Statistical Methods in bbssr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 6, fig.align = "center" ) ``` ## Introduction This vignette provides a detailed explanation of the statistical methods implemented in the `bbssr` package. We cover the theoretical foundations of blinded sample size re-estimation (BSSR) and the five exact statistical tests supported by the package. ```{r load_packages, message=FALSE, warning=FALSE} library(bbssr) library(dplyr) library(ggplot2) library(tidyr) ``` ## Theoretical Foundation of BSSR ### The Problem with Fixed Sample Sizes Traditional clinical trials use fixed sample sizes determined during the planning phase based on: - Assumed treatment effect size - Expected response rates in each group - Desired power and significance level However, these assumptions are often inaccurate, leading to: - **Underpowered studies** when the assumed effect size is too optimistic - **Overpowered studies** when the assumed effect size is too conservative - **Resource inefficiency** due to incorrect sample size planning - **Ethical concerns** about continuing underpowered trials ### The BSSR Solution Blinded Sample Size Re-estimation addresses these issues by: 1. **Interim Analysis**: Conducting an interim analysis using a fraction (ω) of the planned data 2. **Pooled Estimation**: Estimating the pooled response rate without unblinding treatment assignments 3. **Sample Size Adjustment**: Recalculating required sample size based on observed pooled data 4. **Controlled Type I Error**: Maintaining exact statistical control throughout the process ### Mathematical Framework Let's define the key parameters: - $n_1, n_2$: Initial sample sizes for groups 1 and 2 - $X_1, X_2$: Number of responders in groups 1 and 2 - $p_1, p_2$: True response probabilities - $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$: Pooled response rate (observable) - $\Delta = p_1 - p_2$: Treatment effect (risk difference) - $\omega$: Fraction of data used for interim analysis ## Exact Statistical Tests The `bbssr` package implements five exact statistical tests, each with different characteristics and optimal use cases. ### 1. Pearson Chi-squared Test (`'Chisq'`) The one-sided Pearson chi-squared test uses the test statistic: $$Z_{ij} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$ where $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$ is the pooled proportion. **P-value formula:** $$p\text{-value} = P(Z \geq z_{\text{obs}}) = 1 - \Phi(z_{\text{obs}})$$ where $\Phi(\cdot)$ is the standard normal cumulative distribution function. ```{r chisq_example} # Example: Chi-squared test power_chisq <- BinaryPower( p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = 'Chisq' ) print(paste("Chi-squared test power:", round(power_chisq, 3))) ``` **Characteristics:** - Good asymptotic properties for large samples - Computationally efficient - May be anti-conservative for small samples ### 2. Fisher Exact Test (`'Fisher'`) The Fisher exact test conditions on the total number of successes and uses the hypergeometric distribution. **P-value formula:** $$p\text{-value} = P(X_1 \geq k | X_1 + X_2 = s) = \sum_{i=k}^{\min(n_1,s)} \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}}$$ where $k$ is the observed number of successes in group 1, and $s = X_1 + X_2$ is the total number of successes. The conditional probability mass function is: $$P(X_1 = i | X_1 + X_2 = s) = \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}}$$ ```{r fisher_example} # Example: Fisher exact test power_fisher <- BinaryPower( p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = 'Fisher' ) print(paste("Fisher exact test power:", round(power_fisher, 3))) ``` **Characteristics:** - Exact Type I error control - Conservative (actual α < nominal α) - Widely accepted by regulatory agencies - Conditional test ### 3. Fisher Mid-p Test (`'Fisher-midP'`) The Fisher mid-p test reduces the conservatism of the Fisher exact test by including half the probability of the observed outcome. **P-value formula:** $$p\text{-value} = P(X_1 > k | X_1 + X_2 = s) + 0.5 \cdot P(X_1 = k | X_1 + X_2 = s)$$ This can be expressed as: $$p\text{-value} = \sum_{i=k+1}^{\min(n_1,s)} \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}} + 0.5 \cdot \frac{\binom{n_1}{k}\binom{n_2}{s-k}}{\binom{n_1+n_2}{s}}$$ ```{r fisher_midp_example} # Example: Fisher mid-p test power_midp <- BinaryPower( p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = 'Fisher-midP' ) print(paste("Fisher mid-p test power:", round(power_midp, 3))) ``` **Characteristics:** - Less conservative than Fisher exact - Better power properties - Maintains approximate Type I error control ### 4. Z-pooled Exact Unconditional Test (`'Z-pool'`) This test uses the Z-statistic but calculates exact p-values by considering all possible values of the nuisance parameter $\theta$ (the common success probability under the null hypothesis). **P-value formula:** $$p\text{-value} = \max_{\theta \in [0,1]} P_{\theta}(Z \geq z_{\text{obs}})$$ where under the null hypothesis $H_0: p_1 = p_2 = \theta$: $$P_{\theta}(Z \geq z_{\text{obs}}) = \sum_{(x_1,x_2): z(x_1,x_2) \geq z_{\text{obs}}} \binom{n_1}{x_1}\binom{n_2}{x_2}\theta^{x_1+x_2}(1-\theta)^{n_1+n_2-x_1-x_2}$$ The test statistic is: $$z(x_1,x_2) = \frac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{x_1+x_2}{n_1 n_2} \cdot \left(1 - \frac{x_1+x_2}{n_1+n_2}\right)}}$$ ```{r zpool_example} # Example: Z-pooled test power_zpool <- BinaryPower( p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = 'Z-pool' ) print(paste("Z-pooled test power:", round(power_zpool, 3))) ``` **Characteristics:** - Unconditional test - Good balance between power and conservatism - Computationally more intensive than conditional tests ### 5. Boschloo Exact Unconditional Test (`'Boschloo'`) The Boschloo test is the most powerful exact unconditional test. It maximizes the p-value over all possible values of the nuisance parameter, but uses the Fisher exact p-value as the test statistic. **P-value formula:** $$p\text{-value} = \max_{\theta \in [0,1]} P_{\theta}(p_{\text{Fisher}}(X_1, X_2) \leq p_{\text{Fisher,obs}})$$ where $p_{\text{Fisher}}(x_1, x_2)$ is the Fisher exact p-value for the observation $(x_1, x_2)$: $$p_{\text{Fisher}}(x_1, x_2) = P(X_1 \geq x_1 | X_1 + X_2 = x_1 + x_2)$$ Under the null hypothesis $H_0: p_1 = p_2 = \theta$: $$P_{\theta}(p_{\text{Fisher}}(X_1, X_2) \leq p_{\text{Fisher,obs}}) = \sum_{\substack{(x_1,x_2): \\ p_{\text{Fisher}}(x_1,x_2) \leq p_{\text{Fisher,obs}}}} \binom{n_1}{x_1}\binom{n_2}{x_2}\theta^{x_1+x_2}(1-\theta)^{n_1+n_2-x_1-x_2}$$ ```{r boschloo_example} # Example: Boschloo test power_boschloo <- BinaryPower( p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = 'Boschloo' ) print(paste("Boschloo test power:", round(power_boschloo, 3))) ``` **Characteristics:** - Most powerful exact unconditional test - Maintains exact Type I error control - Computationally intensive - Optimal choice when computational resources allow ### Mathematical Relationships The key insight is that: 1. **Conditional tests** (Fisher, Fisher mid-p) condition on the total number of successes 2. **Unconditional tests** (Z-pool, Boschloo) maximize over the nuisance parameter $\theta$ 3. **Boschloo test** uses Fisher p-values as the ordering statistic, then maximizes over $\theta$ 4. **Z-pooled test** uses the Z-statistic as the ordering statistic, then maximizes over $\theta$ ### Test Comparison ```{r test_comparison} # Compare all five tests tests <- c('Chisq', 'Fisher', 'Fisher-midP', 'Z-pool', 'Boschloo') powers <- sapply(tests, function(test) { BinaryPower(p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = test) }) comparison_df <- data.frame( Test = tests, Power = round(powers, 4), Type = c("Asymptotic", "Conditional", "Conditional", "Unconditional", "Unconditional"), Conservatism = c("Moderate", "High", "Moderate", "Moderate", "Low") ) print(comparison_df) ``` ## BSSR Methodology ### Design Approaches #### 1. Restricted Design (`restricted = TRUE`) In the restricted design, the final sample size must be at least the originally planned sample size: $N_{\text{final}} \geq N_{\text{planned}}$ This approach is conservative and ensures that the study duration doesn't exceed the originally planned timeline. #### 2. Unrestricted Design (`restricted = FALSE`) The unrestricted design allows both increases and decreases in sample size based on the interim data: $N_{\text{final}} = \max(N_{\text{interim}}, N_{\text{recalculated}})$ This provides maximum flexibility but may extend or shorten the study duration. #### 3. Weighted Design (`weighted = TRUE`) The weighted approach uses a weighted average across all possible interim scenarios: $N_{\text{final}} = \max\left(N_{\text{scenario}}, \sum_{scenarios} w_h \cdot N_h\right)$ where $w_h$ are weights based on the probability of each interim scenario. ### BSSR Implementation Example ```{r bssr_detailed} # Detailed BSSR example with different approaches bssr_results_list <- list() # Restricted approach bssr_results_list[["Restricted"]] <- BinaryPowerBSSR( asmd.p1 = 0.45, asmd.p2 = 0.09, p = seq(0.1, 0.9, by = 0.1), Delta.A = 0.36, Delta.T = 0.36, N1 = 24, N2 = 24, omega = 0.5, r = 1, alpha = 0.025, tar.power = 0.8, Test = 'Z-pool', restricted = TRUE, weighted = FALSE ) %>% mutate(approach = "Restricted") # Unrestricted approach bssr_results_list[["Unrestricted"]] <- BinaryPowerBSSR( asmd.p1 = 0.45, asmd.p2 = 0.09, p = seq(0.1, 0.9, by = 0.1), Delta.A = 0.36, Delta.T = 0.36, N1 = 24, N2 = 24, omega = 0.5, r = 1, alpha = 0.025, tar.power = 0.8, Test = 'Z-pool', restricted = FALSE, weighted = FALSE ) %>% mutate(approach = "Unrestricted") # Weighted approach bssr_results_list[["Weighted"]] <- BinaryPowerBSSR( asmd.p1 = 0.45, asmd.p2 = 0.09, p = seq(0.1, 0.9, by = 0.1), Delta.A = 0.36, Delta.T = 0.36, N1 = 24, N2 = 24, omega = 0.5, r = 1, alpha = 0.025, tar.power = 0.8, Test = 'Z-pool', restricted = FALSE, weighted = TRUE ) %>% mutate(approach = "Weighted") # Combine results bssr_results <- do.call(rbind, bssr_results_list) # Summary statistics bssr_summary <- bssr_results %>% group_by(approach) %>% summarise( mean_power_bssr = mean(power.BSSR), mean_power_trad = mean(power.TRAD), min_power_bssr = min(power.BSSR), max_power_bssr = max(power.BSSR), .groups = 'drop' ) print(bssr_summary) ``` ## Power Calculations ### Traditional vs BSSR Power ```{r power_comparison_plot, fig.width=8, fig.height=10, out.width="100%"} # Create comprehensive power comparison with vertical layout power_data <- bssr_results %>% select(approach, p, power.BSSR, power.TRAD) %>% pivot_longer( cols = c(power.BSSR, power.TRAD), names_to = "design_type", values_to = "power" ) %>% mutate( design_type = case_when( design_type == "power.BSSR" ~ "BSSR", design_type == "power.TRAD" ~ "Traditional" ), approach = factor(approach, levels = c("Restricted", "Unrestricted", "Weighted")) ) ggplot(power_data, aes(x = p, y = power, color = design_type)) + geom_line(linewidth = 1.2) + facet_wrap(~approach, ncol = 1, scales = "free_y") + # Vertical layout geom_hline(yintercept = 0.8, linetype = "dashed", color = "gray") + scale_color_manual( values = c("BSSR" = "#1F78B4", "Traditional" = "#E31A1C"), name = "Design Type" ) + scale_x_continuous( breaks = seq(0.2, 0.8, by = 0.2), labels = c("0.2", "0.4", "0.6", "0.8") ) + scale_y_continuous( breaks = seq(0.7, 1.0, by = 0.1), labels = c("0.7", "0.8", "0.9", "1.0") ) + labs( x = "Pooled Response Rate (θ)", y = "Power", title = "Power Comparison: Traditional vs BSSR Designs", subtitle = "Horizontal dashed line shows target power = 0.8" ) + theme_minimal() + theme( plot.title = element_text(size = 14, hjust = 0.5, margin = margin(b = 5)), plot.subtitle = element_text(size = 11, hjust = 0.5, margin = margin(b = 15)), strip.text = element_text(size = 12, face = "bold", margin = margin(t = 8, b = 8)), strip.background = element_rect(fill = "gray95", color = "gray80"), legend.position = "bottom", legend.title = element_text(size = 11), legend.text = element_text(size = 10), legend.margin = margin(t = 10), axis.title.x = element_text(size = 11, margin = margin(t = 10)), axis.title.y = element_text(size = 11, margin = margin(r = 10)), axis.text = element_text(size = 9), panel.grid.minor = element_blank(), panel.grid.major = element_line(color = "gray92", linewidth = 0.5), plot.margin = margin(t = 10, r = 10, b = 10, l = 10) ) ``` ## Practical Implementation Guidelines ### Choosing the Right Test | Scenario | Recommended Test | Rationale | |----------|------------------|-----------| | Small samples (n < 30 per group) | Boschloo | Most powerful exact test | | Moderate samples (30-100 per group) | Z-pool | Good balance of power and computation | | Large samples (n > 100 per group) | Chisq | Asymptotically optimal, fast | | Regulatory submission | Fisher | Widely accepted, conservative | | Exploratory analysis | Fisher-midP | Less conservative than Fisher | ### Choosing the BSSR Approach | Priority | Recommended Approach | Rationale | |----------|---------------------|-----------| | Timeline certainty | Restricted | Guarantees study doesn't extend | | Statistical efficiency | Unrestricted | Optimal sample size adaptation | | Robust performance | Weighted | Consistent across scenarios | ### Sample Size Planning ```{r sample_size_planning} # Sample size planning example planning_scenarios <- expand.grid( p1 = c(0.4, 0.5, 0.6), p2 = c(0.2, 0.3), test = c('Fisher', 'Z-pool', 'Boschloo') ) %>% filter(p1 > p2) # Calculate sample sizes for each scenario sample_size_results <- list() for(i in 1:nrow(planning_scenarios)) { result <- BinarySampleSize( p1 = planning_scenarios$p1[i], p2 = planning_scenarios$p2[i], r = 1, alpha = 0.025, tar.power = 0.8, Test = planning_scenarios$test[i] ) sample_size_results[[i]] <- result } # Combine results final_results <- do.call(rbind, sample_size_results) final_results <- final_results[, c("p1", "p2", "Test", "N1", "N2", "N", "Power")] print(final_results) ``` ## Regulatory Considerations ### Type I Error Control All methods in `bbssr` maintain exact Type I error control: ```{r type1_error} # Demonstrate Type I error control under null hypothesis null_powers <- sapply(c('Fisher', 'Z-pool', 'Boschloo'), function(test) { BinaryPower(p1 = 0.3, p2 = 0.3, N1 = 30, N2 = 30, alpha = 0.025, Test = test) }) names(null_powers) <- c('Fisher', 'Z-pool', 'Boschloo') print("Type I error rates under null hypothesis:") print(round(null_powers, 4)) ``` All values should be ≤ 0.025, confirming exact Type I error control. ### Documentation Requirements For regulatory submissions, document: 1. **Rationale for BSSR**: Why adaptive design is appropriate 2. **Test selection**: Justification for chosen statistical test 3. **Design approach**: Restricted vs unrestricted rationale 4. **Simulation studies**: Demonstrate operating characteristics 5. **Implementation plan**: Detailed interim analysis procedures ## Advanced Topics ### Multiple Allocation Ratios ```{r allocation_ratios} # Compare different allocation ratios ratios <- c(1, 2, 3) ratio_results <- sapply(ratios, function(r) { result <- BinarySampleSize( p1 = 0.5, p2 = 0.3, r = r, alpha = 0.025, tar.power = 0.8, Test = 'Boschloo' ) c(N1 = result$N1, N2 = result$N2, N_total = result$N) }) colnames(ratio_results) <- paste0("r=", ratios) print("Sample sizes for different allocation ratios:") print(ratio_results) ``` ### Sensitivity Analysis ```{r sensitivity_analysis} # Sensitivity analysis for key parameters sensitivity_data <- expand.grid( omega = c(0.3, 0.5, 0.7), alpha = c(0.01, 0.025, 0.05) ) %>% rowwise() %>% mutate( avg_power = mean(BinaryPowerBSSR( asmd.p1 = 0.45, asmd.p2 = 0.09, p = seq(0.2, 0.8, by = 0.1), Delta.A = 0.36, Delta.T = 0.36, N1 = 24, N2 = 24, omega = omega, r = 1, alpha = alpha, tar.power = 0.8, Test = 'Z-pool', restricted = FALSE, weighted = FALSE )$power.BSSR) ) print("Sensitivity analysis results:") print(sensitivity_data) ``` ## Conclusion The `bbssr` package provides a comprehensive toolkit for implementing blinded sample size re-estimation in clinical trials with binary endpoints. The choice of statistical test and design approach should be based on: 1. **Sample size considerations** 2. **Regulatory requirements** 3. **Computational constraints** 4. **Risk tolerance** All methods maintain exact statistical validity while providing the flexibility needed for efficient clinical trial conduct.