---
title: "Single Binary Endpoint"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Single Binary Endpoint}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  fig.path = "figures/single-bin-"
)
library(BayesianQDM)
```

## Motivating Scenario

We use a hypothetical Phase IIa proof-of-concept (PoC) trial in moderate-to-severe rheumatoid arthritis (RA). The investigational drug is compared with placebo in a 1:1 randomised controlled trial with $n_t = n_c = 12$ patients per group.

Endpoint: ACR20 response rate (proportion of patients achieving $\ge 20\%$ improvement in ACR criteria).

Clinically meaningful thresholds (posterior probability):

- Target Value (TV) = 0.20 (treatment–control difference)
- Minimum Acceptable Value (MAV) = 0.05

Null hypothesis threshold (predictive probability): $\theta_{\mathrm{NULL}} = 0.10$.

Decision thresholds: $\gamma_{\mathrm{go}} = 0.80$ (Go), $\gamma_{\mathrm{nogo}} = 0.20$ (NoGo).

Observed data (used in Sections 2–4): $y_t = 8$ responders out of $n_t = 12$ (treatment); $y_c = 3$ responders out of $n_c = 12$ (control).

---

## 1. Bayesian Model: Beta-Binomial Conjugate

### 1.1 Prior Distribution

Let $\pi_j$ denote the true response rate in group $j$ ($j = t$: treatment, $j = c$: control). We model the number of responders as

$$y_j \mid \pi_j \;\sim\; \mathrm{Binomial}(n_j,\, \pi_j),$$

and place independent Beta priors on each rate:

$$\pi_j \;\sim\; \mathrm{Beta}(a_j,\, b_j),$$

where $a_j > 0$ and $b_j > 0$ are the shape parameters. The Jeffreys prior corresponds to $a_j = b_j = 0.5$, giving a weakly informative, symmetric prior on $(0, 1)$. The uniform prior corresponds to $a_j = b_j = 1$.
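As a quick look at these two default priors, their densities can be compared directly in base R (an illustrative sketch, not part of the package API):

```{r prior-compare}
# Compare the Jeffreys prior Beta(0.5, 0.5) with the uniform prior Beta(1, 1)
x <- seq(0.01, 0.99, by = 0.01)
jeffreys <- dbeta(x, shape1 = 0.5, shape2 = 0.5)  # U-shaped, more mass near 0 and 1
uniform  <- dbeta(x, shape1 = 1,   shape2 = 1)    # flat on (0, 1)

# Both are proper priors: each density integrates to 1
integrate(dbeta, 0, 1, shape1 = 0.5, shape2 = 0.5)$value
integrate(dbeta, 0, 1, shape1 = 1,   shape2 = 1)$value
```

The Jeffreys prior is invariant under reparameterisation and is a common weakly informative default for binomial rates.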
### 1.2 Posterior Distribution

By conjugacy of the Beta-Binomial model, the posterior after observing $y_j$ responders among $n_j$ patients is

$$\pi_j \mid y_j \;\sim\; \mathrm{Beta}(a_j^*,\, b_j^*),$$

where the posterior shape parameters are

$$a_j^* = a_j + y_j, \qquad b_j^* = b_j + n_j - y_j.$$

The posterior mean and variance are

$$E[\pi_j \mid y_j] = \frac{a_j^*}{a_j^* + b_j^*}, \qquad \mathrm{Var}(\pi_j \mid y_j) = \frac{a_j^*\, b_j^*}{(a_j^* + b_j^*)^2\,(a_j^* + b_j^* + 1)}.$$

### 1.3 Posterior of the Treatment Effect

The treatment effect is $\theta = \pi_t - \pi_c$. Since the two groups are independent, $\theta \mid y_t, y_c$ follows the distribution of the difference of two independent Beta random variables:

$$\pi_t \mid y_t \;\sim\; \mathrm{Beta}(a_t^*,\, b_t^*), \qquad \pi_c \mid y_c \;\sim\; \mathrm{Beta}(a_c^*,\, b_c^*).$$

There is no closed-form CDF for $\theta = \pi_t - \pi_c$, so the posterior probability

$$P(\theta > \theta_0 \mid y_t, y_c) = P(\pi_t - \pi_c > \theta_0 \mid y_t, y_c)$$

is evaluated by the convolution integral implemented in `pbetadiff()`:

$$P(\pi_t - \pi_c > \theta_0) = \int_0^1 F_{\mathrm{Beta}(a_c^*, b_c^*)}\!\left(x - \theta_0\right)\, f_{\mathrm{Beta}(a_t^*, b_t^*)}\!(x)\; dx,$$

where $F_{\mathrm{Beta}(\alpha,\beta)}$ is the Beta CDF and $f_{\mathrm{Beta}(\alpha,\beta)}$ is the Beta PDF. This one-dimensional integral is evaluated by adaptive Gauss-Kronrod quadrature via `stats::integrate()`.

---

## 2. Posterior Predictive Probability

### 2.1 Beta-Binomial Predictive Distribution

Let $\tilde{y}_j \mid \pi_j \sim \mathrm{Binomial}(m_j, \pi_j)$ be future trial counts with $m_j$ future patients. Integrating out $\pi_j$ over its Beta posterior gives the Beta-Binomial predictive distribution:

$$P(\tilde{y}_j = k \mid y_j) = \binom{m_j}{k} \frac{B(a_j^* + k,\; b_j^* + m_j - k)}{B(a_j^*,\; b_j^*)}, \quad k = 0, 1, \ldots, m_j,$$

where $B(\cdot, \cdot)$ is the Beta function.
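To make these formulas concrete, both the convolution integral behind `pbetadiff()` and the Beta-Binomial double sum of the next section can be sketched in a few lines of base R. This is an illustrative re-implementation of the formulas above, not the package's exported code:

```{r sketch-diff}
# Hedged re-implementation of the convolution integral (Section 1.3):
# P(pi_t - pi_c > theta0) under independent Beta posteriors
post_prob_diff <- function(theta0, a_t, b_t, a_c, b_c) {
  integrand <- function(x) pbeta(x - theta0, a_c, b_c) * dbeta(x, a_t, b_t)
  stats::integrate(integrand, lower = 0, upper = 1)$value
}

# Hedged re-implementation of the predictive double sum (Section 2.2)
dbetabinom <- function(k, m, a, b) choose(m, k) * beta(a + k, b + m - k) / beta(a, b)
pred_prob_diff <- function(theta_null, m_t, m_c, a_t, b_t, a_c, b_c) {
  k_t <- 0:m_t; k_c <- 0:m_c
  ind <- outer(k_t / m_t, k_c / m_c, function(u, v) u - v > theta_null)
  sum(ind * outer(dbetabinom(k_t, m_t, a_t, b_t), dbetabinom(k_c, m_c, a_c, b_c)))
}

# Observed data with Jeffreys priors: Beta(8.5, 4.5) vs Beta(3.5, 9.5)
post_prob_diff(0.20, 8.5, 4.5, 3.5, 9.5)
pred_prob_diff(0.10, 40, 40, 8.5, 4.5, 3.5, 9.5)
```

The package functions add input validation and vectorisation on top of this core logic; the sketch is only meant to show where each term of the formulas enters the computation.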
### 2.2 Posterior Predictive Probability of Future Success

The posterior predictive probability that the future treatment difference exceeds $\theta_{\mathrm{NULL}}$ is

$$P\!\left(\frac{\tilde{y}_t}{m_t} - \frac{\tilde{y}_c}{m_c} > \theta_{\mathrm{NULL}} \;\Big|\; y_t, y_c\right) = \sum_{k_t=0}^{m_t} \sum_{k_c=0}^{m_c} \mathbf{1}\!\left[\frac{k_t}{m_t} - \frac{k_c}{m_c} > \theta_{\mathrm{NULL}}\right]\, P(\tilde{y}_t = k_t \mid y_t)\, P(\tilde{y}_c = k_c \mid y_c).$$

This double sum over all $(m_t + 1)(m_c + 1)$ outcome combinations is evaluated exactly by `pbetabinomdiff()`.

---

## 3. Study Designs

### 3.1 Controlled Design

Standard parallel-group RCT with observed data from both groups. The posterior shape parameters for group $j$ are

$$a_j^* = a_j + y_j, \qquad b_j^* = b_j + n_j - y_j,$$

and the posterior probability is $P(\pi_t - \pi_c > \theta_0 \mid y_t, y_c)$ computed via `pbetadiff()`.

Posterior probability at TV and MAV:

```{r ctrl-post}
# P(pi_t - pi_c > TV | data)
p_tv <- pbayespostpred1bin(
  prob = 'posterior', design = 'controlled', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL, lower.tail = FALSE
)

# P(pi_t - pi_c > MAV | data)
p_mav <- pbayespostpred1bin(
  prob = 'posterior', design = 'controlled', theta0 = 0.05,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL, lower.tail = FALSE
)

cat(sprintf("P(theta > TV | data) = %.4f -> Go criterion (>= %.2f): %s\n",
            p_tv, 0.80, ifelse(p_tv >= 0.80, "YES", "NO")))
cat(sprintf("P(theta <= MAV | data) = %.4f -> NoGo criterion (>= %.2f): %s\n",
            1 - p_mav, 0.20, ifelse((1 - p_mav) >= 0.20, "YES", "NO")))
cat(sprintf("Decision: %s\n",
            ifelse(p_tv >= 0.80 & (1 - p_mav) < 0.20, "Go",
                   ifelse(p_tv < 0.80 & (1 - p_mav) >= 0.20, "NoGo",
                          ifelse(p_tv >= 0.80 & (1 - p_mav) >= 0.20, "Miss", "Gray")))))
```

Posterior predictive probability (future Phase III: $m_t = m_c = 40$):

```{r ctrl-pred}
p_pred <- pbayespostpred1bin(
  prob = 'predictive', design = 'controlled', theta0 = 0.10,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = 40, m_c = 40, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL, lower.tail = FALSE
)
cat(sprintf("Predictive probability (m_t = m_c = 40) = %.4f\n", p_pred))
```

Vectorised computation over all possible outcomes: the function accepts vectors for `y_t` and `y_c`, enabling efficient computation of the posterior probability for every possible outcome pair $(y_t, y_c) \in \{0,\ldots,n_t\} \times \{0,\ldots,n_c\}$.

```{r ctrl-vector}
grid <- expand.grid(y_t = 0:12, y_c = 0:12)
p_all <- pbayespostpred1bin(
  prob = 'posterior', design = 'controlled', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = grid$y_t, y_c = grid$y_c,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL, lower.tail = FALSE
)

# Show results for a selection of outcome pairs
sel <- data.frame(y_t = grid$y_t, y_c = grid$y_c, P_gt_TV = round(p_all, 4))
head(sel[order(-sel$P_gt_TV), ], 10)
```

### 3.2 Uncontrolled Design (Single-Arm)

When no concurrent control group is enrolled, a hypothetical control responder count $z$ is specified from historical knowledge. The control posterior is then

$$\pi_c \mid z \;\sim\; \mathrm{Beta}(a_c + z,\; b_c + n_c - z),$$

where $n_c$ is the hypothetical control sample size and $z$ is the assumed number of hypothetical responders ($0 \le z \le n_c$). Only the treatment group data $y_t$ are observed; $y_c$ is set to `NULL`.
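Before fixing $z$, it is worth inspecting the control posterior it implies, since this single number encodes the entire control-arm assumption. A short base-R sketch using the formula above (illustrative, not package code):

```{r unctrl-z-posterior}
# Hypothetical control posterior for z = 2 of n_c = 12 under the Jeffreys prior
a_c <- 0.5; b_c <- 0.5; n_c <- 12; z <- 2
shape1 <- a_c + z          # 2.5
shape2 <- b_c + n_c - z    # 10.5

shape1 / (shape1 + shape2)              # implied posterior mean of pi_c
qbeta(c(0.025, 0.975), shape1, shape2)  # central 95% interval for pi_c
```

A wide interval signals that the hypothetical control carries little information, which is why the $z$-sensitivity analysis below is recommended.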
```{r unctrl-post}
# Hypothetical control: z = 2 responders out of n_c = 12
p_unctrl <- pbayespostpred1bin(
  prob = 'posterior', design = 'uncontrolled', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = 8, y_c = NULL,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = 2L,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL, lower.tail = FALSE
)
cat(sprintf("P(theta > TV | data, uncontrolled, z=2) = %.4f\n", p_unctrl))
```

Effect of the hypothetical control count: varying $z$ shows the sensitivity to the assumed control rate.

```{r unctrl-z-sens}
z_seq <- 0:12
p_z <- sapply(z_seq, function(z) {
  pbayespostpred1bin(
    prob = 'posterior', design = 'uncontrolled', theta0 = 0.20,
    n_t = 12, n_c = 12, y_t = 8, y_c = NULL,
    a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
    m_t = NULL, m_c = NULL, z = z,
    ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
    alpha0e_t = NULL, alpha0e_c = NULL, lower.tail = FALSE
  )
})
data.frame(z = z_seq, P_gt_TV = round(p_z, 4))
```

### 3.3 External Design (Power Prior)

Historical or external data are incorporated via a power prior with borrowing weight $\alpha_{0ej} \in (0, 1]$. The augmented prior shape parameters are

$$a_j^* = a_j + \alpha_{0ej}\, y_{ej}, \qquad b_j^* = b_j + \alpha_{0ej}\,(n_{ej} - y_{ej}),$$

where $n_{ej}$ and $y_{ej}$ are the external sample size and responder count for group $j$ (corresponding to `ne_t`/`ne_c`, `ye_t`/`ye_c`, and `alpha0e_t`/`alpha0e_c`). These serve as the prior for the current PoC data, yielding a closed-form posterior:

$$\pi_j \mid y_j \;\sim\; \mathrm{Beta}\!\left(a_j^* + y_j,\; b_j^* + n_j - y_j\right).$$

Setting $\alpha_{0ej} = 1$ corresponds to full borrowing (the external data are treated as if they came from the current trial); $\alpha_{0ej} \to 0$ recovers the result from the current data alone with the original prior.
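The augmentation arithmetic is worth tracing once by hand. Using the external treatment data of the next example ($n_{et} = 15$, $y_{et} = 5$) with $\alpha_{0et} = 0.5$ (an illustrative sketch of the formulas above, not package code):

```{r power-prior-arith}
# Power-prior augmentation for the treatment arm (alpha0e_t = 0.5)
a_t <- 0.5; b_t <- 0.5        # Jeffreys base prior
ne_t <- 15; ye_t <- 5         # external sample size and responders
alpha0e_t <- 0.5              # borrowing weight

a_star <- a_t + alpha0e_t * ye_t            # 0.5 + 2.5 = 3.0
b_star <- b_t + alpha0e_t * (ne_t - ye_t)   # 0.5 + 5.0 = 5.5

# Closed-form posterior after the current data y_t = 8, n_t = 12
c(shape1 = a_star + 8, shape2 = b_star + 12 - 8)  # Beta(11, 9.5)
```

With $\alpha_{0et} = 0.5$, the 15 external patients contribute the weight of 7.5 effective patients, which is the sense in which the borrowing weight discounts the historical information.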
Here, external data for both groups (external design): $n_{et} = 15$, $y_{et} = 5$, $n_{ec} = 15$, $y_{ec} = 4$, borrowing weights $\alpha_{0et} = \alpha_{0ec} = 0.5$:

```{r ext-post}
p_ext <- pbayespostpred1bin(
  prob = 'posterior', design = 'external', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = 15L, ne_c = 15L, ye_t = 5L, ye_c = 4L,
  alpha0e_t = 0.5, alpha0e_c = 0.5, lower.tail = FALSE
)
cat(sprintf("P(theta > TV | data, external, alpha0e=0.5) = %.4f\n", p_ext))
```

Effect of the borrowing weight: varying $\alpha_{0ec}$ (control group only) with fixed $\alpha_{0et} = 0.5$:

```{r ext-borrowing}
# alpha0e must be in (0, 1]; avoid alpha0e = 0
ae_seq <- c(0.01, seq(0.1, 1.0, by = 0.1))
p_ae <- sapply(ae_seq, function(ae) {
  pbayespostpred1bin(
    prob = 'posterior', design = 'external', theta0 = 0.20,
    n_t = 12, n_c = 12, y_t = 8, y_c = 3,
    a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
    m_t = NULL, m_c = NULL, z = NULL,
    ne_t = 15L, ne_c = 15L, ye_t = 5L, ye_c = 4L,
    alpha0e_t = 0.5, alpha0e_c = ae, lower.tail = FALSE
  )
})
data.frame(alpha0e_c = ae_seq, P_gt_TV = round(p_ae, 4))
```

---

## 4. Operating Characteristics

### 4.1 Definition

For given true response rates $(\pi_t, \pi_c)$, the operating characteristics are computed by exact enumeration over all $(n_t + 1)(n_c + 1)$ possible outcome pairs $(y_t, y_c) \in \{0,\ldots,n_t\} \times \{0,\ldots,n_c\}$:

$$\Pr(\mathrm{Go}) = \sum_{y_t=0}^{n_t} \sum_{y_c=0}^{n_c} \mathbf{1}\!\left[g_{\mathrm{Go}}(y_t, y_c) \ge \gamma_{\mathrm{go}} \;\text{ and }\; g_{\mathrm{NoGo}}(y_t, y_c) < \gamma_{\mathrm{nogo}}\right]\, \binom{n_t}{y_t} \pi_t^{y_t}(1-\pi_t)^{n_t-y_t}\, \binom{n_c}{y_c} \pi_c^{y_c}(1-\pi_c)^{n_c-y_c},$$

where

$$g_{\mathrm{Go}}(y_t, y_c) = P(\pi_t - \pi_c > \theta_{\mathrm{TV}} \mid y_t, y_c),$$

$$g_{\mathrm{NoGo}}(y_t, y_c) = P(\pi_t - \pi_c \le \theta_{\mathrm{MAV}} \mid y_t, y_c),$$

and the decision outcomes are classified as:

- Go: $g_{\mathrm{Go}} \ge \gamma_{\mathrm{go}}$ AND $g_{\mathrm{NoGo}} < \gamma_{\mathrm{nogo}}$
- NoGo: $g_{\mathrm{Go}} < \gamma_{\mathrm{go}}$ AND $g_{\mathrm{NoGo}} \ge \gamma_{\mathrm{nogo}}$
- Miss: $g_{\mathrm{Go}} \ge \gamma_{\mathrm{go}}$ AND $g_{\mathrm{NoGo}} \ge \gamma_{\mathrm{nogo}}$
- Gray: neither Go nor NoGo criterion met

### 4.2 Example: Controlled Design, Posterior Probability

```{r oc-controlled, fig.width = 8, fig.height = 6}
oc_ctrl <- pbayesdecisionprob1bin(
  prob = 'posterior', design = 'controlled',
  theta_TV = 0.30, theta_MAV = 0.15, theta_NULL = NULL,
  gamma_go = 0.80, gamma_nogo = 0.20,
  pi_t = seq(0.10, 0.80, by = 0.05),
  pi_c = rep(0.10, length(seq(0.10, 0.80, by = 0.05))),
  n_t = 12, n_c = 12,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  z = NULL, m_t = NULL, m_c = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  error_if_Miss = TRUE, Gray_inc_Miss = FALSE
)
print(oc_ctrl)
plot(oc_ctrl, base_size = 20)
```

The same function supports `design = 'uncontrolled'` (with argument `z`), `design = 'external'` (with power prior arguments `ne_t`, `ne_c`, `ye_t`, `ye_c`, `alpha0e_t`, `alpha0e_c`), and `prob = 'predictive'` (with future sample size arguments `m_t`, `m_c` and `theta_NULL`). The function signature and output format are identical across all combinations.

---

## 5. Optimal Threshold Search

### 5.1 Objective and Algorithm

The `getgamma1bin()` function finds the optimal pair $(\gamma_{\mathrm{go}}^*, \gamma_{\mathrm{nogo}}^*)$ by grid search over $\Gamma = \{\gamma^{(1)}, \ldots, \gamma^{(K)}\} \subset (0, 1)$ at a null scenario $(\pi_t^{\mathrm{null}}, \pi_c^{\mathrm{null}})$. A two-stage precompute-then-sweep strategy is used:

Stage 1 (Precompute): all posterior/predictive probabilities $g_{\mathrm{Go}}(y_t, y_c)$ and $g_{\mathrm{NoGo}}(y_t, y_c)$ are computed once over all $(n_t + 1)(n_c + 1)$ outcome pairs — independently of $\gamma$.

Stage 2 (Sweep): for each candidate $\gamma \in \Gamma$, $\Pr(\mathrm{Go} \mid \gamma)$ and $\Pr(\mathrm{NoGo} \mid \gamma)$ are obtained as weighted sums of indicator functions:

$$\Pr(\mathrm{Go} \mid \gamma) = \sum_{y_t, y_c} \mathbf{1}\!\left[g_{\mathrm{Go}}(y_t, y_c) \ge \gamma\right]\, w(y_t, y_c),$$

$$\Pr(\mathrm{NoGo} \mid \gamma) = \sum_{y_t, y_c} \mathbf{1}\!\left[g_{\mathrm{NoGo}}(y_t, y_c) \ge \gamma\right]\, w(y_t, y_c),$$

where $w(y_t, y_c) = \mathrm{Binom}(y_t; n_t, \pi_t)\,\mathrm{Binom}(y_c; n_c, \pi_c)$ are the binomial weights under the null scenario. No further calls to `pbayespostpred1bin()` are required at this stage. The sweep is vectorised using `outer()`, making the full grid search computationally negligible once Stage 1 is complete.

The optimal $\gamma_{\mathrm{go}}^*$ is the smallest value in $\Gamma$ such that $\Pr(\mathrm{Go} \mid \gamma_{\mathrm{go}}^*)$ satisfies the criterion against a target (e.g., $\Pr(\mathrm{Go}) < 0.05$). Analogously for $\gamma_{\mathrm{nogo}}^*$.
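The Stage-2 sweep can be mimicked with a toy stand-in for the Stage-1 probabilities. In this sketch the `g_go` values are illustrative placeholders (a monotone function of $y_t - y_c$), not output of `pbayespostpred1bin()`; only the weighting and the `outer()`-based sweep reflect the algorithm described above:

```{r gamma-sweep-toy}
n_t <- 12; n_c <- 12
pi_t_null <- 0.10; pi_c_null <- 0.10
grid <- expand.grid(y_t = 0:n_t, y_c = 0:n_c)

# Null-scenario binomial weights w(y_t, y_c)
w <- dbinom(grid$y_t, n_t, pi_t_null) * dbinom(grid$y_c, n_c, pi_c_null)

# Stand-in for precomputed Stage-1 Go probabilities (placeholder values)
g_go <- (grid$y_t - grid$y_c + n_c) / (n_t + n_c)

# Stage 2: row k of the indicator matrix is 1[g_go >= gamma_k];
# one matrix product yields Pr(Go | gamma) for the whole grid at once
gamma_grid <- seq(0.01, 0.99, by = 0.01)
pr_go <- as.vector(outer(gamma_grid, g_go, `<=`) %*% w)

# Smallest gamma keeping the false-Go rate below the 0.05 target
gamma_go_star <- min(gamma_grid[pr_go < 0.05])
```

Because the indicator matrix is built once with `outer()`, adding more candidate thresholds only adds rows to a single matrix product rather than re-running any posterior computation.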
### 5.2 Example: Controlled Design, Posterior Probability

```{r getgamma-ctrl, fig.width = 8, fig.height = 6}
res_gamma <- getgamma1bin(
  prob = 'posterior', design = 'controlled',
  theta_TV = 0.30, theta_MAV = 0.15, theta_NULL = NULL,
  pi_t_go = 0.10, pi_c_go = 0.10,
  pi_t_nogo = 0.30, pi_c_nogo = 0.10,
  target_go = 0.05, target_nogo = 0.20,
  n_t = 12L, n_c = 12L,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  z = NULL, m_t = NULL, m_c = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  gamma_grid = seq(0.01, 0.99, by = 0.01)
)
plot(res_gamma, base_size = 20)
```

The same function supports `design = 'uncontrolled'` (with `pi_t_go`, `pi_t_nogo`, and `z`), `design = 'external'` (with power prior arguments), and `prob = 'predictive'` (with `theta_NULL`, `m_t`, and `m_c`). The calibration plot and the returned `gamma_go`/`gamma_nogo` values are available for all combinations.

---

## 6. Summary

This vignette covered single binary endpoint analysis in BayesianQDM:

1. Bayesian model: Beta-Binomial conjugate posterior with explicit update formulae for $a_j^* = a_j + y_j$ and $b_j^* = b_j + n_j - y_j$; Jeffreys prior ($a_j = b_j = 0.5$) as the recommended weakly informative default.
2. Posterior probability: evaluated via the convolution integral in `pbetadiff()` — no closed-form CDF exists for the Beta difference.
3. Posterior predictive probability: evaluated by exact enumeration of the Beta-Binomial double sum in `pbetabinomdiff()`.
4. Three study designs: controlled (both groups observed), uncontrolled (hypothetical control count $z$ fixed from prior knowledge), external design (power prior with borrowing weight $\alpha_{0ej}$, augmenting the prior by $\alpha_{0ej}$ times the external sufficient statistics).
5. Operating characteristics: computed by exact enumeration over all $(n_t+1)(n_c+1)$ outcome pairs with binomial probability weights — no simulation required for the binary endpoint.
6. Optimal threshold search: two-stage precompute-then-sweep strategy in `getgamma1bin()` using `outer()` for a fully vectorised gamma sweep.

See `vignette("single-continuous")` for the analogous continuous endpoint analysis.