Load the package hce and check the version:
For citing the package run citation("hce") (Gasparyan 2025).
Hierarchical composite endpoints (HCE) are a class of endpoints that combine multiple clinical outcomes into a single composite while preserving the distinct nature of each outcome. In a particular case, outcomes within a fixed follow-up period are ranked by clinical importance, and the patient’s most important outcome is used for analysis. HCEs are typically analyzed using win odds and related win statistics (Gasparyan et al. 2023).
A straightforward example of an HCE is an endpoint defined on an
ordinal scale assessed at a specific timepoint. For instance, the
COVID-19 HCE uses an 8-category ordinal scale to evaluate physical
limitations in hospitalized patients with COVID-19 at 15 or 30 days
post-treatment. See the COVID-19 and COVID-19b
dataset ordinal outcomes as examples (Beigel et
al. 2020).
table(COVID19)
#> GROUP
#> TRTP 1 2 3 4 5 6 7 8
#> Active 34 95 28 58 38 14 117 157
#> Placebo 58 121 24 60 33 8 102 115Ssimple HCEs are implemented in the hce package via the
hce class, which inherits from data.frame.
These objects must include the ordinal analysis values in the
AVAL column and a two-level treatment group in the
TRTP column. The hce::summaryWO() function can
be used to provide the number of wins, losses, and ties by category.
From these counts, one can compute the probability of ties.
COVID19HCE <- hce(GROUP = COVID19$GROUP, TRTP = COVID19$TRTP)
SUM0 <- summaryWO(COVID19HCE, ref = "Placebo")
SUM <- SUM0$summary
SUM$Ptie <- round(SUM$TIE/SUM$TOTAL, 2)
SUM
#> TRTP WIN LOSS TIE TOTAL WR WO Ptie
#> 1 A 135744 97143 48974 281861 1.3973627 1.3173641 0.17
#> 2 P 97143 135744 48974 281861 0.7156338 0.7590916 0.17While the win odds and its 95% confidence interval are calculated as follows:
WO <- calcWO(COVID19HCE, ref = "Placebo")[, c("WO", "LCL", "UCL")]
round(WO, 2)
#> WO LCL UCL
#> 1 1.32 1.15 1.51A more complex case arises when multiple events per patient are observed. Consider two binary outcomes: death and hospitalization. Each patient can experience up to two events (e.g., hospitalization followed by death), and the HCE must account for the hierarchy between them (typically prioritizing death over hospitalization) when determining wins, losses, and ties.
set.seed(1)
n <- 100
dat0 <- data.frame(TRTP = rep(c("A", "P"), each = n),
DEATH = c(rbinom(n = n, size = 1, prob = 0.6),
rbinom(n = n, size = 1, prob = 0.8)),
HOSP = c(rbinom(n = n, size = 1, prob = 0.5),
rbinom(n = n, size = 1, prob = 0.8)))Based on these outcomes, several methods can be used to construct hierarchical composite endpoints. Below are two approaches with their implementations.
This can be implemented as:
dat1 <- dat0
dat1$AVAL <- ifelse(dat1$DEATH == 1, 1, ifelse(dat1$HOSP == 1, 2, 3))
dat1 <- as_hce(dat1)
summaryWO(dat1)$summary
#> TRTP WIN LOSS TIE TOTAL WR WO
#> 1 A 3900 1047 5053 10000 3.7249284 1.798377
#> 2 P 1047 3900 5053 10000 0.2684615 0.556057In this case, we can also derive a single patient-level categorical variable that encodes the above comparison logic. From worst to best, the categories are: patients with both hospitalization and death; patients with death without prior hospitalization; patients with hospitalization without death; and patients with no events. This can be implemented using the following ordering:
dat2 <- dat0
dat2$ORD <- 2*dat2$DEATH + dat2$HOSP
unique(dat2[, c("DEATH", "HOSP", "ORD")])
#> DEATH HOSP ORD
#> 1 1 0 2
#> 3 1 1 3
#> 4 0 0 0
#> 6 0 1 1The multiplicative factor of 2 is used purely for ordering and can be
replaced by any monotone transformation. Here, a higher ORD
indicates a more prioritized (worse) outcome—i.e., outcomes that would
cause the patient to lose in a pairwise comparison. Because the
hce package convention is that higher values denote better
outcomes, we simply reverse the ordering in ORD to obtain
the analysis variable AVAL:
dat2$AVAL <- max(dat2$ORD) - dat2$ORD
dat2 <- as_hce(dat2)
unique(dat2[, c("DEATH", "HOSP", "ORD", "AVAL")])
#> DEATH HOSP ORD AVAL
#> 1 1 0 2 1
#> 3 1 1 3 0
#> 4 0 0 0 3
#> 6 0 1 1 2
summaryWO(dat2)$summary
#> TRTP WIN LOSS TIE TOTAL WR WO
#> 1 A 6300 1421 2279 10000 4.4334975 2.9054872
#> 2 P 1421 6300 2279 10000 0.2255556 0.3441764The number of ties is more than halved, leading to a larger estimated treatment effect. This occurs because, in this case, the second method more efficiently breaks ties by sequentially comparing outcomes (death first, then hospitalization), reducing indeterminate pairings.
Here we provide examples of HCE using in clinical trials from different therapeutic areas. General considerations for creating HCEs can be found in Gasparyan et al. (2022), Little et al. (2023).
The DARE-19 (M. Kosiborod et al. 2021; M. N. Kosiborod et al. 2021) trial used an HCE to assess outcomes in patients hospitalized for COVID-19 and treated for 30 days. The COVID-19 HCE is presented below. It combines death, in hospital organ dysfunction events with clinical status at Day 30 for patients alive, still hospitalized but without previous organ dysfunction events, and hospital discharge as the most favorable outcome for patients discharging without organ dysfunction events and being alive at Day 30.
Below a higher category signifies a better outcome. Patients are
ranked into one and only one category based on their clinically most
severe event. For example, patients experiencing an in-hospital new or
worsening organ dysfunction event then dying will be included in the
category I.
#> Order Category
#> 1 I Death
#> 2 II More than one new or worsened organ dysfunction events
#> 3 III One new or worsened organ dysfunction event
#> 4 IV Hospitalized at the end of follow-up (Day 30)
#> 5 V Discharged from hospital before Day 30
Patients in the category I are compared using the timing
of the event, with an earlier event being a worse outcome (are assigned
a lower rank). Similarly, in the category III the timing of
the event is used for ranking patients within this category. In the
category II patients are compared using the number of
events with a higher number signifying a worse outcome. Patients in the
category IV - hospitalized at the end of follow-up without
previous worsening events - are further ranked according to oxygen
support requirements at the hospital (IV.1 on high flow
oxygen devices, IV.2 requiring supplemental oxygen,
IV.3 not requiring supplemental oxygen, with a higher rank
being a better outcome). Patients in the category V are
compared using the timing of the event, but, the hospital discharge
being a favorable outcome, here the earlier event signifies a better
outcome than the late event (reverse of the ranking in categories
I and III).
The kidney HCE defined in Heerspink et al. (2023) has the following ordinal outcomes (for the review of the topic see Little et al. (2023)).
#> Order Category
#> 1 I Death
#> 2 II Dialysis or kidney transplantation
#> 3 III Sustained GFR < 15 ml/min per 1.73 m2
#> 4 IV Sustained GFR decline from baseline of >= 57%
#> 5 V Sustained GFR decline from baseline of >= 50%
#> 6 VI Sustained GFR decline from baseline of >= 40%
#> 7 VII Individual GFR slope
The dataset KHCE contains data on a kidney HCE
outcomes
dat <- KHCE
Order <- c("Death (adj)", "Chronic dialysis (adj) >=90 days",
"Sustained eGFR<15 (mL/min/1.73 m2)", "Sustained >=57% decline in eGFR",
"Sustained >=50% decline in eGFR", "Sustained >=40% decline in eGFR", "eGFR slope")
dat$GROUP <- factor(dat$GROUP, levels = Order)
table(dat$GROUP, dat$TRTP)
#>
#> A P
#> Death (adj) 40 50
#> Chronic dialysis (adj) >=90 days 17 29
#> Sustained eGFR<15 (mL/min/1.73 m2) 16 28
#> Sustained >=57% decline in eGFR 2 9
#> Sustained >=50% decline in eGFR 7 22
#> Sustained >=40% decline in eGFR 36 34
#> eGFR slope 632 578This dataset is derived from ADSL which contains
baseline characteristics, ADLB laboratory measurements of
kidney function, and ADET for the time-to-event outcomes
with their timing. For the detailed derivation see the Technical
Appendix in Heerspink et al.
(2023).
In the Heart Failure population (see Kondo et al. (2023)) the following HCE was considered
#> Order Category
#> 1 I Cardiovascular death
#> 2 II Total (first and recurrent) HF hospitalizations
#> 3 III Total urgent HF visits
#> 4 IV Improvement/deterioration in KCCQ-TSS
To model dependent outcomes, several methods are available:
Joint Distribution Modeling Using Copulas: This method employs copulas to model the joint distribution of outcomes, capturing their dependence.
Random Frailty Modeling: This approach captures patient-level dependence between outcomes using a random frailty model.
Conditional Distribution Specification Through Multi-State Modeling: This technique uses multi-state models to describe the conditional distribution of outcomes.
Sklar’s theorem (Sklar 1959) shows that multivariate distribution functions can be expressed using a copula and univariate distributions. For a random vector \(X^d=(X_1,\cdots,X_d)\) with a multivariate distribution function \(H(x_1,\cdots,x_d),\) Sklar’s theorem states that there is a copula \(C(\cdot)\) such that: \[H(x_1,\cdots,x_d)=C(F_1(x_1),\cdots,F_d(x_d)),\] where each component \(X_j\) has the univariate distribution \(F_j.\) A copula is essentially a multivariate distribution function where each univariate marginal distribution is uniform, describing the dependency structure of the multivariate distribution function \(H(\cdot)\). To construct the multivariate distribution function, one combines each variable’s univariate distributions \(F_j\) with the copula.
If \(X_j\) has distribution function \(F_j,\) then \(U_j=F_j(X_j)\) is uniformly distributed, allowing random variables \(X_j\sim F_j\) to be simulated by generating uniform random variables \(U_j\) and applying the inverse transformation:
\[F_j^{-1}(y)=\inf\{x\in {\mathbf R}: \ \ F_j(x)\geq y\}, \ \ \inf\varnothing=\infty.\] Thus, if one has simulated a uniform random vector \(U^d=(U_1,\cdots, U_d)\) from the copula \(C(\cdot)\), the random vector \(X^d=(X_1,\cdots,X_d)\) can be simulated as: \[(X_1,\cdots,X_d)=(F_1^{-1}(U_1),\cdots,F_d^{-1} (U_d)).\] The main challenge remains in simulating from the given copula.
An Archimedean copula (Nelsen 2006) is one where:
\[C(u^d;\varphi)=\varphi(\varphi^{-1}(u_1)+\cdots+\varphi^{-1}(u_d)).\] The function \(\varphi:[0,+\infty]\rightarrow [0,1]\) is a generator - continuous, decreasing, with \(\varphi(0)=1\) and \(\lim_{t\rightarrow+\infty}\varphi(t)=0.\) When \(\varphi(t)=e^{-t^\theta},\ \ \theta>1,\) the copula is called a Gumbel copula.
By Bernstein’s theorem, completely monotone Archimedean generators coincide with Laplace-Stieltjes transforms of distribution functions \(F,\) determined by \(\varphi=LS[F].\) The Marshall-Olkin algorithm (Marshall and Olkin 1988) for sampling from an Archimedean copula involves:
The vector \(U=(U_1,\cdots, U_d)\) is then a random vector from the Archimedean copula with generator \(\varphi\). For the Gumbel copula, one needs to use the inverse Laplace-Stieltjes transform of a stable distribution (Nolan 2020; Hofert and Mächler 2011): \[F\sim S(1/\theta, 1, \cos^\theta(\pi/(2\theta)), {\mathbf I}_{\{\theta=1\}},1)\] modifying the first step to sample from a stable distribution.
Chambers-Mallows-Stuck method (Chambers, Mallows, and Stuck 1976) efficiently simulates stable variables with:
Generating independent uniform and exponential random variables \[\Theta\sim U\left[-\frac{\pi}{2},\frac{\pi}{2}\right] \text{ and } W\sim Exp(1).\]
Defining \(\alpha=1/\theta,\) setting \(b_{\tan}=\beta\tan\left(\frac{\alpha\pi}{2}\right),\) and \(\theta_0=\arctan(b_{\tan})/\alpha,\) with \[C_{\tan}=(1+b_{\tan}^2)^\frac{1}{2\alpha}.\]
Utilizing the transformations: \[Z(\theta) = \frac{\sin(a_0) C_{\tan}}{\cos(\Theta)^\frac{1}{\alpha}}\left(\frac{\cos(a_0-\Theta)}{W}\right)^\frac{1-\alpha}{\alpha}, \ \ a_0=\alpha(\Theta+\theta_0),\,\theta>1.\] \[Z(1)=\frac{2}{\pi}\left(\pi_\beta\tan(\Theta)-\beta\log\left(\frac{\pi}{2}W\frac{\cos(\Theta)}{\pi_\beta}\right)\right), \ \ \pi_\beta=\frac{\pi}{2}+\beta\Theta.\] Finally, \(\gamma Z + \delta\) has the desired distribution with \(\gamma =[\cos(\pi/(2\theta))]^\theta\) and \(\delta={\mathbf I}_{\{\theta=1\}}\) (and one needs to set \(\beta=1\)).
In the original Chambers-Mallows-Stuck formula, the term \[\frac{1}{[\cos(\alpha\theta_0)\cos(\Theta)]^\frac{1}{\alpha}}\]
is replaced by \(\frac{C_{\tan}}{[\cos(\Theta)]^\frac{1}{\alpha}},\)
as suggested by the copula package (Hofert et al. 2025), which is based on the fact
that \(C_{\tan} =
1/(\cos(\alpha\theta_0))^{1/\alpha}.\) Indeed, one needs to show
that
\[(1+b_{\tan}^2)^\frac{1}{2\alpha}=1/(\cos(\alpha\theta_0))^{1/\alpha},\] which is equivalent to showing that \[1+\left[\beta\tan\left(\frac{\alpha\pi}{2}\right)\right]^2=\frac{1}{[\cos(\alpha\theta_0)]^2}.\] And this is true because of the trigonometric identity \[1+[\tan(y)]^2=\frac{1}{[\cos(y)]^2}.\] Here we have set \(y=\alpha\theta_0=\arctan(b_{\tan})=\arctan\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\) and hence \(\tan(y)=\beta\tan\left(\frac{\alpha\pi}{2}\right).\)
The function simHCE() provides the implementation above
with the argument theta specifying the dependence of
outcomes.
Rates_A <- c(10, 20)
Rates_P <- c(20, 20)
dat1 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P,
CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 1, seed = 1)
dat2 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P,
CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 1.0001, seed = 1)
dat3 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P,
CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 10, seed = 1)
calcWO(dat1)
#> WO LCL UCL SE WOnull alpha Pvalue WP LCL_WP
#> 1 1.483251 1.38986 1.582918 0.03318096 1 0.05 0 0.5973021 0.5816594
#> UCL_WP SE_WP SD_WP N
#> 1 0.6129447 0.007981092 0.5643484 5000
calcWO(dat2)
#> WO LCL UCL SE WOnull alpha Pvalue WP LCL_WP
#> 1 1.52306 1.426936 1.625659 0.03326189 1 0.05 0 0.6036558 0.5880583
#> UCL_WP SE_WP SD_WP N
#> 1 0.6192534 0.00795809 0.5627219 5000
calcWO(dat3)
#> WO LCL UCL SE WOnull alpha Pvalue WP LCL_WP
#> 1 1.387231 1.299852 1.480482 0.03319378 1 0.05 0 0.5811046 0.5652679
#> UCL_WP SE_WP SD_WP N
#> 1 0.5969413 0.008080097 0.5713491 5000