--- title: "Sampling Piecewise-Exponential Waiting Time Distributions" author: "Frans Willekens" date: "`r Sys.Date()`" output: pdf_document: number_sections: TRUE toc: TRUE toc_depth: 3 fig_caption: yes html_document: number_sections: TRUE df_print: paged fontsize: 12pt urlcolor: blue bibliography: "References.bib" vignette: | %\VignetteIndexEntry{Sampling Piecewise-Exponential Waiting Time Distributions} %\VignetteEncoding{UTF-8}" %\VignetteIndexEntry{Supplementary Materials} %\VignetteEngine{knitr::knitr} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center") knitr::opts_chunk$set(fig.pos = "H", out.extra = "") ## No scientific notation options(scipen=999) ``` # Introduction Life histories are sequences of transitions between states of existence [@aalen2008survival,@willekens2014]. In stochastic simulation of life histories, ages at transition are obtained by sampling time-to-event or waiting time distributions.The time-to-event is the duration between a reference event,e.g. birth, and the event of interest. In survival analysis, it is known as the survival time. A number of parametric models of survival times have been proposed in the literature. They include the exponential distribution, the Gompertz distribution and the Weibull distribution. The piecewise-exponential distribution is an extension of the exponential distribution. The exponential waiting time distribution has a single parameter, the constant rate of transition, hazard rate or incidence rate. In the piecewise-exponential model the transition rates are piecewise-constant. They are constant during time or age intervals and vary between intervals. Piecewise-constant transition rates are common in demography because transition rates are often published by age or age group. The piecewise-exponential waiting time distribution is the subject of this vignette. By sampling the distribution, individual waiting times can be generated. In the application, two states of existence are considered: alive and dead. The event of interest is death. The transition rate is the death rate. The multistate model, discussed in the vignette "Simulation of life histories", extends the two-state model to multiple states of existence. The basic approach to sampling waiting time distributions remains valid in the multistate model. The generic approach to generating waiting times consists of two steps [@rubinstein2017simulation]. In the first step, a random number is drawn from the standard uniform distribution U(0,1). The random draw is a value between zero and one, interpreted as a probability and denoted by u. The second step is to determine the duration at which the probability of a transition is u. The duration depends on the waiting time distribution. The vignette consists of two sections. The first is a brief theoretical background. The key concept is the inverse cumulative distribution function or quantile function. Concepts are illustrated using the exponential distribution and the piecewise-exponential distribution, and a few numerical illustrations should clarify the sampling of probability distributions. The second section is an application. Individual ages at death (lifespans) are generated from period age-specific death rates. The death rates are retrieved from the Human Mortality Database (HMD) (https://www.mortality.org). The retrieval of data from the HMD is described in the tutorial, which is the third vignette that comes with the package $VirtualPop$. The vignette requires the packages survival, eha, survminer,knitr, kableExtra, ggplot2. # Theoretical background: simulating waiting times ## Overview Let X be a continuous random variable denoting the waiting time to a transition (or time-to-transition) and let x be a realization of X. The cumulative distribution function (cdf) of X is $${F_X(x)=Pr\{X \le x\}=1-exp\left[-\int_0^x\mu(\tau) d\tau \right]=1-exp \left[-\Lambda(x) \right]}\tag{1} $$ with $\mu(\tau)$ the instantaneous hazard rate at duration $\tau$ and $\Lambda(x)=\int_0^x\mu(\tau) d\tau$ the cumulative or integrated hazard at x. The cumulative hazard depends on $\mu(\tau) \ \ \ \ \tau110] <- 110 binwidth <- (max(dd$x_D,na.rm=TRUE)-min(dd$x_D,na.rm=TRUE))/60 require (ggplot2) p <- ggplot() + geom_histogram(data=dd,aes(x=x_D,color=sex,fill=sex,y=after_stat(density)), alpha=0.5,position="dodge",binwidth=binwidth) xmin <- 0 xmax <- 110 p <- p + xlab("Age")+ylab("Density") p <- p + scale_x_continuous(breaks=seq(xmin,xmax,by=10)) p <- p + scale_y_continuous (breaks=seq(0,0.04,by=0.005)) p <- p + theme(legend.position = c(0.1, 0.85)) p ``` # Survival analysis with simulated data The simulated waiting times may be used for further analysis. To illustrate the process, let's estimate the survival function shown in Figure 3. The Kaplan-Meier estimator of the survival function and its 95 percent confidence interval are shown in Figure 6. The figure also shows the survival function obtained directly from the cumulative hazard: $S_{X}(x) = exp\left\lbrack - \Lambda(x) \right\rbrack$ and the exponential survival function with parameter the mean event rate $S_{X}(x) = exp\left\lbrack - \mu^{*}\ x \right\rbrack$ with $\mu^{*}$ the mean event rate (computed from the simulated waiting times). The code is ```{r test15} nsample <- 1000 breakpoints <- c(0, 10, 20, 30, 60) rates <- c(0.01,0.02,0.04,0.15) pw_sample <- VirtualPop::r.pw_exp (n=nsample, breakpoints, rates=rates) KM <- survival::survfit(survival::Surv(pw_sample)~1) ``` The survival function is KM\$surv and the standard error is KM\$std.err. For detailed results, type summary(KM) and str(KM). The code to produce Figure 6 is ```{r 2_test16,fig.cap="Piecewise-constant survival function"} max <- max(breakpoints) x <- seq(0,max, by=0.1) # Cumulative hazard H <- VirtualPop::H_pw(x, breakpoints,rates) dd <- data.frame(x=x,y=exp(-H)) # Plot KM estimator p <- survminer::ggsurvplot(KM,data=data.frame(pw_sample),conf.int = TRUE, ggtheme = theme_bw()) p <- p$plot + geom_step(data=dd,mapping=aes(x=x,y=y),linetype="dashed", color="blue") p <- p + xlab("Duration") + ylab("Survival function") p <- p + geom_text(x=45, y=0.8, label="KM estimate of survival function", color="red") p <- p + geom_text(x=45, y=0.7, label="Survival function",color="blue") p <- p + theme(legend.position="none") p ``` The simulated waiting time data may be used to estimate piecewise-constant hazard rates and reproduce the input rates. The package eha for event history analysis [@brostrom2021event] has a function that does the job. It is the function piecewise^[The function pch() of the eha package yields several distributions of the piecewise-constant hazards (pch) distribution: the density, distribution function, quantile function, cumulative hazard function, and random number generation.]. Let's use the waiting times generated by the rpexp function of the msm package. The code to compute the hazard rates is shown below and the estimated rates are shown in Table 2, together with the rates used in the simulation. ```{r 2_test17} breakp2 <- breakpoints[2:(length(breakpoints)-1)] x <- seq(0,max, by=0.2) ## waiting time = pw.sample or pw <- msm::rpexp(n=nsample,rate=rates,t=breakpoints[-length(breakpoints)]) pw_rates <- eha::piecewise (enter=rep(0,nsample),exit=pw, event=rep(1,nsample),cutpoints=breakp2) aa <-data.frame(From=breakpoints[-length(breakpoints)], To=breakpoints[-1], Events=pw_rates$events, Exposure=round(pw_rates$exposure,0), Rates_estimated=round(pw_rates$intensity,4), Rates_input=rates) out2 <- knitr::kable(aa, caption = paste0("Computation of hazard rates in piecewise-exponential", "model with simulated data"), format = 'latex', booktabs = T,linesep = "") kableExtra::kable_styling(out2,latex_options="HOLD_position") ``` The object $pw\_rates$ has three components: (a) the estimated number of events by duration category, (b) the estimated exposure time by duration category, and (c) the hazard rate by duration category. The input data (rates) are also added. The measures are shown in Table \ref{tab:2_test17}. The last column shows the piecewise-constant hazard rate function used as input to generate waiting times. # References{-}