% File apc/vignettes/CheckPrediction.Rnw % Part of the apc package, http://www.R-project.org % Copyright 2019 Bent Nielsen % Distributed under GPL 3 %\VignetteIndexEntry{Reproducing KN2020} \documentclass[a4paper,twoside,12pt]{article} \usepackage[english]{babel} \usepackage{booktabs,rotating,graphicx,amsmath,verbatim,fancyhdr,Sweave} \usepackage[colorlinks,linkcolor=red,urlcolor=blue]{hyperref} \newcommand{\R}{\textsf{\bf R}} \renewcommand{\topfraction}{0.95} \renewcommand{\bottomfraction}{0.95} \renewcommand{\textfraction}{0.1} \renewcommand{\floatpagefraction}{0.9} \DeclareGraphicsExtensions{.pdf,.jpg} \setcounter{secnumdepth}{1} \setcounter{tocdepth}{1} \oddsidemargin 1mm \evensidemargin 1mm \textwidth 160mm \textheight 230mm \topmargin -5mm \headheight 8mm \headsep 5mm \footskip 15mm \begin{document} \SweaveOpts{concordance=TRUE} \raggedleft \pagestyle{empty} \vspace*{0.1\textheight} \Huge {\bf Reproducing \\ Kuang and Nielsen \\ Generalized log normal Chain-Ladder \\ using the \texttt{apc} package} \noindent\rule[-1ex]{\textwidth}{5pt}\\[2.5ex] \Large 23 November 2019 \vfill \normalsize \begin{tabular}{rl} Di Kuang & Lloyds of London \\ Bent Nielsen & Department of Economics, University of Oxford \\ & \small \& Nuffield College \\ & \small \& Institute for Economic Modelling \\ & \normalsize \texttt{bent.nielsen@nuffield.ox.ac.uk} \\ & \url{http://users.ox.ac.uk/~nuff0078} \end{tabular} \normalsize \newpage \raggedright \parindent 3ex \parskip 0ex \tableofcontents \cleardoublepage \setcounter{page}{1} \pagestyle{fancy} \renewcommand{\sectionmark}[1]{\markboth{\thesection #1}{\thesection \ #1}} \fancyhead[OL,ER]{\sl Reproducing Kuang and Nielsen.} %\fancyhead[ER]{\sl \rightmark} \fancyhead[EL,OR]{\bf \thepage} \fancyfoot{} \renewcommand{\headrulewidth}{0.1pt} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} The purpose of this vignette is to use the \texttt{apc} package version 1.3.5 to reproduce some the result in Kuang and Nielsen (2020): \textit{Generalized Log-Normal Chain-Ladder}. This adopts the theory presented in Harnau and Nielsen (2018), from an over-dispersed Poisson model to a log-normal model. There is also a vignette available for that paper. The \texttt{apc} package builds on the identification analysis and the forecast theory in Kuang, Nielsen and Nielsen (2008a,b), the development of deviance analysis for general data arrays in Nielsen (2014). The package is discussed in Nielsen (2015). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Table 1.1: The data} The data set is a reserving triangle from the XL group. It represents US casualty, gross paid and reported loss and allocated loss adjustment expense in 1000 USD. The data are available in the \texttt{apc} package. They can be called with the following command. Note that the output is wide and therefore truncated to fit the page width. <<>>= library(apc) data <- data.loss.XL() data$response[,1:8] @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Table 4.1: Analysis of variance} The deviance table can be reproduced by the following commands. The first call has the APC model as reference. The second call has the AC model as reference. For an overview of the models, see Nielsen (2014). The output is wide, so only selected columns are shown. <>= apc.fit.table(data,"log.normal.response")[,c(1,2,6,7)] apc.fit.table(data,"log.normal.response","AC")[,c(1,2,6,7)] @ Thus, Table 4.1 in the paper is constructed as follows. <>= table.APC <- apc.fit.table(data,"log.normal.response") table.AC <- apc.fit.table(data,"log.normal.response","AC") Table41 <- matrix(NA,nrow=3,ncol=6) Table41[1:3,1:2] <- table.APC[c(1,3,5),1:2] Table41[2:3,3:4] <- table.APC[c(3,5),6:7] Table41[3,5:6] <- table.AC[2,6:7] rownames(Table41) <- c("apc","ac","ad") colnames(Table41) <- c("-2logL","df","F_sup,apc","p","F_sup,ac","p") Table41 @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Table 4.2: Estimates} The table of estimates can be reproduced by the following commands. As a default the program gives the double differenced time effects. These are referred to as the canonical parameters. Note that in the \texttt{apc} package $\alpha$ is the age or development year $\beta$ is the period or calendar year, $\gamma$ is the cohort or policy year <>= fit <- apc.fit.model(data,"log.normal.response","AC") fit$coefficients.canonical @ The present age-cohort model has no calendar (period) effect, so also the first differences parameters are identified. They are found by an additional identification command. Thus, Table 4.2 in the paper is constructed from the following commands. <>= apc.identify(fit)$coefficients.dif[,1:2] fit$s2 fit$RSS @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Table 4.3: Forecasts} The forecasts are produced as follows. First the log normal forecasts use the fit found above. We want the 99.5\% quantile. The reserves are computed by policy year and for the entire lower triangle. Note that the \texttt{apc} also allows aggregation by development year and by calendar year as well as single cell forecasts. Aggregation by calendar year would give a cashflow. Further, there are three columns with standard errors: the overall standard error and breakdowns into process error and estimation error. <>= forecast <- apc.forecast.ac(fit,quantiles=0.995) forecast$response.forecast.coh forecast$response.forecast.all @ We compare with the standard chain ladder using the over-dispersed Poisson distribution forecasts of Harnau and Nielsen (2018). Here, the standard error has a third component, \texttt{tau.est}. <>= CL.fit <- apc.fit.model(data,"od.poisson.response","AC") CL.forecast <- apc.forecast.ac(CL.fit,quantiles=0.995) CL.forecast$response.forecast.coh CL.forecast$response.forecast.all @ We also look at the bootstrap forecast. This uses the bootstrap command in the \texttt{ChainLadder} package. That packages takes a cummulative triangle as input, which we can form in the \texttt{apc} package using the command \texttt{triangle.cummulative}. This command operates both on an \texttt{apc.data.list} and on a matrix. The output is a matrix. <>= m.cum <- triangle.cumulative(data) @ We then call the bootstrap command in the \texttt{ChainLadder} package. In the paper we have $10^5$ bootstrap repetitions. In the following code this is reduced to $10^3$ repetitions. Note, the bootstrap call does not appear to be fully stable and therefore it is commented out here and when building Table 4.3 below. <>= #library(ChainLadder) #BS <- BootChainLadder(m.cum, R = 10^3, process.distr=c("od.pois")) #summary(BS) @ The information from the above methods are now combined into Table 4.3. <>= Table_4_3 <- matrix(NA,nrow=20,ncol=9) rownames(Table_4_3) <- c(as.character(2:20),"total") col.3 <- c("Res","se/Res","99.5%/Res") colnames(Table_4_3) <- c(col.3,col.3,col.3) # Table 4.3, part I, log normal Chain Ladder forecast.coh <- forecast$response.forecast.coh forecast.all <- forecast$response.forecast.all Table_4_3[1:19,1] <- forecast.coh[,1] Table_4_3[1:19,2:3] <- forecast.coh[,c(2,5)]/forecast.coh[,1] Table_4_3[20,1] <- forecast.all[,1] Table_4_3[20,2:3] <- forecast.all[,c(2,5)]/forecast.all[,1] # Table 4.3, part II, standard Chain Ladder CL.forecast.coh <- CL.forecast$response.forecast.coh CL.forecast.all <- CL.forecast$response.forecast.all Table_4_3[1:19,4] <- CL.forecast.coh[,1] Table_4_3[1:19,5:6] <- CL.forecast.coh[,c(2,6)]/CL.forecast.coh[,1] Table_4_3[20,4] <- CL.forecast.all[,1] Table_4_3[20,5:6] <- CL.forecast.all[,c(2,6)]/CL.forecast.all[,1] # Table 4.3, part III, bootstrap #sum.bs.B <- summary(BS)$ByOrigin #sum.bs.T <- summary(BS)$Totals #qua.bs.B <- quantile(BS, c(0.995))$ByOrigin #qua.bs.T <- quantile(BS, c(0.995))$Totals #Table_4_3[1:19,7] <- sum.bs.B[2:20,3] #Table_4_3[1:19,8] <- sum.bs.B[2:20,4]/sum.bs.B[2:20,3] #Table_4_3[1:19,9] <- qua.bs.B[2:20,1]/sum.bs.B[2:20,3] #Table_4_3[20,7] <- sum.bs.T[3,] #Table_4_3[20,8] <- sum.bs.T[4,]/sum.bs.T[3,] #Table_4_3[20,9] <- qua.bs.T[1,1]/sum.bs.T[3,] #Table_4_3 @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Table 4.4: Forecasts} In this table the analysis is redone when dropping the last calendar (period) year and when dropping the two last calendar years. \texttt{apc} can extract subsets of the data with the following commands. <>= data.1 <- apc.data.list.subset(data,0,0,0,1,0,0) data.2 <- apc.data.list.subset(data,0,0,0,2,0,0) @ Table 4.4 is built up from 9 subpanels, but the last row with the bootstrap panels. We start by defining the 6 top panels, labelled a,b,...,f. <>= # Define panels Table_4_4a <- matrix(NA,nrow=6,ncol=3) Table_4_4a[1:5,1] <- 16:20 colnames(Table_4_4a) <- c("i","se/Res","99.5%/Res") Table_4_4b <- Table_4_4c <- Table_4_4d <- Table_4_4a Table_4_4b[1:5,1] <- 15:19 Table_4_4c[1:5,1] <- 14:18 Table_4_4e <- Table_4_4b Table_4_4f <- Table_4_4c @ We then fill in the forecast information similar to Table 4.3. <>= # Panel a. log normal Chain Ladder, no cut for.coh <- forecast$response.forecast.coh for.all <- forecast$response.forecast.all Table_4_4a[1:5,2:3] <- for.coh[15:19,c(2,5)]/for.coh[15:19,1] Table_4_4a[ 6,2:3] <- for.all[ ,c(2,5)]/for.all[ ,1] # Panel b. log normal Chain Ladder, cut 1 calendar year fit.1 <- apc.fit.model(data.1,"log.normal.response","AC") forecast.1 <- apc.forecast.ac(fit.1,quantiles=c(0.995)) for.1.coh <- forecast.1$response.forecast.coh for.1.all <- forecast.1$response.forecast.all Table_4_4b[1:5,2:3] <- for.1.coh[14:18,c(2,5)]/for.1.coh[14:18,1] Table_4_4b[ 6,2:3] <- for.1.all[ ,c(2,5)]/for.1.all[ ,1] # Panel c. log normal Chain Ladder, cut 2 calendar years fit.2 <- apc.fit.model(data.2,"log.normal.response","AC") forecast.2 <- apc.forecast.ac(fit.2,quantiles=c(0.995)) for.2.coh <- forecast.2$response.forecast.coh for.2.all <- forecast.2$response.forecast.all Table_4_4c[1:5,2:3] <- for.2.coh[13:17,c(2,5)]/for.2.coh[13:17,1] Table_4_4c[ 6,2:3] <- for.2.all[ ,c(2,5)]/for.2.all[ ,1] # Panel d. Standard Chain Ladder, no cut CL.for.coh <- CL.forecast$response.forecast.coh CL.for.all <- CL.forecast$response.forecast.all Table_4_4d[1:5,2:3] <- CL.for.coh[15:19,c(2,6)]/CL.for.coh[15:19,1] Table_4_4d[ 6,2:3] <- CL.for.all[ ,c(2,6)]/CL.for.all[ ,1] # Panel e. Standard Chain Ladder, cut 1 calendar year CL.fit.1 <- apc.fit.model(data.1,"od.poisson.response","AC") CL.forecast.1 <- apc.forecast.ac(CL.fit.1,quantiles=c(0.995)) CL.for.coh <- CL.forecast.1$response.forecast.coh CL.for.all <- CL.forecast.1$response.forecast.all Table_4_4e[1:5,2:3] <- CL.for.coh[14:18,c(2,6)]/CL.for.coh[14:18,1] Table_4_4e[ 6,2:3] <- CL.for.all[ ,c(2,6)]/CL.for.all[ ,1] # Panel f. Standard Chain Ladder, cut 2 calendar years CL.fit.2 <- apc.fit.model(data.2,"od.poisson.response","AC") CL.forecast.2 <- apc.forecast.ac(CL.fit.2,quantiles=c(0.995)) CL.for.coh <- CL.forecast.2$response.forecast.coh CL.for.all <- CL.forecast.2$response.forecast.all Table_4_4f[1:5,2:3] <- CL.for.coh[13:17,c(2,6)]/CL.for.coh[13:17,1] Table_4_4f[ 6,2:3] <- CL.for.all[ ,c(2,6)]/CL.for.all[ ,1] # Combine table rbind(cbind(Table_4_4a,Table_4_4b,Table_4_4c), cbind(Table_4_4d,Table_4_4e,Table_4_4f)) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Table 4.5: Bartlett tests} Here we have a variety of specification tests. They are not coded in \texttt{apc} as yet, so they require a bit of work. We start with Bartlett's test and define a function that can take 2 or 3 subsets. <>= Bartlett <- function(data,model.family,model.design,s.1,s.2,s.3=NULL) # data is an apc.data.list # s are subset indices, that is sets of six numbers { fit.model <- function(data,model.family,model.design,s) { data.s <- apc.data.list.subset(data,s[1],s[2],s[3],s[4],s[5],s[6], suppress.warning=TRUE) fit <- apc.fit.model(data.s,model.family,model.design) dev <- fit$deviance if(model.family=="log.normal.response") dev <- fit$RSS return(list(dev=dev, df=fit$df.residual)) } fit <- fit.model(data,model.family,model.design,s.1) dev <- fit$dev df <- fit$df fit <- fit.model(data,model.family,model.design,s.2) dev <- c(dev,fit$dev) df <- c( df,fit$df ) m <- 2 if(!is.null(s.3)) { fit <- fit.model(data,model.family,model.design,s.3) dev <- c(dev,fit$dev) df <- c( df,fit$df ) m <- 3 } dev.<- sum(dev) df.<- sum(df) LR <- df.*log(dev./df.)-sum(df*log(dev/df)) C <- 1+(1/3/(m-1))*(sum(1/df)-1/df.) t <- LR/C p <- pchisq(LR/C,m-1,0,FALSE) return(list(t=t,p=p)) } @ The next function is for testing the mean of subsets using F tests when the variance is common. <>= Ftest <- function(data,model.family,model.design,s.1,s.2,s.3=NULL) # data is an apc.data.list # s are subset indices, that is sets of six numbers { append <- function(data,model.design,s,v=NULL,d=NULL) { data.s <- apc.data.list.subset(data,s[1],s[2],s[3],s[4],s[5],s[6], suppress.warning=TRUE) index <- apc.get.index(data.s) v1 <- index$response[index$index.data] d1 <- apc.get.design(index,model.design)$design if(is.null(v)) v <- v1 else v <- c(v,v1) if(is.null(d)) d <- d1 else { d0 <- matrix(0,nrow(d),ncol(d1)) d10 <- matrix(0,nrow(d1),ncol(d)) d <- rbind(cbind(d,d0),cbind(d10,d1)) } return(list(v=v,d=d)) } a <- append(data,model.design,s.1) v <- a$v; d <- a$d a <- append(data,model.design,s.2,v,d) v <- a$v; d <- a$d if(!is.null(s.3)) { a <- append(data,model.design,s.3,v,d) v <- a$v; d <- a$d } fit.R <- apc.fit.model(data,model.family,model.design) if(model.family=="log.normal.response") { fit.R$deviance <- fit.R$RSS fit.U <- glm.fit(d,log(v),family=gaussian(link = "identity")) } if(model.family=="od.poisson.response") fit.U <- glm.fit(d,v,family=quasipoisson(link = "log")) dev.R <- fit.R$deviance dev.U <- fit.U$deviance df.R <- fit.R$df.residual df.U <- fit.U$df.residual F <- (dev.R-dev.U)/(df.R-df.U)/(dev.U/df.U) p <- pf(F,df.R-df.U,df.U,lower.tail=FALSE) # # reproducing typo # F <- (dev.R/df.R)/(dev.U/df.U) # p <- pf(F,df.R,df.U,lower.tail=FALSE) return(list(F=F,p=p)) } @ We now define Table 4.5 and input the results from Bartlett's test. <>= dim.names <- list(c("a","b","c"),c("LR/C","p","F","p","LR/C","p","F","p")) Table_4_5 <- matrix(NA,nrow=3,ncol=8,dimnames=dim.names) s1 <- c(0,0,0,0,0,14) s2 <- c(0,0,0,0,6,0) Bt <- Bartlett(data,"log.normal.response","AC",s1,s2) Ft <- Ftest(data,"log.normal.response","AC",s1,s2) Table_4_5[1,1:4] <- c(Bt$t,Bt$p,Ft$F,Ft$p) Bt <- Bartlett(data,"od.poisson.response","AC",s1,s2) Ft <- Ftest(data,"od.poisson.response","AC",s1,s2) Table_4_5[1,5:8] <- c(Bt$t,Bt$p,Ft$F,Ft$p) s1 <- c(0,0,0,10,0,0) s2 <- c(0,0,10,0,0,10) s3 <- c(0,0,0,0,10,0) Bt <- Bartlett(data,"log.normal.response","AC",s1,s2,s3) Ft <- Ftest(data,"log.normal.response","AC",s1,s2,s3) Table_4_5[2,1:4] <- c(Bt$t,Bt$p,Ft$F,Ft$p) Bt <- Bartlett(data,"od.poisson.response","AC",s1,s2,s3) Ft <- Ftest(data,"od.poisson.response","AC",s1,s2,s3) Table_4_5[2,5:8] <- c(Bt$t,Bt$p,Ft$F,Ft$p) s1 <- c(0,0,0 ,6,0,0) s2 <- c(0,0,14,0,0,0) Bt <- Bartlett(data,"log.normal.response","AC",s1,s2) Ft <- Ftest(data,"log.normal.response","AC",s1,s2) Table_4_5[3,1:4] <- c(Bt$t,Bt$p,Ft$F,Ft$p) Bt <- Bartlett(data,"od.poisson.response","AC",s1,s2) Ft <- Ftest(data,"od.poisson.response","AC",s1,s2) Table_4_5[3,5:8] <- c(Bt$t,Bt$p,Ft$F,Ft$p) Table_4_5 @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section*{References} \begin{description} \item Harnau, J.\ and Nielsen, B.\ (2018) Asymptotic theory for over-dispersed age-period-cohort and extended chain-ladder models. \textit{Journal of the American Statistical Association} 113, 1722-1732 \textit{Download}: Earlier version: \url{https://www.nuffield.ox.ac.uk/economics/Papers/2017/HarnauNielsen2017apcDP.pdf} \item Kuang, D. and Nielsen, B. (2020) Generalized Log-Normal Chain-Ladder. \textit{Scandinavian Actuarial Journal} 2020, 553-576. \textit{Download}: Open access: \url{https://www.tandfonline.com/doi/full/10.1080/03461238.2019.1696885}. Earlier version: \url{https://www.nuffield.ox.ac.uk/economics/Papers/2018/2018W02_KuangNielsen2018GLNCL.pdf}. \item Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a) Identification of the age-period-cohort model and the extended chain ladder model. \textit{Biometrika} 95, 979-986. \textit{Download}: Earlier version: \url{http://www.nuffield.ox.ac.uk/economics/papers/2007/w5/KuangNielsenNielsen07.pdf}. \item Kuang, D., Nielsen, B. and Nielsen, J.P. (2008b) Forecasting with the age-period-cohort model and the extended chain-ladder model. \textit{Biometrika} 95, 987-991. \textit{Download}: Earlier version: \url{http://www.nuffield.ox.ac.uk/economics/papers/2008/w9/KuangNielsenNielsen_Forecast.pdf}. \item Nielsen, B.\ (2014) Deviance analysis of age-period-cohort models. \textit{Download}: \url{http://www.nuffield.ox.ac.uk/economics/papers/2014/apc_deviance.pdf}. \item Nielsen, B.\ (2015) apc: An R package for age-period-cohort analysis. \textit{R Journal} 7, 52-64. \textit{Download}: \url{https://journal.r-project.org/archive/2015-2/nielsen.pdf}. \end{description} \end{document}