% File apc/vignettes/CheckPrediction.Rnw
% Part of the apc package, http://www.R-project.org
% Copyright 2019 Bent Nielsen
% Distributed under GPL 3

%\VignetteIndexEntry{Reproducing KN2020}

\documentclass[a4paper,twoside,12pt]{article}

\usepackage[english]{babel}
\usepackage{booktabs,rotating,graphicx,amsmath,verbatim,fancyhdr,Sweave}
\usepackage[colorlinks,linkcolor=red,urlcolor=blue]{hyperref}
\newcommand{\R}{\textsf{\bf R}}
\renewcommand{\topfraction}{0.95}
\renewcommand{\bottomfraction}{0.95}
\renewcommand{\textfraction}{0.1}
\renewcommand{\floatpagefraction}{0.9}
\DeclareGraphicsExtensions{.pdf,.jpg}
\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}

\oddsidemargin 1mm
\evensidemargin 1mm
\textwidth 160mm
\textheight 230mm
\topmargin -5mm
\headheight 8mm
\headsep 5mm
\footskip 15mm

\begin{document}
\SweaveOpts{concordance=TRUE}

\raggedleft
\pagestyle{empty}
\vspace*{0.1\textheight}
\Huge
{\bf Reproducing \\ Kuang and Nielsen \\ Generalized log normal Chain-Ladder \\ using the \texttt{apc} package}
\noindent\rule[-1ex]{\textwidth}{5pt}\\[2.5ex]
\Large
23 November 2019
\vfill
\normalsize
\begin{tabular}{rl}
 Di Kuang      & Lloyds of London \\
 Bent Nielsen  & Department of Economics, University of Oxford \\
               & \small \& Nuffield College \\
               & \small \& Institute for Economic Modelling \\
               & \normalsize \texttt{bent.nielsen@nuffield.ox.ac.uk} \\
               & \url{http://users.ox.ac.uk/~nuff0078}
\end{tabular}
\normalsize
\newpage
\raggedright
\parindent 3ex
\parskip 0ex
\tableofcontents
\cleardoublepage
\setcounter{page}{1}
\pagestyle{fancy}
\renewcommand{\sectionmark}[1]{\markboth{\thesection #1}{\thesection \ #1}}
\fancyhead[OL,ER]{\sl Reproducing Kuang and Nielsen.}
%\fancyhead[ER]{\sl \rightmark}
\fancyhead[EL,OR]{\bf \thepage}
\fancyfoot{}
\renewcommand{\headrulewidth}{0.1pt}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

The purpose of this vignette is to
use the
\texttt{apc} package
version 1.3.5
to reproduce some the result in
Kuang and Nielsen (2020):
\textit{Generalized Log-Normal Chain-Ladder}.
This adopts the theory presented in
Harnau and Nielsen (2018),
from an over-dispersed Poisson model to a log-normal model.
There is also a vignette available for that paper.
The 
\texttt{apc} package
builds on the identification analysis and the forecast theory in
Kuang, Nielsen and Nielsen (2008a,b),
the development of deviance analysis for general data arrays in
Nielsen (2014).
The package is discussed in
Nielsen (2015).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Table 1.1: The data}

The data set is a reserving triangle from the XL group.
It represents US casualty, gross paid and reported loss and allocated loss
adjustment expense in 1000 USD.

The data are available in the 
\texttt{apc} package.
They can be called with the following command. Note that the output is wide and therefore truncated to fit the page width.
<<>>=
library(apc)
data <- data.loss.XL()
data$response[,1:8]
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Table 4.1: Analysis of variance}

The deviance table can be reproduced by the following commands.
The first call has the APC model as reference.
The second call has the AC model as reference.
For an overview of the models, see
Nielsen (2014).
The output is wide, so only selected columns are shown.
<<result=tex>>=
apc.fit.table(data,"log.normal.response")[,c(1,2,6,7)]
apc.fit.table(data,"log.normal.response","AC")[,c(1,2,6,7)]
@

Thus, Table 4.1 in the paper is constructed as follows.
<<result=tex>>=
table.APC <- apc.fit.table(data,"log.normal.response")
table.AC  <- apc.fit.table(data,"log.normal.response","AC")
Table41 <- matrix(NA,nrow=3,ncol=6)
Table41[1:3,1:2]  <- table.APC[c(1,3,5),1:2]
Table41[2:3,3:4]  <- table.APC[c(3,5),6:7]
Table41[3,5:6]    <- table.AC[2,6:7]
rownames(Table41) <- c("apc","ac","ad")
colnames(Table41) <- c("-2logL","df","F_sup,apc","p","F_sup,ac","p")
Table41
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Table 4.2: Estimates}

The table of estimates can be reproduced by the following commands.
As a default the program gives the double differenced time effects.
These are referred to as the canonical parameters.
Note that in the \texttt{apc} package
$\alpha$ is the age or development year
$\beta$ is the period or calendar year,
$\gamma$ is the cohort or policy year

<<result=tex>>=
fit <- apc.fit.model(data,"log.normal.response","AC")
fit$coefficients.canonical
@

The present age-cohort model has no calendar (period) effect,
so also the first differences parameters are identified. They
are found by an additional identification command.
Thus, Table 4.2 in the paper is constructed from the following commands.

<<result=tex>>=
apc.identify(fit)$coefficients.dif[,1:2]
fit$s2
fit$RSS
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Table 4.3: Forecasts}

The forecasts are produced as follows. First the log normal forecasts use
the fit found above. We want the 99.5\% quantile. The reserves are
computed by policy year and for the entire lower triangle.
Note that the \texttt{apc} also allows aggregation by
development year and by calendar year as well as single cell forecasts.
Aggregation by calendar year would give a cashflow.
Further, there are three columns with standard errors:
the overall standard error and breakdowns into process error
and estimation error.

<<result=tex>>=
forecast    <- apc.forecast.ac(fit,quantiles=0.995)
forecast$response.forecast.coh
forecast$response.forecast.all
@

We compare with the standard chain ladder using the over-dispersed
Poisson distribution forecasts of Harnau and Nielsen (2018).
Here, the standard error has a third component, \texttt{tau.est}.

<<result=tex>>=
CL.fit <- apc.fit.model(data,"od.poisson.response","AC")
CL.forecast    <- apc.forecast.ac(CL.fit,quantiles=0.995)
CL.forecast$response.forecast.coh
CL.forecast$response.forecast.all
@

We also look at the bootstrap forecast. This uses the bootstrap command in the
\texttt{ChainLadder} package.
That packages takes a cummulative triangle as input, which we can form in the
\texttt{apc} package using the command
\texttt{triangle.cummulative}.
This command operates both on an \texttt{apc.data.list} and on a matrix.
The output is a matrix.

<<result=tex>>=
m.cum   <- triangle.cumulative(data)
@

We then call the bootstrap command in the
\texttt{ChainLadder} package.
In the paper we have $10^5$ bootstrap repetitions.
In the following code this is reduced
to $10^3$ repetitions.

Note, the bootstrap call does not appear to be fully stable and therefore it is commented out
here and when building Table 4.3 below.

<<result=tex>>=
#library(ChainLadder)
#BS <- BootChainLadder(m.cum, R = 10^3, process.distr=c("od.pois"))
#summary(BS)
@

The information from the above methods are now combined into Table 4.3.

<<result=tex>>=
Table_4_3   <- matrix(NA,nrow=20,ncol=9)
rownames(Table_4_3) <- c(as.character(2:20),"total")
col.3   <- c("Res","se/Res","99.5%/Res")
colnames(Table_4_3) <- c(col.3,col.3,col.3)
#   Table 4.3, part I, log normal Chain Ladder
forecast.coh  <- forecast$response.forecast.coh
forecast.all  <- forecast$response.forecast.all
Table_4_3[1:19,1]   <- forecast.coh[,1]
Table_4_3[1:19,2:3] <- forecast.coh[,c(2,5)]/forecast.coh[,1]
Table_4_3[20,1]     <- forecast.all[,1]
Table_4_3[20,2:3]   <- forecast.all[,c(2,5)]/forecast.all[,1]
#   Table 4.3, part II, standard Chain Ladder
CL.forecast.coh <- CL.forecast$response.forecast.coh 
CL.forecast.all <- CL.forecast$response.forecast.all
Table_4_3[1:19,4]   <- CL.forecast.coh[,1]
Table_4_3[1:19,5:6] <- CL.forecast.coh[,c(2,6)]/CL.forecast.coh[,1]
Table_4_3[20,4]     <- CL.forecast.all[,1]         
Table_4_3[20,5:6]   <- CL.forecast.all[,c(2,6)]/CL.forecast.all[,1]
#   Table 4.3, part III, bootstrap
#sum.bs.B <- summary(BS)$ByOrigin
#sum.bs.T <- summary(BS)$Totals
#qua.bs.B <- quantile(BS, c(0.995))$ByOrigin
#qua.bs.T <- quantile(BS, c(0.995))$Totals
#Table_4_3[1:19,7]   <- sum.bs.B[2:20,3]
#Table_4_3[1:19,8]   <- sum.bs.B[2:20,4]/sum.bs.B[2:20,3]
#Table_4_3[1:19,9]   <- qua.bs.B[2:20,1]/sum.bs.B[2:20,3]
#Table_4_3[20,7]     <- sum.bs.T[3,]
#Table_4_3[20,8]     <- sum.bs.T[4,]/sum.bs.T[3,]
#Table_4_3[20,9]     <- qua.bs.T[1,1]/sum.bs.T[3,]
#Table_4_3
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Table 4.4: Forecasts}

In this table the analysis is redone when dropping the last calendar (period) year
and when dropping the two last calendar years.
\texttt{apc} can extract subsets of the data with the following commands.

<<result=tex>>=
data.1      <- apc.data.list.subset(data,0,0,0,1,0,0)
data.2      <- apc.data.list.subset(data,0,0,0,2,0,0)
@

Table 4.4 is built up from 9 subpanels, but the last row
with the bootstrap panels.  We start by defining the 6 top
panels, labelled a,b,...,f.

<<result=tex>>=
#   Define panels
Table_4_4a <- matrix(NA,nrow=6,ncol=3)
Table_4_4a[1:5,1] <- 16:20
colnames(Table_4_4a) <- c("i","se/Res","99.5%/Res")
Table_4_4b <- Table_4_4c <- Table_4_4d <- Table_4_4a
Table_4_4b[1:5,1] <- 15:19
Table_4_4c[1:5,1] <- 14:18
Table_4_4e <- Table_4_4b
Table_4_4f <- Table_4_4c
@

We then fill in the forecast information similar to Table 4.3.

<<result=tex>>=
#   Panel a. log normal Chain Ladder, no cut
for.coh <- forecast$response.forecast.coh
for.all <- forecast$response.forecast.all
Table_4_4a[1:5,2:3] <- for.coh[15:19,c(2,5)]/for.coh[15:19,1]
Table_4_4a[  6,2:3] <- for.all[     ,c(2,5)]/for.all[     ,1]
#   Panel b. log normal Chain Ladder, cut 1 calendar year
fit.1 <- apc.fit.model(data.1,"log.normal.response","AC")
forecast.1  <-  apc.forecast.ac(fit.1,quantiles=c(0.995))
for.1.coh <- forecast.1$response.forecast.coh
for.1.all <- forecast.1$response.forecast.all
Table_4_4b[1:5,2:3] <- for.1.coh[14:18,c(2,5)]/for.1.coh[14:18,1]
Table_4_4b[  6,2:3] <- for.1.all[     ,c(2,5)]/for.1.all[     ,1]
#   Panel c. log normal Chain Ladder, cut 2 calendar years
fit.2 <- apc.fit.model(data.2,"log.normal.response","AC")
forecast.2  <-  apc.forecast.ac(fit.2,quantiles=c(0.995))
for.2.coh <- forecast.2$response.forecast.coh
for.2.all <- forecast.2$response.forecast.all
Table_4_4c[1:5,2:3] <- for.2.coh[13:17,c(2,5)]/for.2.coh[13:17,1]
Table_4_4c[  6,2:3] <- for.2.all[     ,c(2,5)]/for.2.all[     ,1]
#   Panel d. Standard Chain Ladder, no cut
CL.for.coh <- CL.forecast$response.forecast.coh
CL.for.all <- CL.forecast$response.forecast.all
Table_4_4d[1:5,2:3] <- CL.for.coh[15:19,c(2,6)]/CL.for.coh[15:19,1]
Table_4_4d[  6,2:3] <- CL.for.all[     ,c(2,6)]/CL.for.all[     ,1]
#   Panel e. Standard Chain Ladder, cut 1 calendar year
CL.fit.1        <- apc.fit.model(data.1,"od.poisson.response","AC")
CL.forecast.1   <- apc.forecast.ac(CL.fit.1,quantiles=c(0.995))
CL.for.coh <- CL.forecast.1$response.forecast.coh
CL.for.all <- CL.forecast.1$response.forecast.all
Table_4_4e[1:5,2:3] <- CL.for.coh[14:18,c(2,6)]/CL.for.coh[14:18,1]
Table_4_4e[  6,2:3] <- CL.for.all[     ,c(2,6)]/CL.for.all[     ,1]
#   Panel f. Standard Chain Ladder, cut 2 calendar years
CL.fit.2        <- apc.fit.model(data.2,"od.poisson.response","AC")
CL.forecast.2   <- apc.forecast.ac(CL.fit.2,quantiles=c(0.995))
CL.for.coh <- CL.forecast.2$response.forecast.coh
CL.for.all <- CL.forecast.2$response.forecast.all
Table_4_4f[1:5,2:3] <- CL.for.coh[13:17,c(2,6)]/CL.for.coh[13:17,1]
Table_4_4f[  6,2:3] <- CL.for.all[     ,c(2,6)]/CL.for.all[     ,1]
#   Combine table
rbind(cbind(Table_4_4a,Table_4_4b,Table_4_4c),
      cbind(Table_4_4d,Table_4_4e,Table_4_4f))
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Table 4.5: Bartlett tests}

Here we have a variety of specification tests.  They are not coded in
\texttt{apc} as yet, so they require a bit of work.
We start with Bartlett's test and define a function that can take 2 or 3 subsets.

<<result=tex>>=
Bartlett <- function(data,model.family,model.design,s.1,s.2,s.3=NULL)
#   data is an apc.data.list
#   s are subset indices, that is sets of six numbers
{   fit.model <- function(data,model.family,model.design,s)
    {   data.s <- apc.data.list.subset(data,s[1],s[2],s[3],s[4],s[5],s[6],
                        suppress.warning=TRUE)
        fit <- apc.fit.model(data.s,model.family,model.design)
        dev <- fit$deviance
        if(model.family=="log.normal.response")
            dev <- fit$RSS
        return(list(dev=dev, df=fit$df.residual))    
    }
    fit <- fit.model(data,model.family,model.design,s.1)    
    dev <- fit$dev
     df <- fit$df
    fit <- fit.model(data,model.family,model.design,s.2)
    dev <- c(dev,fit$dev)
     df <- c( df,fit$df )
    m <- 2 
    if(!is.null(s.3))
    {   fit <- fit.model(data,model.family,model.design,s.3)
        dev <- c(dev,fit$dev)
         df <- c( df,fit$df )
        m <- 3
    }     
    dev.<- sum(dev)
     df.<- sum(df)
    LR  <- df.*log(dev./df.)-sum(df*log(dev/df))
    C   <- 1+(1/3/(m-1))*(sum(1/df)-1/df.)
    t   <- LR/C
    p   <- pchisq(LR/C,m-1,0,FALSE)
    return(list(t=t,p=p))
}
@

The next function is for testing the mean of subsets using F tests when the variance is common.
<<result=tex>>=
Ftest <- function(data,model.family,model.design,s.1,s.2,s.3=NULL)
#   data is an apc.data.list
#   s are subset indices, that is sets of six numbers
{   append <- function(data,model.design,s,v=NULL,d=NULL)
    {   data.s <- apc.data.list.subset(data,s[1],s[2],s[3],s[4],s[5],s[6],
                        suppress.warning=TRUE)
        index  <- apc.get.index(data.s)
        v1 <- index$response[index$index.data]
        d1 <- apc.get.design(index,model.design)$design
        if(is.null(v))  v <- v1 
        else            v <- c(v,v1) 
        if(is.null(d))  d <- d1 
        else
        {   d0 <- matrix(0,nrow(d),ncol(d1))
            d10 <- matrix(0,nrow(d1),ncol(d))
            d <- rbind(cbind(d,d0),cbind(d10,d1))
        }
        return(list(v=v,d=d))
    }
    a <- append(data,model.design,s.1)
    v <- a$v; d <- a$d
    a <- append(data,model.design,s.2,v,d)
    v <- a$v; d <- a$d
    if(!is.null(s.3))
    {   a <- append(data,model.design,s.3,v,d)
        v <- a$v; d <- a$d  }
    fit.R <- apc.fit.model(data,model.family,model.design)    
    if(model.family=="log.normal.response")
    {   fit.R$deviance <- fit.R$RSS
        fit.U <- glm.fit(d,log(v),family=gaussian(link = "identity"))
    }    
    if(model.family=="od.poisson.response")
        fit.U <- glm.fit(d,v,family=quasipoisson(link = "log"))
    dev.R <- fit.R$deviance
    dev.U <- fit.U$deviance
     df.R <- fit.R$df.residual
     df.U <- fit.U$df.residual        
    F <- (dev.R-dev.U)/(df.R-df.U)/(dev.U/df.U)
    p <- pf(F,df.R-df.U,df.U,lower.tail=FALSE)
#    #   reproducing typo
#    F <- (dev.R/df.R)/(dev.U/df.U)
#    p <- pf(F,df.R,df.U,lower.tail=FALSE)    
    return(list(F=F,p=p))
}
@

We now define Table 4.5
and input the results from Bartlett's test.
<<result=tex>>=
dim.names <- list(c("a","b","c"),c("LR/C","p","F","p","LR/C","p","F","p"))
Table_4_5 <- matrix(NA,nrow=3,ncol=8,dimnames=dim.names)
s1 <- c(0,0,0,0,0,14)
s2 <- c(0,0,0,0,6,0)
Bt <- Bartlett(data,"log.normal.response","AC",s1,s2)
Ft <-    Ftest(data,"log.normal.response","AC",s1,s2)
Table_4_5[1,1:4] <- c(Bt$t,Bt$p,Ft$F,Ft$p)
Bt <- Bartlett(data,"od.poisson.response","AC",s1,s2)
Ft <-    Ftest(data,"od.poisson.response","AC",s1,s2)
Table_4_5[1,5:8] <- c(Bt$t,Bt$p,Ft$F,Ft$p)
s1 <- c(0,0,0,10,0,0)
s2 <- c(0,0,10,0,0,10)
s3 <- c(0,0,0,0,10,0)
Bt <- Bartlett(data,"log.normal.response","AC",s1,s2,s3)
Ft <-    Ftest(data,"log.normal.response","AC",s1,s2,s3)
Table_4_5[2,1:4] <- c(Bt$t,Bt$p,Ft$F,Ft$p)
Bt <- Bartlett(data,"od.poisson.response","AC",s1,s2,s3)
Ft <-    Ftest(data,"od.poisson.response","AC",s1,s2,s3)
Table_4_5[2,5:8] <- c(Bt$t,Bt$p,Ft$F,Ft$p)
s1 <- c(0,0,0 ,6,0,0)
s2 <- c(0,0,14,0,0,0)
Bt <- Bartlett(data,"log.normal.response","AC",s1,s2)
Ft <-    Ftest(data,"log.normal.response","AC",s1,s2)
Table_4_5[3,1:4] <- c(Bt$t,Bt$p,Ft$F,Ft$p)           
Bt <- Bartlett(data,"od.poisson.response","AC",s1,s2)
Ft <-    Ftest(data,"od.poisson.response","AC",s1,s2)
Table_4_5[3,5:8] <- c(Bt$t,Bt$p,Ft$F,Ft$p)
Table_4_5
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{References}
\begin{description}
  \item 
    Harnau, J.\ and Nielsen, B.\ (2018)
    Asymptotic theory for over-dispersed age-period-cohort and extended chain-ladder models.
    \textit{Journal of the American Statistical Association} 113, 1722-1732
    \textit{Download}:
    Earlier version:
    \url{https://www.nuffield.ox.ac.uk/economics/Papers/2017/HarnauNielsen2017apcDP.pdf}
  \item 
    Kuang, D. and Nielsen, B. (2020)
    Generalized Log-Normal Chain-Ladder.
    \textit{Scandinavian Actuarial Journal} 2020, 553-576.
    \textit{Download}:
    Open access:
    \url{https://www.tandfonline.com/doi/full/10.1080/03461238.2019.1696885}.
    Earlier version:
    \url{https://www.nuffield.ox.ac.uk/economics/Papers/2018/2018W02_KuangNielsen2018GLNCL.pdf}.
  \item 
    Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a)
    Identification of the age-period-cohort model and the extended chain ladder model.
    \textit{Biometrika} 95, 979-986.
    \textit{Download}:
    Earlier version:
    \url{http://www.nuffield.ox.ac.uk/economics/papers/2007/w5/KuangNielsenNielsen07.pdf}.
  \item 
    Kuang, D., Nielsen, B. and Nielsen, J.P. (2008b)
    Forecasting with the age-period-cohort model and the extended chain-ladder model.
    \textit{Biometrika} 95, 987-991.
    \textit{Download}:
    Earlier version:
    \url{http://www.nuffield.ox.ac.uk/economics/papers/2008/w9/KuangNielsenNielsen_Forecast.pdf}.  
  \item 
    Nielsen, B.\ (2014)
    Deviance analysis of age-period-cohort models. 
    \textit{Download}:
    \url{http://www.nuffield.ox.ac.uk/economics/papers/2014/apc_deviance.pdf}.
  \item
    Nielsen, B.\ (2015)
    apc: An R package for age-period-cohort analysis.
    \textit{R Journal} 7, 52-64.
    \textit{Download}:
    \url{https://journal.r-project.org/archive/2015-2/nielsen.pdf}.
\end{description}


\end{document}