\documentclass[article, shortnames, nojss]{jss}

\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{ctable}
\usepackage[caption = false]{subfig}
\usepackage{pdflscape}
\usepackage{color}


% switch off unnnecessary equation numbering
% \usepackage{mathtools}
% \mathtoolsset{showonlyrefs=true}
%\VignetteIndexEntry{StMoMo: An R Package for Stochastic Mortality Modelling}
%!\VignetteEncoding{UTF-8}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Command definitions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\R}{{\proglang{R}} }
\newcommand{\slfrac}[2]{\left.#1\middle/#2\right.}
\newcommand{\Exp}{\mathbb{E}}
\newcommand{\LogL}{{\cal L}}

%\newcommand{\rev}[1]{{\color{blue} #1}}
\newcommand{\rev}[1]{{#1}}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% almost as usual
\author{Andr\'es M. Villegas\\ UNSW Business School \\ UNSW Sydney \And Pietro Millossovich\\ Cass Business School \\ City, University of London  \\ DEAM, University of Trieste \And 
        Vladimir K. Kaishev\\ Cass Business School \\ City, University of London}
\title{\pkg{StMoMo}: An \proglang{R} Package for Stochastic Mortality Modeling}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Andr\'es M. Villegas, Vladimir K. Kaishev, Pietro Millossovich} %% comma-separated

\Plaintitle{StMoMo: An R Package for Stochastic Mortality Modeling} %% without formatting
%\Shorttitle{\pkg{foo}: A Capitalized Title} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{
  In this paper we mirror the framework of generalized (non-)linear models to define the family of generalized age-period-cohort stochastic mortality models which encompasses the vast majority of stochastic mortality projection models proposed to date, including the well-known Lee-Carter and Cairns-Blake-Dowd models. We also introduce the \R package \pkg{StMoMo} which exploits the unifying framework of the generalized age-period-cohort family to provide tools for fitting stochastic mortality models, {assessing} their goodness of fit and performing mortality projections. We illustrate some of the capabilities of the package by {performing} a comparison of several stochastic mortality models applied to the England and Wales population.
}
\Keywords{mortality modeling; mortality forecasting; age-period-cohort; generalized non-linear models}
\Plainkeywords{mortality modeling; mortality forecasting; age-period-cohort; generalized non-linear models} %% without formatting
%% at least one keyword must be supplied

%% publication information
%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
%% \Volume{50}
%% \Issue{9}
%% \Month{June}
%% \Year{2012}
%% \Submitdate{2012-06-04}
%% \Acceptdate{2012-06-04}

%% The address of (at least) one author should be given
%% in the following format:
\Address{
  Andr\'es M. Villegas\\
  School of Risk and Actuarial Studies and\\
  ARC Centre of Excellence in Population Ageing Research (CEPAR)\\
  UNSW Business School\\
  UNSW Sydney 2052, Australia\\ 
  E-mail: \email{a.villegas@unsw.edu.au}\\
  
  Pietro Millossovich\\
  Faculty of Actuarial Science and Insurance\\
  Cass Business School\\
  City, University of London\\
  106 Bunhill Row, London, EC1Y 8TZ, UK\\
  E-mail: \email{Pietro.Millossovich.1@city.ac.uk}\\
  
  Vladimir K. Kaishev\\
  Faculty of Actuarial Science and Insurance\\
  Cass Business School\\
  City, University of London\\
  106 Bunhill Row, London, EC1Y 8TZ, UK\\
  E-mail: \email{Vladimir.Kaishev.1@city.ac.uk}\\

}
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/512/507-7103
%% Fax: +43/512/507-2851

%% for those who use Sweave please include the following line (with % symbols):

%% need no \usepackage{Sweave.sty}

%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{document}
\SweaveOpts{concordance=FALSE}

<<SweaveOptions, eval=TRUE,echo = FALSE>>=
#Do this to add R> at the start of lines
#options(prompt = "R> ")
#options(continue="+  ")
options(prompt = "R> ", continue = "+  ", width = 70, useFancyQuotes = FALSE)
@


%% include your article here, just as usual
%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.

%\section[About Java]{About \proglang{Java}}
%% Note: If there is markup in \(sub)section, then it has to be escape as above.

\section{Introduction}\label{sec:intro}

During the last two centuries developed {countries} experienced a persistent increase in life expectancy. For instance, \citet{Oeppen2002} estimate that during the last 160 years the world record in female life expectancy at birth has increased at an approximate steady pace of 3 months per year. This increase in life expectancy, though a sign of social progress, poses a challenge to governments, private pension plans and life insurers {because of its} impact on pension and health costs.  Actuaries and demographers have recognized the {problems caused by} an aging population and rising longevity and have thus devoted {significant attention} to the development of statistical techniques for the modeling and projection of mortality rates.\\  

One of the most influential approaches {to} the stochastic modeling of mortality rates is the parsimonious mortality model proposed by \cite{Lee1992}. This model uses principal component analysis to decompose the {age-time matrix of} mortality rates into a bilinear combination of age and period parameters, with {the latter} being treated as time series to produce mortality projections. The Lee-Carter model has inspired numerous variants and extensions. For instance, \citet{Lee2001}, \citet{Booth2002}, and  \citet{Brouhns2002} have proposed alternative estimation approaches in order to improve the goodness-of-fit and {the} forecasting properties of the model.  In particular, \cite{Brouhns2002} propose a more formal statistical approach to estimating the parameters by embedding the Lee-Carter model into a Poisson regression setting. Other authors have extended the Lee-Carter model by including additional terms, such as {multiple} bilinear age-period {components} \citep{Renshaw2003a, Hyndman2007}, or a cohort effect {term} \citep{Renshaw2006}.\\
  
The two factor {Cairns-Blake-Dowd (CBD)} model introduced by \cite{Cairns2006a} is one of the most prominent {variants of} the Lee-Carter {model}. {The CBD} model relies on the linearity of the logit of one-year death {probabilities} at older ages. Specifically, {it} assumes that, for a {given}, year the logit of the one-year death probability is a linear function of age, and treats the intercept and slope parameters across years as stochastic processes. \cite{Cairns2009} consider three extensions to the original CBD model by incorporating combinations of a quadratic age term and a cohort effect {term}. \cite{Plat2009a} has combined {features of the CBD and the Lee-Carter models} to produce a model that is suitable for full age ranges and captures the cohort effect.\\

Given the abundance and rapid increase in the number of stochastic mortality models proposed in the literature, there have been some recent attempts to find the commonalities among these models. \citet{Hunt2015b} review the structure of mortality models and describe an age-period-cohort model structure which encompasses the vast majority of stochastic mortality models. \citet{Currie2014} shows that many mortality models can be expressed in terms of generalized linear models or generalized non-linear models.\\

In this paper, we build {upon} the {works} of \citet{Hunt2015b} and \citet{Currie2014} to define the family of generalized age-period-cohort stochastic mortality models by mirroring the terminology of generalized linear models. {We also} introduce the \R package \pkg{StMoMo} \citep{StMoMo2016} which exploits the unifying framework of the generalized age-period-cohort family \rev{combined with the powerful fitting function of the \pkg{gnm} package \citep{Turner2012}} to provide computational tools for implementing many of the stochastic mortality models proposed to date.\footnote{The acronym StMoMo, pronounced Saint Momo, stands for Stochastic Mortality Modeling. Momo is the king of Carnivals in numerous Latin American festivities \citep{Wikipedia2015}.} The \pkg{StMoMo} package is available at \url{http://cran.r-project.org/package=StMoMo}. Version 0.4.0 has been used for this paper.\\

Several packages for mortality modeling are available in the \R environment \citep{RCoreTeam2014}. The package \pkg{demography} \citep{Hyndman2014}, whose usage is explained in detail in \cite{Booth2014}, implements, among other things, the original Lee-Carter model along with the \cite{Lee2001}, \cite{Booth2002}, and \cite{Hyndman2007} variants. The \pkg{ilc} package \citep{Butt2014} implements the \citet{Renshaw2006} cohort extension of the Lee-Carter model together with the Lee-Carter model under a Poisson regression framework. The \pkg{LifeMetrics} \R functions implement the original CBD model and the three extended CBD models considered in \citet{Cairns2009}, along with the Lee-Carter model (using Poisson maximum likelihood), \rev{the traditional age-period-cohort model (see, \citet{Osmond1985})} and the \citet{Renshaw2006} model. This package, which is not on CRAN, is available at  \url{http://www.macs.hw.ac.uk/~andrewc/lifemetrics/}.\\ 

There are however several {drawbacks of the existing packages} which our package \pkg{StMoMo} seeks to overcome. First, the existing packages are based on model-specific fitting algorithms limiting the models available to those already predefined in the packages. By contrast, \pkg{StMoMo} allow{s} users to easily expand the number of models available. In addition, whilst \pkg{StMoMo} provides forecasting and simulation functions for any model within the generalized age-period-cohort family, the existing packages only provide such {functions} for a limited number of models. For instance,  {the} package \pkg{ilc} only includes forecasting functions for the Lee-Carter model. Similarly, {simulation with the package \pkg{LifeMetrics} is limited to the Lee-Carter and the standard CBD models}. Finally, \pkg{StMoMo} provides {functions} which are not available in existing packages, such as tools for analyzing the goodness-of-fit\footnote{{We note that \pkg{ilc} also provides some graphical tools for assessing the goodness of fit of the models implemented in that package.}} and evaluating the impact of parameter uncertainty using bootstrapping techniques.\\ 

{\pkg{StMoMo} comes with {a set of} functions for defining an abstract model --- specifying for instance the number of period terms, whether coefficients are parametric or not --- and for fitting a given model. This is particularly useful when estimating several models on a given dataset or a given model to different datasets. The package also provides preset functions for defining the most common models available in the mortality forecasting literature. {In addition, other models preferred by the user can be created} in a very simple fashion, see Section~\ref{sec:GAPC_StMOMO} where several examples are given. Therefore, the flexibility of the package allows a user to quickly build up a battery of different models, and this is particularly useful when seeking the most appropriate mortality model, comparing different models or assessing model risk. \pkg{StMoMo} is particularly appealing for actuaries managing life and pensions portfolios exposed to longevity risk. The code backing most functions {implemented} in the package has been extensively used and tested for the development of multi-population mortality models for assessing basis risk in longevity risk transactions, see \cite{Haberman2014}.}\\

In this paper we describe the statistical framework underlying \pkg{StMoMo} and illustrate {its usage}. {For this purpose}, we use as a running example a comparison of several stochastic mortality models fitted to the England and Wales population. This example is in the spirit of the comparison exercises of \cite{Cairns2009,Cairns2011a}, \cite{Haberman2011a} and \cite{Lovasz2011}, {allowing us to} show how several of the analysis performed in these papers can easily be replicated using \pkg{StMoMo}. The structure of {the} paper is as follows. In Section~\ref{sec:Notation} we {introduce} our notation. In Section~\ref{sec:GAPC}, we mirror the framework of generalized linear models to define the family of generalized age-period-cohort (GAPC) stochastic mortality models and {demonstrate that} many of the mortality models discussed in the literature can be framed within this family. {In} Section~\ref{sec:GAPC_StMOMO} {we explain how the GAPC family of models is implemented in \pkg{StMoMo}}. In Section~\ref{sec:Fitting}, we describe the fitting of GAPC mortality models and illustrate how this can be accomplished using \pkg{StMoMo}. {In Section~\ref{sec:GoF} we consider} the evaluation of the goodness-of-fit of GAPC models. In Section~\ref{sec:Forecasting} we discuss the forecasting and simulation of GAPC models using time series techniques. Section~\ref{sec:ParameterUncertainty} describes the use of bootstrapping techniques to incorporate parameter uncertainty in the estimation and forecasting of GAPC mortality models. Finally, {in Section~\ref{sec:Conclusions} we provide some conclusions and discuss  possible extensions of the \pkg{StMoMo} package}. 

\section{Notation and data}\label{sec:Notation}

Let the random variable $D_{xt}$ denote the number of deaths in a population at age $x$ last birthday during calendar year $t$. Also let $d_{xt}$ denote the observed number of deaths, $E^c_{xt}$ the central exposed to risk at age $x$ in year $t$, and $E^0_{xt}$ the corresponding initial exposed to risk. The one-year \rev{death probability} for an individual aged $x$ last birthday and in calendar year $t$, denoted $q_{xt}$, can be estimated as $\hat{q}_{xt}=\slfrac{d_{xt}}{E^0_{xt}}$. The force of mortality and central death rates are denoted by $\mu_{xt}$ and $m_{xt}$, respectively, with the empirical estimate of the latter being $\hat{m}_{xt}=\slfrac{d_{xt}}{E^c_{xt}}$. Under the assumption that the force of mortality is constant over each year of age {and calendar year}, i.e., from age $x$ to age $x+1$ and year $t$ to $t+1$, then the force of mortality $\mu_{xt}$ and the death rate $m_{xt}$ coincide. We assume that this is the case throughout.\\


In \pkg{StMoMo} and throughout this paper we assume that deaths, $d_{xt}$, and either central exposures, $E^c_{xt}$, or initial exposures, $E^0_{xt}$, are available in a rectangular array format comprising ages (on the rows) $x = x_1, x_2, \ldots, x_k$, and calendar years (on the columns) $t = t_1,t_2, \ldots, t_n$, When only central exposures are available and initial exposures are required (or vice-versa), one can approximate the initial exposures by adding half the matching reported numbers of deaths to the central exposures, i.e, $E^0_{xt}\approx E^c_{xt} + \frac{1}{2}d_{xt}$. When the context is clear, we may write $E_{xt}$ to refer to $E^0_{xt}$ or $E^c_{xt}$.

\section{Generalized APC stochastic mortality models}\label{sec:GAPC}

Some authors have recently sought to identify the similarities among stochastic mortality models. For instance, \citet{Hunt2015b} describe an age-period-cohort (APC) model structure which encompasses the vast majority of stochastic mortality models. In another interesting contribution, \citet{Currie2014} shows {that} many common mortality models can be expressed in the standard terminology of generalized linear {or non-linear models}. In this section, we build {upon} the aforementioned papers to define the family of generalized age-period-cohort (GAPC) stochastic mortality models. \\

Akin to generalized linear models (see, e.g., \citet{McCullagh1989}), a GAPC stochastic mortality model is comprised of four components:

\begin{enumerate}
 %Random Component
  \item The \textit{random component}: {the numbers of deaths $D_{xt}$} {follow a Poisson distribution or a Binomial distribution}, so that 
  \begin{equation*}\label{eq:Poisson}
 D_{xt}\sim \mathrm{Poisson}(E^c_{xt}\mu_{xt})
 \end{equation*}
 or 
 \begin{equation*}\label{eq:Binomical}
 D_{xt}\sim \mathrm{Binomial}(E^0_{xt}, q_{xt}),
 \end{equation*}
 with $\Exp\left(\slfrac{D_{xt}}{E^c_{xt}}\right) = \mu_{xt}$ and $\Exp\left(\slfrac{D_{xt}}{E^0_{xt}}\right) = q_{xt}$, respectively.

 %Systematic Component
 \item The \textit{systematic component}: following \citet{Hunt2015b} the effects of age $x$, calendar year $t$  and year-of-birth (cohort) $c=t-x$ are captured through a \textit{predictor} $\eta_{xt}$ given by:
 \begin{equation*}\label{eq:predictor}
  \eta_{xt} = \alpha_x + \sum_{i=1}^N \beta_x^{(i)}\kappa_t^{(i)} + \beta_x^{(0)}\gamma_{t-x}.
 \end{equation*}
Here:
\begin{itemize}
  \item The term $\alpha_x$ is a static age function capturing the general shape of mortality by age.
  \item $N\geq 0$ is an integer indicating the number of age-period terms describing the mortality trends, with each time index $\kappa_t^{(i)}$, $i = 1,\ldots, N$, contributing in specifying the mortality trend and $\beta_x^{(i)}$ modulating its effect across ages.
 \item The term $\gamma_{t-x}$ accounts for the cohort effect  with $\beta_x^{(0)}$ modulating its effect across ages.
\end{itemize}
The age modulating terms $\beta_x^{(i)}$, $i = 0,1,\ldots, N$, can be either pre-specified functions of age, i.e., $\beta_x^{(i)}\equiv f^i(x)$, as in CBD type models, or non-parametric terms without any prior structure which need to be estimated as in the Lee-Carter model. \rev{In the GAPC family we assume that the period indexes $\kappa_t^{(i)}$, $i = 1,\ldots, N$, and the cohort index $\gamma_{t-x}$ are stochastic processes}. This is the key feature that allows the stochastic projection of GAPC models and thus the generation of probabilistic forecasts of future mortality rates.\\

%Link function 
\item The \textit{link function} {$g$} associating the random component and the systematic component so that 
 \begin{equation*}\label{eq:link}
  g\left(\Exp\left(\frac{D_{xt}}{E_{xt}}\right)\right) = \eta_{xt}  .
 \end{equation*}
Although a number of link functions would be possible, it is convenient to use the so-called canonical link and pair the Poisson distribution with the log link function and the Binomial distribution with the logit link function (see, e.g., \citet{Currie2014} for a discussion of this in the context of mortality models and \citet{McCullagh1989} in the wider context of GLMs).\\

%constraint function 
\item The \textit{set of parameter constraints}: most stochastic mortality models are only identifiable up to a transformation and thus require parameter constraints to ensure unique parameter estimates. These parameter constraints are applied {through a} \textit{constraint function} {$v$} which maps an arbitrary {vector} of {parameters}
\[\theta := \left(\alpha_{x}, \beta_x^{(1)},..., \beta_x^{(N)}, \kappa_t^{(1)},..., \kappa_t^{(N)}, \beta_x^{(0)}, \gamma_{t-x}\right)\] 
into a {vector} of {transformed parameters} {\[v(\theta)=\tilde{\theta} = \left(\tilde{\alpha}_{x}, \tilde{\beta}_x^{(1)},..., \tilde{\beta}_x^{(N)}, \tilde{\kappa}_t^{(1)},..., \tilde{\kappa}_t^{(N)}, \tilde{\beta}_x^{(0)}, \tilde{\gamma}_{t-x}\right)\]}
satisfying the model constraints {with no effect on the predictor $\eta_{xt}$ (i.e., $\theta$ and $\tilde{\theta}$ result in the same $\eta_{xt}$)}.
\end{enumerate}

Most stochastic mortality models proposed in the literature belong to the GAPC family. This includes the original Lee-Carter model, the extensions of the Lee-Carter proposed in \citet{Renshaw2003a, Renshaw2006}, the original CBD model, and the extended CBD models of \cite{Cairns2009}. In addition, all the model structures considered in \citet{Haberman2011a}, \citet{Lovasz2011}  and \citet{VanBerkum2014}, as well as the models of \cite{Plat2009a}, \cite{Aro2011}, \cite{OHare2012}, \citet{Borger2013} and \citet{Alai2014b}, are part of the GAPC class of models.\footnote{{We note however that models which rely on the smoothness of mortality over both age and time, such as the graduation approach of \citet{Renshaw1996} and the P-Spline model of \rev{\citet{Currie2006a}}, do not belong to the GAPC family. The P-Spline approach is implemented in the \pkg{MortalitySmooth} \R package \citep{Camarda2012}.}}\\

Next, we describe in detail some of these models highlighting how they can  be framed within the GAPC family.


\subsection{Lee-Carter model under a Poisson setting}\label{ex:LC}

\citet{Brouhns2002} have implemented the Lee-Carter model assuming a Poisson distribution of the number of deaths and using the log \textit{link function} with respect to the force of mortality $\mu_{xt}$. The predictor structure proposed by \citet{Lee1992} assumes that there is a static age function, $\alpha_x$, a unique non-parametric age-period term ($N=1$), and no cohort effect. Thus, the predictor is given by:
\begin{equation}\label{eq:LCpredictor}
\eta_{xt} = \alpha_x + \beta^{(1)}_x \kappa^{(1)}_t
\end{equation}
{In order to project mortality, the time index $\kappa^{(1)}_t$ is modeled and forecasted using ARIMA processes. Typically, a random walk with drift {has been shown to provide} a reasonable fit, that is,
\begin{equation*}\label{eq:LCkappa}
\kappa^{(1)}_t = \delta + \kappa^{(1)}_{t-1} + \xi_t, \qquad \xi_t\sim N(0,\sigma^2_{\kappa}) \qquad \text{i.i.d.}, 
\end{equation*}
where $\delta$ is the drift parameter and $\xi_t$ is a Gaussian white noise process with variance $\sigma^2_{\kappa}$.}\\

The Lee-Carter model is only identifiable up to a transformation, as for arbitrary real constants $c_1$ and $c_2\neq 0$ the parameters in Equation~\ref{eq:LCpredictor} can be transformed in the following way 
\begin{equation}\label{eq:LCTransformations}
\left(\alpha_x,\beta^{(1)}_x, \kappa^{(1)}_t\right)\to\left(\alpha_x + c_1\beta^{(1)}_x, \frac{1}{c_2}\beta^{(1)}_x, c_2(\kappa^{(1)}_t - c_1)\right),
\end{equation}
leaving $\eta_{xt}$ unchanged. To ensure identifiability of the model, \citet{Lee1992} suggest the  following \textit{set of parameter constraints}
\begin{equation}\label{eq:LCConstraints}
\sum_x \beta^{(1)}_x = 1, \qquad \sum_t \kappa^{(1)}_t = 0, 
\end{equation}
which can be imposed by choosing
\begin{equation}\label{eq:LCConstants}
c_1 = \frac{1}{n}\sum_t \kappa^{(1)}_t, \qquad c_2 = \sum_x \beta^{(1)}_x
\end{equation}
in transformation (\ref{eq:LCTransformations}).


\subsection{Renshaw and Haberman model: Lee-Carter with cohort effects}\label{ex:RH}
\cite{Renshaw2006} generalize the Lee-Carter model by incorporating a cohort effect to obtain the predictor:
\begin{equation}\label{eq:LCCohortEffect}
\eta_{xt} = \alpha_x + \beta^{(1)}_x \kappa^{(1)}_t   + \beta^{(0)}_x \gamma_{t-x}
\end{equation} 
{Mortality projections for this model are derived using time series forecast of the estimated $\kappa^{(1)}_t$ and $\gamma_{t-x}$, generated using univariate ARIMA processes under the assumption of independence between the period and the cohort effects.}\\

In order to estimate the model, \cite{Renshaw2006} assume a Poisson distribution of deaths (\textit{random component}) and use a log \textit{link} function targeting the force of mortality $\mu_{xt}$. As with the Lee-Carter model, the predictor $\eta_{xt}$ is invariant to the transformation:
\begin{align}
\left(\alpha_x,\beta^{(1)}_x, \kappa^{(1)}_t, \beta^{(0)}_x, \gamma_{t-x}\right)\to &\bigg(\alpha_x + c_1\beta^{(1)}_x + c_2\beta^{(0)}_x , \frac{1}{c_3}\beta^{(1)}_x, \notag\\
& c_3(\kappa^{(1)}_t - c_1), \frac{1}{c_4}\beta^{(0)}_x,  c_4(\gamma_{t-x} - c_2)\bigg),\label{eq:RHtransformations}
\end{align}
where $c_1$, $c_2$, $c_3\neq 0$ and $c_4\neq 0$ are real constants. Identifiability of the model can be ensured using the following \textit{set of parameter constraints}:
\begin{equation*}\label{eq:RHGAPCconstraints}
\sum_x \beta^{(1)}_x = 1, \qquad \sum_t \kappa^{(1)}_t = 0, \qquad \sum_x \beta^{(0)}_x = 1, \qquad \sum_{c=t_1-x_k}^{t_n - x_1}\gamma_c = 0, 
\end{equation*}
which can be imposed by setting
\begin{equation*}\label{eq:RHGAPCconsFun}
c_1 = \frac{1}{n}\sum_t \kappa^{(1)}_t, \quad c_2 = \frac{1}{n+k-1}\sum_{c=t_1-x_k}^{t_n - x_1}\gamma_c, \quad c_3 = \sum_x \beta^{(1)}_x, \quad c_4 = \sum_x \beta^{(0)}_x,
\end{equation*}
in transformation (\ref{eq:RHtransformations}).\\

\cite{Renshaw2006} also consider several substructures of the predictor (\ref{eq:LCCohortEffect}) obtained by setting to a constant one or both of the age modulating terms. Of particular interest is the substructure {obtained by setting $\beta_{x}^{(0)} = 1$,}
\begin{equation}\label{eq:CohortEffectH1}
\eta_{xt} = \alpha_x + \beta^{(1)}_x \kappa^{(1)}_t   + \gamma_{t-x},
\end{equation} 
which has been suggested by \cite{Haberman2011a} as a simpler structure that resolves some stability issues of the original model.

\subsection{APC model}\label{ex:APC}
Another commonly used substructure of the Renshaw and Haberman model is the so-called age-period-cohort (APC) model, {corresponding to $\beta_{x}^{(1)} = 1$, $\beta_{x}^{(0)} = 1$,}
\begin{equation*}\label{eq:CohortEffectAPC}
\eta_{xt} = \alpha_x +  \kappa^{(1)}_t   + \gamma_{t-x},
\end{equation*} 
which has a long-standing tradition in the fields of medicine and demography \rev{(see, e.g., \cite{Clayton1987}, \cite{Hobcraft1982} and \cite{Osmond1985})}, but has not been widely used in the actuarial literature until {it was} considered by \cite{Currie2006}. The APC model is known to be {invariant with respect to} the following two transformations:

\begin{align}
\left(\alpha_x, \kappa^{(1)}_t, \gamma_{t-x}\right) &\to \left(\alpha_x + \phi_1 - \phi_2x, \kappa^{(1)}_t + \phi_2t, \gamma_{t-x} - \phi_1 - \phi_2(t-x)\right)\label{eq:APCTransGamma}\\
\left(\alpha_x, \kappa^{(1)}_t, \gamma_{t-x}\right) &\to \left(\alpha_x + c_1, \kappa^{(1)}_t - c_1, \gamma_{t-x}\right)\label{eq:APCTransKappa},
\end{align}
where $c_1$, $\phi_1$, and $\phi_2$ are real constants. However, we can ensure identifiability of the model by imposing the \textit{set of constraints}:
\begin{equation*}\label{eq:APCconstraints}
\sum_t \kappa^{(1)}_t = 0, \qquad \sum_{c=t_1-x_k}^{t_n - x_1}\gamma_c = 0, \qquad \sum_{c=t_1-x_k}^{t_n - x_1}c\gamma_c = 0,
\end{equation*}
where the last two constraints {imply} that the cohort effect fluctuates around zero with no discernible linear trend. Following \citet[Appendix A]{Haberman2011a}, the constraints on the cohort effect can be imposed by applying transformation (\ref{eq:APCTransGamma}) with constants $\phi_1$ and $\phi_2$ obtained by regressing $\gamma_{t-x}$ on $t-x$, so that 
\begin{equation*}\label{eq:APCConstFun1}
\gamma_{t-x} = \phi_1 + \phi_2(t-x) + \epsilon_{t-x}, \quad \epsilon_{t-x} \sim N(0,\sigma^2) \quad \text{i.i.d.}.
\end{equation*}
The constraint on the period index can then be imposed by applying transformation (\ref{eq:APCTransKappa}) with 
\begin{equation*}\label{eq:APCconsFun2}
c_1 = \frac{1}{n}\sum_t \kappa^{(1)}_t.
\end{equation*}

\subsection{CBD model}\label{ex:CBD}
\citet{Cairns2006a} propose a predictor structure with two age-period terms ($N=2$) with pre-specified age-modulating parameters $\beta^{(1)}_x = 1$ and $\beta^{(2)}_x = x-\bar{x}$, no static age function and no cohort effect. Thus, the predictor of the CBD model is given by: 
\begin{equation*}\label{eq:CBDGAPCpredictor}
\eta_{xt} =  \kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)},
\end{equation*}
where $\bar{x}$ is the average age in the data. {\citet{Cairns2006a} obtain mortality forecasts by projecting the period effects $\kappa_t^{(1)}$ and $\kappa_t^{(2)}$ using a bivariate random walk with drift}.\\

The CBD model does not have identifiability issues and hence the \textit{set of parameter constraints} is empty. In order to estimate the parameter of the CBD model we can follow \citet{Haberman2011a} and  assume a Binomial distribution of deaths using a logit \textit{link function} targeting the one-year death probabilities $q_{xt}$.
\newpage
\subsection{M7: Quadratic CBD model with cohort effects}\label{ex:M7}
\cite{Cairns2009} extend the original CBD model by adding a cohort effect and a quadratic age effect to obtain the predictor:
\begin{equation}\label{eq:M7}
\eta_{xt} = \kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)} + \left((x-\bar{x})^2-\hat{\sigma}^2_x\right)\kappa^{(3)}_t  + \gamma_{t-x}, 
\end{equation}
where $\hat{\sigma}^2_x$ is the average value of $(x-\bar{x})^2$. This model, usually referred to as model M7, is not identifiable {as} the transformation
\begin{align}
\left(\kappa^{(1)}_t, \kappa^{(2)}_t, \kappa^{(3)}_t, \gamma_{t-x}\right) \to& \bigg(\kappa^{(1)}_t + \phi_1 + \phi_2 (t-\bar{x}) + \phi_3\left((t-\bar{x})^2+\hat{\sigma}^2_x\right), \kappa^{(2)}_t - \phi_2-2\phi_3(t-\bar{x}),\notag \\
 & \kappa^{(3)}_t + \phi_3, \gamma_{t-x} - \phi_1 - \phi_2(t-x) - \phi_3(t-x)^2\bigg),\label{eq:M7Trans}
\end{align}
for real constants $\phi_1$, $\phi_2$ and $\phi_3$, {leaves the predictor unchanged}. To identify the model \cite{Cairns2009} impose the \textit{set of constraints}:
\begin{equation*}\label{eq:M7constraints}
\sum_{c=t_1-x_k}^{t_n - x_1}\gamma_c = 0, \qquad \sum_{c=t_1-x_k}^{t_n - x_1}c\gamma_c = 0, \qquad \sum_{c=t_1-x_k}^{t_n - x_1}c^2\gamma_c = 0, 
\end{equation*}
which  ensure that the cohort effect fluctuates around zero and has no discernible linear or quadratic trend. Following \citet[Appendix A]{Haberman2011a}, these constraints can be imposed by applying transformation (\ref{eq:M7Trans}) with constants $\phi_1$, $\phi_2$ and $\phi_3$ obtained by regressing $\gamma_{t-x}$ on $t-x$ and $(t-x)^2$, so that 
\begin{equation*}\label{eq:M7ConstFun1}
\gamma_{t-x} = \phi_1 + \phi_2(t-x) + \phi_3(t-x)^2 + \epsilon_{t-x}, \quad \epsilon_{t-x} \sim N(0,\sigma^2) \quad \text{i.i.d.}.
\end{equation*}
\cite{Cairns2009} also consider the simpler predictor structures
\begin{align*}
\eta_{xt} &= \kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)} + \gamma_{t-x}, \\
\eta_{xt} &= \kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)} + (x_c - x)\gamma_{t-x}, 
\end{align*}
where $x_c$ is a constant parameter to be estimated. These structures are typically referred to as models M6 and M8, respectively.

\subsection{Plat model}\label{ex:Plat}
%\begin{example}[Plat model for older ages]\label{ex:Plat}
\citet{Plat2009a} combines the CBD model with some features of the Lee-Carter model to produce a model that is suitable for full age ranges and captures the cohort effect. The proposed predictor structure assumes that there is a static age function, $\alpha_x$, three age-period terms ($N=3$) with pre-specified age-modulating parameters $\beta^{(1)}_x = 1$, $\beta^{(2)}_x = \bar{x} -x$, $\beta^{(3)}_x = (\bar{x}-x)^+=\max(0, \bar{x}-x)$, and a cohort effect with pre-specified age-modulating parameters $\beta^{(0)}_x = 1$. Thus, the predictor is given by:  
\begin{equation}\label{eq:Plat}
\eta_{xt} =  \alpha_x  + \kappa_t^{(1)} + (\bar{x}-x)\kappa_t^{(2)} + (\bar{x}-x)^+\kappa^{(3)}_t + \gamma_{t-x}.
\end{equation}
\citet{Plat2009a} targets the force of mortality $\mu_{xt}$ with the log \textit{link} and estimates the parameters of the model by assuming a Poisson distribution of the deaths.  The following parameter transformations leave the predictor in (\ref{eq:Plat}) unchanged:
\begin{align}
\left( \alpha_x, \kappa^{(1)}_t, \kappa^{(2)}_t, \kappa^{(3)}_t,  {\gamma}_{t-x} \right) &\to \Big(\alpha_x + \phi_1 - \phi_2 x + \phi_3 x^2 ,\kappa^{(1)}_t + \phi_2t + \phi_3(t^2-2\bar{x}t), \notag\\
& \kappa^{(2)}_t + 2\phi_3t, \kappa^{(3)}_t, \gamma_{t-x} - \phi_1 - \phi_2(t-x) - \phi_3(t-x)^2\Big) \label{eq:PlatTrans2}\\
\left( \alpha_x, \kappa^{(1)}_t, \kappa^{(2)}_t, \kappa^{(3)}_t, \gamma_{t-x} \right) &\to \Big(\alpha_x + c_1 + c_2 (\bar{x}-x) + c_3(\bar{x}-x)^+,  \notag\\
& \kappa^{(1)}_t - c_1, \kappa^{(2)}_t - c_2, \kappa^{(3)}_t - c_3 , \gamma_{t-x} \Big) \label{eq:PlatTrans1},
\end{align}
where $c_1$, $c_2$, $c_3$, $\phi_1$, $\phi_2$, and $\phi_3$ are any real constants. The following \textit{set of parameter constraints} can be imposed to ensure identifiability:
\begin{equation} \label{eq:PlatConst}
\sum_t \kappa^{(1)}_t = 0,  \sum_t \kappa^{(2)}_t = 0, \sum_t \kappa^{(3)}_t = 0, \sum_{c=t_1-x_k}^{t_n - x_1}\gamma_c = 0, \sum_{c=t_1-x_k}^{t_n - x_1} c\gamma_c = 0, \sum_{c=t_1-x_k}^{t_n - x_1} c^2\gamma_c = 0  
\end{equation}
The first three constraints ensure that the period indexes are centered around zero, while the last three constraints ensure that the cohort effect fluctuates around zero and has no linear or quadratic trend. Following \citet[Appendix A]{Haberman2011a}, the constraints on the cohort effect can be imposed by applying transformation (\ref{eq:PlatTrans2}) with constants $\phi_1$, $\phi_2$, and $\phi_3$ obtained by regressing $\gamma_{t-x}$ on $t-x$ and $(t-x)^2$, so that 
\begin{equation}\label{eq:PlatConstFun1}
\gamma_{t-x} = \phi_1 + \phi_2(t-x) + \phi_3(t-x)^2 + \epsilon_{t-x}, \quad \epsilon_{t-x} \sim N(0,\sigma^2) \quad \text{i.i.d.}.
\end{equation}
The constraints on the period indexes can then be imposed by applying transformation (\ref{eq:PlatTrans1}) with
\begin{equation}\label{eq:PlatConstFun3}
c_i = \frac{1}{n}\sum_t \kappa^{(i)}_t, \qquad i=1,2,3.
\end{equation}
In the cases where only older ages are of interest, \cite{Plat2009a} suggests to drop the third period term from predictor (\ref{eq:Plat}):\footnote{Note that the reduced Plat model is essentially the M6 model of \cite{Cairns2009} with an added static age term $\alpha_x$.}   
\begin{equation}\label{eq:PlatOld}
\eta_{xt} = \alpha_x + \kappa^{(1)}_t + (\bar{x}-x)\kappa^{(2)}_t + \gamma_{t-x}.
\end{equation}
We note that this reduced Plat model has the same identifiability issues as the complete model with the omission of the transformations and constraints involving $\kappa^{(3)}_t$ and $c_3$. 

\newpage
\section[GAPC stochastic mortality models with StMoMo]{GAPC stochastic mortality models with \pkg{StMoMo}}\label{sec:GAPC_StMOMO}

{The \pkg{StMoMo} package provides an \R implementation of the GAPC family of stochastic mortality models using the standard \code{S3} object-oriented system. \pkg{StMoMo} can be installed with the code:}
 
<<installStMoMo, eval = FALSE>>=
install.packages("StMoMo")
@

\noindent 
The package is loaded within \R  as follows:
<<loadStMoMo, eval=TRUE,echo = TRUE, message=FALSE, warning=FALSE>>=
library("StMoMo")
@

{In the package \pkg{StMoMo}, GAPC stochastic mortality models are constructed using the \code{StMoMo} function}. The synopsis of this functions is:
\begin{Code}
 StMoMo(link = c("log", "logit"), staticAgeFun = TRUE, periodAgeFun = "NP",
   cohortAgeFun = NULL, constFun = function(ax, bx, kt, b0x, gc, wxt, ages)
   list(ax = ax, bx = bx, kt = kt, b0x = b0x, gc = gc))
\end{Code}
The \code{StMoMo} function takes as input information on the \textit{link function} (and the associated distributional assumption), the \textit{predictor structure}, and the \textit{set of parameter constraints} to create an object of {the} type \code{"StMoMo"} representing a GAPC mortality model:
\begin{itemize}
 \item The argument \code{link} defines the \textit{link function} and {\textit{the random component}} associated with the mortality model. Setting \code{link = "log"} assumes that deaths follow a Poisson distribution and uses a log link targeting the force of mortality $\mu_{xt}$,  while setting \code{link = "logit"} assumes that deaths follow a Binomial distribution and uses a logit link targeting one-year \rev{death probabilities} $q_{xt}$.
 
 \item The \textit{predictor} of the model is defined via the arguments \code{staticAgeFun}, \code{periodAgeFun} and \code{cohortAgeFun}.  Argument \code{staticAgeFun} is a logical {variable} indicating whether the model has a static age function $\alpha_x$ or not. Argument \code{periodAgeFun} is a list of length $N$ containing the definitions of the period age-modulating parameters $\beta_x^{(i)}$, $i=1,\ldots,N$, with each entry being {either} \code{"NP"} for non-parametric age terms, \code{"1"} for $\beta_x^{(i)}=1$, or a predefined parametric function of age.\footnote{{Note that we can define a model with no age-period terms ($N=0$) by making \code{periodAgeFun = NULL}.}}  Argument \code{cohortAgeFun} defines the cohort age modulating parameter $\beta_x^{(0)}$ and can take values  \code{"NP"} for non-parametric age terms, \code{"1"} for $\beta_x^{(0)}=1$, a predefined parametric function of age, or \code{NULL} if the model does not have a cohort effect.
 
 \item The \textit{set of parameter constraints} are defined via {the} argument \code{constFun} which is a user-defined {implementation of the \textit{constraint function} $v$}  mapping an arbitrary {vector} of {parameters} to a {vector} of {transformed parameters} satisfying the model constraints.
\end{itemize} 

{We note that due to limitations of the \R functions used for fitting \code{"StMoMo"} objects to
data (see Section~\ref{sec:Fitting}), the current version of \pkg{StMoMo} does not support models combining parametric and non-parametric age-modulating functions, $\beta_x^{(i)}$, $0=1,\ldots,N$. However, such models are not typically considered and the majority of models proposed in the literature are either extensions of the Lee-Carter model with all age-modulating terms being non-parametric or extensions of the CBD model with all age-modulating terms being parametric.}\footnote{{For instance, a model with predictor structure $\eta_{xt} = \alpha_x + (x-\bar{x})\kappa^{(1)}_t + \beta^{(2)}_x\kappa^{(1)}_t$, corresponding to \code{StMoMo(staticAgeFun = TRUE, periodAgeFun = c(f1, "NP"))}, with \code{f1 <- function(x, ages) x - mean(ages)}, is not supported.}}  \\

\renewcommand{\arraystretch}{1.5}
\begin{table}[!tb]
  \centering
		\begin{tabular}{m{2cm} m{10cm} }
		\toprule		
		 \textbf{Model}& \textbf{Predictor} \tabularnewline \midrule		
		LC           & $\eta_{xt} = \alpha_x + \beta^{(1)}_x \kappa^{(1)}_t$ \tabularnewline 
		CBD                   & $\eta_{xt} = \kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)}$ \tabularnewline  	
    APC                  & $\eta_{xt} = \alpha_x + \kappa^{(1)}_t+ \gamma_{t-x}$ \tabularnewline
		RH & $\eta_{xt} = \alpha_x + \beta^{(1)}_x \kappa^{(1)}_t + \gamma_{t-x}$\tabularnewline  	
		M7                   & $\eta_{xt} = \kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)} + \left((x-\bar{x})^2-\hat{\sigma}_x^2\right)\kappa_t^{(3)} + \gamma_{t-x}$ \tabularnewline
    PLAT                   & $\eta_{xt} = \alpha_x + \kappa_t^{(1)} + (\bar{x}-x)\kappa_t^{(2)} + \gamma_{t-x}$ \tabularnewline
		\bottomrule
		\end{tabular}		
		\caption{Model structures considered in this paper.}\label{tab:Models}		
\end{table}
\renewcommand{\arraystretch}{1}


In order to illustrate the creation of particular GAPC mortality models and other {capabilities} of \pkg{StMoMo}, in the rest of this paper we will focus on the models summarized in Table~\ref{tab:Models}. From now \rev{on}, LC stands for the Lee-Carter model; CBD for the original Cairns-Blake-Dowd model; \rev{APC for the age-period-cohort model}; RH for the cohort extension of the Lee-Carter model defined in Equation~\ref{eq:CohortEffectH1} and proposed by \citet{Renshaw2006}; M7 for the quadratic CBD model defined in Equation~\ref{eq:M7}; and PLAT for the reduced Plat model defined previously in Equation~\ref{eq:PlatOld}. For the sake of comparability, in all cases we will assume a Binomial distribution of deaths and use the logit function to link $q_{xt}$ to the predictor structure $\eta_{xt}$. \\

Below, we show how to define {each of the models in Table~\ref{tab:Models} using the package \pkg{StMoMo}}.

\subsubsection*{Lee-Carter model}

The LC model under a Binomial setting can be defined using the following code:
<<defineLClong, eval = FALSE>>=
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
  c1 <- mean(kt[1, ], na.rm = TRUE)
  c2 <- sum(bx[, 1], na.rm = TRUE)
  list(ax = ax + c1 * bx, bx = bx / c2, kt =  c2 * (kt - c1))   
} 
LC <- StMoMo(link = "logit", staticAgeFun = TRUE, periodAgeFun = "NP", 
  constFun = constLC)
@
\noindent
Recalling Section~\ref{ex:LC}, we note that the constraint function \code{constLC} is the \R implementation of transformation (\ref{eq:LCTransformations}) with constants $c_1$ and $c_2$ calculated using Equation~\ref{eq:LCConstants} to impose the constraints defined in Equation~\ref{eq:LCConstraints}. The \pkg{StMoMo} package also contains the function \code{lc} to facilitate the definition of Lee-Carter models. Hence, we could define the LC model using the much simpler predefined command:  
<<defineLC, eval=TRUE, cache=TRUE>>=
LC <- lc(link = "logit")
@

\subsubsection*{CBD model}

To define the CBD model we use the following commands:
<<defineCBDlong, eval = FALSE>>=
f2 <- function(x, ages) x - mean(ages)
CBD <- StMoMo(link = "logit", staticAgeFun = FALSE, 
  periodAgeFun = c("1", f2))
@
\noindent
Here, we note that function \code{f2} defines the second age-modulating parameter $\beta^{(2)}_x = x - \bar{x}$ and that a \code{constFUN} argument need not be provided since the CBD model does not have identifiability issues. Alternatively, we can define the CBD model using the predefined function \code{cbd}:
<<defineCBD, eval=TRUE, cache=TRUE>>=
CBD <- cbd()
@


\subsubsection*{APC model, RH model and M7 model}
The APC, RH and M7 models could be defined by implementing explicitly the discussions in Sections~\ref{ex:RH},~\ref{ex:APC} and~\ref{ex:M7}. However, \pkg{StMoMo} includes predefined functions \code{apc}, \code{rh}, \code{m7} that facilitate the definition of the APC model, the RH model and model M7, respectively.\footnote{The \pkg{StMoMo} package also includes functions \code{m6} and \code{m8} for defining models M6 and M8. We also note that the \citet{Renshaw2006} cohort extension of the Lee-Carter in Equation~\ref{eq:LCCohortEffect} can be defined using the function \code{rh} with argument \code{cohortAgeFun = "NP"}.} Thus, these models are defined with the code:
<<defineAPC_RH_M7, eval=TRUE, cache=TRUE>>=
RH <- rh(link = "logit",  cohortAgeFun = "1")
APC <- apc(link = "logit")
M7 <- m7()
@  

\subsubsection*{PLAT model}
The package \pkg{StMoMo} does not include a predefined function for the Plat model. Nevertheless, recalling Section~\ref{ex:Plat}, we can define the reduced Plat model using the code:
<<definePlat, eval=TRUE, cache=TRUE>>=
f2 <- function(x, ages) mean(ages) - x
constPlat <- function(ax, bx, kt, b0x, gc, wxt, ages){
  nYears <- dim(wxt)[2]
  x <- ages
  t <- 1:nYears
  c <- (1 - tail(ages, 1)):(nYears - ages[1])
  xbar <- mean(x)
  phiReg <- lm(gc ~ 1 + c + I(c ^ 2), na.action = na.omit)
  phi <- coef(phiReg)
  gc <- gc - phi[1] - phi[2] * c - phi[3] * c ^ 2 
  kt[2, ] <- kt[2, ] + 2 * phi[3] * t
  kt[1, ] <- kt[1, ] + phi[2] * t + phi[3] * (t ^ 2 - 2 * xbar * t)  
  ax <- ax + phi[1] - phi[2] * x + phi[3] * x ^ 2
  ci <- rowMeans(kt, na.rm = TRUE)
  ax <- ax + ci[1] + ci[2] * (xbar - x) 
  kt[1, ] <- kt[1, ] - ci[1]
  kt[2, ] <- kt[2, ] - ci[2]
  list(ax = ax, bx = bx, kt = kt, b0x = b0x, gc = gc)
}
PLAT <- StMoMo(link = "logit", staticAgeFun = TRUE,
  periodAgeFun = c("1", f2), cohortAgeFun = "1", constFun = constPlat)
@
\noindent
We note that the constraint function \code{constPlat} is the \R implementation of transformations (\ref{eq:PlatTrans2}) and (\ref{eq:PlatTrans1}) omitting the terms involving $\kappa^{(3)}$ and $c_3$, and with constants $\phi_1$, $\phi_2$, $\phi_3$ obtained via the linear regression defined in (\ref{eq:PlatConstFun1}) and constants $c_1$ and $c_2$ as in Equation~\ref{eq:PlatConstFun3}.  Function \code{constPlat} imposes the constraints in Equation~\ref{eq:PlatConst}.\\


\section{Model fitting}\label{sec:Fitting}
Parameter estimates of GAPC stochastic mortality models can be obtained by maximizing the model log-likelihood, which is given by
\begin{equation*}\label{eq:PoissonLogL}
\LogL({d_{xt}},{\hat{d}_{xt}}) = \sum_{x}  \sum_{t}\omega_{xt}\left\{{d_{xt}}\log{\hat{d}_{xt}}-\hat{d}_{xt}-\log{d_{xt}!}\right\}
\end{equation*}
in the case of a Poisson distribution of deaths, and by
\begin{equation*}\label{eq:BinomialLogL}
\LogL({d_{xt}},{\hat{d}_{xt}}) = \sum_{x}  \sum_{t}\omega_{xt}\left\{{d_{xt}}\log\left(\frac{\hat{d}_{xt}}{E^0_{xt}}\right) + (E^0_{xt}-d_{xt})\log\left(\frac{E^0_{xt}-\hat{d}_{xt}}{E^0_{xt}}\right) + \rev{\log}\binom{E^0_{xt}}{d_{xt}} \right\}
\end{equation*}
in the case of a Binomial distribution of deaths. {In both cases, $\omega_{xt}$ are weights taking the value $0$ if a particular $(x,t)$ data cell is omitted or $1$ if the cell is included, and}
\[\hat{d}_{xt}=E_{xt}\,g^{-1}\left(\alpha_x + \sum_{i=1}^N \beta_x^{(i)}\kappa_t^{(i)} + \beta_x^{(0)}\gamma_{t-x}\right)\] is the expected number of deaths predicted by the model, with $g^{-1}$ denoting the inverse of the \textit{link function} $g$.\\

In the mortality literature, maximization of the log-likelihood is typically performed using the Newton-Raphson iterative procedure tailored for each model (see, e.g., \citet{Brouhns2002}, \citet{Renshaw2006} and  \citet{Cairns2009}). This is in fact the approach implemented in {the} packages \pkg{ilc} and \pkg{LifeMetrics}. Nonetheless, as discussed extensively by \cite{Currie2014}, many stochastic mortality models are examples of generalized linear models or generalized non-linear model, which facilitates their fitting using standard statistical software.\footnote{\citet{Haberman2011a} have also noticed this fact and profit from GLM facilities in standard statistical packages when fitting CBD type models.} \citet{Currie2014} exemplifies this fact by fitting several stochastic mortality models in \R using the standard function \code{glm} or the function \code{gnm} of {the} package \pkg{gnm} \citep{Turner2012}.\footnote{{\citet{Debon2010} also discuss the use of the package \pkg{gnm} for fitting Lee-Carter type models.}}\\

\pkg{StMoMo} provides the generic function \code{fit} for estimating the parameters of GAPC mortality models. In line with the remarks of \citet{Currie2014}, the corresponding \code{S3} method for objects of {the} class \code{"StMoMo"} \rev{heavily} relies on the function \code{gnm} of {the} package \pkg{gnm} to estimate the parameters of a GAPC model.%\footnote{The current version of \pkg{StMoMo} only includes the \code{S3} method \code{fit.StMoMo} for objects of class \code{"StMoMo"}, but future versions may provide tailored \code{fit} methods for particular stochastic mortality models.} 
Internally, this is accomplished by constructing the equivalent \pkg{gnm} formulation of the GAPC mortality model.\footnote{We note that when all the $\beta^{(i)}_x$ are parametric functions of age, the model is a GLM and therefore \code{gnm} by default resorts to the \code{glm} function of \R when fitting the parameters of the model.} For instance, the \pkg{gnm} formula of the Binomial \code{LC} model created before is
<<showGNMformulaLC, eval=TRUE, cache=TRUE, dependson="defineLC">>=
LC$gnmFormula
@  
\noindent
while the \pkg{gnm} formula of the Binomial \code{CBD} model defined before is
<<showGNMformulaCBD, eval=TRUE, cache=TRUE, dependson="defineCBD">>=
CBD$gnmFormula
@  

We now illustrate the usage of the function \code{fit} of {the} package \pkg{StMoMo} by fitting the six models defined before to England and Wales mortality data. \rev{Function \code{fit} expects the user to provide a list of class \code{"StMoMoData"} containing deaths and exposures in a matrix format with ages on the rows and calendar years on the columns. Such a type of list can readily be created using the \code{StMoMoData} function as exemplified in Section \ref{sec:ParameterUncertainty}. For illustration purposes, the object \code{EWMaleData}, included in {the} package \pkg{StMoMo}, contains deaths counts (\code{EWMaleData\$Dxt}) and central exposures (\code{EWMaleData\$Ext}) for England and Wales males for the period 1961-2011 and for ages 0-100 obtained from the \cite{HumanMortalityDatabase2014}.} 
<<EWData, eval=TRUE, cache=TRUE>>=
EWMaleData
@

{However, in our examples we concentrate} on ages 55 to 89 as the CBD model and the M7 model have been particularly designed to fit higher ages. Additionally, since some models include cohort effects and in agreement with the usual practice (see e.g \citet{Cairns2009} and \citet{Haberman2011a}), we exclude (by setting $\omega_{xt}=0$) all cohorts that have fewer than three observations. \rev{Missing values for either death counts or exposures are automatically zero-weighted.}\\

Models LC, APC, CBD, M7 and PLAT can be fitted to England and Wales male mortality data for ages 55 to 89 using the code:
<<fit1, eval=TRUE, cache=TRUE, results=hide, dependson="defineLC";"defineAPC_RH_M7";"definePlat">>=
EWMaleIniData <- central2initial(EWMaleData)
ages.fit <- 55:89
wxt <- genWeightMat(ages = ages.fit, years = EWMaleIniData$years, 
  clip = 3)
LCfit <- fit(LC, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
APCfit <- fit(APC, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
CBDfit <- fit(CBD, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
M7fit <- fit(M7, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
PLATfit <- fit(PLAT, data = EWMaleIniData, ages.fit = ages.fit, wxt = wxt)
@
\noindent
From this code we note the following:
\begin{itemize}
  \item In order to match the logit-Binomial setting used before in the definition of the mortality models, initial exposures are approximated by transforming the available central exposures. \rev{This is accomplished using the utility function \code{central2initial} of package \pkg{StMoMo}.}
  \item The first and last three cohorts years are excluded from the fitting via the argument \code{wxt}. The appropriate 0-1 weighting matrix, \code{wxt}, is constructed using the utility function \code{genWeightMat} of package \pkg{StMoMo}.
\end{itemize}

The fitting of the RH model requires some care as it is well known that fitting cohort extensions of the Lee-Carter models is problematic \citep{Hunt2014a}. In particular, \cite{Currie2014} has encountered convergence issues when using package \pkg{gnm} to fit the RH model. As a possible way to circumvent these issues, \cite{Currie2014} suggests the use of appropriate starting values when fitting model RH. This can be achieved in the function \code{fit} via input arguments \code{start.ax}, \code{start.bx}, \code{start.kt}, \code{start.b0x}, and \code{start.gc}. Using the parameters of the Lee-Carter model as starting values, model RH can be fitted with the code:\footnote{\rev{\pkg{StMoMo} also implements the fitting method suggested by \citet{Hunt2014a} to solve the convergence issues of the RH model. This can be acomplished by setting argument \code{approxConst = TRUE} when defining the RH model, that is, \code{rh(approxConst=TRUE).}}}
  
<<fitRH, eval=TRUE, cache=TRUE, results=hide, dependson="defineAPC_RH_M7">>=
RHfit <- fit(RH, data = EWMaleIniData,  ages.fit = ages.fit, wxt = wxt, 
  start.ax = LCfit$ax, start.bx = LCfit$bx, start.kt = LCfit$kt)
@

The output from the function \code{fit} is an object of {the} class \code{"fitStMoMo"} including, among other things, the following information:
\begin{itemize}  
  \item \code{model}: the \code{"StMoMo"} object defining the underlying GAPC stochastic mortality model;
  \item \code{ax}, \code{bx}, \code{kt}, \code{b0x}, \code{gc}: the estimated parameters;
  \item \code{loglik}: the log-likelihood of the model;
  \item \code{deviance}: the model deviance;
  \item \code{nobs}: the number of observations in the data;
  \item \code{npar}: the effective number of parameters of the model;  
  \item \code{fittingModel}: the output of the \code{gnm} call used to fit the model.
\end{itemize}
There are {\code{print}}, \code{plot}, \code{fitted}, and \code{residuals} methods for the \code{"fitStMoMo"} class. For instance, Figures~\ref{fig:ParamLC},~\ref{fig:ParamCBD} and~\ref{fig:ParamAPC} depicting the fitted parameters of the LC model, the CBD model and the APC model, respectively, were produced with the code:
<<plotParam, eval=FALSE>>=
plot(LCfit, nCol = 3)
plot(CBDfit, parametricbx = FALSE)
plot(APCfit, parametricbx = FALSE, nCol = 3)
@ 

\setkeys{Gin}{width=1.0\textwidth}
\begin{figure}[!bt]  
\centering
<<plotLC, fig=TRUE, width=8, height = 2.8, echo=FALSE, cache=TRUE, dependson="fit1">>=
plot(LCfit, nCol = 3)
@ 
\caption{Parameters for the Lee-Carter (LC) model fitted to the England and Wales male population for ages 55-89 and the period 1961-2011.}\label{fig:ParamLC}  
\end{figure}

\setkeys{Gin}{width=0.6875\textwidth}
\begin{figure}[!tb]  
\centering
<<plotCBD, fig=TRUE, width=8, height = 4.2, echo=FALSE, cache=TRUE, dependson="fit1">>=
plot(CBDfit, parametricbx = FALSE)
@ 
\caption{Parameters for the CBD model fitted to the England and Wales male population for ages 55-89 and the period 1961-2011.}\label{fig:ParamCBD}  
\end{figure}

\setkeys{Gin}{width=1.0\textwidth}
\begin{figure}[!bt]  
\centering
<<plotAPC, fig=TRUE, width=8, height = 2.8, echo=FALSE, cache=TRUE, dependson="fit1">>=
plot(APCfit, parametricbx = FALSE, nCol = 3)
@ 
\caption{Parameters for the APC model fitted to the England and Wales male population for ages 55-89 and the period 1961-2011.}\label{fig:ParamAPC}  
\end{figure}


\section{Goodness-of-fit analysis}\label{sec:GoF}


<<resAll, eval=TRUE, echo=FALSE, cache=TRUE, dependson="fit1";"fitRH">>=
LCres <- residuals(LCfit)
CBDres <- residuals(CBDfit)
APCres <- residuals(APCfit)
RHres <- residuals(RHfit)
M7res <- residuals(M7fit)
PLATres <- residuals(PLATfit)
@ 
{The goodness-of-fit of mortality models is typically analyzed} by inspecting the residuals of the fitted model. Regular patterns in the residuals indicate the inability of the model to describe all the features of the data appropriately. With a Poisson or Binomial random component, it is appropriate to look at the scaled deviance residuals defined as:  

\begin{equation*}\label{eq:Residuals}
r_{xt}=\mathrm{sign}(d_{xt}-\hat{d}_{xt})\sqrt{\frac{\mathrm{dev}(x,t)}{\hat{\phi}}}, \qquad \hat{\phi} = \frac{D(d_{xt},\hat{d}_{xt})}{K-\nu},
\end{equation*}
where 
\[\mathrm{dev}(x,t) = \rev{2\left[d_{xt}\log\left(\frac{d_{xt}}{\hat{d}_{xt}}\right) - (d_{xt} - \hat{d}_{xt})\right]}\]
for a Poisson random component, or
\[\mathrm{dev}(x,t) = \rev{2\left[d_{xt}\log\left(\frac{d_{xt}}{\hat{d}_{xt}}\right) + (E^0_{xt}-d_{xt})\log\left(\frac{E^0_{xt}-d_{xt}}{E^0_{xt}-\hat{d}_{xt}}\right)\right]}\]
for a Binomial random component. {Further,
\[D(d_{xt},\hat{d}_{xt}) = \sum_{x}  \sum_{t}\omega_{xt}\mathrm{dev}(x,t)\] 
is the total deviance of the model, $K=\sum_x\sum_t\omega_{xt}$ is the number of observations in the data and} $\nu$ is the effective number of parameters in the model.\\

In \pkg{StMoMo} standardized deviance residuals can be obtained with the generic function \code{residuals} applied to a fitted stochastic mortality model of {the} class \code{"fitStMoMo"}. For example, to obtain the residuals of the LC model and of the CBD model fitted before we use the commands:
<<resLC_CBD, eval=FALSE>>=
LCres <- residuals(LCfit)
CBDres <- residuals(CBDfit)
@ 
\noindent
Graphs of these residuals can be produced using the generic function \code{plot}. This function supports, via the argument \code{type}, three types of plots: 
\begin{itemize}
\item scatter plots of residuals by age, period and cohort such as those extensively used in \citet{Haberman2011a};
\item black and white sign-plots of the residuals such as those used in \citet{Cairns2009} and \citet{Lovasz2011}; and 
\item color maps (heat-maps) of the residuals. 
\end{itemize}


Figure~\ref{fig:ResHeatmap} presents heat-maps of the deviance residuals for the six models fitted to the England and Wales male mortality experience. These charts were produced using function \code{plot} with option \code{type = "colourmap"}. For instance, Figure~\ref{fig:ResHeatmapLC} was obtained with the code:  
<<plotLCresShow, eval=FALSE>>=
plot(LCres, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
\noindent
From Figure~\ref{fig:ResHeatmap} we see that models LC, CBD and APC display strong residual patterns while the residuals of models RH, M7 and PLAT look reasonably random. The APC model shows a strong clustering of residuals due to its inability to allow for varying improvement rates with age. The LC and CBD models, which do not incorporate a cohort effect, show very marked diagonals patterns indicating the inability of these models to capture the well-known cohort effect observed in the England and Wales population \citep{Willets2004}. The issues with the fit of the LC and CBD models become more evident when looking at scatter plots of the residuals by age, period and cohort. Such plots for the LC model (Figure~\ref{fig:ResScatterLC}) and the CBD model (Figure~\ref{fig:ResScatterCBD}) can be produced using argument \code{type = "scatter"} of the function \code{plot} via the commands:
<<plotResShow, eval=FALSE>>=
plot(LCres, type = "scatter", reslim = c(-3.5, 3.5))
plot(CBDres, type = "scatter", reslim = c(-3.5, 3.5))
@ 
\noindent

\setkeys{Gin}{width=0.5\textwidth}
\begin{figure}[!htb]  
    \centering
    \subfloat[LC]{
<<plotLCres, fig=TRUE, width=4, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
par(mar=c(4.5, 4, 1, 1))
plot(LCres, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
\label{fig:ResHeatmapLC}}
  	\subfloat[CBD]{
<<plotCBDres, fig=TRUE, width=4, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
par(mar=c(4.5, 4, 1, 1))
plot(CBDres, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
\label{fig:ResHeatmapCBD}}\,		
\subfloat[APC]{
<<plotAPCres, fig=TRUE, width=4, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
par(mar=c(4.5, 4, 1, 1))
plot(APCres, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
}
\subfloat[RH]{
<<plotRHres, fig=TRUE, width=4, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
par(mar=c(4.5, 4, 1, 1))
plot(RHres, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
}\,
\subfloat[M7]{
<<plotM7res, fig=TRUE, width=4, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
par(mar=c(4.5, 4, 1, 1))
plot(M7res, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
}
\subfloat[PLAT]{
<<plotPLATres, fig=TRUE, width=4, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
par(mar=c(4.5, 4, 1, 1))
plot(PLATres, type = "colourmap", reslim = c(-3.5, 3.5))
@ 
}
		\caption{Heat-maps of deviance residuals for different model fitted to the England and Wales males population for ages 55-89 and the period 1961-2011.}\label{fig:ResHeatmap}		
\end{figure}

\setkeys{Gin}{width=1.0\textwidth}
\begin{figure}[!htb]  
    \centering
    \subfloat[LC]{
<<plotLCresScatter, fig=TRUE, width=10, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
plot(LCres, type = "scatter", reslim = c(-3.5, 3.5))
@ 
\label{fig:ResScatterLC}}\,
\subfloat[CBD]{
<<plotCBDresScatter, fig=TRUE, width=10, height=3, echo=FALSE, cache=TRUE, dependson="resAll">>=
plot(CBDres, type = "scatter", reslim = c(-3.5, 3.5))
@ 
\label{fig:ResScatterCBD}} 	
\caption{Scatter plots of deviance residuals for models LC and CBD fitted to the England and Wales males population for ages 55-89 and the period 1961-2011.}\label{fig:ResScatter}		
\end{figure}


The right panels in Figure~\ref{fig:ResScatter} clearly show that the LC and CBD models are unable to capture the cohort effect. In addition, the left panel in Figure~\ref{fig:ResScatterCBD} reveals some strong patterns by age, reflecting the lack of a quadratic age term in the CBD which may be necessary to capture the commonly observed curvature of the mortality rates in a logit scale.\\


When evaluating the goodness-of-fit of different models, it is generally anticipated that models with more parameters provide a better fit to the data. To rule out the possibility that the better fit observed in a model is the result of over-parametrization and compare the relative performance of several models, it has become common in the mortality literature to use information criteria which modify the maximum likelihood criterion by penalizing models with more parameters.\footnote{For examples of such analysis see \citet[Section 6.1.1]{Cairns2009}, \citet[Section 3.3]{Haberman2011a}, and \citet[Section 4.1]{Lovasz2011}.} Two of these criteria are the Akaike Information Criteria (AIC) and the Bayesian Information Criteria (BIC), defined as $AIC = 2\nu-2\LogL$ and $BIC = \nu \log K - 2\LogL$, respectively, with a lower value of AIC and BIC being preferable. In \R these information criteria can be computed using the generic functions \code{AIC} and \code{BIC}. For example, we can get the AIC and BIC of the CBD model as follows:
<<BICCBD, cache=TRUE, dependson="fit1">>=
AIC(CBDfit)
BIC(CBDfit)
@ 
Table~\ref{tab:InfoCriteria} presents AIC and BIC values for the six models fitted to the England and Wales male data. We note that both criteria lead to the same ranking of models with M7, PLAT, and RH being the best performing models. Overall, these results are consistent with the existing literature comparing single population models, where the Renshaw-Haberman extension of the Lee-Carter model and the M7 model have been identified as good candidates for modeling mortality in the England and Wales population \citep{Cairns2009,Haberman2011a}.   
<<BICtable, echo=FALSE, cache=TRUE, dependson="fit1", results=tex, message=FALSE, warning=FALSE>>=
library(xtable)
BICdf <-data.frame(list(LC = c(LCfit$npar[1], AIC(LCfit), BIC(LCfit)), 
                        CBD = c(CBDfit$npar[1], AIC(CBDfit), BIC(CBDfit)), 
                        APC = c(APCfit$npar[1], AIC(APCfit), BIC(APCfit)), 
                        RH = c(RHfit$npar[1], AIC(RHfit), BIC(RHfit)), 
                        M7 = c(M7fit$npar[1], AIC(M7fit), BIC(M7fit)), 
                        PLAT = c(PLATfit$npar[1], AIC(PLATfit), BIC(PLATfit))))

rownames(BICdf) <- c("Number of parameters", "AIC", "BIC")
AIC.rank <- rank(BICdf[2,])
BIC.rank <- rank(BICdf[3,])
temp <- BICdf
BICdf[2,] <- paste(round(temp[2,],0), "(",AIC.rank,")", sep = "")
BICdf[3,] <- paste(round(temp[3,],0), "(",BIC.rank,")", sep = "")
xtable(BICdf, dec = 1, caption = "Number of parameters, AIC and BIC values (with their respective rankings in brackets) for different model fitted to the England and Wales males population for ages 55-89 and the period 1961-2011.", center = "centering", file = "", floating = TRUE,label="tab:InfoCriteria",align="lcccccc", digits = 0)
@ 

\section{Forecasting and simulation with stochastic mortality models}\label{sec:Forecasting}
In the family of GAPC stochastic mortality models the dynamics of mortality are driven by the period indexes $\kappa^{(i)}_t$, $i=1,\ldots, N$, and the cohort index $\gamma_{t-x}$. Therefore, the forecasting and simulation of mortality rates requires the modeling of these indexes using time series techniques.\\

\rev{For the period indexes we consider two alternative modelling approaches.  A first possibility is to use the standard approach in the actuarial literature \citep{Cairns2006a, Cairns2011a, Haberman2011a, Lovasz2011} and assume that the period indexes follow a multivariate random walk with drift. That is,}

%For the period indexes we follow the standard approach \citep{Cairns2006a, Cairns2011a, Haberman2011a, Lovasz2011} and use a multivariate random walk with drift. Specifically, we assume that
\begin{equation}\label{eq:GAPCPeriodIndex}
\boldsymbol{\kappa}_t = \boldsymbol{\delta} + \boldsymbol{\kappa}_{t-1} + \boldsymbol{\xi}_t^\kappa, \qquad \boldsymbol{\kappa}_t = 
\left(\!
    \begin{array}{c}
      \kappa_t^{(1)} \\
      \vdots\\
  		\kappa_t^{(N)}
    \end{array}
  \!\right), \qquad \boldsymbol{\xi}_t^\kappa\sim N(\mathbf{0},\Sigma),
\end{equation}
where {$\boldsymbol{\delta}$} is an $N$-dimension vector of drift parameters and $\Sigma$ is the $N\times N$ variance-covariance matrix of the multivariate white noise $\boldsymbol{\xi}_t^\kappa$. \\

\rev{A second alternative is to assume that the individual period indexes, $\kappa_t^{(i)}$, $i=1,\ldots,N$, follow a general univariate ARIMA model. Under this approach, the $i$-th period index, $\kappa_t^{(i)}$, is assumed to follow  an ARIMA$(p_i,q_i,d_i)$ with drift, so that}

\begin{equation}\label{eq:GAPCPeriodArima}
\Delta^{d_i}\kappa_t^{(i)} =  \delta_{0}^{(i)} + \phi_1^{(i)}\Delta^{d_i}\kappa^{(i)}_{t-1} +\cdots+ \phi^{(i)}_{p_i}\Delta^{d_i}\kappa^{(i)}_{t-p_i}+\xi^{(i)}_{t}+\delta^{(i)}_1\xi^{(i)}_{t-1}+\cdots+\delta^{(i)}_{q_i}\xi^{(i)}_{t-q_i},
\end{equation}
\rev{where $\Delta$ is the difference operator, $\delta_0^{(i)}$ is the drift parameter, $\phi_1^{(i)},\dots,\phi^{(i)}_{p_i}$ are the autoregressive coefficients with $\phi_{p_i}\neq 0$,  $\delta_1^{(i)},\dots,\delta^{(i)}_{q_i}$ are the moving average coefficients with $\delta^{(i)}_{q_i}\neq 0$ and $\xi^{(i)}_t$ is a Gaussian white noise process with variance $\sigma_{\xi}^{(i)}$.}\\ 


As pointed out by \citet{Currie2014}, the main challenge when forecasting stochastic mortality models is specifying the dynamics of the cohort effect. To have a simple starting point, we follow previous studies \citep{Renshaw2006, Cairns2011a, Lovasz2011} and assume that the cohort index, $\gamma_{t-x}$, follows a univariate ARIMA process which is independent of the period index, $\boldsymbol{\kappa}_t$. In general, we assume that $\gamma_c\equiv \gamma_{t-x}$ follows an ARIMA$(p,q,d)$ with drift, so that 
\begin{equation}\label{eq:GAPCCohortIndex}
\Delta^d\gamma_c =  \delta_0 + \phi_1\Delta^d\gamma_{c-1} +\cdots+ \phi_p\Delta^d\gamma_{c-p}+\epsilon_{c}+\delta_1\epsilon_{c-1}+\cdots+\delta_q\epsilon_{c-q},
\end{equation}
\rev{where $\epsilon_c$ is a Gaussian white noise process with variance $\sigma_{\epsilon}$.}\\ 

The time series models in \rev{Equations~\ref{eq:GAPCPeriodIndex}, \ref{eq:GAPCPeriodArima} and \ref{eq:GAPCCohortIndex}} can be used to obtain projected (simulated) values of the period index  $\dot{\boldsymbol{\kappa}}_{t_n+s}:=\left(\dot{\kappa}^{(1)}_{t_n+s}, \ldots, \dot{\kappa}^{(N)}_{t_n+s}\right)'$ and cohort index $\dot{\gamma}_{t_n + s - x_1}$, $s=1,\ldots,h$, to derive forecasted (simulated)  values of the predictor
\begin{equation*}\label{eq:ProjectedPredictor}
\dot{\eta}_{x,t_n+s} = \alpha_x + \sum_{i=1}^N \beta_x^{(i)}\dot{\kappa}_{t_n+s}^{(i)} + \beta_x^{(0)}\dot{\gamma}_{t_n+s-x},
\end{equation*}
which can in turn be used to obtain forecasted (simulated) age-specific central mortality rates, $\dot{\mu}_{x,t_n+s}$ or age-specific one-year \rev{death probabilities}, $\dot{q}_{x,t_n+s}$.\\

\renewcommand{\arraystretch}{1.5}
\begin{table}[tb]
  \centering
		\begin{tabular}{m{4cm} m{8cm} }
		\toprule		
		 \textbf{Mortality model}& \textbf{Model for $\gamma_{t-x}$} \tabularnewline \midrule		
    APC                      & ARIMA$(1,1,0)$ with drift \tabularnewline
		RH                       & ARIMA$(1,1,0)$ with drift\tabularnewline  	
		M7                       & ARIMA$(2,0,0)$ with non-zero intercept\tabularnewline
    PLAT                     & ARIMA$(2,0,0)$ with non-zero intercept \tabularnewline
		\bottomrule
		\end{tabular}		
		\caption{ARIMA models for the cohort effect for models APC, RH, M7 and PLAT.}\label{tab:ARIMAModels}  	
\end{table}
\renewcommand{\arraystretch}{1}


In {the} package \pkg{StMoMo} the forecasting of GAPC stochastic mortality models is implemented via the generic method \code{forecast}. This function estimates and forecasts the multivariate random walk with drift in Equation~\ref{eq:GAPCPeriodIndex} using the approach described in \citet[Appendix B]{Haberman2011a}\footnote{This is implemented in the function \code{mrwd} of {the} package \pkg{StMoMo}.} and uses function \code{Arima} of package \pkg{forecast} \citep{Hyndman2008, Hyndman2015} to estimate and forecast the ARIMA processes of \rev{Equation~\ref{eq:GAPCPeriodArima} and Equation~\ref{eq:GAPCCohortIndex}}. For instance, if we assume \rev{that the period indexes follow a multivariate random walk with drift (the default in \pkg{StMoMo}) and} that the cohort indexes of the APC, RH, M7, and PLAT model follow the ARIMA processes specified in Table~\ref{tab:ARIMAModels}, 50-year ahead ($h=50$) central projections of the period indexes, cohort index, and one-year death probabilities for the England and Wales mortality experience can be obtain with the code:
<<forAll, cache=TRUE, dependson="fit1";"fitRH">>=
LCfor <- forecast(LCfit, h = 50)
CBDfor <- forecast(CBDfit, h = 50)
APCfor <- forecast(APCfit, h = 50, gc.order = c(1, 1, 0))
RHfor <- forecast(RHfit, h = 50, gc.order = c(1, 1, 0))
M7for <- forecast(M7fit, h = 50, gc.order = c(2, 0, 0))
PLATfor <- forecast(PLATfit, h = 50, gc.order = c(2, 0, 0))
@ 
\rev{Alternatively, we could assume that the period indexes follow independent univariate ARIMA models. This is achieved in function \code{forecast} by setting the argument \code{method = "iarima"}. For example, projections of the Lee-Carter model under the assumption that $\kappa_t^{(1)}$ follows an ARIMA(1,1,2) with drift are produced with the code:}

<<forLCArima, cache=TRUE, dependson="fit1">>=
LCforArima <- forecast(LCfit, h = 50, kt.method = "iarima", 
  kt.order = c(1, 1, 2))
@ 
\rev{By setting \code{kt.order = NULL} the function \code{auto.arima} from package \pkg{forecast} selects the best ARIMA process for each period index.}

The output from the function \code{forecast} is an object of {the} class \code{"forStMoMo"} including, among other things, the following information:
\begin{itemize}  
  \item \code{rates}: a matrix with the central projection of the mortality rates, $\dot{\mu}_{x,t_n+s}$ or $\dot{q}_{x,t_n+s}$, $s=1,\ldots,h$;
  \item \code{kt.f}: a list containing information on the multivariate random walk with drift fitted to the period index $\boldsymbol{\kappa}_t$; and 
  \item \code{gc.f}: a list containing information on the ARIMA model fitted to the cohort index $\gamma_{t-x}$.
  \end{itemize}
There are {\code{print}} and \code{plot} methods for the \code{"forStMoMo"} class. For instance, plots of the forecast of the period indexes of models LC, RH, M7 and PLAT (see Figures~\ref{fig:PeriodIndexForecast} and \ref{fig:PeriodIndexForecastLC}) can be produced using the code:  

<<plotForecastPeriodIndexShow, eval=FALSE>>=  
plot(LCfor, only.kt = TRUE)
plot(LCforArima, only.kt = TRUE)
plot(RHfor, only.kt = TRUE)
plot(M7for, only.kt = TRUE, nCol = 3)
plot(PLATfor, only.kt = TRUE)
@
\noindent
Similarly, plots of the forecast of the cohort indexes of the APC, RH, M7 and PLAT models (see Figure~\ref{fig:CohortIndexForecast}) can be obtained with the commands: 
<<plotForecastCohortIndexShow, eval=FALSE>>=  
plot(APCfor, only.gc = TRUE)
plot(RHfor, only.gc = TRUE)
plot(M7for, only.gc = TRUE)
plot(PLATfor, only.gc = TRUE)
@

\begin{landscape}
\begin{figure}[!htb]  
    \centering
\setkeys{Gin}{width=0.45\textwidth}    
    \subfloat[RH]{
<<plotForecastPeriodIndexRH, fig=TRUE, width=5.2, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(RHfor, only.kt = TRUE, lwd = 2)
@ 
\label{fig:PeriodRH}}
\setkeys{Gin}{width=0.9\textwidth}
\subfloat[PLAT]{
<<plotForecastPeriodIndexPLAT, fig=TRUE, width=10.4, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(PLATfor, only.kt = TRUE, lwd = 2)
@ 
\label{fig:PeriodPLAT}}\,   
\setkeys{Gin}{width=1.35\textwidth}
\subfloat[M7]{
<<plotForecastPeriodIndexM7, fig=TRUE, width=10.4, height=2.8, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(M7for, only.kt = TRUE, nCol = 3, lwd = 2)
@ 
\label{fig:PeriodM7}}   
\caption{Forecast of the period indexes of the RH, M7 and PLAT models applied to the England and Wales males population for ages 55-89 and the period 1961-2011. Dashed lines represent central forecast and dotted lines represent 95\% prediction intervals.}\label{fig:PeriodIndexForecast}    
\end{figure}
\end{landscape}


\setkeys{Gin}{width=0.5\textwidth}
\begin{figure}[!tb]  
    \centering    
    \subfloat[LC with random walk with drift]{
<<plotForecastLCRW, fig = TRUE, width=6, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(LCfor, only.kt = TRUE, lwd = 2)
@ 
\label{fig:PeriodLCRW}}
    \subfloat[LC with ARIMA(1,1,2) with drift]{
<<plotForecastLCArima, fig = TRUE, width=6, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(LCforArima, only.kt = TRUE, lwd = 2)
@ 
\label{fig:PeriodLCARIMA}}
\caption{Forecast of the period indexes of the LC model with random walk with drift and ARIMA(1,1,2) with drift applied to the England and Wales males population for ages 55-89 and the period 1961-2011. Dashed lines represent central forecast and dotted lines represent 95\% prediction intervals.}\label{fig:PeriodIndexForecastLC}    
\end{figure}

\setkeys{Gin}{width=0.5\textwidth}
\begin{figure}[!tb]  
    \centering    
    \subfloat[APC]{
<<plotForecastCohortIndexAPC, fig = TRUE, width=6, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(APCfor, only.gc = TRUE, lwd = 2)
@ 
\label{fig:CohortAPC}}
    \subfloat[RH]{
<<plotForecastCohortIndexRH, fig = TRUE, width=6, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(RHfor, only.gc = TRUE, lwd = 2)
@ 
\label{fig:CohortRH}}\,
    \subfloat[M7]{
<<plotForecastCohortIndexM7, fig = TRUE, width=6, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(M7for, only.gc = TRUE, lwd = 2)
@ 
\label{fig:CohortM7}}
    \subfloat[PLAT]{
<<plotForecastCohortIndexPLAT, fig = TRUE, width=6, height=4, echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(PLATfor, only.gc = TRUE, lwd = 2)
@ 
\label{fig:CohortPLAT}}
\caption{Forecast of the cohort indexes of the APC, RH, M7 and PLAT models applied to the England and Wales males population for ages 55-89 and the period 1961-2011. Dashed lines represent central forecast and dotted lines represent 95\% prediction intervals.}\label{fig:CohortIndexForecast}    
\end{figure}


The package \pkg{StMoMo} also provides the function \code{simulate} for simulating trajectories from GAPC stochastic mortality models. In order to simulate the period index, $\boldsymbol{\kappa}_t$, in \pkg{StMoMo} we implement a multivariate adaptation of Algorithm 2 in \citet{Haberman2009} without provision for parameter error,\footnote{We note that Algorithm 2 in \citet{Haberman2009} is itself an adaptation of the prediction interval approach of \citet{Cairns2006a}.} while to simulate the cohort index, $\gamma_{t-x}$,\footnote{\rev{And the period index when Equation~\ref{eq:GAPCPeriodArima} is assumed.}} function \code{simulate} uses the equivalent \code{S3} method for objects of class \code{"Arima"} provided by {the} package \pkg{forecast}. For example, the code below produces 500 simulated trajectories {for the next 50 years} of the six stochastic mortality models fitted previously to the England and Wales male mortality experience:
<<simAll, cache=TRUE, dependson="fit1";"fitRH">>=
set.seed(1234)
nsim <- 500
LCsim <- simulate(LCfit, nsim = nsim, h = 50)
CBDsim <- simulate(CBDfit, nsim = nsim, h = 50)
APCsim <- simulate(APCfit, nsim = nsim, h = 50, gc.order = c(1, 1, 0))
RHsim <- simulate(RHfit, nsim = nsim, h = 50, gc.order = c(1, 1, 0))
M7sim <- simulate(M7fit, nsim = nsim, h = 50, gc.order = c(2, 0, 0))
PLATsim <- simulate(PLATfit, nsim = nsim, h = 50, gc.order = c(2, 0, 0))
@ 

The output from the function \code{simulate} is an object of {the} class \code{"simStMoMo"} including, among other things, the following information:
\begin{itemize}  
  \item \code{rates}: a three dimensional array with the future simulated mortality rates;
  \item \code{kt.s}: a list containing information on the simulated paths of the period index $\boldsymbol{\kappa}_t$; and 
  \item \code{gc.s}: a list containing information on the simulated paths of the cohort index $\gamma_{t-x}$.
\end{itemize}

\setkeys{Gin}{width=1.0\textwidth}
\begin{figure}[tb]  
\centering
<<plotSimulationRH, eval=TRUE,  fig = TRUE, width=9, height=3, results=hide, warning=FALSE, message=FALSE, cache=TRUE, echo=FALSE, dependson="simAll">>=
par(mfrow=c(1, 3))
plot(RHfit$years, RHfit$kt[1, ], xlim = range(RHfit$years, RHsim$kt.s$years),
  ylim = range(RHfit$kt, RHsim$kt.s$sim[1, , 1:20]),  type = "l",
  xlab = "year", ylab = "kt", main = "Period index")
matlines(RHsim$kt.s$years, RHsim$kt.s$sim[1, , 1:20], type = "l", lty = 1)
#Plot cohort index
plot(RHfit$cohorts, RHfit$gc, xlim = range(RHfit$cohorts, RHsim$gc.s$cohorts),
  ylim = range(RHfit$gc, RHsim$gc.s$sim[, 1:20], na.rm = TRUE), type = "l",
  xlab = "year", ylab = "kt", main = "Cohort index (ARIMA(1,1,0) with drift)")
matlines(RHsim$gc.s$cohorts, RHsim$gc.s$sim[, 1:20], type = "l", lty = 1)
#Plot rates at age 65
qxt <- EWMaleIniData$Dxt / EWMaleIniData$Ext
plot(RHfit$years, qxt["65", ], xlim = range(RHfit$years, RHsim$years), 
  ylim = range(qxt["65", ], RHsim$rates["65", , 1:20]), type = "l",
  xlab = "year", ylab = "rate", main = "Mortality rates at age 65")
matlines(RHsim$years, RHsim$rates["65", , 1:20], type = "l", lty = 1)
@
\caption{20 simulated trajectories of the period index $\kappa^{(1)}_t$, cohort index $\gamma_{t-x}$ and one-year \rev{death probabilities} at age 65 $q_{xt}$ for model RH fitted to the England and Wales males population for ages 55-89 and the period 1961-2011.}\label{fig:SimRH}  
\end{figure}


This output can be used to extract sample trajectories from a model. For instance, Figure~\ref{fig:SimRH}, which depicts 20 trajectories of the period index, cohort index and one-year \rev{death probabilities} at age 65 from model RH, was produced with the code:  
<<plotSimulationRHShow, eval=FALSE>>=
par(mfrow = c(1, 3))
plot(RHfit$years, RHfit$kt[1, ], xlim = range(RHfit$years, 
  RHsim$kt.s$years), ylim = range(RHfit$kt, RHsim$kt.s$sim[1, , 1:20]), 
  type = "l", xlab = "year", ylab = "kt", main = "Period index")
matlines(RHsim$kt.s$years, RHsim$kt.s$sim[1, , 1:20], type = "l", lty = 1)
plot(RHfit$cohorts, RHfit$gc, xlim = range(RHfit$cohorts, 
  RHsim$gc.s$cohorts), ylim = range(RHfit$gc, RHsim$gc.s$sim[, 1:20],
  na.rm = TRUE), type = "l", xlab = "year", ylab = "kt",
  main = "Cohort index (ARIMA(1,1,0) with drift)")
matlines(RHsim$gc.s$cohorts, RHsim$gc.s$sim[, 1:20], type = "l", lty = 1)
qxt <- Dxt / Ext
plot(RHfit$years, qxt["65", ], xlim = range(RHfit$years, RHsim$years), 
  ylim = range(qxt["65", ], RHsim$rates["65", , 1:20]), type = "l", 
  xlab = "year", ylab = "rate", main = "Mortality rates at age 65")
matlines(RHsim$years, RHsim$rates["65", , 1:20], type = "l", lty = 1)
@



\setkeys{Gin}{width=0.28\textwidth}
\begin{figure}[!tb]  
    \centering
    \subfloat[LC]{
<<plotLCfan, fig=TRUE, width=4, height=5, echo=FALSE, cache=TRUE, dependson="simAll">>=
library(fanplot)
probs = c(2.5, 10, 25, 50, 75, 90, 97.5)
qxt <- EWMaleIniData$Dxt/EWMaleIniData$Ext 
par(mar=c(4.5,4,1,1))
#age 65
plot(LCfit$years, qxt["65", ], xlim = c(1960, 2061), ylim = c(0.0025,0.2),
     xlab = "year", ylab = "mortality rate (log scale)", pch = 20, log = "y")
fan(t(LCsim$rates["65", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("black", "white")), ln = NULL)
#age 75
points(LCfit$years, qxt["75", ], pch = 20)
fan(t(LCsim$rates["75", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("red", "white")), ln = NULL)
#age 75
points(LCfit$years, qxt["85",], pch = 20)
fan(t(LCsim$rates["85", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
#labels
text(1965, qxt[c("65","75", "85"),"1990"], labels=c("x=65", "x=75", "x=85"))
@ 
\label{fig:FanLC}}
\subfloat[CBD]{
<<plotCBDfan, fig=TRUE, width=4, height=5, echo=FALSE, cache=TRUE, dependson="plotLCfan">>=
par(mar=c(4.5,4,1,1))
#age 65
plot(CBDfit$years, qxt["65", ], xlim = c(1960, 2061), ylim = c(0.0025,0.2),
     xlab = "year", ylab = "mortality rate (log scale)", pch = 20, log = "y")
fan(t(CBDsim$rates["65", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("black", "white")), ln = NULL)
#age 75
points(CBDfit$years, qxt["75", ], pch = 20)
fan(t(CBDsim$rates["75", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("red", "white")), ln = NULL)
#age 75
points(CBDfit$years, qxt["85",], pch = 20)
fan(t(CBDsim$rates["85", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
#labels
text(1965, qxt[c("65","75", "85"),"1990"], labels=c("x=65", "x=75", "x=85"))
@ 
\label{fig:FanCBD}}
\subfloat[APC]{
<<plotAPCfan, fig=TRUE, width=4, height=5, echo=FALSE, cache=TRUE, dependson="plotLCfan">>=
par(mar=c(4.5,4,1,1))
#age 65
plot(APCfit$years, qxt["65", ], xlim = c(1960, 2061), ylim = c(0.0025,0.2),
     xlab = "year", ylab = "mortality rate (log scale)", pch = 20, log = "y")
fan(t(APCsim$rates["65", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("black", "white")), ln = NULL)
#age 75
points(APCfit$years, qxt["75", ], pch = 20)
fan(t(APCsim$rates["75", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("red", "white")), ln = NULL)
#age 75
points(APCfit$years, qxt["85",], pch = 20)
fan(t(APCsim$rates["85", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
#labels
text(1965, qxt[c("65","75", "85"),"1990"], labels=c("x=65", "x=75", "x=85"))
@ 
\label{fig:FanAPC}}\,
\subfloat[RH]{
<<plotRHfan, fig=TRUE, width=4, height=5, echo=FALSE, cache=TRUE, dependson="plotLCfan">>=
par(mar=c(4.5,4,1,1))
#age 65
plot(RHfit$years, qxt["65", ], xlim = c(1960, 2061), ylim = c(0.0025,0.2),
     xlab = "year", ylab = "mortality rate (log scale)", pch = 20, log = "y")
fan(t(RHsim$rates["65", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("black", "white")), ln = NULL)
#age 75
points(RHfit$years, qxt["75", ], pch = 20)
fan(t(RHsim$rates["75", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("red", "white")), ln = NULL)
#age 85
points(RHfit$years, qxt["85",], pch = 20)
fan(t(RHsim$rates["85", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
#labels
text(1965, qxt[c("65","75", "85"),"1990"], labels=c("x=65", "x=75", "x=85"))
@ 
\label{fig:FanRH}}
\subfloat[M7]{
<<plotM7fan, fig=TRUE, width=4, height=5, echo=FALSE, cache=TRUE, dependson="plotLCfan">>=
par(mar=c(4.5,4,1,1))
#age 65
plot(M7fit$years, qxt["65", ], xlim = c(1960, 2061), ylim = c(0.0025,0.2),
     xlab = "year", ylab = "mortality rate (log scale)", pch = 20, log = "y")
fan(t(M7sim$rates["65", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("black", "white")), ln = NULL)
#age 75
points(M7fit$years, qxt["75", ], pch = 20)
fan(t(M7sim$rates["75", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("red", "white")), ln = NULL)
#age 85
points(M7fit$years, qxt["85",], pch = 20)
fan(t(M7sim$rates["85", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
#labels
text(1965, qxt[c("65","75", "85"),"1990"], labels=c("x=65", "x=75", "x=85"))
@ 
\label{fig:FanM7}}
\subfloat[PLAT]{
<<plotPLATfan, fig=TRUE, width=4, height=5, echo=FALSE, cache=TRUE, dependson="plotLCfan">>=
par(mar=c(4.5,4,1,1))
#age 65
plot(PLATfit$years, qxt["65", ], xlim = c(1960, 2061), ylim = c(0.0025,0.2),
     xlab = "year", ylab = "mortality rate (log scale)", pch = 20, log = "y")
fan(t(PLATsim$rates["65", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("black", "white")), ln = NULL)
#age 75
points(PLATfit$years, qxt["75", ], pch = 20)
fan(t(PLATsim$rates["75", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("red", "white")), ln = NULL)
#age 85
points(PLATfit$years, qxt["85",], pch = 20)
fan(t(PLATsim$rates["85", , ]), start = 2012,
    probs = probs, n.fan = 4,
    fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
#labels
text(1965, qxt[c("65","75", "85"),"1990"], labels=c("x=65", "x=75", "x=85"))
@ 
\label{fig:FanPLAT}}
		\caption{Fan charts for mortality rates $q_{xt}$ at ages $x=65$ (bottom fan),  $x=75$ (middle fan) and $x=85$ (top fan) from the six models fitted to the England and Wales males population for ages 55-89 and the period 1961-2011. The dots show historical mortality rates for 1961-2011. Shades in the fan represent prediction intervals at the 50\%, 80\% and 95\% level.}\label{fig:Fancharts}		
\end{figure}


We can also use the output from function \code{simulate} to produce fan charts depicting the uncertainty  associated with a model forecast. According to \citet{Cairns2011a}, such plots are central to the analysis of the plausibility of the forecast from a model, and can be used as a criterion when deciding upon what is the most appropriate model among a group of possible stochastic mortality models. Figure~\ref{fig:Fancharts} shows fan charts depicting 50\%, 80\% and 95\% prediction intervals for mortality rates at ages 65, 75 and 85 for each of the six models fitted to the England and Wales experience. Such fan charts can be produced using the package \pkg{fanplot} \citep{fanplot2015, Abel2015}. For instance, the fan chart for the Lee-Carter model shown in Figure~\ref{fig:FanLC} was produced with the code:
<<plotLCFanShow, eval=FALSE>>=
library("fanplot")
probs = c(2.5, 10, 25, 50, 75, 90, 97.5)
qxt <- Dxt / Ext 
matplot(LCfit$years, t(qxt[c("65", "75", "85"), ]), 
  xlim = c(1960, 2061), ylim = c(0.0025, 0.2), pch = 20, col = "black", 
  log = "y", xlab = "year", ylab = "mortality rate (log scale)")
fan(t(LCsim$rates["65", , ]), start = 2012, probs = probs, n.fan = 4,
  fan.col = colorRampPalette(c("black", "white")), ln = NULL)
fan(t(LCsim$rates["75", , ]), start = 2012, probs = probs, n.fan = 4,
  fan.col = colorRampPalette(c("red", "white")), ln = NULL)
fan(t(LCsim$rates["85", , ]), start = 2012, probs = probs, n.fan = 4,
  fan.col = colorRampPalette(c("blue", "white")), ln = NULL)
text(1965, qxt[c("65", "75", "85"), "1990"], 
  labels = c("x = 65", "x = 75", "x = 85"))
@ 
\noindent
From Figure~\ref{fig:Fancharts} we note the following:
\begin{itemize}
\item Whilst for models CBD, RH, M7 and PLAT the fans at age 85 are wider than at age 65 in accordance with historical evidence (see \citet[Appendix B]{Cairns2011a}), for models LC and APC the fans at age 85 are narrower than at age 65. This suggests that forecasts from models LC and APC are not plausible for the dataset used in this paper.
\item Forecasts for the PLAT model show an implausible increase of mortality rates. This is because the central trend is linked to the estimated cohort effect $\gamma_{t-x}$ for the PLAT model (see Figure~\ref{fig:CohortPLAT}) which shows a steep upward trend between 1935 and 1955.
\item The central trend and uncertainty levels produced by the each of the models have noticeably differences. This highlights the importance of recognizing model risk as a significant issue when modeling mortality \citep{Cairns2011a}.  
\end{itemize}

\rev{Finally, with the aid of \pkg{StMoMo}'s utility function \code{extractCohort}, one can use the output of \code{forecast} and \code{simulate} to extract the projected death probabilities for specific cohorts. For example, a plot of the  mortality rates projected by the Lee-Carter method for the 1950 cohort (see Figure \ref{fig:LCLifeTable}) can be produced as follow:}
<<plotLifetableShow, eval=FALSE, cache=TRUE, dependson="forAll">>=
plot(55:61, extractCohort(fitted(LCfit, type = "rates"), cohort = 1950), 
  type = "l", log = "y", xlab = "age", ylab = "q(x)", 
  main = "Mortality rates for the 1950 cohort", 
  xlim = c(55,89), ylim = c(0.005, 0.12))
lines(62:89, extractCohort(LCfor$rates, cohort = 1950), lty = 2)
@
\rev{These cohort mortality rates can then be employed to build a life table to perform demographic or actuarial calculations using, for instance, the \pkg{lifecontingencies} package \citep{Spedicato2013}.}  

\setkeys{Gin}{width=0.75\textwidth}
\begin{figure}[!tb]  
\centering
<<plotLifetable, fig=TRUE, width=7.5, height=5,  echo=FALSE, cache=TRUE, dependson="forAll">>=
plot(55:61, extractCohort(fitted(LCfit, type = "rates"), cohort = 1950), 
  type = "l", log = "y", xlab = "age", ylab = "q(x)", 
  main = "Mortality rates for the 1950 cohort", 
  xlim = c(55,89), ylim = c(0.005, 0.12))
lines(62:89, extractCohort(LCfor$rates, cohort = 1950), lty = 2)
@
\caption{Mortality rates for the 1950 generation obtained with the Lee-Carter model. The solid line corresponds to fitted values while the dash line corresponds to projected values.}\label{fig:LCLifeTable}  
\end{figure}


\section{Parameter uncertainty and bootstrapping}\label{sec:ParameterUncertainty}

When analyzing the uncertainty in mortality projections in an actuarial context it is important to consider all sources of risk. However, the prediction intervals (fan charts) obtained in the previous section only account for the uncertainty arising from the error in the forecast of the period and cohort indexes and ignore the uncertainty arising from the estimation of the parameters of the GAPC model. \\ 

Due to the analytical intractability of many stochastic mortality models, parameter uncertainty is typically accounted for using the bootstrap procedure. This procedure yields $B$ samples  $\alpha_{x}^b, \beta_x^{(1),b},\ldots, \beta_x^{(N),b}, \kappa_t^{(1),b},\ldots,\kappa_t^{(N),b}, \beta_x^{(0),b}, \gamma_{t-x}^b$, $b=1,\ldots,B$, of the parameters of the GAPC model which can then be used to produce confidence and prediction intervals of demographic and actuarial quantities. To sample the parameters of GAPC models we consider the semiparametric bootstrap proposed by \citet{Brouhns2005} and the residual bootstrap first considered in \citet{Koissi2006}:

\begin{itemize}
\item \textbf{Semiparametric bootstrap.} \citet{Brouhns2005} propose a semiparametric bootstrap where first $B$ samples of the number of deaths $d_{xt}^b$, $b=1,\ldots,B$, are generated by sampling from the Poisson Distribution with mean $d_{xt}$. Each bootstrapped sample $d_{xt}^b$, $b=1,\ldots,B$, is then used to re-estimate the model to obtain $B$ bootstrapped parameter estimates $\alpha_{x}^b, \beta_x^{(1),b},\ldots, \beta_x^{(N),b}, \kappa_t^{(1),b},\ldots,\kappa_t^{(N),b}, \beta_x^{(0),b}, \gamma_{t-x}^b$, $b=1,\ldots,B$. \citet{Renshaw2008} use the fitted number of death $\hat{d}_{xt}$ (instead of the observed deaths $d_{xt}$) to perform the sampling from the Poisson distribution. \citet{Wang2005} consider a similar semiparametric approach using a Binomial distribution of deaths.

\item \textbf{Residual bootstrap.} Another possibility is to bootstrap the residuals of the model as suggested by \citet{Koissi2006}. Under this approach the deviance residuals $r_{xt}$ are resampled with replacement to generate $B$ replications $r_{xt}^b$, $b=1,\ldots,B$, which are mapped to the corresponding resampled death counts $d_{xt}^b$, $b=1,\ldots,B$, using the inverse formula.  The model is then re-fitted using these resampled number of deaths to produce $B$ sets of estimated parameters $\alpha_{x}^b, \beta_x^{(1),b},\ldots, \beta_x^{(N),b}, \kappa_t^{(1),b},\ldots,\kappa_t^{(N),b}, \beta_x^{(0),b}, \gamma_{t-x}^b$, $b=1,\ldots,B$. We refer to \citet{Koissi2006} and \citet{Renshaw2008} for details on the inverse formula under a Poisson distribution of deaths and to \citet{Debon2010} for the inverse formula under a Binomial framework. Finally, we note that in implementing the residual bootstrap we follow \citet{Renshaw2008} and apply the inverse formula to produce samples of the observed number of deaths rather than samples of the fitted number of deaths as originally done by \citet{Koissi2006}.  
\end{itemize}

In what follows we illustrate the assessment of {the} parameter uncertainty using {the} package \pkg{StMoMo}. In doing so we deviate from the England and Wales example we have used so far and use instead New Zealand mortality data \rev{for males}. This new example follows closely the work of \citet{Li2014} who uses New Zealand mortality data to compare several simulation strategies for assessing the risk in mortality projections with a Poisson Lee-Carter model. The main rationale for the change of dataset is that parameter uncertainty is particularly important when analyzing the mortality of smaller populations such as smaller countries or pension plans.\footnote{{While the population of England and Wales in 2008 was 54.8 million, the population of New Zealand in 2008 was 4.3 million.}} This new example also serves as a means for illustrating the use of {the \pkg{StMoMo} package} with other datasets. \\

%Reinstate if re-running New Zeland example
% <<HMDpassword, eval=TRUE, echo=FALSE>>=  
% username <- ""
% password <- ""
% @ 

Mortality data for New Zealand can be extracted from the \cite{HumanMortalityDatabase2014} using function \code{hmd.mx} of  {the} \pkg{demography} package with the code:

%Reinstate if re-running New Zeland example
% <<loadNZ, eval=TRUE, results='hide', cache=TRUE, warning=FALSE, message=FALSE >>=  
%   library(demography)
%   NZdata <- hmd.mx(country = "NZL_NP", username = username,
%                    password = password)
% @ 

<<loadNZ, eval=FALSE>>=  
library("demography")
NZdata <- hmd.mx(country = "NZL_NP", username = username,
  password = password, label = "New Zealand")
@ 

\noindent
We note that the \code{username} and \code{password} above are for the Human Mortality Database and should be replaced appropriately. \rev{We can then transform the Human Mortality Database data for Kiwi males into \pkg{StMoMo}'s format using function \code{StMoMoData}:}

<<transNZ, eval=FALSE>>=  
NZStMoMo <- StMoMoData(NZdata, series = "male")
@


Following \citet{Li2014}, we fit a Poisson Lee-Carter model to New Zealand male data for ages 0 to 89 and for the period 1985 to 2008. This can be carried out with the commands:\footnote{\rev{We note that fitting a Poisson Lee-Carter model can also be done using function \code{lca} in \pkg{demography} by setting argument \code{adjust="dxt"}. Similarly, we can a fit a Poisson Lee-Carter model using function \code{lca.rh} in package \pkg{ilc}.}}
%Reinstate if re-running New Zeland example
% <<fitNZ, eval=TRUE, results='hide', cache=TRUE, warning=FALSE, message=FALSE, dependson="loadNZ" >>=  
%   Ext_NZ <- NZdata$pop$male 
%   Dxt_NZ <- NZdata$rate$male * Ext_NZ
%   LCfit_NZ <- fit(lc(), Dxt = Dxt_NZ, Ext = Ext_NZ, ages = NZdata$age, 
%              years = NZdata$year, ages.fit = 0:89, years.fit = 1985:2008)
% @ 

<<fitNZ, eval=FALSE>>=  
LCfit_NZ <- fit(lc(), data = NZStMoMo, ages.fit = 0:89, 
  years.fit = 1985:2008)
@

In {the \pkg{StMoMo} package} the bootstrap of GAPC stochastic mortality models is implemented with the generic function \code{bootstrap}. This functions supports both the semiparametric bootstrap and the residual bootstrap. For instance, 5000 semiparametric bootstrap samples of the Lee-Carter model can be obtained with the code:
<<bootLCNZ, eval=FALSE>>=  
LCboot_NZ <- bootstrap(LCfit_NZ, nBoot = 5000, type = "semiparametric")
@
%Reinstate if re-running New Zeland example
%Do this this way as bootstrapping takes a while
% <<loadLCNZboot, eval=TRUE,echo = FALSE, message=FALSE, warning=FALSE, cache=TRUE>>=
% load("LCbootSemiObsNZ.RData")
% LCboot_NZ <- LCbootSemiObs
% @
\noindent
\rev{We note that the bootstrap is a computationally intensive procedure. In particular, the 5000 semiparametric bootstrap samples of the Lee-Carter model took about two hours to run.}\footnote{In running this code, we used a computer with an Intel Core i5-3320m processor
running at 2.60 GHz under Windows 7 Home Premium Edition (64 bits) with 8 GB of RAM.}\\

The output from function \code{bootstrap} is an object of {the} class \code{"bootStMoMo"} in which the component \code{bootParameters} contains the \code{nBoot} replications of the bootstrap parameters. A fan chart depicting the 50\%, 80\% and 95\% confidence intervals of the bootstrapped Lee-Carter model (Figure~\ref{fig:LCboot}) can be obtained with the command: 
<<PlotbootLCNZshow, eval=FALSE>>=  
plot(LCboot_NZ, nCol = 3)
@
\noindent 
From Figure~\ref{fig:LCboot} we note that whilst the parameter uncertainty in the static age function $\alpha_x$ and the period index $\kappa_t^{(1)}$ is modest, the uncertainty in the age-modulating parameters $\beta_x^{(1)}$ is more significant.\\  

\begin{figure}[!tb]  
\centering
%Reinstate if re-running New Zeland example
% <<plotLCNZboot, fig.width=8, fig.height = 2.8, out.width="16cm", echo=FALSE, cache=TRUE, dependson="loadLCNZboot">>=
% plot(LCboot_NZ, nCol = 3)
% @
\includegraphics[width=16cm]{plotLCNZboot.pdf}  
\caption{Bootstrapped parameters for the Poisson Lee-Carter model fitted to the New Zealand male population for ages 0-89 and the period 1985-2008. Shades in the fan represent confidence intervals at the 50\%, 80\% and 95\% level.}\label{fig:LCboot}  
\end{figure}

 Once a stochastic mortality model has been bootstrapped we can simulate it forward to obtain simulated trajectories which account for both the forecast error in the period and cohort indexes and the error in the model fitting. In \pkg{StMoMo} we can accomplish this using {the} function \code{simulate} applied to an object of class \code{"bootStMoMo"}. Thus, to obtain 5000 simulated trajectories of the Lee-Carter model {for the next 24 years taking} into account parameter uncertainty we use the instruction:
% %Reinstate if re-running New Zeland example
% % <<simPULCNZ, cache=TRUE, dependson=c("fitNZ, loadLCNZboot")>>=
% % LCsimPU_NZ <- simulate(LCboot_NZ,  h = 24)
% % @ 
% 
<<simPULCNZ, eval=FALSE>>=
LCsimPU_NZ <- simulate(LCboot_NZ,  h = 24)
@ 
\noindent 
To highlight the impact of parameter risk on mortality rate projections it is instructive to compare  prediction intervals with and without the allowance of parameter uncertainty. {24-year ahead} central forecast together with 5000 trajectories of the Lee-Carter model allowing only for forecast error in the random walk with drift and ignoring the model fitting error can be obtained with the code:
% %Reinstate if re-running New Zeland example
% % <<simLCNZ, cache=TRUE, dependson=c("fitNZ, loadLCNZboot")>>=
% % LCfor_NZ <- forecast(LCfit_NZ, h = 24)
% % LCsim_NZ <- simulate(LCfit_NZ, nsim = 5000, h = 24)
% % @ 
% 
<<simLCNZ, eval=FALSE>>=
LCfor_NZ <- forecast(LCfit_NZ, h = 24)
LCsim_NZ <- simulate(LCfit_NZ, nsim = 5000, h = 24)
@ 
\noindent

Figure~\ref{fig:LCNZmxtPred} depicts 95\% prediction intervals for mortality rates at age 40, 60 and 80 with and without allowance for parameter uncertainty. This graph was produced with the code:
<<plotLCPredshow, eval=FALSE>>=  
mxt <- LCfit_NZ$Dxt / LCfit_NZ$Ext
mxtHat <- fitted(LCfit_NZ, type = "rates")           
mxtCentral <- LCfor_NZ$rates
mxtPred2.5 <- apply(LCsim_NZ$rates, c(1, 2), quantile, probs = 0.025)
mxtPred97.5 <- apply(LCsim_NZ$rates, c(1, 2), quantile, probs = 0.975)
mxtHatPU2.5 <- apply(LCsimPU_NZ$fitted, c(1, 2), quantile, probs = 0.025)  
mxtHatPU97.5 <- apply(LCsimPU_NZ$fitted, c(1, 2), quantile, probs = 0.975)  
mxtPredPU2.5 <- apply(LCsimPU_NZ$rates, c(1, 2), quantile, probs = 0.025)
mxtPredPU97.5 <- apply(LCsimPU_NZ$rates, c(1, 2), quantile, probs = 0.975)
x <- c("40", "60", "80")
matplot(LCfit_NZ$years, t(mxt[x, ]), xlim = range(LCfit_NZ$years, 
  LCfor_NZ$years), ylim = range(mxtHatPU97.5[x, ], mxtPredPU2.5[x, ],
  mxt[x, ]), type = "p", xlab = "years",
  ylab = "mortality rates (log scale)", log = "y", pch = 20,
  col = "black")
matlines(LCfit_NZ$years, t(mxtHat[x, ]), lty = 1, col = "black")
matlines(LCfit_NZ$years, t(mxtHatPU2.5[x, ]), lty = 5, col = "red")
matlines(LCfit_NZ$years, t(mxtHatPU97.5[x, ]), lty = 5, col = "red")
matlines(LCfor_NZ$years, t(mxtCentral[x, ]), lty = 4, col = "black")
matlines(LCsim_NZ$years, t(mxtPred2.5[x, ]), lty = 3, col = "black")
matlines(LCsim_NZ$years, t(mxtPred97.5[x, ]), lty = 3, col = "black")
matlines(LCsimPU_NZ$years, t(mxtPredPU2.5[x, ]), lty = 5, col = "red")
matlines(LCsimPU_NZ$years, t(mxtPredPU97.5[x, ]), lty = 5, col = "red")
text(1986, mxtHatPU2.5[x, "1995"], labels = c("x=40", "x=60", "x=80"))
@

\begin{figure}[!tb]  
\centering
%Reinstate if re-running New Zeland example
% <<plotLCPred, eval=TRUE,  fig.width=5, fig.height=5,out.width="7.5cm", results='hide', warning=FALSE, message=FALSE, cache=TRUE, echo=FALSE, dependson=c("simLCNZ", "simPULCNZ", "fitNZ")>>=
% par(mar=c(4.5,4,1,1))
% mxt <- LCfit_NZ$Dxt / LCfit_NZ$Ext
% mxtHat <- fitted(LCfit_NZ, type = "rates")           
% mxtCentral <- LCfor_NZ$rates
% # 95% Prediction intervals without PU
% mxtPred2.5 <- apply(LCsim_NZ$rates, c(1, 2), quantile, probs = 0.025)
% mxtPred97.5 <- apply(LCsim_NZ$rates, c(1, 2), quantile, probs = 0.975)
% # 95% intervals with PU (in sample, and predictions)
% mxtHatPU2.5 <- apply(LCsimPU_NZ$fitted, c(1, 2), quantile, probs = 0.025)  
% mxtHatPU97.5 <- apply(LCsimPU_NZ$fitted, c(1, 2), quantile, probs = 0.975)  
% mxtPredPU2.5 <- apply(LCsimPU_NZ$rates, c(1, 2), quantile, probs = 0.025)
% mxtPredPU97.5 <- apply(LCsimPU_NZ$rates, c(1, 2), quantile, probs = 0.975)
% 
% #Plot
% x <- c("40", "60", "80")
% matplot(LCfit_NZ$years, t(mxt[x, ]), xlim = range(LCfit_NZ$years, LCfor_NZ$years),
%      ylim = range(mxtHatPU97.5[x, ], mxtPredPU2.5[x, ], mxt[x, ]),
%      type = "p", xlab = "years", ylab = "mortality rates (log scale)", 
%      log = "y", pch = 20, col = "black", cex = 0.75)
% matlines(LCfit_NZ$years, t(mxtHat[x, ]), lty = 1, col = "black")
% matlines(LCfit_NZ$years, t(mxtHatPU2.5[x, ]), lty = 5, col ="red")
% matlines(LCfit_NZ$years, t(mxtHatPU97.5[x, ]), lty = 5, col ="red")
% matlines(LCfor_NZ$years, t(mxtCentral[x, ]), lty = 4, col = "black")
% matlines(LCsim_NZ$years, t(mxtPred2.5[x, ]), lty = 3, col = "black", lwd = 2)
% matlines(LCsim_NZ$years, t(mxtPred97.5[x, ]), lty = 3, col = "black", lwd = 2)
% matlines(LCsimPU_NZ$years, t(mxtPredPU2.5[x, ]), lty = 5, col = "red")
% matlines(LCsimPU_NZ$years, t(mxtPredPU97.5[x, ]), lty = 5, col = "red")
% text(1986, mxtHatPU2.5[x, "1995"],
%      labels = c("x = 40",  "x = 60", "x = 80"))
% @ 
\includegraphics[width=7.5cm]{plotLCPred.pdf}    
\caption{95\% Prediction intervals for mortality rates $q_{xt}$ at ages $x=40$ (bottom lines), $x=60$ (middle lines) and $x=80$ (top lines) for the Poisson Lee-Carter model fitted to the New Zealand male population for ages 0-89 and the period 1985-2008. Dots show historical mortality rates for 1985-2008 and solid black lines show the corresponding fitted rates.  Dashed lines represent central forecast, black dotted lines represent 95\% prediction intervals excluding parameter uncertainty and dot-dashed red lines depict 95\% confidence and prediction intervals including parameter uncertainty.}\label{fig:LCNZmxtPred}  
\end{figure}

\noindent
In Figure~\ref{fig:LCNZmxtPred} we can clearly see that parameter uncertainty has an important impact on the prediction intervals. This is particularly noticeable at age 40 where in year 2030, for instance, the width of the prediction interval with parameter uncertainty is around 3 times bigger than that without parameter uncertainty. These results are in line with those obtained by \citet{Li2014} using the same dataset.\footnote{We note that our prediction intervals {without and with} parameter uncertainty correspond to  methods (I) and (IVa) in \citet{Li2014}, respectively.}

\section{Conclusion}\label{sec:Conclusions}
In this paper we have introduced the family of generalized age-period-cohort stochastic mortality models by paralleling the standard framework of generalized linear models.  In addition, we have presented the \R package \pkg{StMoMo}  which takes advantage of the unifying framework of the GAPC family to provide tools for fitting a diverse number of stochastic mortality models, assessing their goodness of fit and {also} performing mortality projections. A key feature of the GAPC family and of \pkg{StMoMo} is that they not only encompass models from the Lee-Carter and CBD families, but can also accommodate possible new models. Furthermore, model risk is a prevalent issue when forecasting mortality and we {therefore} believe that the possibility of easily implementing and comparing a wide range of models makes \pkg{StMoMo} a valuable addition to the toolkit for measuring and managing longevity risk.\\

Our package can be expanded in several directions. The current version of \pkg{StMoMo} only allows the use of a log link with a Poisson distribution of deaths or a logit link with a Binomial distribution of deaths. However, we plan to expand  the possible combinations of error distribution and link function to include, for instance, Binomial errors with a complementary log-log link as suggested by \cite{Currie2014}. \\

%In agreement with the standard in the literature, {the \pkg{StMoMo} package} assumes a multivariate random walk with drift for the dynamics of the period index $\boldsymbol{\kappa}_t$. Nevertheless, the package could be expanded to accommodate more general time series models which might be more appropriate for some datasets or some models. For instance, \citet{Renshaw2006} use an ARIMA$(2,1,0)$ for $\kappa_t^{(1)}$ when forecasting mortality for England and Wales males with the Lee-Carter model with cohorts, while \citet{Plat2009a} uses general univariate ARIMA$(p,d,q)$ for each $\kappa_t^{(i)}$, $i = 1,\ldots, N$, fitted using seemingly unrelated regressions to allow for correlations between the different period indexes. \\

In the GAPC family it is assumed that the age-modulating terms $\beta^{(i)}_x$, $i=0,\ldots, N$, are either non-parametric functions of age which need to be estimated or parametric functions of age with a pre-specified functional form $f^{(i)}(x)$. This latter case could be extended to include the more general case of a pre-specified functional form with a set of parameters that need to be estimated, that is, $\beta^{(i)}_x = f^{(i)}(x;\theta_i)$ with $\theta_i$ being some model parameters. This generalization would allow the implementation of the family of models considered in \citet{Hunt2014} \rev{and the inclusion of models where age terms are given by smooth parametric functions such as polynomial splines. Smoothness of the parameters in a stochastic mortality model is a topic that has received attention in the literature and offers a potential route for further extensions of \pkg{StMoMo}, see for instance \citet{Delwarde2007, Currie2013a}.}\\

Finally, the increasing attention that multiple population mortality models are receiving in the mortality forecasting literature provides an avenue for a wealth of extensions of our package. In particular, we are {considering the development of} a two-population version of \pkg{StMoMo} that implements the class of relative two-population models considered in \citet{Villegas2015}.


\section*{Acknowledgements}\label{sec:Acknowledgements}
This work stems from research funded by the Institute \& Faculty of Actuaries and the Life \& Longevity Markets Association. \rev{Andr\'es M. Villegas acknowledges support by the Australian Research Council Centre of Excellence in Population Ageing Research (project number CE110001029).} We thank Sveinn Gunnlaugsson and Mario Sison for their assistance in the development and testing of the preliminary code underlying \pkg{StMoMo}. We are indebted to Steven Haberman for his encouragement in developing this project and for commenting on a previous version of this paper.  We also thank Anastasios Bardoutsos for his suggestions on the structure of the package and Andrew Hunt for many useful discussions which have helped improved this work. Andr\'es Villegas thanks Ana Deb\'on for first showing him the great things that can be done with the \R package \pkg{gnm} and selflessly sharing her code and knowledge. This paper has been presented at the 19th congress on Insurance: Mathematics and Economics in Liverpool, the \R in Insurance 2015 conference in Amsterdam, the Longevity 11 conference in Lyon, the 7th Australasian Actuarial Education and Research Symposium in the Gold Coast, and the Polytechnic University of Valencia. We thank conference participants for their comments and suggestions.   


\bibliography{StMoMo}

\end{document}