\documentclass{article}

%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Spatio-temporal modelling with IDE}

\usepackage{amssymb,amsmath,amstext}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{color}
\usepackage{dcolumn}
\usepackage{dsfont}
\usepackage{epsfig}
\usepackage[T1]{fontenc}
\usepackage[]{graphicx}
\usepackage{ifthen}
\usepackage{longtable}
\usepackage{makeidx}
\usepackage{natbib}
\usepackage{setspace}
\usepackage{multirow}
\usepackage{tabularx}
\usepackage{threeparttable}
\usepackage{times}
\usepackage{titlesec}
\usepackage{url}
\usepackage{verbatim}

\newcommand{\mbf}[1]{\mathbf{#1}}
\newcommand{\mbv}[1]{\mbox{\boldmath$#1$\unboldmath}}
\newcommand{\bfu}{{ \bf u}}
\def\ba{\mathbf{a}}
\def\bb{\mathbf{b}}
\def\bc{\mathbf{c}}
\def\bd{\mathbf{d}}
\def\be{\mathbf{e}}
\def\bg{\mathbf{g}}
\def\bh{\mathbf{h}}
\def\bk{\mathbf{k}}
\def\bm{\mathbf{m}}
\def\br{\mathbf{r}}
\def\bs{\mathbf{s}}
\def\bu{\mathbf{u}}
\def\bv{\mathbf{v}}
\def\bw{\mathbf{w}}
\def\bx{\mathbf{x}}
\def\by{\mathbf{y}}
\def\bz{\mathbf{z}}
\def\bA{\mathbf{A}}
\def\bB{\mathbf{B}}
\def\bC{\mathbf{C}}
\def\bD{\mathbf{D}}
\def\bE{\mathbf{E}}
\def\bF{\mathbf{F}}
\def\bG{\mathbf{G}}
\def\bH{\mathbf{H}}
\def\bI{\mathbf{I}}
\def\bJ{\mathbf{J}}
\def\bK{\mathbf{K}}
\def\bL{\mathbf{L}}
\def\bM{\mathbf{M}}
\def\bP{\mathbf{P}}
\def\bQ{\mathbf{Q}}
\def\bR{\mathbf{R}}
\def\bS{\mathbf{S}}
\def\bT{\mathbf{T}}
\def\bU{\mathbf{U}}
\def\bV{\mathbf{V}}
\def\bW{\mathbf{W}}
\def\bX{\mathbf{X}}
\def\bY{\mathbf{Y}}
\def\bZ{\mathbf{Z}}
\def\bfzero{\mathbf{0}}
\def\bfone{\mathbf{1}}

\newcommand{\intd}{\textrm{d}}
\newcommand{\bfomega}{\mbox{\boldmath $\omega$}}
\newcommand{\bfalpha}{\mbox{\boldmath $\alpha$}}
\newcommand{\bfbeta}{\mbox{\boldmath $\beta$}}
\newcommand{\bfxi}{\mbox{\boldmath $\xi$}}
\newcommand{\bfeta}{\mbox{\boldmath $\eta$}}
\newcommand{\bftau}{\mbox{\boldmath $\tau$}}
\newcommand{\bfdelta}{\mbox{\boldmath $\delta$}}
\newcommand{\bftheta}{\mbox{\boldmath $\theta$}}
\newcommand{\bfTheta}{\mbox{\boldmath $\Theta$}}
\newcommand{\bfell}{\mbox{\boldmath $\ell$}}
\newcommand{\bfepsilon}{\mbox{\boldmath $\varepsilon$}}
\newcommand{\bfPhi}{\mbox{\boldmath $\Phi$}}
\newcommand{\bfPsi}{\mbox{\boldmath $\Psi$}}
\newcommand{\bfpsi}{\mbox{\boldmath $\psi$}}
\newcommand{\bfphi}{\mbox{\boldmath $\phi$}}
\newcommand{\bfmu}{\mbox{\boldmath $\mu$}}
\newcommand{\bfsigma}{\mbox{\boldmath $\sigma$}}
\newcommand{\bfnu}{\mbox{\boldmath $\nu$}}
\newcommand{\bfgamma}{\mbox{\boldmath $\gamma$}}
\newcommand{\bflambda}{\mbox{\boldmath $\lambda$}}
\newcommand{\bfLambda}{\mbox{\boldmath $\Lambda$}}
\newcommand{\bfSigma}{\mbox{\boldmath $\Sigma$}}
\newcommand{\bfPi}{\mbox{\boldmath $\Pi$}}
\newcommand{\var}{\textrm{var}}
\newcommand{\cov}{\textrm{cov}}
\newcommand{\bdiag}{\textrm{bdiag}}
\newcommand{\corr}{\textrm{corr}}
\newcommand{\pr}{\textrm{Pr}}
\newcommand{\fn}[1]{\texttt{\hlkwd{#1}}}
\newcommand{\num}[1]{\texttt{\hlnum{#1}}}
\newcommand{\strn}[1]{\texttt{\hlstr{#1}}}
\newcommand{\args}[1]{\texttt{\hlkwc{#1}}}
\newcommand{\R}{\texttt{R}}
\newcommand{\cc}[1]{\texttt{#1}}
\def\deg{$^{\circ}$ }
\newcommand{\rf}{\vskip .1in\par\sloppy\hangindent=1pc\hangafter=1
                 \noindent}

\newcommand{\indexprint}[1]{#1\index{#1}}

\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\Gau}{Gau}
\DeclareMathOperator{\trace}{trace}
\DeclareMathOperator{\tr}{tr}

\usepackage[margin=1in,includefoot,includehead]{geometry}


\bibliographystyle{plainnat}

\title{Spatio-temporal integro-difference equation (IDE) modelling: The \tt{R} package}
\author{Andrew Zammit-Mangion}
%\author[1]{Noel Cressie}
%\affiliation{National Institute for Applied Statistics Research Australia~(NIASRA), School of Mathematics and Applied Statistics, University of Wollongong, New South Wales 2522, Australia}

\begin{document}


<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
# set global chunk options
# opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
options(formatR.arrow=TRUE,width=90)
knitr::opts_chunk$set(dpi=100, eval = FALSE)
@
%\SweaveOpts{concordance=TRUE}

\maketitle

In this vignette we use the package {\bf IDE} to fit spatio-temporal integro-difference equation (IDE) models, as well as predict and forecast from spatio-temporal data. We will explore two cases using simulated data where the true model is known. For this vignette, we need the following packages:

<<warning = FALSE, message = FALSE>>=
library("dplyr")
library("FRK")
library("ggplot2")
library("IDE")
library("sp")
library("spacetime")
@

<<echo = FALSE, message=FALSE, warning=FALSE>>=
library("ggplot2")
library("gridExtra")
@


The first-order spatio-temporal IDE process model used in the package {\bf IDE} is given by
\begin{equation}
Y_t(\bs) = \int_{D_s} m(\bs,\bx;\bftheta_p) Y_{t-1}(\bx) \; \intd \bx  + \eta_t(\bs); \;\;\; \bs,\bx \in D_s,
\label{eq:linearIDE}
\end{equation}
for $t=1,2,\ldots$, where $m(\bs,\bx;\bftheta_p)$ is a {\it transition kernel}, depending on parameters $\bftheta_p$ that specify ``redistribution weights'' for the process at the previous time over the spatial domain, $D_s$, and $\eta_t(\bs)$ is a time-varying (but statistically independent in time) continuous mean-zero Gaussian spatial process.   Note that it is assumed here that the parameter vector $\bftheta_p$ does not vary with time, as one often does, but it could in general.  So, the process at location $\bs$ and time $t$ is given by the weighted average (integral) of the process throughout the domain at the past time, where the weights are given by the transition kernel, $m(\cdot)$.  The ``adjustment,'' given by $\eta_t(\bs)$, is assumed to be Gaussian and accounts for spatial dependencies in $Y_t(\cdot)$ that are not captured by this weighted average. Another way to think about $\eta_t(\cdot)$ is that it adds smaller-scale dependence that is removed in the inherent smoothing that occurs when $\{Y_{t-1}(\cdot)\}$ is averaged over space, in order to give $Y_t(\cdot)$ a realistic spatial dependence structure. In general, $\int_{D_s} m(\bs,\bx;\bftheta_p) \intd \bx < 1$ for the process to be stable (non-explosive) in time.

The redistribution kernel $m(\bs,\bx;\bftheta_p)$ used by the package {\bf IDE} is given by

\begin{equation}
%m(s,x;\bftheta_p) = \frac{1}{\theta_{p,2} \sqrt{2 \pi}} \exp\left(-\frac{1}{2 \; \theta_{p,2}}(x - \theta_{p,1} - s)^2 \right) ,
m(\bs,\bx;\bftheta_p) = {\theta_{p,1}(\bs)} \exp\left(-\frac{1}{\theta_{p,2}(\bs)}\left[(x_1 - \theta_{p,3}(\bs) - s_1)^2 + (x_2 - \theta_{p,4}(\bs) - s_2)^2 \right] \right),
\label{eq:Gausskern}
\end{equation}

\noindent where the spatially-varying kernel amplitude is given by $\theta_{p,1}(\bs)$ and controls the temporal stationarity, the spatially-varying length-scale (variance) parameter $\theta_{p,2}(\bs)$ corresponds to a kernel scale (aperture) parameter (i.e., the kernel width increases as $\theta_{p,2}$ increases), and the mean (shift) parameters $\theta_{p,3}(\bs)$ and $\theta_{p,4}(\bs)$ correspond to a spatially-varying shift of the kernel relative to location $\bs$. Spatially-invariant kernels (i.e., where the elements of $\bftheta_p$ are not functions of space) are also allowed.

\subsection*{Simulation example with a spatially-invariant kernel}

The package {\bf IDE} contains a function \fn{simIDE} that simulates the behaviour of a typical dynamic system governed by linear transport. The function takes just three arguments, the number of time points to simulate, the number of (spatially fixed) observations to use, and a flag indicating whether to use a spatially-invariant kernel or not.

<<message=FALSE>>=
SIM1 <- simIDE(T = 10, nobs = 100, k_spat_invariant = 1)
@

\noindent The returned list \cc{SIM1} contains the simulated process in the data frame \cc{s\_df}, the observed data in the data frame \cc{z\_df}, and also as an \cc{STIDF} object \cc{z\_STIDF}. It also contains two {\bf ggplot2} plots, \cc{g\_truth} and \cc{g\_obs} which can be readily plotted as follows.

<<results = 'hide', fig.keep = 'none'>>=
print(SIM1$g_truth)
print(SIM1$g_obs)
@

\noindent While the transport action is clearly noticeable in the process evolution, there is also a clear spatial trend. Covariates are included through the use of a standard \cc{R} formula when calling the function \fn{IDE}. Additional arguments to \fn{IDE} include the dataset, which needs to be of class \cc{STIDF}, the temporal discretization to use (we will use 1 day) of class \cc{difftime}, and the gridsize on which the integrations (as well as predictions) will be carried out. Other arguments include user-specified basis functions for the process and the transition kernel, which for now we will not specify. By default, the IDE model will decompose the process using two resolutions of bisquare basis functions, and assume a spatially-invariant Gaussian transition kernel.

<<message = FALSE>>=
IDEmodel <- IDE(f = z ~ s1 + s2,
                data = SIM1$z_STIDF,
                dt = as.difftime(1, units = "days"),
                grid_size = 41)
@

The returned object \cc{IDEmodel} is of class \cc{IDE} and contains initial parameter estimates, as well as predictions for $\bfalpha_t, t = 1,\dots,T$ at these initial parameter estimates. The parameters in this case are the measurement-error variance, the variance of the random disturbance $\eta_t(\bs)$ (whose covariance structure is fixed), the kernel parameters, and the regression coefficients $\bfbeta$.

Fitting the IDE is a computationally intensive procedure. The default method currently implemented uses a differential evolution optimization algorithm from the package {\bf DEoptim}, which is a global optimization algorithm that can be easily parallelized. Fitting takes only a few minutes on a 60-core machine but can take an hour or two on a standard desktop. Fitting can be done by running

<<eval = FALSE>>=
fit_results_sim1 <- fit.IDE(IDEmodel,
                           parallelType = 1)
@

\noindent where \args{parallelType}\cc{ = }\num{1} ensures that all available cores on the machine are used for fitting.

The list \cc{fit\_results\_sim1} contains two fields: \cc{optim\_results} that contains the output of the optimization algorithm, and \cc{IDEmodel} that contains the fitted IDE model. The fitted kernel can be visualized by using the function \fn{show\_kernel}.

<<results = 'hide', fig.keep = 'none'>>=
show_kernel(fit_results_sim1$IDEmodel)
@

\noindent The fitted kernel is shifted to the left and upwards, correctly representing the south-easterly transport evident in the data. The estimated kernel parameters $\bftheta_p$ are

<<>>=
fit_results_sim1$IDEmodel$get("k") %>% unlist()
@
\noindent which can be compared to the true values \cc{c(}\num{150}, \num{0.002}, \num{-0.1}, \num{0.1}\cc{)}. The estimated amplitude is lower and the aperture is larger than those values used to simulate the true field, but the shift parameters are remarkably similar. The estimated regression coefficients are

<<>>=
fit_results_sim1$IDEmodel$get("betahat")
@

\noindent which also compare well to the true values \cc{c(}\num{0.2}, \num{0.2}, \num{0.2}\c{)}. Also of interest are the modulus of the eigenvalues of the evolution matrix $\bM$. These can be extracted as follows.

<<>>=
abs_ev <- eigen(fit_results_sim1$IDEmodel$get("M"))$values %>% abs()
summary(abs_ev)
@

\noindent Since the largest of these is less than 1, the IDE process exhibits stable behavior.

For prediction, one may either specify a prediction grid, or use the default one used for approximating the integrations set up by \fn{IDE}. The latter is usually sufficient so we use this without exception for the examples we consider. %In the former, one should specify an \cc{STFDF}, for example as follows.
When a prediction grid is not supplied, the function \fn{predict} returns a data frame with predictions spanning the data temporal horizon (forecasts and hindcasts are explored later).

<<>>=
ST_grid_df <- predict(fit_results_sim1$IDEmodel)
@

The prediction and prediction standard error can now be plotted using standard {\bf ggplot2} commands as follows.

<<results = 'hide', fig.keep = 'none'>>=
gpred <- ggplot(ST_grid_df) +             # Plot the predictions
  geom_tile(aes(s1, s2, fill=Ypred)) +
  facet_wrap(~t) +
  scale_fill_distiller(palette="Spectral", limits = c(-0.1,1.4)) +
  coord_fixed(xlim=c(0, 1), ylim = c(0, 1))

gpredse <- ggplot(ST_grid_df) +           # Plot the prediction s.es
  geom_tile(aes(s1, s2, fill=Ypredse)) +
  facet_wrap(~t) +
  scale_fill_distiller(palette="Spectral") +
  coord_fixed(xlim=c(0, 1), ylim = c(0, 1))
@

\noindent In Figure~\ref{fig:IDEsimresults}, we show the observations, the true process, the predictions, and the prediction standard errors from the fitted model. Note how prediction standard errors are high in regions of sparse observations, as expected.

<<echo = FALSE, fig.keep = 'none', results = 'hide', message = FALSE>>=
library("gridExtra")
g <- grid.arrange(SIM1$g_truth + scale_x_continuous(breaks = c(0,0.5)) +
                            scale_fill_distiller(palette = "Spectral", limits = c(0.1,1.4)),
                  SIM1$g_obs + scale_x_continuous(breaks = c(0,0.5)) +
                            scale_fill_distiller(palette = "Spectral", limits = c(0.1,1.4)),
                  gpred + scale_x_continuous(breaks = c(0,0.5)) +
                            scale_fill_distiller(palette = "Spectral", limits = c(0.1,1.4)),
                  gpredse + scale_x_continuous(breaks = c(0,0.5)) ,
                  nrow = 2)
ggsave(g, file = "./img/Chapter_5/IDEsimresults.png", width = 12, height = 10, dpi = 300)
@

\begin{figure}[t!]
\begin{center}
\includegraphics[width=\linewidth,angle=0]{IDEsimresults.png}
\end{center}
\caption{Simulated process (top-left panel), simulated data (top-right panel), predictions following the fitting of the IDE model (bottom-left panel) and the respective prediction standard errors (bottom-right panel).}
\label{fig:IDEsimresults}
\end{figure}

<<echo= FALSE, eval = FALSE>>=
s1_pred <- s2_pred <- seq(0,1,length.out = 71)
st_grid <- expand.grid(s1 = s1_pred,
                      s2 = s2_pred,
                      date = unique(time(SIM1$z_STIDF)))
pred_grid <- STIDF(sp = SpatialPoints(st_grid[,c("s1","s2")]),
                 time = st_grid$date,
                 data = st_grid %>% select(-s1, -s2, -date))

## Predict using prior guesses
ST_grid_df <- predict(fit_results_sim1$IDEmodel,
                      newdata = pred_grid) %>%
              as.data.frame()
@

\subsection*{Simulation example with a spatially-varying kernel}

In the previous example we considered the case of a spatially-invariant kernel, that is, the case when the kernel $m(\bs,\br;\bftheta_p)$ is just a function of $\|\br - \bs\|$. In this example we consider the case when one or more of the $\bftheta_p$ are allowed to be spatially-referenced. Such models are needed when the spatio-temporal process exhibits, for example, considerabe spatially-varying drift. One such process can be simulated using the function \fn{simIDE} and specifying \args{k\_spat\_invariant}\cc{ = }\num{0}. To model such data we need to have a large \args{nobs} and many time points, and we set \args{T}\cc{ = }\num{15}. This is important, as it is difficult to obtain reasonable estimates of spatially distributed parameters unless the data covers a large part of the spatial domain for at least a few consecutive time points.

<<>>=
SIM2 <- simIDE(T = 15, nobs = 1000, k_spat_invariant = 0)
@

As above, the process and the data can be plotted through

<<results = 'hide', fig.keep = 'none'>>=
print(SIM2$g_truth)
print(SIM2$g_obs)
@

\noindent Note how the field appears to rotate quickly anti-clockwise and arrive to a nearly complete standstill towards the lower part of the domain. The spatially-varying advection that generated this field can be visualized using

<<eval = FALSE>>=
show_kernel(SIM2$IDEmodel, scale = 0.2)
@

\noindent In this command, the argument \args{scale} scales the arrow sizes by 0.2, that is, the shift per time point is five times the displacement indicated by the arrow.

Spatially-varying kernels can be introduced by specifying the argument \args{kernel\_basis} inside the call to {\bf IDE}. The basis functions that {\bf IDE} uses are the of the same class as those used by {\bf FRK}. Below we construct 9 bisquare basis functions that are equally spaced in the domain.

<<>>=
mbasis_1 <- auto_basis(manifold = plane(),   # functions on the plane
                       data = SIM2$z_STIDF,  # data
                       nres = 1,             # 1 resolution
                       type = 'bisquare')    # type of functions
@

\noindent Type \fn{show\_basis}\cc{(mbasis\_1)} to plot these basis functions.

Now, recall that $\theta_{p,1}$ corresponds to the amplitude of the kernel, $\theta_{p,2}$ to the scale or aperture, $\theta_{p,3}$ to the $x$-direction shift ``drift,'' and $\theta_{p,4}$ to the $y$-direction shift (drift). Here we decide to let $\theta_{p,1}$ and $\theta_{p,2}$ be spatially invariant (usually a reasonable assumption), and decompose $\theta_{p,3}$ and $\theta_{p,4}$ as sums of basis functions given in \cc{mbasis\_1}.

<<>>=
kernel_basis <- list(thetam1 = constant_basis(),
                     thetam2 = constant_basis(),
                     thetam3 = mbasis_1,
                     thetam4 = mbasis_1)
@

Modelling proceeds as before, except that now we specify the argument \args{kernel\_basis} when calling \fn{IDE}.

<<>>=
IDEmodel <- IDE(f = z ~ s1 + s2 + 1,
                data = SIM2$z_STIDF,
                dt = as.difftime(1, units = "days"),
                grid_size = 41,
                kernel_basis = kernel_basis)
@

\noindent Fitting also proceeds by calling the function \fn{fit.IDE}. Below we use the argument \args{itermax}\cc{ = }\num{400} to specify the maximum number of iterations for the optimization routine to use.

<<eval = FALSE>>=
fit_results_sim2 <- fit.IDE(IDEmodel,
                           parallelType = 1,
                           itermax = 400)
@

\noindent  As above, be warned that this operation is very computationally intensive. The fitted spatially-varying kernel can be visualized through
<<eval = FALSE>>=
show_kernel(fit_results_sim2$IDEmodel)
@
\noindent and the true and fitted spatially-varying drift parameters are shown side by side in Figure~\ref{fig:Sim2kernels}. Note how the fitted drifts capture the broad directions and magnitudes of the true model. Predictions and prediction standard errors can be obtained and visualized using \fn{predict} as above.

\begin{figure}[t!]
\begin{center}
\includegraphics[width=\linewidth,angle=0]{kernels_sim2.png}
\end{center}
\caption{True drifts (left panel) and estimated drifts (right panel). \label{fig:Sim2kernels}}
\end{figure}


\end{document}