---
title: "4. Method"
output: rmarkdown::html_vignette
bibliography: biblio.bib
vignette: >
  %\VignetteIndexEntry{4. Method}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## General setup

Let us set up the notations first. Suppose a there exists a partition of a
region ${\rm D} \in {\cal R}^2$ (e.g., a city). This partition is denoted by
$A_i$, $i = 1, \ldots, n$. Moreover, there exists another partition of the same
city, denoted $B_j$, where $j = 1, \ldots, m$. These partitions can be seen as
two different administrative divisions within the same city. It is common for
different government agencies to release data for different divisions of a same
city, country, or state.

## Model-based approach

Assume we observe a random variable $Y(\cdot)$ at each region $A_i$ and we are
interested in predict/estimate this variable in each of the regions $B_j$. Now
suppose the random variable $Y(\cdot)$ varies continuously over ${\rm D}$ and is
defined as follows
\[
Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \, \mathbf{s} \in
{\rm D} \subset {\cal R}^2.
\]
where
\[
 S(\cdot) \sim {\rm GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa)) \;
 \text{ and } \;
\varepsilon(\cdot) \overset{{\rm i.i.d.}}{\sim} {\rm N}(0, \sigma^2 \rho(\cdot;
\, \phi, \kappa)),
\]
with $S$ and $\varepsilon$ independent of each other. For now, let's make the
unrealistic assumption that all those parameters are known. Then, our assumption
is that the observed data is as follows
\begin{align*}
Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, {\rm d}
\mathbf{s} \\
& = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) +
\varepsilon(\mathbf{s})] \, {\rm d} \mathbf{s} \\
& = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) {\rm d}
\mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) {\rm
d} \mathbf{s},
\end{align*}
where $\lvert \cdot \rvert$ returns the area of a polygon.
Furthermore, it can be shown that (using Fubini's Theorem and some algebraic
manipulation)
\[
{\rm Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert}
\int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi,
\kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}' + \mathbf{I}(i = j)
\frac{\tau}{\lvert A_i \rvert},
\]
where $\rho(\cdot ; \, \phi, \kappa)$ is a positive definite correlation
function. Now, let ${\rm R}_{\kappa}(\phi)$ be a correlation matrix such that
\[
{\rm R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert}
\int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi,
\kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}',
\]
thus,
\[
Y(A_1, \cdots, A_n) \sim {\rm N}( \mu \mathbf{1}_n, \sigma^2 {\rm
R}_{\kappa}(\phi) + \tau {\rm diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_1
\rvert^{-1})).
\]
Then, if we assume $(Y^{\top}(A_1, \cdots, A_n), Y^{\top}(B_1, \cdots,
A_m)^{\top})$ to be jointly normal, we use can the conditional mean of
$Y^{\top}(B_1, \cdots, A_m)^{\top}$ given $Y^{\top}(A_1, \cdots, A_n)$ to
estimate the observed random variable in the partition $B_1, \ldots, B_m$.

---

Now, suppose the parameters $\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)$
are unknown. The Likelihood of $Y(A_1, \ldots, A_n)$ can still be computed.

In particular, if we use the parametrization $\nu = \tau / \sigma^2$, we have
closed form for the Maximum Likelihood estimators both for $\mu$ and
$\sigma^2$. Thus, we can optimize the profile likelihood for $\phi$ and $\nu$
numerically. Then, we resort on conditional Normal properties again to compute
the predictions in a new different set of regions.

## Areal Interpolation (AI)

Areal interpolation is a nonparametric approach that interpolates $Y(A_i)$'s to
construct $Y(B_j)$'s. Define an $m \times n$ matrix $\mathbf{W} = \{ w_{ij} \}$,
where $w_{ij}$ is the weight associated with the polygon $A_i$ in constructing
$Y(B_j)$.  The weights are
$w_{ij} = \lvert A_i \cap B_j \rvert / \lvert B_j \rvert$ [@goodchild1980areal;
@gotway2002combining].  The interpolation for
$\hat Y(B_1, \ldots, B_m)$ is constructed as
\begin{equation}
  \label{eq:np-est}
  \hat{Y}(B_1, \ldots, B_m) = \mathbf{W}
  Y(A_1, \ldots, A_n).
\end{equation}
The expectation and variance of the predictor are, respectively,
\[
  {\rm E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W}
  {\rm E}[Y(A_1, \ldots, A_n)]
\]
and
\begin{equation}
  \label{eq:np-matcov}
  \textrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W}
  \textrm{Var}[Y(A_1, \ldots, A_n)] \mathbf{W}^{\top}.
\end{equation}
In practice, the covariance matrix $\textrm{Var}[Y(A_1, \ldots, A_n)]$
is unknown and, consequently needs to be estimated.

The variance each predictor $\text{Var}[\hat Y(B_i)]$ is needed as an
uncertainty measure. It relies on both the variances of $Y(A_j)$'s and their
covariances:
\begin{align}
  \label{eq:np-single-var}
  \textrm{Var}[\hat{Y}(B_i)] 
  = \sum_{i = 1}^n w^2_{ij} \textrm{Var} \left [ Y(A_i) \right ] + 2 \sum_{l
  \neq i} w_{ij} w_{il} \textrm{Cov} \left[ Y(A_i), Y(A_l) \right].
\end{align}
The variances are often observed in survey data, but the covariances are
not. For practical purpose, we propose an approximation for
$\textrm{Cov}[ Y(A_i), Y(A_l)]$ based on Moran's I, a global spatial
autocorrelation. Specifically, let $\rho_I$ be the Moran's I calculated with a
weight matrix constructed with first-degree neighbors. That is, $\rho_I$ is the
average of the pairwise correlation for all neighboring pairs. For regions $A_i$
and $A_l$, if they are neighbors of each other, our approximation is
\begin{align}
  \label{eq:cova}
  \textrm{Cov} \left[ Y(A_i), Y(A_l) \right]
  = \rho_I \sqrt{\text{Var}[Y(A_i)]  \text{Var}[Y(A_l)]}.
\end{align}
The covariance between non-neighboring $A_i$ and $A_l$ is discarded. The final
uncertainty approximation for $\textrm{Var}[\hat{Y}(B_i)]$ will be an
underestimate. Alternatively, we can derive, at least, an upper bound for the
variance of the estimates by using a simple application from the
Cauchy--Schwartz inequality, in which case, $\rho_I$ is replaced with~1.

# Reference