% Compile with 'R CMD Sweave xx.Rnw'
%   or 'Rscript -e "library(knitr);knit(xx.Rnw)"'
% In the package DESCRIPTION file specify:
%   VignetteBuilder: knitr
%   Suggests: knitr
% For RStudio installation to compile vignettes use
%   'devtools::install(build_vignettes = TRUE)'

\documentclass[12pt]{article}

%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{An effective biologically-based taper equation}
%\VignetteDepends{lattice}

\usepackage[utf8]{inputenc}
\usepackage{charter,inconsolata}
\usepackage[T1]{fontenc}
\PassOptionsToPackage{hyphens}{url}
\usepackage[breaklinks]{hyperref}
\hypersetup{pdfstartview={FitH -32768},pdfborder={0 0 0},
            bookmarksopen}

% --- math  ---
\usepackage{amsmath,bm}
\newcommand{\vc}[1]{\bm{#1}}
\newcommand{\mat}[1]{{\mathrm #1}}  % or \bf
\newcommand{\der}[2]{\frac{{\mathrm d}#1}{{\mathrm d}#2}}
\newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\dr}[2]{{\mathrm d}#1/{\mathrm d}#2}
\newcommand{\dd}{\,{\mathrm d}}
%\newcommand{\mod}[1]{_{(\mbox{\scriptsize mod }#1)}} % see also pmod
\newcommand{\diag}{\mathop{\mathgroup\simoperators diag}\nolimits}
% or \newcommand{\diag}{\,\mbox{diag}}
\newcommand{\abs}{\mathop{\mathgroup\simoperators abs}\nolimits}
% or \newcommand{\abs}{\mbox{abs}}
\providecommand{\e}{\mathrm e} % included in amsmath?
\DeclareMathOperator{\sgn}{sgn}
\newcommand{\absv}[1]{\lvert #1 \rvert}

% ---  bibliography  ---
\usepackage{natbib}
%default:  \bibpunct{(}{)}{;}{a}{,}{,}
%\bibliographystyle{elsart-harv}

% ---  floats ---
%\usepackage[pdftex]{graphicx}
%\usepackage{slpflts}

\newcommand{\captionfont}{\small} % or {\sf}
% or \newcommand{\captionfont}{}

% [width], tag, caption
\newcommand{\fig}[3][]{\begin{figure}[htbp]\leavevmode\centering%
\includegraphics[width=#1\textwidth]{#2.pdf}\caption{\captionfont #3}\label{fig:#2}\end{figure}}

\hypersetup{
 pdfauthor={Oscar Garcia},%
 pdftitle={%{TITLE}%
 An effective biologically-based taper equation
}%
%%,pdfkeywords={}%
}

\newcommand{\lang}[1]{\textbf{#1}}
\newcommand{\pkg}[1]{\textbf{#1}}
\newcommand{\code}[1]{\texttt{#1}}

%% Fix wrapping of long lines in R output %%
% Thanks to Scott Pakin, author of \spverbatim
\makeatletter
\let\orig@verbatim=\verbatim
\begingroup
  \catcode`|=0
  \catcode`[=1
  \catcode`]=2
  \catcode`\{=12
  \catcode`\}=12
  \catcode`\\=12
  |gdef|spv@xverbatim#1\end{verbatim}[#1|end[verbatim]]
|endgroup
\renewenvironment{verbatim}{%
  \def\@xobeysp{\mbox{}\space}%
  \let\@xverbatim=\spv@xverbatim
  \orig@verbatim
}{%
}
\makeatother


\title{An effective biologically-based \\ taper equation}
\author{Vignette for the \pkg{dyntaper} \lang{R} package
  \thanks{\url{https://github.com/ogarciav/dyntaper}}
}
\date{Oscar Garc\'ia, August 2022}

\begin{document}
\maketitle

\setcounter{tocdepth}{4}
\tableofcontents


<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(fig.align='center', # fig.show='hold',
dev='pdf', out.width='.6\\textwidth') # , highlight=FALSE)
oldopt <- options(width=67)
@


\section{Introduction}
\label{sec:intro}

Taper or stem profile equations (or functions) predict stem diameters as a function of height above ground, total tree height, and diameter at breast height (dbh). It is important to realize that this can only be an approximation: one can expect a dominant tree to have a different bole shape than a suppressed tree with the same height and dbh in another stand. Stand density and other variables can also have an effect. Attempts at introducing additional variables have usually shown little or no improvement in predictions, although in some instances stand variables can partially compensate for a bad choice of taper equation form. Even if improved predictions could be achieved, it is difficult to beat the practicality and convenience of requiring only height and dbh.

Also, stem measurements are notoriously imprecise, being affected by stem irregularities, bark measurement errors, and deviations from a circular cross-section. The taper equation is essentially a convenient fiction, postulating a smooth regular shape applicable to every tree. These models are very useful, but one should keep the limitations in perspective. No need to agonize over precise mathematical details or elaborate statistical niceties (more on this later).

Practical taper models have been empirical and static, without much biological justification and giving the stem profile at one point in time. A realistic mathematical description of the entire tree has required a large number of adjustable parameters. In contrast, I explain here a model based on plausible approximations to the mechanisms of wood formation, that generates robust whole-tree profiles with few free parameters \citep{dyntaper}. The model is dynamically sound, producing a coherent development over time, which may be important in some applications.

In what follows, I first describe the model motivation. This includes the conformation of annual wood layers, and how they accumulate to give rise to the stem shape. Then, we look at the methods and \lang{R} functions used to calculate diameter profiles, heights for given diameters, and volumes between specified heights or diameters. Finally, I demonstrate some methods of parameter estimation and model evaluation. Detailed mathematical derivations are given in the Appendix. Related approaches have been explored, for instance, by \citet{deleuze}, and \citet{valentine}.

\section{Theory}
\label{sec:theo}

This section is not strictly necessary for applying the model. Feel free to skim over it, coming back some other day if you become curious.

\fig[0.5]{vinci}{Relationship between wood cross-section and foliage, according to Leonardo da Vinci (ca.~1500).}

Stem form is the result of the accumulation of annual wood layers produced by the cambium. There are various theories about what determines the thickness pattern of the layers. For instance, a popular one is that the layer's cross-sectional area is proportional to the amount of foliage above each point. The idea can be traced back to Leonardo da Vinci (Fig.~\ref{fig:vinci}), it was developed by Pressler in 1864, and rediscovered by Japanese scientists in the 1960s (the  ``pipe model theory''). Pressler assumed a uniform vertical distribution of foliage in the crown, so that the area increment increases linearly downwards, from 0 at the tip of the tree to some maximum at the base of the green crown, remaining constant below that. This, and its consequences for the development of stem form, were nicely illustrated by \citet{mitchell}, see Fig.~\ref{fig:mitchell}.

\fig{mitchell}{From \citet{mitchell}.}

Apart from the Vinci--Pressler--pipe hypothesis, there are other theories based on biomechanical considerations. \citet{larson} is still a nice overview. Another oldie but goodie is \citet{gray}. Observational data for the annual layer thickness is typically very noisy, but more than a straight line, it often suggests something like an asymptotic curve for the increase in area with distance from the top. For instance, ignoring scale factors,
<<>>=
curve(pmin(x, 1), from=0, to=3)     # Pressler
curve(1 - exp(-x), 0, 3, add=TRUE)  # exponential
curve(x / (x + 1), 0, 3, add=TRUE)  # hyperbolic
@
\noindent
Regardless of the exact layer shape, the stem profile is generated by the telescoping of the layers, as they accumulate while moving upwards (Fig.~\ref{fig:mitchell}).

A glaring deficiency of the model so far is that it ignores that most trees exhibit a  basal flaring or \emph{butt swell}, caused by an additional thickening of the annual wood layer near the base. This is suggested, somewhat timidly, by Mitchell in 4a (Fig.~\ref{fig:mitchell}). We add then a butt-swell component to the area increment, which should decrease as the height above the tree base increases. In symbols, we have a top area increment $\varphi(x)$ that increases with distance $x$ from the top, plus a basal increment $\eta(h)$ that decreases with the height level $h$. Therefore, the cross-sectional area increment at a level $h$ in a tree with total height $H$ is
\begin{equation} \label{eq:ds}
    \varphi(H - h) + \eta(h) \;,
\end{equation}
noting that $x = H - h$.

We can now accumulate these increments to obtain the total stem cross-sectional area at any given height level $h$. And from that, the diameter and taper function. It is convenient to take eq.~\eqref{eq:ds} as relative to height growth, expressed as area increment per unit of height increment. Then, we can accumulate by integrating over height, from when the tree height reached the level $h$ to the current total height $H$.

To get tractable results we make a simplifying assumption: the functions $\varphi(x)$ and $\eta(h)$ do not change over time or among trees (almost). More precisely, they do not change while the tree grows between $h$ and $H$. And the actual growth is assumed to be \emph{proportional} to eq.~\eqref{eq:ds}, with a proportionality factor that can vary from tree to tree. Then, the stem cross-section at level $h$ in a tree of height $H$ is
proportional to
\[
    s(h, H) = \int_h^H \left[\varphi(y - h) + \eta(h)\right] \dd y
            = \int_h^H \varphi(y - h) \dd y + (H - h) \eta(h) \;,
\]
or
\begin{equation} \label{eq:s}
    s(h, H) = \Phi(H - h) + (H - h) \eta(h), \quad \text{with}
      \quad \Phi(x) = \int_0^x \varphi(u) \dd u
\end{equation}
(trust me!)

The most questionable assumption here is the constant shape of $\varphi(x)$. In Pressler's model, for instance, that would mean a constant crown length, which may not be accurate if there is artificial pruning or large stand density changes. It is found, however, that these details have relatively minor effects on the final profile,
and the simplification is just too convenient.

It remains to choose suitable forms for $\varphi(x)$ and $\eta(h)$.

In the case of $\eta$, we want some ``decay'' function that, ignoring scale factors, starts at 1 for $h=0$ and decreases to 0 as $h$ increases. For instance, a negative exponential $e^{-h}$, or a hyperbola $1 / (1+h)$. More generally, $(1 - p h)^{1/p}$ includes these and many other forms as special cases for given values of $p$: the exponential is the limit as $p \rightarrow 0$, and the hyperbola is obtained with $p = -1$. Define then a general decay function
\begin{align*} \label{eq:d}
    \delta(y, p) &= (1 - p y)_+^{1/p} \text{ if } p \neq 0 \;, \\
    \delta(y, 0) &= \e^{-y} \;.
\end{align*}
The notation $(\cdot)_+$ is shorthand for the non-negative truncation $max\{\cdot, 0\}$, which can be relevant when $p > 0$. This $\delta()$ is implemented in the package function \code{decay()}:
<<>>=
library(dyntaper)
for(p in seq(1, -1, -0.5)) curve(decay(y, p), xname="y", from=0,
  to=3, add=(p != 1))
@
\noindent

As discussed before, the top increment function $\varphi(x)$ should resemble a ``ramp'', increasing from 0 at $x=0$ up to a horizontal asymptote. It looks like a decay function turned upside-down. In fact, $1 - \delta(x, p)$ happens to give the Pressler, exponential, and hyperbolic ramps, for $p$ equal to 1, 0, and -1, respectively. Verify by plotting \code{1 - decay(x, p)}.

Thus, with appropriate scaling parameters $b_i$, we adopt the general forms
\[
    \varphi(x) = 1 - \delta(x / b_1, b_2) \quad \text{and} \quad
     \eta(h) = b_3 \delta(h / b_4, b_5) \;.
\]
Substituting in eq.~\eqref{eq:s}, after some algebra it is found that the cross-section profile model is proportional to the base curve
\begin{equation} \label{eq:base}
    s(h, H) = H - h - b_1 I_\delta[(H-h)/b_1, b_2] + b_3 (H - h) \delta(h / b_4, b_5) \;.
\end{equation}
Details in the Appendix, which includes also the calculation of $I_\delta(y, p)$, the integral  of $\delta$.
The function \code{tbase()} computes this, for instance,
<<>>=
b <- c(2.569, 0, 1.042, 0.3012, -1)  # params. for Douglas fir in BC
curve(tbase(h, 32, b), from=0, to=32, xname="h")  # H = 32 meters
@
\noindent
Of course, a diameter base curve is obtained by taking the square root. Plot it.

Finally, the tree-dependent proportionality factor can be resolved by forcing the curve to go through the dbh $D$ at the breast height $h_b$:
\begin{equation} \label{eq:taper}
    d(h, H, D) = D \sqrt{\frac{s(h, H)}{s(h_b, H)}} \;.
\end{equation}
This assumes that all the diameters are  outside bark, or all are inside bark. If, for instance, the stem diameter measurements are inside bark but the dbh is outside bark, one could substitute $k D$ for $D$, where $k$ is an estimated bark conversion factor.
The taper equation \eqref{eq:taper} is computed with \code{taper()}, see examples below.
The functions \code{decay}, \code{Id}, and \code{tbase} are used internally by other functions, normally not directly by the user.

Admittedly, this derivation uses some fairly rough approximations. At worst, you can view it as a way of obtaining a reasonable stem profile description that is flexible and not too complex. Integration tends to lessen the impact of growth mechanism details, and conversely, differencing amplifies observation noise.  Which explains why after some 1.5 centuries very different stem form generating hypotheses coexist; it is difficult to discriminate among them based on external stem measurements.

This taper model fits well the form of entire trees, with just 3 or 5 free parameters, depending on whether we count the shape parameters $b_2$ and $b_5$ or not. Rounding the shape parameters to small integers does not make much difference, resulting in simplified forms of eq.~\eqref{eq:base}. The simplest is the \emph{exponential-exponential} version with $b_2 = b_5  = 0$. This resembles the Brink equations used successfully by several authors \citep{brink,arias}. See also \citet{koirala}. In \citet{dyntaper}, the best model was the \emph{exponential-hyperbolic}, with $b_2 = 0$ and $b_5 = -1$. See specific forms in eq.~\eqref{eq:spec}, below.

Exercise: \emph{Variable-form}, or \emph{variable-exponent}, is one of the categories into which taper models are commonly classified. A typical example is
\begin{equation*}
    d = D \left(\frac{H - h}{H - h_b}\right)^{f(h, H, D)}
\end{equation*}
for some complicated function $f()$. Other similar expressions inside the parenthesis, and powers of $D$, are also used. Convince yourself that any taper function $d = g(h, H, D)$ (such as eq.~\eqref{eq:taper}) can be written in this form by making $f(h, H, D) = \log[g(h, H, D) / D] / \log[(H -h) / (H - h_b)]$.


\section{Taper} \label{sec:taper}

The taper model is given by eqs.~\eqref{eq:base} and \eqref{eq:taper}, and is implemented in function \code{taper()}. For instance, for a tree with a total height of 32 m and dbh of 24 cm (breast height 1.3 m), using the same vector \code{b} of coefficients as before,
<<>>=
curve(
  taper(h, H=32, D=0.956*24, b=b, bh=1.3),
from=0, to=32, xname="h")
@
\noindent
The parameters were for an inside-bark model, and 0.956 is the outside- to inside-bark conversion factor.
Setting \code{area=TRUE} returns cross-sectional areas instead of diameters.

The parameter $b_1$ is supposed to be related to crown length, $b_3$ reflects the contribution of butt swell, and $b_4$ determines how high the but swell extends up from the ground.
The shape parameters $b_2$ and $b_5$ affect the distribution of diameter (or area) increment in the top of the tree and in the butt swell, respectively.

The exact values of $b_2$ and $b_5$ have a relatively small effect on stem form, and I suggest fixing them at ``nice'' values like 0, 1, or -1 in the final model. The choice can be guided by leaving free these shape parameters during model development.
Then, one may view the general model as defining a family of more parsimonious taper equations. The base eq.~\eqref{eq:base} can be written more explicitly as:
\begin{align} \label{eq:spec}
    s(h, H) = H - h - &\begin{cases}
              b_1 \left(1 - \e^{-\frac{H - h}{b_1}}\right) & \text{if } b_2 = 0 \\
              b_1 \ln \left(\frac{H-h}{b_1} + 1\right)  & \text{if } b_2 = -1 \\
              \frac{b_1}{b_2+1} \left\{1 - \left[1 - \frac{b_2+1}{b_1} (H - h)\right]_+^{\frac{1}{b_2}+1} \right\} & \text{otherwise}
                          \end{cases} \nonumber \\
            + b_3 (H - h) &\begin{cases}
                              \e^{-\frac{h}{b_4}}  & \text{if } b_5 = 0 \\
                              \frac{b_4}{h + b_4}  & \text{if } b_5 = -1 \\
                              \left[1 - \frac{b_5}{b_4} h\right]_+^\frac{1}{b_5}  & \text{otherwise}
                          \end{cases}
\end{align}


\section{Height for a given diameter}
\label{sec:h}

Often, it is necessary to estimate the height $h$ at which the stem has a certain diameter $d$. This is the inverse of the taper function eq.~\eqref{eq:taper}, for fixed $H$ and $D$. There is no closed-form expression for the inverse, but values can be computed numerically with function \code{hlevel()}.

For instance, the height at which the diameter is 15 cm in a tree 32 m tall with dbh 24 cm is
<<>>=
hlevel(15, H=32, D=0.956*24, b=b, bh=1.3)
@
\noindent
The leading \verb|##| are not displayed by \lang{R}, they are used here to distinguish outputs from inputs. Setting the parameter \code{area = TRUE} produces the height for a given cross-sectional area.

Exercises: (a) Check that the diameter at breast height is as it should be. (b) What happens if the diameter does not exist (e.~g., $d = 30$)?

\section{Volumes}
\label{sec:vol}

A common use of taper equations is the computation of stem volume. Either total volume, or the volume between two given height levels. Levels might be specified as those corresponding to a certain diameter, and then the corresponding height level can be obtained with \code{hlevel()} (Section \ref{sec:h}).

Volume is given by the integral of the cross-sectional area between the two height levels. Our taper model can be integrated analytically so that no numerical approximations are needed (Appendix). The general expression is a bit messy, but values can be calculated with function \code{volume()}.

As an example, for the same tree used before, the volume between a 30 cm stump and a 10 cm diameter limit is:
<<>>=
h10 <- hlevel(10, 32, 0.956*24, b, 1.3)  # height for diameter 10
volume(h1=0.3, h2=h10, H=32, D=0.956*24, b=b, bh=1.3, rhd=100)
@
\noindent
Before it was not necessary to worry about the units of diameter and height, it did not matter if they were different. For volume, it \emph{does} matter, and one has to specify \code{rhd}, the ratio between the units of height and diameter. In this instance, height in meters and diameter in centimeters give \code{rhd = 100}. If the heights were in feet and the diameters in inches, we would have \code{rhd = 12}.

Exercises: Find (a) total volume;  (b) sawlogs volume between stump and a limit diameter of 20 cm, and pulpwood volume above that up to a 10 cm limit.


\section{Parameter estimation}
\label{sec:est}

Let's demonstrate an example of model fitting. The package includes a small dataset with measurements from 10 eucalypt trees, taken from \citet{brink}:
<<>>=
summary(brink)
library(lattice)
xyplot(dib ~ h, groups=Tree, data=brink, type="b")
@
\noindent
The diameter measurements \code{dib} for each height level \code{h} are inside bark, while the tree dbh \code{Dob} is outside bark. Diameters are in centimeters, heights in meters.

We need to convert dbh values from outside to inside bark. The data includes the observed inside-bark diameter at the breast height of 1.35 m, from which one can estimate a conversion factor $k$:
<<>>=
Dib <- brink$dib[brink$h == 1.35]
Dob <- brink$Dob[brink$h == 1.35]
mean(Dib / Dob)
@
\noindent
Although less intuitive, I would rather use a log transform. It produces the same value for converting from outside to inside bark or vice-versa, and it may help with heteroscedasticity:
<<>>=
exp(mean(log(Dib / Dob)))
k <- 0.908  # no practical difference, this should be  good enough
@
\noindent

Now, should we fit diameters or cross-sectional areas? In theory, areas would produce better volume estimates \citep{dyntaper}. On the other hand, researchers usually evaluate taper models based on diameter predictions. Let's stick to diameters here, and use the non-linear least-squares function \code{nls()}. It is a good idea to start with a simple model, so first, fit the exponential-exponential version:
<<>>=
expexp <- nls(dib ~ taper(h, H, k*Dob, c(b1, 0, b3, b4, 0), 1.35),
              data=brink, start=c(b1=4, b3=1, b4=1))
summary(expexp)
AIC(expexp)  # Akaike's criterion
@
\noindent
Free up the shape parameters:
<<error=TRUE, purl=FALSE>>=
full <- nls(dib ~ taper(h, H, k*Dob, c(b1, b2, b3, b4, b5), 1.35),
            data=brink, start=c(coef(expexp), b2=0, b5=0))
@
\noindent
This is a common issue with \code{nls()}, here it may be triggered by over-parametrization. There are reportedly more robust alternatives. For instance, function \code{nlsLM()} from package \pkg{minpack.lm} succeeds in converging to a solution. Or simply, nudge \code{b5} a little to get the algorithm unstuck:
<<>>=
(full <- nls(dib ~ taper(h, H, k*Dob, c(b1, b2, b3, b4, b5), 1.35),
             data=brink, start=c(coef(expexp), b2=0, b5=0.1)))
AIC(full)
@
\noindent
A bit better, according to Akaike. Even better is a more parsimonious version suggested by the $b_2$ and $b_5$ estimates:
<<>>=
(hyplin <- nls(dib ~ taper(h, H, k*Dob, c(b1, -1, b3, b4, 1),
               1.35), data=brink, start=coef(expexp)))
AIC(hyplin)
bhl <- c(b1=2.225, b2=01, b3=0.2826, b4=1.087, b5=1)
@
\noindent
Do not take this too seriously, it is a small dataset.
Feel free to play around with other possibilities, plotting results, etc.

One could have fitted the model using the measured instead of the converted \code{Dib}. You might like to see how much difference that makes. Which is better?

Of course, if you want to publish, ordinary least-squares (OLS) won't cut it. Currently, using mixed-effects methods is a \emph{de facto} publication requirement (AI approaches are also accepted). Fear not, procedures similar to those above can be used with packages \pkg{nlme} or \pkg{lme4}. It actually helps if you do not understand how the methods work!
Oh, and do not use simple letters like $h$ and $H$, something like $\text{TreeHt}^\text{(tot)}_i$ makes equations more impressive.

The usual argument is that OLS assumptions of uncorrelated residuals are not valid, and therefore parameter error estimates and hypothesis tests are distorted. In practice, taper equations are used for prediction, and OLS has good prediction statistical properties independently of distributional assumptions. In fact, studies that have compared OLS and mixed-effects predictions have found OLS to be better \citep[e.~g.,][]{arias,he}.

Exercise: Think about the following questions
\begin{enumerate}
\item A certain empirical model has a parameter $b_6$ without biological meaning. What is the scientific interpretation of the hypothesis $b_6 = 0$?
\item A \emph{CAR} scheme is typically used to model correlations between measurements in the same tree. This is a time series technique that assumes that a measurement is linearly related to the previous measurement or measurements. In a taper model, should one assume that the linear relationship is with the measurement(s) to the left or to the right?
\item In a mixed-effects model, what does it mean that $b_6$ is ``random''?
\item One possible answer to the previous question is that the distribution of $b_6$ (generally assumed normal) reflects the distribution of the ``true'' parameter $b_6$ among the individuals of the target population. That seems to imply a simple random sample from the population. Are trees in taper datasets a random sample? Should they be?
\end{enumerate}

The pairs of measurements that can possibly have a non-zero correlation are those where both measurements belong to the same tree. What is their percentage relative to the total number of pairs?
<<>>=
n <- nrow(brink)  # number of measurements
totpairs <- n * (n - 1) / 2  # total pairs
(m <- table(brink$Tree))  # measurements per tree
corrpairs <- sum(m * (m - 1) / 2)  # pairs within trees
100 * corrpairs / totpairs  # % of possibly correlated pairs
@
\noindent
It is found that the percentage of possibly correlated pairs for a sample of $T$ trees is approximately $100 / T$. With typical database sizes, it seems unlikely that modeling or ignoring the correlations could make much of a difference.


\section{Validation}
\label{sec:val}

Model evaluation should consider not only overall statistics like root-mean-square error (RMSE) or AIC, but also the fit for various variable values. Observations in the lower or upper part of the stem, and for small or large trees. For instance, the following gives the mean bias for small, medium, and large trees, at lower, medium, and upper levels:
<<>>=
with(brink,
  tapply(residuals(hyplin), list(
    DBH = cut(Dob, quantile(Dob, 0:3 / 3), include.lowest=TRUE),
    RelHt = cut(h / H, 3)
  ), mean)
)
@
\noindent

Here are the RMSEs for the same data groups:
<<>>=
with(brink,
  tapply(residuals(hyplin), list(
    DBH = cut(Dob, quantile(Dob, 0:3 / 3), include.lowest=TRUE),
    RelHt = cut(h / H, 3)
  ), function(x) sqrt(mean(x^2)))
)
@
\noindent

To facilitate these analyses it may be convenient to write a function such as
<<>>=
gridify <- function(x, rows, cols, smmryfn, rbreaks, cbreaks){
  tapply(x, list(
    cut(rows, rbreaks, include.lowest=TRUE),
    cut(cols, cbreaks, include.lowest=TRUE)
  ), smmryfn)
}
@
\noindent

Exercises: (a) Re-calculate the biases using \code{gridify()}. (b) Use the above to plot bias or RMSE over 5 relative height levels, for 2 total height classes. (c) Calculate the number of trees in each class, grouping by dbh and by total height (hint: \code{smmryfn = length}). What would happen with more trees?

<<cleanup, include=FALSE, cache=FALSE>>=
options(oldopt)
@


\begin{thebibliography}{99} \addcontentsline{toc}{section}{References}

\bibitem[Arias-Rodil \emph{et al}(2015)Arias-Rodil \emph{et al}]{arias} Arias-Rodil, M., Diéguez-Aranda, U., Rodríguez Puerta, F., López-Sánchez, C. A., Canga Líbano, E., Cámara Obregón, A. and Castedo-Dorado, F. (2015) Modelling and localizing a stem taper function for \emph{{P}inus radiata} in {S}pain. \emph{Canadian Journal of Forest Research 45}(6), 647-–658.
(\url{https://doi.org/10.1139/cjfr-2014-0276}).

\bibitem[Brink and von Gadow(1986)Brink and von Gadow]{brink} Brink, C. and von Gadow, K. (1986) On the use of growth and decay functions for modelling stem profiles. \emph{EDV in Medizin und Biologi 17}, 20–-27.

\bibitem[Deleuze and Houllier(2002)Deleuze and Houllier] Deleuze, C. and Houllier, F. (2002) A flexible radial increment taper equation derived from a process-based carbon partitioning model. \emph{Annals of Forest Science 59}(2), 141-–154.
(\url{https://doi.org/10.1051/forest:2002001}).

\bibitem[Garc\'ia(2015)Garc\'ia]{dyntaper} Garc\'ia, O. (2015) Dynamic modelling of tree form. \emph{MCFNS 7}, 9–15. (\url{http://mcfns.net/index.php/Journal/article/view/MCFNS7.1_2}).

\bibitem[Gray(1956)Gray]{gray} Gray, H. R. (1956) The Form and Taper of Forest-Tree Stems. Oxford University, Imperial Forestry Institute, Institute Paper 32, 74p.
(\url{https://web.archive.org/web/20190801160752/http://www.bodley.ox.ac.uk/users/millsr/isbes/ODLF/IP32.pdf}).

\bibitem[He \emph{et al}(2021)He \emph{et al}]{he} He, P., Hussain, A., Shahzad, M. K., Jiang, L. and Li, F. (2021) Evaluation of four regression techniques for stem taper modeling of {D}ahurian larch (\emph{Larix gmelinii}) in Northeastern China. \emph{Forest Ecology and Management 494} 119336.
(\url{https://doi.org/10.1016/j.foreco.2021.119336}).

\bibitem[Koirala \emph{et al}(2021)Koirala \emph{et al}]{koirala} Koirala, A., Montes, C. R., Bullock, B. P. and Wagle, B. H. (2021) Developing taper equations for planted teak (\emph{Tectona grandis} L. f.) trees of central lowland Nepal. \emph{Trees, Forests and People 5} 100103.
(\url{https://doi.org/10.1016/j.tfp.2021.100103}).

\bibitem[Larson(1963)Larson]{larson} Larson, P. R. (1963) Stem Form Development of Forest Trees. \emph{Forest Science Monograph 5}, Society of American Foresters, Washington, DC.
(\url{https://doi.org/10.1093/forestscience/9.s2.a0001}).

\bibitem[Mitchell(1975)Mitchell]{mitchell} Mitchell, K. J. (1975) Dynamics and Simulated Yield of {D}ouglas-Fir. \emph{Forest Science Monograph 17}, Society of American Foresters, Washington, DC.

\bibitem[Valentine \emph{et al}(2012)Valentine \emph{et al}]{valentine} Valentine, H.T., Mäkelä, A., Green, E.J., Amateis, R.L., Mäkinen, H. and Ducey, M.J. (2012) Models relating stem growth to crown length dynamics: application to loblolly pine and {N}orway spruce. \emph{Trees 26}, 469-–478.
(\url{https://doi.org/10.1007/s00468-011-0608-0}).

\end{thebibliography}


\appendix

\begin{center} \section*{Appendix --- Mathematical derivations} \end{center}
 \addcontentsline{toc}{section}{Appendix: Mathematical derivations}

\subsection*{Decay function and its integrals}
\label{app:decay}

\subsubsection*{Decay function}
\label{sec:Adelta}

\begin{equation} \label{eq:delta}
    \delta(x, p) = \begin{cases} \exp(-x) & \text{if } p = 0 \\
                                 (1 - p x)_{+}^{1/p} & \text{if } p \ne 0
                   \end{cases}
\end{equation}

\subsubsection*{Integral}
\label{sec:Id}

\[
    I_\delta(x, p) \equiv \int_0^x \delta(y, p) \dd y = \int_0^x (1 - p y)_{+}^{1/p} \dd y
\]
if $p \neq 0$.
With $1 - p y \equiv u$, $y = (1 - u) / p$, and assuming $p+1 \neq 0$,

\[
    I_\delta(x, p) = \frac{1}{p} \int_{1-p x}^1 u_+^{1/p} \dd u
      = \frac{1}{p+1} \left[ 1 - (1-px)_+^{1/p + 1} \right] \;.
\]
Noticing that $1/p + 1 = \frac{p+1}{p}$,
\[
    I_\delta(x, p) = \frac{1}{p+1} \left\{ 1 - \delta[(p+1) x, \frac{p}{p+1}]\right\} \;.
\]
It works also for $p = 0$:
\[
    I_\delta(x, 0) = \int_0^x \exp(-y) \dd y = 1 - \exp(x) = 1 - \delta(x, 0)\;.
\]
If $p + 1 = 0$, i.e., $p = -1$,
\[
    I_\delta(x, -1) = \int_1^{1 + x} y^{-1} \dd y = \ln(x + 1) \;.
\]

Therefore, in general,
\begin{equation} \label{eq:intdelta}
    I_\delta(x,p) \equiv \int_0^x \delta(y, p) \dd y = \begin{cases}
                                             \ln(x+1) & \text{if } p = -1 \\
                                             \frac{1}{p+1} \left\{1 - \delta[(p+1) x, \frac{p}{p+1}] \right\} & \text{otherwise}
                              \end{cases}
\end{equation}
For $p=0$ this simplifies to
\begin{equation*} \label{eq:intd9}
    I_\delta(x,0) = 1 - \e^{-x} \;.
\end{equation*}


\subsubsection*{Double integral}
\label{sec:Idd}

\[
    I_{\delta\delta}(x,p) \equiv \int_0^x \int_0^y\delta(z, p) \dd z \dd y
        = \int_0^x I_\delta(y,p) \dd y \; .
\]

If $p = -1$,
\[
    I_{\delta\delta}(x,-1)
        = \int_0^x I_\delta(y,-1) \dd y = \int_0^x \ln(y+1) \dd y
        = (x+1) \ln(x+1) - x \;.
\]
Else,
\begin{align*}
    I_{\delta\delta}(x,p)
        &= \int_0^x I_\delta(y,p) \dd y
        = \frac{1}{p+1} \int_0^x \left\{1 - \delta[(p+1) y, \frac{p}{p+1}] \right\} \dd y \\
        &= \frac{x}{p+1} - \frac{1}{(p+1)^2} I_\delta[(p+1) x, \frac{p}{p+1}] \;.
\end{align*}

Then, if $p/(p+1) = -1$, i.~e., $p = -1/2$,
\[
    I_{\delta\delta}(x,-\tfrac{1}{2})
        = 2 x - 4 I_\delta[x / 2, -1] = 2 x - 4 \ln(x/2 + 1) \;.
\]
Otherwise, if $p \neq -1$ and $p \neq -1/2$,
\begin{align*}
    I_{\delta\delta}(x,p)
        &= \frac{x}{p+1} - \frac{1}{(p+1)^2} I_\delta[(p+1) x, \frac{p}{p+1}] \\
        &= \frac{x}{p+1} - \frac{1}{(p+1)(2p+1)} \left\{1 - \delta[(2p+1) x, \frac{p}{2p+1}] \right\}\;.
\end{align*}

Summarizing,
\begin{equation} \label{eq:ddI}
    I_{\delta\delta}(x,p) = \begin{cases}
        (x+1) \ln(x+1) - x & \text{if } p = -1 \\
        \frac{x}{p+1} - \frac{1}{(p+1)^2} I_\delta[(p+1) x, \frac{p}{p+1}] & \text{otherwise.} \\
        \end{cases}
\end{equation}
If $p = 0$ this simplifies to
\begin{equation*} \label{eq:dd0}
    I_{\delta\delta}(x,0) = x -1 + \e^{-x} \;.
\end{equation*}


\subsection*{Taper}
\label{sec:Ataper}

The cross-sectional area taper equation is
\begin{equation} \label{eq:staper}
    s(h, H, S) = S \frac{s(h, H)}{s(h_b, H)} \;,
\end{equation}
where $S$ is area at breast height $h_b$, and
\[
    s(h, H) = \Phi(H-h) + (H-h) \eta(h) \;,
\]
according to Section \ref{sec:intro}, or see equations (2) and (3) of \citet{dyntaper}.
Here,
\[
    \Phi(x) = \int_0^x \varphi(x) \dd x = \int_0^x [1 - \delta(x/b_1, b_2)]
    = x - b_1 I_\delta(x/b_1, b_2) \;,
\]
and $\eta(h) = b_3 \delta(h / b_4, b_5)$. Therefore,
\begin{equation} \label{eq:As}
    s(h, H) = H - h - b_1 I_\delta[(H-h)/b_1, b_2] + b_3 (H - h) \delta(h / b_4, b_5)
\end{equation}
(eqns.~(7) and (8) of \citet{dyntaper}). Note that the units for diameter and height do not need to be the same, conversion factors cancel out.


\subsection*{Volumes}
\label{sec:Avolumes}


The volume between any two heights $h_1$ and $h_2$ is obtained by integration:
\[
    v(h_1, h_2, S, H) = k \left|\int_{h_1}^{h_2} s(h, H, S) \dd h \right|
     = \frac{k S}{s(h_b, H)} \left|\int_{h_1}^{h_2} s(h, H) \dd h \right| \;.
\]
Here $k$ adjusts for any difference in measurement units between diameters and heights. E.~g., if $s$ is in cm$^2$ and $h$ is in meters, then $k = 10^{-4}$ for volume in cubic meters. If $s$ is in square inches and $h$ is in feet, then $k = 1/144$ for $v$ in cubic feet.

We can also write
\begin{equation} \label{eq:vol}
    v(h_1, h_2, S, H)= \frac{S}{s(h_b, H)} \left|I_s(h_2) - I_s(h_1)\right| \;,
\end{equation}
where $I_s(h)$ is the indefinite integral
\[
    I_s(h) = \int s(h, H) \dd h = \int \Phi(H-h) \dd h + \int (H-h) \eta(h) \dd h  \;.
\]
From eq.~\eqref{eq:As},
\[
    I_s(h) = Hh - h^2/2 + b_1^2 I_{\delta\delta}[(H-h)/b_1, b_2] + b_3 \int (H - h) \delta(h / b_4, b_5) \dd h \;.
\]
The last integral can be obtained through integration by parts with $u = (H - h)$, $\dd v = \delta(h/b_4, b_5) \dd h$,
noting that $v = b_4 I_\delta(h/b_4, b_5)$:
\begin{align*}
    \int u \dd v &= uv - \int v \dd u = b_4 (H - h) I_\delta(h/b_4, b_5)
        + b_4 \int I_\delta(h/b_4, b_5) \dd h \\
        &= b_4 (H - h) I_\delta(h/b_4, b_5)
        + b_4^2 I_{\delta\delta} (h/b_4, b_5) \;.
\end{align*}
Therefore,the volume is given by eq.~\eqref{eq:vol} and
\begin{multline} \label{eq:Is}
    I_s(h) = Hh - h^2/2 + b_1^2 I_{\delta\delta}[(H-h)/b_1, b_2] + \\
     b_3 b_4 (H - h) I_\delta(h/b_4, b_5) +
        b_3 b_4^2 I_{\delta\delta} (h/b_4, b_5) \;.
\end{multline}

\end{document}