\documentclass[11pt, a4paper]{article}
%\VignetteIndexEntry{Sound analysis}
%\VignettePackage{seewave}
%\VignetteKeyword{sound}
%\VignetteKeyword{time-series}
%\VignetteDepends{tuneR}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[english]{babel}
\usepackage[hmargin=2.5cm,vmargin=3cm]{geometry}
\usepackage{verbatim}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{caption} % to allow line break in figure captions
\usepackage{hyperref}
\hypersetup{
unicode=false,                  % non-Latin characters in Acrobat’s bookmarks
pdftoolbar=false,               % show Acrobat’s toolbar?
pdfmenubar=false,               % show Acrobat’s menu?
pdffitwindow=true,              % page fit to window when opened
pdftitle={seewave_analysis},    % title
pdfauthor={Jerome SUEUR},       % author
pdfsubject={seewave_analysis},  % subject of the document
pdfnewwindow=true,              % links in new window
pdfkeywords={keywords},         % list of keywords
colorlinks=true,                % false: boxed links; true: colored links
linkcolor=blue,                 % color of internal links
citecolor=green,                % color of links to bibliography
filecolor=magenta,              % color of file links
urlcolor=blue                   % color of external links
}
\usepackage{Sweave}
\usepackage{fancyhdr}
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0.5pt}
\renewcommand{\footrulewidth}{0.5pt}
\fancyhead[C]{Introduction to sound analysis with \texttt{seewave}}
\fancyhead[LR]{}
\fancyfoot[L]{\textit{J. Sueur}}
\fancyfoot[C]{\thepage}
\fancyfoot[R]{\today}
\setlength{\parindent}{0pt}
\newcommand{\pkg}{\textsf}
\newcommand{\R}{\pkg{R}}
\newcommand{\signal}{\pkg{signal}}
\newcommand{\tuneR}{\pkg{tuneR}}
\newcommand{\audio}{\pkg{sound}}
\newcommand{\seewave}{\pkg{seewave}}
\newcommand{\soundecology}{\pkg{soundecology}}
\newcommand{\monitoR}{\pkg{monitoR}}

\author{Jérôme Sueur\\
Muséum national d'Histoire naturelle\\
CNRS UMR 7205 ISYEB, Paris, France\\
}

\title{\includegraphics[width=0.4\textwidth]{seewave_logo} \\
\bigskip
A very short introduction to sound analysis for those who like elephant trumpet calls or other wildlife sound\\
\bigskip
}

\begin{document}

\definecolor{Soutput}{rgb}{0,0,0.56}
\definecolor{Sinput}{rgb}{0.56,0,0}
\DefineVerbatimEnvironment{Sinput}{Verbatim}
{formatcom={\color{Sinput}},fontsize=\footnotesize, baselinestretch=0.75}
\DefineVerbatimEnvironment{Soutput}{Verbatim}
{formatcom={\color{Soutput}},fontsize=\footnotesize, baselinestretch=0.75}

\setkeys{Gin}{width=0.9\textwidth} % relative size of the plots in the text (\set includegraphics width options)
\SweaveOpts{width=6,height=4} % change the size of the plots when they are generated by R

% to shrink the margins of all graphics
<<echo=FALSE>>=
options(SweaveHooks=list(fig=function() par(mar=c(5.1, 4.1, 1, 1))))
@

\maketitle

This document is a very brief introduction to sound analysis principles. It is mainly written for students starting with bioacoustics. The content should be updated regularly. Demonstrations are based on the package \seewave. For more details see the book \href{https://www.springer.com/us/book/9783319776453}{Sound analysis and synthesis with R}.

\tableofcontents

\newpage

%%%%%%%%%%%%%%%%%%%%
%% DIGITIZATION
%%%%%%%%%%%%%%%%%%%%

\section{Digitization}

\subsection{Sampling}
%% ADD ALIASING, NYQUIST-SHANNON SAMPLING THEOREM
Digital recording is not a continuous but a discrete process of data acquisition: sound is recorded as a series of regular samples. These samples are taken at a specified rate, named the sampling frequency or sampling rate $f$, given in Hz or kHz. The most common rate is 44,100 Hz = 44.1 kHz, but lower rates can be used for low frequency sounds (\textit{e.g.} 22.05 kHz) or higher rates for high frequency sounds (up to 192 kHz).\par
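The sampling rate of a sound imported in \R{} can be checked, and if needed lowered, from the console. Here is a minimal sketch using the example data \texttt{tico} (used again later in this document) and the \seewave{} function \texttt{resamp()}; the target rate of 11.025 kHz is arbitrary and only serves as an illustration:
<<eval=FALSE>>=
library(seewave)
data(tico)        # example Wave object sampled at 22.05 kHz
tico@samp.rate    # sampling rate (Hz) stored in the Wave object
# downsample to 11.025 kHz (arbitrary target rate)
tico.down <- resamp(tico, g=11025, output="Wave")
tico.down@samp.rate
@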
Figure \ref{fig:sampling} shows 5 ms of a pure tone sound (440 Hz) sampled at 44.1 kHz and 22.05 kHz respectively. The discrete nature of sound digitization should not be underestimated, as too low a sampling rate can lead to frequency artefacts known as aliasing.

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
library(seewave)
s.44kHz <- synth(d=0.005, cf=440, f=44100, out="Wave")
s.22kHz <- synth(d=0.005, cf=440, f=22050, out="Wave")
layout(matrix(1:2, nr=2), height=c(0.8,1))
par(mar=c(0, 4, 1, 1), oma=c(1,1,0,0))
oscillo(s.44kHz, type="o", xaxt="n", tlab="", alab="", cex=0.5)
par(mar=c(4.5,4,0,1))
oscillo(s.22kHz, type="o", tlab="", alab="", cex=0.5)
mtext("Time (s)", side=1, out=TRUE, at=0.55, line=-1)
mtext("Amplitude", side=2, out=TRUE, line=-2, at=0.5)
@
\caption{\label{fig:sampling} Digital sound is a discrete process along a time scale: the same sound sampled at two different rates, 44.1 kHz (top) and 22.05 kHz (bottom) respectively.}
\end{figure}

\subsection{Quantisation}
Another important parameter of digitization is the process of quantisation, which consists in assigning a numerical value to each sample according to its amplitude. These numerical values are attributed according to a bit scale. A quantisation of 8 bit will assign amplitude values along a scale of $2^{8} = 256$ states around $0$ (zero). Most recording systems use a 16 bit system, giving $2^{16} = 65\,536$ possible values.\par
Quantisation can be seen as a rounding process. A high bit quantisation will produce values close to reality, \textit{i.e.} values rounded to a high number of significant digits, whereas a low bit quantisation will produce values far from reality, \textit{i.e.} values rounded to a low number of significant digits. A low quantisation can lead to an impaired signal quality.

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
par(mar=c(1,4,1,1))
plot(sin(seq(0,2*pi,length=100)), type="l", ann=FALSE, xaxt="n", yaxt="n", col="red", lwd=2, ylim=c(-1.3,1))
par(new=TRUE)
barplot(c(c(1,2,3,3,3,3,2,1)+0.5,c(-1,-1,-2,-3,-3,-3,-3,-2,-1)-.5),yaxt="n", space=1.5, ylim=c(-4.5,3.5), col="#00000025")
y <- seq(-4.5, 3.5, by=1)
#y <- seq(-3,3, len=2^3+1)
abline(h=y, col="grey")
abline(h=0, lwd=2)
axis(side=2, at=y[-length(y)]+0.5, label=c("100", "101", "110", "111", "000", "001", "010", "011"), las=1, tick=FALSE)
box()
@
\caption{\label{fig:quantisation} Digital sound is a discrete process along the amplitude scale: a 3 bit ($2^3=8$ states) quantisation (grey bars) gives a rough representation of a continuous sine wave (red line).}
\end{figure}

\subsection{File format}
Within \R, digitized sound can be stored in three categories of files:
\begin{itemize}
  \item uncompressed format (\texttt{.wav}): the full information is stored in a heavy file.
  \item lossy compressed format (\texttt{.mp3}): the information is reduced. Time, amplitude and frequency parameters can be impaired.
  \item losslessly compressed format (\texttt{.flac}): the full information is stored in a reduced size file.
\end{itemize}
All these formats generate binary files, sound being encoded into a succession of \texttt{0} and \texttt{1}. When importing these formats into \R{} through \tuneR{} the data are transformed into a decimal format. This implies a substantial increase in data size.
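As a minimal sketch of how such files can be imported, the \tuneR{} functions \texttt{readWave()} and \texttt{readMP3()} return \texttt{Wave} objects that \seewave{} functions accept; the file names below are hypothetical:
<<eval=FALSE>>=
library(tuneR)
s.wav <- readWave("myfile.wav")   # uncompressed .wav file
s.mp3 <- readMP3("myfile.mp3")    # lossy compressed .mp3 file
s.wav                             # prints sampling rate, bit depth and duration
@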
\section{Amplitude envelope}
The amplitude envelope or amplitude contour is the profile of sound energy over time. The envelope can be expressed along a relative or an absolute energy scale. There are two ways to obtain a relative amplitude envelope (see Figure \ref{fig:envelope}):
\begin{itemize}
  \item by computing the absolute value of the waveform,
  \item by computing the Hilbert transform of the waveform.
\end{itemize}
An example of the two envelope types is shown in figure \ref{fig:envelope} for the song of the bird \textit{Zonotrichia capensis} (Figure \ref{fig:tico-pict}) included in the \texttt{tico} data.

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
data(tico)
env(tico, envt="abs", colwave="grey")
par(new=TRUE)
env(tico, colwave=2)
legend("topright", lty=1, col=c("grey",2), legend=c("Absolute", "Hilbert"), bty="n")
@
\caption{\label{fig:envelope} Two ways to compute the amplitude envelope of a sound: the absolute value or the Hilbert transform of the time wave.}
\end{figure}

\begin{figure}
\centering
\includegraphics{Zonotrichia_capensis}
\caption{\label{fig:tico-pict} The rufous-collared sparrow \textit{Zonotrichia capensis}, also named tico-tico in Portuguese. Picture by Ladislav Nagy, Wikimedia Commons.}
\end{figure}

The envelope can then be used to measure the duration of the different temporal parts of the sound, as shown in figure \ref{fig:timer} using the function \texttt{timer()}, or to analyse the amplitude modulation rates with the function \texttt{ama()}. A sketch showing how the corresponding values can be retrieved is given below.\par

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
timer(tico,f=22050,threshold=5,msmooth=c(50,0))
@
\caption{\label{fig:timer} Use of the amplitude envelope to automatically measure the temporal pattern of a sound.\\
\texttt{timer(tico,f=22050,threshold=5,msmooth=c(50,0))}}
\end{figure}
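Here is a minimal sketch, assuming the default settings of \texttt{env()} and the same \texttt{timer()} arguments as in figure \ref{fig:timer}, showing how the envelope and the duration measurements can be returned as values rather than plotted:
<<eval=FALSE>>=
library(seewave)
data(tico)
# Hilbert amplitude envelope returned as values (one-column matrix)
e <- env(tico, plot=FALSE)
# temporal measurements based on the envelope
tm <- timer(tico, f=22050, threshold=5, msmooth=c(50,0), plot=FALSE)
tm$s   # durations (s) of the detected signal parts
tm$p   # durations (s) of the pauses between them
@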
\section{Discrete Fourier Transform (DFT)}

\subsection{Definitions and principle}

\begin{figure}
\centering
\includegraphics[width=0.4\textwidth]{Joseph_Fourier}
\caption{\label{fig:Fourier} Joseph Fourier around 1823. Engraving by Jules Boilly (Public Domain)}
\end{figure}

Let us start with some terminology:
\begin{description}
  \item[Fourier transform (FT)] This is a reversible mathematical transform named after the French mathematician Joseph Fourier (1768-1830) (Figure \ref{fig:Fourier}). The transform decomposes a time series into a finite sum of sine and cosine functions.
  \item[Fast Fourier Transform (FFT)] This is an algorithm to compute the FT quickly.
  \item[Discrete Fourier transform (DFT)] This is a specific form of the FT applied to a time wave, typically a sound. Each sine / cosine function has a specified frequency and a relative amplitude. These two parameters are used to build the frequency spectrum of the original time wave. The DFT is therefore a way to switch from the \textit{time domain} to the \textit{frequency domain}.
\end{description}

The signal $s$ depicted in figure \ref{fig:s} was made by the addition of three original waves with three different carrier frequencies $\omega_{i}$: $\omega_{1} = 1$ kHz, $\omega_{2} = 2$ kHz, and $\omega_{3} = 3$ kHz. The waves were added in phase ($\Phi = 0$) but with three different relative amplitudes: $a_{1} = 1$, $a_{2} = 0.5$, and $a_{3} = 0.25$. The carrier frequencies $\omega_{i}$ and the relative amplitude of each sine function can be plotted in an X-Y graph as shown in figure \ref{fig:s-spec}. This graph is a \textbf{frequency spectrum}.

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
d <- 0.05
f <- 8000
s1 <- synth(d=d, f=f, cf=1000, a=1, out="Wave")
s2 <- synth(d=d, f=f, cf=2000, a=0.5, out="Wave")
s3 <- synth(d=d, f=f, cf=3000, a=0.25, out="Wave")
s <- s1+s2+s3
oscillo(s)
@
\caption{\label{fig:s} A time wave $s$ sampled at 8000 Hz for 0.05 s.}
\end{figure}

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
layout(matrix(1:3, nr=3), height=c(1,0.8,1.2))
op <- par(mar=c(0,4,1,1), oma=c(1,0,0,0))
oscillo(s1, labels=FALSE,xaxt="n", bty="o")
legend("topright", legend="1 kHz", bty="n", bg="white", cex=3, text.col=2)
par(mar=c(0,4,0,1))
oscillo(s2, labels=FALSE,xaxt="n", bty="o")
legend("topright", legend="2 kHz", bty="n", bg="white", cex=3, text.col=2)
par(mar=c(4.5,4,0,1))
oscillo(s3, labels=FALSE, bty="o")
legend("topright", legend="3 kHz", bty="n", bg="white", cex=3, text.col=2)
mtext("Time (s)", side=1, out=TRUE, at=0.55, line=-1)
mtext("Amplitude", side=2, out=TRUE, line=-2, at=0.5)
par(op)
@
\caption{\label{fig:s-decomposed} Decomposition of the time wave $s$ into three sine functions. See figure \ref{fig:s}.}
\end{figure}

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
x <- 1:3
y <- c(1, 0.5, 0.25)
plot(x, y, xlim=c(0,4), ylim=c(0,1), pch=19, xlab=expression(paste("Frequency ", omega[i], " (kHz)")), ylab=expression(paste("Relative amplitude ", a[i], " (no scale)")), las=1)
segments(x0=x,y0=0, y1=y)
@
\caption{\label{fig:s-spec} Frequency spectrum of the signal $s$. See figures \ref{fig:s} and \ref{fig:s-decomposed}.}
\end{figure}
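The decomposition itself can be reproduced with base \R. Here is a minimal sketch, not part of the original figure code, that rebuilds the signal $s$ of figure \ref{fig:s} with \texttt{synth()}'s default matrix output and extracts the relative amplitude of each frequency with \texttt{fft()}; the peaks appear at 1, 2 and 3 kHz with amplitudes in the ratio $1 : 0.5 : 0.25$:
<<eval=FALSE>>=
library(seewave)
# rebuild s as a plain numeric object: 0.05 s at 8000 Hz = 400 samples
s1 <- synth(d=0.05, f=8000, cf=1000, a=1)
s2 <- synth(d=0.05, f=8000, cf=2000, a=0.5)
s3 <- synth(d=0.05, f=8000, cf=3000, a=0.25)
s <- s1 + s2 + s3
dft <- fft(s[, 1])                          # complex Fourier coefficients
amp <- Mod(dft)[1:(nrow(s)/2)]              # relative amplitude of each sine function
freq <- (0:(nrow(s)/2 - 1)) * 8000/nrow(s)  # corresponding frequencies (Hz)
plot(freq, amp, type="h", xlab="Frequency (Hz)", ylab="Relative amplitude")
@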
\subsection{Complete sound}
The number of sine functions $n$ is determined by the number of samples $N$ of the original time wave following $n = 0.5 \times N$. If the DFT is computed on the \texttt{tico} data, which includes 39,578 samples, the DFT will decompose the sound into $0.5 \times 39578 = 19789$ sine functions. The first sine function will have a frequency $\omega_{1} = f_{s} / N = 22050 / 39578 = 0.557$ Hz (Figure \ref{fig:DFT}). This is equivalent to the frequency resolution $\Delta_{f}$ of the decomposition.

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
spec(tico, col="grey")
local <- spec(tico, wl = 512, at = 1.1, plot=FALSE)
mean <- meanspec(tico, plot=FALSE)
lines(local, col=2)
lines(mean, col=4)
legend("topright", legend=c("DFT on complete sound", "DFT on a sound section", "Mean spectrum (STFT)"), lty=1, col=c("grey", "red", "blue"), bty="n")
@
\caption{\label{fig:DFT} Three categories of frequency spectra computed on \texttt{tico}: (\textit{1}) the spectrum of the complete sound, (\textit{2}) the spectrum computed at 1.1 s with a 512 sample window, and (\textit{3}) the mean spectrum computed with the STFT (see section \ref{sec:STFT}).}
\end{figure}

\subsection{Sound section}
Such a high frequency resolution is often not required, if not irrelevant. In addition, computing the FFT of the whole sound might not be appropriate if there is frequency modulation along time, \textit{i.e.} if the frequency of the sound is not constant along the time scale. \par
A first solution is to compute the DFT locally, on a specific sound section. The size of this section, or window, can be set up in seconds or, more accurately, in number of samples. \par
We can, for instance, compute the DFT in the middle of the third note produced by the tico bird, that is at 1.1 s (Figure \ref{fig:DFT}). The length of the FFT is controlled with the argument \texttt{wl} for \textbf{w}indow \textbf{l}ength. If we choose a window size of 512 samples, we will end up with a decomposition into $0.5 \times 512 = 256$ sine functions with a frequency precision $\Delta_{f} = 22050 / 512 = 43.07$ Hz. \par
Increasing the window size will increase the frequency resolution but the decomposition will be less accurate in terms of time, as more signal is selected. Inversely, reducing the window size will be more specific in terms of time (position) but the frequency resolution will decrease. This trade-off is an example of the uncertainty or Heisenberg principle, which states that there is a limit to the precision with which such pairs of parameters -- here the time and frequency parameters -- can be known.

\subsubsection{Window shape}
When computing the DFT, the shape of the analysis window is by default a rectangle. However, this shape is not always appropriate as it induces artefacts like side frequency lobes. A way to avoid this is to multiply the original window with a function of the same length with a particular shape. This shape can be rectangular -- in that case nothing is changed in the original signal -- triangular (Bartlett window), or sinusoidal (Blackman, Hamming, flat top, and Hanning windows) (see figure \ref{fig:ftwindow} for three examples of window shapes and figure \ref{fig:ftwindow-test} for a test on a simple signal). \par
The default window shape used in \seewave{} is the Hanning window but other windows could be more appropriate depending on the main signal features. A sketch combining the window length and window shape arguments is given at the end of this section.

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
a<-ftwindow(512,wn="rectangle")
b<-ftwindow(512,wn="bartlett")
c<-ftwindow(512, wn="hanning")
all<-cbind(a,b,c)
matplot(all,type="l", col=c("green", "red", "blue"), lty=1, ylab="Amplitude", xlab="Index")
legend(x=380,y=0.95, legend=c("Rectangle", "Bartlett", "Hanning"), col=c("green", "red", "blue"), lty=1, bty="n")
@
\caption{\label{fig:ftwindow} Three different Fourier window shapes. Try \texttt{example(ftwindow)} for other shapes.}
\end{figure}

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
dB <- "max0"
spec(s, wn="rectangle", dB=dB, alim=c(-100,0), col="darkgreen")
bartlett <- spec(s, wn="bartlett", dB=dB, plot=FALSE)
lines(bartlett, col="red")
hanning <- spec(s, wn="hanning", dB=dB, plot=FALSE)
lines(hanning, col="blue")
legend("topright", legend=c("Rectangle", "Bartlett", "Hanning"), col=c("darkgreen", "red", "blue"), lty=1, bty="n", cex=0.75)
@
\caption{\label{fig:ftwindow-test} Using different DFT window shapes can lead to different spectral profiles. The frequency spectrum of the signal $s$ (see figure \ref{fig:s}) was computed with three different window shapes. Side lobes differ between the three spectra.}
\end{figure}
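Here is a minimal sketch, assuming the same position (1.1 s) and window length (512 samples) as above, showing how a local spectrum can be computed with a chosen window shape and returned as values; the step between successive frequencies is the 43.07 Hz resolution mentioned above:
<<eval=FALSE>>=
library(seewave)
data(tico)
# local DFT at 1.1 s with a 512-sample Hanning window (the default shape)
local <- spec(tico, at=1.1, wl=512, wn="hanning", plot=FALSE)
nrow(local)                 # number of frequency bins, i.e. half the window length
1000 * diff(local[1:2, 1])  # frequency step in Hz (about 43 Hz)
@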
\section{Short-time Fourier Transform (STFT)}
\label{sec:STFT}

\subsection{Principle}
Computing the FFT on the whole sound or on a single section might not be informative enough. An intuitive solution is to compute the DFT on successive sections along the signal: a window is slid along the signal and a DFT is computed at each slide or jump. This is what the short-time Fourier transform (STFT) does.\par
A good way to understand how it works is to use the function \texttt{dynspec()}. The successive DFTs can be tracked when moving along the signal with a sliding cursor. Here is an example with a DFT window of 1024 samples: \par
<<eval=FALSE>>=
dynspec(tico, wl=1024, osc=TRUE)
@
Basically, the STFT returns a matrix of values where the columns are the successive spectra along time. This can be summarized as an $n \times p$ matrix of Fourier coefficients $a_{ij}$, with $n$ the number of frequencies $\omega$ (half the window size) and $p$ the number of Fourier windows computed along the signal:
$$
\bordermatrix{
 & t_{1} & \ldots & t_{j} & \ldots & t_{p} \cr
\omega_{1} & a_{11} & \ldots & a_{1j} & \ldots & a_{1p} \cr
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \cr
\omega_{i} & a_{i1} & \ldots & a_{ij} & \ldots & a_{ip} \cr
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \cr
\omega_{n} & a_{n1} & \ldots & \ldots & \ldots & a_{np} \cr
}
$$
This matrix is nice but a plot of it would be even better. There are three user-friendly ways to visualize this matrix:
\begin{itemize}
  \item a waterfall plot, see the function \texttt{wf()},
  \item a density plot, or spectrogram, see the function \texttt{spectro()},
  \item a 3D plot, or 3D-spectrogram, see the function \texttt{spectro3D()}.
\end{itemize}

\subsection{Spectrogram}

\subsubsection{3D in a 2D plot}
The density plot option, or \textbf{spectrogram}, is the most popular representation used in bioacoustics. Its main advantage is that it does not rely on a 3D representation, which is not well suited to visual inspection. The principle is quite simple: the successive DFTs are plotted against time, with the relative amplitude of each sine function of each DFT being depicted in reference to a colour scale. The usual way is to use a dB scale for amplitudes, with negative dB values referring to a maximum of 0 dB. \par
If we keep the \texttt{tico} example, applying a STFT with a sliding window of 512 samples will result in the computation of $39578 / 512 \approx 77$ FFTs. The spectrogram will therefore be made of 77 sections (columns) as shown in figure \ref{fig:spectro-grid}. \par

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
spectro <- spectro(tico, grid=FALSE, scale=FALSE)
abline(v=seq(0, 1.795, length=78), col="lightgrey")
box()
@
\caption{\label{fig:spectro-grid} STFT on the \texttt{tico} song showing 77 successive FFT windows containing 512 samples each.}
\end{figure}

\subsubsection{Overlap}
The temporal and frequency resolutions of the spectrogram are linked, with $\Delta_{f} = \Delta_{t}^{-1}$. In the present case, the frequency resolution is $22050 / 512 = 43.07$ Hz and the time resolution is $512 / 22050 = 0.0232$ s. As mentioned above, increasing the size of the window will increase the frequency resolution but decrease the time resolution. \par
However, there is a trick to counteract this two-dimensional precision limit. In the first example, the DFT window was coarsely jumping from one position to the next, but we can make the jump finer. The solution is simply to allow an overlap between successive windows. This overlap is usually set up as a percentage: the default value is 0\%, as in figure \ref{fig:spectro-grid}. A percentage of 50\% will double the number of DFTs, hence increasing the time resolution by a factor of 2 (now 153 FFTs) while the frequency resolution is not reduced. The overlap parameter is set up with the argument \texttt{ovlp} of the function \texttt{spectro()} (see figure \ref{fig:spectro-grid-ovlp}). A value of 100\% is of course nonsense, as the sliding window would not move at all. Increasing the overlap increases the computing time as more FFTs are computed. We advise keeping reasonable values for computing efficiency.
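The effect of the overlap on the number of windows can be checked directly. Here is a minimal sketch, assuming a 512 sample window; the counts should correspond to the numbers of FFTs mentioned above:
<<eval=FALSE>>=
library(seewave)
data(tico)
# number of DFT windows (columns of the STFT matrix)
ncol(spectro(tico, wl=512, ovlp=0,  plot=FALSE)$amp)   # no overlap
ncol(spectro(tico, wl=512, ovlp=50, plot=FALSE)$amp)   # 50% overlap: about twice as many
@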
\par
\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
spectro.ovlp <- spectro(tico, ovlp=50, scale=FALSE, grid=FALSE)
abline(v=seq(0, 1.795, length=78*2), col="lightgrey")
box()
@
\caption{\label{fig:spectro-grid-ovlp} STFT on the \texttt{tico} song showing the 50\% overlapping FFT windows. Each of the 153 FFTs contains 512 samples.}
\end{figure}

\subsubsection{Values}
\texttt{spectro()} is primarily a graphical function, but the values along the three scales can also be saved. The value of \texttt{spectro()} is a list containing three components:
\begin{itemize}
  \item \texttt{\$time} or \texttt{[[1]]} returns the values of the time axis,
  \item \texttt{\$freq} or \texttt{[[2]]} returns the values of the frequency axis,
  \item \texttt{\$amp} or \texttt{[[3]]} returns the amplitude values of the successive FFT decompositions or spectra.
\end{itemize}
These components can be used to plot the spectrogram manually (Figure \ref{fig:spectro-components}). The successive spectra computed by the successive DFTs can also be picked up and plotted as the function \texttt{spec()} would do (Figure \ref{fig:spectro-spec}). \par

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
op <- par(mar=rep(1,4))
filled.contour(x=spectro$time, y=spectro$freq, z=t(spectro$amp))
par(op)
@
\caption{Redrawing the graphical output of the function \texttt{spectro()} with the native \texttt{filled.contour()} function, with the command: \texttt{filled.contour(x=spectro[[1]], y=spectro[[2]], z=t(spectro[[3]]))}.}
\label{fig:spectro-components}
\end{figure}

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
plot(x=spectro$freq, y=spectro$amp[,15], type="o", xlab="Frequency (kHz)", ylab="Relative amplitude (dB)")
@
\caption{One of the dB spectra computed by the STFT.\\
\texttt{plot(x=spectro[[2]], y=spectro[[3]][,15], type="o", xlab="Frequency (kHz)", ylab="Relative amplitude (dB)")}}
\label{fig:spectro-spec}
\end{figure}

\subsection{Mean spectrum}
The columns of the STFT matrix can be averaged, giving the so-called mean or average spectrum as shown in figure \ref{fig:DFT}. The frequency resolution and the shape of the mean spectrum will of course change when changing the window size (\texttt{wl}) and overlap (\texttt{ovlp}) arguments. \par
The function to compute the mean spectrum is \texttt{meanspec()}.

\section{Instantaneous frequency}

\subsection{Zero-crossing}
Zero-crossing is a rather simple technique which consists in measuring the successive time intervals at which the wave crosses the zero amplitude line. This gives a measure of the period $T$ of a full cycle, and the instantaneous frequency is obtained by simply computing $f = T^{-1}$. The signal has to be quite periodic to make the method reliable.\par
The main problem with the zero-crossing procedure is linked to the discrete process of sound sampling. The signal to be analysed might not always have values equal or very close to zero, which makes the zero-crossing results quite approximate. An example of this issue is illustrated in the upper panel of figure \ref{fig:zc}. It is therefore sometimes necessary to oversample the signal by interpolation. This process adds values closer to zero and thus increases the accuracy of the measure, as shown in the lower panel of figure \ref{fig:zc}. An example of such a measurement on the usual \texttt{tico} song is shown in figure \ref{fig:zc-tico}.
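Here is a minimal sketch, mirroring the settings of figure \ref{fig:zc-tico}, that returns the zero-crossing estimates as values instead of plotting them:
<<eval=FALSE>>=
library(seewave)
data(tico)
# one note of the tico song, then zero-crossing with a x20 interpolation
note <- cutw(tico, f=22050, from=0.5, to=0.9)
f.inst <- zc(note, f=22050, threshold=5, interpol=20, plot=FALSE)
head(f.inst)   # time (s) and instantaneous frequency (kHz)
@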
\par
\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
s <- synth(d=0.05, f=8000, cf=440)
s.interpol <- approx(s, n = length(s)*10)
s.interpol <- as.matrix(s.interpol$y)
layout(matrix(1:2, nr=2), height=c(0.8,1))
par(mar=c(0,4,1,1))
oscillo(s, f = 8000, type="o", cex = 0.75, to=0.01, xaxt="n", alab="", tlab="")
arrows(x0=0.0046, x1=0.006855, y0=0, y1=0, col="red", angle=90, length=0.1, lwd=1.5, code=3)
par(mar=c(4.5,4,0,1))
oscillo(s.interpol, f = 8000*10, cex = 0.75, type="o", to=0.01, alab="", tlab="")
arrows(x0=0.0046, x1=0.006855, y0=0, y1=0, col="red", angle=90, length=0.1, lwd=1.5, code=3)
mtext("Time (s)", side=1, out=TRUE, at=0.55, line=-1)
mtext("Amplitude", side=2, out=TRUE, line=-2, at=0.6)
@
\caption{Zero-crossing: principle and interpolation to reduce measurement inaccuracy. The upper panel shows a 440 Hz signal sampled at 8000 Hz. The sampling rate is too low to measure properly the period between successive cycles. The lower panel plots the same wave with a $\times 10$ interpolation factor. New samples are added and it is now possible to measure the periodicity of the wave.}
\label{fig:zc}
\end{figure}

\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
f <- 22050
sel <- cutw(tico, f=f, from=0.5, to=0.9)
layout(matrix(1:2, nr=2), height=c(0.8,1))
par(mar=c(0, 4, 1, 1), oma=c(1,1,0,0))
zc(sel, f=f, threshold=5, xlab="", ylab="", xaxt="n", cex=0.75, warning=FALSE)
par(mar=c(4.5, 4, 0, 1))
zc(sel, f=f, threshold=5, interpol=20,xlab="", ylab="", cex=0.75, warning=FALSE)
mtext("Time (s)", side=1, out=TRUE, at=0.55, line=-1)
mtext("Frequency (kHz)", side=2, out=TRUE, line=-1, at=0.6)
@
\caption{Zero-crossing: measuring the instantaneous frequency of a note of the \texttt{tico} song without (upper panel) and with interpolation (lower panel).}
\label{fig:zc-tico}
\end{figure}

\subsection{Hilbert transform}
The Hilbert transform leads to a decomposition of a signal $x(t)$ into its amplitude envelope and its instantaneous frequency. More specifically, the amplitude envelope is the modulus of the analytic signal, defined as $z(t) = x(t) + iy(t)$ where $y(t)$ is the Hilbert transform of $x(t)$, and the instantaneous frequency is the derivative of the phase of $z(t)$ with respect to time. \par
The Hilbert transform can thus be used to track both amplitude and frequency modulations. \par
The amplitude envelope is obtained with the function \texttt{env()} and the instantaneous frequency is obtained with the function \texttt{ifreq()}.

\subsection{Cepstral transform}
The cepstral transform is the inverse Fourier transform of the logarithm of the spectrum. The real cepstrum (an anagram of spectrum) is the real part of the cepstral transform. The independent variable of the cepstrum (the \textit{x}-axis in figure \ref{fig:ceps}) is named the quefrency. The quefrency scale is not intuitive but can be converted into frequency (Hz). The cepstrum is useful for detecting the fundamental frequency of a harmonic series: it corresponds to the first peak of the cepstrum, as shown in figure \ref{fig:ceps}. Note that this detection will work properly with harmonic signals only. The function \texttt{cepstro()} is a short-time version of the cepstral transform: successive cepstra are computed along the signal with a sliding window, in a similar way as the STFT (see section \ref{sec:STFT}).
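As a possible follow-up not covered above, the \seewave{} function \texttt{fund()} uses the same cepstral principle to track the fundamental frequency along time. A minimal sketch on the \texttt{peewit} data, with an assumed 2 kHz upper bound for the fundamental, could look like this:
<<eval=FALSE>>=
library(seewave)
data(peewit)
# cepstrum-based tracking of the fundamental frequency, 1024-sample windows
f0 <- fund(peewit, wl=1024, fmax=2000, plot=FALSE)
head(f0)   # time (s) and fundamental frequency (kHz)
@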
\par
\begin{figure}
\centering
<<fig=TRUE, echo=FALSE>>=
data(peewit)
par(mar=c(4.5,4.5,4.5,2))
ceps(peewit, at=0.4, wl=1024, col=2, alim=c(0,2000))
@
\caption{\label{fig:ceps} A cepstral analysis on a \textit{Vanellus vanellus} call (data \texttt{peewit}) with the following call: \texttt{ceps(peewit, at=0.4, wl=1024, col=2, alim=c(0,2000))}.}
\end{figure}

\section{Other transforms}
There are several other options to analyse a signal. Among others, we could list the following ones, which are not included in \seewave:
\begin{itemize}
  \item Mel-frequency cepstral transform, see the function \texttt{melfcc()} of the package \tuneR{} [not tested].
  \item Wavelet transform, see the packages \texttt{biwavelet}, \texttt{rwt}, \texttt{wavelets}, \texttt{waveslim}, \texttt{wavethresh} and \texttt{wmtsa} [not tested].
  \item Gabor transform -- not yet implemented in \R.
  \item Wigner-Ville transform -- not yet implemented in \R.
\end{itemize}

\section{References}

\subsection{Specific papers}
Deecke VB, Janik VM (2006) -- Automated categorization of bioacoustic signals: avoiding perceptual pitfalls. \textit{Journal of the Acoustical Society of America}, 119: 645-653.\\
Depraetere M, Pavoine S, Jiguet F, Gasc A, Duvail S, Sueur J (2012) -- Monitoring animal diversity using acoustic indices: implementation in a temperate woodland. \textit{Ecological Indicators}, 13: 46-54.\\
Farina A, Pieretti N, Piccioli L (2011) -- The soundscape methodology for long-term bird monitoring: a Mediterranean Europe case-study. \textit{Ecological Informatics}, 6: 354-363.\\
Gasc A, Sueur J, Jiguet F, Devictor V, Grandcolas P, Burrow C, Depraetere M, Pavoine S (2013) -- Assessing biodiversity with sound: do acoustic diversity indices reflect phylogenetic and functional diversities of bird communities? \textit{Ecological Indicators}, 25: 279-287.\\
Gasc A, Sueur J, Pavoine S, Pellens R, Grandcolas P (2013b) -- Biodiversity sampling using a global acoustic approach: contrasting sites with micro-endemics in New Caledonia. \textit{PLoS ONE}, 8(5): e65311.\\
Kasten EP, Gage SH, Fox J, Joo W (2012) -- The remote environmental assessment laboratory's acoustic library: an archive for studying soundscape ecology. \textit{Ecological Informatics}, 12: 50-67.\\
Lellouch L, Pavoine S, Jiguet F, Glotin H, Sueur J (2014) -- Monitoring temporal change of bird communities with dissimilarity acoustic indices. \textit{Methods in Ecology and Evolution}, in press.\\
Rodriguez A, Gasc A, Pavoine S, Gaucher P, Grandcolas P, Sueur J (2013) -- Temporal and spatial dynamics of animal sound within a neotropical forest. \textit{Ecological Informatics}, in press.\\
Sueur J, Pavoine S, Hamerlynck O, Duvail S (2008) -- Rapid acoustic survey for biodiversity appraisal. \textit{PLoS ONE}, 3(12): e4065.\\
Sueur J, Farina A, Gasc A, Pieretti N, Pavoine S (2014) -- Acoustic indices for biodiversity assessment and landscape investigation. \textit{Acta Acustica united with Acustica}, in press.

\subsection{Books}
Au WWL, Hastings MC (2008) \textit{Principles of marine bioacoustics}, Springer.\\
Bradbury JW, Vehrencamp SL (1998) \textit{Principles of animal communication}, Sinauer Associates.\\
Fletcher NH (1992) \textit{Acoustic systems in biology}, Oxford University Press.\\
Gerhardt HC, Huber F (2002) \textit{Acoustic communication in insects and anurans}, University of Chicago Press.\\
Hopp SL, Owren MJ, Evans CS (1998) \textit{Animal acoustic communication}, Springer.\\
Marler P, Slabbekoorn H (2004) \textit{Nature's Music.
The Science of Birdsong}, Academic Press, Elsevier.\\
Rossing TD (2007) \textit{Handbook of acoustics}, Springer.\\
Rumsey F, McCormick T (2002) \textit{Sound and recording -- an introduction}, Elsevier.\\
Speaks CE (1999) \textit{Introduction to sound}, Singular.\\
Sueur J (2018) \textit{Sound analysis and synthesis with R}, Springer.

\subsection{Dedicated journals}
\textit{Animal Behaviour} -- \href{http://www.journals.elsevier.com/animal-behaviour/}{publisher website}\\
\textit{Bioacoustics} -- \href{http://www.tandfonline.com/toc/tbio20/current}{publisher website}\\
\textit{Journal of the Acoustical Society of America} -- \href{http://asadl.org/jasa/}{publisher website}

\end{document}