\documentclass[a4paper]{article} \usepackage{amsmath}%the AMS math extension of LaTeX. \usepackage{amssymb}%the extended AMS math symbols. \usepackage{bm}%Use 'bm.sty' to get `bold math' symbols \usepackage{natbib} \usepackage{Sweave} \usepackage{float}%Use `float.sty' \usepackage[left=3.5cm,right=3.5cm]{geometry} \usepackage[utf8]{inputenx} %%\VignetteIndexEntry{Examples for 2-AC paper} %%\VignetteDepends{sensR, ordinal} \title{\textsf{R}-companion to:\\ Estimation of the Thurstonian model for the 2-AC protocol} \author{Rune Haubo Bojesen Christensen, \\ Hye-Seong Lee \& \\ Per Bruun Brockhoff} %% \numberwithin{equation}{section} \setlength{\parskip}{2mm}%.8\baselineskip} \setlength{\parindent}{0in} \DefineVerbatimEnvironment{Sinput}{Verbatim}%{} {fontshape=sl, xleftmargin=1em} \DefineVerbatimEnvironment{Soutput}{Verbatim}%{} {xleftmargin=1em} \DefineVerbatimEnvironment{Scode}{Verbatim}%{} {fontshape=sl, xleftmargin=1em} \fvset{listparameters={\setlength{\topsep}{0pt}}} \renewenvironment{Schunk}{\vspace{0mm}}{\vspace{0mm}} \newcommand{\var}{\textup{var}} \newcommand{\I}{\mathcal{I}} \newcommand{\bta}{\bm \theta} \newcommand{\ta}{\theta} \newcommand{\tah}{\hat \theta} \newcommand{\di}{~\textup{d}} \newcommand{\td}{\textup{d}} \newcommand{\Si}{\Sigma} \newcommand{\si}{\sigma} \newcommand{\bpi}{\bm \pi} \newcommand{\bmeta}{\bm \eta} \newcommand{\tdots}{\hspace{10mm} \texttt{....}} \newcommand{\FL}[1]{\fvset{firstline= #1}} \newcommand{\LL}[1]{\fvset{lastline= #1}} % figurer bagerst i artikel %% \usepackage[tablesfirst, nolists]{endfloat} %% \renewcommand{\efloatseparator}{\vspace{.5cm}} \usepackage{lineno} %% \linenumbers \newcommand*\patchAmsMathEnvironmentForLineno[1]{% \expandafter\let\csname old#1\expandafter\endcsname\csname #1\endcsname \expandafter\let\csname oldend#1\expandafter\endcsname\csname end#1\endcsname \renewenvironment{#1}% {\linenomath\csname old#1\endcsname}% {\csname oldend#1\endcsname\endlinenomath}}% \newcommand*\patchBothAmsMathEnvironmentsForLineno[1]{% \patchAmsMathEnvironmentForLineno{#1}% \patchAmsMathEnvironmentForLineno{#1*}}% \AtBeginDocument{% \patchBothAmsMathEnvironmentsForLineno{equation}% \patchBothAmsMathEnvironmentsForLineno{align}% \patchBothAmsMathEnvironmentsForLineno{flalign}% \patchBothAmsMathEnvironmentsForLineno{alignat}% \patchBothAmsMathEnvironmentsForLineno{gather}% \patchBothAmsMathEnvironmentsForLineno{multline}% } \begin{document} \bibliographystyle{chicago} \maketitle \SweaveOpts{echo=TRUE, results=verb, width=4.5, height=4.5, keep.source=FALSE} \SweaveOpts{prefix.string=figs} \setkeys{Gin}{width=.7\textwidth} \fvset{listparameters={\setlength{\topsep}{0pt}}, gobble=0, fontsize=\small} <>= RUN <- FALSE #redo computations and write .RData files ## Change options: op <- options() ## To be able to reset settings options("digits" = 7) options("htmlhelp" = TRUE) ## options("width" = 75) options("SweaveHooks" = list(fig=function() par(mar=c(4,4,1.5,1)+.5))) options(continue=" ") @ This document describes how the examples in ``Estimation of the Thurstonian model for the 2-AC protocol'' \citep{christensen11a} can be executed in the free statistical software \textsf{R} \citep{R11} using the free \textsf{R} packages \textsf{sensR} and \textsf{ordinal} \citep{christensen10c, christensen10d} developed by the authors. \section{Example 1: Estimation of $d'$ and $\tau$} It is assumed that we have $\bm n = (2, 2, 6)$ observations in the three categories. We may estimate $\tau$, $d'$ and their standard errors with the \texttt{twoAC} function in the \textsf{sensR} package: <>= library(sensR) fit <- twoAC(c(2,2,6)) fit @ Alternatively we may compute $\tau$ and $d'$ manually: <>= n <- c(2, 2, 6) gamma <- cumsum(n / sum(n)) z <- qnorm(gamma)[-3] z <- z * sqrt(2) (tau <- (z[2] - z[1])/2) (d <- -z[1] - tau) @ \section{Example 2: Inference for d-prime} \label{sec:example-x:-inference} The likelihood based confidence intervals and the one-sided discrimination significance test where the null hypothesis is ``no sensory difference'', i.e., $d'_0 = 0$ using the likelihood root statistic are immediately available using the \texttt{twoAC} function: <>= twoAC(c(2, 2, 6), d.prime0 = 0, conf.level = 0.95, statistic = "likelihood", alternative = "greater") @ The relative profile likelihood and Wald approximation can be obtained with: <>= pr <- profile(fit) plot(pr) z <- pr$d.prime$d.prime w <- (coef(fit)[2,1] - z) / coef(fit)[2,2] lines(z, exp(-w^2/2), lty = 2) ## pl <- plot(pr, fig = FALSE) ## abline(h = attr(pr, "limits")) ## text(2.7, .17, "95% limit") ## text(2.7, .06, "99% limit") @ \begin{figure} \centering \includegraphics[width = 0.5\textwidth]{figs-example2-fig} \caption{Relative profile likelihood curve (solid) and Wald approximation (dashed) for the data in example 1 and 2. The horizontal lines indicate 95\% and 99\% confidence intervals based on the likelihood function and the Wald approximation.} \label{fig:profLikeAndWald} \end{figure} \section{Example 3: Power calculations} \label{sec:power-calculations} The function \texttt{twoACpwr} computes the power for the 2-AC protocol and a significance test of the users choice. The power assuming $\tau = 0.5$, $d' = 1$, the sample size is $N = 20$ for a two-sided preference test with $\alpha = 0.5$ is found with: <>= twoACpwr(tau = 0.5, d.prime = 1, size = 20, d.prime0 = 0, alpha = 0.05, alternative = "two.sided", tol = 1e-5) @ Apart from the power, we are told that the actual size of the test, $\alpha$ is close to the nominal 5\%. The reason that the two differ is due to the discreteness of the observations, and hence the test statistic. We are also told that with $N = 20$ there are 231 possible outcomes of the 2-AC protocol. In computing the power 94 of these are discarded and power is computed based on the remaining 137 samples. The fraction of samples that are discarded is determined by the \texttt{tol} parameter. If we set this to zero, then no samples are discarded, however, the increase in precision is irrelevant: <>= twoACpwr(tau = 0.5, d.prime = 1, size = 20, d.prime0 = 0, alpha = 0.05, alternative = "two.sided", tol = 0) @ The last three numbers in the output are the cell probabilities, so with $\tau = 0.5$ and $d' = 1$ we should, for example, expect around 22\% of the answers in the ``no difference'' or ``no preference'' category. \section{Example 4: Estimation and standard errors via cumulative probit models} Estimates of $\tau$ and $d'$ and their standard errors can be obtained from a cumulative probit model. We begin by defining the three leveled response variable \texttt{resp} and fitting a cumulative link model (CLM) using the function \texttt{clm} in package \textsf{ordinal} with weights equal to the observed counts and a probit link. Standard output contains the following coefficient table: <>= library(ordinal) response <- gl(3,1) fit.clm <- clm(response ~ 1, weights = c(2, 2, 6), link = "probit") (tab <- coef(summary(fit.clm))) @ The $\tau$ and $d'$ estimates are obtained by: <>= theta <- tab[,1] (tau <- (theta[2] - theta[1])/sqrt(2)) (d.prime <- (-theta[2] - theta[1])/sqrt(2)) @ The variance-covariance matrix of the parameters can be extracted by the \texttt{vcov} method and the standard errors computed via the provided formulas: <>= ## SE <- sqrt(diag(vcov(fit.clm))) VCOV <- vcov(fit.clm) (se.tau <- sqrt((VCOV[1,1] + VCOV[2,2] - 2*VCOV[2,1])/2)) (se.d.prime <- sqrt((VCOV[1,1] + VCOV[2,2] + 2*VCOV[2,1])/2)) ## ## (SE[2] + SE[1]) * sqrt(2) / 2 ## theta <- fit.clm$Theta ## ## Delta <- theta[2] - theta[1] ## ## theta0 <- sum(theta)/2 ## ## (d <- -sum(fit.clm$Theta * sqrt(2))/2) ## ## theta[2] - theta0 ## ## diff(fit.clm$Theta * sqrt(2))/2 ## ### ## (tau <- (theta[2] - theta[1])/sqrt(2)) ## (d <- (-theta[2] - theta[1])/sqrt(2)) @ Observe how these estimates and standard errors are identical to those in Example 1. We could also have used the \texttt{clm2twoAC} function from package \textsf{sensR} which extract estimates and standard errors from a \texttt{clm} model fit object: <>= clm2twoAC(fit.clm) @ \section{Example 5: A regression model for $d'$} Assume that a study is performed and gender differences in the discrimination ability is of interest. Suppose that $(20, 20, 60)$ is observed for women and $(10, 20, 70)$ is observed for men. The standard output from a cumulative probit model contains the coefficient table: <>= n.women <- c(2, 2, 6)*10 ## n.women <- c(2, 2, 6)*10 n.men <- c(1, 2, 7)*10 ## n.men <- c(2, 2, 6)*10 wt <- c(n.women, n.men) response <- gl(3,1, length = 6) gender <- gl(2, 3, labels = c("women", "men")) fm2 <- clm(response ~ gender, weights = wt, link = "probit") (tab2 <- coef(summary(fm2))) @ The estimate of $\tau$ (assumed constant between genders) and the gender specific estimates of $d'$ can be extracted by: <>= theta <- fm2$alpha (tau <- (theta[2] - theta[1])/sqrt(2)) (d.women <- (-theta[2] - theta[1])/sqrt(2)) (d.men <- d.women + fm2$beta * sqrt(2)) @ Again we could use the \texttt{clm2twoAC} function to get the coefficient table for the 2-AC model from the CLM-model: <>= clm2twoAC(fm2) @ Observe that $d'$ for women is given along with the difference in $d'$ for men and women rather than $d'$ for each of the genders. The Wald test for gender differences is directly available from the coefficient table with $p = 0.0657$. The corresponding likelihood ratio test can be obtained by: <>= fm3 <- update(fm2,~.-gender) anova(fm2, fm3) @ which is slightly closer to significance. The 95\% profile likelihood confidence interval for the difference between men and women on the $d'$-scale is: <>= confint(fm2) * sqrt(2) ## pr <- profile(fm2, alpha = 1e-5) ## plot(pr, n=1e3) @ The likelihood ratio test for the assumption of constant $\tau$ is computed in the following. The likelihood ratio statistic and associated $p$-value are <>= logLik(fm2) tw <- twoAC(n.women) tm <- twoAC(n.men) (LR <- 2*(tw$logLik + tm$logLik - fm2$logLik)) pchisq(LR, 1, lower.tail=FALSE) @ The Pearson $X^2$ test of the same hypothesis is given by <>= freq <- matrix(fitted(fm2), nrow=2, byrow=TRUE) * 100 Obs <- matrix(wt, nrow=2, byrow=TRUE) (X2 <- sum((Obs - freq)^2 / freq)) pchisq(X2, df=1, lower.tail=FALSE) @ so the Pearson and likelihood ratio tests are very closely equivalent as it is so often the case. \section{Regression model for replicated 2-AC data} \label{sec:regr-model-repl} <>= ## load("C:/Users/rhbc/Documents/Rpackages/sensR/pkg/inst/doc/HSdata.RData") load("HSdata.RData") repData <- hs.long @ The data used in example 6 are not directly available at the time of writing. Assume however, that data are available in the \textsf{R} \texttt{data.frame} \texttt{repData}. Then the cumulative probit mixed model where preference depends on reference and consumers (\texttt{cons}) are random can be fitted with: <>= fm3.agq <- clmm2(preference ~ reference, random = cons, nAGQ = 10, data = repData, link = "probit", Hess = TRUE) summary(fm3.agq) @ Here we asked for the accurate 10-point adaptive Gauss-Hermite quadrature approximation. The 2-AC estimates are available using the \texttt{clm2twoAC} function: <>= clm2twoAC(fm3.agq) @ The standard deviation of the consumer-specific $d'$s is given by: <>= fm3.agq$stDev * sqrt(2) @ The profile likelihood curve can be obtained using the profile method: <>= pr <- profile(fm3.agq, range = c(.7, 1.8)) ## save(pr, file = "profileSigma.RData") @ <>= ## load("C:/Users/rhbc/Documents/Rpackages/sensR/pkg/inst/doc/profileSigma.RData") load("profileSigma.RData") @ And then plottet using: <>= plpr <- plot(pr, fig = FALSE) plot(sqrt(2) * plpr$stDev$x, plpr$stDev$y, type = "l", xlab = expression(sigma[delta]), ylab = "Relative profile likelihood", xlim = c(1, 2.5), axes = FALSE) axis(1); axis(2, las = 1) abline(h = attr(plpr, "limits")) text(2.4, .17, "95% limit") text(2.4, .06, "99% limit") @ The resulting figure is shown in Figure~\ref{fig:ProfileSigma}. The profile likelihood confidence intervals are obtained using: <>= confint(pr, level = 0.95) * sqrt(2) @ \begin{figure} \centering \includegraphics[width = 0.5\textwidth]{figs-example6-4-fig} \caption{Profile likelihood for $\sigma_{\delta}$ Horizontal lines indicate 95\% and 99\% confidence limits.} \label{fig:ProfileSigma} \end{figure} The probabilities that consumers prefer yoghurt A, have no preference or prefer yoghurt B can, for an average consumer be obtained by predicting from the CLMM: <>= newdat <- expand.grid(preference = factor(c("A", "N", "B"), levels = c("A", "N", "B"), ordered = TRUE), reference = factor(c("A", "B"))) pred <- predict(fm3.agq, newdata = newdat) @ The predictions for the extreme consumers have to be obtained by hand. Here we ask for the predictions for the 5th percentile and 95th percentile of the consumer population for reference A and B: <>= q95.refA <- diff(c(0, pnorm(fm3.agq$Theta - qnorm(1-.05) * fm3.agq$stDev), 1)) q05.refA <- diff(c(0, pnorm(fm3.agq$Theta - qnorm(.05) * fm3.agq$stDev), 1)) q95.refB <- diff(c(0, pnorm(fm3.agq$Theta - fm3.agq$beta - qnorm(1-.05) * fm3.agq$stDev), 1)) q05.refB <- diff(c(0, pnorm(fm3.agq$Theta - fm3.agq$beta - qnorm(.05) * fm3.agq$stDev), 1)) @ Plotting follows the standard methods: <>= par(mar = c(0, 2, 0, .5) + 0.5) plot(1:3, pred[1:3], ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "", pch = 19) ## ref A axis(1, lwd.ticks = 0, at = c(1, 3), labels = c("", "")) axis(2, las = 1) points(1:3, pred[4:6], pch = 1) ## ref B lines(1:3, pred[1:3]) ## ref A lines(1:3, pred[4:6], lty = 2) ## ref B text(2, 0.6, "Average consumer") legend("topright", c("Reference A", "Reference B"), lty = 1:2, pch = c(19, 1), bty = "n") @ <>= par(mar = c(0, 2, 0, .5) + 0.5) plot(1:3, q05.refA, ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "", pch = 19) ## ref A ## axis(1, at = 1:3, labels = c("prefer A", "no preference", "prefer B")) axis(1, lwd.ticks = 0, at = c(1, 3), labels = c("", "")) axis(2, las = 1) points(1:3, q05.refB, pch = 1) ## ref B lines(1:3, q05.refA) ## ref A lines(1:3, q05.refB, lty = 2) ## ref B text(2, 0.6, "5th percentile consumer") @ <>= par(mar = c(2, 2, 0, .5) + 0.5) plot(1:3, q95.refA, ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "", pch = 19) ## ref A axis(1, at = 1:3, labels = c("prefer A", "no preference", "prefer B")) axis(2, las = 1) points(1:3, q95.refB, pch = 1) ## ref B lines(1:3, q95.refA) ## ref A lines(1:3, q95.refB, lty = 2) ## ref B text(2, 0.6, "95th percentile consumer") @ The resulting figure is shown in Fig.~\ref{fig:modelFit}. \begin{figure} \centering \includegraphics[width = 0.5\textwidth]{figs-example6-8} \includegraphics[width = 0.5\textwidth]{figs-example6-9} \includegraphics[width = 0.5\textwidth]{figs-example6-10} \caption{The probabilities of prefering yoghurt A, having no preference, or prefering yoghurt B for an average consumer ($b = 0$) and for fairly extreme consumers ($b = \pm 1.64\sigma_b$).} \label{fig:modelFit} \end{figure} \section{End note} Versions details for \textsf{R} and the packages \textsf{sensR} and \textsf{ordinal} appear below: <>= sessionInfo() @ <>= options(op) @ \bibliography{twoAC} %% \newpage %% \appendix %% \input{Appendix.tex} \end{document}