%\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} %\VignetteIndexEntry{Using tlm to fit, visualize and interpret regression models with transformed variables} %\VignetteKeywords{regression models, transformations, logarithm} %\VignettePackage{collin} \documentclass[11pt]{article} \usepackage{amssymb} \usepackage{amsmath} \usepackage{amsfonts} \usepackage{amsthm} %\usepackage[a4paper]{geometry} \usepackage[left=2.5cm,top=3cm,bottom=3cm,right=2.5cm]{geometry} %\usepackage[pdftex]{color,graphicx,epsfig} %\usepackage[applemac]{inputenc} \usepackage[utf8]{inputenc} \usepackage[english]{babel} \usepackage{url} \usepackage{xcolor} \usepackage[colorlinks=true,linkcolor=black!20!blue,citecolor=black!25!green,urlcolor=black!15!blue]{hyperref} \usepackage{url} %\usepackage{verbatim} %\usepackage[colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=black]{hyperref} %\usepackage[square, comma, sort&compress, numbers]{natbib} \usepackage[super,square,comma,sort&compress,numbers]{natbib} %\usepackage{natbib} \bibliographystyle{unsrtnat} %\setcitestyle{authoryear,open={(},close={)}} %\usepackage[nottoc,numbib]{tocbibind} % For bibliography in the table of contents \usepackage[nottoc]{tocbibind} % For bibliography in the table of contents but no "Chapter" label inside \usepackage{authblk} % authors and affiliations %%% Begin captions format: %\usepackage[font=sf, labelfont=bf, labelsep=period, position=top]{caption} %\usepackage[footnotesize, labelfont=bf, labelsep=period, position=top]{caption} \usepackage[hang,footnotesize,bf]{caption} \setlength{\captionmargin}{70pt} % margins %%% End captions format: \newcommand{\Robject}[1]{\texttt{#1}} %\newcommand{\Rfunction}[1]{{\small\texttt{#1}}} \newcommand{\Rpackage}[1]{\textsf{#1}} \newcommand{\Rclass}[1]{\textit{#1}} \newcommand{\R}{\textsf{R}} \newcommand{\tlm}{\textsf{tlm}} %\DeclareGraphicsRule{.pdftex}{pdf}{.pdftex}{} \title{Using the \R\ package \tlm\ to fit, visualize and interpret linear, logistic and Poisson regression models with transformed variables\\ \vspace{4mm} \small{(\tlm\ version \Sexpr{utils::packageDescription("tlm")$Version})}} \author[1,2,3]{Jose Barrera-G\'omez\thanks{jose.barrera@isglobal.org}} \author[1,2,3]{Xavier Basaga\~na\thanks{xavier.basagana@isglobal.org}} \affil[1]{ISGlobal} \affil[2]{Universitat Pompeu Fabra (UPF)} \affil[3]{CIBER Epidemiolog\'ia y Salud P\'ublica (CIBERESP)} \renewcommand\Authands{ and } \renewcommand\Affilfont{\itshape\small} \begin{document} <>= library(knitr) library(xtable) #options(scipen = 999) # to avoid scientic notation in knitr outputs opts_chunk$set(size = 'small', fig.align = 'center') @ \maketitle \tableofcontents %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 1. Introduction %%% %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} This document is a user's guide for the \R\footnote{\R\ is a free and open source software and it is available at \url{http://cran.r-project.org/}.} package \tlm, which provides the effect of an explanatory variable of interest, $X$, on a response variable, $Y$, under a linear, logistic or Poisson regression model with transformations in $X$ and/or $Y$. In the case of the linear regression model, log and power transformations in any of $X$ and $Y$ are allowed. In the case of logistic and Poisson regression models, log or power transformation in $X$ are allowed. Other explanatory variables can be in the model, in which case adjusted measures for $Y$ and adjusted effects of $X$ on $Y$ are automatically computed. The package also works if there are no transformations. The package provides both numerical and graphical outputs as well as information on interpreting results. The methodology is described in the original work by \citet{Barrera2015}, whose illustrative examples, among others, are reproduced here. %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 2. Getting started %%% %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \section{Getting started} Start an \R\ session and load the package by typing <>= library(tlm) @ You can get a brief overview of the package by typing <>= help(package = "tlm") @ This user's guide can be recovered by typing <>= browseVignettes("tlm") @ %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% 2.1. Fitting the model %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Fitting the model} \label{sec:tlm} The first step is to fit the model in the transformed space (i.e., considering that $Y$ and $X$ are already transformed, if any transformation is assumed), which is performed by the function \Robject{tlm}. We can get information on this function using the help: <>= ?tlm @ \noindent The main arguments of the function \Robject{tlm} are: \begin{itemize} \item \Robject{formula}: the model formula with the usual syntax as in \Robject{lm} and \Robject{glm}. Left-hand-side indicates the response variable (whose values are assumed to be already transformed). First term in right-hand-side indicates the explanatory variable of interest (whose values are assumed to be already transformed). Right-hand-side can include additional terms (e.g. adjusting variables) but the explanatory variable of interest cannot be involved in any of them. \item \Robject{family}: the model family. It can be \Robject{gaussian}, for linear regression (default); \Robject{binomial}, for logistic regression with logit link, o \Robject{poisson}, for Poisson regression. \item \Robject{data}: a \Robject{data.frame} containing the variables in the model. \item \Robject{ypow} and \Robject {xpow}: the power transformations already done in the response and in the explanatory variable of interest, respectively. The value 1 (default) means no transformation; the value 0 means log transformation. \item \Robject{\dots}: additional arguments for the underlying \Robject{lm} or \Robject{glm} fitting. \end{itemize} As a result of a call to the function \Robject{tlm}, we obtain an object of class \Robject{"tlm"} which can be passed to the following functions or methods: \begin{itemize} \item \Robject{summary} to summarize the fitted model (section \ref{sec:summary}) \item \Robject{plot} to visualize the fitted model (section \ref{sec:plot}) \item \Robject{MY} to obtain adjusted measures (section \ref{sec:MY}) \item \Robject{effect} to obtain adjusted effects (section \ref{sec:effects}) \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% 2.2. Summary of the fitted model %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Summary of the fitted model} \label{sec:summary} The method \Robject{summary} provides information on the fitted model. It essentially provides the summary of the underlying \Robject{lm} or \Robject{glm} fitted in the transformed space. %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% 2.3. Visualization of the fitted model %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Visualization of the fitted model} \label{sec:plot} The specific method \Robject{plot} provides, for an object of class \Robject{"tlm"} (output of the \Robject{tlm} function), a graphical representation of the fitted model. There are three types of visualization, depending on the value of the argument \Robject{type} passed to \Robject{plot}: \begin{itemize} \item \Robject{type = "original"} (default): the fitted model is plotted in the original space of the variables. If a visualization of the fitted model is reported, this is the proper option. \item \Robject{type = "transformed"}: the fitted model is plotted in the transformed space of the variables (where the model has been fitted). Plots obtained for this option are intended to visually explore the model goodness of fit and should not be reported because values of the transformed variables are meaningless (e.g. the logarithm of cotinine levels (ng/ml) has no sense). \item \Robject{type = "diagnosis"}: a fitted model diagnostics plot is shown as with \Robject{plot.lm} or \Robject{plot.glm}. \end{itemize} Other arguments of the specific method \Robject{plot} are: \begin{itemize} \item \Robject{observed}: logical indicating whether the observations are shown in the plot. Default is \Robject{FALSE}. \item \Robject{xname}, \Robject{yname}: character indicating the name of the explanatory and the response variable of interest for labeling the plot axes. Default are \Robject{"x"} and \Robject{"y"}, respectively. \item \Robject{level}: numeric indicating the confidence level for the confidence of the expectation of the response variable according to the fitted model. Default is 0.95. \end{itemize} The specific method \Robject{plot} automatically labels the $Y$ axis with the appropriate name of the measure (i.e. mean, geometric mean, etc). %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% 2.4. Adjusted measures %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Adjusted measures} \label{sec:MY} Once the model has been fitted with the function \Robject{tlm}, the resulting object can be passed to the function \Robject{MY}, which provides measures of the response variable for given values of the explanatory variable. If the model contains other explanatory variables, then adjusted measures are automatically computed. These adjusted measures are obtained by setting the remaining explanatory variables in the model at their means. We can get information on this function using the help: <>= ?MY @ \noindent The main arguments of the function \Robject{MY} are: \begin{itemize} \item \Robject{object}: an object of class \Robject{"tlm"}, the result of a call to the function \Robject{tlm}. \item \Robject{x}: the value(s) of the explanatory variable of interest for which the expected measure of the response variable should be computed. \item \Robject{npoints}: if \Robject{x} is not provided, the number of points where the measure should be computed. Default is 10. \item \Robject{space}: the space of the variables in which measures should be computed. It can be \Robject{"original"} (default) or \Robject{"transformed"}. \item \Robject{level}: the confidence level for measures. Default is 0.95. \end{itemize} The function \Robject{MY} automatically provides the unit of the measure (mean, geometric mean, median, probability or logodds, depending on the case). %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% 2.5. Adjusted effects %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Adjusted effects} \label{sec:effects} The fitted model can be passed to the function \Robject{effect} in order to obtain the effect of $X$ on $Y$ in the original space of the variables, adjusted for other potential adjusting variables passed in \Robject{formula} to the \Robject{tlm} function. We can get information on this function using the help: <>= ?effect @ \noindent The main arguments of the function \Robject{effect} are: \begin{itemize} \item \Robject{object}: an object of class \Robject{"tlm"}, the result of a call to the function \Robject{tlm}. \item \Robject{x1}: the value(s) of the explanatory variable where the effect should be computed. \item \Robject{x2}: the alternative value(s) of the explanatory variable (changing from \Robject{x1}) for which the effect should be computed. \item \Robject{c}: the additive change in the explanatory variable. \item \Robject{q}: the multiplicative change in the explanatory variable. \item \Robject{r}: the percent change in the explanatory variable. \item \Robject{npoints}: the number of points where the effect should be computed. \item \Robject{level}: the confidence level for the effect. \end{itemize} In order to compute effects, the change in the explanatory variable should be characterized. It can be done in several ways. For instance, providing: (1) \Robject{x1} and \Robject{x2}; (2) \Robject{x1} and one of \Robject{c}, \Robject{q} or \Robject{r}; (3) \Robject{x1}, \Robject{npoints} and one of \Robject{c}, \Robject{q} or \Robject{r}. Only one of the arguments \Robject{c}, \Robject{q} or \Robject{r} is used, prevailing \Robject{c} and then \Robject{q}. If no enough arguments are passed, the interquartile range will be considered and a summary effect should be computed, if it exists. If the explanatory variable is categorical, then the effect is computing for a change between the reference level and each of the remaining levels of the explanatory variable. Confidence intervals are computed by transforming the endpoints of the intervals in the transformed scale when it is possible, while non-parametric bootstrap is used otherwise. The function \Robject{effect} automatically provides the unit of the effect measure (mean, geometric mean, median or odds ratio, depending on the case).\\ In addition, the function \Robject{effectInfo} provides further information on how to interpret the effect (use \Robject{?effectInfo} for further details). %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3. Illustrative examples %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \section{Illustrative examples} %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3.1. Linear regression model %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Linear regression model} %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3.1.1. Log transformation in the response %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsubsection{Log transformation in the response} \label{sec:imt} Consider the evaluation of the association between the intima media thickness of the carotid artery (IMT), measured in mm, $Y$, and age, in years, $X$. Variable $Y$ was log transformed to achieve normality. The \Robject{imt} data were simulated to emulate true data pattern observed in a real study \cite{Rivera2013}.\\ First, we can load data (further information about data is available with the help function, \Robject{?imt}) and see its first rows and a summary: <<>>= data(imt) dim(imt) head(imt) summary(imt) @ Suppose that we are interested in the effect of age on IMT, under a linear regression model with log transformation in the response variable, IMT. The model can be fitted as follows: <<>>= modimt <- tlm(logimt ~ age, data = imt, ypow = 0) @ \noindent where \Robject{ypow = 0} indicates that the response variable is already log transformed. Note that we have not set the family since default family is Gaussian. The fitted model results in: <<>>= modimt @ Further information on the fitted model is available using the method \Robject{summary}: <<>>= summary(modimt) @ A numerical representation of the relationship between age and IMT is available using the function \Robject{MY}. As default, the measure of the response variable is computed in 10 points along the range of the explanatory variable: <<>>= MY(modimt) @ The number of points can be set using \Robject{npoints}: <<>>= MY(modimt, npoints = 3) @ We can also choose a specific set of values of the explanatory variable for which the measure of the response variable should be computed. For instance, the first and third quartile: <<>>= q13 <- quantile(imt$age, probs = c(1, 3)/4) MY(modimt, x = q13) @ Measures can also be computed in the transformed space: <<>>= MY(modimt, x = q13, space = "transformed") @ A graphical representation of the relationship between age and IMT is available using the method \Robject{plot}. For instance, the two following instructions provide left and right plots in Figure \ref{fig:imt}, respectively: <>= plot(modimt, type = "transformed", observed = TRUE, xname = "Age (years)", yname = "IMT") plot(modimt, observed = TRUE, xname = "Age (years)", yname = "IMT (mm)") @ \begin{figure}[h!] \begin{center} <>= par(las = 1, mfrow = c(1, 2), mar = c(5, 5, 3, 3), mgp = c(2.8, 0.6, 0)) <> @ \caption{Visualization of the fitted model \Robject{modimt}. Left: In the transformed space, the mean of the logarithm of IMT values as a function of age is shown. Note that this plot should not be reported since the logarithm of IMT values are meaningless. This type of plot is intended just to visually explore the model goodness of fit. Right: In the original space, the geometric mean (or equivalently the median) of IMT values as a function of age is shown. This type of plot is appropriate for reporting. Dashed lines represent 95\% confidence intervals for the measure.} \label{fig:imt} \end{center} \end{figure} The argument \Robject{observed} controls whether observations are shown in the plot (default is \Robject{FALSE}). Further information on the usage of the method \Robject{plot} is available using \Robject{?tlm}.\\ Diagnostics plot as in Figure \ref{fig:imtdiag} can be obtained with the following instruction: <>= plot(modimt, type = "diagnosis") @ The function \Robject{effectInfo} provides information on interpreting the relationship between age ($X$) and IMT ($Y$): <<>>= effectInfo(modimt) @ \begin{figure}[h!] \begin{center} <>= #par(las = 1, mfrow = c(1, 2), mar = c(5, 5, 3, 3), mgp = c(2.8, 0.6, 0)) par(las = 1, mar = c(5, 5, 3, 3), mgp = c(2.8, 0.6, 0)) <> @ \caption{Diagnostics plot for the model \Robject{modimt}.} \label{fig:imtdiag} \end{center} \end{figure} \noindent Thus, we can see that, in this case, if we use additive changes in $X$ and percent (or multiplicative) changes in the geometric mean of $Y$, a summary effect can be obtained, which is independent of the value of $X$ for which the effect is computed. The function \Robject{effect} provides as default the expected change in IMT for an additive change in age equal to the interquartile range: <<>>= effect(modimt) @ Other measures of effects can be obtained. For instance, we may be interested in a difference of (geometric) means when changing age across the first quartile, the median and the third quartile: <<>>= q123 <- quantile(imt$age, probs = 1:3/4) # quartiles effect(modimt, x1 = q123) @ %\begin{Schunk} %\begin{Sinput} %> q123 <- quantile(imt$age, probs = 1:3/4) # quartiles %> effect(modimt, x1 = q123) %\end{Sinput} %\begin{Soutput} %Computing effects... % % %Adjusted change in the geometric mean of the response variable when the %explanatory variable changes from x1 to x2 (confidence interval for the %difference change based on 999 bootstrap samples): % % x1 x2 EstimateDiff lower95% upper95% EstimatePercent lower95% upper95% %25% 51 59 0.05366274 0.05039149 0.05674395 7.933669 7.411302 8.458576 %50% 59 66 0.05043602 0.04725599 0.05377603 6.908521 6.455654 7.363315 % %For further information on interpreting the effect use effectInfo(). %\end{Soutput} %\end{Schunk} As in this example, when a summary effect is not computed, then both difference and percent changes in the response are computed. The number of bootstrap samples is controlled by the argument \Robject{nboot} whose default value is 999\cite{Davison1997}. %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3.1.2. Log transformation in the explanatory variable %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsubsection{Log transformation in the explanatory variable} \label{sec:cot} Consider now the evaluation of the association between the birth weight, in grams, $Y$, and the cord serum cotinine level in the mother, in ng/ml, $X$. Variable $X$ was log transformed to achieve a more homogeneous distribution. The \Robject{cotinine} data were simulated to emulate true data patterns observed in a real study\cite{Pichini2000}. Data also contains a binary variable indicating whether the birth weight was low (defined as lower than 2500 g).\\ We can load and explore data (further information about data is available with the help function, \Robject{?cotinine}): <<>>= data(cotinine) dim(cotinine) head(cotinine) summary(cotinine) @ Suppose that we are interested in the effect of cotinine level on birth weight, under a linear regression model with log transformation in the explanatory variable, cotinine. The model can be fitted as follows: <<>>= modcot <- tlm(weight ~ logcotinine, data = cotinine, xpow = 0) @ \noindent where \Robject{xpow = 0} indicates that the explanatory variable is already log transformed. The fitted model provides the following results: <<>>= summary(modcot) @ In Figure \ref{fig:cot} (obtained with the following instructions) we can see the fitted model (left) and the relationship between cotinine level and birth weight, under the model, in the original space of the variables (right). <>= plot(modcot, type = "transformed", observed = TRUE, xname = "Cotinine", yname = "weight (kg)") plot(modcot, xname = "Cotinine (ng/ml)", yname = "weight (kg)") @ \begin{figure}[h!] \begin{center} <>= par(las = 1, mfrow = c(1, 2), mar = c(5, 5, 4, 3), mgp = c(2.8, 0.6, 0)) <> @ \caption{Visualization of the fitted model \Robject{modcot}. Left: In the transformed space, mean weight (kg) as a function of the logarithm of cotinine levels is shown. Note that this plot should not be reported since the logarithm of cotinine levels are meaningless. This type of plot is intended just to visually explore the model goodness of fit. Right: In the original space, the mean weight (kg) as a function of cotinine levels is shown. This type of plot is appropriate for reporting. Dashed lines represent 95\% confidence intervals for the measure.} \label{fig:cot} \end{center} \end{figure} The function \Robject{effectInfo} provides information on interpreting the relationship between cotinine levels ($X$) and weight ($Y$): <<>>= effectInfo(modcot) @ In this case, if we use multiplicative (or percent) changes in $X$ and additive changes in the mean of $Y$, a summary effect can be obtained, which is independent of the value of $X$ for which the effect is computed. The function \Robject{effect} provides as default the expected change in weight for a percent change in cotinine levels equal to the interquartile ratio: <<>>= effect(modcot) @ Alternatively, by exploring Figure \ref{fig:cot}, we can see that several 10-fold changes occur in the population and choose the more common number $q = 10$, in which case the effect is <<>>= effect(modcot, q = 10) @ If we are interested in explore the effect of an additive change in cotinine levels, we can obtain, for example, effects for additive changes in $X$ along its range. For instance: <<>>= range(cotinine$cotinine) effect(modcot, x1 = 100, c = 200, npoints = 4) @ %\begin{Schunk} %\begin{Sinput} %> range(cotinine$cotinine) %\end{Sinput} %\begin{Soutput} %[1] 0.2 910.0 %\end{Soutput} %\begin{Sinput} %> effect(modcot, x1 = 100, c = 200, npoints = 4) %\end{Sinput} %\begin{Soutput} %Computing effects... % % %Adjusted change in the mean of the response variable when the explanatory %variable changes from x1 to x2 (confidence interval for the percent change %based on 999 bootstrap samples): % % x1 x2 EstimateDiff lower95% upper95% EstimatePercent lower95% upper95% %1 100 300 -87.89017 -120.19284 -55.58750 -2.8929637 -4.0227274 -1.7692666 %2 300 500 -40.86660 -55.88649 -25.84671 -1.3852256 -1.9861551 -0.8069561 %3 500 700 -26.91814 -36.81149 -17.02479 -0.9252415 -1.2979132 -0.5732388 %4 700 900 -20.10543 -27.49486 -12.71599 -0.6975258 -0.9934569 -0.4388765 % %For further information on interpreting the effect use effectInfo(). %\end{Soutput} %\end{Schunk} %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3.1.3. Log transformation in both the response and the explanatory variable %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsubsection{Log transformation in both the response and the explanatory variable} \label{sec:cat} Consider an epidemiological study to assess the association between cat allergen levels ({\it Fel d 1}) in the bed mattress, $X$, and in the living room, $Y$, in homes of study participants, taking into account cat ownership, $C$. Both variables $X$ and $Y$ were log transformed to achieve linearity in their relationship. The \Robject{feld1} data were simulated to emulate true data patterns observed in a real study\cite{Basagana2002}.\\ We can load and explore data (further information about data is available with the help function, \Robject{?feld1}): <<>>= data(feld1) dim(feld1) head(feld1) summary(feld1) @ Suppose that we are interested in the association between allergen levels in the mattress and in the living room, under a linear regression model with log transformation in both variables, adjusting for cat ownership. The model can be fitted as follows: <<>>= modcat <- tlm(logroom ~ logmattress + cat, data = feld1, ypow = 0, xpow = 0) @ \noindent where \Robject{ypow = 0} and \Robject{xpow = 0} indicate that both the explanatory and the response variables are already log transformed. The fitted model provides the following results: <<>>= summary(modcat) @ We can create Figure \ref{fig:cat} with <>= plot(modcat, type = "transformed", observed = TRUE, xname = "Mattress levels", yname = "living room levels") plot(modcat, xname = "Mattress levels", yname = "living room levels") @ \noindent which provides a graphical representation of the relationship between allergen levels in the mattress and in the living room. Measures in Figure \ref{fig:cat} have been obtained after averaging the expected measure of $Y$ over all subjects in the data set. This is equivalent to saying that measures and effects are calculated for an average individual in the population and they can be interpreted as adjusted measures and adjusted effects\cite{Searle1980} . \begin{figure}[h!] \begin{center} <>= par(las = 1, mfrow = c(1, 2), mar = c(5, 5, 4, 3) + 0.1, mgp = c(2.8, 0.6, 0)) <> @ \caption{Visualization of the fitted model \Robject{modcat}. Left: In the transformed space, the logarithm of mean {\it Fel d 1} levels in room as a function of the logarithm of {\it Fel d 1} levels in mattress is shown. Note that this plot should not be reported since the logarithm of {\it Fel d 1} levels are meaningless. This type of plot is intended just to visually explore the model goodness of fit. Right: In the original space, the geometric mean (or equivalently median) of {\it Fel d 1} levels in room as a function of {\it Fel d 1} levels in mattress is shown. This type of plot is appropriate for reporting. Dashed lines represent 95\% confidence intervals for the measure.} \label{fig:cat} \end{center} \end{figure} The function \Robject{effectInfo} provides information on interpreting the relationship between allergen levels in mattress ($X$) and in living room ($Y$): <<>>= effectInfo(modcat) @ In this case, if we use multiplicative (or percent) changes in both variables, a summary effect can be obtained, which is independent of the value of the explanatory variable for which the effect is computed.\\ The function \Robject{effect} provides as default the expected change in allergen levels in the living room for a percent change in allergen levels in mattress equal to the interquartile ratio: <<>>= effect(modcat) @ If we are interested in the effect of cat ownership on allergen levels in the living room, with log transformation in this, we should run a new model with cat ownership as the explanatory variable: <<>>= modcat2 <- tlm(logroom ~ cat, data = feld1, ypow = 0) modcat2 @ Then, we can computed measures: <<>>= MY(modcat2) @ \noindent and the effect: <<>>= effect(modcat2) @ A graphical representation of the relationship can be obtained (see Figure \ref{fig:cat2}): <>= plot(modcat2, yname = "room levels") @ \begin{figure}[h!] \begin{center} <>= m <- matrix(0, nrow = 2, ncol = 4) m[, 2:3] <- 1 layout(m) par(las = 1) <> @ \caption{Geometric mean (and 95\% confidence intervals) of allergen levels in the living room as a function of cat ownership.} \label{fig:cat2} \end{center} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3.1.4. Power transformations %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsubsection{Power transformations} \label{sec:glucose} Consider now the modeling of the association between triglycerides, $X$, and glucose, $Y$, levels in blood, both measured in mg/dl. Variables $X$ and $Y$ were transformed under power functions $g(X) = 1 / \sqrt{X} = X^{-1/2}$ and $f(Y) = 1/Y^2 = Y^{-2}$, respectively, to achieve linearity. The \Robject{glucose} data were simulated to emulate true data pattern observed in a real study\cite{Rivera2013}.\\ First, we can load and explore data (further information about data is available with the help function, \Robject{?glucose}): <<>>= data(glucose) dim(glucose) head(glucose) summary(glucose) @ Then, we can fit and explore the model of interest: <<>>= modglucose <- tlm(inv2glu ~ inv12tri, data = glucose, ypow = -2, xpow = -1/2) summary(modglucose) @ The function \Robject{MY} provides a numerical representation of the relationship between triglycerides ($X$) and glucose ($Y$), under the fitted model, in the original scale of the variables: <<>>= MY(modglucose) @ This relationship can also be represented graphically in Figure \ref{fig:gluco}: <>= plot(modglucose, type = "transformed", observed = TRUE, xname = "Triglycerides (mg/dl)", yname = "glucose (mg/dl)") plot(modglucose, xname = "Triglycerides (mg/dl)", yname = "glucose (mg/dl)") @ \begin{figure}[h!] \begin{center} <>= par(las = 1, mfrow = c(1, 2), mar = c(5, 5, 4, 3) + 0.1, mgp = c(3.6, 0.6, 0)) <> @ \caption{Visualization of the fitted model \Robject{modglucose}. Left: In the transformed space, the reciprocal of the squared mean of glucose levels as a function of the reciprocal of the square root of triglycerides levels is shown. Note that this plot should not be reported since the reciprocal of the squared mean of glucose levels and the square root of triglycerides levels are meaningless. This type of plot is intended just to visually explore the model goodness of fit. Right: In the original space, the geometric mean (or equivalently median) of glucose levels as a function of triglycerides levels is shown. This type of plot is appropriate for reporting. Dashed lines represent 95\% confidence intervals for the measure.} \label{fig:gluco} \end{center} \end{figure} The function \Robject{effectInfo} indicates that, under the fitted model, there is no summary effect: <<>>= effectInfo(modglucose) @ <>== # 2.5 and 97.5 percentiles of trigli (for text): summtrigli <- quantile(glucose$trigly, probs = c(2.5, 25, 50, 75, 97.5) / 100) trigli2.5 <- round(as.numeric(summtrigli[1]), 1) trigli97.5 <- round(as.numeric(summtrigli[5]), 1) @ Indeed, for general transformations it is not possible to find a summary measure that works for all values of $X$ and its change. In such cases, we can create tables with the four possible combinations that result when considering both additive and multiplicative changes in both $X$ and $Y$. These tables should report the effects for several basal values of $X$ along the observed range\cite{Elswick1997}. For instance, the 2.5th and 97.5th percentiles of triglycerides level were \Sexpr{round(trigli2.5)} mg/dl and \Sexpr{round(trigli97.5)} mg/dl, respectively. Thus, we can report the effects between pairs of consecutive values of $X = 50, 100, 150, 200$ and 250 mg/dl, that is to say, for an additive increase of $c = 50$ mg/dl, and also between pairs of consecutive values: 50, 75, 112.5, 168.8 and 253.1 mg/dl, that is, for an $r = 50\%$ increase: <<>>= # Effects for an additive change in triglycerides level: xc <- 50 * (1:5) xc effectXdiff <- effect(modglucose, x1 = xc) effectXdiff @ %\begin{Schunk} %\begin{Sinput} %> # Effects for an additive change in triglycerides level: %> xc <- 50 * (1:5) %> xc %\end{Sinput} %\begin{Soutput} %[1] 50 100 150 200 250 %\end{Soutput} %\begin{Sinput} %> effectXdiff <- effect(modglucose, x1 = xc) %\end{Sinput} %\begin{Soutput} %Computing effects... %\end{Soutput} %\begin{Sinput} %> effectXdiff %\end{Sinput} %\begin{Soutput} %Adjusted change in the median of the response variable when the explanatory %variable changes from x1 to x2 (confidence intervals based on 999 bootstrap %samples): % % x1 x2 EstimateDiff lower95% upper95% EstimatePercent lower95% upper95% %1 50 100 8.703222 6.334019 10.907257 10.114803 7.250154 12.960938 %2 100 150 4.802290 3.313412 6.464551 5.068507 3.506924 6.792880 %3 150 200 3.235200 2.178028 4.439836 3.249826 2.215835 4.370665 %4 200 250 2.397980 1.602089 3.372809 2.333003 1.597478 3.172078 %\end{Soutput} %\end{Schunk} <<>>= # Effects for an percent change in triglycerides level: xq <- 50 * 1.5^(0:4) xq effectXperc <- effect(modglucose, x1 = xq) effectXperc @ %\begin{Schunk} %\begin{Sinput} %> # Effects for an percent change in triglycerides level: %> xq <- 50 * 1.5^(0:4) %> xq %\end{Sinput} %\begin{Soutput} %[1] 50.000 75.000 112.500 168.750 253.125 %\end{Soutput} %\begin{Sinput} %> effectXperc <- effect(modglucose, x1 = xq) %\end{Sinput} %\begin{Soutput} %Computing effects... %\end{Soutput} %\begin{Sinput} %> effectXperc %\end{Sinput} %\begin{Soutput} %Adjusted change in the median of the response variable when the explanatory %variable changes from x1 to x2 (confidence intervals based on 999 bootstrap %samples): % % x1 x2 EstimateDiff lower95% upper95% EstimatePercent lower95% upper95% %1 50.00 75.000 5.152950 3.813373 6.363338 5.988710 4.361787 7.553217 %2 75.00 112.500 4.971764 3.542537 6.461374 5.451653 3.853272 7.140190 %3 112.50 168.750 4.724024 3.260444 6.322945 4.912204 3.396935 6.507858 %4 168.75 253.125 4.420430 2.976703 6.163127 4.381299 3.006967 5.941818 %\end{Soutput} %\end{Schunk} Exploring previous results, one can see that the more easily interpretable effect appears to be the additive change in the median of glucose level associated to a percent change in triglycerides level. Indeed, we can see that, for any given value of the triglycerides level along the observed range, a 50\% increase in triglycerides level is associated to around a \Sexpr{round(mean(effectXperc$effect[, 3]), 1)} mg/dl increase in the median glucose level. For the other measures, the effect is more dependent on the basal value of the triglycerides level. %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% %%% %%% %%% 3.2. Logistic regression model with log transformation in the explanatory variable %%% %%% %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% \subsection{Logistic regression model with log transformation in the explanatory variable} Revisiting the cotinine example (Section \ref{sec:cot}), supose we are now interested in the association between low birth weight (defined as weight lower than 2500 g), $Y$, and cotinine level, $X$, after log transforming $X$. The model can be fitted as follows: <<>>= modcot2 <- tlm(underweight ~ logcotinine, family = binomial, data = cotinine, xpow = 0) @ \noindent where \Robject{xpow = 0} indicates that the explanatory variable is already log transformed and the argument \Robject{family = binomial} indicates that the regression model is logistic with logit link (default is \Robject{family = gaussian}, for the lineal regression model). The fitted model provides the following results: <<>>= summary(modcot2) @ Then, we can obtain the probability of low birth weight as a function of cotinine level: <<>>= MY(modcot2) @ A graphical representation of the relationship between cotinine level and low birth weight can be obtained by: <>= plot(modcot2, type = "transformed", xname = "Cotinine (ng/ml) levels", yname = "low birth weight") plot(modcot2, xname = "Cotinine (ng/ml) levels", yname = "low birth weight") @ which provides Figure \ref{fig:cot2}. \begin{figure}[h!] \begin{center} <>= par(las = 1, mfrow = c(1, 2), mar = c(5, 5, 4, 3) + 0.1, mgp = c(2.8, 0.6, 0)) <> @ \caption{Visualization of the fitted model \Robject{modcot2}. Left: In the transformed space, the logarithm of the odds of low birth weight as a function of the logarithm of cotinine levels is shown. Note that this plot should not be reported since the logarithm of cotinine levels is meaningless, and the logarithm of the odds is difficult to interpret. This type of plot is intended just to visually explore the model goodness of fit. Right: In the original space, the probability of low birth weight as a function of cotinine levels is shown. This type of plot is appropriate for reporting. Dashed lines represent 95\% confidence intervals for the measure.} \label{fig:cot2} \end{center} \end{figure} Regarding effects, the function \Robject{effectInfo} indicates that, under the fitted model, we can summarize the effect of cotinine level ($X$) on low birth weight ($Y$) in terms of odds ratio (OR) for a percent (or multiplicative) change in cotinine level: <<>>= effectInfo(modcot2) @ Thus, the function \Robject{effect} provides as default the OR of low birth weight for a percent change in the cotinine level equal to the interquartile ratio: <<>>= effect(modcot2) @ Alternatively, the effect for a 10-fold change in the cotinine level is: <<>>= effect(modcot2, q = 10) @ \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%% Bibliography %%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\bibliographystyle{apalike} \cleardoublepage %\phantomsection %\addcontentsline{toc}{chapter}{Bibliography} \bibliography{bibliography}{} \end{document} % %%%%%%%%%%%%%%%%%%%%%%%% % %%%%%%%%%%%%%%%%%%%%%%%% % % References % %%%%%%%%%%%%%%%%%%%%%%%% % %%%%%%%%%%%%%%%%%%%%%%%% % % %\bibliographystyle{plain} % \bibliographystyle{apalike} % % % \begin{thebibliography}{99} % % %\bibliography{bayesGen} % % \bibitem{Barrera2015} Barrera-G\'omez J, Basaga\~na X. Models with transformed variables: interpretation and software. \emph{Epidemiology}. 2015;26(2):e16-17. DOI: 10.1097/EDE.0000000000000247. % % \bibitem{Rivera2013} Rivera M, Basaga\~na X, Aguilera I, Foraster M, Agis D, de Groot E, Perez L, Mendez MA, Bouso L, Targa J, Ramos R, Sala J, Marrugat J, Elosua R, K\"unzli N. Association between long-term exposure to traffic-related air pollution and subclinical atherosclerosis: The REGICOR study. \emph{Environmental Health Perspectives} 2013;121(2):223-230. DOI:10.1289/ehp.1205146. % % \bibitem{Davison1997} Davison AC, Hinkley DV. Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York. 1997. ISBN 9780521574716. % % \bibitem{Pichini2000} Pichini S, Basaga\~na X, Pacifici R, Garcia O, Puig C, Vall O, Harris J, Zuccaro P, Segura J, Sunyer J. Cord serum cotinine as a biomarker of fetal exposure to cigarette smoke at the end of pregnancy. Environmental Health Perspectives. 2000;108(11):1079-1083. % % \bibitem{Basagana2002} Basaga\~na X, Torrent M, Atkinson W, Puig C, Barnes M, Vall O, Jones M, Sunyer J, Cullinan P; AMICS study. Domestic aeroallergen levels in Barcelona and Menorca (Spain). Pediatric Allergy and Immunology. 2002;13(6):412-417. % % \bibitem{Searle1980} Searle SR, Speed FM, Milliken GA. Population marginal means in the linear model: An alternative to least squares means. Amer Stat. 1980;34(4):216-221. % % \bibitem{Elswick1997} Elswick RK Jr, Schwartz PF, Welsh JA. Interpretation of the odds ratio from logistic regression after a transformation of the covariate vector. Stat Med. 1997;16(15):1695-1703. % % \end{thebibliography} % % \end{document}