Regression in Gifi

This vignette shows various regression options with optimal scaling (OS). In Gifi this idea is called morals (Young, Takane, and De Leeuw 1976). We start with simple linear regression, followed by other models of increasing complexity.

Throughout this unit we use a dataset included in the MPsychoR package on emotion development. Participant’s age and gender are predictors, and granularity is the response. Granularity refers to a person’s ability to separate their emotions into specific types. People with low granularity (here scaled in the positive direction) struggle to separate their emotions (e.g., reporting that sadness, anger, fear, and others all just feel “bad”), whereas people with high granularity are very specific in how they parse their emotions (e.g., easily distinguishing between nuanced emotions like disappointment and frustration).

Linear Regression

We first focus on simple regression settings with age as predictor and granularity as response. We start with emulating standard linear regression. We standardize the variables such that we get standardized regression coefficients right away.

library("Gifi")
library("MPsychoR")
data("granularity")
granularity1 <- scale(granularity[,1:2]) |> as.data.frame()
head(granularity1)
#>          gran       age
#> 1  1.34472707 -1.834131
#> 2  0.70924034 -1.834131
#> 3 -0.01834218 -1.815128
#> 4  0.79902412 -1.644100
#> 5  1.05238391 -1.758118
#> 6  1.82151872 -1.644100
plot(granularity1[,2:1], main = "Scatterplot")

fitlin1 <- lm(gran ~ -1 + age, data = granularity1)
coef(fitlin1)
#>        age 
#> -0.2416765

We get a standardized slope of -0.2417, and an \(R^2\) of 0.058.

Now we mimic this model using morals. Note that here we operate on the unstandardized, original data. Since we are performing linear (metric) transformations on granularity as well as on age, morals() will standardize the data internally. Note that morals() requires to set up the linear transformations using splines: type = "E" in knotsGifi() implies no interior knots, xdegrees = 1 and ydegrees = 1 tells morals() to fit a polynomial of degree 1, and we also set the ordinal arguments to FALSE.

xknots_age <- knotsGifi(granularity$age, type = "E")      
yknots_gran <- knotsGifi(granularity$gran, type = "E")
fitlin2 <-  morals(x = granularity$age, y = granularity$gran, 
   xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitlin2
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age, 
#>     yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE, 
#>     yordinal = FALSE)
#> 
#> Loss value: 0.379 
#> Number of iterations: 15 
#> 
#> Coefficients:
#>       x 
#> -0.2417 
#> 
#> Multiple R-squared: 0.058

The results are the same as in the lm() fit above.

Polynomial Regression

Let us extend these models by fitting quadratic regressions. First, we call lm() on the unstandardized data (we could also use the standardized data, it doesn’t matter). Then we emulate this quadratic regression with morals(). Note that, as opposed to the linear fit, we set the polynomial degree of the predictor to a value of 2.

fitquad1 <- lm(gran ~ age + I(age^2), data = granularity)
fitquad1
#> 
#> Call:
#> lm(formula = gran ~ age + I(age^2), data = granularity)
#> 
#> Coefficients:
#> (Intercept)          age     I(age^2)  
#>    1.275380    -0.071488     0.002052
fitquad2 <-  morals(x = granularity$age, y = granularity$gran, 
  xknots = xknots_age, yknots = yknots_gran, xdegrees = 2, ydegrees = 1, 
  xordinal = FALSE, yordinal = FALSE) 
fitquad2
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age, 
#>     yknots = yknots_gran, xdegrees = 2, ydegrees = 1, xordinal = FALSE, 
#>     yordinal = FALSE)
#> 
#> Loss value: 0.281 
#> Number of iterations: 21 
#> 
#> Coefficients:
#>       x 
#> -0.4372 
#> 
#> Multiple R-squared: 0.191

Let us produce two plots based on the quadratic morals() fit and then explain how morals() achieves a quadratic fit. The top panel shows the optimally scaled predictor and response, including the regression line. The bottom panel shows quadratic line when age is kept on the original scale while the response is on the transformed scale.

plot(fitquad2$xhat, fitquad2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitquad2$xhat, fitquad2$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitquad2$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Quadratic Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitquad2$ypred[ind], col = "coral4", lwd = 2)

For both models we get an \(R^2\) of 0.191. This reflects a substantial increase compared to the linear fit.

Both lm() and morals() calls lead to the same results even though in the LM fit we get 2 slope parameters (one for the linear term and one for the quadratic term), whereas in morals() we only get a single one.

In the morals() call we specified a linear transformation for the response and a quadratic transformation for the predictor which leads to a quadratic regression fit (see plot(fitquad2, plot.type = "transplot")). Internally, morals() scales both variables according to these transformation functions in a way such that their relationship becomes linear (sometimes called bilinearizability; see top panel of the figure above). The parameter we get from the morals() output reflects the slope of this line. The sign of this parameter is irrelavant since it can happen that, based on the starting values used for the OS process, the sign of the transformed age scores can be switched.

If we produce the scatterplot with the original age scale we see what we actually fitted (bottom panel of the figure above): a quadratic regression where the granularity score decreases for the younger participants and increases for the older ones. On the y-axis we plot the transformed (i.e., standardized) response in order to be able to picture the curve. A cubic regression can be fitted by setting xdegrees=3.

Piecewise Linear Regression

The next model we fit is a piecewise linear regression. That is, we want to fit two lines which connect at a particular age knot. In spline terminology we fit a spline of degree 1 with one interior knot. Using the knots utility function for setting quantile knots, it places the knot at the median.

xknots_age2 <- knotsGifi(granularity$age, "Q", n = 1)
xknots_age2
#> [[1]]
#>  50% 
#> 15.3 
#> 
#> attr(,"type")
#> [1] "knotsQ"

Alternatively we could also set a knot at a particular age value of interest (e.g., 18 years) by modifying xknots_age2 as follows:

xknots_age2[[1]] <- 18

Now we fit the piecewise regression model and produce the effects plot:

fitpiece <-  morals(x = granularity$age, y = granularity$gran, 
   xknots = xknots_age2, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitpiece
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age2, 
#>     yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE, 
#>     yordinal = FALSE)
#> 
#> Loss value: 0.276 
#> Number of iterations: 27 
#> 
#> Coefficients:
#>      x 
#> 0.4482 
#> 
#> Multiple R-squared: 0.201

plot(fitpiece$xhat, fitpiece$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitpiece$xhat, fitpiece$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitpiece$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Piecewise Linear Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitpiece$ypred[ind], col = "coral4", lwd = 2)

We get an \(R^2\) of 0.201. The regression fit with the break at 18 years is shown in the bottom panel of the figure above.

Spline Regression

Next, we fit a spline regression using a quadratic spline (xdegrees = 2) with 3 interior quantile knots (i.e., we set them at the terciles), and produce the effects plots as above:

xknots_age3 <- knotsGifi(granularity$age, "Q", n = 3)
xknots_age3
#> [[1]]
#>   25%   50%   75% 
#> 11.35 15.30 19.80 
#> 
#> attr(,"type")
#> [1] "knotsQ"
fitspline <-  morals(granularity$age, granularity$gran, 
   xknots = xknots_age3, yknots = yknots_gran, xdegrees = 2, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitspline
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age3, 
#>     yknots = yknots_gran, xdegrees = 2, ydegrees = 1, xordinal = FALSE, 
#>     yordinal = FALSE)
#> 
#> Loss value: 0.268 
#> Number of iterations: 25 
#> 
#> Coefficients:
#>      x 
#> 0.4639 
#> 
#> Multiple R-squared: 0.215

plot(fitspline$xhat, fitspline$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitspline$xhat, fitspline$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitspline$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Spline Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitspline$ypred[ind], col = "coral4", lwd = 2)

Compared to the quadratic and the piecewise linear fit, there is only a slight improvement in \(R^2\) 0.215. The bottom panel of the figure above shows the corresponding smooth regression fit.

Monotone Regression

Next we present a monotone regression fit. Montone regression fits a step function into the scatterplot. If the step function is monotonically decreasing, it is called antitone regression. If it is monotonically increasing, isotone regression. Isotone regression is described in detail in De Leeuw, Hornik, and Mair (2009), who also provide a flexible implementation of various isotone regression techniques by means of the package isotone. In Gifi, a monotone regression can be fitted as follows (the knots are set at the data points).

xknots_age4 <- knotsGifi(granularity$age, "D")
fitmono <-  morals(x = granularity$age, y = granularity$gran, 
  xknots = xknots_age4, yknots = yknots_gran, ydegrees = 1, yordinal = FALSE)

plot(fitmono$xhat, fitmono$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitmono$xhat, fitmono$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitmono$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Monotone Morals Fit")
sfun <- stepfun(sort(granularity$age)[-nrow(granularity)], fitmono$ypred[order(granularity$age)])
plot(sfun, col = "coral4", add = TRUE, pch = 19, cex = 0.7, lwd = 2)

Obviously this U-shaped relationship is not the best example for a monotone regression fit. Still, it shows nicely how monotone regression works. In the figure above we see that after a certain point the descending step function is not able anymore to capture the increasing granularity pattern for the older participants, since it is restricted to be monotone. Thus, it becomes a flat line. The \(R^2\) of 0.193 is still good because the model is very precise for the younger participants.

Nominal Transformation

The last model we fit uses nominal transformation of age which leads to the most flexible specification. This model is nonsense, of course: aside from totally overfitting the data, it doesn’t take any metric or ordinal age information into account. But it certainly demonstrates the flexibility of morals.

xknots_age5 <- knotsGifi(granularity$age, "D")
fitnom <-  morals(granularity$age, granularity$gran, 
   xknots = xknots_age5, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)

plot(fitnom$xhat, fitnom$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitnom$xhat, fitnom$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitnom$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Nominal Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitnom$ypred[ind], col = "coral4", lwd = 2)

This model has a remarkable \(R^2\)-value of 0.729. But we overfit, of course.

Cross-Validation

At this point the question is: Which model should we use? We have seen that, apart from the linear and the nominal model, all \(R^2\)-values were in the 0.2 range. In order to examine potential overfitting and stability of these models, we can perform a \(K\)-fold cross-validation (CV). Avery basic version is implemented in cv.morals(), subject to further refinement in the future. For a typical value of \(K=10\), the idea is to leave out 10 observations at random, fit morals() without these observations, and then use this model to predict the left out observations. For each model we compute a CV error:

\[\begin{equation} CV_{(K)} = \frac{1}{K} \sum_{k=1}^K ({\hat y_k} - y_k)^2, \end{equation}\]

where \(y_k\) are the observed (but scaled) response values in fold \(k\), and \(\hat y_k}\) are the predicted values based on the fit without the left-out observations. In other words, for prediction we use the optimally scaled linear morals model as shown in the left panels of the figures above. The smaller \(CV_{(K)}\), the less we tend to overfit the data.

set.seed(123)
cvlin <- cv(fitlin2, folds = 10)
cvquad <- cv(fitquad2, folds = 10)
cvpiece <- cv(fitpiece, folds = 10)
cvspline <- cv(fitspline, folds = 10)
cvmono <- cv(fitmono, folds = 10)
cvnom <- cv(fitnom, folds = 10)
cvvec <- c(cvlin, cvquad, cvpiece, cvspline, cvmono, cvnom)
r2vec <- c(fitlin2$smc, fitquad2$smc, fitpiece$smc, fitspline$smc,
          fitmono$smc, fitnom$smc)
cvr2 <- cbind(cvvec, r2vec)
dimnames(cvr2) <- list(c("linear", "quadratic", "piecewise", "spline",
                         "monotone", "nominal"), c("CV-error", "R2"))
round(cvr2, 5)
#>           CV-error      R2
#> linear     0.00671 0.05841
#> quadratic  0.00915 0.19114
#> piecewise  0.00947 0.20090
#> spline     0.00876 0.21516
#> monotone   0.00639 0.19345
#> nominal    0.01060 0.72935

For the nominal model the CV-error is large, but the \(R^2\) is high. Conversely, for the linear model is the CV-error is low, but the \(R^2\) is bad.

Multiple Linear Regression

Morals can easily be extended to multiple predictors. The starting point is the linear model from above. We add a categorical predictor (gender; converted into numerical), and an interaction between gender and age.

granularity2 <- granularity 
granularity2$gender <- as.numeric(granularity$gender)-1
granularity2 <- scale(granularity2) |> as.data.frame()
head(granularity2)
#>          gran       age     gender
#> 1  1.34472707 -1.834131 -1.1229253
#> 2  0.70924034 -1.834131  0.8843037
#> 3 -0.01834218 -1.815128  0.8843037
#> 4  0.79902412 -1.644100 -1.1229253
#> 5  1.05238391 -1.758118  0.8843037
#> 6  1.82151872 -1.644100  0.8843037
fitmlin1 <- lm(gran ~ -1 + age*gender, data = granularity2)
fitmlin1
#> 
#> Call:
#> lm(formula = gran ~ -1 + age * gender, data = granularity2)
#> 
#> Coefficients:
#>        age      gender  age:gender  
#>   -0.25475     0.07687     0.06547

Now we mimic this model using morals(). We operate on the standardized data and create the interaction terms as follows:

granularity2$int <- granularity2$age * granularity2$gender
xknots_age <- knotsGifi(granularity2[,2:4], "E")  
yknots_gran <- knotsGifi(granularity2$gran, "E")
fitmlin2 <-  morals(x = granularity2[,2:4], y= granularity2$gran, 
   xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitmlin2
#> Call:
#> morals(x = granularity2[, 2:4], y = granularity2$gran, xknots = xknots_age, 
#>     yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE, 
#>     yordinal = FALSE)
#> 
#> Loss value: 0.369 
#> Number of iterations: 27 
#> 
#> Coefficients:
#>     age  gender     int 
#> -0.2548  0.0769  0.0648 
#> 
#> Multiple R-squared: 0.068

The morals() result matches the one from the lm() call.

Multiple Ordinal Regression

So far we have not touched the response variable in terms of using a transformation other than linear (which just standardizes \(Y\)). Let us fit an ordinal response version using morals(). We categorize the granularity into a Likert-type item (5 categories). First we fit a proportional odds model using polr() with a quadratic age effect and a main effect for gender. By default, polr() uses a logit link.

library("MASS")
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:Gifi':
#> 
#>     mammals
grancat <- cut(granularity$gran, 5, labels = 1:5)
fitord1 <- polr(grancat ~ age + I(age^2) + gender, data = granularity)
summary(fitord1)
#> 
#> Re-fitting to get Hessian
#> Call:
#> polr(formula = grancat ~ age + I(age^2) + gender, data = granularity)
#> 
#> Coefficients:
#>                 Value Std. Error t value
#> age          -0.89427   0.191634 -4.6665
#> I(age^2)      0.02568   0.005996  4.2829
#> genderfemale  0.17924   0.309393  0.5793
#> 
#> Intercepts:
#>     Value   Std. Error t value
#> 1|2 -9.1019  1.5014    -6.0622
#> 2|3 -7.3763  1.4643    -5.0373
#> 3|4 -6.1961  1.4308    -4.3305
#> 4|5 -4.4419  1.3773    -3.2250
#> 
#> Residual Deviance: 410.3663 
#> AIC: 424.3663

There is basically no gender effect. In morals() we do not need to specify a particular link function. Having an ordinal response we can simply apply an ordinal transformation on it.

granularity3 <- granularity 
granularity3$gender <- as.numeric(granularity$gender)
xknots_age <- knotsGifi(granularity3[,2:3], "E")  
yknots_gran2 <- knotsGifi(grancat, "D")
fitord2 <-  morals(x = granularity3[,2:3], y = as.numeric(grancat),
  xknots = xknots_age, yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1, 
  xordinal = FALSE, yordinal = TRUE) 
fitord2
#> Call:
#> morals(x = granularity3[, 2:3], y = as.numeric(grancat), xknots = xknots_age, 
#>     yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1, 
#>     xordinal = FALSE, yordinal = TRUE)
#> 
#> Loss value: 0.27 
#> Number of iterations: 26 
#> 
#> Coefficients:
#>     age  gender 
#> -0.4573  0.0299 
#> 
#> Multiple R-squared: 0.211

plot(fitord2, "transplot", main = c("Granularity Categorical", "Age", "Gender"))

The figure shows the transformation plots. For the response we have an ordinal transformation which stretches/squeezes the 1-5 input categories. Age is quadratic, and gender is taken as nominal. The parameters do not have the same meaning as is a logistic regression context.

From the coefficients and in the plot below we see that there is a large effect for age, and the gender effect is close to 0. Again, these effects represent the slope parameters in the linearized regressions based on the transformed scores (see figures below).

plot(fitord2$xhat[,1], fitord2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitord2$xhat[,1][granularity3$gender == 1], fitord2$ypred[granularity3$gender == 1], col = "coral4", lwd = 2)
lines(fitord2$xhat[,1][granularity3$gender == 2], fitord2$ypred[granularity3$gender == 2], col = "cadetblue", lwd = 2)
legend(-0.02, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))
plot(granularity3$age, fitord2$yhat, xlab = "Age", ylab = "Granularity (transformed)", main = "Ordinal Polynomial Morals Fit")
ind1 <- order(granularity3$age[granularity3$gender == 1])
lines(granularity3$age[granularity3$gender == 1][ind1], fitord2$ypred[granularity3$gender == 1][ind1], col = "coral4", lwd = 2)
ind2 <- order(granularity3$age[granularity3$gender == 2])
lines(granularity3$age[granularity3$gender == 2][ind2], fitord2$ypred[granularity3$gender == 2][ind2], col = "cadetblue", lwd = 2)
legend(20, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))

De Leeuw, J., K. Hornik, and P. Mair. 2009. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods.” Journal of Statistical Software 32 (5): 1–24. https://doi.org/10.18637/jss.v032.i05.

Young, F. W., Y. Takane, and J. De Leeuw. 1976. “Regression with Qualitative and Quantitative Variables: An Alternating Least Squares Approach with Optimal Scaling Features.” Psychometrika 43: 505–29. https://doi.org/10.1007/BF02296972.