This vignette shows various regression options with optimal scaling (OS). In Gifi this idea is called morals (Young, Takane, and De Leeuw 1976). We start with simple linear regression, followed by other models of increasing complexity.
Throughout this unit we use a dataset included in the MPsychoR
package on emotion development. Participant’s age and gender are predictors, and granularity is the response. Granularity refers to a person’s ability to separate their emotions into specific types. People with low granularity (here scaled in the positive direction) struggle to separate their emotions (e.g., reporting that sadness, anger, fear, and others all just feel “bad”), whereas people with high granularity are very specific in how they parse their emotions (e.g., easily distinguishing between nuanced emotions like disappointment and frustration).
We first focus on simple regression settings with age
as predictor and granularity
as response. We start with emulating standard linear regression. We standardize the variables such that we get standardized regression coefficients right away.
library("Gifi")
library("MPsychoR")
data("granularity")
scale(granularity[,1:2]) |> as.data.frame()
granularity1 <-head(granularity1)
#> gran age
#> 1 1.34472707 -1.834131
#> 2 0.70924034 -1.834131
#> 3 -0.01834218 -1.815128
#> 4 0.79902412 -1.644100
#> 5 1.05238391 -1.758118
#> 6 1.82151872 -1.644100
plot(granularity1[,2:1], main = "Scatterplot")
lm(gran ~ -1 + age, data = granularity1)
fitlin1 <-coef(fitlin1)
#> age
#> -0.2416765
We get a standardized slope of -0.2417, and an \(R^2\) of 0.058.
Now we mimic this model using morals. Note that here we operate on the unstandardized, original data. Since we are performing linear (metric) transformations on granularity as well as on age, morals()
will standardize the data internally. Note that morals()
requires to set up the linear transformations using splines: type = "E"
in knotsGifi()
implies no interior knots, xdegrees = 1
and ydegrees = 1
tells morals()
to fit a polynomial of degree 1, and we also set the ordinal arguments to FALSE
.
knotsGifi(granularity$age, type = "E")
xknots_age <- knotsGifi(granularity$gran, type = "E")
yknots_gran <- morals(x = granularity$age, y = granularity$gran,
fitlin2 <-xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitlin2#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age,
#> yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.379
#> Number of iterations: 15
#>
#> Coefficients:
#> x
#> -0.2417
#>
#> Multiple R-squared: 0.058
The results are the same as in the lm()
fit above.
Let us extend these models by fitting quadratic regressions. First, we call lm()
on the unstandardized data (we could also use the standardized data, it doesn’t matter). Then we emulate this quadratic regression with morals()
. Note that, as opposed to the linear fit, we set the polynomial degree of the predictor to a value of 2.
lm(gran ~ age + I(age^2), data = granularity)
fitquad1 <-
fitquad1#>
#> Call:
#> lm(formula = gran ~ age + I(age^2), data = granularity)
#>
#> Coefficients:
#> (Intercept) age I(age^2)
#> 1.275380 -0.071488 0.002052
morals(x = granularity$age, y = granularity$gran,
fitquad2 <-xknots = xknots_age, yknots = yknots_gran, xdegrees = 2, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitquad2#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age,
#> yknots = yknots_gran, xdegrees = 2, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.281
#> Number of iterations: 21
#>
#> Coefficients:
#> x
#> -0.4372
#>
#> Multiple R-squared: 0.191
Let us produce two plots based on the quadratic morals()
fit and then explain how morals()
achieves a quadratic fit. The top panel shows the optimally scaled predictor and response, including the regression line. The bottom panel shows quadratic line when age is kept on the original scale while the response is on the transformed scale.
plot(fitquad2$xhat, fitquad2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitquad2$xhat, fitquad2$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitquad2$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Quadratic Morals Fit")
order(granularity$age)
ind <-lines(granularity$age[ind], fitquad2$ypred[ind], col = "coral4", lwd = 2)
For both models we get an \(R^2\) of 0.191. This reflects a substantial increase compared to the linear fit.
Both lm()
and morals()
calls lead to the same results even though in the LM fit we get 2 slope parameters (one for the linear term and one for the quadratic term), whereas in morals()
we only get a single one.
In the morals()
call we specified a linear transformation for the response and a quadratic transformation for the predictor which leads to a quadratic regression fit (see plot(fitquad2, plot.type = "transplot")
). Internally, morals()
scales both variables according to these transformation functions in a way such that their relationship becomes linear (sometimes called bilinearizability; see top panel of the figure above). The parameter we get from the morals()
output reflects the slope of this line. The sign of this parameter is irrelavant since it can happen that, based on the starting values used for the OS process, the sign of the transformed age scores can be switched.
If we produce the scatterplot with the original age scale we see what we actually fitted (bottom panel of the figure above): a quadratic regression where the granularity score decreases for the younger participants and increases for the older ones. On the y-axis we plot the transformed (i.e., standardized) response in order to be able to picture the curve. A cubic regression can be fitted by setting xdegrees=3
.
The next model we fit is a piecewise linear regression. That is, we want to fit two lines which connect at a particular age knot. In spline terminology we fit a spline of degree 1 with one interior knot. Using the knots utility function for setting quantile knots, it places the knot at the median.
knotsGifi(granularity$age, "Q", n = 1)
xknots_age2 <-
xknots_age2#> [[1]]
#> 50%
#> 15.3
#>
#> attr(,"type")
#> [1] "knotsQ"
Alternatively we could also set a knot at a particular age value of interest (e.g., 18 years) by modifying xknots_age2
as follows:
1]] <- 18 xknots_age2[[
Now we fit the piecewise regression model and produce the effects plot:
morals(x = granularity$age, y = granularity$gran,
fitpiece <-xknots = xknots_age2, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitpiece#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age2,
#> yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.276
#> Number of iterations: 27
#>
#> Coefficients:
#> x
#> 0.4482
#>
#> Multiple R-squared: 0.201
plot(fitpiece$xhat, fitpiece$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitpiece$xhat, fitpiece$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitpiece$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Piecewise Linear Morals Fit")
order(granularity$age)
ind <-lines(granularity$age[ind], fitpiece$ypred[ind], col = "coral4", lwd = 2)
We get an \(R^2\) of 0.201. The regression fit with the break at 18 years is shown in the bottom panel of the figure above.
Next, we fit a spline regression using a quadratic spline (xdegrees = 2
) with 3 interior quantile knots (i.e., we set them at the terciles), and produce the effects plots as above:
knotsGifi(granularity$age, "Q", n = 3)
xknots_age3 <-
xknots_age3#> [[1]]
#> 25% 50% 75%
#> 11.35 15.30 19.80
#>
#> attr(,"type")
#> [1] "knotsQ"
morals(granularity$age, granularity$gran,
fitspline <-xknots = xknots_age3, yknots = yknots_gran, xdegrees = 2, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitspline#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age3,
#> yknots = yknots_gran, xdegrees = 2, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.268
#> Number of iterations: 25
#>
#> Coefficients:
#> x
#> 0.4639
#>
#> Multiple R-squared: 0.215
plot(fitspline$xhat, fitspline$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitspline$xhat, fitspline$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitspline$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Spline Morals Fit")
order(granularity$age)
ind <-lines(granularity$age[ind], fitspline$ypred[ind], col = "coral4", lwd = 2)
Compared to the quadratic and the piecewise linear fit, there is only a slight improvement in \(R^2\) 0.215. The bottom panel of the figure above shows the corresponding smooth regression fit.
Next we present a monotone regression fit. Montone regression fits a step function into the scatterplot. If the step function is monotonically decreasing, it is called antitone regression. If it is monotonically increasing, isotone regression. Isotone regression is described in detail in De Leeuw, Hornik, and Mair (2009), who also provide a flexible implementation of various isotone regression techniques by means of the package isotone
. In Gifi
, a monotone regression can be fitted as follows (the knots are set at the data points).
knotsGifi(granularity$age, "D")
xknots_age4 <- morals(x = granularity$age, y = granularity$gran,
fitmono <-xknots = xknots_age4, yknots = yknots_gran, ydegrees = 1, yordinal = FALSE)
plot(fitmono$xhat, fitmono$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitmono$xhat, fitmono$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitmono$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Monotone Morals Fit")
stepfun(sort(granularity$age)[-nrow(granularity)], fitmono$ypred[order(granularity$age)])
sfun <-plot(sfun, col = "coral4", add = TRUE, pch = 19, cex = 0.7, lwd = 2)
Obviously this U-shaped relationship is not the best example for a monotone regression fit. Still, it shows nicely how monotone regression works. In the figure above we see that after a certain point the descending step function is not able anymore to capture the increasing granularity pattern for the older participants, since it is restricted to be monotone. Thus, it becomes a flat line. The \(R^2\) of 0.193 is still good because the model is very precise for the younger participants.
The last model we fit uses nominal transformation of age which leads to the most flexible specification. This model is nonsense, of course: aside from totally overfitting the data, it doesn’t take any metric or ordinal age
information into account. But it certainly demonstrates the flexibility of morals.
knotsGifi(granularity$age, "D")
xknots_age5 <- morals(granularity$age, granularity$gran,
fitnom <-xknots = xknots_age5, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
plot(fitnom$xhat, fitnom$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitnom$xhat, fitnom$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitnom$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Nominal Morals Fit")
order(granularity$age)
ind <-lines(granularity$age[ind], fitnom$ypred[ind], col = "coral4", lwd = 2)
This model has a remarkable \(R^2\)-value of 0.729. But we overfit, of course.
At this point the question is: Which model should we use? We have seen that, apart from the linear and the nominal model, all \(R^2\)-values were in the 0.2 range. In order to examine potential overfitting and stability of these models, we can perform a \(K\)-fold cross-validation (CV). Avery basic version is implemented in cv.morals()
, subject to further refinement in the future. For a typical value of \(K=10\), the idea is to leave out 10 observations at random, fit morals()
without these observations, and then use this model to predict the left out observations. For each model we compute a CV error
:
\[\begin{equation} CV_{(K)} = \frac{1}{K} \sum_{k=1}^K ({\hat y_k} - y_k)^2, \end{equation}\]
where \(y_k\) are the observed (but scaled) response values in fold \(k\), and \(\hat y_k}\) are the predicted values based on the fit without the left-out observations. In other words, for prediction we use the optimally scaled linear morals model as shown in the left panels of the figures above. The smaller \(CV_{(K)}\), the less we tend to overfit the data.
set.seed(123)
cv(fitlin2, folds = 10)
cvlin <- cv(fitquad2, folds = 10)
cvquad <- cv(fitpiece, folds = 10)
cvpiece <- cv(fitspline, folds = 10)
cvspline <- cv(fitmono, folds = 10)
cvmono <- cv(fitnom, folds = 10)
cvnom <- c(cvlin, cvquad, cvpiece, cvspline, cvmono, cvnom)
cvvec <- c(fitlin2$smc, fitquad2$smc, fitpiece$smc, fitspline$smc,
r2vec <-$smc, fitnom$smc)
fitmono cbind(cvvec, r2vec)
cvr2 <-dimnames(cvr2) <- list(c("linear", "quadratic", "piecewise", "spline",
"monotone", "nominal"), c("CV-error", "R2"))
round(cvr2, 5)
#> CV-error R2
#> linear 0.00671 0.05841
#> quadratic 0.00915 0.19114
#> piecewise 0.00947 0.20090
#> spline 0.00876 0.21516
#> monotone 0.00639 0.19345
#> nominal 0.01060 0.72935
For the nominal model the CV-error is large, but the \(R^2\) is high. Conversely, for the linear model is the CV-error is low, but the \(R^2\) is bad.
Morals can easily be extended to multiple predictors. The starting point is the linear model from above. We add a categorical predictor (gender
; converted into numerical), and an interaction between gender
and age
.
granularity
granularity2 <-$gender <- as.numeric(granularity$gender)-1
granularity2 scale(granularity2) |> as.data.frame()
granularity2 <-head(granularity2)
#> gran age gender
#> 1 1.34472707 -1.834131 -1.1229253
#> 2 0.70924034 -1.834131 0.8843037
#> 3 -0.01834218 -1.815128 0.8843037
#> 4 0.79902412 -1.644100 -1.1229253
#> 5 1.05238391 -1.758118 0.8843037
#> 6 1.82151872 -1.644100 0.8843037
lm(gran ~ -1 + age*gender, data = granularity2)
fitmlin1 <-
fitmlin1#>
#> Call:
#> lm(formula = gran ~ -1 + age * gender, data = granularity2)
#>
#> Coefficients:
#> age gender age:gender
#> -0.25475 0.07687 0.06547
Now we mimic this model using morals()
. We operate on the standardized data and create the interaction terms as follows:
$int <- granularity2$age * granularity2$gender
granularity2 knotsGifi(granularity2[,2:4], "E")
xknots_age <- knotsGifi(granularity2$gran, "E")
yknots_gran <- morals(x = granularity2[,2:4], y= granularity2$gran,
fitmlin2 <-xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitmlin2#> Call:
#> morals(x = granularity2[, 2:4], y = granularity2$gran, xknots = xknots_age,
#> yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.369
#> Number of iterations: 27
#>
#> Coefficients:
#> age gender int
#> -0.2548 0.0769 0.0648
#>
#> Multiple R-squared: 0.068
The morals()
result matches the one from the lm()
call.
So far we have not touched the response variable in terms of using a transformation other than linear (which just standardizes \(Y\)). Let us fit an ordinal response version using morals()
. We categorize the granularity
into a Likert-type item (5 categories). First we fit a proportional odds model using polr()
with a quadratic age
effect and a main effect for gender
. By default, polr()
uses a logit link.
library("MASS")
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:Gifi':
#>
#> mammals
cut(granularity$gran, 5, labels = 1:5)
grancat <- polr(grancat ~ age + I(age^2) + gender, data = granularity)
fitord1 <-summary(fitord1)
#>
#> Re-fitting to get Hessian
#> Call:
#> polr(formula = grancat ~ age + I(age^2) + gender, data = granularity)
#>
#> Coefficients:
#> Value Std. Error t value
#> age -0.89427 0.191634 -4.6665
#> I(age^2) 0.02568 0.005996 4.2829
#> genderfemale 0.17924 0.309393 0.5793
#>
#> Intercepts:
#> Value Std. Error t value
#> 1|2 -9.1019 1.5014 -6.0622
#> 2|3 -7.3763 1.4643 -5.0373
#> 3|4 -6.1961 1.4308 -4.3305
#> 4|5 -4.4419 1.3773 -3.2250
#>
#> Residual Deviance: 410.3663
#> AIC: 424.3663
There is basically no gender effect. In morals()
we do not need to specify a particular link function. Having an ordinal response we can simply apply an ordinal transformation on it.
granularity
granularity3 <-$gender <- as.numeric(granularity$gender)
granularity3 knotsGifi(granularity3[,2:3], "E")
xknots_age <- knotsGifi(grancat, "D")
yknots_gran2 <- morals(x = granularity3[,2:3], y = as.numeric(grancat),
fitord2 <-xknots = xknots_age, yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1,
xordinal = FALSE, yordinal = TRUE)
fitord2#> Call:
#> morals(x = granularity3[, 2:3], y = as.numeric(grancat), xknots = xknots_age,
#> yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1,
#> xordinal = FALSE, yordinal = TRUE)
#>
#> Loss value: 0.27
#> Number of iterations: 26
#>
#> Coefficients:
#> age gender
#> -0.4573 0.0299
#>
#> Multiple R-squared: 0.211
plot(fitord2, "transplot", main = c("Granularity Categorical", "Age", "Gender"))
The figure shows the transformation plots. For the response we have an ordinal transformation which stretches/squeezes the 1-5 input categories. Age is quadratic, and gender is taken as nominal. The parameters do not have the same meaning as is a logistic regression context.
From the coefficients and in the plot below we see that there is a large effect for age, and the gender effect is close to 0. Again, these effects represent the slope parameters in the linearized regressions based on the transformed scores (see figures below).
plot(fitord2$xhat[,1], fitord2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitord2$xhat[,1][granularity3$gender == 1], fitord2$ypred[granularity3$gender == 1], col = "coral4", lwd = 2)
lines(fitord2$xhat[,1][granularity3$gender == 2], fitord2$ypred[granularity3$gender == 2], col = "cadetblue", lwd = 2)
legend(-0.02, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))
plot(granularity3$age, fitord2$yhat, xlab = "Age", ylab = "Granularity (transformed)", main = "Ordinal Polynomial Morals Fit")
order(granularity3$age[granularity3$gender == 1])
ind1 <-lines(granularity3$age[granularity3$gender == 1][ind1], fitord2$ypred[granularity3$gender == 1][ind1], col = "coral4", lwd = 2)
order(granularity3$age[granularity3$gender == 2])
ind2 <-lines(granularity3$age[granularity3$gender == 2][ind2], fitord2$ypred[granularity3$gender == 2][ind2], col = "cadetblue", lwd = 2)
legend(20, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))
De Leeuw, J., K. Hornik, and P. Mair. 2009. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods.” Journal of Statistical Software 32 (5): 1–24. https://doi.org/10.18637/jss.v032.i05.
Young, F. W., Y. Takane, and J. De Leeuw. 1976. “Regression with Qualitative and Quantitative Variables: An Alternating Least Squares Approach with Optimal Scaling Features.” Psychometrika 43: 505–29. https://doi.org/10.1007/BF02296972.