---
title: "Ordinal and Mixed PCA"
author: "Patrick Mair, Jan De Leeuw"
output: rmarkdown::html_vignette
bibliography: gifi.bib
vignette: >
%\VignetteIndexEntry{Ordinal and Mixed PCA}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
This vignette shows applications of *principal component analysis* (PCA) on ordinal and mixed input data which in Gifi slang is called *princals*. To be more precise, princals is a combination of multiple correspondence analysis and PCA (with splinical and ordinal options).
## Ordinal PCA in a Nutshell
We use a dataset from @Kennett+Salini:2012 on customer satisfaction that is included in the `Gifi` package.
```{r}
library("Gifi")
ABC6 <- ABC[,6:11] ## we use 6 items from the dataset
head(ABC6)
```
We have a sample size of `r nrow(ABC6)`. Each item is scaled on a 5-point rating scale (1 ... very low, 5 ... very high). A more detailed description can be found in the corresponding `?ABC` help file. We are now going to fit an PCA with all items taken at an ordinal scale level:
```{r}
fit_ordpca <- princals(ABC6, ndim = 2, levels = "ordinal")
fit_ordpca
```
We obtain a loss value of `r round(fit_ordpca$f,3)`. The results are prepared in a way so that they are close to the classical PCA output involving eigenvalues, loadings, and variance accounted for (VAF):
```{r}
summary(fit_ordpca)
```
We can plot the loadings, or combine them with the PC scores (which Gifi calls *object scores*) in a biplot.
```{r, fig.height=5, fig.width=5}
plot(fit_ordpca, plot.type = "loadplot")
```
```{r, fig.height=5, fig.width=5}
plot(fit_ordpca, plot.type = "biplot", col.loadings = "coral3", col.scores = "lightgrey")
abline(h = 0, v = 0, lty = 2)
```
Similar to standard PCA, one can create a scree plot (`plot.type = "screeplot"`) to assess the number of dimensions needed.
If the user wants to have a closer insight into the optimal scaling transformations under the hood, transformation plots can be produced:
```{r, fig.height=7, fig.width=7}
plot(fit_ordpca, plot.type = "transplot")
```
These plots nicely show the transformations of the original 1-5 scores into the optimally scaled *category quantifications*. Since in the `princals()` fit above we defined `levels = "ordinal"`, a monotone regression transformation is applied resulting in these step functions. The data frame with the quantifications on the first dimension can be extracted as follows:
```{r}
head(fit_ordpca$scoremat)
```
This data frame can replace the original `AB6` object for subsequent analyses.
## Standard PCA with Gifi
One can also mimick a standard PCA using `princals()` by setting the `levels` argument accordingly:
```{r}
fit_metpca <- princals(ABC6, ndim = 2, levels = "metric")
fit_metpca
```
The results are highly similar to the classical `prcomp(ABC6, scale = TRUE)` fit, and the same plots as above can be produced. Here we look again at the transformation plots, showing the linear transformation (simply a z-standardization) induced by the metric measurement level assumption.
```{r, fig.height=7, fig.width=7}
plot(fit_metpca, plot.type = "transplot")
```
One can compare the loss value of `r round(fit_ordpca$f, 3)` from the ordinal solution with `r round(fit_metpca$f, 3)` from the metric solution and, in conjunction with the transformation plots, judge whether these Likert items can be treated as metric.
## Mixed PCA
To demonstrate PCA with mixed input scale levels, we use a subset of the data based on the Wilson-Patterson conservatism scale from `MPsychoR` package. For illustration purposes we create age groups so that we have an ordinal variable in the dataset.
```{r}
data("WilPat2")
WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)
head(WilPat2)
```
The first 6 items are Wilson-Patterson items (1 = "approve", 0 = "disapprove", and 2 = "don't know") which we declare as "nominal". This is followed by `Country` which is also "nominal", and two self-reported liberalism/conservatism and left/right identification which enter as "metric". Finally we use an "ordinal" transformation for age. The corresponding `levelvec` object is then shipped to `princals()`.
```{r}
levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")
wen_prin <- princals(WilPat2, levels = levelvec) ## fit 2D princals
wen_prin
summary(wen_prin)
```
Let us look at the transformation plots of some variables:
```{r, fig.height=7, fig.width=7}
plot(wen_prin, plot.type = "transplot", var.subset = c(1, 8, 9, 11))
```
Let's produce the joint plot as the main output which combines the loadings for the non-nominal variables with the category quantifications of the
nominal variables. For aesthetic reasons we re-scale the loading arrows by a factor of 0.08.
```{r, fig.height=7, fig.width=7}
plot(wen_prin, plot.type = "jointplot", expand = 0.08)
```
In case we are interested in the *object scores*, we can ask for an object plot:
```{r, fig.height=7, fig.width=7}
plot(wen_prin, "objplot", cex.scores = 0.6)
```
A biplot could also be produced that combines the loadings plot with the object plot.
Note that princals is a restricted version of homals: in princals, the category quantifications on the second dimension are
linearly related to the ones from dimension 1, whereas in homals they are unrestricted. In the multiple correspondence analysis vignette we call `homals()` on the same data. Whether to use princals or homals on datasets like these, depends on what we are mainly interest in: category quantifications of the nominal variables (homals) or loadings of the non-nominal variables (princals). This said, as shown in Sec. 8.3.3 in @Mair:2018b, using the concept of *copies* one can obtain the same results in princals as with a homals fit. Eventually, it's all the same (Gifi) monster.
Further technical details can be found in `vignette("gifi_theory")`. Elaborations on optimal scaling, further applied aspects, and examples are given in @Mair:2018b and in the excellent paper by @Linting+Meulman:2007.