---
title: "Ordinal and Mixed PCA"
author: "Patrick Mair, Jan De Leeuw"
output: rmarkdown::html_vignette
bibliography: gifi.bib
vignette: >
  %\VignetteIndexEntry{Ordinal and Mixed PCA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
  
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

This vignette shows applications of *principal component analysis* (PCA) on ordinal and mixed input data which in Gifi slang is called *princals*. To be more precise, princals is a combination of multiple correspondence analysis and PCA (with splinical and ordinal options).


## Ordinal PCA in a Nutshell

We use a dataset from @Kennett+Salini:2012 on customer satisfaction that is included in the `Gifi` package. 

```{r}
library("Gifi")
ABC6 <- ABC[,6:11]            ## we use 6 items from the dataset
head(ABC6)
``` 

We have a sample size of `r nrow(ABC6)`. Each item is scaled on a 5-point rating scale (1 ... very low, 5 ... very high). A more detailed description can be found in the corresponding `?ABC` help file. We are now going to fit an PCA with all items taken at an ordinal scale level:

```{r}
fit_ordpca <- princals(ABC6, ndim = 2, levels = "ordinal")  
fit_ordpca
```

We obtain a loss value of `r round(fit_ordpca$f,3)`. The results are prepared in a way so that they are close to the classical PCA output involving eigenvalues, loadings, and variance accounted for (VAF):

```{r}
summary(fit_ordpca)
```

We can plot the loadings, or combine them with the PC scores (which Gifi calls *object scores*) in a biplot. 

```{r, fig.height=5, fig.width=5}
plot(fit_ordpca, plot.type = "loadplot")
```

```{r, fig.height=5, fig.width=5}
plot(fit_ordpca, plot.type = "biplot", col.loadings = "coral3", col.scores = "lightgrey")
abline(h = 0, v = 0, lty = 2)
```

Similar to standard PCA, one can create a scree plot (`plot.type = "screeplot"`) to assess the number of dimensions needed. 

If the user wants to have a closer insight into the optimal scaling transformations under the hood, transformation plots can be produced: 

```{r, fig.height=7, fig.width=7}
plot(fit_ordpca, plot.type = "transplot")
```

These plots nicely show the transformations of the original 1-5 scores into the optimally scaled *category quantifications*. Since in the `princals()` fit above we defined `levels = "ordinal"`, a monotone regression transformation is applied resulting in these step functions. The data frame with the quantifications on the first dimension can be extracted as follows:

```{r}
head(fit_ordpca$scoremat)
```

This data frame can replace the original `AB6` object for subsequent analyses.  


## Standard PCA with Gifi 

One can also mimick a standard PCA using `princals()` by setting the `levels` argument accordingly: 

```{r}
fit_metpca <- princals(ABC6, ndim = 2, levels = "metric")
fit_metpca
```

The results are highly similar to the classical `prcomp(ABC6, scale = TRUE)` fit, and the same plots as above can be produced. Here we look again at the transformation plots, showing the linear transformation (simply a z-standardization) induced by the metric measurement level assumption. 

```{r, fig.height=7, fig.width=7}
plot(fit_metpca, plot.type = "transplot")
```

One can compare the loss value of `r round(fit_ordpca$f, 3)` from the ordinal solution with `r round(fit_metpca$f, 3)` from the metric solution and, in conjunction with the transformation plots, judge whether these Likert items can be treated as metric. 

## Mixed PCA 

To demonstrate PCA with mixed input scale levels, we use a subset of the data based on the Wilson-Patterson conservatism scale from `MPsychoR` package. For illustration purposes we create age groups so that we have an ordinal variable in the dataset.

```{r}
data("WilPat2")
WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)      
head(WilPat2)
```
The first 6 items are Wilson-Patterson items (1 = "approve", 0 = "disapprove", and 2 = "don't know") which we declare as "nominal". This is followed by `Country` which is also "nominal", and two self-reported liberalism/conservatism and left/right identification which enter as "metric". Finally we use an "ordinal" transformation for age. The corresponding `levelvec` object is then shipped to `princals()`.

```{r}
levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")    
wen_prin <- princals(WilPat2, levels = levelvec)                                          ## fit 2D princals
wen_prin
summary(wen_prin)
```

Let us look at the transformation plots of some variables: 

```{r, fig.height=7, fig.width=7}
plot(wen_prin, plot.type = "transplot", var.subset = c(1, 8, 9, 11))
```

Let's produce the joint plot as the main output which combines the loadings for the non-nominal variables with the category quantifications of the 
nominal variables. For aesthetic reasons we re-scale the loading arrows by a factor of 0.08.

```{r, fig.height=7, fig.width=7}
plot(wen_prin, plot.type = "jointplot", expand = 0.08)                 
```

In case we are interested in the *object scores*, we can ask for an object plot: 

```{r, fig.height=7, fig.width=7}
plot(wen_prin, "objplot", cex.scores = 0.6)                 
```

A biplot could also be produced that combines the loadings plot with the object plot. 

Note that princals is a restricted version of homals: in princals, the category quantifications on the second dimension are
linearly related to the ones from dimension 1, whereas in homals they are unrestricted. In the multiple correspondence analysis vignette we call `homals()` on the same data. Whether to use princals or homals on datasets like these, depends on what we are mainly interest in: category quantifications of the nominal variables (homals) or loadings of the non-nominal variables (princals). This said, as shown in Sec. 8.3.3 in @Mair:2018b, using the concept of *copies* one can obtain the same results in princals as with a homals fit. Eventually, it's all the same (Gifi) monster. 


Further technical details can be found in `vignette("gifi_theory")`. Elaborations on optimal scaling, further applied aspects, and examples are given in @Mair:2018b and in the excellent paper by @Linting+Meulman:2007.

<br>
<br>