--- title: "Ordinal and Mixed PCA" author: "Patrick Mair, Jan De Leeuw" output: rmarkdown::html_vignette bibliography: gifi.bib vignette: > %\VignetteIndexEntry{Ordinal and Mixed PCA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` This vignette shows applications of *principal component analysis* (PCA) on ordinal and mixed input data which in Gifi slang is called *princals*. To be more precise, princals is a combination of multiple correspondence analysis and PCA (with splinical and ordinal options). ## Ordinal PCA in a Nutshell We use a dataset from @Kennett+Salini:2012 on customer satisfaction that is included in the `Gifi` package. ```{r} library("Gifi") ABC6 <- ABC[,6:11] ## we use 6 items from the dataset head(ABC6) ``` We have a sample size of `r nrow(ABC6)`. Each item is scaled on a 5-point rating scale (1 ... very low, 5 ... very high). A more detailed description can be found in the corresponding `?ABC` help file. We are now going to fit an PCA with all items taken at an ordinal scale level: ```{r} fit_ordpca <- princals(ABC6, ndim = 2, levels = "ordinal") fit_ordpca ``` We obtain a loss value of `r round(fit_ordpca$f,3)`. The results are prepared in a way so that they are close to the classical PCA output involving eigenvalues, loadings, and variance accounted for (VAF): ```{r} summary(fit_ordpca) ``` We can plot the loadings, or combine them with the PC scores (which Gifi calls *object scores*) in a biplot. ```{r, fig.height=5, fig.width=5} plot(fit_ordpca, plot.type = "loadplot") ``` ```{r, fig.height=5, fig.width=5} plot(fit_ordpca, plot.type = "biplot", col.loadings = "coral3", col.scores = "lightgrey") abline(h = 0, v = 0, lty = 2) ``` Similar to standard PCA, one can create a scree plot (`plot.type = "screeplot"`) to assess the number of dimensions needed. If the user wants to have a closer insight into the optimal scaling transformations under the hood, transformation plots can be produced: ```{r, fig.height=7, fig.width=7} plot(fit_ordpca, plot.type = "transplot") ``` These plots nicely show the transformations of the original 1-5 scores into the optimally scaled *category quantifications*. Since in the `princals()` fit above we defined `levels = "ordinal"`, a monotone regression transformation is applied resulting in these step functions. The data frame with the quantifications on the first dimension can be extracted as follows: ```{r} head(fit_ordpca$scoremat) ``` This data frame can replace the original `AB6` object for subsequent analyses. ## Standard PCA with Gifi One can also mimick a standard PCA using `princals()` by setting the `levels` argument accordingly: ```{r} fit_metpca <- princals(ABC6, ndim = 2, levels = "metric") fit_metpca ``` The results are highly similar to the classical `prcomp(ABC6, scale = TRUE)` fit, and the same plots as above can be produced. Here we look again at the transformation plots, showing the linear transformation (simply a z-standardization) induced by the metric measurement level assumption. ```{r, fig.height=7, fig.width=7} plot(fit_metpca, plot.type = "transplot") ``` One can compare the loss value of `r round(fit_ordpca$f, 3)` from the ordinal solution with `r round(fit_metpca$f, 3)` from the metric solution and, in conjunction with the transformation plots, judge whether these Likert items can be treated as metric. ## Mixed PCA To demonstrate PCA with mixed input scale levels, we use a subset of the data based on the Wilson-Patterson conservatism scale from `MPsychoR` package. For illustration purposes we create age groups so that we have an ordinal variable in the dataset. ```{r} data("WilPat2") WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5) head(WilPat2) ``` The first 6 items are Wilson-Patterson items (1 = "approve", 0 = "disapprove", and 2 = "don't know") which we declare as "nominal". This is followed by `Country` which is also "nominal", and two self-reported liberalism/conservatism and left/right identification which enter as "metric". Finally we use an "ordinal" transformation for age. The corresponding `levelvec` object is then shipped to `princals()`. ```{r} levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal") wen_prin <- princals(WilPat2, levels = levelvec) ## fit 2D princals wen_prin summary(wen_prin) ``` Let us look at the transformation plots of some variables: ```{r, fig.height=7, fig.width=7} plot(wen_prin, plot.type = "transplot", var.subset = c(1, 8, 9, 11)) ``` Let's produce the joint plot as the main output which combines the loadings for the non-nominal variables with the category quantifications of the nominal variables. For aesthetic reasons we re-scale the loading arrows by a factor of 0.08. ```{r, fig.height=7, fig.width=7} plot(wen_prin, plot.type = "jointplot", expand = 0.08) ``` In case we are interested in the *object scores*, we can ask for an object plot: ```{r, fig.height=7, fig.width=7} plot(wen_prin, "objplot", cex.scores = 0.6) ``` A biplot could also be produced that combines the loadings plot with the object plot. Note that princals is a restricted version of homals: in princals, the category quantifications on the second dimension are linearly related to the ones from dimension 1, whereas in homals they are unrestricted. In the multiple correspondence analysis vignette we call `homals()` on the same data. Whether to use princals or homals on datasets like these, depends on what we are mainly interest in: category quantifications of the nominal variables (homals) or loadings of the non-nominal variables (princals). This said, as shown in Sec. 8.3.3 in @Mair:2018b, using the concept of *copies* one can obtain the same results in princals as with a homals fit. Eventually, it's all the same (Gifi) monster. Further technical details can be found in `vignette("gifi_theory")`. Elaborations on optimal scaling, further applied aspects, and examples are given in @Mair:2018b and in the excellent paper by @Linting+Meulman:2007.