---
title: "Multiple Correspondence Analysis"
author: "Patrick Mair, Jan De Leeuw"
output: rmarkdown::html_vignette
bibliography: gifi.bib
vignette: >
  %\VignetteIndexEntry{Multiple Correspondence Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

This vignette shows an application of *multiple correspondence analysis* (MCA) to nominal and mixed data, which in Gifi slang is called *homals*. To be more precise, homals is MCA with splines and ordinal restrictions.

## Multiple Correspondence Analysis in a Nutshell

We start with 6 items from the Wilson-Patterson scale (gay marriage, sexual freedom, gay adoption, gender quotas, affirmative action, and legalized marijuana), each with response categories 0 = "disapprove", 1 = "approve", and 2 = "don't know", together with the country of the participant (India, Hungary). The full version of this dataset is available in the `MPsychoR` package [@Mair:2018b].

```{r}
library("Gifi")
data("WilPat2")
WP6 <- WilPat2[, 1:7]   # 6 items plus country
head(WP6)
```

The sample consists of `r nrow(WP6)` participants. To fit a two-dimensional MCA we can simply say

```{r}
fit_homwp <- homals(WP6)
fit_homwp
```

By default, `ndim = 2` and `levels = "nominal"`. The main output of an MCA is the symmetric map (aka joint plot), which shows the category quantifications in a 2D space.

```{r, fig.width=5, fig.height=5}
plot(fit_homwp, plot.type = "jointplot")
```

Each item category gets a score in the 2D space. Categories belonging to the same item are shown in the same color. If we are mostly interested in how the 2 countries are related to each other and to the item responses, we can color the plot accordingly.

```{r, fig.width=5, fig.height=5}
# gray for the 6 items, coral for the country variable
colvec <- c(rep("gray", 6), "coral4")
plot(fit_homwp, plot.type = "jointplot", col.points = colvec)
```

Other plotting options are scree plots (`plot.type = "screeplot"`), biplots (`plot.type = "biplot"`), and the object plot (`plot.type = "objplot"`) for the object scores.

To get a more nuanced insight into the optimal scaling transformations of each variable, we can produce a transformation plot, which shows the category transformations on both dimensions (D1 in "black", D2 in "red").

```{r, fig.height=7, fig.width=7}
plot(fit_homwp, plot.type = "transplot")
```

As opposed to princals, these transformations are not linearly restricted. Gifi calls this *multiple nominal*.

## Mixed Input Data

In the Gifi system, MCA and PCA (and everything in-between) are essentially the same thing. In the princals vignette we demonstrate how to fit a mixed PCA. We now fit a mixed MCA on the same data. The only difference between these two Gifi incarnations is that transformations in princals are, by default, linearly restricted, whereas in homals they are not. In princals this restriction can be relaxed through the concept of *copies*, so that both methods give the same results [see @Mair:2018b, Sec. 8.3.3].

To demonstrate MCA with mixed input scale levels, we use the same data as above but add two self-reported identification scales (liberalism/conservatism and left/right), which enter as "metric", as well as gender ("nominal") and age ("ordinal").

```{r}
data("WilPat2")
# bin age into 5 ordered categories
WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)
head(WilPat2)
```

```{r}
levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")
wen_hom <- homals(WilPat2, levels = levelvec)
wen_hom
```

This result is different from the corresponding `princals(WilPat2, levels = levelvec)` fit. The usual plots can be produced, but they become cluttered because each category gets its own score, even for variables like `LibCons`, `LeftRight`, and `Age`.
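
The difference between the two fits mentioned above can be inspected directly. The chunk below is a minimal sketch, not part of the original analysis: it repeats the recoding in a copy of the data (the names `wp_mixed`, `hom_mixed`, and `prin_mixed` are introduced here for illustration only), fits both models with the same `levelvec`, and prints the final loss values side by side. It assumes that both fit objects store the loss in a component called `f`. Because the two functions use different default restrictions, the numbers are meant as a quick check that the solutions differ rather than as a formal model comparison.

```{r}
# Sketch only: reload the raw data and redo the recoding from above so that
# this chunk does not depend on objects created in earlier chunks.
library("Gifi")
data("WilPat2")
wp_mixed <- WilPat2
wp_mixed$Age <- cut(wp_mixed$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)
levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")

hom_mixed  <- homals(wp_mixed, levels = levelvec)    # homals defaults
prin_mixed <- princals(wp_mixed, levels = levelvec)  # princals defaults

# Assumption: both fit objects keep the final loss value in component `f`.
c(homals = hom_mixed$f, princals = prin_mixed$f)
```

Printing `hom_mixed` and `prin_mixed` shows the full summaries if more detail than the loss is needed.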