---
title: "Introduction to CALMs"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
    mathjax: default
vignette: >
  %\VignetteIndexEntry{Introduction to CALMs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  eval = TRUE
)
```
## Introduction
One of the most fundamental analyses in the social sciences is the comparison of groups. However, this seemingly straightforward task is often complicated by two significant
challenges: ensuring group equivalence and assessing measurement invariance. If not adequately addressed, these challenges can lead to misleading conclusions
and undermine the validity of research findings. This application aims to make the process accessible by integrating
several advanced statistical techniques, including:
- Propensity score analysis (group equivalence checking and propensity score matching)
- Measurement invariance tests (full and partial)
- Structural invariance tests (full and partial)
## Launching CALMs
You can launch CALMs in one of two ways:
### 1. Web Access (No Installation Required)
Simply visit: [evaluent.shinyapps.io/CALMs](https://evaluent.shinyapps.io/CALMs)
No setup or installation is needed. The application runs directly in your browser.
### 2. Local Access (Installation Required)
#### A. Open R or RStudio
Make sure R or RStudio is installed on your system, then open it to begin.
#### B. Install the CALMs Package
Run the following in the **R Console**:
```{r, eval = FALSE}
install.packages("calms", dependencies = TRUE)
```
This installs the CALMs package from a local tarball file rather than from a CRAN repository, to maintain author anonymity during peer review.
#### C. Run the CALMs Package
After installing the CALMs package, run the application locally via R with `run_calms()`:
```{r, eval = FALSE}
calms::run_calms()
```
## Built-in Dataset
Users can run CALMs analyses using the dataset built into the application. This built-in dataset is a subset of data from Work
Orientations IV – ISSP 2015 (ISSP Research Group, 2017) and is included with permission from the ISSP Research Group.
The subset and modifications applied to the original dataset were generated using the following code:
```{r, eval = FALSE}
### Load necessary packages
library(foreign)
library(haven)

### Read in the dataset without value labels
dso <- read.spss("ZA6770_v2-1-0.sav",
                 use.value.labels = FALSE, max.value.labels = Inf,
                 to.data.frame = TRUE)
nrow(dso)
names(dso)

### Read in the dataset with value labels
dsoa <- read.spss("ZA6770_v2-1-0.sav",
                  use.value.labels = TRUE, max.value.labels = Inf,
                  to.data.frame = TRUE)
nrow(dsoa)
names(dsoa)

### Select only the needed columns:
### quality of job content (JC: v22-v24), quality of work environment (WE: v25-v27),
### and demographics (SEX, EMPREL, TYPORG2, DEGREE)
ds <- subset(dso, select = c(country, v22:v27, SEX, DEGREE, EMPREL, TYPORG2))
names(ds)

### Use the labeled versions of the categorical variables
ds[, c("country", "SEX", "DEGREE", "EMPREL", "TYPORG2")] <-
  dsoa[, c("country", "SEX", "DEGREE", "EMPREL", "TYPORG2")]

### Keep only the two groups (i.e., countries) of interest
### (country numerical codes in SPSS: UK = 826, US = 840)
table(ds$country)
ds <- subset(ds, country == "GB-Great Britain and/or United Kingdom" |
                 country == "US-United States")
ds$country <- factor(ds$country)
table(ds$country)
nrow(ds)

### Remove missing values
ds <- na.omit(ds)
nrow(ds)

### Check values
table(ds$SEX)
table(ds$DEGREE)
table(ds$EMPREL)
table(ds$TYPORG2)
table(ds$country)

### Collapse categories
levels(ds$EMPREL) <- c("Employee", "Self-employed", "Self-employed", NA)
levels(ds$DEGREE) <- c(rep("no univ", 5), rep("univ", 2))

### Remove rows made missing by the recoding above
ds <- na.omit(ds)
nrow(ds)

### Recode factors to 0/1 numeric dummies
levels(ds$SEX)
levels(ds$SEX) <- c(1, 0)      # Set "Male" to 1
levels(ds$EMPREL)
levels(ds$EMPREL) <- c(0, 1)   # Set "Employee" to 1
levels(ds$TYPORG2)
levels(ds$TYPORG2) <- c(0, 1)  # Set "Private employer" to 1
levels(ds$DEGREE)
levels(ds$DEGREE) <- c(0, 1)   # Set "univ" to 1
levels(ds$country)
levels(ds$country) <- c(1, 0)  # Set "US-United States" to 1

ds$SEX     <- as.numeric(ds$SEX) - 1
ds$EMPREL  <- as.numeric(ds$EMPREL) - 1
ds$TYPORG2 <- as.numeric(ds$TYPORG2) - 1
ds$DEGREE  <- as.numeric(ds$DEGREE) - 1
ds$country <- as.numeric(ds$country) - 1

nrow(ds)
names(ds)
write_sav(ds, "WosDemo.sav")
```
## Providing Your Own Data
Users can run CALMs analyses on their own datasets. To do so, they must upload two files simultaneously from the same directory:
1. A data file containing the dataset to be analyzed.
2. A corresponding meta file that provides information about the dataset.
The CALMs application supports data files in .csv, .dat, and .sav formats.
The meta file must be a .csv file whose name ends in "Meta" before the extension (e.g., My_Meta.csv, Meta.csv). The meta file must contain the columns *itemo*,
*item*, *type*, *scale*, *ds*, and *missing*.
- The column labeled *itemo* identifies the original variable names in the file containing the data to be analyzed.
- The column labeled *item* identifies corresponding new variable names that will be used when creating the cleaned dataset.
- The column labeled *type* identifies
whether the item is a scale item, covariate, or grouping variable.
- The column labeled *scale* identifies the name of the corresponding scale for scale items.
- The column labeled *ds* identifies the name of the uploaded dataset that will be cleaned and subsequently analyzed. Note that the user can upload a dataset with
any number of other variables in addition to those identified in the meta file.
- The column labeled *missing* identifies a numeric value that denotes how missing
values are coded.
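To make the format concrete, here is a sketch of how such a meta file could be assembled in R. All variable names, the values used in the *type* column ("item", "covariate", "group"), and the missing-value code are invented for illustration; the built-in sample meta file shown below is the authoritative reference for the expected values.

```{r, eval = FALSE}
# A hypothetical meta file describing two scale items, one covariate,
# and one grouping variable (all names and codes are invented)
meta <- data.frame(
  itemo   = c("v101", "v102", "gender", "nation"),   # names in the raw data file
  item    = c("SAT1", "SAT2", "Male", "USA"),        # names used after cleaning
  type    = c("item", "item", "covariate", "group"), # role of each variable
  scale   = c("SAT", "SAT", NA, NA),                 # scale name for scale items
  ds      = "MyData.csv",                            # name of the uploaded data file
  missing = -9                                       # numeric missing-value code
)

# The file name must end in "Meta" before the .csv extension
write.csv(meta, "My_Meta.csv", row.names = FALSE)
```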
A sample meta file is provided below that corresponds to a subset of the 2015 Work Orientations dataset (ISSP Research Group, 2017)
that is built into the CALMs application for demonstration purposes.
```{r, results="markup",echo=FALSE}
library(calms)
data("WosDemoMeta")
WosDemoMeta
```
## Features Overview
The CALMs Shiny application is organized with multiple tabs that each serve a specific purpose:
- Read Me – Displays the user manual for the CALMs application.
- View Me – Displays a demonstration video of the CALMs application.
- View Data – Displays the selected dataset.
- Downloads – Contains links to files that the user can download.
- Check Group Equivalency – Checks group equivalence on the selected covariates.
- Propensity Score Analysis Setup – Provides a default call to matchit that the user can override with a custom call.
- Propensity Score Analysis Results – Upon execution of the analysis, provides the results of the propensity score analysis.
- Measurement Invariance – Upon execution of the analysis, provides the results of measurement invariance tests for the selected grouping variable and items. The user may also override the default to use matched data for invariance tests.
- Metric Invariance – Upon execution of the analysis, provides the results of metric invariance tests for the selected grouping variable, items, and scale. The user may also override the default to use matched data for invariance tests and modify the alpha for model comparison decisions.
- Scalar Invariance – Upon execution of the analysis, provides the results of scalar invariance tests for the selected grouping variable, items, and scale. The user may also override the default to use matched data for invariance tests, modify the alpha for model comparison decisions, and select items whose loadings are freely estimated.
- Structural Invariance – Upon execution of the analysis, provides the results of structural invariance tests for the selected grouping variable and items. The user may also override the default to use matched data for invariance tests, select items whose loadings are freely estimated, select items whose intercepts are freely estimated, and select scales whose means are freely estimated.
## Example Workflow
This section walks through the CALMs application interface using screenshots for illustrative purposes. Specifically, we analyze data from
the 2015 Work Orientations Survey that includes responses from the United States (USA) and the United Kingdom (UK; ISSP Research Group, 2017).
The 2015 Work Orientations dataset is from an international project that began in 1984 and was collected across 37 countries
(ISSP Research Group, 2017).
The portion of the 2015 Work Orientations dataset used for the demonstration includes 1,477
responses from the USA and 1,793 responses from the UK. We specifically chose the stated two countries because full scalar invariance
was not supported in previous measurement invariance studies using the constructs quality of job context (JX), quality of job content (JC),
and quality of work environment (WE) in the measurement model using the 1989 Work Orientations dataset (Cheung & Lau, 2012; Cheung & Rensvold,
1999).
The 2015 Work Orientations dataset provided data for two of these previously utilized constructs, JC and WE (ISSP Research Group, 2017).
Each construct is measured by three items, scored on a five-point Likert-type scale ranging from 1 (strongly agree) to 5 (strongly disagree).
Figure 1 depicts the 2-factor measurement model used in the illustrative example. What follows is a recommended set of steps to comprehensively
analyze the latent means of JC and WE by country, where country is either USA or UK. Note that researchers may
choose to use the application in a different way than the example workflow and skip tests if that fits their research scenario.
**Figure 1.** *Measurement Model*
### Step 1: Load Data
Users can either use the built-in dataset by leaving **Use 2015 Work Orientations Survey Data** selected, or
upload their own data by deselecting this option.
To upload your own dataset and accompanying *Meta.csv file, follow the steps shown in the GIF below.
### Step 2: View Data
The labeling of the items in the original dataset (ISSP Research Group, 2017) was not intuitive for our illustrative example; hence, the original items were renamed as
previously described and as depicted in Figure 2.
**Figure 2.** *View Data Tab*
### Step 3: Check Group Equivalency
CALMs uses the MatchIt package in R (Ho et al., 2011) for propensity score analysis, including checking for group equivalency. The comparison groups for the demonstration with the 2015 Work Orientations Survey data are the USA and the UK.
Hence, USA was selected as the **Grouping Variable**.
All possible covariates were selected as **Covariates to Check**.
**Figure 3.** *Check Group Equivalency Tab*
The output in Figure 3 shows statistically (*p* < .05) and practically (Cramer’s V > .10) significant
differences between the countries. Specifically, employment type (SelfEmp) was found to be statistically significant, while organization type
(PrivateOrg) was found to be both statistically and practically significant.
### Step 4: Propensity Score Analysis Setup
CALMs offers users the flexibility to use either a default call to MatchIt or to define a custom call.
A link to the MatchIt documentation is included within the application.
To customize the call, deselect **Use Default call to matchit** and edit the arguments following *data=dpsm* in the provided code box.
Two propensity score matching (PSM) methods, nearest neighbor and genetic matching, are the most common.
- Nearest neighbor matching requires the input of all demographic variables and
has been recommended as the most straightforward PSM method (Caliendo & Kopeinig, 2008; Keiffer & Lane, 2016). Although nearest neighbor matching is computationally
efficient with large datasets, it pairs each treated unit with its nearest control without optimizing the overall set of matches. Compared to more stringent
or robust matching methods, this may leave covariates less well balanced and the groups less equivalent.
- Genetic matching is recommended when the PSM output must yield highly equivalent groups, as it effectively achieves good matching balance even with highly
complex data (Randolph et al., 2014). Genetic matching requires entering into the algorithm all demographic variables (e.g., gender, age group, race/ethnicity, and educational
level) that show statistically (e.g., *p* ≤ .05) or practically (e.g., Cramer’s V ≥ .10) significant differences.
By default, CALMs uses the nearest neighbor method.
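As a sketch of what such calls might look like, the following uses the demonstration dataset's covariate names, with `Univ` as a hypothetical name for the degree covariate and `dpsm` as the data object named in the application's code box:

```{r, eval = FALSE}
library(MatchIt)

# Nearest neighbor matching (the CALMs default method)
m_out <- matchit(USA ~ Male + Univ + SelfEmp + PrivateOrg,
                 data = dpsm, method = "nearest")

# Genetic matching, a more computationally intensive alternative
# (requires the Matching and rgenoud packages)
m_gen <- matchit(USA ~ Male + Univ + SelfEmp + PrivateOrg,
                 data = dpsm, method = "genetic")

# Balance statistics before and after matching
summary(m_out)
```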
**Figure 4.** *Propensity Score Analysis Setup Tab*
### Step 5: Propensity Score Analysis
Figure 5 presents the result of the propensity score analysis using the default call previously described.
**Figure 5.** *Propensity Score Analysis Results Tab*
The nearest neighbor method yielded two equivalent groups of 769 responses per country. Although a statistically significant
difference in gender (Male) by country remained, we elected to use the results of the nearest neighbor method because the difference was not practically
significant (all Cramer’s V < .10).
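The balance check described above can be sketched outside the application as follows, assuming a fitted `matchit` object named `m_out` (hypothetical) and the demonstration variable names:

```{r, eval = FALSE}
library(MatchIt)

# Extract the matched dataset (769 responses per country in the example)
matched <- match.data(m_out)

# Cramer's V from a chi-square test of a covariate by group
cramers_v <- function(x, g) {
  tab <- table(x, g)
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))
  unname(sqrt(chi$statistic / (sum(tab) * (min(dim(tab)) - 1))))
}

# Practical significance is flagged when V >= .10
cramers_v(matched$Male, matched$USA)
```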
### Step 6: Measurement Invariance Tests
When conducting measurement invariance tests, the application defaults to using the matched dataset.
To change this, users can deselect **Use matched data for invariance tests**.
Users can also select the **Grouping Variable** and **Items to Analyze**. By default, the application
includes all items identified in the *Meta.csv file as type *item*.
Measurement invariance tests include configural, metric, and scalar. Omnibus and
scale-level tests are provided for both metric and scalar invariance tests. Commonly recommended fit index criteria include: (a) comparative fit index (CFI) ≥ .95;
(b) standardized root-mean-square residual (SRMR) ≤ .05; and (c)
root-mean-square error of approximation (RMSEA) between .05 and .08 (Kline, 2016; Schumacker & Lomax, 2016).
Statistically significant model
noninvariance is determined based on the p-value of the χ² difference test at *p* ≤ .05 (Cheung & Rensvold, 1999; van de Schoot et al., 2012).
Guidelines have been provided to evaluate the ΔCFI for practical model (non)invariance, namely: (a) practical model invariance for ΔCFI ≥ -.01;
(b) potential practical model noninvariance for ΔCFI between -.01 and -.02; and (c) practical model noninvariance for ΔCFI ≤ -.02 (Cheung & Rensvold, 2002).
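CALMs runs these comparisons internally; assuming lavaan as the underlying SEM engine (an assumption, since the vignette does not name the package), the configural/metric/scalar sequence for the demonstration model could be sketched as follows, with `dat` standing in for the (matched) dataset:

```{r, eval = FALSE}
library(lavaan)

# Two-factor measurement model from Figure 1
model <- '
  JC =~ JC1 + JC2 + JC3   # quality of job content
  WE =~ WE1 + WE2 + WE3   # quality of work environment
'

# Fit the nested sequence of invariance models by country
fit_config <- cfa(model, data = dat, group = "USA")
fit_metric <- cfa(model, data = dat, group = "USA",
                  group.equal = "loadings")
fit_scalar <- cfa(model, data = dat, group = "USA",
                  group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between adjacent models
anova(fit_config, fit_metric, fit_scalar)

# Fit indices for evaluating the configural model
fitMeasures(fit_config, c("cfi", "srmr", "rmsea"))
```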
**Figure 6.** *Measurement Invariance Tab*
The results of the measurement invariance tests (see Figure 6) indicated that the configural model fit well (SRMR = .031, CFI = .956).
The metric model was compared to the configural model and met the criteria for both statistical and practical
invariance (Δχ²[4] = 8.484, *p* = .075; ΔCFI = -.004).
However, the data did not reach the
thresholds for scalar invariance (Δχ²[4] = 55.772, *p* < .001; ΔCFI = -.042). Both the JC (Δχ²[2] = 24.996, *p* < .001; ΔCFI = -.019) and
WE (Δχ²[2] = 30.796, *p* < .001; ΔCFI = -.023) scales demonstrated evidence of scalar non-invariance.
### Step 7: Metric Invariance Tests
In our illustrative example, it was not necessary to conduct follow-up tests for metric invariance as neither the omnibus test
nor the scale-level tests for JC and WE indicated evidence of metric non-invariance.
However, for demonstration purposes, we conducted metric invariance tests specifically on the JC scale.
Note that the application uses the p-value of the χ² difference test when determining invariant subsets of items (Cheung & Rensvold, 1999).
The default significance level (alpha) is set to .05, but users may adjust this threshold as needed. In this example, we set the alpha to .01 (see Figure 7).
**Figure 7.** *Metric Invariance Tab*
The factor ratio test (see Figure 7) confirmed that all JC items were metric invariant. Similarly, all WE items were metric invariant (tests not shown).
### Step 8: Scalar Invariance Tests
Because full scalar invariance was not demonstrated, we conducted partial measurement invariance testing on each scale. Had we found the factor loadings
non-invariant at the metric invariance step, we could have allowed a set of loadings to be freely estimated to permit a partial scalar
invariance assessment.
**Figure 8.** *Scalar Invariance Tab*
The factor ratio test (see Figure 8) identified JC2 and JC3 as an invariant subset of JC items (*p* > .01). Similarly,
WE1 and WE2 were identified (tests not shown) as an invariant subset of WE items (*p* > .01).
Based on the results of the scalar invariance assessment, the intercepts for WE3 and JC1
should be freely estimated to account for the partial scalar invariance.
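In lavaan syntax (again assuming lavaan as the engine, with `dat` as the matched dataset), freeing those two intercepts corresponds to the `group.partial` argument:

```{r, eval = FALSE}
library(lavaan)

model <- '
  JC =~ JC1 + JC2 + JC3
  WE =~ WE1 + WE2 + WE3
'

# Partial scalar model: loadings and intercepts constrained equal across
# countries, except the intercepts of JC1 and WE3, which are freed
fit_partial <- cfa(model, data = dat, group = "USA",
                   group.equal   = c("loadings", "intercepts"),
                   group.partial = c("JC1 ~ 1", "WE3 ~ 1"))

# With intercepts constrained, the latent means are fixed to zero in the
# first group and freely estimated in the second
summary(fit_partial, fit.measures = TRUE)
```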
### Step 9: Structural Invariance Tests
Building on the results of the scalar invariance testing, we allowed the intercepts for WE3 and JC1 to be freely estimated. Structural invariance is supported
when the comparison between an unconstrained and a constrained structural model yields a non-significant χ² difference (*p* > .05) and a non-significant CFI difference
(Cheung & Rensvold, 1999; Cheung & Rensvold, 2002; Kline, 2016; Schumacker & Lomax, 2016).
**Figure 9.** *Structural Invariance Tab*
The results indicate that the set of scales met the criteria for structural invariance. Although the structural invariance model was statistically significantly
different from the scalar model (Δχ²[2] = 6.374, *p* = .041), the difference was not practically significant (ΔCFI = -.004; see Figure 9).
However, considering only JC, a statistically significant latent mean difference was observed (-.077, *p* = .013). Given that the
latent mean for the USA was constrained to zero, the negative estimate indicates that the latent mean for the UK is lower in JC. There was no significant
latent mean difference for WE across the two countries (-.047, *p* = .326).
## References
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. *Journal of Economic Surveys, 22*(1), 31–72. https://doi.org/10.1111/j.1467-6419.2007.00527.x
Cheung, G. W., & Lau, R. S. (2012). A direct comparison approach for testing measurement invariance. *Organizational Research Methods, 15*(2), 167–198. https://doi.org/10.1177/1094428111421987
Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance across groups: A reconceptualization and proposed new method. *Journal of Management, 25*(1), 1–27. https://doi.org/10.1177/014920639902500101
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. *Structural Equation Modeling, 9*(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5
Ho, D., Imai, K., King, G., & Stuart, E. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. *Journal of Statistical Software, 42*(8), 1–28. https://doi.org/10.18637/jss.v042.i08
ISSP Research Group (2017). *International social survey programme: Work orientations IV – ISSP 2015*. GESIS data archive, Cologne. ZA6770 data file version 2.1.0, https://doi.org/10.4232/1.12848
Keiffer, G. L., & Lane, F. C. (2016). Propensity score analysis: An alternative statistical approach for HRD researchers. *European Journal of Training and Development, 40*(8/9), 660–675. https://doi.org/10.1108/EJTD-06-2015-0046
Kline, R. B. (2016). *Principles and practice of structural equation modeling* (4th ed.). New York: The Guilford Press.
Randolph, J. J., Falbe, K., Manuel, A., & Balloun, J. (2014). A step-by-step guide to propensity score matching in R. *Practical Assessment, Research & Evaluation, 19*, 1–6. https://doi.org/10.7275/n3pv-tx27
Schumacker, R. E., & Lomax, R. G. (2016). *A beginner’s guide to structural equation modeling* (4th ed.). New York: Routledge.
van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. *European Journal of Developmental Psychology, 9*(4), 486–492. https://doi.org/10.1080/17405629.2012.686740