---
title: "Introduction to CALMs"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
    mathjax: default
vignette: >
  %\VignetteIndexEntry{Introduction to CALMs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  eval = TRUE
)
```
## Introduction
One of the most fundamental analyses in the social sciences is the comparison of groups. However, this seemingly straightforward task is often complicated by two significant
challenges: ensuring group equivalence and assessing measurement invariance. If not adequately addressed, these challenges can lead to misleading conclusions
and undermine the validity of research findings. This application aims to make the process accessible by integrating
several advanced statistical techniques, including:
- Propensity score analysis (group equivalence checking and propensity score matching)
- Measurement invariance tests (full and partial)
- Structural invariance tests (full and partial)
## Launching CALMs
You can launch CALMs in one of two ways:
### 1. Web Access (No Installation Required)
Simply visit: [evaluent.shinyapps.io/CALMs](https://evaluent.shinyapps.io/CALMs)
No setup or installation is needed. The application runs directly in your browser.
### 2. Local Access (Installation Required)
#### A. Open R or RStudio
Make sure R or RStudio is installed on your system, then open it to begin.
#### B. Install the CALMs Package
Run the following in the **R Console**:
```{r, eval = FALSE}
install.packages("calms", dependencies = TRUE)
```
This installs the CALMs package from a local tarball file rather than from a CRAN repository, to maintain author anonymity during peer review.
#### C. Run the CALMs Package
After installing the CALMs package, run the application locally via R with `run_calms()`:
```{r, eval = FALSE}
calms::run_calms()
```
## Built-in Dataset
Users can run CALMs analyses using the dataset built into the application. This built-in dataset is a subset of data from Work
Orientations IV – ISSP 2015 (ISSP Research Group, 2017) and is included with permission from the ISSP Research Group.
The subset and modifications applied to the original dataset were generated using the following code:
```{r, eval = FALSE}
### Load necessary packages
library(foreign)
library(haven)

### Read in the dataset without value labels
dso <- read.spss("ZA6770_v2-1-0.sav",
                 use.value.labels = FALSE, max.value.labels = Inf,
                 to.data.frame = TRUE)
nrow(dso)
names(dso)

### Read in the dataset with value labels
dsoa <- read.spss("ZA6770_v2-1-0.sav",
                  use.value.labels = TRUE, max.value.labels = Inf,
                  to.data.frame = TRUE)
nrow(dsoa)
names(dsoa)

### Select only the needed columns:
### quality of job content (JC: v22-v24), quality of work environment (WE: v25-v27),
### and demographics (SEX, EMPREL, TYPORG2, DEGREE)
ds <- subset(dso, select = c(country, v22:v27, SEX, DEGREE, EMPREL, TYPORG2))
names(ds)

### Use the labeled versions of the categorical variables
ds[, c("country", "SEX", "DEGREE", "EMPREL", "TYPORG2")] <-
  dsoa[, c("country", "SEX", "DEGREE", "EMPREL", "TYPORG2")]

### Keep only the two groups (i.e., countries) of interest
### (country numerical codes in SPSS: UK = 826, US = 840)
table(ds$country)
ds <- subset(ds, country == "GB-Great Britain and/or United Kingdom" |
                 country == "US-United States")
ds$country <- factor(ds$country)
table(ds$country)
nrow(ds)

### Remove missing values
ds <- na.omit(ds)
nrow(ds)

### Check values
table(ds$SEX)
table(ds$DEGREE)
table(ds$EMPREL)
table(ds$TYPORG2)
table(ds$country)

### Collapse categories
levels(ds$EMPREL) <- c("Employee", "Self-employed", "Self-employed", NA)
levels(ds$DEGREE) <- c(rep("no univ", 5), rep("univ", 2))

### Remove rows made missing by the recoding above
ds <- na.omit(ds)
nrow(ds)

### Recode factors to 0/1 numeric dummies
levels(ds$SEX)
levels(ds$SEX) <- c(1, 0)      # Set "Male" to 1
levels(ds$EMPREL)
levels(ds$EMPREL) <- c(0, 1)   # Set "Employee" to 1
levels(ds$TYPORG2)
levels(ds$TYPORG2) <- c(0, 1)  # Set "Private employer" to 1
levels(ds$DEGREE)
levels(ds$DEGREE) <- c(0, 1)   # Set "univ" to 1
levels(ds$country)
levels(ds$country) <- c(1, 0)  # Set "US-United States" to 1

ds$SEX     <- as.numeric(ds$SEX) - 1
ds$EMPREL  <- as.numeric(ds$EMPREL) - 1
ds$TYPORG2 <- as.numeric(ds$TYPORG2) - 1
ds$DEGREE  <- as.numeric(ds$DEGREE) - 1
ds$country <- as.numeric(ds$country) - 1

nrow(ds)
names(ds)
write_sav(ds, "WosDemo.sav")
```
## Providing Your Own Data
Users can run CALMs analyses on their own datasets. To do so, they must upload two files simultaneously from the same directory:
1. A data file containing the dataset to be analyzed.
2. A corresponding meta file that provides information about the dataset.
The CALMs application supports data files in .csv, .dat, and .sav formats.
The meta file must be a .csv file whose name ends in "Meta" before the extension (e.g., My_Meta.csv, Meta.csv). The meta file must contain the columns *itemo*,
*item*, *type*, *scale*, *ds*, and *missing*.
- The column labeled *itemo* identifies the original variable names in the file containing the data to be analyzed.
- The column labeled *item* identifies corresponding new variable names that will be used when creating the cleaned dataset.
- The column labeled *type* identifies
whether the item is a scale item, covariate, or grouping variable.
- The column labeled *scale* identifies the name of the corresponding scale for scale items.
- The column labeled *ds* identifies the name of the uploaded dataset that will be cleaned and subsequently analyzed. Note that the user can upload a dataset with
any number of other variables in addition to those identified in the meta file.
- The column labeled *missing* identifies a numeric value that denotes how missing
values are coded.
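To make the format concrete, here is a sketch of how such a meta file could be assembled in R. All variable names, the values used in the *type* column ("item", "covariate", "group"), and the missing-value code are invented for illustration; the built-in sample meta file shown below is the authoritative reference for the expected values.

```{r, eval = FALSE}
# A hypothetical meta file describing two scale items, one covariate,
# and one grouping variable (all names and codes are invented)
meta <- data.frame(
  itemo   = c("v101", "v102", "gender", "nation"),   # names in the raw data file
  item    = c("SAT1", "SAT2", "Male", "USA"),        # names used after cleaning
  type    = c("item", "item", "covariate", "group"), # role of each variable
  scale   = c("SAT", "SAT", NA, NA),                 # scale name for scale items
  ds      = "MyData.csv",                            # name of the uploaded data file
  missing = -9                                       # numeric missing-value code
)

# The file name must end in "Meta" before the .csv extension
write.csv(meta, "My_Meta.csv", row.names = FALSE)
```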
A sample meta file is provided below that corresponds to a subset of the 2015 Work Orientations dataset (ISSP Research Group, 2017)
that is built into the CALMs application for demonstration purposes.
```{r, results="markup",echo=FALSE}
library(calms)
data("WosDemoMeta")
WosDemoMeta
```
## Features Overview
The CALMs Shiny application is organized with multiple tabs that each serve a specific purpose:
- Read Me – Displays the user manual for the CALMs application.
- View Me – Displays a demonstration video of the CALMs application.
- View Data – Displays the selected dataset.
- Downloads – Contains links to files that the user can download.
- Check Group Equivalency – Checks group equivalence on the selected covariates.
- Propensity Score Analysis Setup – Provides a default call to matchit that the user can override with a custom call.
- Propensity Score Analysis Results – Upon execution of the analysis, provides the results of the propensity score analysis.
- Measurement Invariance – Upon execution of the analysis, provides the results of measurement invariance tests for the selected grouping variable and items. The user may also override the default to use matched data for invariance tests.
- Metric Invariance – Upon execution of the analysis, provides the results of metric invariance tests for the selected grouping variable, items, and scale. The user may also override the default to use matched data for invariance tests and modify the alpha for model comparison decisions.
- Scalar Invariance – Upon execution of the analysis, provides the results of scalar invariance tests for the selected grouping variable, items, and scale. The user may also override the default to use matched data for invariance tests, modify the alpha for model comparison decisions, and select items whose loadings are freely estimated.
- Structural Invariance – Upon execution of the analysis, provides the results of structural invariance tests for the selected grouping variable and items. The user may also override the default to use matched data for invariance tests, select items whose loadings are freely estimated, select items whose intercepts are freely estimated, and select scales whose means are freely estimated.
## Example Workflow
This section walks through the CALMs application interface using screenshots for illustrative purposes. Specifically, we analyze data from
the 2015 Work Orientations Survey that includes responses from the United States (USA) and the United Kingdom (UK; ISSP Research Group, 2017).
The 2015 Work Orientations dataset is from an international project that began in 1984 and was collected across 37 countries
(ISSP Research Group, 2017).
The portion of the 2015 Work Orientations dataset used for the demonstration includes 1,477
responses from the USA and 1,793 responses from the UK. We specifically chose the stated two countries because full scalar invariance
was not supported in previous measurement invariance studies using the constructs quality of job context (JX), quality of job content (JC),
and quality of work environment (WE) in the measurement model using the 1989 Work Orientations dataset (Cheung & Lau, 2012; Cheung & Rensvold,
1999).
The 2015 Work Orientations dataset provided data for two of these previously utilized constructs, JC and WE (ISSP Research Group, 2017).
Each construct is measured by three items, scored on a five-point Likert-type scale ranging from 1 (strongly agree) to 5 (strongly disagree).
Figure 1 depicts the 2-factor measurement model used in the illustrative example. What follows is a recommended set of steps to comprehensively
analyze the latent means of JC and WE by country, where country is either USA or UK. Note that researchers may
choose to use the application in a different way than the example workflow and skip tests if that fits their research scenario.
**Figure 1.** *Measurement Model*
### Step 1: Load Data
Users can either use the built-in dataset by leaving **Use 2015 Work Orientations Survey Data** selected, or
upload their own data by deselecting this option.
To upload your own dataset and accompanying *Meta.csv file, follow the steps shown in the GIF below.
### Step 2: View Data
The labeling of the items in the original dataset (ISSP Research Group, 2017) was not intuitive for our illustrative example; hence, the original items were renamed as
previously described and as depicted in Figure 2.
**Figure 2.** *View Data Tab*
### Step 3: Check Group Equivalency
CALMs uses the MatchIt package in R (Ho et al., 2011) for propensity score analysis, including checking for group equivalency. The comparison groups for the demonstration with the 2015 Work Orientations Survey data are the USA and the UK.
Hence, USA was selected as the **Grouping Variable**.
All possible covariates were selected as **Covariates to Check**.
**Figure 3.** *Check Group Equivalency Tab*
The output in Figure 3 shows statistically (*p* < .05) and practically (Cramer’s V > .10) significant
differences between the countries. Specifically, employment type (SelfEmp) was found to be statistically significant, while organization type
(PrivateOrg) was found to be both statistically and practically significant.
### Step 4: Propensity Score Analysis Setup
CALMs offers users the flexibility to use either a default call to MatchIt or to define a custom call.
A link to the MatchIt documentation is included within the application.
To customize the call, deselect **Use Default call to matchit** and edit the arguments following *data=dpsm* in the provided code box.
Two propensity score matching (PSM) methods, nearest neighbor and genetic matching, are the most common.
- Nearest neighbor matching requires the input of all demographic variables and
has been recommended as the most straightforward PSM method (Caliendo & Kopeinig, 2008; Keiffer & Lane, 2016). Although nearest neighbor matching is computationally
efficient with large datasets, it pairs each treated unit with its nearest control without optimizing the overall set of matches. Compared to more stringent
or robust matching methods, this may leave covariates less well balanced and the groups less equivalent.
- Genetic matching is recommended when the PSM output must yield highly equivalent groups, as it effectively achieves good matching balance even with highly
complex data (Randolph et al., 2014). Genetic matching requires entering into the algorithm all demographic variables (e.g., gender, age group, race/ethnicity, and educational
level) that show statistically (e.g., *p* ≤ .05) or practically (e.g., Cramer’s V ≥ .10) significant differences.
By default, CALMs uses the nearest neighbor method.
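As a sketch of what such calls might look like, the following uses the demonstration dataset's covariate names, with `Univ` as a hypothetical name for the degree covariate and `dpsm` as the data object named in the application's code box:

```{r, eval = FALSE}
library(MatchIt)

# Nearest neighbor matching (the CALMs default method)
m_out <- matchit(USA ~ Male + Univ + SelfEmp + PrivateOrg,
                 data = dpsm, method = "nearest")

# Genetic matching, a more computationally intensive alternative
# (requires the Matching and rgenoud packages)
m_gen <- matchit(USA ~ Male + Univ + SelfEmp + PrivateOrg,
                 data = dpsm, method = "genetic")

# Balance statistics before and after matching
summary(m_out)
```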
**Figure 4.** *Propensity Score Analysis Setup Tab*
### Step 5: Propensity Score Analysis
Figure 5 presents the result of the propensity score analysis using the default call previously described.
**Figure 5.** *Propensity Score Analysis Results Tab*
The nearest neighbor method yielded two equivalent groups of 769 responses per country. Although a statistically significant
difference in gender (Male) by country remained, we elected to use the results of the nearest neighbor method because the difference was not practically
significant (all Cramer’s V < .10).
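The balance check described above can be sketched outside the application as follows, assuming a fitted `matchit` object named `m_out` (hypothetical) and the demonstration variable names:

```{r, eval = FALSE}
library(MatchIt)

# Extract the matched dataset (769 responses per country in the example)
matched <- match.data(m_out)

# Cramer's V from a chi-square test of a covariate by group
cramers_v <- function(x, g) {
  tab <- table(x, g)
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))
  unname(sqrt(chi$statistic / (sum(tab) * (min(dim(tab)) - 1))))
}

# Practical significance is flagged when V >= .10
cramers_v(matched$Male, matched$USA)
```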
### Step 6: Measurement Invariance Tests
When conducting measurement invariance tests, the application defaults to using the matched dataset.
To change this, users can deselect **Use matched data for invariance tests**.
Users can also select the **Grouping Variable** and **Items to Analyze**. By default, the application
includes all items identified in the *Meta.csv file as type *item*.
Measurement invariance tests include configural, metric, and scalar. Omnibus and
scale-level tests are provided for both metric and scalar invariance tests. Commonly recommended fit index criteria include: (a) comparative fit index (CFI) ≥ .95;
(b) standardized root-mean-square residual (SRMR) ≤ .05; and (c)
root-mean-square error of approximation (RMSEA) between .05 and .08 (Kline, 2016; Schumacker & Lomax, 2016).
Statistically significant model
noninvariance is determined based on the p-value of the χ² difference test at *p* ≤ .05 (Cheung & Rensvold, 1999; van de Schoot et al., 2012).
Guidelines have been provided to evaluate the ΔCFI for practical model (non)invariance, namely: (a) practical model invariance for ΔCFI ≥ -.01;
(b) potential practical model noninvariance for ΔCFI between -.01 and -.02; and (c) practical model noninvariance for ΔCFI ≤ -.02 (Cheung & Rensvold, 2002).
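CALMs runs these comparisons internally; assuming lavaan as the underlying SEM engine (an assumption, since the vignette does not name the package), the configural/metric/scalar sequence for the demonstration model could be sketched as follows, with `dat` standing in for the (matched) dataset:

```{r, eval = FALSE}
library(lavaan)

# Two-factor measurement model from Figure 1
model <- '
  JC =~ JC1 + JC2 + JC3   # quality of job content
  WE =~ WE1 + WE2 + WE3   # quality of work environment
'

# Fit the nested sequence of invariance models by country
fit_config <- cfa(model, data = dat, group = "USA")
fit_metric <- cfa(model, data = dat, group = "USA",
                  group.equal = "loadings")
fit_scalar <- cfa(model, data = dat, group = "USA",
                  group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between adjacent models
anova(fit_config, fit_metric, fit_scalar)

# Fit indices for evaluating the configural model
fitMeasures(fit_config, c("cfi", "srmr", "rmsea"))
```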
**Figure 6.** *Measurement Invariance Tab*
The results of the measurement invariance tests (see Figure 6) indicated that the configural model fit well (SRMR = .031, CFI = .956).
The metric model was compared to the configural model and met the criteria for both statistical and practical
invariance (Δχ²[4] = 8.484, *p* = .075; ΔCFI = -.004).
However, the data did not reach the
thresholds for scalar invariance (Δχ²[4] = 55.772, *p* < .001; ΔCFI = -.042). Both the JC (Δχ²[2] = 24.996, *p* < .001; ΔCFI = -.019) and
WE (Δχ²[2] = 30.796, *p* < .001; ΔCFI = -.023) scales demonstrated evidence of scalar non-invariance.
### Step 7: Metric Invariance Tests
In our illustrative example, it was not necessary to conduct follow-up tests for metric invariance as neither the omnibus test
nor the scale-level tests for JC and WE indicated evidence of metric non-invariance.
However, for demonstration purposes, we conducted metric invariance tests specifically on the JC scale.
Note that the application uses the p-value of the χ² difference test when determining invariant subsets of items (Cheung & Rensvold, 1999).
The default significance level (alpha) is set to .05, but users may adjust this threshold as needed. In this example, we set the alpha to .01 (see Figure 7).
**Figure 7.** *Metric Invariance Tab*
The factor ratio test (see Figure 7) confirmed that all JC items were metric invariant. Similarly, all WE items were metric invariant (tests not shown).
### Step 8: Scalar Invariance Tests
Because full scalar invariance was not demonstrated, we conducted partial measurement invariance testing on each scale. Had we found the factor loadings
non-invariant at the metric invariance step, we could have allowed a set of loadings to be freely estimated to permit a partial scalar
invariance assessment.
**Figure 8.** *Scalar Invariance Tab*
The factor ratio test (see Figure 8) identified JC2 and JC3 as an invariant subset of JC items (*p* > .01). Similarly,
WE1 and WE2 were identified (tests not shown) as an invariant subset of WE items (*p* > .01).
Based on the results of the scalar invariance assessment, the intercepts for WE3 and JC1
should be freely estimated to account for the partial scalar invariance.
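In lavaan syntax (again assuming lavaan as the engine, with `dat` as the matched dataset), freeing those two intercepts corresponds to the `group.partial` argument:

```{r, eval = FALSE}
library(lavaan)

model <- '
  JC =~ JC1 + JC2 + JC3
  WE =~ WE1 + WE2 + WE3
'

# Partial scalar model: loadings and intercepts constrained equal across
# countries, except the intercepts of JC1 and WE3, which are freed
fit_partial <- cfa(model, data = dat, group = "USA",
                   group.equal   = c("loadings", "intercepts"),
                   group.partial = c("JC1 ~ 1", "WE3 ~ 1"))

# With intercepts constrained, the latent means are fixed to zero in the
# first group and freely estimated in the second
summary(fit_partial, fit.measures = TRUE)
```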
### Step 9: Structural Invariance Tests
Building on the results of the scalar invariance testing, we allowed the intercepts for WE3 and JC1 to be freely estimated. Structural invariance is supported
when the comparison between an unconstrained and a constrained structural model yields a non-significant χ² difference (*p* > .05) and a non-significant CFI difference
(Cheung & Rensvold, 1999; Cheung & Rensvold, 2002; Kline, 2016; Schumacker & Lomax, 2016).
**Figure 9.** *Structural Invariance Tab*
The results indicate that the set of scales met the criteria for structural invariance. Although the structural invariance model was statistically significantly
different from the scalar model (Δχ²[2] = 6.374, *p* = .041), the difference was not practically significant (ΔCFI = -.004; see Figure 9).
However, considering only JC, a statistically significant latent mean difference was observed (-.077, *p* = .013). Given that the
latent mean for the USA was constrained to zero, the negative estimate indicates that the latent mean for the UK is lower in JC. There was no significant
latent mean difference for WE across the two countries (-.047, *p* = .326).
## References
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. *Journal of Economic Surveys, 22*(1), 31–72. https://doi.org/10.1111/j.1467-6419.2007.00527.x
Cheung, G. W., & Lau, R. S. (2012). A direct comparison approach for testing measurement invariance. *Organizational Research Methods, 15*(2), 167–198. https://doi.org/10.1177/1094428111421987
Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance across groups: A reconceptualization and proposed new method. *Journal of Management, 25*(1), 1–27. https://doi.org/10.1177/014920639902500101
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. *Structural Equation Modeling, 9*(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5
Ho, D., Imai, K., King, G., & Stuart, E. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. *Journal of Statistical Software, 42*(8), 1–28. https://doi.org/10.18637/jss.v042.i08
ISSP Research Group (2017). *International social survey programme: Work orientations IV – ISSP 2015*. GESIS data archive, Cologne. ZA6770 data file version 2.1.0, https://doi.org/10.4232/1.12848
Keiffer, G. L., & Lane, F. C. (2016). Propensity score analysis: An alternative statistical approach for HRD researchers. *European Journal of Training and Development, 40*(8/9), 660–675. https://doi.org/10.1108/EJTD-06-2015-0046
Kline, R. B. (2016). *Principles and practice of structural equation modeling* (4th ed.). New York: The Guilford Press.
Randolph, J. J., Falbe, K., Manuel, A., & Balloun, J. (2014). A step-by-step guide to propensity score matching in R. *Practical Assessment, Research & Evaluation, 19*, 1–6. https://doi.org/10.7275/n3pv-tx27
Schumacker, R. E., & Lomax, R. G. (2016). *A beginner’s guide to structural equation modeling* (4th ed.). New York: Routledge.
van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. *European Journal of Developmental Psychology, 9*(4), 486–492. https://doi.org/10.1080/17405629.2012.686740