In this example, we will show how to use lslx to conduct
semi-confirmatory factor analysis with missing data. The example uses
data HolzingerSwineford1939 in the package
lavaan. Hence, lavaan must be installed.
Because HolzingerSwineford1939 doesn’t contain missing
values, we use the code in semTools to create
NA (see the example of twostage() function in
semTools).
data_miss <- lavaan::HolzingerSwineford1939
data_miss$x5 <- ifelse(data_miss$x1 <= quantile(data_miss$x1, .3), 
                       NA, data_miss$x5)
data_miss$age <- data_miss$ageyr + data_miss$agemo/12
data_miss$x9 <- ifelse(data_miss$age <= quantile(data_miss$age, .3), 
                       NA, data_miss$x9)By the construction, we can see that the missingness of
x5 depends on the value of x1 and the
missingness of x9 relies on the age variable.
Note that age is created by ageyr and
agemo. Since ageyr and agemo are
not the variables that we are interested, the two variables are treated
as auxiliary in the later analysis.
A usual confirmatory factor analysis (CFA) model is specified.
model_miss <- "visual  :=> x1 + x2 + x3
               textual :=> x4 + x5 + x6
               speed   :=> x7 + x8 + x9
               visual  <=> 1 * visual
               textual <=> 1 * textual
               speed   <=> 1 * speed"Here, 1 before * will be interpreted as
fix(1). To initialize an lslx object with
auxiliary variables, we need to specify the
auxiliary_variable argument. The
auxiliary_variable argument only accepts numeric variables.
If any categorical variable is considered as a valid auxiliary variable,
user should transform it as a set of dummy variables first. One possible
method is using model.matrix function.
library(lslx)
lslx_miss <- lslx$new(model = model_miss, data = data_miss,
                      auxiliary_variable = c("ageyr", "agemo"))An 'lslx' R6 class is initialized via 'data' argument. 
  Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9 
  Latent Factors: visual textual speed 
  Auxiliary Variables: ageyr agemo Because the specified CFA might not fit the data well, we add a
correlated residual structure to the model by
$penalize_block()
lslx_miss$penalize_block(block = "y<->y", type = "fixed", verbose = FALSE)The code penalizes all the coefficients in y<->y
block with fixed parameter type. Note that this model is not identified
under the usual SEM framework. PL method can still estimate it because
the penalty function introduces additional constraints on parameters.
However, we don’t recommend using such type of model because it is
difficult to be interpreted.
So far, the specified auxiliary variables are only stored in
lslx object. They are actually used after implementing the
$fit() related methods.
lslx_miss$fit_lasso()CONGRATS: Algorithm converges under EVERY specified penalty level.
  Specified Tolerance for Convergence: 0.001 
  Specified Maximal Number of Iterations: 100 By default, fit related methods implement two-step
method (possibly with auxiliary variables) for handling missing values.
User can specify the missing method explicitly via
missing_method argument. Another missing method in the
current version is listwise deletion. However, listwise deletion has no
theoretical advantages over the two-step method.
The following code summarizes the fitting result under the penalty
level selected by a Robust version of Akaike information criterion
(RAIC). The number of missing patterns shows how many
missing patterns present in the data set (include the complete pattern).
If the lslx object is initialized via raw data, by default,
a corrected sandwich standard error will be used for coefficient test.
The correction is based on the asymptotic covariance of saturated
moments derived by full information maximum likelihood. Also, the mean
adjusted likelihood ratio test is based on this quantity. For the
reference, please see the section of Missing Data in
?lslx.
lslx_miss$summarize(selector = "raic")General Information                                                             
   number of observations                             301.000
   number of complete observations                    138.000
   number of missing patterns                           4.000
   number of groups                                     1.000
   number of responses                                  9.000
   number of factors                                    3.000
   number of free coefficients                         30.000
   number of penalized coefficients                    36.000
Numerical Conditions                                                             
   selected lambda                                      0.130
   selected delta                                        none
   selected step                                         none
   objective value                                      0.168
   objective gradient absolute maximum                  0.001
   objective Hessian convexity                          0.741
   number of iterations                                 6.000
   loss value                                           0.051
   number of non-zero coefficients                     41.000
   degrees of freedom                                  13.000
   robust degrees of freedom                           16.174
   scaling factor                                       1.244
Fit Indices                                                             
   root mean square error of approximation (rmsea)      0.025
   comparative fit index (cfi)                          0.997
   non-normed fit index (nnfi)                          0.992
   standardized root mean of residual (srmr)            0.032
Likelihood Ratio Test
                    statistic         df    p-value
   unadjusted          15.434     13.000      0.281
   mean-adjusted       12.405     13.000      0.495
Root Mean Square Error of Approximation Test
                     estimate      lower      upper
   unadjusted           0.025      0.000      0.071
   mean-adjusted        0.000      0.000      0.069
Coefficient Test (Std.Error = "sandwich")
  Factor Loading
                      type  estimate  std.error  z-value  P(>|z|)  lower  upper
       x1<-visual     free     0.921      0.092    9.994    0.000  0.740  1.101
       x2<-visual     free     0.441      0.079    5.560    0.000  0.286  0.597
       x3<-visual     free     0.607      0.070    8.631    0.000  0.469  0.745
      x4<-textual     free     0.974      0.063   15.585    0.000  0.852  1.097
      x5<-textual     free     1.057      0.063   16.738    0.000  0.933  1.180
      x6<-textual     free     0.920      0.059   15.688    0.000  0.805  1.035
        x7<-speed     free     0.438      0.094    4.669    0.000  0.254  0.622
        x8<-speed     free     0.526      0.093    5.632    0.000  0.343  0.709
        x9<-speed     free     0.799      0.092    8.665    0.000  0.619  0.980
  Covariance
                      type  estimate  std.error  z-value  P(>|z|)  lower  upper
 textual<->visual     free     0.486      0.075    6.504    0.000  0.340  0.633
   speed<->visual     free     0.567      0.077    7.406    0.000  0.417  0.718
  speed<->textual     free     0.243      0.089    2.731    0.006  0.069  0.417
          x2<->x1      pen     0.000        -        -        -      -      -  
          x3<->x1      pen     0.000        -        -        -      -      -  
          x4<->x1      pen     0.048      0.050    0.958    0.338 -0.050  0.147
          x5<->x1      pen    -0.112      0.089   -1.253    0.210 -0.286  0.063
          x6<->x1      pen     0.000        -        -        -      -      -  
          x7<->x1      pen    -0.080      0.063   -1.261    0.207 -0.204  0.044
          x8<->x1      pen     0.000        -        -        -      -      -  
          x9<->x1      pen     0.000        -        -        -      -      -  
          x3<->x2      pen     0.103      0.074    1.386    0.166 -0.043  0.249
          x4<->x2      pen     0.000        -        -        -      -      -  
          x5<->x2      pen     0.000        -        -        -      -      -  
          x6<->x2      pen     0.000        -        -        -      -      -  
          x7<->x2      pen    -0.104      0.057   -1.817    0.069 -0.217  0.008
          x8<->x2      pen     0.000        -        -        -      -      -  
          x9<->x2      pen     0.000        -        -        -      -      -  
          x4<->x3      pen     0.000        -        -        -      -      -  
          x5<->x3      pen    -0.079      0.060   -1.316    0.188 -0.196  0.039
          x6<->x3      pen     0.000        -        -        -      -      -  
          x7<->x3      pen     0.000        -        -        -      -      -  
          x8<->x3      pen     0.000        -        -        -      -      -  
          x9<->x3      pen     0.000        -        -        -      -      -  
          x5<->x4      pen     0.000        -        -        -      -      -  
          x6<->x4      pen     0.000        -        -        -      -      -  
          x7<->x4      pen     0.079      0.043    1.851    0.064 -0.005  0.163
          x8<->x4      pen     0.000        -        -        -      -      -  
          x9<->x4      pen     0.000        -        -        -      -      -  
          x6<->x5      pen     0.000        -        -        -      -      -  
          x7<->x5      pen     0.000        -        -        -      -      -  
          x8<->x5      pen     0.000        -        -        -      -      -  
          x9<->x5      pen     0.008      0.065    0.120    0.904 -0.120  0.136
          x7<->x6      pen     0.000        -        -        -      -      -  
          x8<->x6      pen     0.013      0.042    0.300    0.764 -0.070  0.095
          x9<->x6      pen    -0.017      0.048   -0.354    0.723 -0.111  0.077
          x8<->x7      pen     0.253      0.072    3.531    0.000  0.113  0.394
          x9<->x7      pen     0.000        -        -        -      -      -  
          x9<->x8      pen     0.000        -        -        -      -      -  
  Variance
                      type  estimate  std.error  z-value  P(>|z|)  lower  upper
  visual<->visual    fixed     1.000        -        -        -      -      -  
textual<->textual    fixed     1.000        -        -        -      -      -  
    speed<->speed    fixed     1.000        -        -        -      -      -  
          x1<->x1     free     0.473      0.143    3.302    0.001  0.192  0.754
          x2<->x2     free     1.150      0.108   10.683    0.000  0.939  1.360
          x3<->x3     free     0.878      0.077   11.469    0.000  0.728  1.028
          x4<->x4     free     0.387      0.053    7.305    0.000  0.284  0.491
          x5<->x5     free     0.410      0.064    6.435    0.000  0.285  0.535
          x6<->x6     free     0.348      0.046    7.496    0.000  0.257  0.439
          x7<->x7     free     0.930      0.086   10.859    0.000  0.762  1.098
          x8<->x8     free     0.719      0.090    8.020    0.000  0.543  0.895
          x9<->x9     free     0.334      0.137    2.438    0.015  0.066  0.603
  Intercept
                      type  estimate  std.error  z-value  P(>|z|)  lower  upper
            x1<-1     free     4.936      0.067   73.473    0.000  4.804  5.067
            x2<-1     free     6.088      0.068   89.855    0.000  5.955  6.221
            x3<-1     free     2.250      0.065   34.579    0.000  2.123  2.378
            x4<-1     free     3.061      0.067   45.694    0.000  2.930  3.192
            x5<-1     free     4.420      0.093   47.369    0.000  4.237  4.603
            x6<-1     free     2.186      0.063   34.667    0.000  2.062  2.309
            x7<-1     free     4.186      0.063   66.766    0.000  4.063  4.309
            x8<-1     free     5.527      0.058   94.854    0.000  5.413  5.641
            x9<-1     free     5.366      0.074   72.693    0.000  5.221  5.510