Type: Package
Title: Graphical Models with Latent Variables
Version: 1.1
Date: 2020-11-29
Author: Yanxin Jin, Samantha Yang, Kean Ming Tan
Maintainer: Yanxin Jin <yanxinj@umich.edu>
Description: Three methods are provided to estimate graphical models with latent variables: (1) Jin, Y., Ning, Y., and Tan, K. M. (2020) (preprint available); (2) Chandrasekaran, V., Parrilo, P. A. & Willsky, A. S. (2012) <doi:10.1214/11-AOS949>; (3) Tan, K. M., Ning, Y., Witten, D. M. & Liu, H. (2016) <doi:10.1093/biomet/asw050>.
License: GPL-3
Imports: Rcpp, pracma, glmnet, MASS, stats
LinkingTo: Rcpp, RcppArmadillo
NeedsCompilation: yes
Packaged: 2020-12-10 07:13:44 UTC; yanxinj
Repository: CRAN
Date/Publication: 2020-12-10 09:10:09 UTC
Graphical Models with Latent Variables and Correlated Replicates
Description
Estimate graphical models with latent variables and correlated replicates using the method in Jin et al. (2020).
Usage
corlatent(data, accuracy, n, R, p, lambda1, lambda2, lambda3, distribution = "Gaussian",
rule = "AND")
Arguments
data |
data set. Can be a matrix, list, array, or data frame. If the data set is a matrix, it should have |
accuracy |
the threshold at which the algorithm stops. The algorithm stops when the difference between estimators of the |
n |
the number of observations. |
R |
the number of replicates for each observation. |
p |
the number of observed variables. |
lambda1 |
tuning parameter that encourages estimated graph to be sparse. |
lambda2 |
tuning parameter that models the effects of correlated replicates. Usually set to be equal to lambda1. |
lambda3 |
tuning parameter that encourages the latent effect to be piecewise constants. |
distribution |
For a data set with Gaussian distribution, use "Gaussian"; For a data set with Ising distribution, use "Ising". Default is "Gaussian". |
rule |
rules to combine matrices that encode the conditional dependence relationships between sets of two observed variables. Options are "AND" and "OR". Default is "AND". |
Details
The corlatent method makes two assumptions. Assumption 1 states that, conditional on the latent variables, the R replicates follow a one-lag vector autoregressive model.
Assumption 2 states that the latent variables are piecewise constant across replicates.
Based on these two assumptions, the method solves the following problem for each 1 \le j \le p:
\min_{\theta_{j,-j}, \alpha_j, \Delta_j} \{ -\frac{1}{nR}l(\theta_{j,-j}, \alpha_j, \Delta_j) + \lambda\|\theta_{j,-j}\|_1 + \beta\|\alpha_j\|_1 + \gamma\|(I_n \otimes C)\Delta_j\|_1 \},
where l(\theta_{j,-j}, \alpha_j, \Delta_j) is the log-likelihood function, \theta_{j,-j} encodes the conditional dependence relationships between the jth observed variable and the other observed variables, \alpha_j models the correlation among replicates, \Delta_j encodes the latent effect, \lambda, \beta, and \gamma are the tuning parameters (the arguments lambda1, lambda2, and lambda3, respectively), I_n is the n-dimensional identity matrix, and C is the discrete first-derivative matrix whose ith row has -1 in the ith column and 1 in the (i+1)th column.
This method aims at modeling exponential family graphical models with correlated replicates and latent variables.
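The fused penalty \|(I_n \otimes C)\Delta_j\|_1 can be illustrated with a minimal base R sketch. The helper first_diff_matrix and the stacking order of \Delta_j (all replicates of observation 1, then observation 2, and so on) are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical helper (not part of the package): first-difference matrix C.
# Row i has -1 in column i and 1 in column i + 1, so C is (R - 1) x R.
first_diff_matrix <- function(R) {
  C <- matrix(0, nrow = R - 1, ncol = R)
  C[cbind(seq_len(R - 1), seq_len(R - 1))] <- -1
  C[cbind(seq_len(R - 1), seq_len(R - 1) + 1)] <- 1
  C
}

n <- 3  # observations
R <- 4  # replicates per observation
C <- first_diff_matrix(R)
D <- kronecker(diag(n), C)  # I_n (x) C: block diagonal with n copies of C

# Assumed layout: Delta_j stacks the R latent effects of observation 1,
# then those of observation 2, and so on.
delta_j <- c(2, 2, 5, 5,   # one jump between replicates 2 and 3
             1, 1, 1, 1,   # constant across replicates
             0, 3, 3, 0)   # two jumps

# Sum of absolute differences between consecutive replicates, per observation.
fused_penalty <- sum(abs(D %*% delta_j))
```

The penalty is zero for an observation whose latent effect never changes and grows with the size of each jump, which is why a large lambda3 pushes the estimated latent effect towards piecewise constant values.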
Value
omega |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
theta |
the adjacency matrix with 0 and 1 encoding conditional independence and dependence between sets of two observed variables, respectively |
penalties |
the penalty values |
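How the "AND" and "OR" rules can turn nodewise estimates into a symmetric adjacency matrix is sketched below. This follows the usual neighbourhood selection convention and is an assumption about the rule's behaviour, not a copy of the package's implementation; the matrix omega here is made up for illustration.

```r
# Hypothetical nodewise estimates: row j holds the coefficients obtained when
# variable j is modelled given the others, so omega need not be symmetric.
omega <- matrix(c(0,   0.4, 0,
                  0.2, 0,   0.3,
                  0,   0,   0), nrow = 3, byrow = TRUE)

support <- omega != 0

# "AND": keep an edge only if both directions are non-zero.
theta_and <- (support & t(support)) * 1
# "OR": keep an edge if either direction is non-zero.
theta_or <- (support | t(support)) * 1
```

In this toy input the pair (2, 3) is selected in one direction only, so it is an edge under "OR" but not under "AND".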
References
Jin, Y., Ning, Y., and Tan, K. M. (2020), ‘Exponential Family Graphical Models with Correlated Replicates and Unmeasured Confounders’, preprint available.
Examples
# Gaussian distribution with "AND" rule
n <- 20
R <- 10
p <- 5
l <- 2
s <- 2
seed <- 1
data <- generate_Gaussian(n, R, p, l, s, sparsityA = 0.95, sparsityobserved = 0.9,
sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed)$X
result <- corlatent(data, accuracy = 1e-6, n, R, p, lambda1 = 0.1, lambda2 = 0.1,
lambda3 = 1e+5, distribution = "Gaussian", rule = "AND")
Generate a Gaussian distributed data set
Description
This function will generate a Gaussian distributed data set with latent variables and correlated replicates.
Usage
generate_Gaussian(n, R, p, l, s, sparsityA, sparsityobserved, sparsitylatent, lwb,
upb, seed)
Arguments
n |
the number of observations. |
R |
the number of replicates. |
p |
the number of observed variables. |
l |
the number of latent variables. |
s |
latent effects are generated as |
sparsityA |
proportion of the number of zeros in the transition matrix |
sparsityobserved |
proportion of the number of zeros in the inverse covariance of the observed variables. Must be a number from 0 to 1. |
sparsitylatent |
proportion of the number of zeros in the inverse covariances among latent variables and between observed variables and latent variables. Must be a number from 0 to 1. |
lwb |
lower bound for the elements in the inverse covariance matrix. |
upb |
upper bound for the elements in the inverse covariance matrix. |
seed |
the seed for the random number generator. |
Details
This function generates a Gaussian distributed data set with latent variables and correlated replicates. For each observation, the latent variables are piecewise constant across replicates and, conditional on the latent variables, the replicates follow a one-lag vector autoregressive model whose transition matrix A is sparse with non-zero elements set equal to 0.3.
The matrix \Sigma is the covariance matrix of the observed variables X and the latent variables U. We partition \Sigma into blocks that quantify the relationships among the observed variables (\Sigma_{XX}), between the observed variables and the latent variables (\Sigma_{XU} or \Sigma_{UX}), and among the latent variables (\Sigma_{UU}).
In general, the data is generated with:
X_{i1} | U_{i1} \sim N_p(\Sigma_{XU}\Sigma^{-1}_{UU} U_{i1}, \Sigma_{XX} - \Sigma_{XU}\Sigma^{-1}_{UU}\Sigma_{UX}),
X_{it} | X_{i(t-1)},U_{it} \sim N_p(AX_{i(t-1)} + \Sigma_{XU}\Sigma^{-1}_{UU} U_{it}, \Sigma_{XX} - \Sigma_{XU}\Sigma^{-1}_{UU}\Sigma_{UX}),
where 1 \le i \le n. The first line gives the distribution of the first replicate (t = 1), and the second line applies for 2 \le t \le R.
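The two sampling equations can be traced in a minimal base R sketch. It is not the code behind generate_Gaussian: the choice of \Sigma, the positions of the non-zero entries of A, the single change point in the latent variables, and the array layout of the output are all assumptions made for illustration.

```r
set.seed(1)
n <- 4; R <- 6; p <- 3; l <- 2

# Assumed joint covariance of (X, U): any positive definite matrix works here.
M <- matrix(rnorm((p + l)^2), p + l)
Sigma <- crossprod(M) + diag(p + l)
ix <- seq_len(p); iu <- p + seq_len(l)
S_XX <- Sigma[ix, ix]; S_XU <- Sigma[ix, iu]; S_UU <- Sigma[iu, iu]

B <- S_XU %*% solve(S_UU)       # \Sigma_{XU} \Sigma_{UU}^{-1}
S_cond <- S_XX - B %*% t(S_XU)  # covariance of X given U
L_cond <- t(chol(S_cond))       # lower Cholesky factor of S_cond
L_UU <- t(chol(S_UU))           # lower Cholesky factor of \Sigma_{UU}

# Sparse transition matrix with non-zero entries equal to 0.3
# (placing them on the diagonal is an assumption).
A <- diag(0.3, p)

X <- array(0, dim = c(n, R, p))
for (i in seq_len(n)) {
  # Assumption: the latent variables take one value for the first half of
  # the replicates and another for the second half (one change point).
  U_levels <- L_UU %*% matrix(rnorm(l * 2), l, 2)
  for (rep_idx in seq_len(R)) {
    U_it <- U_levels[, if (rep_idx <= R / 2) 1 else 2]
    mean_it <- B %*% U_it
    # From the second replicate on, add the autoregressive term A X_{i(t-1)}.
    if (rep_idx > 1) mean_it <- mean_it + A %*% X[i, rep_idx - 1, ]
    X[i, rep_idx, ] <- as.vector(mean_it + L_cond %*% rnorm(p))
  }
}
```

Here X[i, t, ] plays the role of X_{it}. The conditional covariance S_cond is the Schur complement of \Sigma_{UU} in \Sigma, so it is positive definite whenever \Sigma is.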
Value
X |
the generated data, which is a list with |
truegraph |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
References
Jin, Y., Ning, Y., and Tan, K. M. (2020), ‘Exponential Family Graphical Models with Correlated Replicates and Unmeasured Confounders’, preprint available.
Examples
data <- generate_Gaussian(n = 50, R = 20, p = 30, l = 2, s = 2, sparsityA = 0.95,
sparsityobserved = 0.9, sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed = 1)
Estimate Gaussian Graphical Models with Latent Variables
Description
Estimate Gaussian graphical models with latent variables using the method in Chandrasekaran et al. (2012).
Usage
lvglasso(data, n, p, lambda1, lambda2, rule = "AND")
Arguments
data |
data set, can be a matrix or data frame with |
n |
the number of observations. |
p |
the number of observed variables. |
lambda1 |
tuning parameter that encourages estimated graph to be sparse. Lambda1 is proportional to lambda2. |
lambda2 |
tuning parameter that encourages the matrix |
rule |
rules to combine the inverse covariance matrices. Options are "AND" and "OR". Default is "AND". |
Details
The lvglasso method assumes that all the variables, both observed and latent, are jointly Gaussian, and specifies the conditional distribution of the observed variables given the latent variables by a graphical model. In the high-dimensional setting, this method provides consistent estimators for the graphical model of the observed variables conditional on the latent variables.
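Why latent variables matter for a Gaussian graphical model can be checked numerically in base R. By the Schur complement identity, the precision matrix of the observed variables alone equals the observed block of the joint precision matrix (the conditional graph) minus a low-rank term induced by the latent variables. The joint precision matrix K below is made up for illustration, and the sketch does not call or reproduce lvglasso.

```r
p <- 4; l <- 1

# Assumed joint precision matrix K of (X, U): one edge among the observed
# variables, and a single latent variable connected to every observed variable.
K <- diag(2, p + l)
K[1, 2] <- K[2, 1] <- 0.5
K[seq_len(p), p + 1] <- K[p + 1, seq_len(p)] <- 0.4

ix <- seq_len(p); iu <- p + seq_len(l)
Sigma <- solve(K)

# Precision matrix of the observed variables after marginalising out U.
marginal_precision <- solve(Sigma[ix, ix])

# Conditional graph of X given U: sparse by construction.
sparse_part <- K[ix, ix]
# Contribution of the latent variable: rank at most l.
lowrank_part <- K[ix, iu, drop = FALSE] %*%
  solve(K[iu, iu, drop = FALSE]) %*%
  K[iu, ix, drop = FALSE]

# Numerically zero: marginal precision = sparse part - low-rank part.
gap <- max(abs(marginal_precision - (sparse_part - lowrank_part)))
```

The marginal precision matrix is dense even though the conditional graph has a single edge, which is why ignoring the latent variable would suggest spurious edges among the observed variables.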
Value
omega |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
theta |
the adjacency matrix with 0 and 1 encoding conditional independence and dependence between sets of two observed variables, respectively |
penalties |
the penalty values |
References
Chandrasekaran, V., Parrilo, P. A. & Willsky, A. S. (2012), ‘Latent variable graphical model selection via convex optimization’, Ann. Statist. 40(4), 1935–1967.
Examples
#Gaussian distribution with "AND" rule
n <- 50
R <- 20
p <- 30
l <- 2
s <- 2
data <- generate_Gaussian(n, R, p, l, s, sparsityA = 0.95, sparsityobserved = 0.9,
sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed = 1)$X
result <- lvglasso(data, n, p, lambda1 = 0.222, lambda2 = 0.1*0.222, rule = "AND")
Estimate Graphical Models with Latent Variables and Replicates
Description
Estimate graphical models with latent variables and replicates using the method in Tan et al. (2016).
Usage
semilatent(data, n, R, p, lambda, distribution = "Gaussian", rule = "AND")
Arguments
data |
data set. Can be a matrix, list, array, or data frame. If the data set is a matrix, it should have |
n |
the number of observations. |
R |
the number of replicates for each observation. |
p |
the number of observed variables. |
lambda |
tuning parameter that encourages estimated graph to be sparse. |
distribution |
For a data set with Gaussian distribution, use "Gaussian"; For a data set with Ising distribution, use "Ising". Default is "Gaussian". |
rule |
rules to combine matrices that encode the conditional dependence relationships between sets of two observed variables. Options are "AND" and "OR". Default is "AND". |
Details
The semilatent method makes two assumptions. Assumption 1 states that the latent variables are constant across replicates.
Assumption 2 states that, given the latent variables, the replicates are mutually independent.
With these two assumptions, the method seeks to solve the following problem for each 1 \le j \le p:
\min_{\beta_{j,O / j}} \{l_j (\beta_{j,O / j}) + \lambda\|\beta_{j,O / j}\|_1 \},
where l_j (\beta_{j,O / j}) is a nuisance-free loss function, \beta_{j,O / j} is a parameter that represents the conditional dependence relationships between the jth observed variable and the other observed variables, and \lambda is a tuning parameter.
This method aims at modeling semiparametric exponential family graphical models with latent variables and replicates.
Value
omega |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
theta |
the adjacency matrix with 0 and 1 encoding conditional independence and dependence between sets of two observed variables, respectively |
penalty |
the penalty value |
References
Tan, K. M., Ning, Y., Witten, D. M. & Liu, H. (2016), ‘Replicates in high dimensions, with applications to latent variable graphical models’, Biometrika 103(4), 761–777.
Examples
#semilatent Gaussian with "AND" rule
n <- 50
R <- 20
p <- 30
seed <- 1
l <- 2
s <- 2
data <- generate_Gaussian(n, R, p, l, s, sparsityA = 0.95, sparsityobserved = 0.9,
sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed)$X
result <- semilatent(data, n, R, p, lambda = 0.1, distribution = "Gaussian",
rule = "AND")