---
title: "Selection Criteria for Parameters in grasps"
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{crit}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: false
    comment: '#>'
    echo: true
    fig.align: "center"
    message: false
    results: 'hide'
    warning: false
---

## Introduction

Precision matrix estimation requires selecting an appropriate regularization parameter $\lambda$, which balances sparsity (number of edges) against model fit (likelihood), and a mixing parameter $\alpha$, which trades off the element-wise (individual-level) and block-wise (group-level) penalties.

## Background: Negative Log-Likelihood

In a Gaussian graphical model (GGM), the data matrix $X_{n \times d}$ consists of $n$ independent and identically distributed observations $X_1, \dots, X_n$ drawn from $N_d(\mu, \Sigma)$. Let $\Omega = \Sigma^{-1}$ denote the precision matrix, and define the empirical covariance matrix as $S = n^{-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^\top$. Up to an additive constant, the negative log-likelihood (nll) for $\Omega$ simplifies to
$$
\mathrm{nll}(\Omega) = \frac{n}{2}\left[-\log\det(\Omega) + \mathrm{tr}(S\Omega)\right].
$$

The edge set $E(\Omega)$ is determined by the non-zero off-diagonal entries: an edge $(i, j)$ is included if and only if $\omega_{ij} \neq 0$ for $i < j$. The number of edges is therefore $\lvert E(\Omega) \rvert$. Both quantities are computed in the numerical sketch at the end of this vignette.

## Selection Criteria

1. AIC: Akaike information criterion [@akaike1973information]
    $$
    \hat{\Omega}_{\mathrm{AIC}} = {\arg\min}_{\Omega} \left\{ 2\,\mathrm{nll}(\Omega) + 2\,\lvert E(\Omega) \rvert \right\}.
    $$

2. BIC: Bayesian information criterion [@schwarz1978estimating]
    $$
    \hat{\Omega}_{\mathrm{BIC}} = {\arg\min}_{\Omega} \left\{ 2\,\mathrm{nll}(\Omega) + \log(n)\,\lvert E(\Omega) \rvert \right\}.
    $$

3. EBIC: Extended Bayesian information criterion [@chen2008extended; @foygel2010extended]
    $$
    \hat{\Omega}_{\mathrm{EBIC}} = {\arg\min}_{\Omega} \left\{ 2\,\mathrm{nll}(\Omega) + \log(n)\,\lvert E(\Omega) \rvert + 4\,\xi\,\log(d)\,\lvert E(\Omega) \rvert \right\},
    $$
    where $\xi \in [0, 1]$ is a tuning parameter. Setting $\xi = 0$ reduces EBIC to the classic BIC.

4. HBIC: High-dimensional Bayesian information criterion [@wang2013calibrating; @fan2017high]
    $$
    \hat{\Omega}_{\mathrm{HBIC}} = {\arg\min}_{\Omega} \left\{ 2\,\mathrm{nll}(\Omega) + \log[\log(n)]\,\log(d)\,\lvert E(\Omega) \rvert \right\}.
    $$

5. $K$-fold cross-validation with negative log-likelihood loss. @fig-cv illustrates the $K$-fold cross-validation procedure used for tuning the parameters $\lambda$ and $\alpha$. The notation $\#\lambda$ and $\#\alpha$ denotes the number of candidate values considered for $\lambda$ and $\alpha$, respectively, forming a grid of $\#\lambda \times \#\alpha$ parameter combinations. In each of the $K$ iterations, the negative log-likelihood loss is evaluated on the held-out fold for every combination, yielding $K$ performance values per combination. The optimal pair is the one achieving the lowest average loss across the $K$ iterations; the loop is sketched in code at the end of this vignette.

::: {#fig-cv}
![](../man/figures/cv-diagram.png){width=100%}

$K$-fold cross-validation procedure for tuning ($\lambda$, $\alpha$).
:::
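## Numerical Sketch

To make the notation concrete, the following base-R sketch computes $\mathrm{nll}(\Omega)$ and $\lvert E(\Omega) \rvert$ on a toy data set. The estimate `Omega` here is simply the inverse of a ridge-stabilized `S` (and therefore dense); it is a stand-in for whatever a penalized solver returns along a $(\lambda, \alpha)$ path and does not use the grasps API.

```{r}
set.seed(1)
n <- 50
d <- 5
X <- matrix(rnorm(n * d), n, d)

# Empirical covariance with the 1/n convention used in the text
Xc <- scale(X, center = TRUE, scale = FALSE)
S  <- crossprod(Xc) / n

# Negative log-likelihood, up to the additive constant dropped in the text
nll <- function(Omega, S, n) {
  (n / 2) * (-as.numeric(determinant(Omega, logarithm = TRUE)$modulus) +
               sum(S * Omega))
}

# |E(Omega)|: non-zero upper-triangular entries, with a numerical tolerance
n_edges <- function(Omega, tol = 1e-8) {
  sum(abs(Omega[upper.tri(Omega)]) > tol)
}

# Stand-in estimate; a penalized solver would return a sparse Omega instead
Omega <- solve(S + 0.1 * diag(d))
nll(Omega, S, n)
n_edges(Omega)
```

Here `sum(S * Omega)` equals $\mathrm{tr}(S\Omega)$ because $S$ and $\Omega$ are symmetric.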
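Given `nll()` and `n_edges()` from the sketch above, the four information criteria are direct transcriptions of the formulas. The `crit_*` names and the default `xi = 0.5` (a common EBIC choice) are illustrative assumptions, not package defaults.

```{r}
# The four criteria as functions of a candidate estimate
crit_aic  <- function(Omega, S, n) {
  2 * nll(Omega, S, n) + 2 * n_edges(Omega)
}
crit_bic  <- function(Omega, S, n) {
  2 * nll(Omega, S, n) + log(n) * n_edges(Omega)
}
crit_ebic <- function(Omega, S, n, d, xi = 0.5) {
  crit_bic(Omega, S, n) + 4 * xi * log(d) * n_edges(Omega)
}
crit_hbic <- function(Omega, S, n, d) {
  2 * nll(Omega, S, n) + log(log(n)) * log(d) * n_edges(Omega)
}

# In practice every estimate along the (lambda, alpha) path is scored and
# the minimizer kept; here only the single stand-in estimate is scored
c(AIC  = crit_aic(Omega, S, n),
  BIC  = crit_bic(Omega, S, n),
  EBIC = crit_ebic(Omega, S, n, d),
  HBIC = crit_hbic(Omega, S, n, d))
```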
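Finally, a minimal sketch of the cross-validation loop from @fig-cv. The function `fit_precision()` is a hypothetical placeholder for any solver that maps a training covariance and a $(\lambda, \alpha)$ pair to a precision estimate; the held-out loss is the negative log-likelihood of the test fold, up to constants.

```{r}
# K-fold CV over a (lambda, alpha) grid with held-out nll loss;
# fit_precision() is a hypothetical stand-in for the actual solver
cv_select <- function(X, lambdas, alphas, K = 5, fit_precision) {
  folds <- sample(rep_len(seq_len(K), nrow(X)))  # random fold assignment
  grid  <- expand.grid(lambda = lambdas, alpha = alphas)
  loss  <- matrix(NA_real_, nrow = nrow(grid), ncol = K)

  # Empirical covariance with the 1/n convention
  cov_n <- function(Z) {
    crossprod(scale(Z, center = TRUE, scale = FALSE)) / nrow(Z)
  }

  for (k in seq_len(K)) {
    S_tr <- cov_n(X[folds != k, , drop = FALSE])
    S_te <- cov_n(X[folds == k, , drop = FALSE])
    for (g in seq_len(nrow(grid))) {
      Omega_g <- fit_precision(S_tr, grid$lambda[g], grid$alpha[g])
      # Held-out loss: -log det(Omega) + tr(S_test Omega), up to constants
      loss[g, k] <- -as.numeric(determinant(Omega_g, logarithm = TRUE)$modulus) +
        sum(S_te * Omega_g)
    }
  }
  grid[which.min(rowMeans(loss)), ]  # pair with the lowest average loss
}
```

Averaging the loss over folds before minimizing matches the selection rule described in the Selection Criteria section.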
## References {-}