How to use the estimation procedures

library(PCBN)

Data simulation

DAG = create_empty_DAG(7)
DAG = bnlearn::set.arc(DAG, 'U1', 'U2')
DAG = bnlearn::set.arc(DAG, 'U1', 'U3')
DAG = bnlearn::set.arc(DAG, 'U1', 'U4')
DAG = bnlearn::set.arc(DAG, 'U2', 'U4')
DAG = bnlearn::set.arc(DAG, 'U3', 'U4')
DAG = bnlearn::set.arc(DAG, 'U4', 'U6')
DAG = bnlearn::set.arc(DAG, 'U5', 'U6')
DAG = bnlearn::set.arc(DAG, 'U4', 'U7')
DAG = bnlearn::set.arc(DAG, 'U6', 'U7')
DAG |> bnlearn::as.igraph() |>
  igraph::plot.igraph(size = 20, label.cex = 2
                      # , layout = igraph::layout_as_tree
  )

order_hash = r2r::hashmap()
order_hash[['U4']] = c("U2", "U1", "U3")
order_hash[['U6']] = c("U4", "U5")
order_hash[['U7']] = c("U4", "U6")
complete_and_check_orders(DAG, order_hash)

fam = matrix(c(0, 1, 1, 1, 0, 0, 0,
               0, 0, 0, 1, 0, 0, 0,
               0, 0, 0, 1, 0, 0, 0,
               0, 0, 0, 0, 0, 1, 1,
               0, 0, 0, 0, 0, 1, 0,
               0, 0, 0, 0, 0, 0, 1,
               0, 0, 0, 0, 0, 0, 0), 
             byrow = TRUE, ncol = 7)
tau = 0.2 * fam

my_PCBN = new_PCBN(
  DAG, order_hash,
  copula_mat = list(tau = tau, fam = fam))

N = 100
mydata = PCBN_sim(my_PCBN, N = N)

Estimation

Copula of U1 and U2

We prepare the environment for storing the results.

e = default_envir()

We start by fitting the copula of \((U_1, U_2)\).

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U2",
             cond_set = c(), familyset = 1, order_hash = order_hash, e = e,
             method = "mle")
#> Estimating the copula of  U1  and  U2
#> Bivariate copula: Gaussian (par = 0.27, tau = 0.17)

This is stored in e$copula_hash. To access it, we can do:

copula_key = e$keychain[[list(margins = c("U1", "U2"), cond = character(0))]]

e$copula_hash[[copula_key]]
#> Bivariate copula: Gaussian (par = 0.27, tau = 0.17)

There is a level of indirection, because the copula_key actually stores the whole computation tree. Therefore, if the statistician decides to use a different PCBN for the same structure, some estimated parts can be reused if the computation path is the same.

Let’s see what is actually the copula_key.

print(data.tree::FromListSimple(copula_key))
#>   levelName
#> 1    U1, U2
#> 2     ¦--U1
#> 3     °--U2

Corresponding computation trees of the copulas

The computation trees of these copulas can be found as before.

e$keychain[[list(margins = c("U2", "U4"), cond = character(0))]] |>
  data.tree::FromListSimple() |>
  print()
#>   levelName
#> 1    U2, U4
#> 2     ¦--U2
#> 3     °--U4
e$keychain[[list(margins = c("U1", "U4"), cond = c("U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>        levelName
#> 1 U1, U4 | U2   
#> 2  ¦--U1 | U2   
#> 3  ¦   °--U1, U2
#> 4  ¦       ¦--U1
#> 5  ¦       °--U2
#> 6  °--U4 | U2   
#> 7      °--U2, U4
#> 8          ¦--U2
#> 9          °--U4
e$keychain[[list(margins = c("U3", "U4"), cond = c("U1", "U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>                 levelName
#> 1  U3, U4 | U1, U2       
#> 2   ¦--U3 | U1           
#> 3   ¦   °--U1, U3        
#> 4   ¦       ¦--U1        
#> 5   ¦       °--U3        
#> 6   °--U4 | U1, U2       
#> 7       °--U1, U4 | U2   
#> 8           ¦--U1 | U2   
#> 9           ¦   °--U1, U2
#> 10          ¦       ¦--U1
#> 11          ¦       °--U2
#> 12          °--U4 | U2   
#> 13              °--U2, U4
#> 14                  ¦--U2
#> 15                  °--U4

Corresponding computation trees of the margins

In the same way, we obtain the computation tree of the margin \(U_4 \, | \, U_1, U_2\) by:

e$keychain[[list(margin = c("U4"), cond = c("U1", "U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>             levelName
#> 1  U4 | U1, U2       
#> 2   °--U1, U4 | U2   
#> 3       ¦--U1 | U2   
#> 4       ¦   °--U1, U2
#> 5       ¦       ¦--U1
#> 6       ¦       °--U2
#> 7       °--U4 | U2   
#> 8           °--U2, U4
#> 9               ¦--U2
#> 10              °--U4

You can remark that this is a sub-tree of the computation tree of the copula of \((U_3, U_4) \, | \, U_1, U_2\). This is coherent, because we needed the margin \(U_4 \, | \, U_1, U_2\) to estimate this copula. Nevertheless, note that

e$keychain[[list(margin = c("U3"), cond = c("U1", "U2"))]]
#> NULL

This is because the margin \(U_3 \, | \, U_1, U_2\) is actually the same as \(U_3 \, | \, U_1\) by conditional independence. This can be seen using this function:

remove_CondInd(DAG = DAG, node = "U3", cond_set = c("U1", "U2"))
#> [1] "U1"

Conditional marginal pseudo-observations

The conditional marginal pseudo-observations can be found in e$margin_hash. For example, the conditional pseudo-observations \(\hat U_{i, \, 3|1}\), \(i=1, \dots, n\), can be obtained by:

e$margin_hash[[ e$keychain[[list(margin = c("U3"), cond = c("U1"))]] ]]  |>
  head()
#> [1] 0.7016929 0.3503202 0.2715592 0.9891115 0.1999063 0.1960950

These conditional margins are internally computed by the function ComputeCondMargin.

Fitting the rest of the DAG