Estimation

Copula of U1 and U2

We prepare the environment for storing the results.

e = default_envir()

We start by fitting the copula of $(U_1, U_2)$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U2",
             cond_set = c(), familyset = 1, order_hash = order_hash, e = e,
             method = "mle")
#> Estimating the copula of  U1  and  U2
#> Bivariate copula: Gaussian (par = 0.27, tau = 0.17)

The fitted copula is stored in e$copula_hash. To access it, we first look up its key in e$keychain and then use this key to index e$copula_hash:

copula_key = e$keychain[[list(margins = c("U1", "U2"), cond = character(0))]]

e$copula_hash[[copula_key]]
#> Bivariate copula: Gaussian (par = 0.27, tau = 0.17)

The lookup goes through a level of indirection, because copula_key is not a simple label: it stores the whole computation tree of the copula. As a result, if the statistician decides to use a different PCBN for the same structure, every estimated part whose computation path is unchanged can be reused.

Let's see what the copula_key actually is.

print(data.tree::FromListSimple(copula_key))
#>   levelName
#> 1    U1, U2
#> 2     ¦--U1
#> 3     °--U2
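The two-level lookup described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the environments keychain and copula_hash below are hypothetical stand-ins for e$keychain and e$copula_hash, the helper as_key is invented for the illustration, and the "fit" is just a placeholder string.

```r
# Hypothetical stand-ins for e$keychain and e$copula_hash (illustration only).
keychain <- new.env()
copula_hash <- new.env()

# Turn an arbitrary R object into a string usable as an environment key.
as_key <- function(x) paste(deparse(x), collapse = "")

# Computation tree of the copula of (U1, U2): a root with two leaves.
tree_U1_U2 <- list(name = "U1, U2", list(name = "U1"), list(name = "U2"))

# Level 1: (margins, cond) -> computation tree.
spec <- list(margins = c("U1", "U2"), cond = character(0))
assign(as_key(spec), tree_U1_U2, envir = keychain)

# Level 2: computation tree -> fitted object (a placeholder string here).
assign(as_key(tree_U1_U2), "Gaussian (par = 0.27)", envir = copula_hash)

# A lookup goes through both levels.
tree <- get(as_key(spec), envir = keychain)
fit <- get(as_key(tree), envir = copula_hash)
print(fit)
```

Because the second level is indexed by the tree rather than by the pair of names, two models that reach the same copula through the same computation path would share one stored fit in this sketch.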

Copulas related to U3 and U4

In the same way as before, we fit the copula of $(U_1, U_3)$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U3",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U1  and  U3
#> Bivariate copula: Gaussian (par = 0.22, tau = 0.14)
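The two numbers in each printed fit are linked: for a Gaussian copula with correlation parameter $\rho$, Kendall's tau is $\tau = \frac{2}{\pi} \arcsin(\rho)$. The short check below applies this relation to the rounded parameters printed above. The function name gaussian_tau is introduced here for illustration only, and since the printed parameters are rounded, the result is only expected to agree with the printed tau to two decimals.

```r
# Kendall's tau of a Gaussian copula with correlation parameter rho.
gaussian_tau <- function(rho) 2 / pi * asin(rho)

# Rounded parameters as printed for the copulas of (U1, U2) and (U1, U3).
par_hat <- c(U1_U2 = 0.27, U1_U3 = 0.22)

tau_hat <- round(gaussian_tau(par_hat), 2)
print(tau_hat)
```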

Remember that the ordered parents of $U_4$ are $U_2$, $U_1$ and $U_3$. Therefore, we first fit the copula of $(U_2, U_4)$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U2", w = "U4",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U2  and  U4
#> Bivariate copula: Gaussian (par = 0.18, tau = 0.12)

Then, we fit the copula of $(U_1, U_4) \, | \, U_2$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U4",
             cond_set = c("U2"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U1  given  U2 
#> Estimating the cond cdf of  U4  given  U2 
#> Estimating the copula of  U1  and  U4  given  U2
#> Bivariate copula: Gaussian (par = 0.32, tau = 0.21)
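The log above shows that two conditional cdfs are estimated before the conditional copula is fitted. For a Gaussian pair copula, such a conditional cdf (often called an h-function) has the closed form $h(u \mid v; \rho) = \Phi\big( (\Phi^{-1}(u) - \rho \, \Phi^{-1}(v)) / \sqrt{1 - \rho^2} \big)$. The sketch below applies it to a few toy pseudo-observations. The function name h_gauss and the toy values are assumptions made for the illustration, the parameters are the rounded ones printed earlier, and this is not claimed to be the code path used by BiCopCondFit.

```r
# Conditional cdf of U given V = v under a Gaussian copula with parameter rho.
h_gauss <- function(u, v, rho) {
  pnorm((qnorm(u) - rho * qnorm(v)) / sqrt(1 - rho^2))
}

# Toy pseudo-observations (made up for the illustration).
u1 <- c(0.20, 0.50, 0.90)
u2 <- c(0.35, 0.50, 0.60)
u4 <- c(0.10, 0.50, 0.75)

# Conditional pseudo-observations given U2, using the rounded fitted
# parameters of the copulas of (U1, U2) and (U2, U4).
u1_given_2 <- h_gauss(u1, u2, rho = 0.27)
u4_given_2 <- h_gauss(u4, u2, rho = 0.18)

# These two vectors would be the inputs for fitting the copula of (U1, U4) | U2.
print(cbind(u1_given_2, u4_given_2))
```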

Finally, we fit the copula of $(U_3, U_4) \, | \, U_1, U_2$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U3", w = "U4",
             cond_set = c("U1", "U2"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U3  given  U1 
#> Estimating the cond cdf of  U4  given  U1 U2 
#> Estimating the copula of  U3  and  U4  given  U1 U2
#> Bivariate copula: Gaussian (par = 0.25, tau = 0.16)

Corresponding computation trees of the copulas

The computation trees of these copulas can be found as before.

e$keychain[[list(margins = c("U2", "U4"), cond = character(0))]] |>
  data.tree::FromListSimple() |>
  print()
#>   levelName
#> 1    U2, U4
#> 2     ¦--U2
#> 3     °--U4

e$keychain[[list(margins = c("U1", "U4"), cond = c("U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>        levelName
#> 1 U1, U4 | U2   
#> 2  ¦--U1 | U2   
#> 3  ¦   °--U1, U2
#> 4  ¦       ¦--U1
#> 5  ¦       °--U2
#> 6  °--U4 | U2   
#> 7      °--U2, U4
#> 8          ¦--U2
#> 9          °--U4

e$keychain[[list(margins = c("U3", "U4"), cond = c("U1", "U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>                 levelName
#> 1  U3, U4 | U1, U2       
#> 2   ¦--U3 | U1           
#> 3   ¦   °--U1, U3        
#> 4   ¦       ¦--U1        
#> 5   ¦       °--U3        
#> 6   °--U4 | U1, U2       
#> 7       °--U1, U4 | U2   
#> 8           ¦--U1 | U2   
#> 9           ¦   °--U1, U2
#> 10          ¦       ¦--U1
#> 11          ¦       °--U2
#> 12          °--U4 | U2   
#> 13              °--U2, U4
#> 14                  ¦--U2
#> 15                  °--U4
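To make the nesting of these trees concrete, the sketch below rebuilds the last tree by hand as a nested list and walks it recursively. The representation (a label plus a list of children) and the helpers node and collect_labels are hypothetical and chosen for readability; the keys actually stored in e$keychain may be structured differently.

```r
# Hypothetical helper: a node is a label plus a list of child nodes.
node <- function(label, ...) list(label = label, children = list(...))

tree_U1_U4_given_U2 <- node(
  "U1, U4 | U2",
  node("U1 | U2", node("U1, U2", node("U1"), node("U2"))),
  node("U4 | U2", node("U2, U4", node("U2"), node("U4")))
)

tree_U3_U4_given_U1_U2 <- node(
  "U3, U4 | U1, U2",
  node("U3 | U1", node("U1, U3", node("U1"), node("U3"))),
  node("U4 | U1, U2", tree_U1_U4_given_U2)
)

# Depth-first collection of all labels in a tree.
collect_labels <- function(tree) {
  c(tree$label, unlist(lapply(tree$children, collect_labels)))
}

all_labels <- collect_labels(tree_U3_U4_given_U1_U2)

# Nodes whose label starts with a pair "Ux, Uy" are pair copulas.
pair_copulas <- all_labels[grepl("^U[0-9]+, U[0-9]+", all_labels)]
print(pair_copulas)
```

In this sketch the tree has 15 nodes, matching the printed output, and 5 of them are pair copulas: exactly the five copulas fitted so far.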

Corresponding computation trees of the margins

In the same way, we obtain the computation tree of the margin $U_4 \, | \, U_1, U_2$ by:

e$keychain[[list(margin = c("U4"), cond = c("U1", "U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>             levelName
#> 1  U4 | U1, U2       
#> 2   °--U1, U4 | U2   
#> 3       ¦--U1 | U2   
#> 4       ¦   °--U1, U2
#> 5       ¦       ¦--U1
#> 6       ¦       °--U2
#> 7       °--U4 | U2   
#> 8           °--U2, U4
#> 9               ¦--U2
#> 10              °--U4

Note that this is a sub-tree of the computation tree of the copula of $(U_3, U_4) \, | \, U_1, U_2$. This is consistent, because the margin $U_4 \, | \, U_1, U_2$ was needed to estimate this copula. In contrast, the margin $U_3 \, | \, U_1, U_2$ has no entry in the keychain:

e$keychain[[list(margin = c("U3"), cond = c("U1", "U2"))]]
#> NULL

This is because the margin $U_3 \, | \, U_1, U_2$ is actually the same as $U_3 \, | \, U_1$ by conditional independence. This can be checked with the function remove_CondInd, which here returns the reduced conditioning set:

remove_CondInd(DAG = DAG, node = "U3", cond_set = c("U1", "U2"))
#> [1] "U1"
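In formulas, the reduction above amounts to the following statement. The notation $F_{3|1,2}$ for the conditional cdf of $U_3$ given $(U_1, U_2)$ is introduced here for illustration and is not taken from the package.

$$
U_3 \perp U_2 \mid U_1
\quad \Longrightarrow \quad
F_{3|1,2}(u_3 \mid u_1, u_2) = F_{3|1}(u_3 \mid u_1)
\quad \text{for all } u_1, u_2, u_3 \in [0, 1].
$$

This is why only the key for $U_3 \, | \, U_1$ exists: a separate entry for $U_3 \, | \, U_1, U_2$ would store the same object twice.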

Conditional marginal pseudo-observations

The conditional marginal pseudo-observations can be found in e$margin_hash. For example, the conditional pseudo-observations $\hat U_{i, \, 3|1}$, $i=1, \dots, n$, can be obtained by:

e$margin_hash[[ e$keychain[[list(margin = c("U3"), cond = c("U1"))]] ]]  |>
  head()
#> [1] 0.7016929 0.3503202 0.2715592 0.9891115 0.1999063 0.1960950

These conditional margins are internally computed by the function ComputeCondMargin.
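As a reading aid, the computation tree of $U_4 \, | \, U_1, U_2$ shown earlier corresponds to the usual pair-copula recursion for conditional cdfs. The notation below ($u_{j|D}$ for a conditional pseudo-observation, $C_{1,4;2}$ for the copula of $(U_1, U_4) \, | \, U_2$) is an assumption made for this sketch, and the formula relies on the simplifying assumption that $C_{1,4;2}$ does not depend on the value of $U_2$. Whether ComputeCondMargin evaluates exactly this expression is not stated in this section; the formula only mirrors the structure of the tree.

$$
u_{4|1,2}
= \left. \frac{\partial C_{1,4;2}(a, b)}{\partial a} \right|_{a = u_{1|2}, \; b = u_{4|2}},
\qquad
u_{1|2} = \left. \frac{\partial C_{1,2}(a, b)}{\partial b} \right|_{a = u_1, \; b = u_2},
\qquad
u_{4|2} = \left. \frac{\partial C_{2,4}(a, b)}{\partial a} \right|_{a = u_2, \; b = u_4}.
$$

Each partial derivative is taken with respect to the argument of the conditioning variable, so the margin $U_4 \, | \, U_1, U_2$ requires the three copulas of $(U_1, U_2)$, $(U_2, U_4)$ and $(U_1, U_4) \, | \, U_2$, which are precisely the pair-copula nodes of its computation tree.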

Fitting the rest of the DAG

Copulas related to the node U6

BiCopCondFit(data = mydata, DAG = DAG, v = "U4", w = "U6",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U4  and  U6
#> Bivariate copula: Gaussian (par = 0.23, tau = 0.15)

BiCopCondFit(data = mydata, DAG = DAG, v = "U5", w = "U6",
             cond_set = c("U4"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U6  given  U4 
#> Estimating the copula of  U5  and  U6  given  U4
#> Bivariate copula: Gaussian (par = 0.43, tau = 0.28)

Copulas related to the node U7

BiCopCondFit(data = mydata, DAG = DAG, v = "U4", w = "U7",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U4  and  U7
#> Bivariate copula: Gaussian (par = 0.3, tau = 0.19)

BiCopCondFit(data = mydata, DAG = DAG, v = "U6", w = "U7",
             cond_set = c("U4"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U6  given  U4 
#> Estimating the cond cdf of  U7  given  U4 
#> Estimating the copula of  U6  and  U7  given  U4
#> Bivariate copula: Gaussian (par = 0.34, tau = 0.22)