Boolean Nested Effects Models (B-NEM) are used to infer signalling pathways. In different experiments (conditions) genes of a pathway (S-genes) are stimulated or inhibited, alone or in combination. In each experiment transcriptional targets (E-genes) of the pathway react differently and are higher or lower expressed depending on the condition. From these differential expressions B-NEM infers Boolean functions presented as hyper-edges of a hyper-graph connecting parents and children in the pathway. For example, if the signal is transduced by two parents A and B to a child C and the signal can be blocked with a knock-down of either one, they are connected by a typical AND-gate. If the signal is still transduced during a single knock-down, but blocked by the double knock-down of A and B, they activate C by an OR-gate. In general the state of child C is defined by a Boolean function \[f: \left\{0,1\right\}^n \to \left\{0,1\right\},~C = f(A_1,\dots,A_n)\] with its parents \(A_i, i \in \left\{1,\dots,n\right\}\).
The B-NEM package is based on and uses many functions of the CellNOptR package of Terfve et al., 2012.
Install the package with the devtools library or via Bioconductor.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("bnem")
Load the Boolean Nested Effects model package.
library(bnem)
We show how to use B-NEM on a toy pathway presented by a directed acyclic (hyper-)graph (DAG). B-NEM demands several objects as input. The two main objects are the differential gene expression (data) and prior knowledge respectively the search space.
The following function creates a DAG, which is then extended to a full Boolean network. From this network we sample a subset of edges to define our ground truth. In the last step, the function generates data given the ground truth.
set.seed(9247)
sim <- simBoolGtn(Sgenes = 10, maxEdges = 10, keepsif = TRUE, negation=0.1,
layer=1)
fc <- addNoise(sim,sd=1)
expression <- sim$expression
CNOlist <- sim$CNOlist
model <- sim$model
bString <- sim$bString
ERS <- sim$ERS
PKN <- sim$PKN
children <- unique(gsub(".*=|!", "", PKN$reacID))
stimuli <- unique(gsub("=.*|!", "", PKN$reacID))
stimuli <- stimuli[which(!(stimuli %in% children))]
The following figure shows the PKN and the ground truth network.
library(mnem)
par(mfrow=c(1,2))
plotDnf(PKN$reacID, stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 10
## Number of Edges = 21
plotDnf(model$reacID[as.logical(sim$bString)], stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 12
## Number of Edges = 14
We suggest to take a look at the sif file in the working directory. In future analyses it is easier to just provide a suitable sif file for the investigated pathway. In a real application the underlying ground truth network (GTN) is not known.
B-NEM uses differential expression between experiments to infer the pathway logics. For example look at the colnames of sim$fc (=foldchanges of E-genes (rows)) and remember that S0g, S1g are our stimulated S-genes and the rest possibly inhibited. Thus in the first column of fc we have the contrast S2g \(-\) control. In the control no S-genes are perturbed.
We search for the GTN in our restricted network space. Each network is a sub-graph of the full hyper-graph sim$model$reacID. We initialize the search with a starting network and greedily search the neighborhood.
We use two different network scores. “s” is the rank correlation and “cosine” is the cosine similarity (hence no shift invariance).
## we start with the empty graph:
initBstring <- reduceGraph(rep(0, length(model$reacID)), model, CNOlist)
## or a fully connected graph:
## initBstring <- reduceGraph(rep(1, length(model$reacID)), model, CNOlist)
## paralellize for several threads on one machine or multiple machines
## see package "snowfall" for details
parallel <- NULL # NULL for serialization
## or distribute to 30 threads on four different machines:
## parallel <- list(c(4,16,8,2), c("machine1", "machine2", "machine3",
## "machine4"))
## greedy search:
greedy <- bnem(search = "greedy",
fc=fc,
expression=expression, # not used, if fc is defined
CNOlist=CNOlist,
model=model,
parallel=parallel,
initBstring=initBstring,
draw = FALSE, # TRUE: draw network at each step
verbose = FALSE, # TRUE: print changed (hyper-)edges and
## score improvement
maxSteps = Inf,
method = "s"
)
greedy2 <- bnem(search = "greedy",
fc=fc,
expression=expression,
CNOlist=CNOlist,
model=model,
parallel=parallel,
initBstring=initBstring,
draw = FALSE,
verbose = FALSE,
maxSteps = Inf,
method = "cosine"
)
We compare the binary strings defining the ground truth and the inferred networks. We first look at the respective scores (the smaller the better) and then compute sensitivity and specificity of the edges for both results.
greedy$scores # rank correlation = normalized
## [[1]]
## [1] -34.16531 -36.24934 -37.42401 -38.57367 -39.47269 -40.24216 -40.95579
## [8] -41.58563 -41.84892 -42.11197 -42.31057
greedy2$scores # scaled log foldchanges
## [[1]]
## [1] -34.25865 -36.41634 -37.70387 -38.93015 -39.91160 -40.66781 -41.38822
## [8] -41.90228 -42.15694 -42.43007 -42.63515
accuracy <- function(gtn,inferred) {
sens <- sum(gtn == 1 & inferred == 1)/
(sum(gtn == 1 & inferred == 1) + sum(gtn == 1 & inferred == 0))
spec <- sum(gtn == 0 & inferred == 0)/
(sum(gtn == 0 & inferred == 0) + sum(gtn == 0 & inferred == 1))
return(c(sens,spec))
}
accuracy(bString,greedy$bString)
## [1] 0.900000 0.974359
accuracy(bString,greedy2$bString)
## [1] 0.900000 0.974359
resString <- greedy2$bString
We take a look at the efficiency of the search algorithm with sensitivity and specificity of the hyper-edges for the optimized network and the accuracy of its ERS (similar to the truth table). Since several networks produce the same ERS, the learned hyper-graph can differ from the GTN and still be \(100\%\) accurate.
par(mfrow=c(1,2))
## GTN:
plotDnf(model$reacID[as.logical(bString)], main = "GTN", stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 12
## Number of Edges = 14
## greedy optimum:
plotDnf(model$reacID[as.logical(resString)], main = "greedy optimum",
stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 12
## Number of Edges = 14
## accuracy of the expected response scheme (can be high even, if
## the networks differ):
ERS.res <- computeFc(CNOlist,
t(simulateStatesRecursive(CNOlist, model, resString)))
ERS.res <- ERS.res[, which(colnames(ERS.res) %in% colnames(ERS))]
print(sum(ERS.res == ERS)/length(ERS))
## [1] 0.9744186
After optimization we look at the data and how well the greedy optimum explains the E-genes. The lower the score the better the fit.
fitinfo <- validateGraph(CNOlist, fc=fc, expression=expression, model = model,
bString = resString,
Sgene = 4, Egenes = 1000, cexRow = 0.8,
cexCol = 0.7,
xrot = 45,
Colv = TRUE, Rowv = TRUE, dendrogram = "both",
bordercol = "lightgrey",
aspect = "iso", sub = "")
The bottom row shows the ERS of S-gene S0g and the other rows show the observed response scheme (ORS) of the S0g-regulated E-genes. Alternatively to the greedy neighborhood search a genetic algorithm and exhaustive search are available. The exhaustive search is not recommended for search spaces with more than 20 hyper-edges.
## genetic algorithm:
genetic <- bnem(search = "genetic",
fc=fc,
expression=expression,
parallel = parallel,
CNOlist=CNOlist,
model=model,
initBstring=initBstring,
popSize = 10,
stallGenMax = 10,
draw = FALSE,
verbose = FALSE
)
## exhaustive search:
exhaustive <- bnem(search = "exhaustive",
parallel = parallel,
CNOlist=CNOlist,
fc=fc,
expression=expression,
model=model
)
In this section we show how to use B-NEM, if stimuli and inhibitors overlap. Additionally we want to show that B-NEM can resolve cycles. For this we allow the PKN to have cycles, but no repression, because repression can lead to an unresolvable ERS. See Pirkl et al., 2016 for details.
We look for stimuli which are also inhibited. For those we add additional stimuli S-genes. The stimuli S-gene (parent) and the inhibitor S-gene (child) are connected by a positive edge. Hence, if the S-gene is activated, the stimuli is set to 1. If the S-gene is inhibited the inhibitor S-gene is set to 0.
set.seed(9247)
sim <- simBoolGtn(Sgenes = 4, maxEdges = 50, dag = FALSE,
negation = 0, allstim = TRUE,frac=0.1)
fc <- addNoise(sim,sd=1)
expression <- sim$expression
CNOlist <- sim$CNOlist
model <- sim$model
bString <- sim$bString
ERS <- sim$ERS
PKN <- sim$PKN
children <- unique(gsub(".*=|!", "", PKN$reacID))
stimuli <- unique(gsub("=.*|!", "", PKN$reacID))
stimuli <- stimuli[which(!(stimuli %in% children))]
The next figure shows the cyclic PKN with extra stimuli S-genes. Notice, this way the inhibition of the S-genes overrules the stimulation.
par(mfrow=c(1,2))
plotDnf(sim$PKN$reacID, stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 8
## Number of Edges = 16
plotDnf(sim$model$reacID[as.logical(bString)], stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 9
## Number of Edges = 9
greedy3 <- bnem(search = "greedy",
CNOlist=CNOlist,
fc=fc,
expression=expression,
model=model,
parallel=parallel,
initBstring=bString*0,
draw = FALSE,
verbose = FALSE,
maxSteps = Inf,
method = "cosine"
)
resString2 <- greedy3$bString
We look at the accuracy again.
par(mfrow=c(1,2))
plotDnf(model$reacID[as.logical(bString)], main = "GTN", stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 9
## Number of Edges = 9
plotDnf(model$reacID[as.logical(resString2)], main = "greedy optimum",
stimuli = stimuli)
## A graphNEL graph with directed edges
## Number of Nodes = 13
## Number of Edges = 20
accuracy(bString,resString2)
## [1] 0.5714286 0.8181818
ERS.res <- computeFc(CNOlist, t(simulateStatesRecursive(CNOlist, model,
resString2)))
ERS.res <- ERS.res[, which(colnames(ERS.res) %in% colnames(ERS))]
print(sum(ERS.res == ERS)/length(ERS))
## [1] 0.9759615
One additional challenge for B-NEM compared to methods like CellNetOptimizer is the fact, that B-NEM optimizes the signalling pathway and simultaneously the attachment of the E-genes. However, it is possible to include prior knowledge into the search.
We just have to create a list object, which holds prior information about the E-genes.
egenes <- list()
for (i in PKN$namesSpecies) {
egenes[[i]] <- rownames(fc)[grep(i, rownames(fc))]
}
initBstring <- reduceGraph(rep(0, length(model$reacID)), model, CNOlist)
greedy2b <- bnem(search = "greedy",
CNOlist=CNOlist,
fc=fc,
expression=expression,
egenes=egenes,
model=model,
parallel=parallel,
initBstring=initBstring,
draw = FALSE,
verbose = FALSE,
maxSteps = Inf,
method = "cosine"
)
resString3 <- greedy2b$bString
We attach every E-gene to its real parent in the for loop. If an E-gene is only included once in the egenes object, it’s position is not learned, but fixed during the optimization of the signalling pathway. Alternatively, we can include one E-gene several times for just a subset of S-genes. This way S-genes, which do not have the E-genes included in their E-gene set are excluded as potential parents.
accuracy(bString,resString3)
## [1] 1.000000 0.969697
ERS.res <- computeFc(CNOlist,
t(simulateStatesRecursive(CNOlist, model, resString3)))
ERS.res <- ERS.res[, which(colnames(ERS.res) %in% colnames(ERS))]
print(sum(ERS.res == ERS)/length(ERS))
## [1] 0.9983974
We can also quantify how well the attached E-genes fit to the learned network. See the Pirkl, 2016 for more details.
residuals <- findResiduals(resString3, CNOlist, model, fc, verbose = FALSE)
## verbose = TRUE plots the residuals matrices
Rows denote S-genes in the network. Columns denote Contrasts between two experiments. Green colors in the left matrix show the score improves, if no (0) or a negative (-1) response in the network’s ERS is changed to positive (+1). Red colors show a zero changed to positive. The right matrix shows the same for switched +1 and -1.
In this section we analyze the B-Cell receptor (BCR) signalling data. The data set consists of one stimulated S-gene (BCR), three S-genes with available single inhibitions (Tak1, Pik3, Erk) and three S-genes with up to triple inhibitions. See Pirkl et al. 2016 for more details.
data(bcr)
## head(fc)
fc <- bcr$fc
expression <- bcr$expression
We build a PKN to incorporate biological knowledge and account for missing combinatorial inhibitions.
library(CellNOptR)
negation <- FALSE # what happens if we allow negation?
sifMatrix <- numeric()
for (i in "BCR") {
sifMatrix <- rbind(sifMatrix, c(i, 1, c("Pi3k")))
sifMatrix <- rbind(sifMatrix, c(i, 1, c("Tak1")))
if (negation) {
sifMatrix <- rbind(sifMatrix, c(i, -1, c("Pi3k")))
sifMatrix <- rbind(sifMatrix, c(i, -1, c("Tak1")))
}
}
for (i in c("Pi3k", "Tak1")) {
for (j in c("Ikk2", "p38", "Jnk", "Erk", "Tak1", "Pi3k")) {
if (i %in% j) { next() }
sifMatrix <- rbind(sifMatrix, c(i, 1, j))
if (negation) {
sifMatrix <- rbind(sifMatrix, c(i, -1, j))
}
}
}
for (i in c("Ikk2", "p38", "Jnk")) {
for (j in c("Ikk2", "p38", "Jnk")) {
if (i %in% j) { next() }
sifMatrix <- rbind(sifMatrix, c(i, 1, j))
if (negation) {
sifMatrix <- rbind(sifMatrix, c(i, -1, j))
}
}
}
file <- tempfile(pattern="interaction",fileext=".sif")
write.table(sifMatrix, file = file, sep = "\t",
row.names = FALSE, col.names = FALSE, quote = FALSE)
PKN <- readSIF(file)
In the next step, we create the meta information. This ensures, that we simulate all the conditions, which are actually available in the data. Furthermore we build our Boolean search space based on the PKN.
CNOlist <- dummyCNOlist(stimuli = "BCR",
inhibitors = c("Tak1", "Pi3k", "Ikk2",
"Jnk", "p38", "Erk"),
maxStim = 1, maxInhibit = 3)
model <- preprocessing(CNOlist, PKN)
## [1] "The following species are measured: BCR, Erk, Ikk2, Jnk, Pi3k, Tak1, p38"
## [1] "The following species are stimulated: BCR"
## [1] "The following species are inhibited: Erk, Ikk2, Jnk, Pi3k, Tak1, p38"
## [1] "The following species are not observable and/or not controllable: "
In the final step we learn the network with the greedy search.
initBstring <- rep(0, length(model$reacID))
bcrRes <- bnem(search = "greedy",
fc=fc,
CNOlist=CNOlist,
model=model,
parallel=parallel,
initBstring=initBstring,
draw = FALSE,
verbose = FALSE,
method = "cosine"
)
par(mfrow=c(1,3))
plotDnf(PKN$reacID, main = "PKN", stimuli = "BCR")
## A graphNEL graph with directed edges
## Number of Nodes = 7
## Number of Edges = 18
plotDnf(bcrRes$graph, main = "greedy optimum (empty start)",
stimuli = "BCR")
## A graphNEL graph with directed edges
## Number of Nodes = 8
## Number of Edges = 9
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] CellNOptR_1.38.0 ggplot2_3.3.3 XML_3.99-0.6
## [4] Rgraphviz_2.36.0 RCurl_1.98-1.3 hash_2.2.6.1
## [7] RBGL_1.68.0 graph_1.70.0 BiocGenerics_0.38.0
## [10] mnem_1.8.0 bnem_1.0.0 BiocStyle_2.20.0
##
## loaded via a namespace (and not attached):
## [1] RcppEigen_0.3.3.9.1 igraph_1.2.6 splines_4.1.0
## [4] BiocParallel_1.26.0 GenomeInfoDb_1.28.0 fastICA_1.2-2
## [7] sva_3.40.0 amap_0.8-18 digest_0.6.27
## [10] htmltools_0.5.1.1 magick_2.7.2 pcalg_2.7-2
## [13] wesanderson_0.3.6 gdata_2.18.0 fansi_0.4.2
## [16] magrittr_2.0.1 memoise_2.0.0 cluster_2.1.2
## [19] sfsmisc_1.1-11 limma_3.48.0 Biostrings_2.60.0
## [22] fastcluster_1.1.25 annotate_1.70.0 matrixStats_0.58.0
## [25] gmodels_2.18.1 epiNEM_1.16.0 bdsmatrix_1.3-4
## [28] jpeg_0.1-8.1 colorspace_2.0-1 blob_1.2.1
## [31] xfun_0.23 dplyr_1.0.6 crayon_1.4.1
## [34] jsonlite_1.7.2 RcppArmadillo_0.10.4.0.0 genefilter_1.74.0
## [37] survival_3.2-11 zoo_1.8-9 glue_1.4.2
## [40] gtable_0.3.0 XVector_0.32.0 zlibbioc_1.38.0
## [43] kernlab_0.9-29 prabclus_2.3-2 DEoptimR_1.0-8
## [46] ggm_2.5 apcluster_1.4.8 abind_1.4-5
## [49] scales_1.1.1 vsn_3.60.0 edgeR_3.34.0
## [52] DBI_1.1.1 Rcpp_1.0.6 xtable_1.8-4
## [55] clue_0.3-59 bit_4.0.4 latex2exp_0.5.0
## [58] proxy_0.4-25 mclust_5.4.7 preprocessCore_1.54.0
## [61] stats4_4.1.0 tsne_0.1-3 httr_1.4.2
## [64] RColorBrewer_1.1-2 fpc_2.2-9 modeltools_0.2-23
## [67] ellipsis_0.3.2 pkgconfig_2.0.3 flexmix_2.3-17
## [70] nnet_7.3-16 sass_0.4.0 binom_1.1-1
## [73] locfit_1.5-9.4 utf8_1.2.1 tidyselect_1.1.1
## [76] rlang_0.4.11 AnnotationDbi_1.54.0 munsell_0.5.0
## [79] BoolNet_2.1.5 tools_4.1.0 cachem_1.0.5
## [82] generics_0.1.0 RSQLite_2.2.7 evaluate_0.14
## [85] stringr_1.4.0 fastmap_1.1.0 ggdendro_0.1.22
## [88] yaml_2.2.1 knitr_1.33 bit64_4.0.5
## [91] Linnorm_2.16.0 robustbase_0.93-7 purrr_0.3.4
## [94] KEGGREST_1.32.0 nlme_3.1-152 flexclust_1.4-0
## [97] compiler_4.1.0 png_0.1-7 e1071_1.7-6
## [100] affyio_1.62.0 minet_3.50.0 tibble_3.1.2
## [103] statmod_1.4.36 bslib_0.2.5.1 stringi_1.6.2
## [106] highr_0.9 naturalsort_0.1.3 lattice_0.20-44
## [109] Matrix_1.3-3 vegan_2.5-7 permute_0.9-5
## [112] vctrs_0.3.8 pillar_1.6.1 lifecycle_1.0.0
## [115] BiocManager_1.30.15 jquerylib_0.1.4 data.table_1.14.0
## [118] bitops_1.0-7 corpcor_1.6.9 R6_2.5.0
## [121] latticeExtra_0.6-29 affy_1.70.0 bookdown_0.22
## [124] IRanges_2.26.0 MASS_7.3-54 gtools_3.8.2
## [127] assertthat_0.2.1 withr_2.4.2 GenomeInfoDbData_1.2.6
## [130] S4Vectors_0.30.0 diptest_0.76-0 mgcv_1.8-35
## [133] class_7.3-19 rmarkdown_2.8 Rtsne_0.15
## [136] Biobase_2.52.0 snowfall_1.84-6.1 ellipse_0.4.2
References:
Pirkl, Martin, Hand, Elisabeth, Kube, Dieter, & Spang, Rainer. 2016. Analyzing synergistic and non-synergistic interactions in signalling pathways using Boolean Nested Effect Models. , 32(6), 893–900.
Pirkl, Martin. 2016. Indirect inference of synergistic and alternative signalling of intracellular pathways. University of Regensburg.
Saez-Rodriguez, Julio, Alexopoulos, Leonidas G, Epperlein, Jonathan, Samaga, Regina, Lauffenburger, Douglas A, Klamt, Steffen, & Sorger, Peter K. 2009. Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol Syst Biol, 5, 331.\
C Terfve, T Cokelaer, A MacNamara, D Henriques, E Goncalves, MK Morris, M van Iersel, DA Lauffenburger, J Saez-Rodriguez. CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Systems Biology, 2012, 6:133.