%\VignetteIndexEntry{Training Signalling Pathway Maps to Biochemical Data with Logic-Based Ordinary Differential Equations} %\VignetteKeywords{Training Signalling Pathway Maps to % Biochemical Data with Logic-Based Ordinary Differential Equations %} %\VignettePackage{CNORode} \documentclass{article} \usepackage{Sweave} \usepackage{url, color} %\usepackage{cmap} \usepackage{authblk} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} \usepackage{hyperref} \hypersetup{ colorlinks, linkcolor=blue } \RequirePackage[T1]{fontenc} \RequirePackage{graphicx,ae,fancyvrb,color} \IfFileExists{upquote.sty}{\RequirePackage{upquote}}{} \setkeys{Gin}{width=0.8\textwidth} \definecolor{gris90}{gray}{0.90} \definecolor{gris10}{gray}{0.1} \definecolor{green}{rgb}{0.6, 0.9,0.6} \DefineVerbatimEnvironment{Sinput}{Verbatim}{% fontshape=sl, frame=single, xleftmargin=2em, fillcolor=\color{gris90}, % numbers=left % prevent copy/paste entire code } \DefineVerbatimEnvironment{Soutput}{Verbatim}{ fontshape=sl, frame=single, xleftmargin=2em, fillcolor=\color{green} } \DefineVerbatimEnvironment{shell}{Verbatim}{formatcom=\color{blue}} \title{Training Signalling Pathway Maps to Biochemical Data with Logic-Based Ordinary Differential Equations } \author[1,2]{David Henriques} \author[1]{Thomas Cokelaer} \author[3]{Attila Gabor} \author[3,4]{Enio Gjerga \thanks{@enio.gjerga@gmail.com}} \affil[1]{European Bioinformatics Institute, Saez-Rodriguez group, Cambridge, United Kingdom} \affil[2]{Instituto de Investigaciones Marinas-CSIC, Vigo, Spain} \affil[3]{Heidelberg University, Faculty of Medicine, Institute for Computational Biomedicine, Bioquant} \affil[4]{RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE)} \begin{document} \SweaveOpts{concordance=TRUE} \maketitle \tableofcontents % gg <>= options(width=70, useFancyQuotes="UTF-8", prompt=" ", continue=" ") @ \section{Introduction} Mathematical models are used to understand protein signalling networks so as to provide an integrative view of pharmacological and toxicological processes at molecular level. \emph{CellNOptR}~\cite{CellNOptR} is an existing package (see \url{http://bioconductor.org/packages/release/bioc/html/CellNOptR.html}) that provides functionalities to combine prior knowledge network (about protein signalling networks) and perturbation data to infer functional characteristics (of the signalling network). While \emph{CellNOptR} has demonstrated its ability to infer new functional characteristics, it is based on a boolean formalism where protein species are characterised as being fully active or inactive. In contrast, logic-based ordinary differential equations allow a quantitive description of a given Boolean model. The method used here was first published by Wittmann et al. \cite{wittman} by the name of \emph{odefy}. For a detailed description of the methodology the user is adressed to \cite{wittman} and for a published application example to \cite{wittmanJS}. This package implements the Odefy method and focus mainly extending the \emph{CellNOptR} capabilities in order to simulate and calibrate logic-based ordinary differential equation model. We provide direct and easy to use interface to optimization methods available in R such as \emph{eSSR} \cite{eSSR1} (enhanced Scatter Search Metaheuristic for R) and an R genetic algorithm implementation by the name of \emph{genalg} in order to perform parameter estimation. Additionally we were specially careful in tackling the main computanional bottlenecks by implementing CNORode simulation engine in the C language using the \emph{CVODES} library \cite{cvodes}. This brief tutorial shows how to use \emph{CNORode} using as a starting point a Boolean model and a dataset consisting in a time-series of several proteins. \section{Installation} \emph{CNORode} depends on \emph{CellNOptR} and \emph{genalg}, which are 2 bioconductor packages. Therefore, in order to install \emph{CNORode}, open a R session and type: <>= if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("CNORode") @ It may take a few minutes to install all dependencies if you start from scratch (i.e, none of the R packages are installed on your system). Note also that under Linux system, some of these packages necessitate the R-devel package to be installed (e.g., under Fedora type \emph{sudo yum install R-devel}). Additionally, for parameter estimation we recommend the use of \emph{eSSR}. This algorithm is part of the MEIGOR toolbox which is available on BioConductor and it can be downloaded from \url{https://www.bioconductor.org/packages/release/bioc/html/MEIGOR.html}. MEIGOR can be installed by typing <>= if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("MEIGOR") @ Finally, once \emph{CNORode} is installed you can load it by typing: <>= library(CNORode) @ \label{sec:quickstart} \section{Quick Start} In this section, we provide a quick example on how to use \emph{CNORode} to find the set of continuous parameters which minimize the squared difference between a model simulation and the experimental data. Since here we will not be modifying the model structure as opposed to \emph{CellNOptR} we will use a model that already contains AND type gates. Such model can be for instance the result of calibrating a \textit{prior knowledge network}(PKN) with \emph{CellNOptR}. Please note that a PKN can also be used as Boolean model which will contain only OR type gates. Detailed information about the model used here (ToyModelMMB\_FeedbackAnd) and additional models can be found at: \url{https://saezlab.github.io/CellNOptR/5_Models%20and%20Documentation/} \begin{center} \begin{figure}[ht] \includegraphics[height=7cm, width=7cm]{ToyModelMMB_FeedbackAnd} \includegraphics[height=7cm, width=7cm]{data_ToyModelMMB_FeddbackAnd} \caption{The used model(left panel). A plot from the data, resulting from the \emph{plotCNOlist} function (right panel). \label{fig:}} \end{figure} \end{center} The example used here is shipped with the CNORode. In order to load the data and model you should type the following commands: % show data and model loading <>= library(CNORode) model=readSIF(system.file("extdata", "ToyModelMMB_FeedbackAnd.sif", package="CNORode",lib.loc = .libPaths())); cno_data=readMIDAS(system.file("extdata", "ToyModelMMB_FeedbackAnd.csv", package="CNORode",lib.loc = .libPaths())); cnolist=makeCNOlist(cno_data,subfield=FALSE); @ The structure from the CNOlist and the Model object is exactly the same as used in the \emph{CellNOptR} and therefore for a detailed explanation about these structure we direct the reader to the \emph{CellNOptR} manual. In order to simulate the model and perform parameter estimation we first need to create a list with the ODE parameters associated with each dynamic state as described in \cite{wittman}. Each dynamic state will have a \(\tau\) parameter, as many \(n\) and \(k\) parameters as inputs. Although the default is to use the normalized Hill function it also possible to use the standard Hill or even not to use any transfer function at all. To illustrate the shape of the equations associated to each dynamic state and the meaning of each parameter, let us show the differential of \textit{Mek}: \[ \dot{Mek}=\Bigg[ \Bigg( 1 - \frac{Akt^{n_1}/({k_1}^{n_1} + Akt^{n_1})}{1/({k_1}^{n_1} + 1)}\Bigg) \cdot \Bigg(\frac{Raf^{n_2}/({k_2}^{n_2} + Raf^{n_2})}{1/({k_2}^{n_2} + 1)}\Bigg) - Mek \Bigg] \cdotp \tau_{Mek} \] To create a list of ODE parameters we will typically use the \emph{createLBodeContPars} function: <<>>= ode_parameters=createLBodeContPars(model, LB_n = 1, LB_k = 0.1, LB_tau = 0.01, UB_n = 5, UB_k = 0.9, UB_tau = 10, default_n = 3, default_k = 0.5, default_tau = 1, opt_n = TRUE, opt_k = TRUE, opt_tau = TRUE, random = FALSE) @ This function creates a general structure where the ODE parameters are ordered according to the model. Some tweaks have been added in order to ease tasks we have found to be common, nevertheless you can edit several attributes manually. If you print the \emph{ode\_parameters} list you will see the following attributes. \clearpage <<>>= print(ode_parameters) @ \clearpage Typically before running an optimization run you will want to choose which type of parameters you want to optimize. The field \emph{index\_opt\_pars} defines which parameters are meant to be optimized. In the \emph{createLBodeContPars}, if you choose \(opt\_tau \) as \(TRUE\) all \( \tau \) parameters will be added to the index\_opt\_pars array, the same idea is valid for \( n \) and \( k \) parameters. It is also possible to choose default values for lower and upper bounds for the parameters of a given type, e.g. \( \tau \) ( \emph{LB\_tau} and \emph{UB\_tau}), as well as a default initial value for such parameters. Once we have the ODE parameters structure we are ready to run a simulation or optimization process. To run a simulation we can use the \emph{getLBodeModel} or \emph{getLBodeDataSim}, depending on if we want to simulate only the signals present in the CNOlist object or all the species in the model. Additionally \emph{plotLBodeDataSim} or \emph{plotLBodeModelSim} will also return the values of a model simulation while plotting the same values. In figure \ref{fig:plotModelSimFig}, we use plotLBodeModelSim to plot all the experiments sampled in 5 different instants (\emph{timeSignals}). \begin{figure}[ht] \begin{center} <>= modelSim=plotLBodeModelSim(cnolist = cnolist, model, ode_parameters, timeSignals=seq(0,2,0.5)); @ \caption{A model simulation plotted with \emph{plotLBodeModelSim} .. \label{fig:plotModelSimFig}} \end{center} \end{figure} \clearpage As previously mentioned, we provide two optimization algorithms that allow parameter estimation Both of these algorithms have specific parameters that can be tunned on each specific problem (please check CNORode manual for detailed information). For instance, in order to run the genetic algorithm for 10 iterations and a population of size of 10, we can use the following code: <>= initial_pars=createLBodeContPars(model, LB_n = 1, LB_k = 0.1, LB_tau = 0.01, UB_n = 5, UB_k = 0.9, UB_tau = 10, random = TRUE) #Visualize initial solution simulatedData=plotLBodeFitness(cnolist, model,initial_pars) paramsGA = defaultParametersGA() paramsGA$maxStepSize = 1 paramsGA$popSize = 50 paramsGA$iter = 100 paramsGA$transfer_function = 2 opt_pars=parEstimationLBode(cnolist,model,ode_parameters=initial_pars, paramsGA=paramsGA) #Visualize fitted solution simulatedData=plotLBodeFitness(cnolist, model,ode_parameters=opt_pars) @ Model optimisation using eSS: <>= requireNamespace("MEIGOR") initial_pars=createLBodeContPars(model, LB_n = 1, LB_k = 0.1, LB_tau = 0.01, UB_n = 5, UB_k = 0.9, UB_tau = 10, random = TRUE) #Visualize initial solution fit_result_ess = parEstimationLBodeSSm(cnolist = cnolist, model = model, ode_parameters = initial_pars, maxeval = 1e5, maxtime = 20, local_solver = "DHC", transfer_function = 3 ) #Visualize fitted solution # simulatedData=plotLBodeFitness(cnolist, model,ode_parameters=fit_result_ess) @ \begin{center} \begin{figure}[ht] <>= simulatedData=plotLBodeFitness(cnolist, model, initial_pars, transfer_function = 3) @ \caption{The initial solution before optimization. Each row corresponds to an experiment with a particular combination of stimuli and inhibitors. The columns correspond to the measured values (triangles) and the simulated values (dashed blue lines) from a given signal. The background color gives an indication of squared difference where red means high error and white low error.} \label{fig:plotInitFit} \end{figure} \end{center} \begin{figure}[ht] \begin{center} <>= simulatedData=plotLBodeFitness(cnolist, model, ode_parameters=fit_result_ess, transfer_function = 3) @ \end{center} \caption{A solution obtained by optimization with a genetic algorithm. Each row corresponds to an experiment with a particular combination of stimuli and inhibitors. The columns correspond to the measured values (triangles) and the simulated values (dashed blue lines) from a given signal. The background color gives an indication of squared difference where red means high error and white low error.} \label{fig:plotFinalFit} \end{figure} \clearpage In addition to eSSR and genalg its is fairly easy to use any other continuous optimization algorithm. In the following example we show how to generate and use an the objective function in order to use it with a variant of eSSR(part of MEIGOR package) that uses multiple cpus: <>= library(MEIGOR) f_hepato<-getLBodeContObjFunction(cnolist, model, initial_pars, indices=NULL, time = 1, verbose = 0, transfer_function = 2, reltol = 1e-05, atol = 1e-03, maxStepSize = Inf, maxNumSteps = 1e4, maxErrTestsFails = 50, nan_fac = 1) n_pars=length(initial_pars$LB); problem<-list(f=f_hepato, x_L=initial_pars$LB[initial_pars$index_opt_pars], x_U=initial_pars$UB[initial_pars$index_opt_pars]); #Source a function containing the options used in the CeSSR publication source(system.file("benchmarks","get_paper_settings.R",package="MEIGOR")) #Set max time as 20 seconds per iteration opts<-get_paper_settings(20); Results<-CeSSR(problem,opts,Inf,Inf,3,TRUE,global_save_list=c('cnolist','model', 'initial_pars')) @ \section{Crossvalidation} CNORode offers the possibility to perform a k-fold cross-validation for logic-ode models in order to assess the predictive performance of our models. In k-iterations a fraction of the data is eliminated from the CNOlist. The model is trained on the remaining data and then the model predicts the held-out data. Then the prediction accuracy is reported for each iteration. Three different re-sampling strategies about how we can split the training and the test set: \emph{1)}Re-sampling of the data-points, \emph{2)}Re-sampling of the experimental conditions and \emph{3)}Resampling of the observable nodes. In the example below, we show an example about how we can apply the cross-validation analysis over a small toy case-study from Macnamara et al. 2012. <>= library(CellNOptR) library(CNORode) library(MEIGOR) # MacNamara et al. 2012 case study: data(PKN_ToyPB, package="CNORode") data(CNOlist_ToyPB, package="CNORode") # original and preprocessed network plotModel(pknmodel, cnodata) model = preprocessing(data = cnodata, model = pknmodel, compression = T, expansion = T) plotModel(model, cnodata) plotCNOlist(CNOlist = cnodata) # set initial parameters ode_parameters=createLBodeContPars(model, LB_n = 1, LB_k = 0, LB_tau = 0, UB_n = 4, UB_k = 1, UB_tau = 1, default_n = 3, default_k = 0.5, default_tau = 0.01, opt_n = FALSE, opt_k = TRUE, opt_tau = TRUE, random = TRUE) ## Parameter Optimization # essm paramsSSm=defaultParametersSSm() paramsSSm$local_solver = "DHC" paramsSSm$maxtime = 600; paramsSSm$maxeval = Inf; paramsSSm$atol=1e-6; paramsSSm$reltol=1e-6; paramsSSm$nan_fac=0; paramsSSm$dim_refset=30; paramsSSm$n_diverse=1000; paramsSSm$maxStepSize=Inf; paramsSSm$maxNumSteps=10000; transferFun=4; paramsSSm$transfer_function = transferFun; paramsSSm$lambda_tau=0 paramsSSm$lambda_k=0 paramsSSm$bootstrap=F paramsSSm$SSpenalty_fac=0 paramsSSm$SScontrolPenalty_fac=0 # run the optimisation algorithm opt_pars=parEstimationLBode(cnodata,model, method="essm", ode_parameters=ode_parameters, paramsSSm=paramsSSm) plotLBodeFitness(cnolist = cnodata, model = model, ode_parameters = opt_pars, transfer_function = 4) # 10-fold crossvalidation using T1 data # We use only T1 data for crossvalidation, because data # in the T0 matrix is not independent. # All rows of data in T0 describes the basal condition. # Crossvalidation produce some text in the command window: library(doParallel) registerDoParallel(cores=3) R=crossvalidateODE(CNOlist = cnodata, model = model, type="datapoint", nfolds=3, parallel = TRUE, ode_parameters = ode_parameters, paramsSSm = paramsSSm) @ For more, please information about the \emph{crossvalidateODE} function, please check its documentation. \begin{thebibliography}{} \bibitem{CellNOptR} C.~Terfve. \newblock CellNOptR: R version of CellNOpt, boolean features only. \newblock {R package version 1.2.0, (2012) http://www.bioconductor.org/packages/release/bioc/html/CellNOptR.html} \bibitem{alexopoulos_networks_2010} L.G.~Alexopoulos, J.~Saez-Rodriguez, B.D.~Cosgrove, D.A.~Lauffenburger, P.K~Sorger.: Networks inferred from biochemical data reveal profound differences in toll-like receptor and inflammatory signaling between normal and transformed hepatocytes. \newblock Molecular \& Cellular Proteomics: {MCP} \textbf{9}(9), 1849--1865 (2010). \bibitem{MMB} M.K.~Morris, I.~Melas, J.~Saez-Rodriguez. \newblock Construction of cell type-specific logic models of signalling networks using CellNetOptimizer. \newblock {\em Methods in Molecular Biology: Computational Toxicology}, Ed. B. Reisfeld and A. Mayeno, Humana Press. \bibitem{fuzzy2011} M.K.~Morris, J.~Saez-Rodriguez, D.C.~Clarke, P.K.~Sorger, D.A.~Lauffenburger. \newblock Training Signaling Pathway Maps to Biochemical Data with Constrain ed Fuzzy Logic: Quantitative Analysis of Liver Cell Responses to Inflammatory Stimuli. \newblock{\em PLoS Comput Biol.} 7(3) (2011) : e1001099. \bibitem{julio2009} J.~Saez-Rodriguez, L.~Alexopoulos, J.~Epperlein, R.~Samaga, D.~Lauffenburger, S.~Klamt and P.K.~Sorger. \newblock Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. \newblock {\em Molecular Systems Biology}, 5:331, 2009. \bibitem{wittmanJS} Dominik Wittmann, Jan Krumsiek, Julio S. Rodriguez, Douglas Lauffenburger, Steffen Klamt, and Fabian Theis. \newblock Transforming boolean models to continuous models: methodology and application to t-cell receptor signaling. \newblock BMC Systems Biology, 3(1):98+, September 2009. \bibitem{eSSR1} Egea, J.A., Maria, R., Banga, J.R. (2010) \newblock An evolutionary method for complex-process optimization. \newblock Computers \& Operations Research 37(2):315\-324. \bibitem{eSSR2} Egea, J.A., Balsa-Canto, E., Garcia, M.S.G., Banga, J.R. (2009) \newblock Dynamic optimization of nonlinear processes with an enhanced scatter search method. \newblock Industrial \& Engineering Chemistry Research 49(9): 4388\-4401. \bibitem{wittman} Jan Krumsiek, Sebastian Polsterl, Dominik Wittmann, and Fabian Theis. \newblock Odefy - from discrete to continuous models. \newblock BMC Bioinformatics, 11(1):233+, 2010. \bibitem{cvodes} R. Serban and A. C. Hindmarsh \newblock"CVODES: the Sensitivity\-Enabled ODE Solver in SUNDIALS," \newblock Proceedings of IDETC/CIE 2005, Sept. 2005, Long Beach, CA. Also available as LLNL technical report UCRL\-JP\-200039. \bibitem{macnamara_2012} A. MacNamara, C. Terfve, D. Henriques, B.P. Bernabe, and J. Saez-Rodriguez. \newblock State-time spectrum of signal transduction logic models. \newblock Phys Biol., 9(4):045003, 2012. \end{thebibliography} \end{document}