## Abstract

In the present chapter we discuss methodologies for the modelling, calibration and validation of cellular signalling pathway dynamics. The discussion begins with the typical range of techniques for modelling that might be employed to go from the chemical kinetics to a mathematical model of biochemical pathways. In particular, we consider the decision-making processes involved in selecting the right mechanism and level of detail of representation of the biochemical interactions. These include the choice between (i) deterministic and stochastic chemical kinetics representations, (ii) discrete and continuous time models and (iii) representing continuous and discrete state processes. We then discuss the task of calibrating the models using information available in web-based databases. For situations in which the data are not available from existing sources we discuss model calibration based upon measured data and system identification methods. Such methods, together with mathematical modelling databases and computational tools, are often available in standard packages. We therefore make explicit mention of a range of popular and useful sites. As an example of the whole modelling and calibration process, we discuss a study of the cross-talk between the IL-1 (interleukin-1)-stimulated NF-κB (nuclear factor κB) pathway and the TGF-β (transforming growth factor β)-stimulated Smad2 pathway.

## Introduction

Speaking in general terms, systems biology brings the quantitative disciplines associated with the physical sciences to the world of biological sciences. Viewed in this broad light, the term systems biology can take on a wide variety of interpretations (see [1–4]), all of which have a certain validity. However, in all attempts at definition of systems biology a common theme emerges: the use of quantitative methods of mathematical analysis and modelling in a way that enables the investigation of dynamical performance. Along with mathematical methods for representing, analysing and controlling objects, systems biology also brings with it the philosophy of a systems approach: an approach which attempts to view the behaviour of a process as the integrated sum of its parts.

The idea of a systems approach offers a helpful counterbalance to the qualitative and reductionist methods that are often necessary within the environment of the experimental biologist. In particular, systems biology offers a unifying framework for integration and quantification of information from the disparate parts of a biological system. With respect to the quantification, important recent systems biology efforts are focused upon accurately quantifying measurements [5] and the associated instrumentation. The integration of biological knowledge on the other hand finds its most useful expression in mathematical modelling [6,7]. As discussed in the present chapter, a mathematical model provides a framework for integration of experimental results from numerous disparate sources. As part of this process, static and descriptive cell-signalling diagrams are turned into mathematical representations where the dynamics of the signalling elements can be studied within a single model based upon quantitative measurements.

Our aim in the present chapter is to outline the steps and methods that are commonly used to obtain a mathematical model of the dynamics of a cellular signalling pathway. The first step is to choose the appropriate modelling framework for the signalling pathway under consideration: this process is discussed in the section on ‘Modelling frameworks’. As the construction of a mathematical model depends on knowledge of the biochemical components and the associated reactions involved, we also show how to access the electronic resources for obtaining this information in the section on ‘Collecting part lists’. Finally, the mathematical model must be calibrated and validated against experimental observations and data. The methods and challenges in this step are discussed in the section on ‘Model calibration and validation’.

## Modelling frameworks

The construction of a mathematical model that is representative and yet manageable is an art that has been extensively developed in the physical sciences. Thus biologists wishing to obtain maximum value from mathematical modelling are strongly advised to review the systematic methods that have been developed for physical system modelling [8]. The modelling procedures followed in the biological sciences have many similarities with modelling the physical world, but there are specific differences which will be outlined here.

The first choice presented in biological system modelling is which form of model is appropriate for the signalling pathway under consideration. The various model choices that are possible are indicated in Figure 1. Starting with the left-hand column (deterministic or stochastic), there is a distinction between deterministic reactions that are assumed to take place in uniform biochemical environments, such as might be found in a biochemical reactor, and those where there is a paucity of molecules to participate in the reactions. In this latter case, an element of randomness occurs and a stochastic model form is required. However, deterministic models are simpler and faster in computation, so that they are usually the starting point for model construction, until experimental results indicate otherwise.

Considering the second column from the left (continuous or discrete time), most signalling pathway models use continuous time models based upon ODE (Ordinary Differential Equation) sets. Discrete time equations are only infrequently appropriate and for current purposes can be ignored. Models based upon PDE (Partial Differential Equation) sets are relevant for the case in which the various biochemical reactions are not strictly segregated. For example, Amonlirdviman et al. [9] employed PDEs in order to describe a pathway (planar cell polarity signalling) where a protein in one cell interacts with another in the neighbouring cells [9]. Both continuous and discrete time forms can be either deterministic [10,11] or stochastic [12,13].

Considering the third column from the left (continuous state or discrete state), since reactions usually occur in a regular and repeatable sequence, then continuous state models are the form normally used. However, in certain cases, there are biochemical reactions that occur only under specific logical conditions, such as expression of a gene or at a threshold protein concentration. Such reaction sequences would involve discrete state models. However, such discrete events only usually represent a part of a mainly continuous reaction sequence within a signalling pathway. Such a system can generally be represented as a hybrid model (see the far right-hand column of Figure 1) that combines continuous dynamics of the pathway as described by the ODE, and automata or finite state machines (as in [14,15]) for the discrete states. Figure 1 is only indicative of the modelling subdivisions and alternatives. Other helpful classifications are given elsewhere [16,17].

### Deterministic ODE and PDE models

In the present chapter we will focus on the most commonly encountered model forms. These are the deterministic and stochastic continuous time model frameworks. Of these, the ODE approach is the most frequently used and the most straightforward to understand. We will therefore begin the discussion with an illustrative example of this most commonly encountered model in its deterministic form. The discussion will then pass to a more formal presentation of the general steps in ODE modelling of pathway dynamics in both the deterministic and stochastic forms.

#### An illustrative example

The basis of deterministic modelling is the application of the mass action law and various approximations [18,19] to the sequence of biochemical reactions that occur in a signalling pathway. The mass action law was introduced in the 19th century by Guldberg and Waage [20] and states that the rate of a chemical reaction is proportional to the probability that the reacting species (molecules) collide. This collision probability is in turn proportional to the concentration of the reactants.

To illustrate the use of the mass action law for the mathematical representation of biochemical reactions, we will consider five biochemical species (A, B, C, D and E) whose reactions are represented schematically in the prototype signalling pathway shown in Figure 2. These reactions can be written as follows (eqns 1 and 2):
where *k*_{1}, *k*_{2}, *k*_{3} and *k*_{4} are the reaction constants (i.e. the rate of binding). Here, biochemical species *A* and *B* combine at a rate of *k*_{1} to form a new species *C*. In turn *C* breaks down into species *A* and *B* at rate *k*_{2}, and also into *A* and new species *D* at rate *k*_{3}. The new species *D* then breaks down into a further new species *E*. The rate constants (*k*_{1}, *k*_{2}, *k*_{3}, *k*_{4}) determine how much of each biochemical species is formed over time.

For each reaction in eqns (1) and (2), the biochemical species on the left-hand side are called reactants, and those on the right-hand side are products. Based on the mass action law, the rate of the reaction is directly proportional to the product of the reactant concentrations (eqns 3–6) [18]: where the brackets indicate the concentration of the species–typically expressed in nM. The rate of change of the concentration of each biochemical species is determined by the net rate of the production and consumption reactions and the stoichiometry.

A central concept in mathematical modelling of a dynamical system is the state of a system [8]. The system states are important because they give the minimal set information required to completely determine the behaviour of the system over the course of time. In signalling pathway models, the normal choice for the set of states is the set of concentrations. Thus in our illustrative example, the state x denotes the concentration vector of the biochemical species: x = ([*A*,[*B*],[*C*],[*D*],[*E*])^{T}, such that (eqn 7):
If the object *A* is an enzyme, we can approximate eqn (7) in a simpler form using a Michaelis–Menten approximation [19]. This is possible if we can assume that the concentration of complex *C* formed by *A* and *B* does not change significantly with respect to other species. This is called a quasi-steady state since no apparent change of reaction is observed. In other words, at the quasi-steady state, we have (eqn 8):
So, we have (eqn 9):
from eqns (7) and (8). Moreover, since the total concentration of enzyme *A* should not change, we have (eqn 10):
From eqns (9) and (10) we derive eqn (11):
where is called a Michaelis–Menten constant.

If we substitute eqn (11) into eqn (7), then we have (eqn 12): Eqn (12) can be further reduced to the following dynamical equation (eqn 13) having three state variables:

Thus we have two possible representations of the reaction sequence illustrated in Figure 2. One based upon mass action kinetics (eqn 7) and the Michaelis–Menten form, eqn (13), that is valid under certain quasi-steady-state conditions. In general, biochemical reaction networks are large and complex. So, they are usually represented by differential equations with a large number of state variables. It is thus important to recognize which equations may be approximated by simpler forms by the use of methods such as the Michaelis–Menten approach.

#### A general ODE formulation

The example given in the previous section illustrates the key steps in constructing a mathematical model of the signalling pathway dynamics. In general a signalling pathway will be more complex than this example and to this end we will describe the general steps that would occur in constructing an ODE model. The reaction sequences in the illustrative example can be generalized for any number of reactions by noting that any elementary reaction *j*can be written as (eqn 14):
where the *x*_{k}, for *k* = 1,…, *N*, are the concentrations of biochemical species and in reaction *j*; *s*_{2,j}^{in} = stoichiometric coefficient of the *i*-th species/variable acting as the reactant; *s*_{i,j}^{out} = stoichiometric coefficient of the *i*-th species acting as the product; *i* = 1,…, *N*; and *j* = 1,…, *P*

The stoichiometric coefficients of each species is equal to the number of molecules of species *i* involved in reaction *j*. Since the reactions in the signalling pathway are not only mathematically transformed using the mass action law, but also using the assumption of Michaelis–Menten kinetics [18,19] as discussed in the illustrative example, we separate the reactions according to their kinetic assumption by adding subscripts to their reaction (*R*) and rate constant (*k*): *ma* for mass-action based kinetics and *MM* for Michaelis–Menten kinetics. Similarly, we will also distinguish reactions involving the ligand input as *R*_{ip} and the rate constant as *k*_{ip}. The pathway being studied can now be represented as a set of non-linear differential equations (also referred to as the dynamical state equations), thus (eqns 15a, 15b and 16):
in which *s*_{i,j} is an element of the stoichiometry matrix *S* that takes integer values (*S*⊆*Z*^{P×N}). Then (eqns 17a and 17b):
Where the subscripts *MM* and *ma* indicate Michaelis-Menten and mass action reactions respectively. The vector function v_{MM} is described in [18,19], whereas the vector function v_{ma} is derived in [21] as (eqn 18):
where the matrix *S*_{in} has *s*_{i,j}^{in} as its elements. Here, exp(.) and log(.) are element-wise matrix operations. Elements of v_{ma} are typically formulated as (eqn 19):
*x*_{i,j}^{in} is species *i* acting as reactant in reaction *j*. *g* (x, k_{ip}) is defined as follows (eqn 20):
Where *B* is a stoichiometry matrix for the input. In eqn (20) both exp (.) and log (.) are again element-wise operations. Note that the system in eqns (15a) and (15b) assumes that the ligand–receptor binding has 1:1 binding stoichiometry.

The ODE set eqn (15) is our deterministic continuous time mathematical model of the signalling pathway. It describes the change of each biochemical species in the pathway due to production, consumption and degradation of the species over time. The ODE approach can also be used to describe the transfer of biochemical species from one compartment to another, such as from cytoplasm to nucleus [22], in the following circumstances: (i) the compartments are well stirred and (ii) the rates of transport between compartments are observable. A well stirred compartment means the biochemical species are evenly distributed in space.

When these conditions are not satisfied, a PDE becomes necessary to account for the spatial distribution of the biochemical species. A full description of the PDE formulation is beyond the scope of the present chapter; however, in general terms we note that it is based on a diffusion–reaction representation. In this form, and accounting for the spatial distribution of the biochemical species in the pathway, the system in eqn (15) would be rewritten as (eqn 21): where the elements of vector μ denote the diffusion rate constant. For further information on PDE methods and spatial modelling based on diffusion–reaction equations see [23].

### Stochastic models of signalling pathways

The deterministic ordinary differential approaches described in the previous section give a mathematical model in which, for any starting condition, the way in which each of the states (the concentrations of the various biochemical species) varies can be exactly predicted. The underlying assumption of the deterministic model is that the concentrations of the various chemical species is high and that at any reaction point they are well mixed. However, this assumption is not always correct in cellular reactions. Specifically, as the numbers of molecules (or concentrations) becomes lower, the variability of the molecular population in each stage of a signalling pathway increases [12], and with it comes random variations in the reaction processes. The interpretation of what constitutes a ‘low’ number of molecules is a subject of debate. However, Klipp et al. [24] have suggested that when the number of molecules is of the order of dozens or hundreds, then the random element in reactions becomes significant, and a stochastic differential equation method should be employed.

#### Stochastic framework

A key difference between the deterministic modelling and stochastic modelling is that instead of accounting for the change in concentration of each species over time (as in the deterministic case), in stochastic modelling we track changes in the number of molecules of each of the biochemical species. With this key difference in mind, we denote the number of molecules of each species at time *t* as the random variable:
Thus for example, in the stochastic version of the illustrative exercise above, the state vector will become *Z*(*t*)=(#A, #B, #C, #*D*)^{T} with the initial state of the system *Z*(*t*_{0}) = z_{0}. When the first reaction occurs, the system will move from state z_{0} to a new state *z** shown below (eqn 22):
Here the change in the number of molecules for each species from state *z*_{0} to*z** is equal to its corresponding stoichiometry. This state transition depends on the probability that the changes due to the first reaction occur as described by an object termed the propensity function. For any reaction *j* as in eqn (14), the propensity function is (eqn 23) [25]:
This state transition results in a change in the number of molecules of each species. As previously shown in eqn (22), the change in the number of molecules for the state variables, due to any reaction *j* in eqn (14), is equivalent to the net stoichiometry, defined as (eqn 24):
We can now describe the transition to state *z* from another state, due to the *j*-th reaction as (eqn 25):
whereas the transition away from state *z* caused by the *j*-th reaction is (eqn 26):
Let us define the following as the probability that each species exists in *z* number of molecules at any time *t* (eqn 27).
The Chemical Master Equation then describes the time evolution of the above probability by taking into account the probability that any reaction *j* occurs (the propensity function) (eqn 28):
This equation tracks the change in the probability that the system variable *Z* exists in some number of molecules as a result of all of the reactions involved in the pathway. Unfortunately, the Chemical Master Equation grows exponentially as the number of species increase. It essentially creates probability variables for each possible state (possible number of molecules for each species). To overcome this problem Gillespie [26] developed an algorithm to efficiently simulate the Chemical Master Equation.

#### Stochastic simulation

The key idea in the Gillespie algorithm is to calculate the individual trajectory of the species, instead of solving for the individual state transition probabilities. In practical terms this means that at each iteration, the algorithm determines the propensity function, *a*_{j}, given as (eqn 29):
where *c*_{j} is the stochastic rate constant (eqn 30):
Here, *N*_{A} is the Avogadro’s number and *k*_{j} = ∑_{i}*s*_{i,j}^{in}.

As we can see in eqn (30), the stochastic rate constant is related to the deterministic rate constant, since both the Chemical Master Equation and mass-action law were derived using molecular physical properties.

With the propensity equation defined in eqn (29), it is possible to delineate the Gillespie algorithm [27]:

1. Initialization: load rate constants,

*c*_{1},*c*_{2},…,*c**j*and use random number generator to assign initial numbers of molecules for each species.2. Compute

*a*_{j}for each reaction*R*_{j}.3. Calculate the combined propensity,

*a*^{*}= ∑_{j}*a*_{j}.4. Generate uniform random numbers,

*r*_{1},*r*_{2}.5. Compute

6. Determine μ such that ∑

_{j=1}^{μ−1}*a*_{j}≤*r*_{2}*a*^{*}≤ ∑_{j=1}^{μ}*a*_{j}.7. Update the number of molecules and set

*t*=*t*+ ι.8. Return to Step 2, unless

*t*≥*t*_{max}*or a*^{*}= 0.

For more details of the algorithm and proofs of eqns (29) and (30) see [12,26,28]. The computer implementation of the Gillespie algorithm is a non-trivial task. Thus it is useful to note that several biochemical simulation software suites have built-in implementations of the Gillespie algorithm, these suites include Copasi [29], Dizzy [30] and STOCKS [31].

## Collecting ‘part’ lists

Writing the equations for a mathematical model, as explained in the previous section, is only the first part of the modelling task; values must be assigned to the various coefficients and parameters in the mathematical model. This calibration stage of modelling requires knowledge of biochemical interactions, stoichiometry of the components in the reactions and the rate constants. It is rare for a single laboratory to have the resources to determine all the parameters required in a model. In this spirit, it is customary to collect parameter information from appropriate biological *in vitro* and *in vivo* experiments as they have been reported in the literature. This information is often available in web-based databases which have consolidated published information on interactions between proteins, DNA and DNA–protein.

Mathematical models of cell signalling dynamics require reaction rate information and unfortunately the number of reaction rate constants reported or collected in the databases is low. This is mainly due to experimental limitations in biology and to the dependence of rate constants on various environmental factors and variations in experimental technique. As a result, most of the rate constants in a dynamical model are obtained through the calibration process discussed in the section on ‘Model calibration and validation’ below. As a precursor to this, we describe some popular and easily accessible databases.

### Protein interaction and cellular pathway databases

A growing number of databases are available which contain information useful in signalling pathway modelling. Such databases include the list of interactions, the rate constants (estimated and obtained elsewhere), and, in some cases, the equations. As a guide to what is available, Table 1 is a list of some useful databases of biochemical interactions, the rate constants and mathematical models. This list is presented with the caveat that the internet changes rapidly, so that the location and content of Table 1 may change from time to time.

The availability of easily accessible databases is of great value, but the parameters taken from these databases should be used with care. Specifically, it is important to remember that the biochemical reactions involved in a pathway can vary between organisms. In addition, the reliability of the reported biochemical reactions needs to be carefully taken into account, as biological systems have high variability.

The issue of model complexity is again relevant here. One should include in the pathway model only that level of biochemical detail that is absolutely necessary. Once an initial model has been formulated, then Occam’s Razor should be applied to ensure that the model is no more complex than necessary. The simpler the model the more straightforward it will be to calibrate and implement. Even if the model is simpler than thought biologically necessary, it still might be more appropriate than a complex model in which many of the parameters are unknown or of poor quality. In general, the choice of model complexity is based on a balance between the biological question under study and the feasibility of building and parametrizing the model.

### Modelling representation tools

With the increasing popularity of mathematical modelling as a tool to analyse biological systems, so efforts have been made to enable sharing of signalling pathway models in a standardized format. Representative of this standardization is the development of an XML-based markup language called SBML (Systems Biology Markup Language) [32] and CellML [33]. The standardization that SBML and CellML provides has led to an increasing exchange of models.

In the section on modelling frameworks above we discussed the various methodologies for deriving a mathematical model of cell signalling dynamics. However, there are a number of computational tools that are available which assist in model building. These tools are primarily based on graphical representations of biochemical (cell signalling and metabolic) and gene networks with a range of built-in facilities. These include the automatic generation of mathematical models (discrete, ODE, PDE, hybrid, etc.), the simulation and graphical presentation of their dynamical behaviour and the numerical analysis of the simulation results.

A summary of some popular software systems is given in Table 2. These tools are compatible at varying degrees with popular scientific and mathematical software such as MATLAB, Maple and Mathematica, and interface (import/export) with SBML. In the event that the researcher wishes to work in MATLAB, but will need some degree of standardization for biology, then the freeware Systems Biology Toolbox is recommended [34].

### Example

In order to illustrate the procedures described thus far, we now review the mathematical modelling of an example pathway. Specifically, consider the task of modelling the cross-talk between two pathways: IL-1 (interleukin-1)-induced NF-κB (nuclear factor κB) and TGF-β (transforming growth factor β)-induced Smad pathways. IL-1 acts as a ligand in the pathway, and it belongs to the cytokine family, a group of proteins similar to hormones that are involved in the innate immune system. Current known functions of IL-1 cytokines include raising body temperature, controlling lymphocytes and increasing the number of bone marrow cells. IL-1 can be secreted by many types of cells, but primarily by macrophages. Downstream of IL-1 is NF-κB, a protein complex (composed of two or more proteins) that is involved in regulating immune response to infection. Aberrant activation of NF-κB has been linked to cancer, inflammatory and autoimmune diseases, septic shock and viral infection, amongst other conditions. NF-κB functions as a transcription factor so, in the absence of ligand stimulus, it is kept inactive in the cytoplasm, by binding to an inhibitory protein called IκB (inhibitor of NF-κB). NF-κB will translocate to the nucleus if the upstream signalling components degrade the IκB protein.

There are various ligands, including IL-1, that can set NF-κB free to enter the nucleus and to initiate gene transcription. The exact mechanisms of NF-κB activation remain a subject of debate; however, one current understanding of the NF-κB mechanism is illustrated in Figure 3.

TGF-β is a cytokine which has dual roles: tumour suppressor and promoter [36]. It was first found to elicit signals that transform the growth of fibroblasts in culture into cancer cells. On the other hand, subsequent studies showed that TGF-β can also inhibit cell growth. Downstream of TGF-β is the Smad protein which, upon TGF-β receptor activation, is phosphorylated and consequently dimerized. The Smad dimer then translocates to the nucleus and initiates gene transcription.

Lu et al. [37] found that there is cross-talk between the TGF-β/Smad pathway and the IL-1/NF-κB pathway in a dose-dependent manner. The cross-talk mechanism is as follows: the two cytokines will activate their canonical pathways when they are present in low doses. However, when the cytokines are present in high doses, IL-1 will activate both NF-κB and Smad. Likewise, TGF-β will activate Smad and NF-κB. Although it is not fully elucidated, the cross-talk seems to occur in the receptors. That is, IL-1 receptors can only bind to the TGF-β receptors when the respective cytokines are present in high doses. In this example, the role of the mathematical model will help to understand the dose-dependent cross-talk between the two pathways.

The detailed biochemical reactions involved are shown in Figure 4. Lu et al. [37] discovered that the cross-talk between TGF-β/Smad and NF-κB pathways required activation of four adaptor proteins: IRAK (IL-1-receptor-associated kinase), MyD88, TRAF-6 (tumour-necrosis-factor-receptor-associated factor 6) and TAK1 (TGF-β-activated kinase 1). These adaptor proteins are included in the system and modelled using mass-action kinetics. Although evidence for the sequential interactions between these adaptors is still not consistent, evidence from [38,39] is used to support the model sketched here.

In the model, MyD88 is recruited to the activated IL-1 receptor (ILR*). The resulting complex then recruits MyD88, IRAK and subsequently TRAF-6. Upon TRAF-6 recruitment, the IRAK–TRAF-6 complex dissociates and recruits TAB2 (TAK1-binding subunit 2) and TAK1. Some studies suggest that IRAK is ubiquitinated upon TAB2 recruitment to the complex. However, since the ubiquitination mechanism is not yet elucidated, this process can be omitted and IRAK be assumed to degrade at this point. Once TAK1 is phosphorylated, it can activate IKK (IκB kinase), which then frees the NF-κB dimer from its inhibitor, IκB.

A common starting point in any modelling exercise is to review the literature and adapt existing models. In this case we take the NF-κB model (starting with IKK activation onwards) from Hoffmann et al. [40]. The Hoffmann model takes into account three isoforms of IκB: IκBα, IκBβ and IκBγ. In our model, we only consider IκBα since it is the most important player in the pathway that we are studying.

In the TGF-β/Smad pathway, we partly adapt a model from Clarke et al. [41]. In the adapted model, TGF-β binds to TβRI (TGF-β receptor I) and subsequently recruits the second receptor (TβRII). These receptor complexes phosphorylate each other and create a docking site for Smad2. As a result, Smad2 is recruited and phosphorylated by the complex. Once it is activated, Smad2 dissociates from the receptor complex and forms a dimer with Smad4. Together, the dimer complex translocates to the nucleus. As described by Clarke et al. [41], Smad2 can shuttle between the nucleus and cytoplasm with or without the presence of its ligand. The biochemical reactions, along with the rate constants from the literature are listed in Table 3.

## Model calibration and validation

As mentioned previously, where it is possible, the model coefficients and parameters are taken from web-based databases. However, for those rate constants that are not available from the literature or databases, the values need to be estimated using biological data. In this process of model calibration (known as system identification in the control and systems literature [49]), we use measured time course and related data for the biological process to estimate unknown parameters. The parameters to be estimated often include not only the rate constants, but also the initial conditions [*x*_{0} = *x*(*t*_{0})]. The mathematical procedures involved in calibration can be quite involved [50], thus for the purposes of the present chapter we only sketch the steps involved.

If we collect all of the rate constants and initial conditions to be estimated into a vector, θ, the estimation problem is typically formulated as a weighted least squares optimization (eqn 31):
where **y**^{Exp}(*t*_{d}) is the experimental data vector at time *d* (*d* = 1,…, *D*), **y** (*t*_{d}, θ, **u**) is the predicted output from the model at time d and Q is a diagonal matrix that weighs/scales the measurement errors.

Owing to the non-linear nature of the system described by eqn (15), the minimization problem in eqn (31) is often multimodal or non-convex [51]. As a consequence of this, the global optimum may not be found using standard optimization algorithms, such as Lavenberg–Marquadt or the Gauss–Newton methods. Stochastic methods such as simulated annealing, and genetic and evolutionary algorithms have been reported to perform well in finding the global optimum [51,52], albeit at the expense of computational cost. However, even the most sophisticated identification methods can still encounter local optimum and it is wise to use a range of methods. This can be done using one of the many software packages which contain calibration and identification methods, such as Copasi [29] or SBMLPet [53].

Alternatives to global optimization methods are: (i) the multiple shooting approach [54–56] where the state variables of the dynamics are discretized, (ii) comprehensive sampling of the whole parameter space to find sets of solutions that give qualitatively similar output behaviour [41,57], or (iii) to fit the parameters ‘manually’ starting from a reasonable parameter set [7] and sequentially inspecting different parameter sets for suitability. This manual fitting is extremely time consuming and is only suggested when the number of unknown parameters to be estimated is small.

The challenge in parameter estimation of the kind mentioned in the preceding paragraphs is not only to find the global solution, but also to obtain appropriate data. Specifically, data are not bountiful in biological systems, where the most common experimental technique, quantitative Western blot analysis [58], provides only relative concentration time profiles in terms of intensity values [59]. This means that the resulting intensity values can be compared only with values of the same band or blot, which typically consists of the same protein measurements from one experimental run at different time points. Figure 5 shows a typical Western blot and the corresponding quantification. The availability of data, which also holds for most techniques, irrevocably depends on the availability of corresponding antibodies to probe any given protein.

In the Western blot technique, the measured proteins are obtained from a population of cells, and thus the data describe the average behaviour of the cells. Other methods, such as flow cytometry, are capable of probing the protein one cell at a time, yielding the relative protein amount per cell. Despite intense development and rapid progress in bio-instrumentation, currently available measurement techniques do not allow direct measurement of system variables. This causes a range of problems, most notably an inability on the part of the calibration tools to find a unique set of parameter values. This problem, termed lack of identifiability, is common in systems where there are few data points available compared with the number of unknown coefficients. In addition, where there is feedback within the biochemical reaction sequences then some coefficients become effectively hidden (or unidentifiable) from external measurements [60]. A result of these phenomena is that measurements typically correspond to linear combinations of two or more variables.

For the system described in eqn set (15), the output can be expressed as:
where *int* (.) denotes a function of intensity, and *C*∈ℝ^{m×n} is a matrix that selects the output as a linear combination of the variables. The variable *x* is a function of time (t), rate constants (*k*) and input (*u*), but these notations are dropped for representation purposes. In order to estimate the parameters, the model output, *y*, needs to be formulated appropriately according to the observed data. This formulation depends on the nature of the data; an issue which is discussed below.

#### Case 1: data are in terms of absolute protein amount

The absolute amount of protein can be obtained if the purified form is available [5]. In this case, one would generate a ‘standard curve’, a known titration of the purified protein, alongside the protein measurements (from the cell lysates) that are obtained under experimental conditions. Ideally when using Western blot analysis, the standard curve should be generated for every protein and every blot. If the intensity values of the protein from the lysates fall within the linear range of the standard curve, then there is a linear relationship between the intensity value and the total protein amount or concentration. That is (eqn 33): where α is a vector estimated from the standard curve for each measured protein. Then the output to the system in eqn set (15) can simply be written as (eqn 34):

#### Case 2: data are in terms of intensity value that falls within the linear range

Since generating the standard curve for every experimental run and every measured protein may not be possible, one can then just generate one standard curve. If the resulting intensity values are still within the linear range, the output is then formulated as in eqn (34). However, when estimating the parameters, α needs to be estimated as well. The objective function then becomes (eqn 35):

#### Case 3: data are in terms of relative amount

In the case where purified protein is unavailable, the measured intensity is often normalized to a reference time point. This reference point is typically the maximum intensity within the band or the first value of the band. Let *t*_{r} be the reference point chosen for each measured protein and *C*_{j} be the *j*-th row of matrix *C*(*j* = 1 … *m*). The system output is then (eqn 36):

### Example: calibration of IL-1/NF-κB and TGF-β/Smad pathways

We have previously discussed the biochemical reactions in the dose-dependent cross-talk pathways of IL-1/NF-κB and TGF-β/Smad. Since not all parameters are available in the literature, the unknown parameters need to be estimated using experimental data. This can be done by taking time course data (in this case from Lu et al. [37]) and as shown in Figure 6. The biological experiments associated with Figure 6 were conducted using HEK (human embryonic kidney)-293 cells, where either the gene for MyD88 or IRAK, protein adaptors upstream of NF-κB, is knocked down (eliminated). This means that any biochemical reactions downstream of these adaptor proteins will not take place, allowing us to focus on one-half of the pathway (IL-1/TGF-β/Smad). The cells were treated with a high dose of IL-1 for the indicated time duration (Figure 6), and pSmad2 (phosphorylated Smad2) abundance was measured using Western blot analysis.

With the knockdown condition, we have 17 rate constants and seven initial conditions to be estimated using two sets of time course data (each with eight data points). This paucity of data compared with the number of parameters to be determined is a typical example of poor identifiability mentioned previously. A number of methods exist for tackling this problem [61]. An appropriate method in this case is to decrease the parameter space, by placing realistic constraints on the upper and lower values of the parameters. For the association constant, the lower and upper bound are set to 10^{−4} and 10^{−4} nM^{−1}·S^{−1} respectively. These constraints are from [62], where it is indicated that the association of protein molecules to dimers or larger complexes typically occurs within this range. For the dissociation constant and the initial conditions, the lower bound is zero. With most signalling pathway processes occurring in seconds [63], the dissociation rate should be lower than minutes.

The estimation of the unknown parameters is then performed using the form of technique discussed previously. Thus, for example, if a typical gradient-based method is used to estimate the parameters, we iteratively estimate the parameters using different initial or seed values in order to avoid convergence to a local optimum. These seed values are generated randomly from the specified range as discussed above. The response from the best objective function is shown in Figure 7. Here, the model output follows the trend of the data. The gradient-based method is mentioned here as an example, and other algorithms, such as those mentioned earlier, could also be used.

## Conclusions

In the present chapter we have presented, albeit in review form, the different frameworks for modelling signalling pathway dynamics. We have noted that models based upon deterministic ODEs are the most popular and most tractable approach. We have also demonstrated the typical techniques to develop mathematical models from the biochemical representation of the pathways using chemical kinetics. The main challenge in this process is to determine the right mechanism for the biochemical interactions (the chemical kinetics) as well as obtaining the rate constants.

There is also the issue of how much detail should be included, since an activation of any protein can involve numbers of proteins and reactions in which some of the reactions may only be valid in specific cell lines. Incorporating more details leads to an increasing number of parameters to be estimated. However, if an important biochemical reaction is not included, the overall system behaviour can change dramatically. Thus model complexity needs to be given special attention with Occam’s Razor, whereby: one should not increase, beyond what is necessary, the number of entities required to explain anything. We have also discussed the calibration and validation of the mathematical models. As the current state of the art in instrumentation does not allow direct measurement of individual protein concentrations, identification becomes necessary with the attendant problems of identifiability always a major issue. The signalling pathway model example we have given is typical of the complexity to be anticipated, whereby models contain a large number of unknown parameters.

We have outlined the calibration process and noted the interplay between the ease of parameter estimation, the amount of available data and the number of unknown parameters. Here we have re-emphasized that it is often better to start with a model which oversimplifies the biology. Even though the model is too simple it will at least be mathematically and computationally tractable and can form a starting point for investigations. As demonstrated by Swameye et al. [55], a simple model that is properly calibrated is better than none, and can form the first step in understanding the complex behaviour of signalling pathways.

## Summary

• The different frameworks for modelling signalling pathway dynamics are presented.

• Models based upon deterministic ODEs are the most popular and most tractable approach.

• The typical techniques to develop mathematical models from the biochemical representation of the pathways using chemical kinetics are demonstrated.

• The calibration and validation of the mathematical models is discussed.

• We have outlined the calibration process and noted the interplay between the ease of parameter estimation, the amount of available data and the number of unknown parameters.

• It is often better to start with a model which oversimplifies the biology.

## Acknowledgments

We are grateful to Dr Radina Soebiyanto for help in assembling the material for this work and working through the simulations. We thank Professor George Stark and the Stark Laboratory for the data on IL-1/NF-κB and TGF-β/Smad2 pathway. The authors of the present chapter were funded by NIH (National Institutes of Health; grant numbers K25 CA 113133 and U56 CA 112963) and the Case Provost Opportunity Fund Award for the Systems Biology Center of Excellence Initiative (to S.N.S.), the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea Ministry of Education, Science and Technology through the Systems Biology grant (M10503010001-07N030100112), the Nuclear Research Grant (M20708000001-07B0800-00110) and the 21C Frontier Microbial Genomics and Application Center Program (to K.-H.C.), and the Science Foundation Ireland (grant number RP 03/RP1/I383) (to P.W.)

- © The Authors Journal compilation © 2008 Biochemical Society