## Abstract

Mathematical models are central in systems biology and provide new ways to understand the function of biological systems, helping in the generation of novel and testable hypotheses, and supporting a rational framework for possible ways of intervention, like in e.g. genetic engineering, drug development or treatment of diseases. Since the amount and quality of experimental ‘omics’ data continue to increase rapidly, there is great need for methods for proper model building which can handle this complexity. In the present chapter we review two key steps of the model building process, namely parameter estimation (model calibration) and optimal experimental design. Parameter estimation aims to find the unknown parameters of the model which give the best fit to a set of experimental data. Optimal experimental design aims to devise the dynamic experiments which provide the maximum information content for subsequent non-linear model identification, estimation and/or discrimination. We place emphasis on the need for robust global optimization methods for proper solution of these problems, and we present a motivating example considering a cell signalling model.

## Introduction

Systems biology offers unique opportunities for applying systems engineering methods, which are based on mathematical models. Correspondingly, new types of biological problems are motivating new research in theoretical systems engineering [1–4]. Systems biology requires a close coupling of theoretical work and experimental analysis, where experiments, modelling and analysis proceed iteratively [5–8].

Systems biology addresses the missing links between molecules and physiology by discovering how function arises in dynamic interactions [9]. Its ultimate objective should be to explain how the components within a cell interact dynamically, and then how cells interact, resulting in the observed structure, organization and function [10]. Dynamics play a key role in cellular processes. For example, some signalling pathways encode information not just as protein concentrations, but via changes in the dynamics of those concentrations [11,12]. Thus a common approach to model these inter- and intra-cellular dynamic processes is by means of dynamic mathematical models, usually consisting of sets of differential equations. Modelling approaches in systems biology are usually classified as top-down and bottom-up strategies [6,8,9]. Stelling [6] further classifies the approaches to the mathematical modelling of cellular networks in three groups: (i) interaction-based (static models with no stoichiometry and no parameters), (ii) constraint-based (static models, with stoichiometric information, but no parameters) and (iii) mechanism-based (dynamic models, with stoichiometric information and kinetic parameters). Stelling [6] also argues that mechanism-based modelling is the most obvious candidate for achieving a system-wide understanding, but not by simple scaling to the genome level. He proposes the combination of present modelling methods with the biologically important themes of modularity, optimality and robustness [13]. The ultimate outcomes are mechanistic models that, via hierarchical composition of modules and possible abstractions at every level, allow a consistent link between the molecular and cellular levels without the need to represent every detail. Wolkenhauer and Mesarović [10] state that we can only understand cell function as a dynamic system, but they also note that these dynamic models of biological systems are phenomenal constructions where the interactions among system variables are defined in an operational rather than a mechanistic way.

Therefore there is a clear need for sound procedures to build these dynamic models of biological systems from the vast amount of data generated from the different omics. System identification, a key area in systems engineering, deals with the development of mathematical models of dynamic systems from input/output data [14,15].

Model building is an iterative process, usually represented as a cycle (Figure 1), which must start from the definition of the purpose of the model. That is, the goal (purpose) and application of the model conditions the selection of the modelling framework, i.e. models must start with questions to be addressed [16]. In the next step, using the *a priori* available knowledge and preliminary experimental data, a modelling framework is chosen and a first mathematical model structure is proposed. This first model usually contains unknown, non-measurable parameters that may be estimated by means of experimental data fitting. In this regard, we need to know whether it is possible to uniquely determine their values (identifiability analysis) and, if so, to estimate them with maximum precision and accuracy (parameter estimation step). This leads to a first working model that must be validated with new experiments, revealing in most cases a number of deficiencies. In this case, a new model structure and/or a new (optimal) experimental design must be planned, and the process is repeated iteratively until the validation step is considered satisfactory.

In the present chapter, we will focus on the key steps of parameter estimation and optimal experimental design, assuming the structure of the non-linear dynamic model as given.

## Parameter estimation

The problem of parameter estimation, i.e. determining the parameters of a system from experimental input–output data, is sometimes also called identification, model calibration or regression. This is just one aspect of a larger problem, the inverse problem, which includes *a priori* structural identifiability, *a posteriori* or practical identifiability and parameter identification [14,17]. A basic yet thorough guide for parameter estimation in the context of biological systems has been recently published [18].

In this section, we consider deterministic, non-linear dynamic models of biochemical systems, i.e. those described by deterministic ODEs (ordinary differential equations) or DAEs (differential algebraic equations). In the case of ODEs, a popular statement is the so-called state-space formulation (eqns 1 and 2):
where **x** ∈ *X* ⊂ *R ^{nx}* are the state variables,

**y**∈

*Y*⊂

*R*

^{nobs×ns}is the vector of observables with

*n*discrete time measurements each,

_{s}**u**∈

*U*⊂

*R*

^{nu}specifies the vector of inputs (i.e. all manipulated variables for a particular experiment) and

**θ**∈ Θ ⊂ ℜ

^{nθ}is the vector of model parameters where Θ represents the set of admissible parameters that may be fixed by physical, chemical or biological considerations.

Many difficulties found during parameter estimation are due to a poor identifiability of the model parameters. Parameter identifiability tests should be performed prior to the estimation process to ensure that the parameter estimation problem is well-posed [17]. Identifiability analysis investigates whether the unknown parameters of the postulated model can be estimated in a unique way. Regarding this problem, we can distinguish between structural and practical (or *a posteriori*) identifiability. Structural identifiability is a theoretical property of the model structure depending only on the observation and the input functions. The parameters of a model are structurally globally identifiable if, under ideal conditions of noise-free observations and error-free model structure, and independently of the particular values of the parameters, they can be uniquely estimated from the designed experiment [15,17]. Practical identifiability is usually evaluated based on the output sensitivity functions (partial derivatives of the measured states with respect to the parameters). If the sensitivity functions are linearly dependent the model is not identifiable, and sensitivity functions that are nearly linearly dependent result in parameter estimates that are highly correlated. This is usually studied from sensitivity-based criteria like the well-known FIM (Fisher information matrix). Detailed numerical procedures using the FIM are given in [19,20].

The objective of parameter estimation is, given a model structure and a set of experimental data, to calibrate the model (i.e. determining parameters which cannot be measured directly) so as to fit the experimental results in the best possible way. This is performed by minimizing a cost function which measures the goodness of the fit. Mathematically, this is stated as the optimization of a scalar cost function J(θ) with respect to the model parameters θ. The cost function is usually a certain weighted measure of the distance between the experimental data and the predicted values for the observables. The optimal value of θ will of course depend on the cost function chosen. Cost functions that have been shown to work well in practice include [14] (i) the Bayesian estimator, (ii) maximum likelihood estimator and (iii) (weighted) least squares estimator.

These cost functions are listed in decreasing order of the amount of information that has to be provided by the user or, equivalently, in increasing order of the number of *a priori* assumptions already included in the criterion. For the most complicated case of Bayesian estimation, the probability distribution of the parameters and the conditional probability distribution of the measurements for given parameter values have to be parameterized, whereas for the simplest case of least squares estimation can be performed without any extrinsic information. Weighted least squares and least squares estimation are special cases of the maximum likelihood method in which the measurement errors are assumed to be uncorrelated and normally distributed with zero mean and constant variance.

The weighted least squares function is given by (eqn 3):
where *n _{obs,i}* is the number of observables in each experiment,

*n*the number of experiments,

_{exp}*n*is the number of sampling times for each observable in each experiment, ỹ

_{sij}_{ij}∈ Ỹ ⊂ ℜ

^{nsij}are the vectors of sampling data for each observable in each experiment, y

_{ij}∈

*Y*⊂ ℜ

^{nsij}the corresponding model predictions,

**θ**∈ Θ ⊂ ℜ

^{nθ}and

**Q**

_{ij}∈ Ω ⊂ ℜ

^{nsij×nsij}is a non-negative definite symmetric weighting matrix. The minimization of this cost function will be subject to the dynamics of the system, plus possibly other algebraic constraints, and model parameters are also subject to upper and lower bounds. This formulation is that of an NLP (non-linear programming) problem with differential-algebraic constraints.

Estimating the parameters of a non-linear dynamic model is much more difficult than for the linear case, as no general analytic result exists. Since biological dynamic models are often highly non-linear, in order to reliably find the estimates we must resort to robust non-linear optimization techniques. Schittkowski [21] has listed a number of possible pitfalls and difficulties regarding parameter estimation in dynamical systems, including convergence to local solutions (if standard local methods are used, and especially when only bad starting values for parameters are available). Basically, existing methods for parameter estimation in dynamical systems can be classified into two main groups [21]: (i) initial value methods, also called single shooting methods, and (ii) multiple shooting methods [22]. Since the majority of both single and multiple shooting methods ultimately rely on local optimization solvers (usually gradient-based), there is a risk of convergence to local optima. Moreover, in the presence of a bad fit, there is no way of knowing if it is due to a wrong model formulation, or if it is simply a consequence of local convergence.

Thus there is a distinct need for using global optimization methods which provide more guarantees of converging to the globally optimal solution, as shown by Moles et al. [23]. The importance of using global optimization methods for parameter estimation in systems biology has been increasingly recognized in recent years [24–27]. Global optimization methods can be roughly classified as deterministic, stochastic and hybrid strategies. Deterministic methods can guarantee, under some conditions and for certain problems, the location of the global optimum solution. Nevertheless, no deterministic algorithm can solve global optimization problems of the class considered here with certainty in finite time. Stochastic methods are based on probabilistic algorithms, and they rely on statistical arguments to prove their convergence in a weak way. However, many stochastic methods can locate the vicinity of global solutions, but the associated computational cost is usually very large. In order to surmount this difficulty, hybrid methods and metaheuristics have been recently presented [20,28] that speed up these methodologies while retaining their robustness. In particular, the Scatter Search metaheuristic showed speeds up of between one and two orders of magnitude with respect to previous results [28,29].

## Optimal experimental design

Performing experiments to obtain a rich enough set of experimental data is a costly and time-consuming activity. The purpose of optimal experimental design is to devise the necessary dynamic experiments in such a way that the parameters are estimated from the resulting experimental data with the best possible statistical quality, which is usually a measure of the accuracy and/or decorrelation of the estimated parameters. In other words, based on model candidates, we seek to design the best possible experiments in order to facilitate system identification.

Optimal experimental design is, therefore, critical in the model building cycle, outlined in Figure 1, which should also contain coupled elements like model discrimination. In the context of systems biology, several aspects of the general problem have already been studied [30–38].

Let us assume that the model calibration is performed by minimizing the weighted least squares function, (eqn 3) above. Under the hypothesis of zero-mean Gaussian noise with a diagonal co-variance matrix for each experiment, performing a linear approximation of the system’ output trajectories around the optimum value of the parameters θ× and substituting in (eqn 3) it is concluded that the practical identifiability can be improved through the maximization of the FIM [14] (eqn 4):
where ∇_{θ}*y _{ij}*(

*t*|

_{k}**θ**

^{*}) represents the sensitivity of the observable j with respect to the parameters as evaluated for a given value θ*, at sampling time k and for the experimental conditions i.

Therefore, mathematically, the optimal experimental design problem can be formulated as a dynamic optimization problem where the objective is to find a set of inputs **u**, usually time-varying variables (stimuli) together with initial conditions, sampling times and experiment durations, for the dynamic experiments in order to optimize the quality of the estimated parameters in some statistical sense one must calculate the inputs **u** to minimize or maximize a performance index which is a scalar measure of the FIM:
subject to the system dynamics (eqns 1 and 2) and possibly other algebraic constraints related to experimental limitations:
Scalar functions of the FIM are used as OED criteria for increasing the accuracy and practical identifiability of the model parameters from experimental data. OED applied to linear steady-state models is a well-known subject, but OED of non-linear dynamic models is more challenging.

Different so-called alphabetical optimal design criteria for OED have been discussed in the literature (see e.g. [39]). Since the FIM may be related to the confidence hyper-ellipsoid for the parameters, the different OED criteria are related to its shape and size. The D-criterion minimizes the confidence ellipsoids volume, the E-criterion minimizes the length of the largest axis whereas the modified E-criterion minimizes the ratio of the largest to the smallest axis (i.e. the condition number of the FIM), seeking to make those ellipsoids as spherical as possible.

Numerical solutions for this dynamic optimization problem can be obtained using direct methods, which transform the original problem into an NLP problem via parameterizations of the inputs and/or the states. However, because of the frequent non-smoothness of the cost functions, the use of gradient-based methods to solve this NLP might lead to local solutions. As it happened in parameter estimation there is a need of global optimization methods to ensure proper solutions. Stochastic methods of global optimization have presented as robust methods for this class of problem [20,35,36].

Finally, it should be noted that most models in systems biology are non-linear on the states and/or the parameters, so the FIM, which is based on a linearization for a given value of the parameters, may result in underestimations of the confidence intervals. Therefore although the FIM is convenient for OED purposes since it is relatively cheap to compute, alternatives such as the bootstrap method should be used for a better estimation of the confidence regions for highly non-linear models, as will be illustrated in the example below.

## Example

In this section we illustrate methods for parameter estimation and optimal experimental design, recently developed in our group, considering the modelling of the STAT5 (signal transducer and activator of transcription 5) signalling cascade as an example. In particular, it will be shown how, given a model candidate, these methods allow us to carefully plan optimal experiments which will result in parameters estimated with the best possible statistical quality.

This signalling pathway starts with the phosphorylation of the STAT5 molecule. This reaction is driven by the EpoR (erythropoietin receptor) located at the cell membrane. Two phosphorylated STAT5 proteins form a dimer which subsequently enters the cell nucleus where it stimulates the transcription of target genes. After that, the molecules are undimerized and dephosphorylated and relocated back to the cytoplasm. This process is modelled by the following system of non-linear ODEs [22,40] (eqn 7):
where *k*_{1}, *k*_{2} are rate constants and τ is a delay parameter. *x*_{1} represents the cytoplasmic unphosphorylated STAT5, *x*_{2} the phosphorylated STAT5, *x*_{3} the dimer and *x*_{4} is the nuclear STAT5 (all of them in arbitrary units). As the input function, *EpoR _{A}*(

*t*) determining the STAT5 activation serves as the response function of the EpoR to a stimulus.

The parameter estimation problem considered consists of the calculation of *k*_{1}, *k*_{2} and τ. The observed quantities are the total amount of activated STAT5 *y*_{1} = *s*_{1}(*x*_{2} + *x*_{3}) and the total amount of STAT5 in the cytoplasm *y*_{1} = *s*_{2}(*x*_{1} + *x*_{2} + *x*_{3}). The corresponding initial conditions and the nominal parameter values are: **x**_{0} = [3.71,0,0,0]^{T};**θ**^{*} = [2.12,0.109,5.2] respectively. The scaling parameters are assumed to be known and fixed to the following values s1 = 0.33 and s2 = 0.26 [22]. The delay is approximated by a linear chain of length *N* (eqn 8):
Here, *in*(*t*) is the input and *o*(*t*) the output of the delay chain. In the case of the described STAT5 model we set *in*(*t*)=*x*_{3}(*t*),*o*(*t*)=*x*_{3}(*t*−τ) and *N* = 8 [22].

The solution of the parameter estimation problem relies on the use of time-resolved experimental data. Since the experimental data are subject to noise, the estimated parameters should be accompanied by confidence intervals. Moreover, different experimental schemes, with the same experimental noise level, will result in different confidence hyper-ellipsoids, so volume and the eccentricity of the confidence hyper-ellipsoids will be good measures of the experiment design quality.

Estimates of the confidence intervals for each parameter may be calculated using the FIM. However, it has been shown that this may lead to wrong conclusions [41], thus the bootstrap approach is used here. This method requires the solution of the parameter estimation problem for a sufficiently large number of times (approx. 500 times in this work) from different initial guesses. The *n*Θ-dimensional ‘cloud’ of solutions achieved is then used to estimate the volume and eccentricity of the confidence hyper-ellipsoid and the confidence interval for each parameter.

Assuming the parameter values reported in [22], different pseudo-experimental (simulated) data sets were generated. Parameter estimation from these data sets using local optimization methods from different initial guesses reveals convergence to suboptimal solutions, in agreement with what we stated in the parameter estimation section. Thus, in order to accurately calculate the parameters and their confidence regions, the use of a global optimization approach is required. Here we have used the recently developed SSm (Scatter Search metaheuristic) [29], since it offers an excellent compromise between robustness and efficiency for this class of problem [28].

We first considered a single experiment with a sustained stimulus (*EpoR _{A}* = 1) and 20 equidistant sampling times, in order to evaluate the practical identifiability properties under typical experimental conditions. The experimental data were assumed to be subject to zero-mean Gaussian noise with σ = 10% S.D. The largest confidence interval, of more than 100%, as predicted by the bootstrap approach, corresponds to the estimation of

*k*

_{1}. The volume for the confidence region is 0.70 and its eccentricity is 11.9, which reveals a significant overall correlation among the parameters.

Since in practice several experiments are usually performed, next we considered the use of five optimally designed sustained stimulus experiments. Now the maximum error was approx. 16% for the estimation of τ. However, the price to pay for such reduction in the maximum error is a larger amount of experimental work.

In order to reduce the experimental work while keeping the quality of the estimates, we have also considered the optimal design of dynamic (time varying stimulus) experiments taking into account the following experimental constraints: (i) step-wise profiles for the stimulus (using one or two pulses); (ii) maximum of 20 sampling times for both observables; and (iii) maximum experiment duration of 3 h, and a minimum of 40 min.

The objective was to illustrate the advantages of optimal dynamic experiments considering the E-criterion and several scenarios: (i) using a single dynamic experiment, E1; (ii) two optimal experiments, E3; and (iii) using optimally located sampling times for 1 and 2 experiments (E2; E4). Results for these different experimental schemes are summarized in Table 1.

The results obtained for the experimental scheme E1 showed a significant improvement over the sustained stimulus case, with a maximum error of 22% for *k*_{1}. The results for experimental scheme E2 showed that the optimal location of sampling times further improved these results, even though the number of sampling times was reduced by 20%. Figure 2 presents this optimal experimental scheme E2, together with the evolution of the corresponding observables.

A property of optimum designs in general is the tendency to concentrate the experimental effort on a few sets of conditions. For the case of optimal sampling, this means, as shown in Figure 2, to concentrate them around particular times during the experiment. The tendency is explained by the fact that it is better to make more measurements around the times where the observables become more sensitive to the parameters.

Figure 3 presents a comparison of the confidence regions for the sustained stimulus case and the E2 experimental scheme, for the pair of parameters *k*_{2} and τ, showing how both the volume and the eccentricity of the confidence interval were largely reduced by the combination of dynamic stimulus and optimal sampling.

As expected, optimal designs with additional experiments (E3 and E4) further reduced the size of the confidence regions, with final maximum errors of the order of magnitude of the experimental noise, which is an excellent result. Figure 4 presents the computation of the confidence intervals for the parameters through the histograms of solutions obtained using the bootstrap approach under the experimental scheme E4. In addition, it should be noted that using these experimental schemes clearly outperforms the use of five optimally designed sustained stimulus experiments, providing better statistical quality with a simultaneous significant reduction in the experimental cost.

## Conclusions

The model building loop (i.e. carried out as an iterative process) is a key element in systems biology. In the present chapter we have focused on two critical steps of this loop: parameter estimation and optimal experimental design. We have shown how these problems can be formulated as non-linear optimizations, and we have underlined the need for global optimization methods to ensure proper solutions for these problems. A case study regarding cell signalling modelling was used to illustrate these concepts.

## Summary

•

*Model building is an iterative process. In the present chapter we review two key steps of the model building process, namely parameter estimation (model calibration) and optimal experimental design.*•

*Parameter estimation (parametric identification) aims to find the unknown parameters of the model which give the best fit to a set of experimental data. Optimal experimental design aims to devise the dynamic experiments which provide the maximum information content for subsequent non-linear model identification, estimation and/or discrimination.*•

*The parameter identification problem is usually formulated as a non-linear optimization problem, where the objective is to find the set of unknown parameters to minimize a function quantifying the goodness of the fit. This type of problem is frequently hard to solve owing to (i) the multimodal nature of the optimization problem, that is, the presence of several suboptimal solutions, and (ii) the lack of (structural or practical) identifiability, that is, the impossibility of giving a unique solution for the parameters.*•

*This contribution illustrates the use of global optimization techniques in order to avoid suboptimal solutions, and introduces the use of optimal experimental design as a means to improve (practical) identifiability. This approach is illustrated with an example considering a cell signalling cascade.*

## Acknowledgments

This work was supported by the European Community as part of the FP6 COSBICS (Computational Systems Biology of Cell Signalling) Project (STREP FP6-512060) and by Xunta de Galicia (PGIDIT05PXIC40201PM).

- © The Authors Journal compilation © 2008 Biochemical Society