## Abstract

At genome scale, it is not yet possible to devise detailed kinetic models for metabolism because data on the *in vivo* biochemistry are too sparse. Predictive large-scale models for metabolism most commonly use the constraint-based framework, in which network structures constrain possible metabolic phenotypes at steady state. However, these models commonly leave many possibilities open, making them less predictive than desired. With increasingly available –omics data, it is appealing to increase the predictive power of constraint-based models (CBMs) through data integration. Many corresponding methods have been developed, but data integration is still a challenge and existing methods perform less well than expected. Here, we review main approaches for the integration of different types of –omics data into CBMs focussing on the methods’ assumptions and limitations. We argue that key assumptions – often derived from single-enzyme kinetics – do not generally apply in the context of networks, thereby explaining current limitations. Emerging methods bridging CBMs and biochemical kinetics may allow for –omics data integration in a common framework to provide more accurate predictions.

- constraint-based models
- data integration
- metabolic networks

## Introduction

Metabolism is a set of biochemical reactions that allow the conversion of nutrients into building blocks necessary for life. In human diseases such as cancer or diabetes, metabolism shifts in a specific way, and elucidating the underlying mechanisms is crucial to understand and treat these diseases. However, this is far from trivial given the complexity of metabolic networks. They include typically thousands of biochemical reactions catalyzed by different enzymes with various properties to convert hundreds of metabolites. This motivates the need for computational modeling methods to transform data into insight [1].

Biochemical processes in metabolic networks can be described by kinetic models, represented as a system of ordinary differential equations (ODEs) in the form
where **N** is the stoichiometric matrix of the system and **v** is the vector of reaction rates computed according to their rate laws from the species concentrations **c** and the rate law parameters **θ**. Integration of the ODE system returns unique trajectories of concentrations and reaction rates that depend on the parameters and on the initial concentrations, **c _{0}**. Thereby, kinetic models provide a high level of detail. However, they depend on knowing enzyme-kinetic rate laws and parameters, which hinders their application to genome-scale metabolic networks, where most reactions are still not characterized

*in vivo*[2].

Because of these difficulties, constraint-based models (CBMs) are a favorite approach for modeling metabolic networks. Instead of simulating a trajectory in the concentration space, CBMs describe the entire space of feasible flux (rate) distributions **v** of the network given a set of flux constraints (Figure 1A). Commonly, CBMs assume the system to be at steady state, leading to the stoichiometric constraint **N·v** = **0**; reaction directionality and capacity constraints are incorporated as bounds on the reaction rates **lb** ≤ **v** ≤ **ub**. Although the constraint-based formulation does not model metabolite concentrations and time evolution, it empowers several methods that help answer biological questions that are more difficult to address with kinetic models [3,4]. A central method is flux balance analysis (FBA) [5], which identifies an optimal flux distribution given an assumed cellular objective such as maximal growth for microorganisms. However, for multicellular systems such as human cells or microbial communities, the cellular objective is unknown, preventing the use of optimization methods. The main challenge for CBMs then is a potentially under-constrained flux space because the tighter the constraints are, the closer the feasible flux space will be to an ODE solution obtained with perfect knowledge of the kinetic parameters, and therefore the more predictive the CBM will be [3].

When CBMs are insufficiently predictive, integrating –omics data is a possible avenue to reduce the flux space and to obtain more precise predictions [6]. While this concept is intuitive and many computational methods based on it have been developed, the heterogeneity of –omics data makes them difficult to integrate, sometimes even more difficult than generating the data itself [7]. Perhaps more importantly, the mapping of –omics data on to CBMs is non-trivial. As Figure 1A shows, reaction fluxes are determined by a combination of factors, including enzyme and substrate concentrations, enzyme parameters, and regulatory interactions. Integrating concentration data ultimately reduces the question of what the influence of changes in enzyme or metabolite concentrations on the fluxes is. It is not easy to answer in a network context, despite the apparent simplicity of enzyme kinetic rate laws. Because reaction rates influence substrate concentrations, changes in the flux through one reaction also affect the fluxes of other reactions consuming the same metabolite. Consider the example in Figure 1B–D, a branching network with constant input where the two pathways are regulated by enzymes E_{1} and E_{2} with similar turnover numbers and substrate affinities. At steady state, the enzyme concentrations and parameters determine the distribution of fluxes between the two branches. When we perturb reaction r_{1} by halving the concentration of E_{1}, its rate immediately halves accordingly, but subsequent substrate accumulation leads to the fluxes in both branches reaching a new steady state. If the same perturbation is applied to a linear pathway, the accumulation of substrate possibly leads to the restoration of the original steady state flux distribution.

Interdependencies and concentration effects, hence, have profound implications for data integration into the constraint-based framework, which does not natively capture the effect of concentration changes. Most state of the art methods for –omics data integration in CBMs must ultimately resort to heuristics to integrate experimental data – here, we focus on the methods’ underlying principles, explicit or implicit assumptions, and corresponding limitations. Our central argument is that the absence of kinetic information in such methods is a main limitation, and that it will be necessary to systematically exploit (available) kinetic information in the data integration process.

## Data integration methods

Kinetic rate laws can be decomposed into different components to emphasize the effect of different physical effects on the resulting flux. For instance, the irreversible Michaelis–Menten rate law can be decomposed into the product of three terms: the enzyme activity, the substrate saturation level, and the thermodynamic driving force [8]. In the following, we will review how one can integrate experimental data related to each of these terms of the rate law in order to obtain more predictive flux distributions, and we will specify what assumptions each approach entails.

### Reaction rates

*In vivo* metabolic fluxes constitute the integrated response of the network to, for example external perturbations. However, these intracellular fluxes cannot be estimated directly from metabolite concentrations without knowing all relevant kinetic details such as rate laws, their parameters, and effective enzyme concentrations. ^{13}C-based flux analysis addresses this problem. Its principle is that ^{13}C-labeled substrates are fed to growing cells and the labeled substrates distribute throughout the network. The fluxes can then be estimated from the measured isotope distribution and a structural model of the metabolic network (see [9–11] for details on the method). However, the complexity of ^{13}C-based flux analysis limits its use to the estimation of 50–100 fluxes in central metabolism, preventing its use for the estimation of fluxes in a large-scale metabolic network [12]. Still, these fluxes can be easily integrated as constraints in a genome-scale model (GSM) by constraining the flux values to their experimentally measured values [13]. Alternatively, ^{13}C-based flux estimates are often used to validate *a posteriori* flux predictions generated using other methods [14].

Extracellular (net uptake or secretion) fluxes can be easily measured and integrated into the metabolic network. The measurement of metabolite uptake or secretion rates depends on the types of culture conditions used, which imply different assumptions. For chemostat cultures, both the growth rate and the extracellular fluxes are constant, such that one can estimate extracellular fluxes by comparing the metabolite concentrations in the culture with those of the fresh medium and by correcting for the cell dry weight. In batch culture conditions, both the growth rate and the extracellular fluxes vary over time [15]. Fluxes are usually estimated from dynamic extracellular metabolomics (exo-metabolomic) data, generated in a typical experiment that measures metabolite concentrations in the extracellular medium at different times (Figure 2A). Note, however, that the estimation usually assumes constant growth, uptake, and excretion rates over time, and that estimates could be incomparable between experiments with different dry weights per cell.

Exo-metabolomics data represent a metabolic footprint of the cell that can help to understand the phenotype [16]. For example, absolute metabolite concentrations measured in different cancer cell lines showed that glycine consumption correlates with the rate of cancer proliferation [17]. However, often ion intensities from MS-based metabolite analysis provide only relative data on metabolite abundances. For each metabolite, the ratio of the ion intensities in two conditions is then considered to be equal to the ratio of the respective metabolite concentrations. For example, this approach was used to study mycobacteria in normal and hypoxic conditions [18], where GSMs (specifically, their reaction graph representations) facilitated the interpretation of exo-metabolomics data by generating hypotheses on which intracellular pathways are used in each condition. Measured extracellular fluxes can also be directly integrated as a set of quantitative or qualitative constraints in the GSM to constrain the flux space (Figure 2A). If absolute rates are available, the fluxes are constrained to the experimentally measured rates. With relative rates, one can apply overflow constraints, that is, constrain the ratios of exchange fluxes to the measured values. Flux spaces for different conditions can then be characterized by random sampling to compare metabolic network operation [19]. While these approaches constrain otherwise free exchange fluxes by measured rates, one can constrain a model further by only allowing measured exchange fluxes, and blocking all others. However, the set of measured external metabolites may be incomplete and the resulting model might not be viable. A recent method addresses this problem by finding a minimal number of additional exchange fluxes that need to be active for the model to sustain minimal cell growth. Exo-metabolomics data can then be used to create condition-specific models tailored to specific metabolomics profiles [20]. Overall, thus, it is relatively straightforward to integrate (dynamic) exo-metabolomics data into GSMs to increase their predictive power. However, the extracellular metabolic profile of cells is sometimes not available, for instance when analyzing human cells *in vivo*.

### Metabolite concentrations

The accuracy and resolution of intracellular metabolomics measurements heavily depends on the experimental method used as well as the extraction method [21,22]. A more fundamental challenge to integrate this type of data, however, is that CBMs do not represent metabolite concentrations explicitly. As a consequence, intracellular metabolite concentrations cannot be integrated directly. The quasi steady-state assumption in the constraint-based framework states that metabolite concentrations are constant. However, this neglects that the cell can accumulate or deplete intracellular metabolites to adapt in dynamic situations such as during responses to changing nutrient conditions. Recently, a novel method referred to as unsteady-state FBA (uFBA) has been developed to allow for changes in intracellular metabolite concentrations, by relaxing the assumption of quasi steady-state for some metabolites [23]. It explained the utilization of extracellular citrate by red blood cells stored in an anticoagulant solution by the large intracellular malate pool. Interestingly, to cope with non-linearity in time-course metabolomics data, the data were discretized into time-intervals where it is possible to assume linear profiles for piece-wise simulation.

A second aspect of the influence of intracellular metabolite concentrations on fluxes relates to the thermodynamic driving force: the product to substrate concentration ratio of a biochemical reaction can influence both the flux directionality and the rate of the reaction. In the thermodynamic framework, a chemical reaction is feasible in a certain direction if the Gibbs free energy change of the reaction, *ΔG*, is negative. In turn, *ΔG* depends on the free energy change of the reaction in standard conditions and the ratio of the concentrations of the reaction’s products and educts [24]. As shown in Figure 2B, each irreversible biochemical reaction imposes a constraint on the product and substrate concentrations. The intersection of all these constraints constitutes the feasible metabolic (concentration) space for the network. Thermodynamic-based metabolic flux analysis (TMFA) exploits this principle to predict a flux distribution consistent with the thermodynamic constraints. Specifically, TMFA tries to find an optimal flux distribution such that the free energies of the irreversible reactions are negative and such that there are feasible metabolite concentrations that can satisfy feasibility of a network flux distribution [25]. Building on this method, Hoppe and co-workers [26] constrained the metabolic space to plausible metabolite concentrations based on literature to constrain flux directionalities. This approach provides a link between metabolite concentrations and flux directionality, without requiring any additional assumption. However, it is not yet applicable to large-scale metabolic models for two main reasons: the mixed integer linear programming approach (MILP) is computationally expensive, and one needs to know the standard Gibbs free energy change constants, which are difficult to measure in practice [27].

### Enzyme concentrations

For the analysis of metabolic network operation in multicellular systems *in vivo*, often transcriptomics or proteomics data are more readily available than metabolomics data. The integration of transcriptomics or proteomics data therefore has been a major direction of methods development for GSMs. A variety of different goals was addressed, for example to compare cell states affected by genetic perturbations to understand key genes involved in cancer [28] or ageing [29], to create tissue-specific models by knowing which enzymes are expressed in the different tissues [30–33], and to understand the impact of environmental perturbations in microbiota on the human host’s transcriptome [34]. Here, through some examples, we focus on the main concepts behind methods that integrate transcriptomic data into CBMs to increase the accuracy of flux predictions. Extensive summaries of the methods can be found in other reviews [14,35,36].

Fundamentally, transcriptomics and proteomics data both approximate enzyme concentrations in metabolic networks; we focus on transcriptomics in the following, and the same logic applies to proteomics data. Transcriptomics data can be integrated either by discretizing the gene expression data (the switch approach) or by deriving quantitative continuous constraints (the valve approach) [37]. For instance, iMAT uses the switch approach. Absolute gene expression data allow to classify reactions into those with highly or lowly expressed enzymes. iMAT favors (disfavors) flux through highly (lowly) expressed reactions, without requiring a cellular objective [38]. E-Flux is an example of methods based on the valve approach. Absolute gene expression data are normalized to a reference condition, and these data are used to directly constrain the fluxes, the rationale being that the higher the concentration of the enzyme, the higher is the flux capacity through this reaction [39]. However, the performance of such data integration methods was evaluated systematically only recently. In comparisons between predicted and experimentally measured flux distributions for different knockout mutants and overexpression strains of *Escherichia coli* and *Saccharomyces cerevisiae*, these data integration methods performed consistently less well than parsimonious enzyme usage FBA (pFBA) [14]. pFBA is an FBA variant that assumes minimal enzyme usage as a part of a cell’s objective, and it does not integrate experimental data [40]. This raises the question of how data integration could make GSMs *less* predictive. Only linear-bound FBA (LBFBA) recently reported slight improvements over pFBA in certain conditions [41].

Consider first methods in the valve category, which set bounds on metabolic fluxes according to quantitative measures of enzyme concentrations. This assumes that mRNA abundances correlate with protein concentrations, which correlate with enzyme activities, which correlate with metabolic fluxes [42]. It is known that cellular mRNA and protein concentrations can correlate poorly [43], but this does not explain the methods’ performance in the comparative study because integrating proteomics instead of transcriptomics data did not lead to significant improvements [14]. One possible explanation for proteomics data not improving predictions is that the methods compared in [14] are tailored to transcriptomics data and, for example different discretizations may be required for the two data types. In our view, however, a more likely reason is that all current methods integrate transcriptomic or proteomic data without integrating the associated kinetic parameters. For instance, E-Flux limits the flux capacity of a reaction based on the amount of enzyme present. Enzyme kinetics justifies this principle in an isolated reaction, but not in a network. Consider the linear pathway shown in Figure 1B–D, where all fluxes are entirely determined by the input flux. There is no change in flux upon halving the enzyme concentration because the enzyme does not operate at saturation, and the substrate concentration adapts to the change in enzyme concentration. Inappropriate scaling of the maximal flux in this example would identify the reaction as the bottleneck and predict a flux change upon change in the enzyme concentration. Fundamentally, even with absolute proteomics data, the limitation is a potentially wrong scaling of constraints on the flux space when one does not consider the different turnover rates of metabolic enzymes. Unfortunately, these vary by several orders of magnitude between different enzymes [44]. Also, *in vivo* experimental evidence indicates that direct correlations between transcriptomics or proteomics data and metabolic fluxes are weak. Chubukov and co-workers [34] showed that changes in *E. coli’s* central carbon metabolism fluxes upon changes in the carbon source available in the medium could not be explained alone by the transcriptome, and that changing substrate concentrations explained only a small fraction of flux changes.

There is, however, a special case in which kinetic parameters and metabolite concentrations do not affect the flux through enzyme-catalyzed reactions: without enzymes, there should be no flux through the catalyzed reactions. Hence, one could use transcriptomics data to derive which enzymes are absent, for example from specific tissues or cell lines. This is the rationale behind the switch approach: it sets a threshold to determine whether gene products are present or not, and it forces the flux through reactions that lack a catalyzing enzyme to zero [45]. Indeed, cell line-specific models developed with the switch approach showed improved gene essentiality predictions. However, we do not have principled ways of defining the threshold’s value and this value substantially influences the model predictions. For example, cell line specification via a threshold sometimes removed important metabolic functionalities by falsely considering a gene product as absent [31]. An arbitrary threshold for microarray data can be avoided by a discretization workflow specific for this data type [46], used by recent methods integrating microarray data to construct tissue-specific models [47,48]. Overall, thus, while the integration of transcriptomics or proteomics data into GSMs appears intuitive and straightforward, this is not necessarily the case when either evaluating the performance of the corresponding methods, or when mapping the methods’ assumptions to the underlying biochemical kinetics.

## Hybrid frameworks

### Integrating structure and kinetics

The constraint-based framework predicts metabolic fluxes with very limited information about the biological systems studied, but the omission of kinetics prevents a rigorous integration of metabolite and enzyme concentration data. While the limited availability of kinetic parameters motivates the wide adoption of heuristic-based methods for data integration, we argue that it is critical for data integration frameworks to re-consider reaction kinetics in the context of the reaction network. Because developing realistic genome-scale kinetic (ODE) models is still an ambitious goal even for the best-studied model organisms, hybrid approaches are promising. Hybrid models use kinetic information only to the extent that it is available or reliable, while the remaining network is modeled in the constraint-based framework. This allows to combine the descriptive power of an ODE model with the exploratory capabilities of constraint-based modeling.

In this framework, static approaches for –omics data integration constitute one main class. They use the ODE formulation and concentration measurements for a part of the network to generate condition-specific flux constraints (Figure 3A). The population FBA method [49] improves over E-Flux [39] by using published enzyme turnover rates as well as protein counts inferred from fluorescence measurements to compute flux bounds. Modeling individual cells with protein copy numbers sampled from the experimental distributions allowed to predict cell-to-cell variability and the Crabtree effect in a yeast population [50]. FBA with kinetic constraints (KFBA) [51] estimates kinetic parameters from a multi-omics dataset including flux, protein abundance, and metabolite abundance measurements [52]. For each test condition, enzyme and metabolite concentrations together with the estimated parameters and their confidence intervals are used to compute condition-specific flux constraints for a subsequent FBA. KFBA showed improved predictions over classical FBA, but the method was allowed to violate some constraints to find a feasible solution, highlighting the challenges in the construction of complete and consistent models. k-OptForce [53] uses metabolite concentrations and an ODE model to compute steady-state fluxes for those reactions with kinetic information; these fluxes act as constraints, for example in strain design [54]. Without explicitly computing flux constraints, integrative omics-metabolic analysis (IOMA) [55] intends to find the flux distribution that best agrees with proteomics and metabolomics data and with the kinetics of a set of well-characterized reactions. Minimizing the sum of per-reaction error variances over different conditions makes IOMA less sensitive to systematic bias due to regulation mechanisms not included in the model, but it also implicitly favors smaller fluxes, akin to pFBA [40].

The other main class of dynamic approaches is most relevant when experimental data are not available for the conditions of interest, or when the environmental conditions vary over time, for example due to external substrate consumption. Dynamic approaches rely on dynamic FBA (DFBA) [56] (Figure 3B; see also [57] for a recent review). DFBA models a subset of reactions whose dynamics are deemed relevant, or for which kinetics are available, as an ODE system. At each time step of the integration for solving the ODE system, simulated fluxes from the ODE model constrain an FBA problem over the remainder of the network, and the FBA results enter the ODE model for the next time step. Here, experimental data are used to compute uptake rates or kinetic parameters. In principle, the ODE model can have arbitrary coverage of the network, ranging from a single uptake reaction to more complex (possibly regulated) subnetworks as in integrated FBA (iFBA) [58]. However, DFBA simulations have to solve both the ODE system and the CBM in one pass at each time step, without multiple levels of flux dependencies. This is possible if the ODE system covers only the network boundaries such as biomass production and nutrient uptake reactions. Despite these constraints, DFBA can help analyze the properties of highly interdependent hybrid models. For example, an ODE model of central carbon metabolism integrated into a GSM revealed bistability in the metabolism of *E. coli* [59]. Overall, however, selecting the relevant kinetics and inferring the corresponding kinetic parameters from experimental data remain key challenges of hybrid frameworks.

Finally, in particular for hybrid models, metabolic regulation complicates the prediction of flux distributions. Transcriptomics and proteomics data may replace explicit models for the effects of transcriptional and post-translational enzyme regulation, but even together with metabolomics data for substrate concentrations of metabolic reactions, they are not sufficient to explain metabolic fluxes [34]. Hence, a significant portion of metabolic regulation must be attributed to other metabolite–enzyme interactions such as allosteric regulation [60]. Indeed, most metabolic enzymes in *E. coli* are regulated by metabolites, with more than a third of the inhibitors’ concentrations being above the *K*_{I} value (inhibitor-enzyme dissociation constant) [61,62], and methods for the reconstruction of small molecule regulatory networks are becoming available [63,64]. For hybrid models, interdependencies via allosteric regulators, especially when currency metabolites such as ATP or NADH are involved, directly affect the selection of reactions to be modeled by ODE systems that, in contrast with CBMs, predict metabolite concentrations, given the concentrations in the intracellular or extracellular environment. In the constraint-based modeling framework, to our knowledge, only a single method accounts for regulatory enzyme–metabolite interactions. Allosteric regulation FBA (arFBA) [65] uses known regulatory interactions to minimize the discrepancy between the changes in the regulated flux and in the turnover rate of the effector with respect to a reference condition. It relies on the assumption that the change ratios are linearly correlated, which is not justifiable in general. Again, only the kinetics of a reaction can predict the effect of regulation at different levels of enzyme saturation or effector concentration.

### Probabilistic approaches

Uncertain knowledge of reaction kinetics, their parameters, and their regulation is pervasive in systems biology, not restricted to CBMs and metabolic networks [66]. For smaller scale dynamic models of biochemical networks, probabilistic approaches that quantitate uncertainty, its propagation through networks, and it effects on phenotypes are increasingly employed [67]. In particular, Bayesian approaches gain importance because they can account for uncertainties in networks and experimental observations simultaneously [68,69]. For example, Bayesian inference was successful in real-life applications that simultaneously aimed to elucidate biological mechanisms and parameter values by integrating prior knowledge and experimental data for dynamic models of cell signaling [70,71]. We argue that developing equivalent computational methods for Bayesian data integration and metabolic flux analysis for the substantially larger CBMs is a challenge worthwhile to be addressed.

In a consistent Bayesian approach for data integration into CBMs, the network structure is the main prior information. It restricts the space of feasible fluxes, and instead of reasoning about (single) optimal fluxes or boundaries on fluxes, one can aim to derive probability distributions over fluxes given the network structure. Recent developments of sampling algorithms [72–74] as well as approximation algorithms [75] make the characterization of probability distributions over feasible fluxes at genome-scale a realistic endeavor. Then, the combination with –omics data (via their likelihood) can, in principle, generate probabilistic predictions of metabolic fluxes (the Bayesian posterior), without requiring assumptions on biological objectives. This may allow an unbiased investigation of metabolic flux distributions that are consistent with experimental observations, but substantial conceptual and technical challenges such as defining and efficiently sampling omics-based probability distributions on the flux space need to be solved.

To illustrate how Bayesian approaches could help integrating –omics data into CBMs probabilistically, consider the example of Michaelis–Menten kinetics in Figure 4. Simulations in Figure 4A suggest that relative metabolite concentrations can be informative for metabolic fluxes – the effect of a two-fold increase in substrate concentration is captured probabilistically, without knowing exact parameter values. In contrast, with an enzyme abundance increased by two-fold, the flux is expected to double, but with large variability due to potentially changing substrate concentrations when the reaction is embedded in a network (Figure 4B). Thus, assuming types of reaction kinetics, and using known distributions of enzyme parameters, Bayesian inference for –omics data integration appears to be a promising avenue for metabolic flux analysis, despite the large uncertainties in enzyme kinetics and regulation.

## Conclusions and perspectives

At genome-scale, it is currently impossible to comprehensively measure (through ^{13}C experiments) or precisely predict (through kinetic models) metabolic flux distributions, which makes the constraint-based framework so attractive for the analysis of metabolic networks. In particular, CBMs are powerful at elucidating the entire feasible flux space given well-characterized constraints derived from reaction stoichiometries, mass balances, and thermodynamics. To constrain behaviors further – and thereby make models more predictive – the intuitively appealing idea of obtaining constraints based on measured concentration data, however, has proven challenging. We argue that the main limitation of many current methods for data integration is to rely on assumptions derived from single-enzyme kinetics that do not necessarily hold in a network context. As simple examples show (see Figure 1), especially the links between enzyme concentrations and fluxes in a reaction network are not trivial in general, and further complicated by several layers of regulation *in vivo* [34].

As a consequence, we need to account for reaction kinetics in the integration of experimental data and in the description of many metabolic mechanisms such as substrate accumulation or allosteric regulation. We do not expect genome-scale kinetic models to replace the simple and conservative constraint-based approaches in the near future, but rather argue that it is worth exploiting the complementary strengths of the two frameworks: obtaining precise fluxes or constraints from well-characterized reactions, and exploring the resulting feasible flux space with constraint-based approaches. Methods such as KFBA and k-OptForce can set omics-based constraints on the individual fluxes using kinetics, but a key open question is which minimal network parts should be augmented with kinetics to obtain more predictive and still consistent models. Importantly, it is not necessary to constrain all fluxes by kinetics because the steady-state assumption propagates flux constraints to other reactions in the network. This applies to linear pathways, where all fluxes are equal at steady state, and to branched pathways as in Figure 1, where knowing two fluxes suffices to infer the remaining one. Minimal sets of relevant fluxes should ideally be identified from structural information only, for example using concepts from metabolic control analysis (MCA) [76]. The introduction of kinetics, however, must be approached with care: the example of KFBA consistently needing to violate constraints to obtain a feasible steady-state solution suggests that incomplete model structures or inaccurate parameters can easily lead to inconsistencies in the integration of experimental data.

More generally, we believe that probabilistic methods are under-explored in model-based methods for –omics data integration. In particular for applying Bayesian approaches, by now well-established in other domains of systems biology, CBMs are lagging behind. With recent technical developments in sampling in large networks, it is now possible to develop Bayesian approaches for flux predictions that consistently integrate multiple types of –omics data. Note that this approach differs from emerging machine learning methods that, for example were recently used to integrate a multi-omics compendium of 26 databases into one multiscale model for *E. coli* [36]. Unfortunately, machine learning methods usually have a weak connection to the underlying biochemistry, which hinders reasoning about the connection between biological processes and predicted phenotypes. Finally, because many metabolic reactions are regulated to maintain stable fluxes and to switch pathways depending on the environmental conditions, new methods that account for regulation will likely have a great impact on the quality of predictions.

## Summary

Data integration into CBMs of metabolism promises to increase the models’ predictive powers.

This integration is challenging, depending on the type of data.

Many existing methods rely on assumptions derived from enzyme kinetics that do not necessarily hold in a network context.

Emerging methods bridging CBMs and biochemical kinetics are expected to reduce these limitations.

## Competing interests

The authors declare that there are no competing interests associated with the manuscript.

## Funding

This work was supported by the SystemsX.ch projects TbX [grant number #2013/454] and GutX [grant number #2014/263] of the Swiss National Science Foundation.

**Abbreviations:** CBM, constraint-based model; DFBA, dynamic flux balance analysis; FBA, flux balance analysis; GSM, genome-scale model; IOMA, integrative omics-metabolic analysis; KFBA, FBA with kinetic constraint; ODE, ordinary differential equation; pFBA, parsimonious enzyme usage FBA; TMFA, thermodynamic-based metabolic flux analysis

- © 2018 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society