With the growing importance of computational models in systems biology, there has been much interest in recent years in developing standard model interchange languages that permit biologists to easily exchange models between different software tools. In the present chapter two chief model exchange standards, SBML (Systems Biology Markup Language) and CellML, are described. In addition, other related features including visual layout initiatives, ontologies and best practices for model annotation are discussed. Software tools such as developer libraries and basic editing tools are also introduced, together with a discussion on the future of modelling languages and visualization tools in systems biology.
What do screw threads, TTL (Transistor–Transistor Logic) and SBML (Systems Biology Markup Language) have in common? Standardization of course. Standardization is one of those overlooked but essential features of modern life without which our world would largely come to a grinding halt. The main benefit of standardization is increased productivity. Thus the introduction in the 19th century of standard screw threads by Whitworth in the U.K. and Sellers in the U.S.A. made the manufacture of the humble bolt and screw cheaper and more reliable. Likewise the introduction of TTL circuits in 1962 by Texas Instruments led to the standardization of the electrical characteristics for digital circuits and allowed the development of reliable and cheap interchangeable logic parts for the growing electronics industry. History has shown us repeatedly that, on the whole, standardization is a good thing.
In systems biology, standards have already been proposed for several tasks. In particular, model exchange has been an active area of discourse where at least two standards, SBML and CellML, have been proposed as a means for users to exchange models between different software tools. Prior to 2000, each software tool used its own format to store models. This meant that it was very difficult to move a model from one software tool to another and, worse still, if a software tool ceased development and was no longer supported (which is all too common in systems biology), it would become imperative to translate one's models to a different tool. Clearly this state of affairs was quite unproductive and, out of these obvious shortcomings, a number of groups set out to gather community support to develop a standard that model developers would be happy to use. With such a standard, a model could be stored in a format that was independent of the software tool. There was an early effort in 1998 by the BTK (BioThermoKinetics) group to standardize on a practical format for exchanging models between two widely used tools, Gepasi and SCAMP. Around the same time, bioengineers at the University of Auckland (Auckland, New Zealand) began investigating the role that XML could play in defining a standard for exchanging computational models in order to reduce errors that appeared frequently in published models. From the Auckland team emerged CellML. Members from the BTK group subsequently took their experience and contributed significantly to the other major model exchange standard, called SBML. SBML was developed in 2000 at Caltech (California Institute of Technology, Pasadena, CA, U.S.A.) as a result of funding received from the Japanese ERATO (Exploratory Research for Advanced Technology) programme. Both CellML and SBML are today viewed as the main de facto standards for exchanging cellular network models.
There are, however, fundamental differences between the approaches that CellML and SBML take in the way models are represented which we will touch upon in the present chapter.
Quantitative models based on differential equations
Many simulation models in systems biology are constructed using differential equations. These equations describe the continuous rate of change of molecular species in time. Other researchers use a stochastic-based description either by explicitly modelling the particulate nature of the cellular milieu or by simply adding noise to differential equations. When models are based on differential equations, many researchers will express them using the following equation:

$$\frac{dS}{dt} = N\,v(S, p) \qquad \text{(eqn 1)}$$

where S is the vector of molecular species concentrations; N, the stoichiometry matrix; v, the rate vector; and p, a vector of parameters which can influence the evolution of the system. Many software tools will permit users to enter models as a list of reactions and then automatically generate the mathematical model [2,6,7].
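To illustrate how a tool might turn a list of reactions into the mathematical model, the following minimal Python sketch builds the stoichiometry matrix N from a reaction list and evaluates dS/dt = N v(S, p). The species names, rate constants and the helper functions `stoichiometry_matrix` and `rates_of_change` are hypothetical and chosen only for illustration; they are not taken from any of the tools cited above.

```python
# Hypothetical three-species chain A -> B -> C with mass-action kinetics.
species = ["A", "B", "C"]
params = {"k1": 0.1, "k2": 0.5}
conc = {"A": 10.0, "B": 2.0, "C": 0.0}

# Each reaction: (name, reactants, products, rate law v_j(S, p)).
reactions = [
    ("R1", {"A": 1}, {"B": 1}, lambda s, p: p["k1"] * s["A"]),
    ("R2", {"B": 1}, {"C": 1}, lambda s, p: p["k2"] * s["B"]),
]


def stoichiometry_matrix(species, reactions):
    """Return N as a list of rows: one row per species, one column per reaction."""
    row_of = {name: i for i, name in enumerate(species)}
    matrix = [[0] * len(reactions) for _ in species]
    for j, (_, reactants, products, _) in enumerate(reactions):
        for name, count in reactants.items():
            matrix[row_of[name]][j] -= count  # consumed by reaction j
        for name, count in products.items():
            matrix[row_of[name]][j] += count  # produced by reaction j
    return matrix


def rates_of_change(species, reactions, conc, params):
    """Evaluate dS/dt = N v(S, p) at the given concentrations."""
    n = stoichiometry_matrix(species, reactions)
    v = [rate(conc, params) for (_, _, _, rate) in reactions]
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in n]


dsdt = rates_of_change(species, reactions, conc, params)
for name, value in zip(species, dsdt):
    print(f"d[{name}]/dt = {value:+.3f}")
```

Handing `rates_of_change` to a numerical integrator would then give a time course; that step is left out here to keep the sketch self-contained.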
De facto standards
Although there has yet been no ratification of a standard exchange format by an official body such as OASIS or ISO (International Standards Organization), both SBML and CellML are considered as de facto standards simply because they are so widely used. In this section we will consider each standard, although our focus will mainly be on SBML and related technologies.
CellML represents cellular models using a mathematical description. In addition, CellML represents entities using a component-based approach where relationships between components are represented by connections. The literal translation of the mathematics, however, goes much further; in fact, the representation that CellML uses is very reminiscent of the way an engineer might wire up an analogue computer to solve the equations (though without specifying the integrators). As a result CellML is very general and in principle could probably represent any system that has a mathematical description [and not just the kind indicated by (eqn 1)]. CellML is also very precise in that every item in a model is defined explicitly. However, the generality and explicit nature of CellML also result in increased complexity, especially for software developers.
Given the general component/relationship-based approach of CellML, models defined using CellML introduce biological information by way of metadata. As CellML is an XML-based language there are natural ways of introducing metadata. The most prominent is to embed annotations using RDF (Resource Description Framework; http://www.w3.org/RDF/). RDF allows the description of specific CellML elements. Within the individual RDF descriptions the metadata are further qualified by relying on a standardized set of terms from the DCMI (Dublin Core Metadata Initiative; http://dublincore.org/documents/dcmi-terms/). These terms (i.e. authors, dates, titles and so on) can then be easily mined by other applications simply by scanning the model. Apart from the general approach that relies on RDF and DCMI, CellML metadata can also be in the form of a BioPAX description. BioPAX (or Biological Pathway Exchange; http://www.biopax.org) is an ontology that defines biological pathway data, such as metabolic pathways or molecular interactions. BioPAX thus represents a natural complement to CellML's rigorous mathematical formalism.
The CellML team has amassed a very large suite (hundreds) of models (http://www.cellml.org/models) which provides many real examples of CellML syntax. This is an extremely useful resource for the community.
Whereas CellML attempts to be highly comprehensive, SBML was designed to meet the immediate needs of the modelling community and is therefore more focused on a particular problem set. One result of this is that the standard is simpler than CellML, although more recent revisions add new functionality so that the difference in complexity between CellML and SBML is becoming less significant. Like CellML, SBML is based on XML; unlike CellML, however, it takes a different approach to representing cellular models. The way SBML represents models closely mirrors the way existing modelling packages represent models. Whereas CellML represents models as a mathematical wiring diagram, SBML represents models as a list of chemical transformations. Since every process in a biological cell can ultimately be broken down into one or more chemical transformations, this was a natural representation to use. SBML does not have generalized elements such as components and connections; instead it employs specific elements to represent spatial compartments, molecular species and chemical transformations. In addition to these, SBML also has provision for rules which can be used to represent constraints, derived values and general maths which for one reason or another cannot be transformed into a chemical scheme.
SBML, like any standard, evolves with time . Major revisions of the standard are captured in levels, while minor modifications and clarifications are captured in versions. An example of a major change within the standard would be the use of MathML in Level 2 of SBML, whereas Level 1 encoded infix strings to denote reaction rates and rules. A minor change on the other hand would, for example, be the introduction of semantic annotations [see the section on MIRIAM (Minimum Information Requested In the Annotation of biochemical Models) below] that can be added to SBML Level 2 version 2, whereas this was not possible in a supported fashion in earlier versions. At the time of writing the present chapter, SBML Level 3 is still in development. With Level 3 the standard will develop in an extensible manner. This means there will be a set of core features that must be supported around which additional features, such as spatial modelling, can be included.
For a well-annotated repository of SBML models see the BioModels Database (http://www.ebi.ac.uk/biomodels/).
SBML development tools
The success of SBML over competing standards can be ascribed in part to libSBML (http://sbml.org/software/libsbml/). LibSBML is a software library provided by the SBML Team. The software library is based around a C/C++ core, with wrappers provided for many programming languages. Furthermore, the library is available for Windows and POSIX (Portable Operating System Interface) operating systems, and thus can be used virtually anywhere. With an abundance of documentation and available examples, software developers can readily use libSBML for their SBML support.
By using libSBML, a developer no longer has to worry about the level and version of an SBML document the software has to read, as libSBML encapsulates this and is even able to convert SBML models into the desired form. Hence a software developer can focus on how to interpret computational models rather than concerning themselves with the mechanics of reading and writing SBML. At the time of writing the present chapter, libSBML has released version 3. New in this version are features to validate the model, such as unit consistency checking, or checks on whether the model is over-determined. LibSBML now also provides support for MIRIAM-compatible annotations .
SBMLeditor , developed by Nicolas Rodriguez at the European Bioinformatics Institute, represents a low level SBML editor. The curators of the BioModels Database , a repository of well-annotated, curated and simulatable models, use this tool to annotate and curate the models. A user of the editor can view the XML tree, and make changes to the model. The software can convert MathML into the infix notation and back, in order to facilitate editing of kinetic laws, initial assignments, rules and constraints. SBMLeditor can check for consistency and the validity of the model by applying the libSBML consistency validators. SBMLeditor also features the SBW (Systems Biology Workbench) menu : this allows a user to send the model for further analysis, simulation or visualization to any installed module of the SBW . The upcoming version of the SBMLeditor will support the SBO (Systems Biology Ontology), as well as SBML Level 2 version 3.
Here we give an example of an SBML model. The following SBML text encodes a very basic model: Node0 as well as Node2 have been chosen as model boundaries, that is they have fixed concentrations. The model features two reactions, converting Node0 into Node1, and Node1 into Node2, and both employ mass action kinetics. It should be mentioned that in SBML each species ‘lives’ in a compartment.
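As an illustrative reconstruction (not the original listing), the Python sketch below embeds a minimal SBML Level 2 document matching the description above and reads it back with the standard library XML parser. The model id, compartment id, parameter names and numerical values are assumptions made for the example; a production tool would normally use libSBML rather than a generic XML parser.

```python
import xml.etree.ElementTree as ET

# Assumed minimal encoding of the three-node model described in the text.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="simple_pathway">
    <listOfCompartments>
      <compartment id="compartment" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="Node0" compartment="compartment" initialConcentration="10" boundaryCondition="true"/>
      <species id="Node1" compartment="compartment" initialConcentration="0" boundaryCondition="false"/>
      <species id="Node2" compartment="compartment" initialConcentration="0" boundaryCondition="true"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.1"/>
      <parameter id="k2" value="0.05"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="J0" reversible="false">
        <listOfReactants><speciesReference species="Node0"/></listOfReactants>
        <listOfProducts><speciesReference species="Node1"/></listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply><times/><ci>k1</ci><ci>Node0</ci></apply>
          </math>
        </kineticLaw>
      </reaction>
      <reaction id="J1" reversible="false">
        <listOfReactants><speciesReference species="Node1"/></listOfReactants>
        <listOfProducts><speciesReference species="Node2"/></listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply><times/><ci>k2</ci><ci>Node1</ci></apply>
          </math>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level2"}
root = ET.fromstring(SBML_TEXT)

# Species flagged as boundary conditions keep a fixed concentration.
boundary_species = [
    s.get("id")
    for s in root.findall(".//sbml:species", NS)
    if s.get("boundaryCondition") == "true"
]

# Collect (reaction id, reactants, products) for every reaction.
reaction_list = []
for r in root.findall(".//sbml:reaction", NS):
    reactants = [x.get("species") for x in r.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
    products = [x.get("species") for x in r.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
    reaction_list.append((r.get("id"), reactants, products))

print("boundary species:", boundary_species)
for name, reactants, products in reaction_list:
    print(f"{name}: {' + '.join(reactants)} -> {' + '.join(products)}")
```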
Even this simple example shows how the structured XML format, which is optimally readable by a machine, becomes unwieldy for humans to read. The format becomes even more complicated to read once we start annotating the model, say by identifying “Node1” as a rat epidermal growth factor:
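A sketch of what such an annotation might look like is given below, again as an assumption rather than the original listing. It follows the general pattern of an RDF description attached to the species through its metaid, with a biological qualifier pointing at a database entry. The UniProt accession is deliberately left as the placeholder `ACCESSION`, and the metaid value is hypothetical; the Python code only extracts the referenced resources.

```python
import xml.etree.ElementTree as ET

# Assumed annotated species; "ACCESSION" is a placeholder, not a real UniProt id.
ANNOTATED_SPECIES = """<species xmlns="http://www.sbml.org/sbml/level2"
         id="Node1" metaid="metaid_Node1" compartment="compartment">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#metaid_Node1">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:uniprot:ACCESSION"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF_NS, "bqbiol": "http://biomodels.net/biology-qualifiers/"}

species = ET.fromstring(ANNOTATED_SPECIES)

# The rdf:about attribute ties the description back to the species metaid.
description = species.find(".//rdf:Description", NS)
about = description.get(f"{{{RDF_NS}}}about")

# Every rdf:li under bqbiol:is names an entity the species "is".
resources = [
    li.get(f"{{{RDF_NS}}}resource")
    for li in species.findall(".//bqbiol:is/rdf:Bag/rdf:li", NS)
]

print(species.get("id"), "described by", about, "is", resources)
```

A single identification adds nine lines of nested XML to a one-line species definition, which illustrates why annotated models quickly become hard to read by eye.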
The problem is not so much that the format is unintelligible; it is simply a problem of long-running scopes which have to be remembered. Fortunately SBML does not have to be written by humans. The SBML home page (http://sbml.org) lists more than 120 software applications that support SBML, many modelling tools among them. A recent review compares some of these tools.
Other related standards
Graphical modelling applications  routinely enhance computational models by layout annotations. Recently the SBML community has decided on a common standard on how to embed the layout information within SBML. The layout extension  allows a model to store the size and dimension of all model elements, along with textual annotations and reactions. Originally the view was to embed the layout extension in a model annotation for Level 2 versions of SBML but with the upcoming Level 3 the layout extension will be added to SBML as a first-class construct. LibSBML has been modified to provide access to all elements of the layout extension. Also several reference implementations exist [12,15].
Whereas the layout extension is concerned with representing simple elements, the SBGN (Systems Biology Graphical Notation; http://sbgn.org) aims to standardize the visual language of computational models unambiguously. Although this standard is still in development and strictly speaking independent of the SBML effort, experience in other fields such as electrical engineering has demonstrated the essential need for standardizing the visual notation for representing models in diagrammatic form.
Model definition languages such as SBML and CellML target the exchange of models. They aim to pass on the quantitative computational model from one software tool to another. However, these description formats do not concern themselves with semantic annotations. Semantic annotations here would uniquely identify model constituents, information about the relationship between model elements or could be basic identifications of model author and the date of last modification. These annotations can be interpreted even by software without any knowledge of the model definition languages. Both SBML and CellML have launched efforts to remedy this problem. Both communities agreed on MIRIAM . These annotations aim to further the confidence in quantitative biochemical models, making it easier and more precise to search for particular biochemical models, enabling researchers to identify biological phenomena captured by a biochemical model and perhaps most importantly to facilitate model reuse and model composition. In order to call a model MIRIAM compliant, the model has to be encoded in a standard format, such as SBML. Furthermore, it needs to be tied to a reference description, describing the properties and results that can be obtained from the model. Parameters of the computational model have to be provided so that the model can be loaded into a simulation environment where the results can be reproduced. Other information that has to be provided is a name for the model, the creator of the model, the date and time of the last modification, as well as a statement about the terms of distribution.
In order to assign meaning to model constituents an ontology specific to systems biology has been developed : SBO (http://www.ebi.ac.uk/sbo/). The controlled vocabulary consists of two relationships: is-part-of and is-a. Qualifying model participants, say as enzyme, macromolecule, metabolites or small species such as ions, will make it easy to generate meaning from the model. It will make the generation of standard visual notations such as SBGN possible. Moreover it presents a solution on how to interpret the model computationally, as the SBO allows tagging of a model as continuous, discrete or a logical model. One could even go a step further, making kinetic interaction in a model obsolete, by referencing that the rate law is one specified by an ontology identifier (e.g. tagging a reaction as following Henri–Michaelis–Menten enzyme kinetics and specifying the parameters). Since SBML Level 2 version 3, all SBML elements feature an optional sboTerm attribute, which makes tagging elements with the corresponding SBO term straightforward. The SBO is community driven and new terms or modifications to the existing ontology can be requested by the community.
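The tagging step can be sketched in a few lines of Python. The helper `tag_with_sbo` below is hypothetical; it checks that a term has the `SBO:` prefix followed by seven digits and then sets the sboTerm attribute on an XML element. The term SBO:0000029 is used on the assumption that it denotes the Henri-Michaelis-Menten rate law; the identifier should be verified against the SBO browser before use.

```python
import re
import xml.etree.ElementTree as ET

# SBO identifiers are written as "SBO:" followed by seven digits.
SBO_PATTERN = re.compile(r"^SBO:\d{7}$")


def tag_with_sbo(element, term):
    """Attach an sboTerm attribute after a basic syntax check."""
    if not SBO_PATTERN.match(term):
        raise ValueError(f"not a well-formed SBO identifier: {term!r}")
    element.set("sboTerm", term)
    return element


# Hypothetical kinetic law element; the term is assumed, see the note above.
kinetic_law = ET.Element("kineticLaw")
tag_with_sbo(kinetic_law, "SBO:0000029")
print(kinetic_law.attrib)
```

A tool reading the model could then choose a built-in rate law from the sboTerm alone, without interpreting the MathML.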
Future prospects and conclusions
The most recent developments in CellML and particularly the SBML communities revolve around the creation of ontologies and refining the exchange semantics. Apart from classifying model constituents with an appropriate ontology, one of the current areas of interest is describing the dynamical behaviour of a model. TEDDY (Terminology for the Description of Dynamics; http://www.ebi.ac.uk/compneur-srv/teddy/) provides a rich ontology to describe and quantify the behaviour that a computational model is able to exhibit (e.g. the characteristics of a model could describe bifurcation behaviour where the functionality of a model could be described as featuring oscillations or switch behaviour). However, knowing that a model exhibits interesting behaviour is not enough: more information is needed in order to recreate that behaviour. The MIASE (Minimum Information About a Simulation Experiment; http://www.ebi.ac.uk/compneur-srv/miase/) project focuses on this problem. MIASE will help to describe the simulation algorithms and the simulation tool used along with all needed parameter settings. In order to do so it will use the KiSAO (Kinetic Simulation Algorithm Ontology) that relates simulation algorithms and methods to each other. As these ontologies are just being formulated, it will be interesting to see how they progress and are taken up by the community.
Although most recent developments in standardization have focused on the use of XML to represent models, there is a long tradition in the field of describing models using human-readable text-based formats. Indeed the very first simulator, BIOSSIM, allowed a user to describe a model using a list of reaction schemes. Variants of this have been employed by a number of simulators since, including SCAMP, Jarnac, E-Cell and, more recently, PySCeS (Python Simulator for Cellular Systems). Being able to represent models in a human-readable format offers many advantages, including conciseness, portability and ease of manipulation via a simple text editor.
There has also, in recent years, been a movement [20–22] to develop models based on the idea that cellular pathways, particularly signalling pathways, operate on a different molecular scale. This view focuses on the idea that the number of actual states in a pathway expands exponentially with the number of atomic and covalent modification states. This view represents a significant departure from the traditional picture of a biochemical pathway. In many cases the number of states can be extremely large, which means that the average number of molecules in each state can be correspondingly extremely low. The number of connections is likewise very large. This approach necessitates a different method of modelling and in fact there is currently no theoretical framework that can adequately describe the dynamics of such an assembly. In view of this dramatic change in how we perceive signalling pathways, the traditional methods [2,23–25] that are used to define a ‘signalling pathway’ are unworkable and efforts have been made to develop rule-based methods to describe such systems. The best known of these is BioNetGen (Biological Network Generator), which allows a user to define the rules by which states and transformations are defined; computer software is then used to expand the set of rules into the full state model. Interchange standards such as SBML and CellML will need significant revision to deal with rule-based models.
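The scale of the problem can be illustrated with a toy calculation. The Python sketch below assumes a hypothetical protein with n independent phosphorylation sites and a single rule per site ("an unmodified site can become phosphorylated regardless of the other sites"). It then expands those n rules into the full set of states and reactions. This is not BioNetGen syntax or its algorithm, only a minimal demonstration of how a few rules generate an exponentially large network.

```python
from itertools import product


def expand_rules(n_sites):
    """Expand one phosphorylation rule per site into explicit states and reactions.

    A state is a tuple of 0/1 flags, one per site (1 = phosphorylated).
    """
    states = list(product((0, 1), repeat=n_sites))
    reactions = []
    for state in states:
        for i, occupied in enumerate(state):
            if not occupied:
                # Applying the rule for site i to this particular state.
                target = state[:i] + (1,) + state[i + 1:]
                reactions.append((state, target))
    return states, reactions


for n in (1, 3, 10):
    states, reactions = expand_rules(n)
    print(f"{n:2d} rules -> {len(states):5d} states, {len(reactions):5d} reactions")
```

With 10 sites, 10 rules already expand to 1024 states and 5120 reactions, each of which would need its own species or reaction entry in a reaction-list format.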
In the present chapter we have briefly summarized some of the developments and future prospects for model interchange in systems biology. Establishing standards is at the best of times very difficult; the process of acceptance is largely sociological and many factors contribute to the acceptance of a standard by a community. Although a small but significant minority of biologists are now publishing models in one of the two main interchange formats, many of the models we see published in peer-reviewed journals are published either in proprietary formats or as simple listings of equations in an appendix. As computational modelling becomes more important in biology, and as databases such as the BioModels Database make models more accessible, this will undoubtedly change.
Three web sources which are of interest to readers of this chapter include: (i) http://www.cellml.org (this is the main CellML site which has a very rich set of models expressed in CellML including specifications for the standard and pointers to software toolkits); (ii) http://www.sbml.org (this is the main SBML site; the site has ample documentation, examples illustrating how SBML is and should be used, and in addition has a rich set of software tools, in particular libSBML, which allows developers to easily add SBML support to their tools); and (iii) http://www.sys-bio.org (this is the main SBW site where the latest versions for SBW, developer documentation, example models, screenshots and user guides can be obtained; in addition a link to the main sourceforge site is given where all the source code for SBW is made available).
• Model exchange standards allow the free flow of computational models between different researchers.
• A variety of proposed standards exist; of these, SBML and CellML are the most important.
• Standards such as CellML and SBML have enabled the development of model repositories such as the BioModels Database and comparison sites such as that found at www.sys-bio.org.
• The initial SBML and CellML standards have spawned a variety of other initiatives, including the development of graphical standards, model behaviour standards, and new ontologies such as SBO.
We would like to acknowledge the generous support from the Japan Science and Technology Agency, DARPA (Defense Advanced Research Projects Agency; BAA01-26 Bio-Computation) and the US Department of Energy GTL (Genomics to Life) programme. We would also like to thank Anastasia Deckard for critically reading the manuscript.
© The Authors. Journal compilation © 2008 Biochemical Society