1 of 51

Introduction, history, and evolution 

Herbert Sauro

University of Washington, Bioengineering

The Path to Model Credibility

2 of 51

A little bit of history…

  • It started around 1998/1999

  • Several modelers were frustrated that they couldn't exchange biochemical models between different modeling tools due to the use of incompatible formats. At that time there were about three modeling tools.

  • This was especially problematic if a particular software tool stopped working, rendering your model inaccessible.
  • Academic software generally has a short lifespan.

3 of 51

A little bit of history…

  • Funding from the Japanese ERATO program, via Hiroaki Kitano, kick-started the development of the Systems Biology Markup Language (SBML).

  • First version released in March 2001 with a publication in 2003

Hamid Bolouri, John Doyle, Andrew Finney, Mike Hucka, Herbert Sauro

4 of 51

A little bit of history…

This spawned a new community that focused on other aspects of infrastructure in systems biology, for example:

  • Pathway diagrams (package extension to SBML)
  • Model repositories (BioModels, JWS Online, BiGG, SEED, ModelDB)
  • New ontologies (SBO, KiSAO, etc.)
  • Standard ways to describe simulation experiments (SED-ML)

5 of 51

A little bit of history…

  • Much of the work in this area is documented by the community's coordinating organization, COMBINE (https://co.mbine.org/), as well as at sbml.org and on Wikipedia.
  • We hold two meetings a year: a conference and a hands-on hackathon. The last one was in Seattle in April 2023.
  • This is very much a grass-roots effort, where a lot of the work is done by younger scientists. This is not a top-down structure; it's up to the community at large to drive the effort forward. If there is something missing, get involved.

6 of 51

SBML in a Nutshell: Systems Biology Markup Language

  • A machine-readable format for representing computational models in systems biology
  • Domain: systems of biochemical reactions
  • Specified using XML
  • Components in SBML reflect the natural conceptual constructs of the domain, i.e., SBML does not store a final mathematical model
  • Over 200 tools use SBML, including commercial tools like MATLAB
  • It is particularly good at representing metabolic, signaling, and gene regulatory pathways, together with compartments, events, etc.
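To make the point that SBML stores the conceptual constructs of the domain (compartments, species, reactions) rather than equations, here is a minimal sketch that reads the reactions from a small, hand-written SBML Level 3 Version 2 fragment using only Python's standard-library XML parser. The model content is hypothetical and the fragment is not a complete, validated SBML document; a real tool would use libSBML, which also validates the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model; a real SBML file would also carry parameters,
# kinetic laws (MathML), units, and annotations.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level3/version2/core" level="3" version="2">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S1" compartment="cell" initialConcentration="10"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="S2" compartment="cell" initialConcentration="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="J1" reversible="false">
        <listOfReactants>
          <speciesReference species="S1" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="S2" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level3/version2/core"}


def summarize_reactions(sbml_text):
    """Return one 'id: reactants arrow products' string per <reaction> element."""
    root = ET.fromstring(sbml_text)
    summaries = []
    for rxn in root.iterfind(".//sbml:reaction", NS):
        reactants = [ref.get("species")
                     for ref in rxn.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [ref.get("species")
                    for ref in rxn.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        # Antimony-style arrows: '=>' irreversible, '->' reversible
        arrow = "=>" if rxn.get("reversible") == "false" else "->"
        summaries.append(f"{rxn.get('id')}: {' + '.join(reactants)} {arrow} {' + '.join(products)}")
    return summaries


print(summarize_reactions(SBML_TEXT))
```

Nothing in the file says "solve these ODEs"; the reaction network is stored, and the choice of mathematical treatment is left to the tool that reads it.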


7 of 51

Scope of present day SBML

8 of 51

SBML Ecosystem: COMBINE Standards

https://www.degruyter.com/document/doi/10.1515/jib-2020-0022/html

9 of 51

SBML Ecosystem: Input from IMAG

https://www.degruyter.com/document/doi/10.1515/jib-2020-0022/html

10 of 51

Some Terminology

  • SBML – A model description format for subcellular processes
  • Antimony – A human-readable form of SBML
  • SED-ML – A simulation experiment description language
  • COMBINE archive – An archive format (OMEX) for storing all material related to a model

11 of 51

Center for Reproducible Biomedical Models - moving toward model credibility

The Path to Model Credibility

12 of 51

FAIR in Modeling

https://ccafs.cgiar.org/open-access-and-fair-principles

13 of 51

FAIR in Modeling

  • Reproducibility – can we recreate what you did?
  • Reusable – can I reuse a published model in new work?
  • Understandable – can we understand what you did?
  • Credibility – can we trust the model?

  • Reproducible: The results of a modeling study should be reproducible.
  • Reusable: A published model should be reusable, either in parts or combined with other models.
  • Understandable: A model should be understandable.
  • Credible: A model should be credible in terms of its ability to predict future behavior. Its construction should be transparent in terms of decisions made, data acquired, etc., and the model should be verifiable.

14 of 51

Reproducibility: A widespread problem

Figure: examples of irreproducible published models, annotated with the causes: incomplete parameters, no parameter values, parameters incorrectly annotated, incomplete model definitions, no language for describing large models, and analysis procedure not described.

15 of 51

Reproducibility: A widespread problem

15 years ago, almost 100% of all published models in systems biology could not be reproduced from the published article.

Side note: The terminology used to describe reproducibility is highly variable. There are at least three camps, depending on the scientific domain and specific communities. I won’t be discussing the subtleties of this today. The best place to read about this is, without doubt, the following paper:

16 of 51

Today: 50% of models are published in a reproducible form

At least half of all systems biology models published today are in SBML.

Large repositories such as BioModels, BiGG and SEED contain 1000s of curated models in SBML format.

In 2018, on average, over 23,000 unique hosts accessed BioModels approximately 816,000 times every month, downloading 232 GB of data.

17 of 51

Reproducible models are more frequently cited.

This is especially the case for models in SBML, since the format makes them not only reproducible but also reusable. Reusable models are cited more often.

18 of 51

Reproducibility Portal

https://reproducibilityportal.org/

Technical:

Using Emscripten, we have compiled libSBML and libAntimony to WebAssembly with a JavaScript wrapper.

It means we now have full SBML support inside a browser.

It also means we can create purely client-based web apps that can be served from a static ("dumb") server such as GitHub Pages or Cloudflare. Such apps can survive long after a grant award ends and have a much longer shelf life than desktop apps.

19 of 51

BioSimulations

BioSimulations provides a platform that integrates models, modeling languages, model repositories, simulation experiments, simulation tools, and data visualizations

Availability: free and open source

Main URL: https://biosimulations.org/

Select links:

- https://biosimulators.org/ - registry of simulators

- https://biosimulations.org/projects - database of simulations

- https://run.biosimulations.org/ - execute and visualize

The BioSimulations project acknowledges contributions from numerous developers of modeling standards, ontologies, repositories, and simulation tools.

BioSimulations is a simulation farm and a model verification site.

It also hosts over 1000 reproducible models, which reproducibilityportal.org can access.

20 of 51

BioSimulations

RunBioSimulations: an extensible web application that simulates a wide range of computational modeling frameworks, algorithms, and formats

May 2021 Nucleic Acids Research 49(W1) DOI:10.1093/nar/gkab411

The development of BioSimulations involved the collaboration of over 80 researchers.

21 of 51

Why is BioSimulations useful?

  1. Easy model verification.
  2. Easily rerun published simulations.
  3. A simulation farm that allows users to use simulation apps without having to install them.
  4. A repository of over 1000 reproducible models.
  5. Download runnable models from over 1000 articles.
  6. reproducibilityportal.org is an example of a site that uses the BioSimulations API.

22 of 51

BioSimulations site

23 of 51

Future Efforts

    • New SED-ML
    • Refactor the REST API to make it easier to use
    • Multiscale models:
      • Help modelers create reproducible multiscale models
      • Help users package complex multiscale models that use multiple modeling approaches.
      • Work with CompuCell3D, PhysiCell and Tissue Forge

24 of 51

SED-ML: Simulation Experiment Description ML


From Mike Hucka

25 of 51

Reusability


From Mike Hucka

26 of 51

Reusability: 442 lines of MATLAB code

ERK_MEK = [y(30) y(31)];
% This is the actual value of E1_tot in the presence of feedback
% alpha = -1 (negative feedback)
E1_Fac = E1_tot*(1.0 + alpha*((ERKPP/Ki))./(1.0 + ((ERKPP/Ki))));
% Remaining species using the conservation rules - E1, E2, Raf; MEKn, ERKn,
% P1n and P2n
E1 = [E1_Fac(1) - Raf_E1(1)]; E1(2) = 0;
E2 = [E2_tot - RafP_E2(1)]; E2(2) = 0;
Raf = [Raf_tot - (RafP(1) + Raf_E1(1) + RafP_E2(1) + MEK_RafP(1) + MEKP_RafP(1))]; Raf(2) = 0;
Modified_MEK_Species = (MEKP + MEKPP + MEK_RafP + MEKP_RafP + MEKP_P1 + MEKPP_P1 + ERK_MEKPP + ERKP_MEKPP + ERK_MEK);
Total_Modified_MEK_Species = Modified_MEK_Species(1) + Modified_MEK_Species(2)/Vrat;
MEK(2) = (MEK_tot - MEK(1) - Total_Modified_MEK_Species)*Vrat;
Modified_ERK_Species = (ERKP + ERKPP + ERK_MEKPP + ERKP_MEKPP + ERKP_P2 + ERKPP_P2 + ERK_MEK);
Total_Modified_ERK_Species = Modified_ERK_Species(1) + Modified_ERK_Species(2)/Vrat;
ERK(2) = (ERK_tot - ERK(1) - Total_Modified_ERK_Species)*Vrat;
P1(2) = Vrat*(P1_tot - P1(1) - (MEKP_P1(1) + MEKPP_P1(1)) - (MEKP_P1(2) + MEKPP_P1(2))/Vrat);
P2(2) = Vrat*(P2_tot - P2(1) - (ERKP_P2(1) + ERKPP_P2(1)) - (ERKP_P2(2) + ERKPP_P2(2))/Vrat);
dMEKP__dt = k(3)*MEK_RafP - a(4)*MEKP.*P1 + d(4)*MEKP_P1 - ...
    a(5)*MEKP.*RafP + d(5)*MEKP_RafP + k(6)*MEKPP_P1 + MEKP*(TM_pMEK);
dMEKP_P1__dt = a(4)*MEKP.*P1 - (d(4) + k(4))*MEKP_P1 + MEKP_P1*(TM_MEK_Complexes);
dMEKP_RafP__dt = a(5)*MEKP.*RafP - (d(5) + k(5))*MEKP_RafP; % NO TRANSPORT
dMEKPP__dt = k(5)*MEKP_RafP - a(6)*MEKPP.*P1 + d(6)*MEKPP_P1 - ...
    k7*ERK.*MEKPP./S7 + (d(7)+k(7))*ERK_MEKPP - ...
    k9*ERKP.*MEKPP./S9 + (d(9)+k(9))*ERKP_MEKPP + MEKPP*(TM_pMEK);
dMEKPP_P1__dt = a(6)*MEKPP.*P1 - (d(6)+k(6))*MEKPP_P1 + MEKPP_P1*(TM_MEK_Complexes);
dP1__dt = -a(4)*MEKP.*P1 + (d(4)+k(4))*MEKP_P1 - a(6)*MEKPP.*P1 + (d(6)+k(6))*MEKPP_P1 + P1*(TM_P);
dERK__dt = -k7*ERK.*MEKPP./S7 + d(7)*ERK_MEKPP + k(8)*ERKP_P2 + ERK*(TM_ERK) ...
    - k11*ERK.*MEK./S11 + d(11)*ERK_MEK;
dERK_MEKPP__dt = k7*ERK.*MEKPP./S7 - (d(7)+k(7))*ERK_MEKPP + ERK_MEKPP*(TM_MEK_Complexes);

39 differential equations, 11 conservation laws and two compartments
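In the MATLAB version, conservation laws like these were derived and coded by hand. When a model is stored declaratively, they can instead be computed: any vector g with gᵀN = 0, where N is the stoichiometry matrix (species × reactions), defines a conserved total. The following is a minimal pure-Python sketch using exact fractions, applied to a hypothetical enzyme example (S + E -> ES, ES -> E + P) rather than to the MAPK model above; production tools use more robust numerical methods on much larger matrices.

```python
from fractions import Fraction


def null_space(matrix):
    """Basis of {x : matrix @ x = 0}, via exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        # Eliminate column c from every other row
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    # One basis vector per free (non-pivot) column
    basis = []
    for f in (c for c in range(n_cols) if c not in pivot_cols):
        x = [Fraction(0)] * n_cols
        x[f] = Fraction(1)
        for row_idx, pc in enumerate(pivot_cols):
            x[pc] = -rows[row_idx][f]
        basis.append(x)
    return basis


def conservation_laws(N):
    """Left null space of N (species x reactions): vectors g with g^T N = 0."""
    N_transposed = [list(col) for col in zip(*N)]
    return null_space(N_transposed)


# Hypothetical example: species S, ES, P, E; reactions S + E -> ES and ES -> E + P
species = ["S", "ES", "P", "E"]
N = [[-1, 0],
     [1, -1],
     [0, 1],
     [-1, 1]]
laws = conservation_laws(N)
for g in laws:
    print(" + ".join(s for s, coeff in zip(species, g) if coeff == 1))
```

For this example the sketch finds g = [1, 1, 1, 0] (S + ES + P is constant) and g = [0, 1, 0, 1] (ES + E is constant). A null-space basis is not unique, so a different species ordering can yield different but equivalent combinations.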

27 of 51

Reusability: 256-line SBML model (shown in Antimony)

species E1_MKKK in cytoplasm, P_MKKK in cytoplasm, E2_P_MKKK in cytoplasm;
species MKK_c in cytoplasm, P_MKKK_MKK in cytoplasm, P_MKK_c in cytoplasm;
species P1_P_MKK_c in cytoplasm, P_MKKK_P_MKK in cytoplasm;
species PP_MKK_c in cytoplasm, P1_PP_MKK_c in cytoplasm, P1_c in cytoplasm;
species ERK_c in cytoplasm, PP_MKK_ERK_c in cytoplasm, P_ERK_c in cytoplasm;
species P2_P_ERK_c in cytoplasm, PP_MKK_P_ERK_c in cytoplasm, PP_ERK_c in cytoplasm;
species P2_PP_ERK_c in cytoplasm, P2_c in cytoplasm;
species P_MKK_n in nucleus, P1_P_MKK_n in nucleus, PP_MKK_n in nucleus;
r1a: MKKK + E1 -> E1_MKKK; cytoplasm*(r1a_a1*E1*MKKK - r1a_d1*E1_MKKK);
r1b: E1_MKKK => E1 + P_MKKK; cytoplasm*r1b_k1*E1_MKKK;
r2a: P_MKKK + E2 -> E2_P_MKKK; cytoplasm*(r2a_a2*E2*P_MKKK - r2a_d2*E2_P_MKKK);
r2b: E2_P_MKKK => E2 + MKKK; cytoplasm*r2b_k2*E2_P_MKKK;
r3a: MKK_c + P_MKKK -> P_MKKK_MKK; cytoplasm*(r3a_a3*MKK_c*P_MKKK - r3a_d3*P_MKKK_MKK);
r3b: P_MKKK_MKK => P_MKK_c + P_MKKK; cytoplasm*r3b_k3*P_MKKK_MKK;
r4a: P_MKK_c + P1_c -> P1_P_MKK_c; cytoplasm*(r4a_a4*P_MKK_c*P1_c - r4a_d4*P1_P_MKK_c);
r4b: P1_P_MKK_c => MKK_c + P1_c; cytoplasm*r4b_k4*P1_P_MKK_c;
r5a: P_MKK_c + P_MKKK -> P_MKKK_P_MKK; cytoplasm*(r5a_a5*P_MKK_c*P_MKKK);
r5b: P_MKKK_P_MKK => PP_MKK_c + P_MKKK; cytoplasm*r5b_k5*P_MKKK_P_MKK;
r6a: PP_MKK_c + P1_c -> P1_PP_MKK_c; cytoplasm*(r6a_a6*PP_MKK_c*P1_c - r6a_d6*P1_PP_MKK_c);
r6b: P1_PP_MKK_c => P_MKK_c + P1_c; cytoplasm*r6b_k6*P1_PP_MKK_c;
r7a: ERK_c + PP_MKK_c -> PP_MKK_ERK_c; cytoplasm*((r7a_A7*PP_MKK_c*ERK_c)/(r7a_Ks7 + ERK_c));
r7b: PP_MKK_ERK_c => P_ERK_c + PP_MKK_c; cytoplasm*r7b_k7*PP_MKK_ERK_c;

It took us a month (probably a week's worth of work in total) to get this working again in a reproducible manner.

28 of 51

Reusability: Modular Modeling

29 of 51

Combining Modules

Figure: EGFR signaling network built from Modules 1–4 in Antimony, flattened into SBML, and passed through an SBML -> code compiler to an executable run in Tellurium (runtime). Model calibration: differential evolution/bootstrapping. Validation via time-course data.

30 of 51

How it works

Using SBML, we describe the model in terms of the underlying biology and its mathematical elements.

A compiler converts SBML into an executable form; the final mathematical instantiation is a user decision.

Figure: a common declarative format of a biological system is compiled into an ODE, stochastic, stoichiometric, or graph model (etc.), in Python, MATLAB, Julia, etc.
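The idea that the mathematical instantiation is chosen downstream of the model description can be illustrated with a toy "compiler". In this minimal sketch, a hypothetical declarative reaction list (reactant and product stoichiometries plus a rate law) is turned into an ODE right-hand side dy/dt = N v(y), which is then integrated with a fixed-step classical Runge-Kutta method. The data layout, the `compile_to_ode` and `rk4` names, and the parameter values are assumptions for illustration; real SBML compilers, such as the one Tellurium uses (libRoadRunner), are far more complete.

```python
from typing import Callable, Dict, List, Tuple

# A hypothetical declarative reaction: (reactant stoichiometries, product stoichiometries, rate law)
RateLaw = Callable[[Dict[str, float], Dict[str, float]], float]
Reaction = Tuple[Dict[str, int], Dict[str, int], RateLaw]


def compile_to_ode(species: List[str], reactions: List[Reaction], params: Dict[str, float]):
    """'Compile' the declarative description into an ODE right-hand side dy/dt = N v(y)."""
    def rhs(y: List[float]) -> List[float]:
        state = dict(zip(species, y))
        dydt = {s: 0.0 for s in species}
        for reactants, products, rate in reactions:
            v = rate(state, params)
            for s, n in reactants.items():
                dydt[s] -= n * v
            for s, n in products.items():
                dydt[s] += n * v
        return [dydt[s] for s in species]
    return rhs


def rk4(rhs, y0: List[float], t_end: float, steps: int) -> List[float]:
    """Integrate with a fixed-step classical fourth-order Runge-Kutta method."""
    h = t_end / steps
    y = list(y0)
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
        k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
        k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y


# Toy model S1 => S2 with mass-action rate k1*S1 (hypothetical values)
species = ["S1", "S2"]
reactions = [({"S1": 1}, {"S2": 1}, lambda s, p: p["k1"] * s["S1"])]
rhs = compile_to_ode(species, reactions, {"k1": 0.5})
y_end = rk4(rhs, [10.0, 0.0], t_end=2.0, steps=200)
print(y_end)  # S1 should be close to 10*exp(-1)
```

The same reaction list could just as well be handed to a stochastic simulator or a stoichiometric analysis; only the "compiler" step would change, not the model description.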

31 of 51

Understandable

32 of 51

Model annotation

  • Problem: How do we define the biological meaning of SBML elements? E.g., does Glc stand for glucose, D-glucose, or L-glucose?
  • SBML and other similar formats allow a modeler to add a semantic layer to the base models.
  • This allows more precise identification of model components:
    • Better search and discovery of models
    • Easier comparison of models
  • Adds a semantic layer:
    • Helps users better understand the underlying biology
    • Allows for better reuse of models
    • Eases conversion of models from one form to another

Giving Meaning to a Model

33 of 51

Model annotation

Giving Meaning to a Model

<reaction metaid="metaid_vGLK" sboTerm="SBO:0000176" id="vGLK" name="hexokinase">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqmodel="http://biomodels.net/model-qualifiers/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#metaid_vGLK">
        <bqbiol:isVersionOf>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/ec-code/2.7.1.1"/>
            <rdf:li rdf:resource="http://identifiers.org/obo.go/GO:0004396"/>
          </rdf:Bag>
        </bqbiol:isVersionOf>
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/kegg.reaction/R02848"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
  <listOfReactants>
    <speciesReference metaid="_165915" species="GLCi"/>
    <speciesReference metaid="_165927" species="ATP"/>
  </listOfReactants>
  <listOfProducts>
    <speciesReference metaid="_165939" species="G6P"/>
    <speciesReference metaid="_165951" species="ADP"/>
  </listOfProducts>
</reaction>
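Because the annotation is structured RDF, software can read it directly. The sketch below extracts the biology qualifiers (`is`, `isVersionOf`, ...) from a copy of the hexokinase reaction above using Python's standard-library XML parser. To keep the example self-contained, the namespace declarations are moved onto the reaction element and the SBML Level 3 Version 2 core namespace is assumed; a real tool would more likely use libSBML's own annotation API.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

REACTION_XML = """<reaction xmlns="http://www.sbml.org/sbml/level3/version2/core"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
    metaid="metaid_vGLK" sboTerm="SBO:0000176" id="vGLK" name="hexokinase">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#metaid_vGLK">
        <bqbiol:isVersionOf><rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ec-code/2.7.1.1"/>
          <rdf:li rdf:resource="http://identifiers.org/obo.go/GO:0004396"/>
        </rdf:Bag></bqbiol:isVersionOf>
        <bqbiol:is><rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/kegg.reaction/R02848"/>
        </rdf:Bag></bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</reaction>"""


def biology_qualifiers(xml_text):
    """Map each bqbiol qualifier name (e.g. 'is') to the list of resource URIs it points to."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        for qualifier in desc:
            if not qualifier.tag.startswith(f"{{{BQBIOL}}}"):
                continue
            name = qualifier.tag[len(BQBIOL) + 2:]  # strip '{namespace}'
            uris = [li.get(f"{{{RDF}}}resource") for li in qualifier.iter(f"{{{RDF}}}li")]
            result.setdefault(name, []).extend(uris)
    return result


root = ET.fromstring(REACTION_XML)
print(root.get("name"), root.get("sboTerm"))
print(biology_qualifiers(REACTION_XML))
```

Resolving those URIs (EC number, GO term, KEGG reaction) is what lets search tools tell that `vGLK` is a hexokinase step regardless of what the modeler called it.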

34 of 51

Model annotation

The kinds of things you can annotate a model with:

    • Unambiguously identify a molecular species (ChEBI)
    • Unambiguously identify a process (Rhea)
    • Add semantics to rate laws:
      • What kind of rate law is being used (SBO)
      • What assumptions went into choosing a particular rate law (SBO)

Giving Meaning to a Model

35 of 51

Annotating models

Annotating models is tedious.

We are about to release a Python package that will allow you to automatically annotate metabolic models, especially species and reactions.

It uses existing annotated models from BiGG and BioModels to infer new annotations.

https://github.com/sys-bio/AMAS
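The package's actual algorithm is not described here. As a rough illustration of the general idea only (reusing names that are already annotated in other models to suggest annotations for a new one), the sketch below matches normalized species names against a small, hypothetical reference table using `difflib` string similarity. The table, the placeholder `REF:000n` values, the suffix rule, and the 0.6 cutoff are all assumptions.

```python
import difflib
import re

# Hypothetical reference table built from already-annotated models:
# normalized species name -> annotation. "REF:000n" values are placeholders
# standing in for real database identifiers (e.g. ChEBI IDs).
REFERENCE = {
    "glucose": "REF:0001",
    "glucose6phosphate": "REF:0002",
    "atp": "REF:0003",
    "adp": "REF:0004",
}


def normalize(name):
    """Lowercase, drop a trailing compartment tag such as '_c', keep only letters and digits."""
    name = re.sub(r"_(c|n|e|i)$", "", name.lower())
    return re.sub(r"[^a-z0-9]", "", name)


def suggest_annotation(species_id, cutoff=0.6):
    """Annotation of the most similar reference name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(normalize(species_id), list(REFERENCE), n=1, cutoff=cutoff)
    return REFERENCE[matches[0]] if matches else None


for sid in ["ATP_c", "D-Glucose", "Glucose_6_phosphate", "Xyz"]:
    print(sid, suggest_annotation(sid))
```

Note that `D-Glucose` is mapped to the generic glucose entry, exactly the kind of ambiguity raised on the annotation slide, so automatically suggested annotations still need curation.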

36 of 51

Visual Studio Code Extension

Visual Studio Code extension to help users annotate and inspect models

  1. Browsing Biomodels
  2. Annotating models
  3. Syntax highlighting
  4. Hover messages
  5. Error detection
  6. Automatic creation of rate laws

Future: automatically add SBO terms to the model.

Merge AMAS with the VSC extension.

https://github.com/sys-bio/vscode-antimony
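One of the listed features is automatic creation of rate laws. The sketch below shows what a simple generator might emit in Antimony syntax, assuming mass-action kinetics; it reuses the reaction shapes of `r1a` and `r1b` from the Antimony listing earlier. The function name and the `{id}_kf` / `{id}_kr` parameter-naming scheme are hypothetical and not necessarily what the extension produces.

```python
def mass_action_antimony(rid, reactants, products, compartment, reversible=True):
    """Build an Antimony reaction line with a mass-action rate law.

    reactants/products map species id -> stoichiometric coefficient.
    The parameter names {rid}_kf and {rid}_kr are a hypothetical naming scheme.
    """
    def side(species):
        # e.g. {"A": 1, "B": 2} -> "A + 2 B"
        return " + ".join(s if n == 1 else f"{n} {s}" for s, n in species.items())

    def rate_term(k, species):
        # e.g. ("k", {"A": 1, "B": 2}) -> "k*A*B^2"
        return "*".join([k] + [s if n == 1 else f"{s}^{n}" for s, n in species.items()])

    forward = rate_term(f"{rid}_kf", reactants)
    if reversible:
        reverse = rate_term(f"{rid}_kr", products)
        arrow, law = "->", f"{compartment}*({forward} - {reverse})"
    else:
        arrow, law = "=>", f"{compartment}*{forward}"
    return f"{rid}: {side(reactants)} {arrow} {side(products)}; {law};"


print(mass_action_antimony("r1a", {"MKKK": 1, "E1": 1}, {"E1_MKKK": 1}, "cytoplasm"))
print(mass_action_antimony("r1b", {"E1_MKKK": 1}, {"E1": 1, "P_MKKK": 1}, "cytoplasm",
                           reversible=False))
```

Mass action is only a default; a generator that also records the assumption (for example with an SBO term on the rate law) would tie this feature to the annotation work above.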

37 of 51

Model Credibility

Inspired by MEMOTE and the great work done by CPMS.

Credibility checks are probably more important for large models or models of national importance.

38 of 51

SBML is widely used by the genome-scale modeling community.

There are now thousands of genome-scale models stored in repositories in both Europe and the US. SBML provides a systematic way to represent such models, as well as allowing them to be heavily annotated with additional metadata.

Because the models are in SBML format, software can be used to do deep dives into a model to check for any issues.

MEMOTE
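One such deep-dive check is element (mass) balance: for every reaction, the atoms of each element must be equal on both sides. This minimal sketch assumes each species has a chemical formula available (in genome-scale SBML models these are typically stored with the FBC package's `chemicalFormula` attribute) and uses neutral formulas for the hexokinase reaction GLCi + ATP -> G6P + ADP. It ignores charges and protons, which a full checker would also need to consider, and its formula parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula like 'C6H12O6' (no parentheses, hydrates, or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(reactants, products, formulas):
    """Return {element: product atoms - reactant atoms} for every element that does not balance.

    reactants/products map species id -> stoichiometric coefficient;
    formulas maps species id -> chemical formula.
    """
    delta = Counter()
    for species, coeff in products.items():
        for element, n in parse_formula(formulas[species]).items():
            delta[element] += coeff * n
    for species, coeff in reactants.items():
        for element, n in parse_formula(formulas[species]).items():
            delta[element] -= coeff * n
    return {element: d for element, d in delta.items() if d != 0}


# Hexokinase: GLCi + ATP -> G6P + ADP, using neutral formulas (charges and protons ignored)
FORMULAS = {"GLCi": "C6H12O6", "ATP": "C10H16N5O13P3",
            "G6P": "C6H13O9P", "ADP": "C10H15N5O10P2"}
reactants = {"GLCi": 1, "ATP": 1}
products = {"G6P": 1, "ADP": 1}
print(element_imbalance(reactants, products, FORMULAS))  # {} means balanced

# A deliberate typo in one formula is caught immediately
bad = dict(FORMULAS, G6P="C6H12O9P")
print(element_imbalance(reactants, products, bad))
```

Run over every reaction in a model, a check like this turns a tedious manual review into a list of specific reactions to inspect.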

39 of 51

MEMOTE

40 of 51

MEMOTE

41 of 51

Model Credibility of Kinetic Models

The CPMS requires manual inspection of a model.

However, if we used model formats like SBML together with the COMBINE archive, we could automatically inspect a model and drill into it for a much closer inspection.

1. Hellerstein and Sauro, "Recent advances in biomedical simulations: a manifesto for model engineering", F1000, 2019.

2. Lillian T. Tatka, Lucian P. Smith, Joseph L. Hellerstein, Herbert M. Sauro, "Adapting Modeling and Simulation Credibility Standards to Computational Systems Biology", https://arxiv.org/abs/2301.06007, 2023.

42 of 51

Checking a model using computer readable formats.

Data:

Sources,

Uncertainty

Model Tests (unit and system tests)

Combine

archive

Model:

Sources,

Assumptions,

Uncertainty

Model formats like SBML let us inspect a model and drill into it for a much closer inspection.

1. Hellerstein and Sauro, "Recent advances in biomedical simulations: a manifesto for model engineering", F1000, 2019.

2. Lillian T. Tatka, Lucian P. Smith, Joseph L. Hellerstein, Herbert M. Sauro, "Adapting Modeling and Simulation Credibility Standards to Computational Systems Biology", https://arxiv.org/abs/2301.06007, 2023. Accepted in the Journal of Translational Medicine; it will be available once we pay the open-access charge.

43 of 51

Deep dives into a model

44 of 51

Example credibility checking app: ratesb

Technical:

This is also a client-side web app served from a GitHub page. It uses the JavaScript-adapted SBML libraries.

It also uses PyScript to run some of our existing Python code in the browser itself.

We get to double-dip on both our C++ and Python code bases.

ratesb can be used to identify errors in rate laws, which is especially useful for large models.

https://sys-bio.github.io/ratesb/
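ratesb's own checks are not detailed here. As a hypothetical illustration of one simple static rate-law check, the sketch below flags reactions whose rate expression never mentions one of their reactants, a common symptom of a copy-paste error in large models. It parses Antimony-style infix expressions with Python's `ast` module and only inspects identifiers, so operators such as Antimony's `^` do not matter here; real checkers would also need to handle modifiers, boundary species, and function definitions.

```python
import ast


def names_in(expression):
    """Identifiers used in an infix rate expression (parsed as a Python expression)."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def missing_reactants(reactants, rate_law):
    """Reactants that never appear in the rate law: a hypothetical warning rule."""
    used = names_in(rate_law)
    return sorted(r for r in reactants if r not in used)


# r4a as written in the Antimony listing: both reactants appear, so no warning
print(missing_reactants(["P_MKK_c", "P1_c"],
                        "cytoplasm*(r4a_a4*P_MKK_c*P1_c - r4a_d4*P1_P_MKK_c)"))
# A copy-paste slip: r4b given r3b's rate law, so its reactant is missing
print(missing_reactants(["P1_P_MKK_c"], "cytoplasm*r3b_k3*P_MKKK_MKK"))
```

Matching on parsed identifiers rather than substrings matters here, because names such as `P_MKK_c` and `P1_P_MKK_c` overlap as strings.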

45 of 51

Acknowledgements

Center CoPIs:

Ion Moraru - UConn team

John Gennari – UW Annotation

Jonathan Karr – company spinoff

Eran Agmon (SEDML V2, multisim)

Aldridch Fan (ratesb)

Eva Liu (VSC extension)

Anish Konanki (VSC extension)

Bart Jardine (real time simulator, makesbml)

Maxwell Neal (Annotation)

Michael Kochen, Steve Wiley,

Song Feng (EGFR model)

The entire COMBINE community

David Nickerson (ratesb, model curation)

Karin Lundengard (model curation)

Bilal Shaikh (biosimulations)

Lucian Smith (biosimulations, SBML, SED-ML)

Joe Hellerstein (portal, VSC, ratesb, AMAS)

Veronica Porubsky (Outreach, reviews)

Woosub Shin (Automated annotation)

Lilly Tatka – Model credibility

The BioModels Team in the UK

https://reproduciblebiomodels.org/

46 of 51

47 of 51

The three degrees of reproducibility

Model

Data

Algorithms

On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications, AI Magazine 39(3):56-68, September 2018, DOI:10.1609/aimag.v39i3.2816

SBML

48 of 51

Objectives

Figure: Biological problem → (mathematical approach) → encoded in a reusable form → (compiler) → executable form (C/C++, Python, Julia or MATLAB) → runtime; static credibility checks, dynamic credibility checks, and model calibration are also shown.

49 of 51

Examples

Figure: Whole-genome kinetic model encoded in SBML, analyzed either by flux balance analysis (runtime: COBRApy) or as a steady-state linlog model (runtime: Tellurium). Static checks: mass balance. Validation via perturbations in vivo. Model calibration: ensemble using MCMC.

50 of 51

Reproducibility Portal

https://reproducibilityportal.org/

51 of 51

Example of a Pure Client Based Tool

Technical:

This is an example of a purely client-based web app served from a GitHub page.

Such tools will live long after a grant award has ended.

https://github.com/sys-bio/makesbml