Introduction, history, and evolution
Herbert Sauro
University of Washington, Bioengineering
The Path to Model Credibility
A little bit of history…
Hamid Bolouri, John Doyle, Andrew Finney, Mike Hucka, Herbert Sauro
A little bit of history…
This spawned a new community that focused on other aspects of infrastructure in systems biology, for example:
Pathway diagrams (package extension to SBML)
Model repositories (BioModels, JWS Online, BiGG, SEED, ModelDB)
New ontologies (SBO, KiSAO, etc.)
Standard ways to describe simulation experiments (SED-ML)
SBML in a Nutshell: Systems Biology Markup Language
Scope of present day SBML
SBML Ecosystem: COMBINE Standards
https://www.degruyter.com/document/doi/10.1515/jib-2020-0022/html
SBML Ecosystem: Input from IMAG
https://www.degruyter.com/document/doi/10.1515/jib-2020-0022/html
Some Terminology
Center for Reproducible Biomedical Models - moving toward model credibility
The Path to Model Credibility
FAIR in Modeling
https://ccafs.cgiar.org/open-access-and-fair-principles
FAIR in Modeling
Reproducible: The results of a modeling study should be reproducible.
Reusable: A published model should be reusable, either in part or combined with other models.
Understandable: A model should be understandable.
Credible: A model should be credible in terms of its ability to predict future behavior. Its construction should be transparent in terms of decisions made, data acquired, etc., and the model should be verifiable.
Reproducibility: A widespread problem
Incomplete parameters
Incomplete model definition
No parameter values
Analysis procedure not described
Irreproducible
Parameter incorrectly annotated
Irreproducible
No language for describing large models
Incomplete model definition
Reproducibility: A widespread problem
15 years ago, almost 100% of all published models in systems biology could not be reproduced from the published article.
Side note: The terminology used to describe reproducibility is highly variable. There are at least three camps, depending on the scientific domain and specific communities. I won’t be discussing the subtleties of this today. The best place to read about this is, without doubt, the following paper:
Today: 50% of published models are reproducible
At least half of all systems biology models published today are in SBML.
Large repositories such as BioModels, BiGG and SEED contain 1000s of curated models in SBML format.
In 2018, on average over 23 000 unique hosts accessed BioModels approximately 816 000 times every month, downloading 232 GB of data.
23 February 2021
Reproducibility in systems biology modelling
Krishna Tiwari, Sarubini Kananathan, Matthew G Roberts, Johannes P Meyer, Mohammad Umer Sharif Shohan, Ashley Xavier, Matthieu Maire, Ahmad Zyoud, Jinghao Men, Szeyi Ng, Tung V N Nguyen, Mihai Glont, Henning Hermjakob, Rahuman S Malik-Sheriff. Molecular Systems Biology (2021) 17:e9982. https://doi.org/10.15252/msb.20209982
Reproducible models are more frequently cited.
This is especially the case for models in SBML, which makes them not only reproducible but also reusable. Reusable models are cited more.
Reproducibility Portal
https://reproducibilityportal.org/
Technical:
Using Emscripten, we have compiled libSBML and libAntimony to WebAssembly with a JavaScript wrapper.
This means we now have full SBML support inside a browser.
It also means we can create pure client-based web apps that can be served from a static host such as GitHub or Cloudflare. Such apps can survive long after a grant award ends and have a much longer shelf life than desktop apps.
BioSimulations
BioSimulations Provides a Platform that Integrates Models, Model Languages, Model Repositories, Simulation Experiments, Simulation Tools and Data Visualizations
Availability: free and open source
Main URL: https://biosimulations.org/
Select links:
- https://biosimulators.org/ - registry of simulators
- https://biosimulations.org/projects - database of simulations
- https://run.biosimulations.org/ - execute and visualize
The BioSimulations project acknowledges contributions from numerous developers of modeling standards, ontologies, repositories, and simulation tools.
BioSimulations is a simulation farm and a model verification site.
It also hosts over 1000 reproducible models, which reproducibilityportal.org can access.
BioSimulations
RunBioSimulations: an extensible web application that simulates a wide range of computational modeling frameworks, algorithms, and formats
May 2021 Nucleic Acids Research 49(W1) DOI:10.1093/nar/gkab411
The development of BioSimulations involved the collaboration of over 80 researchers.
Why is BioSimulations useful
simulation apps without having to install them
4. A repository of over 1000 reproducible models
5. Download runnable models from over 1000 articles
6. reproducibilityportal.org is an example of a site that uses the BioSimulations API
Biosimulations site
Future Efforts
SED-ML: Simulation Experiment Description ML
From Mike Hucka
Reusability
From Mike Hucka
Reusability: 442 lines of MATLAB code
ERK_MEK = [y(30) y(31)];
% This is the actual value of E1_tot in the presence of feedback
% alpha = -1 (negative feedback)
E1_Fac = E1_tot*(1.0 + alpha*((ERKPP/Ki))./(1.0 + ((ERKPP/Ki))));
% Remaining species using the conservation rules - E1, E2, Raf; MEKn, ERKn,
% P1n and P2n
E1 = [E1_Fac(1) - Raf_E1(1)]; E1(2) = 0;
E2 = [E2_tot - RafP_E2(1)]; E2(2) = 0;
Raf = [Raf_tot - (RafP(1) + Raf_E1(1) + RafP_E2(1) + MEK_RafP(1) + MEKP_RafP(1))]; Raf(2) = 0;
Modified_MEK_Species = (MEKP + MEKPP + MEK_RafP + MEKP_RafP + MEKP_P1 + MEKPP_P1 + ERK_MEKPP + ERKP_MEKPP + ERK_MEK);
Total_Modified_MEK_Species = Modified_MEK_Species(1) + Modified_MEK_Species(2)/Vrat;
MEK(2) = (MEK_tot - MEK(1) - Total_Modified_MEK_Species)*Vrat;
Modified_ERK_Species = (ERKP + ERKPP + ERK_MEKPP + ERKP_MEKPP + ERKP_P2 + ERKPP_P2 + ERK_MEK);
Total_Modified_ERK_Species = Modified_ERK_Species(1) + Modified_ERK_Species(2)/Vrat;
ERK(2) = (ERK_tot - ERK(1) - Total_Modified_ERK_Species)*Vrat;
P1(2) = Vrat*(P1_tot - P1(1) - (MEKP_P1(1) + MEKPP_P1(1)) - (MEKP_P1(2) + MEKPP_P1(2))/Vrat);
P2(2) = Vrat*(P2_tot - P2(1) - (ERKP_P2(1) + ERKPP_P2(1)) - (ERKP_P2(2) + ERKPP_P2(2))/Vrat);
dMEKP__dt = k(3)*MEK_RafP - a(4)*MEKP.*P1 + d(4)*MEKP_P1 - ...
a(5)*MEKP.*RafP + d(5)*MEKP_RafP + k(6)*MEKPP_P1 + MEKP*(TM_pMEK);
dMEKP_P1__dt = a(4)*MEKP.*P1 - (d(4) + k(4))*MEKP_P1 + MEKP_P1*(TM_MEK_Complexes);
dMEKP_RafP__dt = a(5)*MEKP.*RafP - (d(5) + k(5))*MEKP_RafP; % NO TRANSPORT
dMEKPP__dt = k(5)*MEKP_RafP - a(6)*MEKPP.*P1 + d(6)*MEKPP_P1 - ...
k7*ERK.*MEKPP./S7 + (d(7)+k(7))*ERK_MEKPP - ...
k9*ERKP.*MEKPP./S9 + (d(9)+k(9))*ERKP_MEKPP + MEKPP*(TM_pMEK);
dMEKPP_P1__dt = a(6)*MEKPP.*P1 - (d(6)+k(6))*MEKPP_P1 + MEKPP_P1*(TM_MEK_Complexes);
dP1__dt = -a(4)*MEKP.*P1 + (d(4)+k(4))*MEKP_P1 - a(6)*MEKPP.*P1 + (d(6)+k(6))*MEKPP_P1 + P1*(TM_P);
dERK__dt = -k7*ERK.*MEKPP./S7 + d(7)*ERK_MEKPP + k(8)*ERKP_P2 + ERK*(TM_ERK) ...
- k11*ERK.*MEK./S11 + d(11)*ERK_MEK;
dERK_MEKPP__dt = k7*ERK.*MEKPP./S7 - (d(7)+k(7))*ERK_MEKPP + ERK_MEKPP*(TM_MEK_Complexes);
39 differential equations, 11 conservation laws and two compartments
Reusability: 256 lines of SBML model
species E1_MKKK in cytoplasm, P_MKKK in cytoplasm, E2_P_MKKK in cytoplasm;
species MKK_c in cytoplasm, P_MKKK_MKK in cytoplasm, P_MKK_c in cytoplasm;
species P1_P_MKK_c in cytoplasm, P_MKKK_P_MKK in cytoplasm;
species PP_MKK_c in cytoplasm, P1_PP_MKK_c in cytoplasm, P1_c in cytoplasm;
species ERK_c in cytoplasm, PP_MKK_ERK_c in cytoplasm, P_ERK_c in cytoplasm;
species P2_P_ERK_c in cytoplasm, PP_MKK_P_ERK_c in cytoplasm, PP_ERK_c in cytoplasm;
species P2_PP_ERK_c in cytoplasm, P2_c in cytoplasm;
species P_MKK_n in nucleus, P1_P_MKK_n in nucleus, PP_MKK_n in nucleus;
r1a: MKKK + E1 -> E1_MKKK; cytoplasm*(r1a_a1*E1*MKKK - r1a_d1*E1_MKKK);
r1b: E1_MKKK => E1 + P_MKKK; cytoplasm*r1b_k1*E1_MKKK;
r2a: P_MKKK + E2 -> E2_P_MKKK; cytoplasm*(r2a_a2*E2*P_MKKK - r2a_d2*E2_P_MKKK);
r2b: E2_P_MKKK => E2 + MKKK; cytoplasm*r2b_k2*E2_P_MKKK;
r3a: MKK_c + P_MKKK -> P_MKKK_MKK; cytoplasm*(r3a_a3*MKK_c*P_MKKK - r3a_d3*P_MKKK_MKK);
r3b: P_MKKK_MKK => P_MKK_c + P_MKKK; cytoplasm*r3b_k3*P_MKKK_MKK;
r4a: P_MKK_c + P1_c -> P1_P_MKK_c; cytoplasm*(r4a_a4*P_MKK_c*P1_c - r4a_d4*P1_P_MKK_c);
r4b: P1_P_MKK_c => MKK_c + P1_c; cytoplasm*r4b_k4*P1_P_MKK_c;
r5a: P_MKK_c + P_MKKK -> P_MKKK_P_MKK; cytoplasm*(r5a_a5*P_MKK_c*P_MKKK)
r5b: P_MKKK_P_MKK => PP_MKK_c + P_MKKK; cytoplasm*r5b_k5*P_MKKK_P_MKK;
r6a: PP_MKK_c + P1_c -> P1_PP_MKK_c; cytoplasm*(r6a_a6*PP_MKK_c*P1_c - r6a_d6*P1_PP_MKK_c);
r6b: P1_PP_MKK_c => P_MKK_c + P1_c; cytoplasm*r6b_k6*P1_PP_MKK_c;
r7a: ERK_c + PP_MKK_c -> PP_MKK_ERK_c; cytoplasm*((r7a_A7*PP_MKK_c*ERK_c)/(r7a_Ks7 + ERK_c)
r7b: PP_MKK_ERK_c => P_ERK_c + PP_MKK_c; cytoplasm*r7b_k7*PP_MKK_ERK_c;
It took us a month (probably a week's worth of work in total) to get this working again in a reproducible manner.
Reusability: Modular Modeling
Combining Modules
EGFR Signaling Network
Module 2
Runtime
Validation via time-course data
Model calibration: Diff Evol/Bootstrapping
Executable
Module 1
Module 3
Module 4
Flatten
Tellurium
Antimony
SBML
SBML -> Code Compiler
How it works
Using SBML, we describe the model in terms of the underlying biology and mathematical elements.
A compiler converts SBML into an executable form; the final mathematical instantiation is a user decision.
Common Declarative Format of a Biological System
Python
MATLAB
Etc
Julia
ODE Model
Stochastic Model
Stoichiometric Model
Graph Model
Etc
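The idea that one declarative description can be turned into different mathematical instantiations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout `MODEL` and the functions `simulate_ode` and `simulate_stochastic` are made up for this sketch and are not part of SBML, Antimony, Tellurium, or any real SBML compiler. The same one-reaction description (A -> B, mass action) is run once as an ODE model and once as a stochastic model.

```python
import random

# Hypothetical declarative description: A -> B with mass-action constant k1.
# Amounts are treated as molecule counts so both instantiations can share them.
MODEL = {
    "species": {"A": 100.0, "B": 0.0},
    "parameters": {"k1": 0.5},
    "reactions": [
        {
            "reactants": {"A": 1},
            "products": {"B": 1},
            "rate": lambda s, p: p["k1"] * s["A"],
        },
    ],
}


def simulate_ode(model, t_end, dt=0.001):
    """Deterministic instantiation: forward-Euler integration of the rate equations."""
    state = dict(model["species"])
    for _ in range(int(round(t_end / dt))):
        derivs = {name: 0.0 for name in state}
        for rxn in model["reactions"]:
            v = rxn["rate"](state, model["parameters"])
            for name, n in rxn["reactants"].items():
                derivs[name] -= n * v
            for name, n in rxn["products"].items():
                derivs[name] += n * v
        for name in state:
            state[name] += dt * derivs[name]
    return state


def simulate_stochastic(model, t_end, seed=0):
    """Stochastic instantiation: Gillespie direct method on the same description."""
    rng = random.Random(seed)
    state = dict(model["species"])
    t = 0.0
    while True:
        props = [rxn["rate"](state, model["parameters"]) for rxn in model["reactions"]]
        total = sum(props)
        if total <= 0.0:
            break  # nothing left that can react
        t += rng.expovariate(total)  # waiting time to the next reaction event
        if t > t_end:
            break
        pick = rng.uniform(0.0, total)
        acc = 0.0
        for rxn, a in zip(model["reactions"], props):
            acc += a
            if pick <= acc:  # fire the selected reaction once
                for name, n in rxn["reactants"].items():
                    state[name] -= n
                for name, n in rxn["products"].items():
                    state[name] += n
                break
    return state


if __name__ == "__main__":
    print("ODE:       ", simulate_ode(MODEL, 2.0))
    print("Stochastic:", simulate_stochastic(MODEL, 2.0, seed=1))
```

The model description never mentions Euler steps or random numbers; that choice is made only when one of the two functions is called, which is the sense in which the mathematical instantiation is left to the user.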
Understandable
Model annotation
Giving Meaning to a Model
Model annotation
Giving Meaning to a Model
<reaction metaid="metaid_vGLK" sboTerm="SBO:0000176" id="vGLK" name="hexokinase">
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqmodel="http://biomodels.net/model-qualifiers/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
<rdf:Description rdf:about="#metaid_vGLK">
<bqbiol:isVersionOf>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/ec-code/2.7.1.1"/>
<rdf:li rdf:resource="http://identifiers.org/obo.go/GO:0004396"/>
</rdf:Bag>
</bqbiol:isVersionOf>
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/kegg.reaction/R02848"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
<listOfReactants>
<speciesReference metaid="_165915" species="GLCi"/>
<speciesReference metaid="_165927" species="ATP"/>
</listOfReactants>
<listOfProducts>
<speciesReference metaid="_165939" species="G6P"/>
<speciesReference metaid="_165951" species="ADP"/>
</listOfProducts>
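Because the annotation is plain RDF/XML, generic tools can read it. The sketch below uses only Python's standard library on a trimmed, closed copy of the fragment above; the function name `list_annotations` is made up for illustration, and in practice a dedicated library such as libSBML would be the more likely choice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Trimmed copy of the hexokinase reaction, closed so that it parses on its own.
FRAGMENT = """
<reaction metaid="metaid_vGLK" sboTerm="SBO:0000176" id="vGLK" name="hexokinase">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#metaid_vGLK">
        <bqbiol:isVersionOf>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/ec-code/2.7.1.1"/>
            <rdf:li rdf:resource="http://identifiers.org/obo.go/GO:0004396"/>
          </rdf:Bag>
        </bqbiol:isVersionOf>
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/kegg.reaction/R02848"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</reaction>
"""


def list_annotations(xml_text):
    """Return (qualifier, resource URI) pairs found in the RDF annotation."""
    root = ET.fromstring(xml_text)
    pairs = []
    for description in root.iter(f"{{{RDF}}}Description"):
        for qualifier in description:
            # ElementTree stores tags as "{namespace}localname".
            if not qualifier.tag.startswith(f"{{{BQBIOL}}}"):
                continue
            name = qualifier.tag.split("}", 1)[1]
            for li in qualifier.iter(f"{{{RDF}}}li"):
                pairs.append((name, li.get(f"{{{RDF}}}resource")))
    return pairs


if __name__ == "__main__":
    for qualifier, resource in list_annotations(FRAGMENT):
        print(qualifier, resource)
```

Each pair reads as a statement about the reaction, for example that vGLK "isVersionOf" the enzyme class EC 2.7.1.1.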
Model annotation
The kinds of things you can annotate a model with:
Giving Meaning to a Model
Annotating models
Annotating models is tedious.
We are about to release a Python package that will allow you to automatically annotate metabolic models, specifically species and reactions.
It uses existing annotated models from BiGG and BioModels to infer new annotations.
https://github.com/sys-bio/AMAS
Visual Studio Code Extension
Visual Studio Code extension to help users annotate and inspect models
Future: automatically add SBO terms to the model
Merge AMAS with VSC extension
https://github.com/sys-bio/vscode-antimony
Model Credibility
Inspired by MEMOTE and the great work done by CPMS.
Credibility checks are probably more important for large models or models of national importance.
SBML is widely used by the whole-genome-scale modeling community.
There are now 1000s of genome-scale models stored at repositories in both Europe and the US.
SBML provides a systematic way to represent such models and allows them to be heavily annotated with additional metadata.
Because the models are in SBML format, software can be used to do deep dives into a model to check for any issues.
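One kind of automated check that a machine-readable model makes possible is a mass-balance test on each reaction. The sketch below is a simplified illustration, not MEMOTE's implementation: the helper names `parse_formula` and `mass_imbalance` are made up, the neutral chemical formulas are supplied by hand as an assumption, and charges and bracketed formulas are ignored.

```python
import re
from collections import Counter

# Assumed neutral formulas for the species in the hexokinase reaction.
FORMULAS = {
    "GLCi": "C6H12O6",        # glucose
    "ATP": "C10H16N5O13P3",
    "G6P": "C6H13O9P",        # glucose 6-phosphate
    "ADP": "C10H15N5O10P2",
}

HEXOKINASE = {
    "reactants": {"GLCi": 1, "ATP": 1},
    "products": {"G6P": 1, "ADP": 1},
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def mass_imbalance(reaction, formulas):
    """Return {element: net atom count} for unbalanced elements; empty if balanced."""
    net = Counter()
    for species, coeff in reaction["reactants"].items():
        for element, n in parse_formula(formulas[species]).items():
            net[element] -= coeff * n
    for species, coeff in reaction["products"].items():
        for element, n in parse_formula(formulas[species]).items():
            net[element] += coeff * n
    return {element: n for element, n in net.items() if n != 0}


if __name__ == "__main__":
    print(mass_imbalance(HEXOKINASE, FORMULAS))  # balanced reaction
    missing_adp = {"reactants": HEXOKINASE["reactants"], "products": {"G6P": 1}}
    print(mass_imbalance(missing_adp, FORMULAS))  # ADP left out of the products
```

In a real model the formulas would come from the model's own species metadata or annotations rather than a hand-written table.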
MEMOTE
Model Credibility of Kinetic Models
The CPMS requires manual inspection of a model.
However, if we used model formats like SBML together with the COMBINE archive, we could automatically inspect a model and drill into it for a much closer inspection.
1. Recent advances in biomedical simulations: a manifesto for model engineering. Hellerstein and Sauro, F1000, 2019
2. Adapting Modeling and Simulation Credibility Standards to Computational Systems Biology. Lillian T. Tatka, Lucian P. Smith, Joseph L. Hellerstein, Herbert M. Sauro. https://arxiv.org/abs/2301.06007, 2023
Checking a model using computer readable formats.
Data:
Sources,
Uncertainty
Model Tests (unit and system tests)
COMBINE archive
Model:
Sources,
Assumptions,
Uncertainty
Model formats like SBML let us inspect a model
and drill into a model for a much closer inspection.
1. Recent advances in biomedical simulations: a manifesto for model engineering. Hellerstein and Sauro, F1000, 2019
2. Adapting Modeling and Simulation Credibility Standards to Computational Systems Biology. Lillian T. Tatka, Lucian P. Smith, Joseph L. Hellerstein, Herbert M. Sauro. https://arxiv.org/abs/2301.06007, 2023
Accepted in the Journal of Translational Medicine; it will be available once we pay the open access charge.
Deep dives into a model
Example credibility checking app: ratesb
Technical:
This is also a client-side web app served from a GitHub page. It uses the JavaScript-adapted SBML libraries.
It also uses PyScript to run some of our existing Python code in the browser itself.
We get to double-dip on both our C++ and Python code bases.
ratesb can be used to identify errors in rate laws, which is especially useful for large models.
https://sys-bio.github.io/ratesb/
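To give a flavor of what a rate-law check can look like, here is a minimal sketch. It is not how ratesb works internally: the function `check_rate_law` and its two rules (every reactant should appear in the rate law, and every symbol should be declared) are assumptions chosen for illustration. The example reaction is r1b from the model shown earlier.

```python
import re

# Identifiers, skipping the exponent letter inside numbers such as 1e-3.
IDENT = re.compile(r"(?<![\w.])[A-Za-z_]\w*")


def check_rate_law(reactants, rate_law, known_symbols):
    """Return a list of warning strings for one reaction's rate law."""
    used = set(IDENT.findall(rate_law))
    warnings = []
    for species in reactants:
        if species not in used:
            warnings.append(f"reactant {species} does not appear in the rate law")
    for symbol in sorted(used - set(known_symbols)):
        warnings.append(f"undeclared symbol {symbol}")
    return warnings


if __name__ == "__main__":
    known = {"cytoplasm", "r1b_k1", "E1_MKKK", "E1", "P_MKKK"}
    # r1b: E1_MKKK => E1 + P_MKKK; cytoplasm*r1b_k1*E1_MKKK;
    print(check_rate_law(["E1_MKKK"], "cytoplasm*r1b_k1*E1_MKKK", known))
    # Same reaction with a typo in the species name inside the rate law.
    print(check_rate_law(["E1_MKKK"], "cytoplasm*r1b_k1*E1_MKK", known))
```

Checks like these are cheap to run over every reaction, which is why they pay off most on large models where manual inspection is impractical.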
Acknowledgements
Center CoPIs:
Ion Moraru - UConn team
John Gennari – UW Annotation
Jonathan Karr – company spinoff
Eran Agmon (SEDML V2, multisim)
Aldridch Fan (ratesb)
Eva Liu (VSC extension)
Anish Konanki (VSC extension)
Bart Jardine (real time simulator, makesbml)
Maxwell Neal (Annotation)
Michael Kochen, Steve Wiley, Song Feng (EGFR model)
The entire COMBINE community
David Nickerson (ratesb, model curation)
Karin Lundengard (model curation)
Bilal Shaikh (biosimulations)
Lucian Smith (biosimulations, SBML, SED-ML)
Joe Hellerstein (portal, VSC, ratesb, AMAS)
Veronica Porubsky (Outreach, reviews)
Woosub Shin (Automated annotation)
Lilly Tatka – Model credibility
The BioModels Team in the UK
https://reproduciblebiomodels.org/
The three degrees of reproducibility
Model
Data
Algorithms
On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications. AI Magazine 39(3):56-68, September 2018. DOI:10.1609/aimag.v39i3.2816
SBML
Objectives
Biological Problem
Encoded in a reusable form
Executable form
Runtime
Mathematical Approach
Compiler
Static Credibility Checks
Dynamic Credibility Checks
Model calibration
C/C++, Python, Julia or MATLAB
Examples
Whole Genome kinetic model
Encoded in SBML
Flux Balance Analysis
Runtime
COBRApy
Static Checks: Mass-balance
Validation via perturbations in vivo
Model calibration: ensemble using MCMC
Steady State linlog model
Runtime
Tellurium
Reproducibility Portal
https://reproducibilityportal.org/
Example of a Pure Client Based Tool
Technical:
This is an example of a purely client-based web app served from a GitHub page.
Such tools will live long after a grant award has ended.
https://github.com/sys-bio/makesbml