1 of 61

SED-ML LEVEL 2: A PROPOSAL

LUCIAN SMITH

2 of 61

WHAT DO WE WANT? (HARMONY 2021)

  • Be compatible with workflow descriptions
  • Support reproducible experiments, not just repeatable ones
  • Easily define model/simulation/results
  • Clear definitions of data types and dimensions
  • Support parallelization
  • Be able to transform complicated output into input for next step
  • Be able to ‘outsource’ non-standardized manipulations
  • Support any modeling language
  • Support current and future modeling frameworks (FBC, Qual, Agent-based, etc.)
  • Be able to connect different tools
  • Scale up as users develop their experiments

3 of 61

MY OWN DESIGN

  • Core ‘Script’ class, designed to be procedural
  • Didn’t worry about backwards compatibility
    • (But left things unchanged where there was nothing better)
  • UML diagrams

4 of 61

THIS PROPOSAL

  • Top-level objects:
    • Scripts
    • Global objects (used by scripts):
      • Models
      • External data
      • User-defined Functions
      • (also styles, and global algorithm parameters)

5 of 61

SIMPLE EXAMPLE (SCRIPT)

mod1 = model("urn:sedml:language:sbml", "munz2000.xml",
             selectionList=["time", "Susceptible", "Zombie"])
sim1 = uniformTimeCourse(mod1, 0, 0, 5, 50)
plot2d(sim1, name="Figure 3")

6 of 61

SIMPLE EXAMPLE (YML)

sedML:
  level: 2
  version: 1
  listOfModels:
    model:
      id: mod1
      language: urn:sedml:language:sbml
      source: munz2000.xml
      selectionList: time, Susceptible, Zombie
  listOfScripts:
    script:
      id: main
      uniformTimeCourse:
        assignTo: sim1
        model: mod1
        initialTime: 0
        outputStartTime: 0
        outputEndTime: 5
        numberOfSteps: 50
      plot2D:
        name: Figure 3
        source: sim1

7 of 61

SIMPLE EXAMPLE (XML)

<?xml version="1.0" encoding="UTF-8"?>
<sedML xmlns="http://sed-ml.org/sed-ml/level2/version1" level="2" version="1">
  <listOfModels>
    <model id="mod1" language="urn:sedml:language:sbml" source="munz2000.xml"
           selectionList="time, Susceptible, Zombie"/>
  </listOfModels>
  <listOfScripts>
    <script id="main">
      <uniformTimeCourse assignTo="sim1" model="mod1" initialTime="0"
                         outputStartTime="0" outputEndTime="5" numberOfSteps="50"/>
      <plot2D name="Figure 3" source="sim1"/>
    </script>
  </listOfScripts>
</sedML>

8 of 61

MODELS

  • Basics are the same as before
  • New ‘selectionList’ defines which variables are output
  • Model changes are now expressed as a Script (the model’s changeScript)
  • Within scripts, models are objects with state:
    • Elements can be queried and set.
    • State can be reset to the ‘post model changes’ state.
    • Models can be copied.
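A minimal Python sketch of a model as a stateful script object, assuming a model can be reduced to a flat dictionary of element IDs and values. The class and method names (ScriptModel, get, set, reset, copy) are hypothetical, not part of the proposal; the point illustrated is that reset() returns to the post-changeScript state rather than to the raw file.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable, Optional, Tuple


class ScriptModel:
    """Hypothetical stand-in for a model object inside a SED-ML script.

    The model is reduced to a flat dict of element IDs and values; a real
    tool would delegate get/set to the model format (e.g. SBML).
    """

    def __init__(self, values: Dict[str, Any],
                 change_script: Optional[Iterable[Tuple[str, Any]]] = None):
        state = dict(values)
        # Apply the model's changeScript once; the result is the
        # 'post model changes' state that reset() returns to.
        for target, value in change_script or ():
            state[target] = value
        self._post_changes = state
        self._state = dict(state)

    def get(self, target: str) -> Any:
        return self._state[target]

    def set(self, target: str, value: Any) -> None:
        self._state[target] = value

    def reset(self) -> None:
        # Back to the post-changeScript state, not the raw file contents.
        self._state = dict(self._post_changes)

    def copy(self) -> "ScriptModel":
        # Deep copy, so edits to the copy never leak into the original.
        return deepcopy(self)


# Usage with made-up numbers: the changeScript sets Zombie to 0.0.
mod1 = ScriptModel({"Susceptible": 500.0, "Zombie": 1.0},
                   change_script=[("Zombie", 0.0)])
mod1.set("Zombie", 5.0)
twin = mod1.copy()
mod1.reset()
print(mod1.get("Zombie"), twin.get("Zombie"))  # 0.0 5.0
```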

9 of 61

MODELS

10 of 61

CHANGEMODEL

  • Four basic types:
    • Set
    • Add
    • Remove
    • Reset
  • No XPaths; targets are purely semantic
  • Pushes the burden of meaning onto the model formats
  • Can appear in any Script, not just a model’s changeScript
  • Reset: resets everything to the global initial state
    • Can ignore subsets of the model, e.g. reset species but not parameters.
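A sketch of the four ChangeModel types as methods on a toy model, assuming every element carries a kind (such as 'species' or 'parameter') so that Reset can skip a subset. ChangeableModel, model_set and the other names are hypothetical. How a real Reset should treat added or removed elements is an open design choice; this sketch drops additions and restores removals unless their kind is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# Hypothetical element table: element ID -> (element kind, value).
Elements = Dict[str, Tuple[str, float]]


@dataclass
class ChangeableModel:
    initial: Elements                      # the global initial state
    current: Elements = field(init=False)  # the state scripts act on

    def __post_init__(self) -> None:
        self.current = dict(self.initial)

    def model_set(self, target: str, value: float) -> None:
        kind, _ = self.current[target]
        self.current[target] = (kind, value)

    def model_add(self, target: str, kind: str, value: float) -> None:
        if target in self.current:
            raise KeyError(f"{target} already exists")
        self.current[target] = (kind, value)

    def model_remove(self, target: str) -> None:
        del self.current[target]

    def model_reset(self, ignore_kinds: Iterable[str] = ()) -> None:
        """Return to the initial state, leaving elements of ignored kinds as they are."""
        ignored = set(ignore_kinds)
        restored: Elements = {}
        for eid, (kind, value) in self.initial.items():
            if kind not in ignored:
                restored[eid] = (kind, value)      # reset this element
            elif eid in self.current:
                restored[eid] = self.current[eid]  # keep its current value
        for eid, (kind, value) in self.current.items():
            # Added elements survive only if their kind is ignored.
            if eid not in self.initial and kind in ignored:
                restored[eid] = (kind, value)
        self.current = restored


# Usage: reset species but not parameters.
m = ChangeableModel({"S1": ("species", 10.0), "k1": ("parameter", 0.1)})
m.model_set("S1", 2.0)
m.model_set("k1", 0.5)
m.model_add("S2", "species", 1.0)
m.model_reset(ignore_kinds={"parameter"})
print(m.current)  # {'S1': ('species', 10.0), 'k1': ('parameter', 0.5)}
```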

11 of 61

CHANGEMODEL

12 of 61

MODEL VARIABLES

  • All SED-ML defines is that a target is ‘a string’
  • Individual model formats define their own semantics
  • One possibility for SBML: use IDs and ‘dot semantics’
    • ID: S1
    • Submodel elements: mod1.S1
    • Local parameters: J0.k1
    • Element attributes: S1.boundary
    • Element children: E1.delay
    • Combinations: J0.k1.constant
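One way a tool could interpret the SBML ‘dot semantics’ is to walk a nested structure one dotted component at a time. The MODEL dictionary below is a hypothetical, heavily simplified stand-in for an SBML document, and resolve() is illustrative only.

```python
from typing import Any, Dict

# Hypothetical, heavily simplified view of an SBML model: each ID maps to a
# dict of attributes and children; local parameters sit under their reaction
# and a submodel is a nested model.
MODEL: Dict[str, Any] = {
    "S1": {"value": 10.0, "boundary": False},
    "J0": {"k1": {"value": 0.1, "constant": True}},
    "E1": {"delay": 2.0},
    "mod1": {"S1": {"value": 3.0, "boundary": False}},
}


def resolve(model: Dict[str, Any], target: str) -> Any:
    """Follow a dot-separated target such as 'J0.k1.constant' one part at a time."""
    node: Any = model
    for part in target.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"cannot resolve {part!r} in {target!r}")
        node = node[part]
    return node


print(resolve(MODEL, "S1.boundary"))       # False
print(resolve(MODEL, "J0.k1.constant"))    # True
print(resolve(MODEL, "mod1.S1")["value"])  # 3.0
```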

13 of 61

MODEL VARIABLES/ELEMENTS

  • ModelRemove is still fairly straightforward
    • Anything with an ID can just use that.
    • Use metaids otherwise?
    • Could also use an XML-structure-based scheme: model.reactions[3].products[2]
  • ModelAdd introduces much more complexity
    • Antimony-like syntax? J0: S1->3 S2; k1*S1
    • Literal XML?
    • YML-like?
    • List with defined parts?
      • reaction(J0, ["S1 + S3"], ["3 S2"], "k1*S1", reversible=false)
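To show that the Antimony-like string and the ‘list with defined parts’ carry the same information, here is a toy Python parser that turns the former into the latter. It is an assumption-laden sketch, not an Antimony parser: it only understands ‘ID: lhs -> rhs; rate’, splits species on ‘+’, keeps stoichiometry inside the species string as the slide’s list form does, and leaves reversibility to an explicit flag. Reaction and parse_reaction are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    """The 'list with defined parts' form: ID, reactants, products, rate law."""
    id: str
    reactants: List[str]
    products: List[str]
    rate: str
    reversible: bool = False


# 'ID: lhs -> rhs; rate', with optional whitespace around each separator.
_REACTION = re.compile(r"^\s*(\w+)\s*:\s*(.*?)\s*->\s*(.*?)\s*;\s*(.+?)\s*$")


def _species(side: str) -> List[str]:
    # Split on '+'; a stoichiometry such as '3 S2' stays inside the string.
    return [term.strip() for term in side.split("+") if term.strip()]


def parse_reaction(text: str, reversible: bool = False) -> Reaction:
    match = _REACTION.match(text)
    if match is None:
        raise ValueError(f"not a reaction: {text!r}")
    rid, lhs, rhs, rate = match.groups()
    return Reaction(rid, _species(lhs), _species(rhs), rate, reversible)


print(parse_reaction("J0: S1 + S3 -> 3 S2; k1*S1"))
# Reaction(id='J0', reactants=['S1', 'S3'], products=['3 S2'], rate='k1*S1', reversible=False)
```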

14 of 61

MODEL CHANGE EXAMPLE (SCRIPT)

mod1 = model("urn:sedml:language:sbml", "munz2000.xml")

mod1.set("Zombie", 0.0)

mod1.set("J0.Zombie.stoichiometry", 2)

15 of 61

MODEL CHANGE EXAMPLE (YML)

sedML:
  level: 2
  version: 1
  listOfModels:
    model:
      id: mod1
      language: urn:sedml:language:sbml
      source: munz2000.xml
      selectionList: time, Susceptible, Zombie
      changeScript:
        modelSet:
          - target: Zombie
            value: 0.0
          - target: J0.Zombie.stoichiometry
            value: 2

16 of 61

EXTERNAL DATA

  • Just a DataDescription, without NuML
    • Instead, might need hints? e.g. a CSV file might need ‘has header row’
    • Might also need ‘column first’ or ‘row first’.
    • HDF5 files may not need the above (?)
  • Treated as a dictionary or array in Scripts.
  • Needs work (see https://github.com/SED-ML/sed-ml/issues/46):
    • A file inside a COMBINE archive
    • A worksheet in an XLSX file
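A sketch of how the two CSV hints above could change what a reader produces, using only Python’s csv module. The has_header and column_first parameters are hypothetical names for those hints, not proposed attribute names; with a header the result is a dictionary of series, without one it is a plain array.

```python
import csv
import io
from typing import Dict, List, Union

Table = Union[Dict[str, List[float]], List[List[float]]]


def load_csv(text: str, has_header: bool = True, column_first: bool = True) -> Table:
    """Load CSV text as a dictionary (with a header) or an array (without).

    has_header and column_first are hypothetical reader hints, not attributes
    defined by the proposal.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not column_first:
        # One series per row: transpose so each series becomes a column.
        rows = [list(col) for col in zip(*rows)]
    if has_header:
        header, body = rows[0], rows[1:]
        return {name: [float(r[i]) for r in body] for i, name in enumerate(header)}
    return [[float(x) for x in r] for r in rows]


# Made-up data, laid out both ways.
by_column = load_csv("time,Zombie\n0,1\n1,20\n")
by_row = load_csv("time,0,1\nZombie,1,20\n", column_first=False)
print(by_column["Zombie"], by_row["Zombie"])  # [1.0, 20.0] [1.0, 20.0]
```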

17 of 61

EXTERNAL DATA

18 of 61

EXTERNAL DATA EXAMPLE (YML, XML)

sedML:
  level: 2
  version: 1
  listOfDataDescriptions:
    dataDescription:
      id: SZ_data
      source: munz2000_data.csv
      format: CSV

<listOfDataDescriptions>
  <dataDescription id="SZ_data" format="CSV" source="munz2000_data.csv"/>
</listOfDataDescriptions>

19 of 61

USER FUNCTIONS

  • Self-contained Scripts (no global scope)
  • Required arguments
  • Optional arguments
  • Optional return value
  • Requested ages ago:

20 of 61

USER FUNCTIONS

21 of 61

USER FUNCTION EXAMPLE (YML)

listOfUserFunctions:
  userFunction:
    id: computeSumOfSquares
    output: ret
    listOfRequiredArguments:
      argument:
        - id: sim
          type: dictionaryOrArray
        - id: data
          type: dictionaryOrArray
    script:
      executeMath:
        assignTo: ret
        math: sum((sim-data)^2)

22 of 61

USER FUNCTION EXAMPLE (XML)

<listOfUserFunctions>
  <userFunction id="computeSumOfSquares" output="ret">
    <listOfRequiredArguments>
      <argument id="sim" type="dictionaryOrArray"/>
      <argument id="data" type="dictionaryOrArray"/>
    </listOfRequiredArguments>
    <script>
      <executeMath assignTo="ret">
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <apply>
            <sum/>
            <apply>
              <power/>
              <apply>
                <minus/>
                <ci> sim </ci>
                <ci> data </ci>
              </apply>
              <cn type="integer"> 2 </cn>
            </apply>
          </apply>
        </math>
      </executeMath>
    </script>
  </userFunction>
</listOfUserFunctions>
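For comparison, a plain-Python analogue of computeSumOfSquares. It assumes sim and data are dictionaries of equal-length series and reads ‘(sim-data)^2’ as element-wise; the optional columns argument is a hypothetical illustration of an optional argument, and the function uses nothing outside its own arguments.

```python
from typing import Dict, Iterable, List, Optional

Series = Dict[str, List[float]]


def compute_sum_of_squares(sim: Series, data: Series,
                           columns: Optional[Iterable[str]] = None) -> float:
    """Python analogue of the computeSumOfSquares user function.

    sim and data are required arguments; columns is a hypothetical optional
    argument. The function reads nothing but its arguments (no global scope).
    """
    if columns is None:
        # Default: compare every series present in both inputs.
        columns = [name for name in data if name in sim]
    total = 0.0
    for name in columns:
        s, d = sim[name], data[name]
        if len(s) != len(d):
            raise ValueError(f"length mismatch for {name!r}")
        total += sum((a - b) ** 2 for a, b in zip(s, d))
    return total


# Made-up values: 'time' is only in sim1, so only 'Zombie' is compared.
sim1 = {"time": [0.0, 1.0], "Zombie": [1.0, 18.0]}
SZ_data = {"Zombie": [1.0, 20.0]}
print(compute_sum_of_squares(sim1, SZ_data))  # 4.0
```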

23 of 61

SCRIPTS!

  • Core of SED-ML execution
  • Multiple scripts can facilitate parallelization
  • Scope: global objects plus the scripts it depends on
  • Each line of a script is a Command
  • Order matters (obviously)
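If each script lists the scripts it depends on, a tool can derive an execution order and see which scripts could run in parallel. This sketch uses Python’s standard graphlib module; the script IDs and the execution_waves helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group scripts into waves; scripts in the same wave do not depend on each other.

    dependencies maps each script ID to the IDs of the scripts it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical script IDs: two independent simulations feed a fit, then a plot.
print(execution_waves({"simA": set(), "simB": set(),
                       "fit": {"simA", "simB"}, "plot": {"fit"}}))
# [['simA', 'simB'], ['fit'], ['plot']]
```

Scripts inside one wave are the candidates for parallel execution; order within a single script is still strictly sequential.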

24 of 61

SCRIPTS!

25 of 61

WAIT—REPEAT OR REPRODUCE?

  • SED-ML is used both to:
    • Repeat a simulation exactly, and
    • Reproduce a simulation in a new context.
  • So, just store this information!
    • ‘Repeat’ class to store the exact specifics
    • ‘Reproduce’ class to store general requirements
  • Can be stored on the Document, a Script, or an individual Command.

26 of 61

REPEAT OR REPRODUCE

27 of 61

REPEAT/REPRODUCE EXAMPLE (YML)

listOfScripts:
  script:
    id: main
    steadyState:
      assignTo: ss1
      model: mod1
      repeatWith:
        executable: roadrunner
        executableVersion: 2.2.3
        environment: Python
        OS: Windows 11
      reproduceWith:
        capability:
          - steadyStateSim
          - SBML.distrib
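A sketch of how a tool might act on the two blocks in this example, assuming repeatWith is checked by exact match and reproduceWith by capability inclusion. ToolInfo, can_repeat, can_reproduce and the ‘othertool’ entry are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToolInfo:
    """Hypothetical description of the tool trying to run a document."""
    executable: str
    executable_version: str
    environment: str
    os: str
    capabilities: Set[str] = field(default_factory=set)


def can_repeat(tool: ToolInfo, repeat_with: Dict[str, str]) -> bool:
    """Repeat: every recorded specific must match exactly."""
    actual = {"executable": tool.executable,
              "executableVersion": tool.executable_version,
              "environment": tool.environment,
              "OS": tool.os}
    return all(actual.get(key) == value for key, value in repeat_with.items())


def can_reproduce(tool: ToolInfo, reproduce_with: List[str]) -> bool:
    """Reproduce: the tool only needs every listed capability."""
    return set(reproduce_with) <= tool.capabilities


repeat_with = {"executable": "roadrunner", "executableVersion": "2.2.3",
               "environment": "Python", "OS": "Windows 11"}
reproduce_with = ["steadyStateSim", "SBML.distrib"]

other = ToolInfo("othertool", "1.0", "Java", "Linux",
                 {"steadyStateSim", "SBML.distrib", "uniformTimeCourseSim"})
print(can_repeat(other, repeat_with), can_reproduce(other, reproduce_with))  # False True
```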

28 of 61

COMMAND

  • Generic commands:
    • Execute math
    • Execute KiSAO
    • Execute external script(?)

  • SED-ML specific commands:
    • Script commands (if, loop, error)
    • ChangeModel
    • Simulate
    • Output

29 of 61

COMMAND

30 of 61

GENERIC COMMANDS

31 of 61

EXECUTEMATH

  • Execute basic math; optionally store result.
  • Model elements accessible with ‘X.y’ syntax.
  • Data elements accessible with dictionary/index syntax.
  • Examples:

S_tot = mod1.S1 + mod1.S2 + mod1.S3
err = (sim1["S1"] - data["S1"])^2
movement = sim1["S1"][100] - sim1["S1"][0]
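The same three examples in plain Python, assuming model elements behave like attributes, simulation and data results behave like dictionaries of lists, and ‘^’ on a series means element-wise squaring. All values are made up for illustration.

```python
from types import SimpleNamespace

# Made-up values: a model with three species and two result dictionaries.
mod1 = SimpleNamespace(S1=1.0, S2=2.5, S3=0.5)
sim1 = {"S1": [10.0, 9.0, 7.5]}
data = {"S1": [10.0, 8.0, 8.0]}

# S_tot = mod1.S1 + mod1.S2 + mod1.S3   ('X.y' access to model elements)
S_tot = mod1.S1 + mod1.S2 + mod1.S3

# err = (sim1["S1"] - data["S1"])^2     (read here as element-wise)
err = [(s - d) ** 2 for s, d in zip(sim1["S1"], data["S1"])]

# movement = sim1["S1"][100] - sim1["S1"][0]
# The toy series has only 3 points, so the last point stands in for index 100.
movement = sim1["S1"][-1] - sim1["S1"][0]

print(S_tot, err, movement)  # 4.0 [0.0, 1.0, 0.25] -2.5
```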

32 of 61

EXECUTEMATH EXAMPLE (YML, XML)

listOfScripts:
  script:
    id: main
    executeMath:
      assignTo: S_tot
      math: mod1.S1 + mod1.S2 + mod1.S3

<script id="main">
  <executeMath assignTo="S_tot">
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <plus/>
        <ci> mod1.S1 </ci>
        <ci> mod1.S2 </ci>
        <ci> mod1.S3 </ci>
      </apply>
    </math>
  </executeMath>
</script>

33 of 61

EXECUTEKISAO

  • Simulations not explicitly defined elsewhere
    • ‘Qual’ simulations, others.
  • Functions currently requested via KiSAO terms on ‘Variable’
    • Jacobian, elasticities, etc.
  • Some math functions (?)
    • average, max, min
    • (could move these to MathML? Would need ‘AppliedDimension’ functionality)
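A sketch of what ‘AppliedDimension’ could mean for average, max and min: the same reduction applied down columns or across rows of a 2-D result, such as repeated runs by time points. The apply_along helper and the 0/1 dimension convention are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def apply_along(values: Matrix, func: Callable[[Sequence[float]], float],
                dimension: int) -> List[float]:
    """Reduce a 2-D array along one dimension.

    dimension 0 reduces down each column, dimension 1 across each row.
    """
    if dimension == 0:
        return [func(column) for column in zip(*values)]
    if dimension == 1:
        return [func(row) for row in values]
    raise ValueError("dimension must be 0 or 1")


# Hypothetical results: 2 repeated runs (rows) x 2 time points (columns).
runs = [[1.0, 4.0],
        [3.0, 8.0]]
print(apply_along(runs, mean, 0))  # [2.0, 6.0]  average over runs
print(apply_along(runs, max, 1))   # [4.0, 8.0]  maximum of each run
print(apply_along(runs, min, 0))   # [1.0, 4.0]  minimum over runs
```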

34 of 61

EXECUTEKISAO EXAMPLE (YML)

listOfScripts:
  script:
    id: main
    executeKisao:
      assignTo: sim1
      kisao: 168
      listOfKisaoParameters:
        kisaoParameter:
          - kisao: 520
            value: 3.3
          - kisao: 24
            value: mod1
          - kisao: 21
            value: True
          - kisao: 900
            value: mod1.S1

35 of 61

EXECUTEEXTERNAL (?)

  • Basically SED-ML giving up
    • Obscure simulations
    • Proprietary analyses
    • Environment-dependent behavior
    • Computations not obviously encodable by KiSAO or MathML
    • Processing considered out-of-scope for SED-ML (fancy plots)

36 of 61

EXECUTE EXTERNAL EXAMPLE (YML)

listOfScripts:
  script:
    id: main
    executeExternal:
      language: vega
      {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "description": "A simple bar chart with embedded data.",
        "data": {
          "values": [
            {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
            {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
            {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
          ]
        },
        "mark": "bar",
        "encoding": {
          "x": {"field": "a", "type": "nominal", "axis": {"labelAngle": 0}},
          "y": {"field": "b", "type": "quantitative"}
        }
      }

37 of 61

SCRIPTING COMMANDS

38 of 61

CONDITIONAL / LOOP

  • if/else blocks
  • for loops
  • while loops
  • Each contains math plus a child Script
  • Variables defined in the child Script have private scope.
  • Parallelization possible with a flag
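A sketch of the private scope and parallel flag for a for loop, assuming each iteration receives a shallow copy of the enclosing scope plus the loop variable. run_for_loop and its arguments are hypothetical; threads are used only to show that independent iterations may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Scope = Dict[str, Any]


def run_for_loop(loop_id: str, values: List[Any], body: Callable[[Scope], Any],
                 outer: Scope, parallel: bool = False) -> List[Any]:
    """Run body once per loop value and collect the results in order.

    Each iteration gets its own child scope (a shallow copy of the outer scope
    plus the loop variable), so names the body defines stay private to it.
    """
    def one_iteration(value: Any) -> Any:
        child = dict(outer)
        child[loop_id] = value
        return body(child)

    if parallel:
        # Assumes iterations do not mutate shared objects (the copy is shallow).
        with ThreadPoolExecutor() as pool:
            return list(pool.map(one_iteration, values))
    return [one_iteration(v) for v in values]


def body(scope: Scope) -> int:
    scope["tmp"] = scope["n"] * 2 + scope["offset"]  # private to this iteration
    return scope["tmp"]


outer_scope = {"offset": 1}
print(run_for_loop("n", [0, 1, 2], body, outer_scope, parallel=True))  # [1, 3, 5]
print("tmp" in outer_scope)  # False
```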

39 of 61

CONDITIONAL

40 of 61

LOOP

41 of 61

LOOP EXAMPLE (YML)

listOfScripts:
  script:
    id: main
    loop:
      id: n
      start: 0
      end: 10
      modelReset:
        model: mod1
      modelSet:
        model: mod1
        target: S1
        value: n/100
      steadyState:
        assignTo: ssvals[n]
        model: mod1

for n in range(0,10):
    mod1.reset()
    mod1.set("S1", n/100)
    ssvals[n] = steadyState(mod1)

42 of 61

SED-ML SPECIFIC COMMANDS

43 of 61

SIMULATE

  • Not much different from the old ‘Simulation’ class
  • Basic three: UniformTimeCourse, SteadyState, OneStep
  • Also ParameterEstimation
  • Absorbs the old ‘Algorithm’ child into the object itself.
  • Old ‘Analysis’ is now ExecuteKisao
  • Now has an ‘assignTo’ attribute to store results
    • Solves most of the confusing RepeatedTask problems
    • Each class has a precisely defined output (dictionary of results)
    • ParameterEstimations return a parameter estimation object
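A sketch of the ‘precisely defined output’ idea for UniformTimeCourse: a dictionary keyed by ‘time’ and each selected variable. It assumes numberOfSteps counts intervals, so 50 steps give 51 output points; the ‘simulator’ simply evaluates a made-up analytic solution, initialTime is ignored for simplicity, and both function names are hypothetical.

```python
import math
from typing import Callable, Dict, List


def uniform_time_points(output_start: float, output_end: float,
                        number_of_steps: int) -> List[float]:
    """numberOfSteps equal intervals give numberOfSteps + 1 output points."""
    step = (output_end - output_start) / number_of_steps
    return [output_start + i * step for i in range(number_of_steps + 1)]


def toy_uniform_time_course(solution: Dict[str, Callable[[float], float]],
                            output_start: float, output_end: float,
                            number_of_steps: int) -> Dict[str, List[float]]:
    """Stand-in for a simulator: evaluates a known solution at each output time."""
    times = uniform_time_points(output_start, output_end, number_of_steps)
    results: Dict[str, List[float]] = {"time": times}
    for name, f in solution.items():
        results[name] = [f(t) for t in times]
    return results


# Mirrors uniformTimeCourse(mod1, 0, 0, 5, 50) with a made-up decay solution.
sim1 = toy_uniform_time_course({"X": lambda t: math.exp(-t)}, 0.0, 5.0, 50)
print(len(sim1["time"]), sim1["X"][0])  # 51 1.0
```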

44 of 61

SIMULATE

45 of 61

OUTPUT

  • Structure largely unchanged from before:
    • Plot
    • Report
    • Figure
  • All inputs are now local variables in the script
    • Including multidimensional data: handle differently?
  • ParameterEstimation plots now children of Plot
  • ‘source’ attribute for multidimensional data

46 of 61

PLOT

47 of 61

CURVES AND SURFACES

48 of 61

REPORT

49 of 61

FIGURE

50 of 61

PLOT EXAMPLE (YML)

plot2D:
  name: Figure 3
  listOfCurves:
    curve:
      - name: "[Susceptible]"
        xDataReference: sim1[time]
        yDataReference: sim1[Susceptible]
      - name: "[Zombie]"
        xDataReference: sim1[time]
        yDataReference: sim1[Zombie]

51 of 61

PLOT EXAMPLE (XML)

<?xml version="1.0" encoding="UTF-8"?>
<sedML xmlns="http://sed-ml.org/sed-ml/level2/version1" level="2" version="1">
  <listOfModels>
    <model id="model" language="urn:sedml:language:sbml" source="munz2000.xml"
           selectionList="time, Susceptible, Zombie"/>
  </listOfModels>
  <listOfScripts>
    <script id="main">
      <uniformTimeCourse assignTo="sim1" model="model" initialTime="0"
                         outputStartTime="0" outputEndTime="5" numberOfSteps="50"/>
      <plot2D name="Figure 3">
        <listOfCurves>
          <curve name="[Susceptible]" logX="false" xDataReference="sim1[time]"
                 logY="false" yDataReference="sim1[Susceptible]"/>
          <curve name="[Zombie]" logX="false" xDataReference="sim1[time]"
                 logY="false" yDataReference="sim1[Zombie]"/>
        </listOfCurves>
      </plot2D>
    </script>
  </listOfScripts>
</sedML>

52 of 61

OTHER OPTIONS?

  • Interpolate
  • Assert/Error/Warn
    • (not useful for repeats, but maybe for reproduction?)
  • Return
    • instead of a VarIdRef attribute on UserFunction
  • Print/Log
  • Bring some currently-only-KiSAO analyses into SED-ML core?

53 of 61

ORIGINAL EXAMPLE (SCRIPT)

model = model("urn:sedml:language:sbml", "munz2000.xml",
              selectionList=["time", "Susceptible", "Zombie"])
sim1 = uniformTimeCourse(model, 0, 0, 5, 50)
plot2d(sim1, name="Figure 3")

54 of 61

STEADYSTATE EXAMPLE

model = model("urn:sedml:language:sbml", "munz2000.xml",
              selectionList=["Susceptible", "Zombie"])
ss1 = steadyState(model)
report([ss1], name="Steady States")

55 of 61

CHANGEMODEL EXAMPLE: MODEL

model = model("urn:sedml:language:sbml", "munz2000.xml",
              selectionList=["Susceptible", "Zombie"])
model.set("Zombie", 0.0)
ss1 = steadyState(model)

56 of 61

CHANGEMODEL EXAMPLE: SCRIPT

model = model("urn:sedml:language:sbml", "munz2000.xml",
              selectionList=["Susceptible", "Zombie"])
results = array(10)
for Z in range(1, 10):
    model.reset()
    model.set("Zombie", Z)
    sim1 = uniformTimeCourse(model, 0, 0, 5, 50)
    results[Z] = sim1
report(results, "Different starting Z values")

57 of 61

PARAMETER SCAN EXAMPLE: (PYTHON)

model = model("urn:sedml:language:sbml", "munz2000.xml")
SZ_data = dataDescription("munz2000_data.csv", "CSV", data=row)

def sumOfSquares(sim, data):
    return sum((sim - data)**2)

results = array(10, 10, 10)
for alpha in (0.001, 0.010, step=0.001):
    for beta in (0.001, 0.010, step=0.001):
        for zeta in (0.001, 0.010, step=0.001):
            model.reset()
            model.set("alpha", alpha)
            model.set("beta", beta)
            model.set("zeta", zeta)
            jc1 = executekisao(65,
            sim1 = uniformTimeCourse(model, 0, 0, 5, 50,
                                     selectionList=["Susceptible", "Zombie"])
            results[alpha][beta][zeta] = sumOfSquares(sim1, SZ_data)
report(results, "Parameter scan")

58 of 61

PARAMETER SCAN EXAMPLE: (YML)

sedML:
  level: 2
  version: 1
  listOfModels:
    model:
      id: model
      language: urn:sedml:language:sbml
      source: munz2000.xml
      selectionList: Susceptible, Zombie
  listOfDataDescriptions:
    dataDescription:
      id: SZ_data
      format: CSV
      source: munz2000_data.csv
  listOfUserFunctions:
    userFunction:
      id: computeSumOfSquares
      output: ret
      listOfRequiredArguments:
        argument:
          - id: sim
          - id: data
      script:
        executeMath:
          assignTo: ret
          math: sum((sim-data)^2)
  listOfScripts:
    script:
      id: main
      executeMath:
        assignTo: results
      forLoop:
        id: alpha
        start: 0.001
        numberOfSteps: 9
        step: 0.001
        parallel: true
        script:
          forLoop:
            id: beta
            start: 0.001
            numberOfSteps: 9
            step: 0.001
            parallel: true
            script:
              forLoop:
                id: zeta
                start: 0.001
                numberOfSteps: 9
                step: 0.001
                parallel: true

59 of 61

PARAMETER SCAN EXAMPLE: (YML)

                script:
                  modelReset:
                    model: model
                  modelSet:
                    - model: model
                      target: alpha
                      value: alpha
                    - model: model
                      target: beta
                      value: beta
                    - model: model
                      target: zeta
                      value: zeta
                  uniformTimeCourse:
                    assignTo: sim1
                    model: model
                    initialTime: 0
                    outputStartTime: 0
                    outputEndTime: 5
                    numberOfSteps: 50
                  executeMath:
                    assignTo: results[alpha][beta][zeta]
                    math: computeSumOfSquares(sim1, SZ_data)
      report:
        name: Parameter scan
        data: results
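A Python sketch of the loop arithmetic in this scan, assuming a forLoop visits start plus numberOfSteps further steps, which gives the ten values 0.001 to 0.010 that the 10 x 10 x 10 results array in the Python version expects. Results are keyed by integer loop indices rather than by float parameter values; parameter_scan, loop_values and the toy objective are hypothetical stand-ins for the reset, set, simulate and computeSumOfSquares steps.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def loop_values(start: float, number_of_steps: int, step: float) -> List[float]:
    """Values a forLoop visits: start, then number_of_steps further steps."""
    return [start + i * step for i in range(number_of_steps + 1)]


def parameter_scan(objective: Callable[[float, float, float], float],
                   start: float = 0.001, number_of_steps: int = 9,
                   step: float = 0.001) -> Dict[Tuple[int, int, int], float]:
    """Nested alpha/beta/zeta scan, keyed by integer loop indices."""
    values = loop_values(start, number_of_steps, step)
    results: Dict[Tuple[int, int, int], float] = {}
    for (i, alpha), (j, beta), (k, zeta) in product(enumerate(values), repeat=3):
        # Each (i, j, k) point is independent, which is what makes parallel="true" safe.
        results[(i, j, k)] = objective(alpha, beta, zeta)
    return results


# Toy objective standing in for "reset, set, simulate, computeSumOfSquares".
scan = parameter_scan(lambda alpha, beta, zeta: alpha + beta + zeta)
print(len(loop_values(0.001, 9, 0.001)), len(scan))  # 10 1000
```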

60 of 61

PARAMETER SCAN EXAMPLE: (XML)

<sedML xmlns="http://sed-ml.org/sed-ml/level2/version1" level="2" version="1">
  <listOfModels>
    <model id="model" language="urn:sedml:language:sbml" source="munz2000.xml"
           selectionList="Susceptible, Zombie"/>
  </listOfModels>
  <listOfDataDescriptions>
    <dataDescription id="SZ_data" format="CSV" source="munz2000_data.csv"/>
  </listOfDataDescriptions>
  <listOfUserFunctions>
    <userFunction id="computeSumOfSquares" output="ret">
      <listOfRequiredArguments>
        <argument id="sim"/>
        <argument id="data"/>
      </listOfRequiredArguments>
      <script>
        <executeMath assignTo="ret">
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply>
              <sum/>
              <apply>
                <power/>
                <apply>
                  <minus/>
                  <ci> sim </ci>
                  <ci> data </ci>
                </apply>
                <cn type="integer"> 2 </cn>
              </apply>
            </apply>
          </math>
        </executeMath>
      </script>
    </userFunction>
  </listOfUserFunctions>

61 of 61

PARAMETER SCAN EXAMPLE: (XML)

  <listOfScripts>
    <script id="main">
      <executeMath assignTo="results"/>
      <forLoop id="alpha" start="0.001" numberOfSteps="9" step="0.001" parallel="true">
        <script>
          <forLoop id="beta" start="0.001" numberOfSteps="9" step="0.001" parallel="true">
            <script>
              <forLoop id="zeta" start="0.001" numberOfSteps="9" step="0.001" parallel="true">
                <script>
                  <modelReset model="model"/>
                  <modelSet model="model" target="alpha" value="alpha"/>
                  <modelSet model="model" target="beta" value="beta"/>
                  <modelSet model="model" target="zeta" value="zeta"/>
                  <uniformTimeCourse assignTo="sim1" model="model" initialTime="0"
                                     outputStartTime="0" outputEndTime="5" numberOfSteps="50"/>
                  <executeMath assignTo="results[alpha][beta][zeta]">
                    <math xmlns="http://www.w3.org/1998/Math/MathML">
                      <apply>
                        <ci> computeSumOfSquares </ci>
                        <ci> sim1 </ci>
                        <ci> SZ_data </ci>
                      </apply>
                    </math>
                  </executeMath>
                </script>
              </forLoop>
            </script>
          </forLoop>
        </script>
      </forLoop>
      <report name="Parameter scan" data="results"/>
    </script>
  </listOfScripts>
</sedML>