1 of 16

MS219 SIAM CSE 2025

The Tricks Required for Scientific Machine Learning to Work on Real Data

Avik Pal

Ph.D. Candidate

Julia Lab

MIT CSAIL

2 of 16

Automatic Differentiation and SciML: What Can Go Wrong

3-hour workshop version: search YouTube

Massachusetts Institute of Technology

3 of 16

Universal (Approximator) Differential Equations


4 of 16

Universal (Approximator) Differential Equations


5 of 16

UDEs show Accurate Extrapolation & Generalization

Keith, Brendan, Akshay Khadse, and Scott E. Field. "Learning orbital dynamics of binary black hole systems from gravitational wave measurements." Physical Review Research 3, no. 4 (2021): 043101.

Example using binary black hole dynamics with LIGO gravitational wave data


6 of 16

Choosing a good loss function is fundamental to making this work in practice.

7 of 16

Single Shooting

Fitting by Running the Simulator and Doing Gradient-Based Optimization
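A minimal sketch of single shooting, under loud assumptions: a toy scalar decay model du/dt = -p*u, synthetic noise-free data, a hand-written fixed-step RK4 integrator, and central finite differences standing in for the adjoint or automatic-differentiation gradients a real SciML stack would use. The names `rk4_solve`, `decay`, `single_shooting_loss`, and `grad_fd` are hypothetical and do not come from the talk.

```python
import math

def rk4_solve(f, u0, p, ts):
    """Integrate du/dt = f(u, p), taking one RK4 step between consecutive ts."""
    us = [u0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, u = t1 - t0, us[-1]
        k1 = f(u, p)
        k2 = f(u + 0.5 * h * k1, p)
        k3 = f(u + 0.5 * h * k2, p)
        k4 = f(u + h * k3, p)
        us.append(u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return us

def decay(u, p):
    # Toy model (assumption): exponential decay with unknown rate p.
    return -p * u

ts = [0.1 * i for i in range(21)]
data = [math.exp(-1.5 * t) for t in ts]  # synthetic data, true rate 1.5

def single_shooting_loss(p):
    # Simulate the whole interval from the first data point, then compare.
    pred = rk4_solve(decay, data[0], p, ts)
    return sum((a - b) ** 2 for a, b in zip(pred, data))

def grad_fd(loss, p, eps=1e-6):
    # Central finite difference, a stand-in for AD or adjoint gradients.
    return (loss(p + eps) - loss(p - eps)) / (2 * eps)

p = 0.5  # initial guess
for _ in range(500):
    p -= 0.05 * grad_fd(single_shooting_loss, p)  # plain gradient descent
```

On this easy problem the loop recovers the rate; the point of the sketch is only the structure: every loss evaluation is one full simulation from the initial condition.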


8 of 16

Single shooting is not numerically robust. Other loss functions & tricks are required in practice!

9 of 16

Multiple Shooting & Collocation

Roesch, Elisabeth, Christopher Rackauckas, and Michael PH Stumpf. "Collocation based training of neural ordinary differential equations." Statistical Applications in Genetics and Molecular Biology (2021).

Turan, E. M., and J. Jäschke. "Multiple shooting with neural differential equations." arXiv preprint arXiv:2109.06786 (2021).
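A minimal sketch of a multiple shooting loss, covering only the shooting half of this slide (collocation is not shown). It assumes the same toy decay model and hand-written RK4 integrator as an illustration, restarts each segment from the observed data point at its left edge, and adds a weighted continuity penalty at each interior segment boundary. Segment initial states can also be made trainable; that variant is not shown. All names and the values of `group_size` and `continuity_weight` are hypothetical.

```python
import math

def rk4_solve(f, u0, p, ts):
    """Integrate du/dt = f(u, p), taking one RK4 step between consecutive ts."""
    us = [u0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, u = t1 - t0, us[-1]
        k1 = f(u, p)
        k2 = f(u + 0.5 * h * k1, p)
        k3 = f(u + 0.5 * h * k2, p)
        k4 = f(u + h * k3, p)
        us.append(u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return us

def decay(u, p):
    # Toy model (assumption): exponential decay with unknown rate p.
    return -p * u

ts = [0.1 * i for i in range(21)]
data = [math.exp(-1.5 * t) for t in ts]  # synthetic data, true rate 1.5

def multiple_shooting_loss(p, group_size=5, continuity_weight=10.0):
    loss = 0.0
    # Consecutive segments overlap by one point: the end of one segment
    # is the start of the next.
    for start in range(0, len(ts) - 1, group_size - 1):
        stop = min(start + group_size, len(ts))
        # Short simulation restarted from the observed state.
        pred = rk4_solve(decay, data[start], p, ts[start:stop])
        loss += sum((a - b) ** 2 for a, b in zip(pred, data[start:stop]))
        if stop < len(ts):
            # Continuity penalty: the segment must end where the next begins.
            loss += continuity_weight * (pred[-1] - data[stop - 1]) ** 2
    return loss

def grad_fd(loss, p, eps=1e-6):
    # Central finite difference, a stand-in for AD or adjoint gradients.
    return (loss(p + eps) - loss(p - eps)) / (2 * eps)

p = 0.5  # initial guess
for _ in range(500):
    p -= 0.1 * grad_fd(multiple_shooting_loss, p)
```

Because each simulation is short, errors from a poor parameter guess cannot compound over the whole time span, which is the usual argument for why this loss is better behaved than single shooting.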


10 of 16

Growing the Time Interval

Doing the optimization over the full time interval in a single pass may not be robust.

Successively grow the interval
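A minimal sketch of growing the fitting interval, assuming a toy decay model, a hand-written RK4 integrator, finite-difference gradients, and an arbitrary schedule of 6, 11, 16, then 21 data points. Each stage is warm-started from the parameter found on the previous, shorter interval. The schedule and all names are hypothetical.

```python
import math

def rk4_solve(f, u0, p, ts):
    """Integrate du/dt = f(u, p), taking one RK4 step between consecutive ts."""
    us = [u0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, u = t1 - t0, us[-1]
        k1 = f(u, p)
        k2 = f(u + 0.5 * h * k1, p)
        k3 = f(u + 0.5 * h * k2, p)
        k4 = f(u + h * k3, p)
        us.append(u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return us

def decay(u, p):
    # Toy model (assumption): exponential decay with unknown rate p.
    return -p * u

ts = [0.1 * i for i in range(21)]
data = [math.exp(-1.5 * t) for t in ts]  # synthetic data, true rate 1.5

def make_loss(n_points):
    # Loss restricted to the first n_points samples of the time series.
    def loss(p):
        pred = rk4_solve(decay, data[0], p, ts[:n_points])
        return sum((a - b) ** 2 for a, b in zip(pred, data[:n_points]))
    return loss

def grad_fd(loss, p, eps=1e-6):
    # Central finite difference, a stand-in for AD or adjoint gradients.
    return (loss(p + eps) - loss(p - eps)) / (2 * eps)

p = 0.5  # initial guess
for n_points in (6, 11, 16, 21):  # hypothetical growth schedule
    stage_loss = make_loss(n_points)
    for _ in range(200):
        # p carries over between stages, so each stage is warm-started.
        p -= 0.1 * grad_fd(stage_loss, p)
```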


11 of 16

Start with Adam & Finish with (L-)BFGS

Start training with Adam / SGD

Finish training with (L-)BFGS
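A minimal sketch of the two-phase schedule, assuming a toy two-parameter fit (decay rate and initial condition), finite-difference gradients, a textbook Adam update, and a dense BFGS with backtracking line search standing in for a production (L-)BFGS. Everything here is hand-written for illustration; a real workflow would call an optimization library instead, and all names and step counts are hypothetical.

```python
import math

def rk4_solve(f, u0, p, ts):
    """Integrate du/dt = f(u, p), taking one RK4 step between consecutive ts."""
    us = [u0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, u = t1 - t0, us[-1]
        k1 = f(u, p)
        k2 = f(u + 0.5 * h * k1, p)
        k3 = f(u + 0.5 * h * k2, p)
        k4 = f(u + h * k3, p)
        us.append(u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return us

ts = [0.1 * i for i in range(21)]
data = [math.exp(-1.5 * t) for t in ts]  # true rate 1.5, true u0 1.0

def loss(theta):
    p, u0 = theta
    pred = rk4_solve(lambda u, q: -q * u, u0, p, ts)
    return sum((a - b) ** 2 for a, b in zip(pred, data))

def grad(theta, eps=1e-6):
    # Central finite differences, a stand-in for AD or adjoint gradients.
    g = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        g.append((loss(up) - loss(dn)) / (2 * eps))
    return g

def adam(theta, steps, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    m, v = [0.0] * len(theta), [0.0] * len(theta)
    for k in range(1, steps + 1):
        g = grad(theta)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        theta = [t - lr * (mi / (1 - b1 ** k)) / (math.sqrt(vi / (1 - b2 ** k)) + eps)
                 for t, mi, vi in zip(theta, m, v)]
    return theta

def bfgs(theta, steps, tol=1e-8):
    n = len(theta)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # inverse Hessian estimate
    g = grad(theta)
    for _ in range(steps):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        f0, slope, step = loss(theta), sum(gi * di for gi, di in zip(g, d)), 1.0
        # Backtracking line search with the Armijo sufficient-decrease test.
        while step > 1e-12 and loss([t + step * di for t, di in zip(theta, d)]) > f0 + 1e-4 * step * slope:
            step *= 0.5
        if step <= 1e-12:
            break  # no acceptable step found; keep the current iterate
        new_theta = [t + step * di for t, di in zip(theta, d)]
        new_g = grad(new_theta)
        s = [a - b for a, b in zip(new_theta, theta)]
        y = [a - b for a, b in zip(new_g, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # update only when the curvature condition holds
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            H = [[H[i][j] + (1 + yHy / sy) * s[i] * s[j] / sy
                  - (Hy[i] * s[j] + s[i] * Hy[j]) / sy
                  for j in range(n)] for i in range(n)]
        theta, g = new_theta, new_g
    return theta

theta_adam = adam([0.5, 2.0], steps=300)  # phase 1: robust first-order progress
theta_final = bfgs(theta_adam, steps=50)  # phase 2: quasi-Newton refinement
```

The usual rationale for this ordering is that Adam tolerates a poor starting point, while a quasi-Newton method converges quickly once the iterate is already in a well-behaved basin.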


12 of 16

Global Optimization

Dixit, V. K., J. Samaroo, A. Pal, A. Edelman, and C. V. Rackauckas. "Efficient GPU-Accelerated Global Optimization for Inverse Problems." In ICLR 2024 Workshop on AI4DifferentialEquations In Science (2024).
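A minimal sketch of a global search on an inverse problem. The slide does not say which method is used, so this assumes a plain CPU particle swarm chosen for illustration, and it is not the GPU implementation from the cited work. The objective is a toy frequency-fitting loss with many local minima, where the closed-form `cos(w * t)` stands in for running a simulator. All names, bounds, and swarm coefficients are hypothetical.

```python
import math
import random

ts = [0.1 * i for i in range(101)]
data = [math.cos(3.0 * t) for t in ts]  # synthetic data, true frequency 3.0

def loss(w):
    # Oscillatory in w, so local gradient descent can stall in a wrong basin.
    return sum((math.cos(w * t) - d) ** 2 for t, d in zip(ts, data))

def particle_swarm(loss, lo, hi, n_particles=30, iters=100, seed=0):
    rng = random.Random(seed)  # seeded so that runs are reproducible
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    best_xs = list(xs)                  # best position seen by each particle
    best_fs = [loss(x) for x in xs]
    g = min(range(n_particles), key=lambda i: best_fs[i])
    gbest_x, gbest_f = best_xs[g], best_fs[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia plus attraction to the personal and swarm-wide bests.
            vs[i] = (0.7 * vs[i]
                     + 1.5 * r1 * (best_xs[i] - xs[i])
                     + 1.5 * r2 * (gbest_x - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))  # clamp to the search box
            f = loss(xs[i])
            if f < best_fs[i]:
                best_xs[i], best_fs[i] = xs[i], f
                if f < gbest_f:
                    gbest_x, gbest_f = xs[i], f
    return gbest_x, gbest_f

w_best, f_best = particle_swarm(loss, 0.0, 10.0)
```

A common follow-up is to pass `w_best` to a local gradient-based optimizer for final refinement.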


13 of 16

Let’s get back to this example

Keith, Brendan, Akshay Khadse, and Scott E. Field. "Learning orbital dynamics of binary black hole systems from gravitational wave measurements." Physical Review Research 3, no. 4 (2021): 043101.

Example using binary black hole dynamics with LIGO gravitational wave data


14 of 16

Let’s get back to this example

Keith, Brendan, Akshay Khadse, and Scott E. Field. "Learning orbital dynamics of binary black hole systems from gravitational wave measurements." Physical Review Research 3, no. 4 (2021): 043101.

The neural network is a residual, so start the training as a small perturbation!
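One way to read this advice, sketched under assumptions: if the model is known physics plus a neural correction, initializing the network's last layer to zero (or to very small values) makes the initial model coincide with the known physics, so training starts as a perturbation of it. The harmonic oscillator used as `known_physics`, the tiny tanh network, and all names are hypothetical placeholders, not the model from the cited paper.

```python
import math
import random

def init_mlp(n_in, n_hidden, n_out, last_layer_scale=0.0, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
          for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    # Scaling the output layer by 0 (or a small number) makes the network
    # output 0 (or nearly 0) at initialization.
    W2 = [[last_layer_scale * rng.gauss(0.0, 1.0 / math.sqrt(n_hidden))
           for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return W1, b1, W2, b2

def mlp(params, u):
    W1, b1, W2, b2 = params
    hidden = [math.tanh(sum(w * x for w, x in zip(row, u)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * x for w, x in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def known_physics(u):
    # Placeholder mechanistic model (assumption): harmonic oscillator.
    x, v = u
    return [v, -x]

def ude_rhs(params, u):
    # Right-hand side = known physics + learned residual correction.
    return [k + c for k, c in zip(known_physics(u), mlp(params, u))]

params = init_mlp(2, 8, 2, last_layer_scale=0.0)
u = [1.0, 0.0]
rhs_at_init = ude_rhs(params, u)  # equals known_physics(u) before training
```

With `last_layer_scale=0.0` the gradient with respect to the last layer is still nonzero in general, so training can move away from the pure-physics model as the data demands.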


15 of 16

Conclusion: So much more to say

Making this work in practice requires extra tricks beyond the first tutorial

16 of 16

SciML Open Source Software Organization sciml.ai

If you work in SciML and think optimized and maintained implementations of your method would be valuable, please let us know and we can add it to the queue.

Democratizing SciML via pedantic code optimization, because we believe full-scale open benchmarks matter

  • DifferentialEquations.jl: 2x-10x faster than Sundials, Hairer, …
  • DiffEqFlux.jl: adjoints outperforming Sundials and PETSc-TS
  • ModelingToolkit.jl: 15,000x faster than Simulink
  • Catalyst.jl: >100x faster than SimBiology, gillespy, COPASI
  • DataDrivenDiffEq.jl: >10x faster than PySINDy
  • NeuralPDE.jl: ~2x faster than DeepXDE* (more optimizations to be done)
  • NeuralOperators.jl: ~3x faster than the original papers' implementations (more optimizations required)
  • ReservoirComputing.jl: 2x-10x faster than pytorch-esn, ReservoirPy, PyRCN
  • SimpleChains.jl: on CPU, 5x faster than PyTorch on GPU and 10x faster than JAX (small networks only!)
  • DiffEqGPU.jl: Some wild GPU ODE solve speedups coming soon
