The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models
Samuel M. Blau
Lawrence Berkeley National Lab
Berkeley Lab
Atomistic simulations & the promise of ML interatomic potentials (MLIPs) for chemistry and materials
Predict energy + atomic forces
How do we calculate energy and forces?
MLIP
3D atomic positions
Update positions, repeat iteratively
Molecular dynamics
Elucidate complex reactivity
Geometry optimization + free energy
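The loop on this slide (positions in, energy and forces out, update positions, repeat) can be sketched as a velocity Verlet integrator. This is a minimal illustration, not the workflow used in the talk: `toy_potential` is a hypothetical harmonic bond standing in for a real MLIP, and all names, units, and parameters are assumptions made for the example.

```python
import math

def toy_potential(positions, k=1.0, r0=1.0):
    """Hypothetical stand-in for an MLIP: one harmonic bond between two atoms.

    Returns (energy, forces), with forces as one [fx, fy, fz] list per atom.
    """
    (x1, y1, z1), (x2, y2, z2) = positions
    dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    energy = 0.5 * k * (r - r0) ** 2
    # Force on atom 2 is -dE/dr along the bond vector; atom 1 feels the opposite.
    scale = -k * (r - r0) / r
    f2 = [scale * dx, scale * dy, scale * dz]
    f1 = [-c for c in f2]
    return energy, [f1, f2]

def velocity_verlet(positions, velocities, masses, predict, dt, n_steps):
    """Propagate atoms with velocity Verlet, calling predict(positions) each step."""
    positions = [list(p) for p in positions]
    velocities = [list(v) for v in velocities]
    energy, forces = predict(positions)
    trajectory = []
    for _ in range(n_steps):
        # Half-step velocity update, then full-step position update.
        for i, m in enumerate(masses):
            for d in range(3):
                velocities[i][d] += 0.5 * dt * forces[i][d] / m
                positions[i][d] += dt * velocities[i][d]
        # New energy and forces from the model at the updated positions.
        energy, forces = predict(positions)
        # Second half-step velocity update with the new forces.
        for i, m in enumerate(masses):
            for d in range(3):
                velocities[i][d] += 0.5 * dt * forces[i][d] / m
        kinetic = sum(0.5 * m * sum(c * c for c in v)
                      for m, v in zip(masses, velocities))
        trajectory.append((energy, kinetic))
    return positions, velocities, trajectory

# Two atoms, bond stretched to 1.2 (rest length 1.0), starting at rest.
start = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]
at_rest = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
final_pos, final_vel, traj = velocity_verlet(
    start, at_rest, [1.0, 1.0], toy_potential, dt=0.01, n_steps=1000
)
total_energies = [pe + ke for pe, ke in traj]
```

Swapping `toy_potential` for a trained model's energy/force call leaves the loop unchanged, which is the appeal of MLIPs for molecular dynamics and geometry optimization.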
Lots of general MLIP dev has focused on materials
https://matbench-discovery.materialsproject.org/
2022: MPtrj dataset, 2023: Matbench Discovery leaderboard
>300 citations in <3 years
Architectural improvements can only get you so far…
Barroso-Luque, Zitnick, Ulissi et al. “Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models” arXiv 2024
>60x larger than MPtrj
Much higher forces (more off-equilibrium structures)
More diverse sampling
Models trained on more/better data are better!
Now, atomistic simulations of well-behaved crystalline materials should always start with a pre-trained MLIP
DFT datasets (timeline, 2021 to 2025)
Catalysis 230M: OC20, OC22
MOFs 29M: ODAC23
Materials 100M: OMat24
~400M core hrs (shown three times)
Industrial resources enable massive data generation
All fully open-source!
“Hey Sam…”
What’s missing? Molecular chemistry!
Molecules 100M: OMol25
Open Molecules 2025: Unlocking MLIPs for chemistry
Industry
Government
Academia
Level of theory: ωB97M-V / def2-TZVPD / (99, 590) integration grid
>6 billion core hours
OMol coverage & complexity
AIMNet2: 14 elements
SPICE2: 17 elements
QCML: 79 elements
OMol25: 83 elements
QCML: <26 atoms
only one molecule
AIMNet2: <80 atoms
SPICE2: <110 atoms
“The richness (and challenge) of chemistry is mostly in the intermolecular interactions”
OMol25: 5.9B atoms
AIMNet2: ~600M atoms
QCML: ~600M atoms
Better functional
Better basis set
Tighter grid
OMol construction / structural sampling
Boltzmann rattling + optimization
Geodesic interpolation: reactant (R) -> transition state (TS) -> product (P)
10% also run as triplets
10% add electron, 10% remove electron
Up to 3 optimization steps
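The charge and spin variations listed above can be sketched as a job-generation step. This is an illustration only: it assumes the parent structures are closed-shell singlets (so adding or removing one electron gives a doublet) and that each 10% subset is drawn independently. Neither assumption is stated on the slide, and the function name, field names, and seed are hypothetical.

```python
import random

def charge_spin_variants(structures, fraction=0.10, seed=0):
    """Build extra jobs for a list of (name, charge) closed-shell singlet parents.

    For each variant, a random subset of size fraction * len(structures) is drawn:
      - triplet:         same charge, multiplicity 3
      - add_electron:    charge - 1, multiplicity 2 (assumes singlet parent)
      - remove_electron: charge + 1, multiplicity 2 (assumes singlet parent)
    """
    rng = random.Random(seed)
    n_pick = round(fraction * len(structures))
    variants = (
        ("triplet", 0, 3),
        ("add_electron", -1, 2),
        ("remove_electron", +1, 2),
    )
    jobs = []
    for label, delta_charge, multiplicity in variants:
        # Assumption: each variant samples its own independent subset.
        for name, charge in rng.sample(structures, n_pick):
            jobs.append({
                "name": name,
                "variant": label,
                "charge": charge + delta_charge,
                "multiplicity": multiplicity,
            })
    return jobs

# 50 hypothetical neutral parent structures.
parents = [(f"mol_{i}", 0) for i in range(50)]
jobs = charge_spin_variants(parents)
```

Each job would then be passed to the DFT code with its own total charge and spin multiplicity, which is what lets models trained on the data respond to both inputs.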
Snapshots pulled from 300K, 400K MD
Multiple protonation/tautomer states
Template reactions from MOBH35, MOR41, ROST61
Quick aside – FAIR Chemistry’s UMA model(s)
Universal Model for Atoms (UMA)
Inputs: structure, composition, total charge & spin multiplicity, task
Tasks: OMol, OMC, OMat, ODAC, OC20
OMol task: total charge & spin multiplicity supplied; not OMol task: null charge & spin multiplicity
Router (with task embedding) combines the experts into a merged Mixture of Linear Experts UMA model
Outputs: energy, forces, stress
Wood et al. “UMA: A Family of Universal Models for Atoms” arXiv 2025
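The "merged Mixture of Linear Experts" idea on this slide can be sketched in a few lines: a router turns system-level inputs into coefficients, and the expert weight matrices are combined once into a single matrix before being applied. This is a toy illustration of why merging works for linear experts, not the UMA implementation. The softmax router, the feature encoding, and every name and number below are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def route(system_features, router_weights):
    """Hypothetical router: linear map of system-level features, then softmax."""
    return softmax(matvec(router_weights, system_features))

def merge_experts(experts, coeffs):
    """Collapse the experts into one matrix: W = sum_k coeff_k * W_k."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(a * W[r][c] for a, W in zip(coeffs, experts)) for c in range(cols)]
        for r in range(rows)
    ]

# Three toy 2x2 linear experts.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
    [[1.0, 1.0], [-1.0, 1.0]],
]
# Hypothetical system-level features: [total charge, spin code, task flag].
system_features = [0.0, 1.0, 1.0]
router_weights = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4], [-0.1, 0.2, 0.6]]

coeffs = route(system_features, router_weights)
merged = merge_experts(experts, coeffs)

# Applying the merged matrix equals the coefficient-weighted sum of expert outputs.
x = [0.3, -1.2]
y_merged = matvec(merged, x)
y_mixture = [
    sum(a * matvec(W, x)[i] for a, W in zip(coeffs, experts)) for i in range(2)
]
```

Because the coefficients depend only on system-level inputs, the merge can be done once per system, after which inference costs the same as a single expert.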
Dataset splitting and test sets
Baseline results: test set energy and forces
Novel model evaluation tasks + metrics
Baseline results
Lots of room for MLIP architecture development to improve treatment of charge, spin, long-range interactions
https://benchmarks.rowansci.com/
Martinez group Slack:
GMTKN55
w/ metals, charged + open-shell
Models capture chemistry changing with charge, spin
Cu1+: tetrahedral
Cu2+: square planar
Neutral ethylene carbonate (EC): stable
Radical anion EC: ring bond breaks
Training on OMol continues to improve model performance
Enthusiastic community reception
“OMol25-trained models give much better energies than the DFT level of theory I can afford and allow for computations on huge systems that I previously never even attempted to compute.”
Another Rowan user called this “an AlphaFold moment for computational chemistry.”
Rowan: Models correctly predicted relative barrier heights for C–F reductive elimination, C–O reductive elimination, and different aryl groups.
Grimme: “Benchmark results for OS reaction energies and barrier heights as well as for TM geometries confirm UMA’s strong transferability across broad regions of chemical space.”
g-xTB: WTMAD-2 of 9.3 kcal mol−1
UMA-s-1: WTMAD-2 of 6.1 kcal mol−1
30 citations in 10 weeks!
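For readers unfamiliar with the WTMAD-2 numbers quoted above: the metric is a weighted mean absolute deviation over benchmark subsets. The definition below is the one commonly given for GMTKN55, as best recalled. The slide does not show the formula, so treat it as background rather than as the exact protocol behind the quoted values.

```latex
% N_i: number of reactions in subset i
% \overline{|\Delta E|}_i: mean absolute reference energy of subset i
% MAD_i: mean absolute deviation of the method on subset i
\mathrm{WTMAD\text{-}2}
  = \frac{1}{\sum_{i=1}^{55} N_i}
    \sum_{i=1}^{55} N_i \,
    \frac{56.84\ \mathrm{kcal\ mol^{-1}}}{\overline{|\Delta E|}_i} \,
    \mathrm{MAD}_i
```

The scaling factor up-weights subsets with small reference energies (for example noncovalent interactions), so a lower WTMAD-2 reflects accuracy across energy scales rather than on large reaction energies alone.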
First paper using OMol/UMA: 25 days after release!
Avg RMSD = 0.24 Å
MAE = 1.65 kcal/mol
More OMol/UMA transition state optimization, from 3 weeks ago
ColabReaction: Accelerating Transition State Searches with Machine Learning Potentials on Google Colaboratory
Masayuki Karasawa, Chee Siang Leow, Hideaki Yajima, Shuta Arai, Hiromitsu Nishizaki, Tohru Terada, Hajime Sato
Additional OMol results: protein-ligand binding (Rowan)
https://rowansci.com/blog/benchmarking-protein-ligand-interaction-energy
Additional OMol results: bond dissociation energies (Rowan)
https://rowansci.com/publications/expbde54
Note: MLIPs much faster on GPU
Last week:
OMol isn’t quite done yet…
Acknowledgements
Muhammed Shuaibi
Daniel Levine
Brandon Wood
Santiago Vargas (LBL)
Andrew Rosen (Princeton)
Evan Spotte-Smith (CMU)
Michael Taylor (LANL)
Muhammad Hasyim (NYU)
Ilyes Batatia (Cambridge)
Gabor Csanyi (Cambridge)
Peter Eastman (Stanford)
Nathan Frey (Prescient / Genentech)
Aditi Krishnapriyan (Berkeley)
Joshua Rackers (Prescient / Genentech)
Sanjeev Raja (Berkeley)
Larry Zitnick
Zack Ulissi
Kyle Michel
Misko Dzamba
Vahe Gharakhanyan
Ammar Rizvi
Xiang Fu