1 of 86

Introduction to machine-learning potentials

Sungwoo Kang

Computational Science Research Center

Korea Institute of Science and Technology (KIST)

2025. 07. 01.

sung.w.kang@kist.re.kr

The Korean Institute of Metals and Materials, 11th Artificial Intelligence Winter School

2 of 86

Contents

Section 1: Introduction to the concept of machine-learning potentials (MLPs)

  • Molecular dynamics & DFT
  • Basic concept of MLP
  • Models
  • Training set generation
  • Universal pretrained MLPs

Section 2: Practical use of MLIPs

  • How to choose model and code for your problem?
  • How to sample training set?
  • How to set hyperparameters?
  • How to know whether the simulation is going wrong or not?

Section 3: Practice with codes (Colab)

Schedule: Section 1, 14:00 ~ 15:25 (10 min break); Section 2, 15:35 ~ 16:25 (5 min break); Section 3, from 16:30

3 of 86

2024 Nobel prize in physics

Machine-learning potential paper

An 18-year-old field!

4 of 86

Success stories

  • Hydrogen phase transition: Cheng et al. Nature 585, 217 (2020)
  • Amorphous Si: Deringer et al. Nature 589, 59 (2021)
  • NH3 decomposition catalysis: Yang, Parrinello et al. Nat. Catal. 6, 829 (2023)
  • Phase change memory: Zhou, Zhang, Deringer et al. Nat. Electron. 6, 746 (2023)

5 of 86

Introduction to the concept of machine-learning potentials

  • Molecular dynamics & DFT
  • Basic concept of MLP
  • Model 1: Descriptor-based models
  • Model 2: E(3)-equivariant graph models
  • Training set generation
  • Universal pretrained MLPs

6 of 86

Molecular dynamics

What is molecular dynamics (MD)?

  • Change in atomic positions over time:

 

  • Velocity, v:

 

  • Acceleration, a:

 

Quantum mechanical calculations: density functional theory (DFT)

ĤΨ(r1, r2, …, rN) = EΨ(r1, r2, …, rN)

  • Accurate & general
  • Low speed (<500 atoms)

Classical interatomic potentials

  • High speed (millions of atoms)
  • Limited to specific systems

[Figure panels: ionic bonding, covalent bonding, noble gases]

J. Manuf. Sci. Eng. Apr 2014, 136(2): 021015

V: Potential energy
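The three quantities listed on this slide (position update, velocity, and acceleration obtained from the potential energy V) can be tied together in a small integrator. The following is a minimal sketch, not the slide's own equations: it assumes the velocity-Verlet scheme, a single particle in 1D, and a harmonic potential V(x) = 0.5·k·x² chosen purely for illustration. The names `potential`, `force`, `velocity_verlet`, and all parameter values are hypothetical.

```python
# Minimal velocity-Verlet MD sketch for one particle in 1D.
# Assumption: harmonic potential V(x) = 0.5 * k * x**2 (illustrative only).

def potential(x, k=1.0):
    return 0.5 * k * x * x

def force(x, k=1.0):
    # F = -dV/dx
    return -k * x

def velocity_verlet(x, v, m=1.0, dt=0.01, n_steps=1000):
    a = force(x) / m                          # a = F / m
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt    # position update
        a_new = force(x) / m                  # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt        # velocity update
        a = a_new
    return x, v

def total_energy(x, v, m=1.0):
    return 0.5 * m * v * v + potential(x)

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0)
e0, e1 = total_energy(x0, v0), total_energy(x1, v1)
print(f"energy drift after 1000 steps: {abs(e1 - e0):.2e}")
```

In a real MD run the only part that changes between DFT, classical potentials, and MLPs is the `force` function; the integrator stays the same.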

7 of 86

Density-functional theory (DFT)

Input: structure

Output: energy, wavefunction

Periodic boundary condition (PBC)

Quantum mechanics

ĤΨ(r1, r2, …, rN) = EΨ(r1, r2, …, rN)

0.02 L = 6.02×10²³ atoms

How can we simulate such a large number of atoms?

Typically, 100–200 atoms are used to simulate liquid structures in DFT calculations.

8 of 86

Scales of materials simulations

Commun. Mater. 4, 66 (2023)

Å

nm

All chemical reactions

Specific chemical reactions

Limited chemical reactions

9 of 86

Machine-learning potentials (MLPs)

Training set from DFT calculations (small structures) → machine learning model (MLP) → target simulation (big structures)

[Figure: Computation time for Si. Axes: time (s) vs. # of atoms. Curves: DFT ~ O(N³), MLP ~ O(N), classical MD ~ O(N)]

Energy = f(structure)

10 of 86

Example: modeling HF etching process with MLP

  • HF etching of amorphous Si3N4 for semiconductor process

Simulation target: a-Si3N4, a-Si3N4 + HF

Training set generation (DFT): diverse sampling techniques / time scale: ps scale

Simulation (MLP): time scale: ns scale

C. Hong et al. ACS Appl. Mater. Interfaces 16, 48457 (2024)

11 of 86

Introduction to the concept of machine-learning potentials

  • Molecular dynamics & DFT
  • Basic concept of MLP
  • Model 1: Descriptor-based models
  • Model 2: E(3)-equivariant graph models
  • Training set generation
  • Universal pretrained MLPs

12 of 86

Challenges in representing material structures

  • A straightforward try:

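[Figure: a structure of six atoms (1 to 6) whose Cartesian coordinates x1, y1, … z6 are fed directly into a neural network]

Input: (x1, y1, z1, x2, … x6, y6, z6)

Output: EDFT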

  • Challenges you may encounter:

(1) Unable to account for translational, rotational, and permutational invariance.

Input: (x1, y1, z1, x2, … x6, y6, z6)

Input: (x1+Δ, y1, z1, x2+Δ, … x6+Δ, y6, z6)

A constant shift by Δ gives different outputs.

(2) Not transferable to systems with larger or smaller cells.

The input length varies, making it incompatible with the trained model.

Energy = f(structure)

13 of 86

Symmetries that MLP should satisfy

(1) Translational symmetry (+ periodic boundary condition): constant shift by Δ

(x1, y1, z1, x2, … x6, y6, z6) → (x1+Δ, y1, z1, x2+Δ, … x6+Δ, y6, z6)

(2) Rotational symmetry

(x1, y1, z1, x2, … x6, y6, z6) → (y1, x1, z1, y2, … y6, x6, z6)

(3) Permutational symmetry: atoms (1, 2, 3, 4, 5, 6) relabeled as (4, 5, 6, 1, 2, 3)

(x1, y1, z1, x2, … x6, y6, z6) → (x4, y4, z4, x5, … x3, y3, z3)

→ The output (energy) should remain invariant (does not change) under these transformations.
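The three symmetries above can be checked numerically. The sketch below is an illustration under stated assumptions, not a descriptor used by any MLP code: it compares the raw coordinate list with a sorted list of interatomic distances, which is one simple (and incomplete) invariant representation. The helper names and the four-atom test structure are hypothetical, and periodic boundary conditions are ignored.

```python
import math

def translate(points, dx):
    # Constant shift by dx along x
    return [(x + dx, y, z) for x, y, z in points]

def rotate_z(points, angle):
    # Rotation about the z axis
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def sorted_distances(points):
    # All pair distances, sorted so that atom ordering does not matter
    d = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d.append(math.dist(points[i], points[j]))
    return sorted(d)

def close(a, b, tol=1e-9):
    return len(a) == len(b) and all(abs(p - q) < tol for p, q in zip(a, b))

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.3, 0.2, 2.0)]
shifted = translate(atoms, 0.7)
rotated = rotate_z(atoms, 0.9)
permuted = atoms[2:] + atoms[:2]

# Raw coordinates change under the transformations...
raw_changes = shifted != atoms
# ...while the sorted distance list does not.
ref = sorted_distances(atoms)
invariant = all(close(ref, sorted_distances(p)) for p in (shifted, rotated, permuted))
print(raw_changes, invariant)
```

A model fed with the raw coordinates has to learn these symmetries from data, while a model fed with an invariant representation satisfies them by construction.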

14 of 86

Atomic energy mapping

[Diagram: relative coordinates around each atom (atom 1, atom 2, …) → atomic energies Eatom,1, Eatom,2, Eatom,3, … Eatom,N → total energy Etot, fitted to the DFT total energy]

  • DFT total energies are assumed to be decomposable into transferable atomic energies.
  • However, training is performed to predict total energy values.
  • See PRM 3, 093802 (2019) for details.

Atomic energy mapping: Etotal = ∑ Eatom

  • Etotal: data given for training
  • Eatom: not given (estimated during training)

  • Relative coordinates: translational invariance
  • Atomic energies: permutational invariance

15 of 86

Force calculation

[Diagram: relative coordinates around each atom (atom 1, atom 2, …) → atomic energies Eatom,1, Eatom,2, Eatom,3, … Eatom,N → total energy Etot, fitted to the DFT total energy]

Force: F_{i,α} = −∂E_tot/∂r_{i,α}

  • i: atomic index
  • α: directional index (x, y, or z)

Loss function

Total energy error

Atomic force error

Stress error
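The force definition and the loss terms on this slide can be sketched as follows. This is a hedged illustration, not the loss of any specific code: a Lennard-Jones pair potential stands in for the trained model, forces are taken as the negative derivative of the total energy by central finite differences, and the loss combines only energy and force errors (the stress term is omitted). The weights `w_e` and `w_f` and all function names are assumptions.

```python
import math

def pair_energy(r, eps=1.0, sigma=1.0):
    # Lennard-Jones pair energy, used here as a stand-in "model"
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def total_energy(positions):
    e = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            e += pair_energy(math.dist(positions[i], positions[j]))
    return e

def forces(positions, h=1e-5):
    # F[i][alpha] = -dE_tot / dr[i][alpha], by central finite difference
    f = []
    for i in range(len(positions)):
        fi = []
        for alpha in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][alpha] += h
            minus[i][alpha] -= h
            fi.append(-(total_energy(plus) - total_energy(minus)) / (2.0 * h))
        f.append(fi)
    return f

def loss(e_pred, e_ref, f_pred, f_ref, n_atoms, w_e=1.0, w_f=0.1):
    # Weighted sum of squared per-atom energy error and mean squared force error
    e_term = ((e_pred - e_ref) / n_atoms) ** 2
    f_term = sum((a - b) ** 2
                 for fp, fr in zip(f_pred, f_ref)
                 for a, b in zip(fp, fr)) / (3 * n_atoms)
    return w_e * e_term + w_f * f_term

positions = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0)]
f = forces(positions)
# Forces derived from a translation-invariant energy sum to zero
net = [sum(fi[a] for fi in f) for a in range(3)]
print(net)
```

Because the forces come from differentiating one scalar energy, they are conservative by construction; MLP codes typically obtain the same derivative by automatic differentiation rather than finite differences.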

16 of 86

Types of MLP models

(1) Descriptor-based models

Atom 1, atom 2, … → descriptor function → NN of element 1, NN of element 2 → atomic energies Eatom,1, Eatom,2, … → Etot, fitted to the DFT total energy

(2) Graph models

Graph construction by connectivity → graph convolution neural network → atomic energies E1, E2, E3

Figure: PRL 120, 145301 (2018); note that this paper is not about MLPs.

17 of 86

Types of MLP models

(1) Descriptor-based models


18 of 86

Descriptor model 1: Behler-Parrinello neural network (BPNN) potential

[Diagram: relative coordinates around each atom (atom 1, atom 2, …) → descriptor → atomic energies Eatom,1, Eatom,2, … Eatom,N → total energy Etot, fitted to the DFT total energy]

Behler and Parrinello, PRL, 98, 146401 (2007)

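Descriptor: symmetry function

Gi = [Gi^radial,η1, Gi^radial,η2, Gi^radial,η3, … Gi^angular,ζ1, Gi^angular,ζ2, Gi^angular,ζ3, …]

  • Radial symmetry functions (2-body): built from the distances Rij to neighbors j within the cutoff radius Rc
  • Angular symmetry functions (3-body): built from Rij, Rik, and the angle θijk
  • fc: cutoff function
  • Rotationally invariant

[Figure: atom i with neighbors j, k, l inside the cutoff sphere Rc; plots of the radial functions vs. R (Å) and of the angular functions vs. θ (rad)]

→ Used as input vectors for neural networks predicting atomic energies.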
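The symmetry-function formulas did not survive on this slide, so the sketch below uses the commonly quoted Behler-Parrinello forms as an assumption: a cosine cutoff fc, a Gaussian radial function, and an angular function with parameters η, ζ, λ. Each neighbor pair is counted once in the angular sum, which may differ by a constant factor from other conventions. All parameter values and function names are hypothetical.

```python
import math

def cutoff(r, rc):
    # Cosine cutoff: goes smoothly to zero at r = rc
    if r >= rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0)

def radial_sf(center, neighbors, eta, rc, rs=0.0):
    # 2-body term: sum of Gaussians of the neighbor distances
    g = 0.0
    for nb in neighbors:
        r = math.dist(center, nb)
        g += math.exp(-eta * (r - rs) ** 2) * cutoff(r, rc)
    return g

def angular_sf(center, neighbors, eta, zeta, lam, rc):
    # 3-body term: depends on R_ij, R_ik, R_jk and the angle theta_ijk
    g = 0.0
    for j in range(len(neighbors)):
        for k in range(j + 1, len(neighbors)):
            rij = math.dist(center, neighbors[j])
            rik = math.dist(center, neighbors[k])
            rjk = math.dist(neighbors[j], neighbors[k])
            vj = [a - b for a, b in zip(neighbors[j], center)]
            vk = [a - b for a, b in zip(neighbors[k], center)]
            cos_theta = sum(a * b for a, b in zip(vj, vk)) / (rij * rik)
            g += ((1.0 + lam * cos_theta) ** zeta
                  * math.exp(-eta * (rij ** 2 + rik ** 2 + rjk ** 2))
                  * cutoff(rij, rc) * cutoff(rik, rc) * cutoff(rjk, rc))
    return 2.0 ** (1 - zeta) * g

def descriptor(center, neighbors, rc=3.0):
    # Descriptor vector G_i: several radial and angular components
    radial = [radial_sf(center, neighbors, eta, rc) for eta in (0.5, 1.0, 2.0)]
    angular = [angular_sf(center, neighbors, 0.1, zeta, 1.0, rc) for zeta in (1, 2, 4)]
    return radial + angular

def rotate_z(p, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 1.5), (4.0, 0.0, 0.0)]
g_ref = descriptor(center, neighbors)
g_rot = descriptor(center, [rotate_z(n, 0.6) for n in neighbors])
print(g_ref)
```

The last neighbor lies outside the cutoff and contributes nothing, and the rotated structure gives the same vector, which is the rotational invariance noted on the slide.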

19 of 86

Descriptor model 2: DeePMD-kit

Zhang, Wang, E et al. PRL 120, 143001 (2018)

Descriptor

Di = {Dij | j ∈ neighbors of i}

How can rotational and permutational invariance be ensured?

  • Relative coordinates are directly used as an input of NN

(1) Rotational invariance: Adjust relative axes based on first- and second-nearest neighbors.

(2) Permutational invariance: sort Dij by Rij

Rotational matrix:

Ria: first nearest neighbor

Rib: second nearest neighbor

* Disadvantage: discontinuity

20 of 86

DeePMD-kit ver. 2: DeepPot-SE

  • Continuously differentiable version of DeePMD-kit

Zhang, E, et al. NeurIPS (2018); arxiv:1805.09003

21 of 86

Descriptor model 3: Gaussian approximation potential (GAP)

Descriptor: Smooth Overlap of Atomic Positions (SOAP)

Training points 1, 2, 3, …, N

New point (NP)

Kernels between each training point and the new point: k(1,NP), k(2,NP), k(3,NP), …, k(N,NP)

See Gabor Csányi, https://www.youtube.com/watch?v=wpJbSjq6QDw

  • k(i,j): kernel, i.e. the similarity between i and j
  • Mathematically the same as a 1-layer neural network
  • Easy estimation of uncertainty

Gaussian process

Spherical harmonics

Bartók, Csányi, et al. PRL 104, 136403 (2010)

22 of 86

Descriptor model 3: Gaussian approximation potential (GAP)


Bartók, Csányi, et al. PRL 104, 136403 (2010)

Uncertainty estimation of Gaussian process

High uncertainty region (lack of training data)

  • Uncertainty is directly derived from the machine-learning model without using an ensemble.
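How a Gaussian process yields both a prediction and an uncertainty from the kernel values k(i, NP) can be sketched in one dimension. This is a generic Gaussian-process regression example, not the GAP/SOAP implementation: the squared-exponential kernel, the length scale, the noise term, and the sin(x) training data are all assumptions made for illustration.

```python
import math

def kernel(a, b, length=1.0):
    # Squared-exponential kernel: similarity between two points
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting for A x = b
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    n = len(x_train)
    K = [[kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(xi, x_new) for xi in x_train]   # k(i, NP)
    alpha = solve(K, y_train)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

x_train = [0.0, 0.5, 1.0, 1.5]
y_train = [math.sin(x) for x in x_train]
mean_in, var_in = gp_predict(x_train, y_train, 0.75)    # inside the training range
mean_out, var_out = gp_predict(x_train, y_train, 5.0)   # far from any training point
print(var_in, var_out)
```

The predictive variance is small between training points and approaches the prior variance far away from them, which is the high-uncertainty region described above. The loop over training points in `k_star` also shows why inference time grows with the training set size.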

23 of 86

Descriptor models summary: BP-NNP vs DeePMD-kit vs GAP

[Diagrams: BP-NNP symmetry-function geometry (atom i, neighbors j, k, l, distances Rij and Rik, angle θijk, cutoff Rc); atomic energies Eatom,1 … Eatom,N summed into Etot (DFT total energy); Gaussian process kernels k(1,NP) … k(N,NP) between the training points and a new point]

Behler-Parrinello NNP (neural network)

  • Complicated descriptor
  • Shallow NN model
  • General reliability
  • Low GPU acceleration

DeePMD-kit (neural network)

  • Simple descriptor
  • Deep NN model
  • Careful attention required
  • High GPU acceleration

Neural network models in general

  • Pros: can use a large number of training data points (shorter inference time)
  • Cons: long training time

GAP (Gaussian process)

  • Pros: short training time + uncertainty
  • Cons: inference time ~ training set size

24 of 86

Limitations of descriptor models

[Diagram: relative coordinates → descriptor → NN of each element (input → hidden → atomic energy) → atomic energies Eatom,1, Eatom,2, … Eatom,N → total energy Etot (DFT)]

Example with two elements, A and B:

  • NN of element A
  • Its input vector contains one block per element combination (2-body: A-A, A-B; 3-body: A-A-A, A-A-B, B-A-B), repeated for hyperparameter sets 1, 2, … N

Limitation 1:

  • Number of NNs = number of elements
  • Size of each input vector ~ (number of elements)²
  • → Total parameters ~ (number of elements)³

Limitation 2: Knowledge from one element is not transferred to others, as a distinct network is used for each element.

25 of 86

Types of MLP models

(1) Descriptor-based models

(2) Graph models


26 of 86

Types of MLP models

(2) Graph models


27 of 86

E(3)-equivariant graph machine-learning potentials

1st convolution

2nd convolution

3rd convolution

Message passing

E(3)-equivariant graph model

Node

Features

(scalars)

Scalar

(l=0)

Vector

(l=1)

Rank 2 tensor

(l=2)

Feature vectors consist of tensors, in addition to scalars.

Rotational transformation

  • Equivariant networks are more data-efficient and accurate compared to descriptor-based models.
  • Errors can be reduced by a factor of two to three with equivariant networks, though they are computationally heavier than descriptor-based models.

28 of 86

Equivariant graph neural network

x22 = σ(w112x11 + w122x12 + b1)

x21

x11

w111

w114

w112

w113

x21 = σ(w112x11 + w112x12 + …)

Neural network

Graph NN (message passing NN)

Equivariant GNN

Message from 1 to 2 = w112x12

Edge tensor, w112,lm = R(r12)Ylm(r12)

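[Diagrams: a neural network with input, hidden, and output layers, where nodes x11, x12, x13, x14 connect to x21 through weights w111, w112, w113, w114; a graph with nodes x11, x12, x22 and edges w112, w122; an equivariant GNN in which the edge tensor w112 between nodes x11 and x12 depends on r12 through a radial term (including trainable weights) and spherical harmonics]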

For instance, when l = 1:

  • Y1,−1(θ, φ) = C sinθ sinφ, along ŷ
  • Y1,0(θ, φ) = C cosθ, along ẑ
  • Y1,1(θ, φ) = C sinθ cosφ, along x̂

29 of 86

What is a tensor?

E(3) group = 3D Euclidean group, which comprises translations, rotations, and reflections (parity).

  • Order (l) = angular momentum quantum number
  • Projection index (m) = magnetic quantum number, m ∈ [−l, −(l−1), …, (l−1), l]
  • Parity (from mirror symmetry): even (p = 1) or odd (p = −1)

l = 0 (m = 0; s)

  • Even parity: scalar (0e)
  • Odd parity: pseudoscalar (0o)

l = 1 (m = −1, 0, 1; py, pz, px)

  • Even parity: pseudovector (1e)
  • Odd parity: vector (1o)

l = 2 (m = −2, −1, 0, 1, 2; dxy, dyz, dz2, dxz, dx2−y2)

  • Even parity: 2e
  • Odd parity: 2o

30 of 86

Structure of equivariant network (NequIP structure for example)

One-hot embedding

First-layer node

Second-layer node

Scalars (0/1)

Scalars

Conventional NN

E(3) NN

Scalar

(l=0)

Vector

(l=1)

Tensor

(l=2)

Edge (=filter, f)

b

a

  • Convolution

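  • Spherical harmonics: eigenfunctions of the rotational operator
  • Radial part: radial neural network with Bessel functions
  • CG coefficient: Clebsch-Gordan coefficient

Message from node b to a: built from the Clebsch-Gordan coefficients, the radial part, the spherical harmonics, and the node feature, [edge tensor ⊗ node tensor]lf,pf

Output: energy (scalar)

Nat. Commun. 13, 2453 (2022)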

31 of 86

Atomic cluster expansion

Many-body messages

2-body

3-body

4-body

5-body

Atomic cluster expansion (ACE)

Multi-ACE

Cf: Graph ACE (GRACE)

PRB 99, 014104 (2019)

=L in Previous slides

32 of 86

Invariance vs equivariance

Descriptor model = invariant model

  • Input structures: x and Rx (R: rotation)
  • Descriptor (input): f(x) = f(Rx)
  • Hidden layers: g(f(x)) = g(f(Rx))
  • Output: energy for x = energy for Rx

Equivariant model

  • Input structures: x and Rx
  • Convolution layers: f(Rx) = Rf(x)
  • Output: energy for x = energy for Rx
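The difference between f(Rx) = f(x) and f(Rx) = Rf(x) can be checked on a toy feature. The sketch below is an assumption-laden illustration rather than a layer from NequIP or MACE: the vector feature is a sum of neighbor displacement vectors weighted by a radial function exp(−r), and its norm serves as an invariant derived from it. All names and coordinates are hypothetical.

```python
import math

def rotate_z(v, angle):
    # Rotation R about the z axis
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def vector_feature(center, neighbors):
    # l = 1 feature: displacement vectors weighted by a radial function
    out = [0.0, 0.0, 0.0]
    for nb in neighbors:
        d = [a - b for a, b in zip(nb, center)]
        w = math.exp(-math.dist(nb, center))
        out = [o + w * x for o, x in zip(out, d)]
    return tuple(out)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.2, 0.0), (-0.3, 1.1, 0.4), (0.5, -0.7, 1.2)]
angle = 0.8

f_x = vector_feature(center, neighbors)
f_rx = vector_feature(rotate_z(center, angle), [rotate_z(n, angle) for n in neighbors])
r_f_x = rotate_z(f_x, angle)

# Equivariance: rotating the input rotates the feature, f(Rx) = R f(x)
equivariant = all(abs(a - b) < 1e-12 for a, b in zip(f_rx, r_f_x))
# Invariance: a scalar built from the feature does not change
invariant = abs(norm(f_rx) - norm(f_x)) < 1e-12
print(equivariant, invariant)
```

An equivariant network keeps such vector and tensor features through its layers and only reduces them to a scalar (the energy) at the output, which is why the final energy is still invariant.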

33 of 86

Role of equivariance

Descriptor model = invariant model

  • Structural representation: x → descriptor (input), f(x)
  • Energy regression: hidden layers, g(f(x)) → output: energy

Equivariant model

  • Structural representation + energy regression at the same time: x → convolution layers, f(x) → output: energy

→ The MLP also learns an effective structural representation.

34 of 86

Why are E(3)-equivariant graph NNs powerful?

(1) Message passing increases the effective cutoff.

(2) All elements share the same network, differing only in their initial embedding vectors.

→ Computational cost does not increase with the number of elements.

One-hot embedding

Scalars (0/1)

(3) The network consists of high-rank tensors, enhancing representability in geometric spaces.

Nat. Mach. Intell. 7, 56 (2025)

Scalar

(l=0)

Vector

(l=1)

Rank 2 tensor

(l=2)

35 of 86

Parallelization issue

Improved parallelization algorithm of SevenNet

Parallelization performance

Park, Han et al. J. Chem. Theory Comput. 20, 4857 (2024)

Problem: Graph neural network potentials exhibit poor parallelization performance due to constant communication between nodes.

  • SevenNet addresses the parallelization issue by integrating a communication block within the convolution layers.

36 of 86

Descriptor model vs graph model

Descriptor models

  • BP-NNP, DeePMD-kit, GAP, …
  • Fast, and usable on CPUs
  • Higher errors (0.1 ~ 0.5 eV/Å)
  • Limited to fewer than five elements (less than quinary compositions)
  • Poor data efficiency

Graph models

  • NequIP, MACE, …
  • Slow, and may not be practical on CPUs
  • Lower errors
  • Up to 100-element compositions
  • Good data efficiency

37 of 86

Tensor-based, but not message-passing models

Moment tensor potential (MTP)

  • Linear function
  • Descriptors: tensors in Cartesian coordinates
  • Cf) NequIP & MACE: tensors in spherical coordinates
  • Shapeev, arXiv:1512.06054 (2015)
  • Review: Mach. Learn. Sci. Technol. 2, 025002 (2021)

Allegro

  • Two-body messages as a descriptor for MLP
  • Not a message-passing model; local model
  • Musaelian, Kozinsky et al. Nat. Commun. 14, 579 (2023)

38 of 86

Summary of MLP models

Descriptor-based models

  • Neural network: Behler-Parrinello NNP, DeePMD-kit
  • Gaussian process: GAP

E(3)-equivariant graph models

  • NequIP, MACE, …

39 of 86

Long-range interaction

Cutoff

Long range: Mostly Coulomb interaction

→ Cannot be fully described by conventional MLPs

Charge equilibration (Qeq) scheme + MLP

Quantities predicted by ML → Qeq scheme → electrostatic energy calculation with Ewald summation

  • Cons: high computational cost

Ko, Behler et al. Nat. Commun. 12, 398 (2021)

40 of 86

Issues arising from neglecting electrostatic interactions

Bulk diffusion barrier without defects

  • GAP and electrostatics-considered GAP (ES-GAP) yield similar results.
  • Error cancellation occurs due to isotropy.

Defect formation energy

  • Defect formation energy deviates by 0.3 eV when ES is not considered.
  • The error arises due to anisotropy around defects.

41 of 86

Introduction to the concept of machine-learning potentials

  • Molecular dynamics & DFT
  • Basic concept of MLP
  • Model 1: Descriptor-based models
  • Model 2: E(3)-equivariant graph models
  • Training set generation
  • Universal pretrained MLPs

42 of 86

Example: HF etching (1)

Target simulation: a-Si3N4, a-Si3N4 + HF

Training set generation:

  • Non-reactive data: crystal, amorphous, molecules, … (molecular dynamics)
  • Target events: guided MD
  • Unexpected events: 4,500 – 10,000 K, to increase accuracy in unexpected structures.

Hong, Oh, Han et al. ACS Appl. Mater. Interfaces 16, 48457 (2024)

43 of 86

Example: HF etching (2)

Guided MD accelerates rare reactions by gradually applying constraints on a chosen reaction coordinate.

Guided MD

Let RN-H + RSi-F decrease at a constant rate (0.02 Å/fs)

Let RN-H + RSi-F remain constant

Results

Hong, Oh, Han et al. ACS Appl. Mater. Interfaces 16, 48457 (2024)

44 of 86

Atomic energy mapping


45 of 86

Sampling training set 1 – using intuition

InP core

ZnSe shell

Bulk

Surface

Interface

Edge and vertex

Simulation target

Kang et al. ACS Mater. Au (2022)

46 of 86

Sampling training set 2 – active learning / iterative learning

Active learning framework

Simulation with MLP → configuration not included in the training set (!) → DFT calculations → MLP update → simulation with the updated MLP

Uncertainty estimation with ensemble

  • Uncertainty = standard deviation of the ensemble predictions
  • Trained structure vs. untrained structure: the untrained structure shows high variation

W. Jung and S. Han et al. J. Phys. Chem. Lett. (2020)
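The ensemble-based selection step of the active learning loop can be sketched as below. This is a schematic under assumptions, not the workflow of the cited paper: the energies are made-up numbers standing in for the predictions of four independently trained MLPs, and the threshold of 0.05 eV/atom is arbitrary. `ensemble_uncertainty` and `select_for_dft` are hypothetical helper names.

```python
import statistics

def ensemble_uncertainty(predictions):
    # predictions[m][c]: energy predicted by model m for configuration c
    n_configs = len(predictions[0])
    return [statistics.pstdev([p[c] for p in predictions]) for c in range(n_configs)]

def select_for_dft(predictions, threshold):
    # Configurations whose ensemble spread exceeds the threshold go to DFT
    sigma = ensemble_uncertainty(predictions)
    return [c for c, s in enumerate(sigma) if s > threshold]

# Hypothetical energies (eV/atom) from 4 models for 3 configurations.
# The models agree on configurations 0 and 1 and disagree on configuration 2.
preds = [
    [-5.01, -4.80, -3.2],
    [-5.00, -4.81, -2.6],
    [-5.02, -4.79, -3.9],
    [-5.00, -4.80, -3.0],
]
sigma = ensemble_uncertainty(preds)
selected = select_for_dft(preds, threshold=0.05)
print(sigma, selected)
```

Configurations flagged this way would be recomputed with DFT and added to the training set before the MLP is updated.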

47 of 86

Uncertainty estimation based on energy deviations within an ensemble

Atomic energy mapping is not unique!

How can uncertainty be estimated using energy values?

Atomic energy training procedure

  1. Train one MLP
  2. Calculate atomic energies
  3. Train an ensemble of 4–6 MLPs on atomic energies rather than total energies.

Deviations can arise from both uncertainty and variations in atomic mapping across models.

W. Jung and S. Han et al. J. Phys. Chem. Lett. (2020)

→ Implemented in the SIMPLE-NN code

48 of 86

Uncertainty estimation based on force deviations within an ensemble

Nat. Catal. 6, 829 (2023)

  • When using force-based uncertainty, zero uncertainty may occur even in configurations with high errors.
  • In untrained configurations, forces can be zero if the structure is 'symmetric' due to the symmetry of the MLP architecture.

49 of 86

Other uncertainty prediction methods

50 of 86

Open-source active learning codes

  • GAP+ACE: https://github.com/mir-group/flare
  • DeePMD-kit: https://github.com/deepmodeling/dpgen

51 of 86

Sampling training set 3 – advanced sampling methods

Molecular dynamics

  • Only samples near equilibrium; other regions cannot be sampled

Metadynamics

  • Apply a bias potential to avoid already sampled configurations

How to define “sampled” configurations

  • G: collective variable
  • Example: G = N-N distance, for N2 dissociation

Bias potential, Ub:

Bias force:

T. Ludwig and J. K. Nørskov et al. J. Phys. Chem. C (2020)
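The bias potential and bias force expressions are missing from this slide, so the sketch below assumes the standard metadynamics form: Ub is a sum of Gaussians deposited at previously visited values of the collective variable G, and the bias force along G is −dUb/dG (the force on the atoms then follows from the chain rule with dG/dr). The Gaussian height, width, and the visited values are hypothetical.

```python
import math

def bias_potential(g, centers, height=0.1, width=0.2):
    # U_b(G): sum of Gaussians centered at already visited values of G
    return sum(height * math.exp(-((g - c) ** 2) / (2.0 * width ** 2))
               for c in centers)

def bias_force(g, centers, height=0.1, width=0.2):
    # -dU_b/dG: pushes the system away from already visited values
    return sum(height * (g - c) / width ** 2
               * math.exp(-((g - c) ** 2) / (2.0 * width ** 2))
               for c in centers)

# Hypothetical history: the N-N distance G (in Å) has stayed near 1.1-1.2
visited = [1.10, 1.15, 1.20]

u_visited = bias_potential(1.15, visited)
u_unvisited = bias_potential(2.50, visited)
f_at_edge = bias_force(1.30, visited)

# Consistency check: the analytic force equals the numerical derivative
h = 1e-6
numeric = -(bias_potential(1.30 + h, visited) - bias_potential(1.30 - h, visited)) / (2.0 * h)
print(u_visited, u_unvisited, f_at_edge, numeric)
```

The bias is large where the trajectory has already been and nearly zero elsewhere, and at G = 1.30 Å the positive bias force drives the bond toward longer, not yet sampled distances.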

52 of 86

General collective variables for sampling training set

Descriptor function

Eatom,1

Eatom,2

Etot

Descriptor function

Atom 1

Atom 2

Total energy

(DFT)

NN of element 1

NN of element 2

Using descriptor function itself as a collective variable would allow general sampling!

D. Yoo, S. Han et al. npj Comput. Mater. (2021); https://github.com/MDIL-SNU/G-metaD

Results

Metadynamics trajectory

Amorphous

Clusters

53 of 86

Atomic energy mapping

Every known MLP follows this structure.

Q: Can we establish a sectioning method for atomic energies applicable across universal chemical environments?

A: Yes. A mathematical proof is given in this paper:

Q: Then, is the method for segmenting atomic energies unique?

A: No. There can be multiple ways to define atomic energies for the same training set.

Within the typical error range (~10 meV/atom), the atomic energies may differ by a few eV/atom across models, even when the training is successful for each model.

Example: E(SiC) = -10 eV

Potential 1) E(Si) = -6 eV, E(C) = -4 eV

Potential 2) E(Si) = -3 eV, E(C) = -7 eV
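The SiC example above can be written out to show why a non-unique mapping matters once the composition changes. This is a deliberately simplified sketch: atomic energies are treated as constants per element (in a real MLP they depend on the local environment), and the Si2C composition is a hypothetical test case that is not in the source.

```python
# Two atomic-energy mappings that reproduce the same total energy for SiC
potential_1 = {"Si": -6.0, "C": -4.0}
potential_2 = {"Si": -3.0, "C": -7.0}

def total_energy(atomic_energies, composition):
    # E_total = sum of atomic energies over all atoms
    return sum(atomic_energies[el] * n for el, n in composition.items())

sic = {"Si": 1, "C": 1}
si2c = {"Si": 2, "C": 1}   # hypothetical off-stoichiometric composition

e_sic_1 = total_energy(potential_1, sic)
e_sic_2 = total_energy(potential_2, sic)
e_si2c_1 = total_energy(potential_1, si2c)
e_si2c_2 = total_energy(potential_2, si2c)

print(e_sic_1, e_sic_2)     # both mappings agree on the trained 1:1 composition
print(e_si2c_1, e_si2c_2)   # they disagree once the composition changes
```

Training data with only one composition cannot distinguish the two mappings, which is why sampling several compositions constrains the atomic energies.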

54 of 86

Ad hoc mapping

Model 1: 100 K MD trajectory

Model 2: 1000 K MD trajectory

Model 1 → ad hoc mapping

Model 2

While the total energies remain consistent, the atomic energies differ between the two models.

55 of 86

Other examples for ad hoc mapping

Case 1: lack of training epochs

RMSEs for total energies and forces remain consistent after 100 epochs, but the RMSE for atomic energies converges only after 600 epochs.

Case 2: lack of composition sampling

Trained on 1:1 composition vs. trained on diverse composition

While the total energies in a 1:1 composition are identical, errors exist in the atomic energies.

→ Fails in other compositions (unphysical MD trajectory)

56 of 86

Introduction to the concept of machine-learning potentials

  • Molecular dynamics & DFT
  • Basic concept of MLP
  • Model 1: Descriptor-based models
  • Model 2: E(3)-equivariant graph models
  • Training set generation
  • Universal pretrained MLPs

57 of 86

Universal interatomic potential

Conventional approach: MLPs for individual systems

Recent approach: universal MLP

Training set

Simulation

Kang et al. ACS Mater. Au (2022)

Kang* et al. ACS Catal. (2023)

Kang* et al. Nano Lett. (2024)

Kang et al. PRB (2020)

Kang et al. npj Comput. Mater. (2022)

Kang* et al. JACS (2023)

Training set

(big data):

Simulation

(universal):

Batatia, Benner, Chiang, Elena, Kovács, Riebesell, Csányi* et al. arXiv:2401.00096 (2023)

Universal

model

58 of 86

Extrapolation behavior of universal MLIP

Training set

  • Materials Project DB
  • Materials Project is a computational database containing 200,000 inorganic crystal structures.
  • Crystalline material: an ordered solid composed of atoms arranged in a periodic lattice. Example: SC, BCC, FCC

SevenNet-0 & MACE-MP-0 results

  • Test systems that are not inorganic or not crystal: water & ice, disordered structure, organic liquid, etching simulation

arXiv:2401.00096 (2023), JCTC (2024), arXiv:2501.05211 (2025)

59 of 86

Benchmark test of foundation models

Matbench Discovery benchmark test

  • Energy error: non-listed compositions in Materials Project through substitution
  • Thermal conductivity error

META (Facebook)

SNU (Prof. Seungwu Han)

Cambridge

Microsoft

DP technology

(China)

Google DeepMind

Orbital Material (start-up)

Ruhr-Universität Bochum

60 of 86

Multi-fidelity learning

Purpose: train on mutually inconsistent datasets at once (for instance, PBE and SCAN data)

Add a fidelity-dependent term to the input of the ML model.

For instance, PBE = (1,0), SCAN = (0,1)

J. Am. Chem. Soc. 2025, 147, 1042

  • Multi-fidelity learning is the key factor behind SevenNet's high performance in Matbench Discovery.
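The idea of adding a fidelity-dependent term to the model input can be sketched with a one-hot label. This only illustrates the (1,0) / (0,1) encoding mentioned above; how SevenNet actually injects the fidelity into the network is not described in these slides, and `fidelity_one_hot`, `model_input`, and the feature values are hypothetical.

```python
FIDELITIES = ("PBE", "SCAN")

def fidelity_one_hot(label):
    # PBE -> [1.0, 0.0], SCAN -> [0.0, 1.0]
    if label not in FIDELITIES:
        raise ValueError(f"unknown fidelity: {label}")
    return [1.0 if label == f else 0.0 for f in FIDELITIES]

def model_input(features, fidelity):
    # Append the fidelity label to the structural features
    return list(features) + fidelity_one_hot(fidelity)

structure_features = [0.2, 0.7]   # hypothetical structural features
x_pbe = model_input(structure_features, "PBE")
x_scan = model_input(structure_features, "SCAN")
print(x_pbe, x_scan)
```

The same structure then maps to two different inputs, so one model can fit both datasets while sharing most of what it learns about the structure.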

61 of 86

Cf) Problems of direct force predictions

Non-conservative force model

Conservative force model

arxiv:2405.04967

tot

  • Accurate
  • Fast (3~4 times)

Example problem of non-conservative force models

NVE simulation

  • Non-conservative force models can be unstable during MD simulations.

62 of 86

Practical use of MLPs

  • How to choose model and code for your problem?
  • How to sample training set?
  • How to set hyperparameters?
  • How to know whether the simulation is going wrong or not?

63 of 86

Descriptor model vs graph model


64 of 86

Descriptor models

[Diagrams: BP-NNP symmetry-function geometry (atom i, neighbors j, k, l, distances Rij and Rik, angle θijk, cutoff Rc); atomic energies Eatom,1 … Eatom,N summed into Etot (DFT total energy); Gaussian process kernels k(1,NP) … k(N,NP) between the training points and a new point]

Behler-Parrinello NNP (neural network)

  • Complicated descriptor
  • Shallow NN model
  • General reliability
  • Low GPU acceleration
  • Code: SIMPLE-NN, …

DeePMD-kit (neural network)

  • Simple descriptor
  • Deep NN model
  • Careful attention required
  • High GPU acceleration
  • Code: DeePMD-kit

Neural network models in general

  • Pros: can use a large number of training data points (shorter inference time)
  • Cons: long training time

GAP (Gaussian process)

  • Pros: short training time + uncertainty
  • Cons: inference time ~ training set size
  • Code: QUIP, VASP

65 of 86

Speed and accuracy: MTP vs GAP vs BP-NNP

J. Phys. Chem. A 2020, 124, 731−745

  • Computational cost: MTP ≳ (BP) NNP > GAP
  • Note that 10 meV/atom of energy error is enough.
  • Using SIMPLE-NN, NNP ~ MTP (much more accelerated by other NNP codes)

66 of 86

SIMPLE-NN code

Optimized GPU usage

Optimized CPU usage

Kyuhyun Lee, Thesis (2019)

https://simple-nn-v2.readthedocs.io/

Various features

  • PCA whitening, and scaling scheme for efficient training
  • Uncertainty
  • GDF weight

67 of 86

Accuracy of E(3)-equivariant models

G. Kim, B. Na, Y. Kim, et al. NeurIPS (2023)

  • In the above paper, a comprehensive benchmark of MLPs is conducted, including tests on their extrapolation capability.
  • The Python framework for these models, along with the data, is available on GitHub.

Blue: in distribution (ID)

Red: out-of-distribution (OOD)

OOD: melt-quench trajectory & random structures

68 of 86

Speed vs accuracy of equivariant models

arXiv:2505.02503

Equivariant, but not message-passing

Equivariant graph (much longer cutoff)

69 of 86

NequIP vs MACE

Model

NequIP

  • Larger number of layers: might be advantageous when modeling electrostatic interactions
  • At least 3 layers
  • Larger inference & training time

MACE

  • Smaller number of layers
  • Typically two layers
  • Smaller inference & training time

Parallelization performance: SevenNet-0 (NequIP base) vs MACE-MP-0

  • SevenNet demonstrates better parallelization performance compared to MACE.
  • The D3 dispersion correction was recently added to the SevenNet code.

Kang, J. Chem. Phys. 161, 244102 (2024)

Park, Han et al. J. Chem. Theory Comput. 20, 4857 (2024)

70 of 86

Practical use of MLPs

  • How to choose model and code for your problem?
  • How to sample training set?
  • How to set hyperparameters?
  • How to know whether the simulation is going wrong or not?

71 of 86

Constructing training set

  • Even with active learning, constructing a well-structured primary dataset is essential for efficiency.
  • Baseline: Pristine structures, typically obtained from MD simulations.
  • Reaction-specific: Structures of interest, requiring MD, NEB simulations, or advanced sampling methods.
  • General-purpose: Prevents simulations from diverging into untrained regions. High-temperature MD and advanced sampling methods (e.g., metadynamics) are used. However, excessive data inclusion may increase training errors.

72 of 86

Issue in Gaussian process models

Training points 1, 2, 3, …, N → kernels k(1,NP), k(2,NP), k(3,NP), …, k(N,NP)

  • Inference time ~ number of training data points
  • Dataset sparsification is crucial for Gaussian process-based models.
  • Example: CUR decomposition

n datapoints → k datapoints

73 of 86

Considering atomic energy mapping

Trained on 1:1 composition

Trained on diverse composition

Composition

Volume

Temperature

Unphysical

MD trajectory

  • Training set simulations should also be conducted under non-target conditions to enhance robustness.

74 of 86

Practical use of MLPs

  • How to choose model and code for your problem?
  • How to sample training set?
  • How to set hyperparameters?
  • How to know whether the simulation is going wrong or not?

75 of 86

Machine learning 101

Andrew Ng

76 of 86

The most important hyperparameter in machine learning

"If you can adjust only one hyperparameter, tune the learning rate."

  • Carefully observe the difference between validation and training errors.
  • Monitor energy, force, and stress errors separately—do not rely solely on the total loss.

Bible:

[Figure: loss vs. epoch for a small learning rate (lr), a big lr, and the right lr]

Learning rate adjustment!

Do it manually, or use a scheduler
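The three learning-rate regimes and the effect of a scheduler can be reproduced on a toy problem. The sketch below is an illustration under assumptions, not an MLP training loop: it runs plain gradient descent on the loss L(w) = w², and the step-decay schedule (halving every 10 epochs) and all numerical values are hypothetical.

```python
def train(lr, n_epochs=50, w0=5.0):
    # Gradient descent on the toy loss L(w) = w**2; returns the loss per epoch
    w = w0
    history = []
    for _ in range(n_epochs):
        grad = 2.0 * w
        w -= lr * grad
        history.append(w * w)
    return history

def train_with_step_decay(lr0, decay=0.5, every=10, n_epochs=50, w0=5.0):
    # Same loop, but the learning rate is multiplied by `decay` every `every` epochs
    w = w0
    history = []
    for epoch in range(n_epochs):
        lr = lr0 * decay ** (epoch // every)
        w -= lr * 2.0 * w
        history.append(w * w)
    return history

small = train(0.001)[-1]                  # too small: loss barely decreases
right = train(0.1)[-1]                    # reasonable: loss converges
big = train(1.1)[-1]                      # too big: loss diverges
scheduled = train_with_step_decay(1.1)[-1]  # starts too big, recovers after decay
print(small, right, big, scheduled)
```

Plotting the returned histories against the epoch gives the same qualitative picture as the loss curves on this slide.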

77 of 86

Important hyperparameters

Read prior studies. Refer to commonly used hyperparameters.

# Network
nodes: '30-30'
acti_func: 'sigmoid'
double_precision: True
weight_initializer:
    type: 'xavier normal'
dropout: 0.0
use_scale: True
use_pca: True
use_atomic_weights: False
weight_modifier:
    type: null

# Optimization
optimizer:
    method: 'Adam'
batch_size: 8
full_batch: False
total_epoch: 1000
learning_rate: 0.0001
decay_rate: null
l2_regularization: 1.0e-6

# Loss function
energy_coeff: 1.
force_coeff: 0.1
stress_coeff: 1.0e-6

Example: SIMPLE-NN input file

  • nodes: usually 30-30 ~ 60-60
  • learning_rate: usually 10⁻³ ~ 10⁻⁵
  • Loss coefficients: usually, the stress coefficient is the smallest; energy loss / force loss = 10 ~ 0.1

78 of 86

Sampling bias in materials science problem

In-configuration bias

  • Defect atom : Bulk atom = 1:100

Out-of-configuration bias

  • Relaxation and NEB trajectories consist of small simulation steps, whereas MD simulations involve larger steps, potentially biasing the training.
  • Proper weight assignment is crucial.

Example: SIMPLE-NN structure_list file (the number after each structure type is the training weight)

[ structure_type_1 : 1.0 ]
/location/of/calculation/data/oneshot_output_file :
/location/of/calculation/data/MDtrajectory_output_file 100:2000:20

[ structure_type_2 : 3.0 ]
/location/of/calculation/data/same_folder_format{1..10}/oneshot_output_file :

79 of 86

Addressing in-configuration bias: Gaussian density function (GDF) weight

Descriptor values of training set

Density of data points

Jeong, Han et al. J. Phys. Chem. C 122, 22790 (2018)
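The idea behind GDF weighting, giving more weight to atoms whose descriptor values lie in sparsely populated regions, can be sketched as follows. This is a simplified illustration and not the exact scheme of the cited paper: the density is a Gaussian kernel estimate in descriptor space, the weight is taken as the plain inverse density normalized to a mean of one, and the width `sigma`, its scaling by the descriptor dimension, and the 2D descriptor values are assumptions.

```python
import math

def gdf(g, all_g, sigma=0.2):
    # Gaussian density of training points around descriptor vector g
    dim = len(g)
    total = 0.0
    for gj in all_g:
        d2 = sum((a - b) ** 2 for a, b in zip(g, gj))
        total += math.exp(-d2 / (2.0 * sigma ** 2 * dim))
    return total / len(all_g)

def gdf_weights(all_g, sigma=0.2):
    # Inverse-density weights, normalized so that their mean is 1
    inv = [1.0 / gdf(g, all_g, sigma) for g in all_g]
    mean = sum(inv) / len(inv)
    return [w / mean for w in inv]

# Hypothetical 2D descriptors: 100 bulk-like atoms and 1 defect-like atom
bulk = [(1.0 + 0.001 * i, 2.0) for i in range(100)]
defect = [(3.0, 0.5)]
weights = gdf_weights(bulk + defect)
print(weights[0], weights[-1])
```

With a 1:100 defect-to-bulk ratio, the single defect atom receives a weight roughly two orders of magnitude larger than a bulk atom, which counteracts the in-configuration bias.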

80 of 86

Practical use of MLPs

  • How to choose model and code for your problem?
  • How to sample training set?
  • How to set hyperparameters?
  • How to know whether the simulation is going wrong or not?

81 of 86

Rule no. 1

In machine learning, data is the most important factor.

Training: If training does not converge properly, first verify the quality of the training set. Ensure there are no errors in the DFT calculations.

Simulation: If a simulation collapses, check whether the collapsed structures were included in the training set.

82 of 86

The most important thing: do it and test!

Carefully construct a test set that accurately represents the properties of your target simulation.

  • Example: nanoparticle

Training set

Test set

Simulation

Test

Refine

  • What configurations should be included in the test set?
  • Targeted events: NEB trajectory, defect formation energy, etc.
  • Fundamental PES properties: Equation of state (EOS), phonon calculations, etc.
  • Extrapolatability in scale: Larger system sizes not included in the training set, but still feasible for DFT calculations (avoiding excessively large structures).

Kang et al. ACS Mater. Au, 2, 103 (2022)

83 of 86

How to know whether the given structure is included or not in the training set

  • PCA or t-SNE analysis

PRB 102, 224104 (2020)

  • Uncertainty

ACS Catal. 13, 16078 (2023)

84 of 86

Example: J. Phys. Chem. Lett. 2020, 11, 6090 (1)

Target simulation

Training set

Crystal

Liquid

Simulation

New phase is found at the interface!

85 of 86

Example: J. Phys. Chem. Lett. 2020, 11, 6090 (2)

Uncertainty

  • The newly discovered structure is absent from the training set.
  • Its phase is energetically unstable.
  • The energy is inaccurately described due to its omission from the training set.

Simulation with the re-trained MLP

No high-uncertainty configuration

86 of 86

Practice with codes: SIMPLE-NN tutorial

1. Install the Python environment

2. Install SIMPLE-NN

3. SIMPLE-NN tutorial

3.1. Preprocess

3.2. Training

3.3. Preprocess + training at once (will not be run in this session)

3.4. Continue training with different hyperparameters

3.5. Continual learning (continuing to train the existing potential after adding data to the training set)

4. PCA analysis