Equivariant Flow Matching for 3D Molecule Generation with Hybrid Probability Path Transport

Yuxuan Song*, Jingjing Gong*, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, Wei-Ying Ma

AIR, Tsinghua University & Stanford University

April 2024

Overview

  • Background:
    • Geometric Graph Generation
      • Definition
      • Challenges
    • Flow Matching
  • Equivariant Flow Matching (EquiFM)
    • Method
      • Equivariant Optimal Transport
      • Hybrid Probability Path
    • Theoretical Property
    • Experimental Results
  • Discussion and Future Works

Geometric Graph Generation

Applications (figure): catalysis systems, protein design, structure-based drug design and RNA/DNA. Depending on the task, generation is conditioned on, for example, a protein pocket or a lattice matrix, and larger systems are treated as large geometric graphs.

General Formulation

A molecular geometric graph can represent a molecule's topology, chemical properties and conformation.
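One common way to write this formulation down, assuming the notation used by EDM-style models: a molecule with N atoms is a pair of 3D coordinates and per-atom features, and rotations and translations act only on the coordinates.

```latex
\mathcal{G} = (\mathbf{x}, \mathbf{h}), \quad
\mathbf{x} \in \mathbb{R}^{N \times 3} \ \text{(coordinates)}, \quad
\mathbf{h} \in \mathbb{R}^{N \times d} \ \text{(atom types, charges)}

(\mathbf{x}, \mathbf{h}) \mapsto (\mathbf{x} R^{\top} + \mathbf{1}_N \mathbf{t}^{\top},\ \mathbf{h}),
\qquad R \in SO(3),\ \mathbf{t} \in \mathbb{R}^{3}
```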

Challenges and Limitations

Structure Constraint

Image data: the information in an image can be decomposed by adding noise at multiple levels, which yields a coarse-to-fine modeling order.

Molecular structure (figure: energy vs. bond length for H-H): structural information is very sensitive to perturbation, since a small change in bond length causes a large change in energy.
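To make this sensitivity concrete, the sketch below evaluates a Morse potential for H-H. The parameters are approximate literature values for H2 (well depth about 4.75 eV, equilibrium length about 0.741 Å, width parameter about 1.94 Å⁻¹); they are an assumption of this example, not data read off the slide's figure.

```python
import numpy as np

# Approximate Morse parameters for H2 (assumed for illustration).
D_E = 4.75        # well depth, eV
R_E = 0.741       # equilibrium bond length, angstrom
A = 1.94          # width parameter, 1/angstrom
KT_300K = 0.0259  # thermal energy k_B * T at 300 K, eV

def morse_energy(r):
    """Morse energy (eV) relative to the minimum, for bond length r in angstrom."""
    return D_E * (1.0 - np.exp(-A * (r - R_E))) ** 2

# Stretch the bond by small amounts and compare the energy cost with k_B * T.
for dr in (0.01, 0.05, 0.1, 0.2):
    e = morse_energy(R_E + dr)
    print(f"dr = {dr:.2f} A -> dE = {e:.3f} eV = {e / KT_300K:.1f} kT")
```

Under these assumed parameters, stretching the bond by only 0.1 Å costs roughly 0.15 eV, several times k_B·T at room temperature.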

Diffusion Models

(Figure: diffusion-model examples, EDM and GeoLDM.)

Challenges and Limitations

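Multi-modality

A molecule combines continuous coordinates with discrete atom types (e.g. hydrogen) and discretised charges [1].

Geometric Symmetry

The learned density function should be invariant to rotations and translations (roto-translation invariant).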
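A minimal numerical check of this requirement, assuming (as in EDM-style models) that translations are removed by working in the zero centre-of-mass subspace and that the density there is an isotropic Gaussian. The function names are hypothetical.

```python
import numpy as np

def zero_com(x):
    """Project (N, 3) coordinates onto the zero centre-of-mass subspace (removes translation)."""
    return x - x.mean(axis=0, keepdims=True)

def log_density(x):
    """Unnormalised log-density of an isotropic Gaussian on zero-CoM coordinates.
    It depends only on squared norms, so it is unchanged by rotations."""
    return -0.5 * np.sum(zero_com(x) ** 2)

def random_rotation(rng):
    """Random proper rotation from a QR decomposition, with the determinant fixed to +1."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the result is uniformly distributed
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 3))
rot = random_rotation(rng)
shift = np.array([1.0, -2.0, 0.5])
print(log_density(x), log_density(x @ rot.T + shift))  # the two values agree
```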

Normalizing Flows

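  • Normalizing flow:
    • Distribution-transformation view: an invertible map transforms a simple base distribution into the data distribution, and the change-of-variables formula gives the exact density.
    • The map is composed of K layers of transformation.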
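A minimal sketch of this view, assuming K = 4 elementwise affine layers. These are toy invertible maps chosen so the result can be checked in closed form, not a practical flow architecture. Sampling composes the layers; density evaluation inverts them and subtracts each layer's log-determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 3
scales = rng.uniform(0.5, 2.0, size=(K, D))  # per-layer positive scales a_k
shifts = rng.normal(size=(K, D))             # per-layer shifts b_k

def base_log_prob(z):
    """Standard normal log-density, summed over the last axis."""
    return -0.5 * np.sum(z ** 2 + np.log(2 * np.pi), axis=-1)

def forward(z):
    """Push base samples through the K layers: z_k = a_k * z_{k-1} + b_k."""
    for a, b in zip(scales, shifts):
        z = a * z + b
    return z

def log_prob(x):
    """Change of variables: invert the layers in reverse order and
    accumulate log|det J| of each layer (sum of log|a_k| for elementwise maps)."""
    log_det = 0.0
    for a, b in zip(scales[::-1], shifts[::-1]):
        x = (x - b) / a
        log_det += np.sum(np.log(np.abs(a)))
    return base_log_prob(x) - log_det

samples = forward(rng.standard_normal((5, D)))
print(log_prob(samples))
```

Because every layer is affine, the whole flow is x = A z + c with A the product of the scales, which gives an independent check of `log_prob`.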

Continuous Normalizing Flow

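  • Continuous normalizing flow:
    • Instead of K layers, what about infinitely many layers?

We can model a continuous-in-time transformation, dx/dt = v_t(x), whose density follows from the (instantaneous) change of variables and whose samples are obtained with an ODE solver.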
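A toy sketch of this idea, assuming a hypothetical linear field v_t(x) = A x. For this field the exact solution is x_1 = exp(A) x_0 and the log-density changes by exactly -tr(A). The code Euler-integrates the state together with the instantaneous change of variables, d log p(x_t)/dt = -tr(∂v/∂x), which is what an ODE solver does at scale.

```python
import numpy as np

# Hypothetical linear vector field v_t(x) = A x, chosen because its flow is known exactly.
A = np.array([[-0.5, 1.0],
              [0.0, 0.3]])

def vector_field(x, t):
    """Batch of states x with shape (B, 2); this toy field ignores t."""
    return x @ A.T

def integrate(x0, num_steps=1000):
    """Explicit Euler for the state and the accumulated log-density change.
    Instantaneous change of variables: d log p(x_t)/dt = -tr(dv/dx) = -tr(A)."""
    x, delta_logp = x0.copy(), np.zeros(len(x0))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        delta_logp -= dt * np.trace(A)
        x = x + dt * vector_field(x, k * dt)
    return x, delta_logp

x0 = np.random.default_rng(0).standard_normal((4, 2))
x1, dlogp = integrate(x0)
print(x1)
print(dlogp)  # log p_1(x_1) - log p_0(x_0) for each sample
```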

Flow Matching

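  • Flow matching (simulation-free):
    • A more general objective: regress a learned vector field $v_\theta(t, x)$ onto a target field $u_t(x)$ that generates a chosen probability path $p_t$, i.e. $\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t,\, x \sim p_t} \lVert v_\theta(t, x) - u_t(x) \rVert^2$.
    • Any such probability path can be matched by learning the vector field that generates it.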

Flow Matching for Generative Modeling

  • Construct a probability path
    • Conditional probability path:
      • For each data point x1, such a path is easy to define:

$p_0(x \mid x_1) = p(x) = \mathcal{N}(0, 1)$

$p_1(x \mid x_1) = \mathcal{N}(x_1, 0.0001)$

Flow Matching for Generative Modeling

  • Key observation:
    • Marginal and conditional:

“The marginal vector field generates the marginal probability path.”
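For reference, the standard statement of this observation, assuming the formulation of Lipman et al. with data distribution q(x_1): the marginal path and field are mixtures of the conditional ones, and regressing onto the conditional field has the same gradient as regressing onto the intractable marginal field.

```latex
p_t(x) = \int p_t(x \mid x_1)\, q(x_1)\, dx_1, \qquad
u_t(x) = \int u_t(x \mid x_1)\, \frac{p_t(x \mid x_1)\, q(x_1)}{p_t(x)}\, dx_1

\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\, q(x_1),\, p_t(x \mid x_1)}
\big\lVert v_\theta(t, x) - u_t(x \mid x_1) \big\rVert^2,
\qquad \nabla_\theta \mathcal{L}_{\mathrm{CFM}} = \nabla_\theta \mathcal{L}_{\mathrm{FM}}
```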

Flow Matching

  • Flow matching can be implemented with a simple objective based on interpolation:
    • (OT path)

(Figure: linear interpolations vs. the learned vector field; source: https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html)
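A minimal 1D sketch of the interpolation objective under toy assumptions: noise x0 ~ N(0, 1), data x1 ~ N(m, s²) with m = 3 and s = 0.5, independent pairing, and the linear path x_t = (1 - t) x0 + t x1 (the OT path with σ_min = 0). In this Gaussian case the marginal vector field is linear in x and known in closed form, so least-squares regression of the conditional target x1 - x0 on x_t should recover it, and integrating the field should turn N(0, 1) samples into N(m, s²) samples. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 3.0, 0.5   # toy data distribution N(m, s^2)
n = 200_000

def marginal_field(x, t):
    """Closed-form marginal field for the linear path between N(0, 1) and N(m, s^2)."""
    var_t = (1 - t) ** 2 + (t * s) ** 2
    return m + (t * s ** 2 - (1 - t)) / var_t * (x - t * m)

def fit_field_at(t):
    """Flow matching as regression: fit a linear model in x_t to the conditional target x1 - x0."""
    x0 = rng.standard_normal(n)
    x1 = m + s * rng.standard_normal(n)
    xt = (1 - t) * x0 + t * x1            # linear interpolation
    target = x1 - x0                      # conditional vector field
    design = np.stack([np.ones(n), xt], axis=1)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef                           # (intercept, slope)

def sample(num_steps=200, num_samples=50_000):
    """Generate by Euler-integrating dx/dt = u_t(x); the closed-form field stands in for a trained network."""
    x = rng.standard_normal(num_samples)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * marginal_field(x, k * dt)
    return x

print(fit_field_at(0.3))                  # close to the closed-form intercept and slope at t = 0.3
xs = sample()
print(xs.mean(), xs.std())                # close to m and s
```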

Intuition for EquiFM

  • Diffusion models are built on an SDE process → given the structure constraint, their dynamics can be unstable.
  • Flow matching: learn the vector field, then generate by solving an ODE.
    • Simple objective and stable generation.

Equivariant Optimal Transport

(Figure: diffusion models vs. OT flow matching.)

Could we minimize the transport distance during generation?

Equivariant Optimal Transport

  • Equivariant Optimal Transport (EOT).
    • Align the noise and data by considering all permutations and SE(3) transformations [1,2]:

[1] Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. Song et al., NeurIPS 2023.

[2] Equivariant Flow Matching. Klein et al., NeurIPS 2023.

In effect, this acts as a non-isotropic Gaussian prior, i.e. a structure prior.
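A brute-force sketch of EOT for a toy five-atom molecule. It centres both point clouds (removing translation), then tries every atom permutation and solves the best rotation for each with the Kabsch algorithm. Exhaustive search is only feasible for tiny N and is not necessarily the procedure used in [1] or [2]; a practical version would use a Hungarian solver such as `scipy.optimize.linear_sum_assignment` for the permutation step. Function names are hypothetical.

```python
import itertools
import numpy as np

def center(x):
    """Remove translation by moving the (uniform-weight) centre of mass to the origin."""
    return x - x.mean(axis=0, keepdims=True)

def kabsch(p, q):
    """Proper rotation R (det +1) minimising sum_i ||R p_i - q_i||^2 for centred (N, 3) arrays."""
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # flip one axis if SVD gives a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def eot_align(noise, data):
    """Search all permutations; for each, rotate the noise onto the data and keep the cheapest pairing."""
    noise, data = center(noise), center(data)
    best_cost, best_aligned = np.inf, None
    for perm in itertools.permutations(range(len(noise))):
        p = noise[list(perm)]
        aligned = p @ kabsch(p, data).T
        cost = np.sum((aligned - data) ** 2)
        if cost < best_cost:
            best_cost, best_aligned = cost, aligned
    return best_cost, best_aligned

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 3))
noise = rng.standard_normal((5, 3))
print(np.sum((center(noise) - center(data)) ** 2), eot_align(noise, data)[0])  # naive vs. EOT cost
```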

Hybrid Probability Path

  • Addressing the multi-modality issue:
    • What probability path should atom types follow?

Example atom-type vector: [0.2, 0.1, 0.8, 0.5, -0.1]

With linear interpolation, the argmax category changes only once along the path, at roughly the middle time step (t ≈ 0.5).

Can we design better paths?
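A small sketch of this observation. It treats the slide's vector as the noise sample at t = 0 and assumes, hypothetically, that the true atom type is category 0, then tracks the argmax along the linear path to the one-hot target.

```python
import numpy as np

noise = np.array([0.2, 0.1, 0.8, 0.5, -0.1])  # the slide's vector, treated as the t = 0 sample
target = np.eye(5)[0]                          # hypothetical true atom type: one-hot on category 0

ts = np.linspace(0.0, 1.0, 1001)
# Linear interpolation x_t = (1 - t) * x_0 + t * x_1 at every time step.
path = (1.0 - ts)[:, None] * noise + ts[:, None] * target
labels = path.argmax(axis=1)                   # category that "wins" at each time step

switches = np.flatnonzero(np.diff(labels) != 0)
print("number of argmax changes:", len(switches))
print("argmax changes at t =", ts[switches + 1])
```

Under this assumed target the argmax flips exactly once, from category 2 to category 0 near t = 0.375; the atom type is decided in a single hand-over rather than refined gradually.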

Hybrid Probability Path

  • Addressing the multi-modality issue:
    • A hybrid probability path:

Intuition: let information about the different modalities, i.e. coordinates and atom types, emerge at the same speed.

Observation: determine the atom types once the structure is already good.
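One way to picture a hybrid path is to give each modality its own interpolation schedule. The sketch below is hypothetical: coordinates stay on the linear path while atom types follow an assumed delayed schedule κ(t) = t², which is not necessarily the schedule used by EquiFM. With this schedule the atom-type argmax settles later, after the coordinates have made more progress.

```python
import numpy as np

def hybrid_interpolate(x0, x1, h0, h1, t, type_schedule):
    """Hybrid path (sketch): coordinates use the linear schedule t,
    atom types use their own schedule kappa(t)."""
    k = type_schedule(t)
    return (1 - t) * x0 + t * x1, (1 - k) * h0 + k * h1

def switch_time(noise, target, schedule, num=10001):
    """First time at which the argmax of the interpolated atom-type vector is the target category."""
    ts = np.linspace(0.0, 1.0, num)
    k = schedule(ts)[:, None]
    path = (1.0 - k) * noise + k * target
    return ts[np.argmax(path.argmax(axis=1) == target.argmax())]

noise = np.array([0.2, 0.1, 0.8, 0.5, -0.1])  # example atom-type noise
target = np.eye(5)[0]                          # hypothetical true atom type
print(switch_time(noise, target, lambda t: t))       # linear schedule
print(switch_time(noise, target, lambda t: t ** 2))  # delayed schedule settles later
```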

Theoretical Property

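  • Roto-translation invariant density modeling

Key step: the Jacobian matrix of the equivariant vector field is itself equivariant, $J_v(R\mathbf{x}) = R\, J_v(\mathbf{x})\, R^{\top}$ (with R acting on every atom), so its trace, which determines the change in log-density, is rotation invariant.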
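This key step can be checked numerically. The sketch assumes a hypothetical EGNN-style field v_i = Σ_j exp(-‖x_i - x_j‖²)(x_i - x_j), which is rotation-equivariant by construction, and compares finite-difference Jacobians at x and at the rotated input.

```python
import numpy as np

def vector_field(x):
    """Toy equivariant field: v_i = sum_j exp(-|x_i - x_j|^2) (x_i - x_j)."""
    diff = x[:, None, :] - x[None, :, :]          # (N, N, 3) pairwise differences
    w = np.exp(-np.sum(diff ** 2, axis=-1))       # (N, N) distance-based weights
    return np.sum(w[..., None] * diff, axis=1)    # (N, 3)

def jacobian(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x, flattened to shape (3N, 3N)."""
    n = x.size
    flat = x.ravel()
    jac = np.zeros((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = eps
        plus = f((flat + step).reshape(x.shape)).ravel()
        minus = f((flat - step).reshape(x.shape)).ravel()
        jac[:, k] = (plus - minus) / (2 * eps)
    return jac

def random_rotation(rng):
    """Random proper rotation (det +1) from a QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
r = random_rotation(rng)
x_rot = x @ r.T                                   # rotate every atom
big_r = np.kron(np.eye(4), r)                     # rotation acting on the flattened (3N,) vector
j, j_rot = jacobian(vector_field, x), jacobian(vector_field, x_rot)
print(np.abs(j_rot - big_r @ j @ big_r.T).max(), np.trace(j), np.trace(j_rot))
```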

Empirical Results

Superior performance on several benchmarks [1].

[1] Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. Yuxuan Song et al., NeurIPS 2023.

(Figure: diffusion models vs. flow matching vs. EOT flow matching.)

Empirical Results

  • Ablation study on the hybrid probability path

  • Conditional experiments & speed-up sampling:

4.75× speed-up with the Dopri5 adaptive ODE solver

Equivariant Flow Matching with Hybrid Probability Transport [1]

Exploring the best generation path for geometric graphs:

  • Guide the generative model with the EOT map (minimizing the moving distance during sampling).
  • Jointly generate different variables with a hybrid probability path.

EquiFM generation benefits from adaptive ODE solvers: state-of-the-art results with a 4.75× speed-up.

[1] Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. Song et al., NeurIPS 2023.

Future Directions

  • Better modeling of the categorical modality:
    • e.g. discrete diffusion.

  • Application of EquiFM to structure-based drug design.

  • More flexible design of the prior of EquiFM.

Thanks

  • The sampling code is available at:

Some Related Works

  • Unified Generative Modeling of 3D molecules via Bayesian Flow Networks. (ICLR 2024 Oral)

  • MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
    • https://arxiv.org/abs/2404.12141
