1 of 32

Generating graphs with Generative Adversarial Networks

Oct 20th, 2022

BMI/CS 775 Computational Network Biology, Fall 2022

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

2 of 32

Topics in this section

  • Representation learning on graphs
  • Graph neural networks
  • Graph transformers
  • Generative graph models

3 of 32

Goals for today

  • Generative Adversarial Network appeal and definitions
    • Likelihood-based versus implicit generative models
    • Loss functions
    • Generating graphs
  • Generating biological and biochemical graphs
    • MolGAN molecule generation
    • Alternative approaches for larger graphs
  • Challenges in training Generative Adversarial Networks

4 of 32

Graph generation task

  • Given:
    • Example graphs from a desired distribution
    • Score for a graph (optional)
  • Do:
    • Sample new graphs from that same distribution
    • Preferentially sample graphs with high scores (optional)

  • Challenges:
    • How do we represent the distribution of graphs with certain properties?
    • How do we efficiently sample from it?

5 of 32

Biochemical graph generation task

  • Given:
    • Example druglike chemical graphs
    • A scoring function for drug-related chemical properties

  • Do:
    • Sample new graphs that represent druglike chemicals

  • Key idea:
    • No longer score existing chemical libraries for drug-related properties
    • Directly generate candidate chemicals to synthesize

6 of 32

Generating classic random graphs

  • Lobster graph:
    • Central path (backbone)
    • All other vertices ≤ 2 edges from backbone
    • Easy to parameterize distribution and sample from it
      • Expected number of nodes in backbone
      • Probability of adding a layer 1 edge
      • Probability of adding a layer 2 edge
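These three parameters map directly onto networkx's lobster sampler (assuming networkx is installed; the parameter values below are arbitrary):

```python
import networkx as nx

# Sample a random lobster: expected backbone length 10,
# probability 0.5 of adding a layer-1 edge, 0.3 of a layer-2 edge
G = nx.random_lobster(10, 0.5, 0.3, seed=42)

print(G.number_of_nodes(), G.number_of_edges())
# Every lobster is a tree, so edge count is node count minus one
assert G.number_of_edges() == G.number_of_nodes() - 1
```

Drawing repeated samples with different seeds gives an easy, fully parameterized graph distribution to sample from.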

7 of 32

Generating classic random graphs

  • Erdős-Rényi graph:
    • Choose each possible edge with uniform probability
    • Two parameters
      • Number of nodes
      • Probability of adding edge
  • Barabási–Albert graph:
    • Preferentially add new nodes to high degree nodes
    • Two parameters
      • Number of nodes
      • Number of edges per new node
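Both classic models are also one-liners in networkx (parameter values here are arbitrary):

```python
import networkx as nx

# Erdős-Rényi: n nodes, each possible edge included with uniform probability p
er = nx.erdos_renyi_graph(n=100, p=0.05, seed=0)

# Barabási-Albert: each new node attaches m edges,
# preferentially to existing high-degree nodes
ba = nx.barabasi_albert_graph(n=100, m=2, seed=0)

print(er.number_of_edges(), ba.number_of_edges())
```

The two models differ sharply in degree distribution: Erdős-Rényi degrees are binomial, while preferential attachment yields a heavy-tailed distribution with hub nodes.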

8 of 32

Likelihood-based generative models

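The equation on this slide did not survive extraction; likelihood-based generative models fit parameters θ by maximizing the log-likelihood of the training data, presumably along these lines:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(x_i)
```

Sampling then requires that p_θ be tractable to evaluate and easy to sample from, which is exactly what implicit models like GANs sidestep.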

9 of 32

Generative Adversarial Networks (GANs)

  • Generative model with a clever training strategy
  • No explicit likelihood required for generated objects
  • Can be much more difficult to train than variational autoencoders
  • Powerful data synthesis applications
  • Goodfellow et al. 2014 arXiv:1406.2661

10 of 32

GAN examples: CycleGAN

  • Image-to-image translation

  • Zhu et al. ICCV 2017 and code

11 of 32

GAN examples: image inpainting

  • Context encoders

  • Pathak et al. CVPR 2016 and code

12 of 32

GAN examples: living portraits

  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
  • Zakharov et al. 2019 arXiv:1905.08233
  • Video

13 of 32

How does a GAN work?

  • Frame as a two player game
    • G: generator
    • D: discriminator
  • G wants to generate samples that fool the discriminator
  • D wants to accurately distinguish real from generated samples

14 of 32

How does a GAN work?

  • Initially G generates unrealistic samples, easy for D

  • After training, the samples start to improve

  • Eventually D is fooled by the realistic samples from G

15 of 32

GAN neural network architecture

  • G and D can each be some form of neural network
    • G samples noise, generates an object
    • D classifies real and generated objects

16 of 32

GAN loss functions

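The loss equations on this slide were lost in conversion. The standard value function from Goodfellow et al. 2014 is min_G max_D E_x[log D(x)] + E_z[log(1 − D(G(z)))]. A numpy sketch of the two resulting losses, using hypothetical discriminator probabilities:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z)));
    # equivalently, minimize the negation (binary cross-entropy)
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z))
    return -np.mean(np.log(d_fake))

# Hypothetical discriminator probabilities on real and generated samples
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

In practice the non-saturating generator loss above is preferred over minimizing log(1 − D(G(z))) directly, because it gives stronger gradients early in training when D easily rejects G's samples.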

17 of 32

Top Hat question

18 of 32

MolGAN: generating druglike molecules

  • Generate small molecules (chemicals) that resemble known drugs
  • Extend GAN framework by incentivizing certain properties

Image from De Cao and Kipf 2018 arXiv:1805.11973

19 of 32

Druglike small molecules

  • QM9 dataset enumerated 133,885 organic compounds and calculated properties
    • Heavy atoms: carbon (C), oxygen (O), nitrogen (N), fluorine (F)
  • Universe of graphs to generate
    • Only up to nine heavy atoms (nodes)
    • Three types of edges: single, double, triple bonds
  • Estimate properties with RDKit
    • Druglikeness
    • Solubility
    • Synthesizability

20 of 32

MolGAN generator


Image from De Cao and Kipf 2018 arXiv:1805.11973

21 of 32

MolGAN discriminator


Image from De Cao and Kipf 2018 arXiv:1805.11973

 

(Figure labels: edge types; one adjacency matrix for each edge type)
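MolGAN represents each molecule as a node-type matrix plus one adjacency matrix per bond type. A minimal numpy sketch of that encoding (the indexing and the ethanol-like example are illustrative, not from the paper):

```python
import numpy as np

# QM9 heavy-atom vocabulary and MolGAN's three bond types
atom_types = ["C", "O", "N", "F"]
bond_types = ["single", "double", "triple"]

n = 9  # maximum number of heavy atoms in a QM9 molecule

# Node feature matrix X: one-hot atom type per node
X = np.zeros((n, len(atom_types)))
# Adjacency tensor A: one n x n adjacency matrix per bond type
A = np.zeros((len(bond_types), n, n))

# Encode a C-C-O fragment connected by single bonds (hypothetical node order)
X[0, atom_types.index("C")] = 1
X[1, atom_types.index("C")] = 1
X[2, atom_types.index("O")] = 1
s = bond_types.index("single")
for i, j in [(0, 1), (1, 2)]:
    A[s, i, j] = A[s, j, i] = 1  # keep each bond-type slice symmetric

print(X.shape, A.shape)  # (9, 4) (3, 9, 9)
```

The discriminator (a relational GCN in the paper) consumes exactly this pair (X, A), with one message-passing weight matrix per bond-type slice.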

22 of 32

MolGAN reward function

  • Graph regression
  • Same strategy as discriminator but predict continuous value
  • Differentiable approximation of reward function from RDKit
  • Inspired by Reinforcement Learning

Image from De Cao and Kipf 2018 arXiv:1805.11973

23 of 32

Overall generator objective

 

L(θ) = λ · L_WGAN(θ) + (1 − λ) · L_RL(θ)

  • θ: generator parameters
  • λ ∈ [0, 1]: hyperparameter balancing the adversarial (WGAN) loss and the reward (RL) loss
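Combining the two losses is a convex combination controlled by the hyperparameter λ; a sketch with hypothetical loss values:

```python
def molgan_generator_objective(l_wgan, l_rl, lam=0.5):
    # Convex combination of the adversarial (WGAN) loss and the
    # reward (RL) loss; lam trades realism against desired properties
    return lam * l_wgan + (1.0 - lam) * l_rl

print(molgan_generator_objective(0.8, 0.2, lam=0.25))  # 0.35
```

At λ = 1 the generator ignores the reward network entirely; at λ = 0 it optimizes only the predicted chemical properties.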

24 of 32

MolGAN evaluation

25 of 32

MolGAN evaluation

MolGAN struggles to generate unique molecules

26 of 32

Challenges with GANs

  • Sensitive to hyperparameters
  • Training may not converge
  • Difficult to evaluate
  • Vanishing gradients: discriminator is too good at rejecting generated samples
  • Mode collapse: generator repeatedly produces the same or very similar samples

  • Can observe some of these challenges interactively
    • Train GAN to generate 2D data
    • https://poloclub.github.io/ganlab/

27 of 32

Wasserstein GAN

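The equation from this slide was lost in conversion; assuming it showed the standard Wasserstein GAN objective (Arjovsky et al. 2017), the critic D maximizes a difference in expected scores rather than a classification probability:

```latex
\min_G \max_{D \in \mathcal{D}} \;
\mathbb{E}_{x \sim p_{\text{data}}}[D(x)]
- \mathbb{E}_{z \sim p_z}[D(G(z))]
```

Here 𝒟 is the set of 1-Lipschitz functions, enforced in practice by weight clipping or a gradient penalty. This loss gives the generator useful gradients even when the critic separates real from generated samples well, mitigating the vanishing-gradient problem.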

28 of 32

Alternative graph generation approaches

  • Reinforcement learning to add/remove graph components
  • Recurrently add graph components
  • Graph Recurrent Attention Networks (Liao et al. 2019 arXiv:1910.00760)

  • Scales to graphs with up to 5,000 nodes

29 of 32

Conclusions

  • GANs are flexible generative models
    • Do not require explicit likelihood function
    • Can capture graph distributions that are difficult to model with classic random graph generators
    • Amazing synthesis capabilities in many domains
      • Drug and protein design
    • Not widely used in gene/protein networks
    • Many practical challenges in training and evaluation

30 of 32

What’s next in generative modeling?

31 of 32

What’s next in generative modeling?

  • Diffusion models are quickly replacing GANs in many domains
    • Image generation
    • Text prompt to image
    • Text prompt to video
    • Image to image translation
  • Applications for proteins and chemicals

32 of 32

What’s next in generative modeling?

“Computers fighting bacteria, sci fi, high resolution”

Image from Keras Stable Diffusion Colab notebook

Fun to generate images

Many ethical issues to consider