1 of 32

Generating graphs with Generative Adversarial Networks

Oct 20th, 2022

BMI/CS 775 Computational Network Biology, Fall 2022

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

2 of 32

Topics in this section

  • Representation learning on graphs
  • Graph neural networks
  • Graph transformers
  • Generative graph models

3 of 32

Goals for today

  • Generative Adversarial Network appeal and definitions
    • Likelihood-based versus implicit generative models
    • Loss functions
    • Generating graphs
  • Generating biological and biochemical graphs
    • MolGAN molecule generation
    • Alternative approaches for larger graphs
  • Challenges in training Generative Adversarial Networks

4 of 32

Graph generation task

  • Given:
    • Example graphs from a desired distribution
    • Score for a graph (optional)
  • Do:
    • Sample new graphs from that same distribution
    • Preferentially sample graphs with high scores (optional)

  • Challenges:
    • How do we represent the distribution of graphs with certain properties?
    • How do we efficiently sample from it?

5 of 32

Biochemical graph generation task

  • Given:
    • Example druglike chemical graphs
    • A scoring function for drug-related chemical properties

  • Do:
    • Sample new graphs that represent druglike chemicals

  • Key idea:
    • No longer score existing chemical libraries for drug-related properties
    • Directly generate candidate chemicals to synthesize

6 of 32

Generating classic random graphs

  • Lobster graph:
    • Central path (backbone)
    • All other vertices ≤ 2 edges from backbone
    • Easy to parameterize distribution and sample from it
      • Expected number of nodes in backbone
      • Probability of adding a layer 1 edge
      • Probability of adding a layer 2 edge
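These three parameters map directly onto networkx's lobster sampler (assuming networkx is installed; the parameter values below are arbitrary):

```python
import networkx as nx

# Sample a random lobster: expected backbone length 10,
# probability 0.5 of adding a layer-1 edge, 0.3 of a layer-2 edge
G = nx.random_lobster(10, 0.5, 0.3, seed=42)

print(G.number_of_nodes(), G.number_of_edges())
# Every lobster is a tree, so edge count is node count minus one
assert G.number_of_edges() == G.number_of_nodes() - 1
```

Drawing repeated samples with different seeds gives an easy, fully parameterized graph distribution to sample from.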

7 of 32

Generating classic random graphs

  • Erdős-Rényi graph:
    • Choose each possible edge with uniform probability
    • Two parameters
      • Number of nodes
      • Probability of adding edge
  • Barabási–Albert graph:
    • Preferentially add new nodes to high degree nodes
    • Two parameters
      • Number of nodes
      • Number of edges per new node
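Both classic models are also one-liners in networkx (parameter values here are arbitrary):

```python
import networkx as nx

# Erdős-Rényi: n nodes, each possible edge included with uniform probability p
er = nx.erdos_renyi_graph(n=100, p=0.05, seed=0)

# Barabási-Albert: each new node attaches m edges,
# preferentially to existing high-degree nodes
ba = nx.barabasi_albert_graph(n=100, m=2, seed=0)

print(er.number_of_edges(), ba.number_of_edges())
```

The two models differ sharply in degree distribution: Erdős-Rényi degrees are binomial, while preferential attachment yields a heavy-tailed distribution with hub nodes.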

8 of 32

Likelihood-based generative models

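The equation on this slide did not survive extraction; likelihood-based generative models fit parameters θ by maximizing the log-likelihood of the training data, presumably along these lines:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(x_i)
```

Sampling then requires that p_θ be tractable to evaluate and easy to sample from, which is exactly what implicit models like GANs sidestep.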

9 of 32

Generative Adversarial Networks (GANs)

  • Generative model with a clever training strategy
  • No explicit likelihood required for generated objects
  • Can be much more difficult to train than variational autoencoders
  • Powerful data synthesis applications
  • Goodfellow et al. 2014 arXiv:1406.2661

10 of 32

GAN examples: CycleGAN

  • Image-to-image translation

  • Zhu et al. ICCV 2017 and code

11 of 32

GAN examples: image inpainting

  • Context encoders

  • Pathak et al. CVPR 2016 and code

12 of 32

GAN examples: living portraits

  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
  • Zakharov et al. 2019 arXiv:1905.08233
  • Video

13 of 32

How does a GAN work?

  • Frame as a two player game
    • G: generator
    • D: discriminator
  • G wants to generate samples that fool the discriminator
  • D wants to accurately distinguish real from generated samples

14 of 32

How does a GAN work?

  • Initially G generates unrealistic samples, easy for D

  • After training, the samples start to improve

  • Eventually D is fooled by the realistic samples from G

15 of 32

GAN neural network architecture

  • G and D can each be some form of neural network
    • G samples noise, generates an object
    • D classifies real and generated objects

16 of 32

GAN loss functions

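The loss equations on this slide were lost in conversion. The standard value function from Goodfellow et al. 2014 is min_G max_D E_x[log D(x)] + E_z[log(1 − D(G(z)))]. A numpy sketch of the two resulting losses, using hypothetical discriminator probabilities:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z)));
    # equivalently, minimize the negation (binary cross-entropy)
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z))
    return -np.mean(np.log(d_fake))

# Hypothetical discriminator probabilities on real and generated samples
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

In practice the non-saturating generator loss above is preferred over minimizing log(1 − D(G(z))) directly, because it gives stronger gradients early in training when D easily rejects G's samples.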

17 of 32

Top Hat question

18 of 32

MolGAN: generating druglike molecules

  • Generate small molecules (chemicals) that resemble known drugs
  • Extend GAN framework by incentivizing certain properties

Image from De Cao and Kipf 2018 arXiv:1805.11973

19 of 32

Druglike small molecules

  • QM9 dataset enumerated 133,885 organic compounds and calculated properties
    • Heavy atoms: carbon (C), oxygen (O), nitrogen (N), fluorine (F)
  • Universe of graphs to generate
    • Only up to nine heavy atoms (nodes)
    • Three types of edges: single, double, triple bonds
  • Estimate properties with RDKit
    • Druglikeness
    • Solubility
    • Synthesizability

20 of 32

MolGAN generator


Image from De Cao and Kipf 2018 arXiv:1805.11973

21 of 32

MolGAN discriminator


Image from De Cao and Kipf 2018 arXiv:1805.11973

 

(Figure labels: edge types; one adjacency matrix for each edge type)
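MolGAN represents each molecule as a node-type matrix plus one adjacency matrix per bond type. A minimal numpy sketch of that encoding (the indexing and the ethanol-like example are illustrative, not from the paper):

```python
import numpy as np

# QM9 heavy-atom vocabulary and MolGAN's three bond types
atom_types = ["C", "O", "N", "F"]
bond_types = ["single", "double", "triple"]

n = 9  # maximum number of heavy atoms in a QM9 molecule

# Node feature matrix X: one-hot atom type per node
X = np.zeros((n, len(atom_types)))
# Adjacency tensor A: one n x n adjacency matrix per bond type
A = np.zeros((len(bond_types), n, n))

# Encode a C-C-O fragment connected by single bonds (hypothetical node order)
X[0, atom_types.index("C")] = 1
X[1, atom_types.index("C")] = 1
X[2, atom_types.index("O")] = 1
s = bond_types.index("single")
for i, j in [(0, 1), (1, 2)]:
    A[s, i, j] = A[s, j, i] = 1  # keep each bond-type slice symmetric

print(X.shape, A.shape)  # (9, 4) (3, 9, 9)
```

The discriminator (a relational GCN in the paper) consumes exactly this pair (X, A), with one message-passing weight matrix per bond-type slice.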

22 of 32

MolGAN reward function

  • Graph regression
  • Same strategy as discriminator but predict continuous value
  • Differentiable approximation of reward function from RDKit
  • Inspired by Reinforcement Learning

Image from De Cao and Kipf 2018 arXiv:1805.11973

23 of 32

Overall generator objective

 

L(θ) = λ · L_WGAN(θ) + (1 − λ) · L_RL(θ)

  • θ: generator parameters
  • λ ∈ [0, 1]: hyperparameter balancing the adversarial (WGAN) loss and the reward (RL) loss
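Combining the two losses is a convex combination controlled by the hyperparameter λ; a sketch with hypothetical loss values:

```python
def molgan_generator_objective(l_wgan, l_rl, lam=0.5):
    # Convex combination of the adversarial (WGAN) loss and the
    # reward (RL) loss; lam trades realism against desired properties
    return lam * l_wgan + (1.0 - lam) * l_rl

print(molgan_generator_objective(0.8, 0.2, lam=0.25))  # 0.35
```

At λ = 1 the generator ignores the reward network entirely; at λ = 0 it optimizes only the predicted chemical properties.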

24 of 32

MolGAN evaluation

25 of 32

MolGAN evaluation

MolGAN struggles to generate unique molecules

26 of 32

Challenges with GANs

  • Sensitive to hyperparameters
  • Training may not converge
  • Difficult to evaluate
  • Vanishing gradients: discriminator is too good at rejecting generated samples
  • Mode collapse: generator repeatedly produces the same or very similar samples

  • Can observe some of these challenges interactively
    • Train GAN to generate 2D data
    • https://poloclub.github.io/ganlab/

27 of 32

Wasserstein GAN

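The equation from this slide was lost in conversion; assuming it showed the standard Wasserstein GAN objective (Arjovsky et al. 2017), the critic D maximizes a difference in expected scores rather than a classification probability:

```latex
\min_G \max_{D \in \mathcal{D}} \;
\mathbb{E}_{x \sim p_{\text{data}}}[D(x)]
- \mathbb{E}_{z \sim p_z}[D(G(z))]
```

Here 𝒟 is the set of 1-Lipschitz functions, enforced in practice by weight clipping or a gradient penalty. This loss gives the generator useful gradients even when the critic separates real from generated samples well, mitigating the vanishing-gradient problem.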

28 of 32

Alternative graph generation approaches

  • Reinforcement learning to add/remove graph components
  • Recurrently add graph components
  • Graph Recurrent Attention Networks (Liao et al. 2019 arXiv:1910.00760)

  • Scales to graphs with up to 5,000 nodes

29 of 32

Conclusions

  • GANs are flexible generative models
    • Do not require explicit likelihood function
    • Can capture graph distributions that are difficult to model with classic random graph generators
    • Amazing synthesis capabilities in many domains
      • Drug and protein design
    • Not widely used in gene/protein networks
    • Many practical challenges in training and evaluation

30 of 32

What’s next in generative modeling?

31 of 32

What’s next in generative modeling?

  • Diffusion models are quickly replacing GANs in many domains
    • Image generation
    • Text prompt to image
    • Text prompt to video
    • Image to image translation
  • Applications for proteins and chemicals

32 of 32

What’s next in generative modeling?

“Computers fighting bacteria, sci fi, high resolution”

Image from Keras Stable Diffusion Colab notebook

Fun to generate images

Many ethical issues to consider