1 of 40

Graph convolutional neural networks: extensions and limitations

Oct 3rd, 2024

BMI/CS 775 Computational Network Biology, Fall 2024

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

2 of 40

Topics in this section

  • Graph neural networks
  • Graph neural network extensions and limitations
  • Graph transformers
  • Generative graph models

3 of 40

Goals for today

  • Graph neural network extensions
    • Edge features
    • Alternative ways to handle variable node degree
    • Graph attention
    • Pooling
  • Limitations of graph neural networks
  • More biological applications
    • Protein function prediction
    • Epigenomics
    • Drug side effects
    • Drug discovery

4 of 40

Basic graph convolutional neural network

  • Assumptions made in graph convolution layer
    • No edge features
    • All neighbors should be treated in the same way (shared weight matrix)
    • All neighbors are equally important
    • Do not need to aggregate information until the end
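A layer under these assumptions can be sketched in a few lines of numpy (a minimal illustration; the function and variable names are my own, not from the lecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One basic graph convolution: shared weight matrix W for all
    neighbors, equal neighbor importance via mean aggregation, no
    edge features. H: (n, d_in) features; A: (n, n) adjacency."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # degree of each node
    H_agg = (A_hat @ H) / deg               # mean over self + neighbors
    return np.maximum(H_agg @ W, 0.0)       # shared weights + ReLU

# Toy 3-node path graph 1-2-3
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 2)
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2)
```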

5 of 40

Edge features in graph convolutions

  • Recall that the protein interface prediction graph had edge features
  • A minor modification makes edge features easy to incorporate in the first graph convolution layer
  • Unlike node features, edge features are harder to update in later layers

Image from Fout et al. NIPS 2017
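One way the first-layer modification could look (a hedged sketch, not the exact Fout et al. formulation; the separate edge weight matrix `W_e` and the dictionary of edge features are illustrative assumptions):

```python
import numpy as np

def conv_with_edges(neighbors, H, E, W_n, W_e, b):
    """First graph convolution layer that also uses edge features.
    neighbors[i]: list of node i's neighbors; H: (n, d_n) node features;
    E[(i, j)]: (d_e,) feature vector of edge i-j."""
    out = []
    for i, nbrs in enumerate(neighbors):
        z = H[i] @ W_n + b                  # center node contribution
        for j in nbrs:                      # neighbors and their edges
            z = z + (H[j] @ W_n + E[(i, j)] @ W_e) / len(nbrs)
        out.append(np.maximum(z, 0.0))
    return np.stack(out)

# Toy 3-node path graph with random node and edge features
n, d_n, d_e, d_out = 3, 4, 2, 5
neighbors = [[1], [0, 2], [1]]
H = np.random.randn(n, d_n)
E = {(i, j): np.random.randn(d_e) for i in range(n) for j in neighbors[i]}
out = conv_with_edges(neighbors, H, E,
                      np.random.randn(d_n, d_out),
                      np.random.randn(d_e, d_out),
                      np.zeros(d_out))
print(out.shape)  # (3, 5)
```

Later layers update node representations, but there is no analogous update producing new edge representations, which is why edge features are harder to use beyond the first layer.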

 

 

6 of 40

Unique neighbor weights in graph convolutions

  • Have been using the same weight vector for all neighbors
  • Can generalize if we can sort neighbors
    • In protein interface prediction, can sort by residue-residue distance
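A sketch of this generalization, assuming a fixed neighborhood size so that the j-th closest neighbor can get its own weight matrix (names are illustrative):

```python
import numpy as np

def conv_ordered(neighbors_sorted, H, W_self, W_list):
    """neighbors_sorted[i]: node i's neighbors sorted (e.g., by
    residue-residue distance), one entry per weight matrix in W_list."""
    out = []
    for i, nbrs in enumerate(neighbors_sorted):
        z = H[i] @ W_self
        for j, W_j in zip(nbrs, W_list):  # unique weights per position
            z = z + H[j] @ W_j
        out.append(np.maximum(z, 0.0))
    return np.stack(out)

# Toy triangle graph: every node has 2 sorted neighbors
neighbors_sorted = [[1, 2], [0, 2], [0, 1]]
H = np.random.randn(3, 4)
W_self = np.random.randn(4, 2)
W_list = [np.random.randn(4, 2) for _ in range(2)]
out = conv_ordered(neighbors_sorted, H, W_self, W_list)
print(out.shape)  # (3, 2)
```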

 

 

7 of 40

Protein interface prediction results

  • More complex network architectures did not have a big impact

Graph architecture                     Optimal graph layers   AUC
None                                   1                      0.812
Node weights                           3                      0.891
Node and edge weights                  2                      0.898
Unique node weights and edge weights   3                      0.891

8 of 40

GraphSAGE: alternative approach for graphs with variable degree


9 of 40

GraphSAGE: alternative approach for graphs with variable degree

Image from Hamilton et al. NIPS 2017

Sampling naturally handles variable degree

Defined multiple aggregation strategies in addition to mean
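The sampling idea can be sketched as follows (a minimal GraphSAGE-style layer with a mean aggregator; the fixed sample size and sampling with replacement are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(neighbors, H, W, n_samples=2):
    """Sample a fixed number of neighbors per node, so variable degree
    is handled by sampling rather than summing over all neighbors.
    H: (n, d_in); W: (2*d_in, d_out) applied to [self || neighbor mean]."""
    out = []
    for i, nbrs in enumerate(neighbors):
        sampled = rng.choice(nbrs, size=n_samples, replace=True)
        h_nbr = H[sampled].mean(axis=0)                 # mean aggregator
        z = np.maximum(np.concatenate([H[i], h_nbr]) @ W, 0.0)
        z = z / (np.linalg.norm(z) + 1e-8)              # l2-normalize
        out.append(z)
    return np.stack(out)

neighbors = [[1], [0, 2], [1]]   # 3-node path graph, variable degree
H = np.random.randn(3, 4)
W = np.random.randn(8, 2)
out = sage_layer(neighbors, H, W)
print(out.shape)  # (3, 2)
```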

10 of 40

Which neighbors are important?

  • Graph convolutional neural networks give each neighbor the same influence
  • What if some neighbors are more important?
  • How do we know which neighbors are important in advance?

  • Attention is a deep learning approach to address this
    • Originally developed for sequence models
    • Key idea: learn which neighbors are important

11 of 40

Graph Attention Networks


[Figure: the same 5-node graph shown twice; left: all edges have equal importance; right: edges have varying importance]

12 of 40

Calculating attention in graphs

Image from Veličković et al. ICLR 2018

 

Use a simple neural network to calculate attention

13 of 40

Calculating attention in graphs

Image from Veličković et al. ICLR 2018

 

Can have parallel kinds of attention (multiple heads): blue, green, purple

Concatenate or aggregate them

 

14 of 40

Calculating attention in graphs

Need to normalize attention over neighbors

Use softmax function
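The normalization step in numpy: raw attention scores for a node's neighbors become weights that sum to 1 (the example scores are illustrative):

```python
import numpy as np

def softmax(e):
    """Normalize raw attention scores over a node's neighbors."""
    e = e - e.max()        # subtract max for numerical stability
    w = np.exp(e)
    return w / w.sum()

e = np.array([-1.0, 2.2, 0.4])  # illustrative raw scores for 3 neighbors
alpha = softmax(e)
print(alpha.round(3), alpha.sum())  # normalized weights, summing to 1
```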

 

 

 

 

[Figure: attention values on a node's edges before normalization and after normalization]

 

 

 

 

 

15 of 40

Graph Attention Networks: putting it all together

Simplified notation from Zhang et al. in DGL docs

Transform current representation

Calculate edge attention as function of node representations

Normalize edge attentions

Use edge attentions in node updates

Concatenation
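The four steps above can be sketched as a single-head attention layer in numpy (a minimal illustration of the idea; parameter shapes and the decision to include a self-edge are assumptions):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(neighbors, H, W, a):
    """H: (n, d_in); W: (d_in, d_out); a: (2*d_out,) attention vector."""
    Z = H @ W                                   # 1. transform representations
    out = []
    for i, nbrs in enumerate(neighbors):
        js = [i] + list(nbrs)                   # include a self-edge
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
                      for j in js])             # 2. edge attention scores
        alpha = np.exp(e - e.max())
        alpha = alpha / alpha.sum()             # 3. normalize with softmax
        out.append(sum(al * Z[j] for al, j in zip(alpha, js)))  # 4. update
    return np.stack(out)

neighbors = [[1], [0, 2], [1]]                  # toy 3-node path graph
H = np.random.randn(3, 4)
out = gat_layer(neighbors, H, np.random.randn(4, 2), np.random.randn(4))
print(out.shape)  # (3, 2)
```

Multiple heads would run this layer in parallel with separate W and a, then concatenate or average the resulting node representations.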

16 of 40

Top Hat question

17 of 40

Partial graph attention example

[Figure: example graph with nodes 1-10, showing 3-dimensional node feature vectors and weight matrices]

Examine partial graph attention calculations for nodes 1 and 10

Nodes 2 and 7 have same initial features

18 of 40

Partial graph attention example

[Figure: feature vectors for nodes 1 and 10 multiplied by the weight matrix to give transformed representations]

Only showing some of the calculations

19 of 40

Partial graph attention example

[Figure: transformed representations for nodes 1 and 10 combined with the attention vector to give raw attention scores]

20 of 40

Partial graph attention example

https://paperswithcode.com/method/leaky-relu

[Figure: raw attention scores passed through the LeakyReLU activation]

21 of 40

Partial graph attention example

[Figure: remaining steps of the attention calculations for the example graph]

22 of 40

Pooling

  • In 1D or 2D convolutional neural networks, pooling is common
  • Search for patterns in small regions, then aggregate, repeat

Initially recognize holes in the ball

Then recognize configurations of holes

23 of 40

Pooling

  • Pooling is important in graphs as well but challenging
    • No pre-defined grid structure
    • Can make computation more efficient
    • Could help generalize across graphs

  • Consider how to learn the pooling
    • Self-attention graph pooling
    • Lee et al. 2019 arXiv:1904.08082

24 of 40

Self-attention graph pooling

Image from Lee et al. 2019 arXiv:1904.08082

Compute attention with graph convolution

Select top k fraction of the nodes to keep

Use attention mask to shrink the graph
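The three steps above can be sketched as follows (a hedged numpy illustration of the idea; the tanh score and variable names are assumptions, not the exact Lee et al. implementation):

```python
import numpy as np

def sag_pool(A, H, w_att, k=0.5):
    """Self-attention graph pooling sketch. A: (n, n) adjacency;
    H: (n, d) node features; w_att: (d,) attention parameters."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1, keepdims=True)
    scores = np.tanh((A_hat @ H / deg) @ w_att)  # attention via graph conv
    n_keep = max(1, int(np.ceil(k * A.shape[0])))
    keep = np.argsort(scores)[-n_keep:]          # top-k fraction of nodes
    H_pool = H[keep] * scores[keep, None]        # attention-masked features
    A_pool = A[np.ix_(keep, keep)]               # induced subgraph
    return A_pool, H_pool

# Toy 4-node graph pooled down to 2 nodes
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.randn(4, 3)
A_pool, H_pool = sag_pool(A, H, np.random.randn(3), k=0.5)
print(A_pool.shape, H_pool.shape)  # (2, 2) (2, 3)
```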

25 of 40

Limitations of graph neural networks

  • Empirical and theoretical studies of classic graph neural networks revealed limitations:
    • Under-reach
    • Over-smoothing
    • Over-squashing
    • Expressiveness

26 of 40

Limitations of graph neural networks: under-reach

Image adapted from Wu et al. 2019 A Comprehensive Survey on Graph Neural Networks

Require k graph layers to share information k edges away

Cannot learn long distance relationships

[Figure: node i's neighborhood reached by graph layer 1 and graph layer 2]
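Under-reach, made concrete: with k graph layers, a node's representation can only depend on nodes within k hops, so the minimum number of layers needed to share information between two nodes is their shortest-path distance. A small BFS check on a hypothetical 6-node path graph:

```python
from collections import deque

def hops(neighbors, src, dst):
    """Shortest-path distance = number of graph layers needed for
    information at dst to reach src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return dist[u]
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return None

# Path graph 0-1-2-3-4-5
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(hops(path, 0, 5))  # 5 layers needed to connect the endpoints
```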

27 of 40

Limitations of graph neural networks: over-smoothing

[Figure: 4-node input graph with node feature vectors; the features still differ after GNN layers 1 and 2, but by GNN layer 50 all four nodes have converged to the same feature vector]
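Over-smoothing can be demonstrated directly: repeatedly applying the mean-aggregation step (with no learned weights, a deliberate simplification) drives all node features toward the same vector:

```python
import numpy as np

# 4-node example graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
P = A_hat / A_hat.sum(axis=1, keepdims=True)  # mean-aggregation operator

H = np.random.randn(4, 3)
for _ in range(50):
    H = P @ H                                 # 50 rounds of aggregation

# Spread of each feature across the 4 nodes is now essentially zero
print(np.ptp(H, axis=0))
```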

28 of 40

Limitations of graph neural networks: over-squashing

Also known as bottlenecks

Image from Alon and Yahav ICLR 2021

Label green node from {A, B, C} based on number of neighbors

General graph

29 of 40

Limitations of graph neural networks: over-squashing

Images from Alon and Yahav ICLR 2021

Test case: binary tree with depth r, predict label of root

Information from all leaves must be aggregated in top node's representation

Evaluate different graph neural network variants

Accuracy drops for r > 4 or 5
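The squashing is easy to quantify: in the depth-r tree, information from all 2^r leaves must pass through the root's fixed-size vector, so the capacity available per leaf shrinks exponentially with depth (d = 64 is a hypothetical hidden dimension):

```python
d = 64  # hypothetical hidden dimension of the root's representation
per_leaf = {r: d / 2 ** r for r in range(2, 9)}
for r, cap in per_leaf.items():
    # 2**r leaves compete for the same d dimensions
    print(f"depth {r}: {2 ** r} leaves, {cap:.3f} dimensions per leaf")
```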

30 of 40

Limitations of graph neural networks: expressiveness

Image from PubChem

Image from GraphGPS blog post

Decalin molecule

Molecular graph

Standard graph neural network representation

Ideal representation
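The limitation illustrated by decalin is that message passing GNNs are at most as expressive as the 1-Weisfeiler-Lehman test, which cannot distinguish some non-isomorphic graphs. A self-contained check on a classic small case (a 6-cycle versus two triangles, standing in for the molecular example):

```python
from collections import Counter

def wl_colors(neighbors, rounds=3):
    """1-WL color refinement: iteratively relabel each node by its color
    plus the multiset of neighbor colors; return the color histogram."""
    colors = {v: 0 for v in neighbors}  # uniform initial colors
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in neighbors[v])))
                for v in neighbors}
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: relabel[sigs[v]] for v in neighbors}
    return Counter(colors.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# Both graphs are 2-regular, so every node keeps the same color and the
# histograms match even though the graphs are not isomorphic
print(wl_colors(cycle6) == wl_colors(two_triangles))  # True
```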

31 of 40

Biological applications

  • Initially popular in chemistry / biochemistry
  • Gained traction in many other areas
  • Biological network analysis with deep learning

  • Key considerations:
    • Unsupervised / supervised
    • Node / edge / graph predictions
    • What varies? Node features, graph structure, etc.

32 of 40

Predicting quantitative protein function

  • Input: Wild type protein structure, amino acid sequence
  • Output: Functional activity score
  • Gelman et al. 2021 doi:10.1073/pnas.2104878118

33 of 40

Predicting quantitative protein function

  • Importance of control experiments
  • Does graph neural network performance change with alternative graphs?

  • For these datasets, can often still get good performance with the control graphs
  • Requires fully connected layer after the graph layers

Gelman et al. 2021 doi:10.1073/pnas.2104878118

34 of 40

Predicting protein function

  • Input: Protein structure, residue (node) features
  • Output: Gene Ontology labels
  • Gligorijević et al. 2021 Nature Communications

35 of 40

Predicting epigenetic state

  • Input: DNA sequence, Hi-C contact maps
  • Output: TF binding, histone modifications, accessibility
  • Lanchantin and Qi 2020 Bioinformatics

36 of 40

Predicting polypharmacy side effects

  • Input: 3 types of protein and drug interactions, node features
  • Output: Types of side effects for drug-drug pair
  • Zitnik et al. 2018 Bioinformatics

37 of 40

Predicting drug properties

  • Input: Chemical structure
  • Output: Chemical’s ability to inhibit a drug target
  • Chen et al. 2018 Drug Discovery Today

Does this chemical bind an important protein domain and affect bioactivity?

38 of 40

Predicting drug properties

  • IDG-DREAM Drug-Kinase Binding Prediction Challenge
  • Predict binding affinity between 25-70 compounds and ~200 kinases
  • One top performer used graph convolutional neural networks
    • Park et al. 2019 DMIS_DK Submission
    • Tested multiple graph architectures in ensemble
    • Graph Attention Networks most important in ensemble but random forest also competitive
    • 78 node features from RDKit: atom type, degree, number of bonded hydrogens, valence, aromaticity

39 of 40

Conclusions

  • Graph convolutional networks can be applied to many of the other biological tasks in this course
  • There can be advantages to coupling node representation learning with the final prediction task
  • Very flexible formulation, can tweak almost anything
  • Many hyperparameters, non-trivial to train
  • Limitations in how information is shared across graph
  • Critical to benchmark against simpler models

40 of 40

Recent GNN extensions

  • Wang and Cho 2024 Non-convolutional Graph Neural Networks
    • A new approach for overcoming some GNN limitations; we won’t have time to discuss it
  • Finkelshtein et al. 2024 Cooperative Graph Neural Networks
    • GNN extension where nodes play a game and select among different actions during message passing