Graph convolutional neural networks: extensions and limitations
Oct 3rd, 2024
BMI/CS 775 Computational Network Biology, Fall 2024
Anthony Gitter
Topics in this section
Goals for today
Basic graph convolutional neural network
Edge features in graph convolutions
Image from Fout et al. NIPS 2017
Unique neighbor weights in graph convolutions
Protein interface prediction results
Graph architecture | Optimal graph layers | AUC |
None | 1 | 0.812 |
Node weights | 3 | 0.891 |
Node and edge weights | 2 | 0.898 |
Unique node weights and edge weights | 3 | 0.891 |
GraphSAGE: alternative approach for graphs with variable degree
Image from Hamilton et al. NIPS 2017
Sampling naturally handles variable degree
Defined multiple aggregation strategies in addition to mean
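As a sketch of the idea, here is a minimal NumPy implementation of one GraphSAGE layer with a mean aggregator over a fixed-size neighbor sample (the toy graph, weights, and sample size below are illustrative assumptions, not values from the paper):

```python
import random
import numpy as np

def sample_neighbors(adj, node, k, rng):
    """Sample up to k neighbors (with replacement if degree < k)."""
    nbrs = adj[node]
    if len(nbrs) >= k:
        return rng.sample(nbrs, k)
    return [rng.choice(nbrs) for _ in range(k)]

def graphsage_mean_layer(features, adj, W_self, W_nbr, k=2, seed=0):
    """One GraphSAGE layer: mean-aggregate sampled neighbors, concatenate
    with the node's own transformed features, apply a ReLU nonlinearity."""
    rng = random.Random(seed)
    out = []
    for v in range(len(features)):
        sampled = sample_neighbors(adj, v, k, rng)
        nbr_mean = np.mean([features[u] for u in sampled], axis=0)
        h = np.concatenate([features[v] @ W_self, nbr_mean @ W_nbr])
        out.append(np.maximum(h, 0.0))
    return np.stack(out)

# Toy graph: node 0 has degree 3, node 3 has degree 1 (illustrative)
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
features = np.eye(4)          # one-hot input features
W_self = np.ones((4, 2))      # illustrative weights
W_nbr = np.ones((4, 2))
h1 = graphsage_mean_layer(features, adj, W_self, W_nbr, k=2)
```

Sampling a fixed number of neighbors (with replacement for low-degree nodes) gives every node a same-cost update regardless of its degree, which is how the sampling naturally handles variable degree.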
Which neighbors are important?
Graph Attention Networks
[Figure: the same five-node graph shown twice. Left: all edges have equal importance. Right: edges have varying importance.]
Calculating attention in graphs
Image from Veličković et al. ICLR 2018
Use a simple neural network to calculate attention
Calculating attention in graphs
Image from Veličković et al. ICLR 2018
Can have parallel kinds of attention (multiple heads): blue, green, purple
Concatenate or aggregate them
Calculating attention in graphs
Need to normalize attention over neighbors
Use softmax function
[Figure: attention weights on a node's edges before and after softmax normalization]
Graph Attention Networks: putting it all together
Simplified notation from Zhang et al. in DGL docs
Transform current representation
Calculate edge attention as function of node representations
Normalize edge attentions
Use edge attentions in node updates
Concatenation
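These steps can be sketched for a single attention head in NumPy (the toy graph, weight matrix, and attention vector below are illustrative assumptions, not the lecture's worked example):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a):
    """Single-head GAT layer: transform representations, score each edge,
    softmax-normalize over the neighborhood, aggregate with the weights."""
    Z = H @ W                       # 1. transform current representation
    H_out = np.zeros_like(Z)
    for i in range(len(H)):
        nbrs = adj[i] + [i]         # include a self-loop, as in GAT
        # 2. edge attention e_ij = LeakyReLU(a . [z_i || z_j])
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
                      for j in nbrs])
        # 3. normalize edge attentions with softmax
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()
        # 4. use edge attentions in the node update
        H_out[i] = sum(al * Z[j] for al, j in zip(alpha, nbrs))
    return H_out

adj = {0: [1, 2], 1: [0], 2: [0]}                  # illustrative graph
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # illustrative features
W = np.eye(2)
a = np.array([1.0, -1.0, 0.5, 0.5])                # illustrative attention params
H1 = gat_layer(H, adj, W, a)
```

Multiple heads would each have their own W and a; their outputs are concatenated in hidden layers or averaged at the final layer.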
Top Hat question
Partial graph attention example
[Figure: a ten-node graph with 3-dimensional node features; shown are selected feature vectors, a 3×3 weight matrix W, and a 6-dimensional attention vector a.]
Examine partial graph attention calculations for nodes 1 and 10
Nodes 2 and 7 have same initial features
Partial graph attention example
[Figure: transformed representations z = Wh computed for selected nodes using the weight matrix W.]
Only showing some of the calculations
Partial graph attention example
[Figure: raw edge attention scores assembled from the concatenated transformed representations and the attention vector a.]
Partial graph attention example
https://paperswithcode.com/method/leaky-relu
[Figure: LeakyReLU applied to the raw attention scores before softmax normalization.]
Partial graph attention example
[Figure: the ten-node graph with the resulting attention values.]
Pooling
Initially recognize holes in the ball
Then recognize configurations of holes
Self-attention graph pooling
Image from Lee et al. 2019 arXiv:1904.08082
Compute attention with graph convolution
Select top k fraction of the nodes to keep
Use attention mask to shrink the graph
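A minimal NumPy sketch of these three steps, using a simplified row-normalized graph convolution for scoring (the graph and parameters below are illustrative assumptions):

```python
import numpy as np

def sagpool(X, A, p, theta):
    """Self-attention graph pooling sketch: score nodes with one graph
    convolution, keep the top-p fraction, gate kept features by score."""
    deg = A.sum(axis=1, keepdims=True)
    A_hat = A / np.maximum(deg, 1)               # row-normalized adjacency
    scores = np.tanh(A_hat @ X @ theta).ravel()  # attention score per node
    k = max(1, int(np.ceil(p * len(X))))
    keep = np.argsort(scores)[-k:]               # top-k fraction of nodes
    X_pool = X[keep] * scores[keep, None]        # attention mask on features
    A_pool = A[np.ix_(keep, keep)]               # induced, shrunken graph
    return X_pool, A_pool, keep

# Illustrative 4-node graph and features
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)
theta = np.array([[0.5], [-0.25]])               # illustrative scoring weights
Xp, Ap, kept = sagpool(X, A, p=0.5, theta=theta)
```

Because the scores come from a graph convolution, which nodes survive pooling depends on both their features and the graph topology.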
Limitations of graph neural networks
Limitations of graph neural networks: under-reach
Image adapted from Wu et al. 2019 A Comprehensive Survey on Graph Neural Networks
Require k graph layers to share information k edges away
Cannot learn long-distance relationships
[Figure: node i's receptive field grows by one hop with each graph layer (layer 1, layer 2).]
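The under-reach limit can be checked directly: with k message-passing layers, a node is influenced only by nodes within k hops. A small NumPy illustration on a path graph (the graph is an illustrative assumption):

```python
import numpy as np

# Path graph 0-1-2-3-4: node 4's features can influence node 0
# only after 4 rounds of message passing
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

def influences_node0(A, k):
    """Set of nodes whose input features can reach node 0 in k layers."""
    R = np.eye(len(A)) + A            # self-loop plus one-hop propagation
    P = np.linalg.matrix_power(R, k)  # k rounds of message passing
    return set(int(j) for j in np.nonzero(P[0])[0])
```

Three layers leave node 4 outside node 0's receptive field; a fourth layer is required to share information four edges away.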
Limitations of graph neural networks: over-smoothing
[Figure: a four-node input graph. Node representations after successive GNN layers:]
GNN layer 1
0.1 | 7.8 | 3.4 | 2.2 |
9.4 | 1.1 | 4.1 | 8.7 |
8.3 | 2.1 | 1.5 | 3.3 |
3.3 | 4.0 | 5.6 | 2.9 |
GNN layer 2
6.9 | 5.8 | 2.3 | 4.7 |
7.6 | 4.1 | 2.3 | 5.7 |
8.2 | 2.3 | 1.7 | 3.6 |
4.4 | 3.5 | 4.1 | 2.8 |
…
GNN layer 50
6.1 | 3.2 | 3.0 | 7.8 |
6.1 | 3.2 | 3.0 | 7.8 |
6.1 | 3.2 | 3.0 | 7.8 |
6.1 | 3.2 | 3.0 | 7.8 |
By layer 50 all four nodes have converged to identical representations.
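Over-smoothing is easy to reproduce: repeatedly applying a mean-aggregation layer (here with no learned weights, an illustrative simplification, and an assumed adjacency) drives all node representations of a connected graph toward the same vector:

```python
import numpy as np

# Illustrative connected 4-node graph with self-loops added
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
A_hat /= A_hat.sum(axis=1, keepdims=True)  # row-normalized mean aggregation

# Illustrative initial node features
H = np.array([[0.1, 7.8, 3.4, 2.2],
              [9.4, 1.1, 4.1, 8.7],
              [8.3, 2.1, 1.5, 3.3],
              [3.3, 4.0, 5.6, 2.9]])
for _ in range(50):
    H = A_hat @ H                          # 50 mean-aggregation layers

spread = H.max(axis=0) - H.min(axis=0)     # per-feature spread across nodes
```

After 50 rounds of averaging, the spread across nodes is effectively zero: every node carries the same representation, so they can no longer be distinguished.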
Limitations of graph neural networks: over-squashing
Also known as bottlenecks
Image from Alon and Yahav ICLR 2021
Label green node from {A, B, C} based on number of neighbors
General graph
Limitations of graph neural networks: over-squashing
Images from Alon and Yahav ICLR 2021
Test case: binary tree with depth r, predict label of root
Information from all leaves must be aggregated in the top node’s representation
Evaluate different graph neural network variants
Accuracy drops for r > 4 or 5
Limitations of graph neural networks: expressiveness
Image from PubChem
Image from GraphGPS blog post
Decalin molecule
Molecular graph
Standard graph neural network representation
Ideal representation
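This limitation can be demonstrated directly: any two regular graphs of the same degree and size look identical to standard message passing when all nodes start with the same features, even when their cycle structure differs (as in decalin). A NumPy sketch with an illustrative pair of graphs:

```python
import numpy as np

def gnn_embed(A, layers=3):
    """Standard message passing (sum aggregation with self-loops)
    starting from uniform node features."""
    H = np.ones((len(A), 1))
    for _ in range(layers):
        H = np.tanh((A + np.eye(len(A))) @ H)
    return H

# Two non-isomorphic 3-regular graphs on 6 nodes (illustrative pair):
# the triangular prism (contains triangles) ...
prism = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
             (0, 3), (1, 4), (2, 5)]:
    prism[i, j] = prism[j, i] = 1.0
# ... and K_{3,3} (bipartite, triangle-free)
k33 = np.zeros((6, 6))
for i in range(3):
    for j in range(3, 6):
        k33[i, j] = k33[j, i] = 1.0

emb_prism = gnn_embed(prism)
emb_k33 = gnn_embed(k33)
# Message passing produces identical embeddings for both graphs
```

Every node in both graphs has the same degree, so every round of aggregation produces the same value everywhere; the standard GNN representation cannot separate the two graphs, which motivates more expressive architectures.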
Biological applications
Predicting quantitative protein function
Gelman et al. 2021 doi:10.1073/pnas.2104878118
Predicting protein function
Predicting epigenetic state
Predicting polypharmacy side effects
Predicting drug properties
Does this chemical bind an important protein domain and affect bioactivity?
Predicting drug properties
Conclusions
Recent GNN extensions