1 of 40

Graph convolutional neural networks: extensions and limitations

Oct 3rd, 2024

BMI/CS 775 Computational Network Biology, Fall 2024

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

2 of 40

Topics in this section

  • Graph neural networks
  • Graph neural network extensions and limitations
  • Graph transformers
  • Generative graph models

3 of 40

Goals for today

  • Graph neural network extensions
    • Edge features
    • Alternative ways to handle variable node degree
    • Graph attention
    • Pooling
  • Limitations of graph neural networks
  • More biological applications
    • Protein function prediction
    • Epigenomics
    • Drug side effects
    • Drug discovery

4 of 40

Basic graph convolutional neural network

  • Assumptions made in graph convolution layer
    • No edge features
    • All neighbors should be treated in the same way (shared weight matrix)
    • All neighbors are equally important
    • Do not need to aggregate information until the end
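A layer under these assumptions can be sketched in a few lines of numpy (a minimal illustration; the function and variable names are my own, not from the lecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One basic graph convolution: shared weight matrix W for all
    neighbors, equal neighbor importance via mean aggregation, no
    edge features. H: (n, d_in) features; A: (n, n) adjacency."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # degree of each node
    H_agg = (A_hat @ H) / deg               # mean over self + neighbors
    return np.maximum(H_agg @ W, 0.0)       # shared weights + ReLU

# Toy 3-node path graph 1-2-3
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 2)
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2)
```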

5 of 40

Edge features in graph convolutions

  • Recall that the protein interface prediction graph had edge features
  • A minor modification makes edge features easy to incorporate in the first graph convolution layer
  • Unlike node features, edge features are harder to update in later layers

Image from Fout et al. NIPS 2017
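One way the first-layer modification could look (a hedged sketch, not the exact Fout et al. formulation; the separate edge weight matrix `W_e` and the dictionary of edge features are illustrative assumptions):

```python
import numpy as np

def conv_with_edges(neighbors, H, E, W_n, W_e, b):
    """First graph convolution layer that also uses edge features.
    neighbors[i]: list of node i's neighbors; H: (n, d_n) node features;
    E[(i, j)]: (d_e,) feature vector of edge i-j."""
    out = []
    for i, nbrs in enumerate(neighbors):
        z = H[i] @ W_n + b                  # center node contribution
        for j in nbrs:                      # neighbors and their edges
            z = z + (H[j] @ W_n + E[(i, j)] @ W_e) / len(nbrs)
        out.append(np.maximum(z, 0.0))
    return np.stack(out)

# Toy 3-node path graph with random node and edge features
n, d_n, d_e, d_out = 3, 4, 2, 5
neighbors = [[1], [0, 2], [1]]
H = np.random.randn(n, d_n)
E = {(i, j): np.random.randn(d_e) for i in range(n) for j in neighbors[i]}
out = conv_with_edges(neighbors, H, E,
                      np.random.randn(d_n, d_out),
                      np.random.randn(d_e, d_out),
                      np.zeros(d_out))
print(out.shape)  # (3, 5)
```

Later layers update node representations, but there is no analogous update producing new edge representations, which is why edge features are harder to use beyond the first layer.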

 

 

6 of 40

Unique neighbor weights in graph convolutions

  • Have been using the same weight vector for all neighbors
  • Can generalize if we can sort neighbors
    • In protein interface prediction, can sort by residue-residue distance
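A sketch of this generalization, assuming a fixed neighborhood size so that the j-th closest neighbor can get its own weight matrix (names are illustrative):

```python
import numpy as np

def conv_ordered(neighbors_sorted, H, W_self, W_list):
    """neighbors_sorted[i]: node i's neighbors sorted (e.g., by
    residue-residue distance), one entry per weight matrix in W_list."""
    out = []
    for i, nbrs in enumerate(neighbors_sorted):
        z = H[i] @ W_self
        for j, W_j in zip(nbrs, W_list):  # unique weights per position
            z = z + H[j] @ W_j
        out.append(np.maximum(z, 0.0))
    return np.stack(out)

# Toy triangle graph: every node has 2 sorted neighbors
neighbors_sorted = [[1, 2], [0, 2], [0, 1]]
H = np.random.randn(3, 4)
W_self = np.random.randn(4, 2)
W_list = [np.random.randn(4, 2) for _ in range(2)]
out = conv_ordered(neighbors_sorted, H, W_self, W_list)
print(out.shape)  # (3, 2)
```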

 

 

7 of 40

Protein interface prediction results

  • More complex network architectures did not have a big impact

Graph architecture                     Optimal graph layers   AUC
None                                   1                      0.812
Node weights                           3                      0.891
Node and edge weights                  2                      0.898
Unique node weights and edge weights   3                      0.891

8 of 40

GraphSAGE: alternative approach for graphs with variable degree


9 of 40

GraphSAGE: alternative approach for graphs with variable degree

Image from Hamilton et al. NIPS 2017

Sampling naturally handles variable degree

Defined multiple aggregation strategies in addition to mean
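The sampling idea can be sketched as follows (a minimal GraphSAGE-style layer with a mean aggregator; the fixed sample size and sampling with replacement are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(neighbors, H, W, n_samples=2):
    """Sample a fixed number of neighbors per node, so variable degree
    is handled by sampling rather than summing over all neighbors.
    H: (n, d_in); W: (2*d_in, d_out) applied to [self || neighbor mean]."""
    out = []
    for i, nbrs in enumerate(neighbors):
        sampled = rng.choice(nbrs, size=n_samples, replace=True)
        h_nbr = H[sampled].mean(axis=0)                 # mean aggregator
        z = np.maximum(np.concatenate([H[i], h_nbr]) @ W, 0.0)
        z = z / (np.linalg.norm(z) + 1e-8)              # l2-normalize
        out.append(z)
    return np.stack(out)

neighbors = [[1], [0, 2], [1]]   # 3-node path graph, variable degree
H = np.random.randn(3, 4)
W = np.random.randn(8, 2)
out = sage_layer(neighbors, H, W)
print(out.shape)  # (3, 2)
```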

10 of 40

Which neighbors are important?

  • Graph convolutional neural networks give each neighbor the same influence
  • What if some neighbors are more important?
  • How do we know which neighbors are important in advance?

  • Attention is a deep learning approach to address this
    • Originally developed for sequence models
    • Key idea: learn which neighbors are important

11 of 40

Graph Attention Networks


[Figure: the same 5-node graph shown twice; left: all edges have equal importance; right: edges have varying importance]

12 of 40

Calculating attention in graphs

Image from Veličković et al. ICLR 2018

 

Use a simple neural network to calculate attention

13 of 40

Calculating attention in graphs

Image from Veličković et al. ICLR 2018

 

Can have parallel kinds of attention (multiple heads): blue, green, purple

Concatenate or aggregate them

 

14 of 40

Calculating attention in graphs

Need to normalize attention over neighbors

Use softmax function
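The normalization step in numpy: raw attention scores for a node's neighbors become weights that sum to 1 (the example scores are illustrative):

```python
import numpy as np

def softmax(e):
    """Normalize raw attention scores over a node's neighbors."""
    e = e - e.max()        # subtract max for numerical stability
    w = np.exp(e)
    return w / w.sum()

e = np.array([-1.0, 2.2, 0.4])  # illustrative raw scores for 3 neighbors
alpha = softmax(e)
print(alpha.round(3), alpha.sum())  # normalized weights, summing to 1
```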

 

 

 

 

[Figure: attention values on a node's edges before normalization and after normalization]

 

 

 

 

 

15 of 40

Graph Attention Networks: putting it all together

Simplified notation from Zhang et al. in DGL docs

Transform current representation

Calculate edge attention as function of node representations

Normalize edge attentions

Use edge attentions in node updates

Concatenation
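The four steps above can be sketched as a single-head attention layer in numpy (a minimal illustration of the idea; parameter shapes and the decision to include a self-edge are assumptions):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(neighbors, H, W, a):
    """H: (n, d_in); W: (d_in, d_out); a: (2*d_out,) attention vector."""
    Z = H @ W                                   # 1. transform representations
    out = []
    for i, nbrs in enumerate(neighbors):
        js = [i] + list(nbrs)                   # include a self-edge
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
                      for j in js])             # 2. edge attention scores
        alpha = np.exp(e - e.max())
        alpha = alpha / alpha.sum()             # 3. normalize with softmax
        out.append(sum(al * Z[j] for al, j in zip(alpha, js)))  # 4. update
    return np.stack(out)

neighbors = [[1], [0, 2], [1]]                  # toy 3-node path graph
H = np.random.randn(3, 4)
out = gat_layer(neighbors, H, np.random.randn(4, 2), np.random.randn(4))
print(out.shape)  # (3, 2)
```

Multiple heads would run this layer in parallel with separate W and a, then concatenate or average the resulting node representations.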

16 of 40

Top Hat question

17 of 40

Partial graph attention example

[Figure: example graph with nodes 1-10, showing 3-dimensional node feature vectors and weight matrices]

Examine partial graph attention calculations for nodes 1 and 10

Nodes 2 and 7 have same initial features

18 of 40

Partial graph attention example

[Figure: feature vectors for nodes 1 and 10 multiplied by the weight matrix to give transformed representations]

Only showing some of the calculations

19 of 40

Partial graph attention example

[Figure: transformed representations for nodes 1 and 10 combined with the attention vector to give raw attention scores]

20 of 40

Partial graph attention example

https://paperswithcode.com/method/leaky-relu

[Figure: raw attention scores passed through the LeakyReLU activation]

21 of 40

Partial graph attention example

[Figure: remaining steps of the attention calculations for the example graph]

22 of 40

Pooling

  • In 1D or 2D convolutional neural networks, pooling is common
  • Search for patterns in small regions, then aggregate, repeat

Initially recognize holes in the ball

Then recognize configurations of holes

23 of 40

Pooling

  • Pooling is important in graphs as well but challenging
    • No pre-defined grid structure
    • Can make computation more efficient
    • Could help generalize across graphs

  • Consider how to learn the pooling
    • Self-attention graph pooling
    • Lee et al. 2019 arXiv:1904.08082

24 of 40

Self-attention graph pooling

Image from Lee et al. 2019 arXiv:1904.08082

Compute attention with graph convolution

Select top k fraction of the nodes to keep

Use attention mask to shrink the graph
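The three steps above can be sketched as follows (a hedged numpy illustration of the idea; the tanh score and variable names are assumptions, not the exact Lee et al. implementation):

```python
import numpy as np

def sag_pool(A, H, w_att, k=0.5):
    """Self-attention graph pooling sketch. A: (n, n) adjacency;
    H: (n, d) node features; w_att: (d,) attention parameters."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1, keepdims=True)
    scores = np.tanh((A_hat @ H / deg) @ w_att)  # attention via graph conv
    n_keep = max(1, int(np.ceil(k * A.shape[0])))
    keep = np.argsort(scores)[-n_keep:]          # top-k fraction of nodes
    H_pool = H[keep] * scores[keep, None]        # attention-masked features
    A_pool = A[np.ix_(keep, keep)]               # induced subgraph
    return A_pool, H_pool

# Toy 4-node graph pooled down to 2 nodes
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.randn(4, 3)
A_pool, H_pool = sag_pool(A, H, np.random.randn(3), k=0.5)
print(A_pool.shape, H_pool.shape)  # (2, 2) (2, 3)
```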

25 of 40

Limitations of graph neural networks

  • Empirical and theoretical studies of classic graph neural networks revealed limitations:
    • Under-reach
    • Over-smoothing
    • Over-squashing
    • Expressiveness

26 of 40

Limitations of graph neural networks: under-reach

Image adapted from Wu et al. 2019 A Comprehensive Survey on Graph Neural Networks

Require k graph layers to share information k edges away

Cannot learn long distance relationships

[Figure: node i's neighborhood reached by graph layer 1 and graph layer 2]
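Under-reach, made concrete: with k graph layers, a node's representation can only depend on nodes within k hops, so the minimum number of layers needed to share information between two nodes is their shortest-path distance. A small BFS check on a hypothetical 6-node path graph:

```python
from collections import deque

def hops(neighbors, src, dst):
    """Shortest-path distance = number of graph layers needed for
    information at dst to reach src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return dist[u]
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return None

# Path graph 0-1-2-3-4-5
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(hops(path, 0, 5))  # 5 layers needed to connect the endpoints
```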

27 of 40

Limitations of graph neural networks: over-smoothing

[Figure: 4-node input graph with node feature vectors; the features still differ after GNN layers 1 and 2, but by GNN layer 50 all four nodes have converged to the same feature vector]
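Over-smoothing can be demonstrated directly: repeatedly applying the mean-aggregation step (with no learned weights, a deliberate simplification) drives all node features toward the same vector:

```python
import numpy as np

# 4-node example graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
P = A_hat / A_hat.sum(axis=1, keepdims=True)  # mean-aggregation operator

H = np.random.randn(4, 3)
for _ in range(50):
    H = P @ H                                 # 50 rounds of aggregation

# Spread of each feature across the 4 nodes is now essentially zero
print(np.ptp(H, axis=0))
```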

28 of 40

Limitations of graph neural networks: over-squashing

Also known as bottlenecks

Image from Alon and Yahav ICLR 2021

Label green node from {A, B, C} based on number of neighbors

General graph

29 of 40

Limitations of graph neural networks: over-squashing

Images from Alon and Yahav ICLR 2021

Test case: binary tree with depth r, predict label of root

Information from all leaves must be aggregated in top node's representation

Evaluate different graph neural network variants

Accuracy drops for r > 4 or 5
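The squashing is easy to quantify: in the depth-r tree, information from all 2^r leaves must pass through the root's fixed-size vector, so the capacity available per leaf shrinks exponentially with depth (d = 64 is a hypothetical hidden dimension):

```python
d = 64  # hypothetical hidden dimension of the root's representation
per_leaf = {r: d / 2 ** r for r in range(2, 9)}
for r, cap in per_leaf.items():
    # 2**r leaves compete for the same d dimensions
    print(f"depth {r}: {2 ** r} leaves, {cap:.3f} dimensions per leaf")
```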

30 of 40

Limitations of graph neural networks: expressiveness

Image from PubChem

Image from GraphGPS blog post

Decalin molecule

Molecular graph

Standard graph neural network representation

Ideal representation
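The limitation illustrated by decalin is that message passing GNNs are at most as expressive as the 1-Weisfeiler-Lehman test, which cannot distinguish some non-isomorphic graphs. A self-contained check on a classic small case (a 6-cycle versus two triangles, standing in for the molecular example):

```python
from collections import Counter

def wl_colors(neighbors, rounds=3):
    """1-WL color refinement: iteratively relabel each node by its color
    plus the multiset of neighbor colors; return the color histogram."""
    colors = {v: 0 for v in neighbors}  # uniform initial colors
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in neighbors[v])))
                for v in neighbors}
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: relabel[sigs[v]] for v in neighbors}
    return Counter(colors.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# Both graphs are 2-regular, so every node keeps the same color and the
# histograms match even though the graphs are not isomorphic
print(wl_colors(cycle6) == wl_colors(two_triangles))  # True
```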

31 of 40

Biological applications

  • Initially popular in chemistry / biochemistry
  • Gained traction in many other areas
  • Biological network analysis with deep learning

  • Key considerations:
    • Unsupervised / supervised
    • Node / edge / graph predictions
    • What varies? Node features, graph structure, etc.

32 of 40

Predicting quantitative protein function

  • Input: Wild type protein structure, amino acid sequence
  • Output: Functional activity score
  • Gelman et al. 2021 doi:10.1073/pnas.2104878118

33 of 40

Predicting quantitative protein function

  • Importance of control experiments
  • Does graph neural network performance change with alternative graphs?

  • For these datasets, can often still get good performance with the control graphs
  • Requires fully connected layer after the graph layers

Gelman et al. 2021 doi:10.1073/pnas.2104878118

34 of 40

Predicting protein function

  • Input: Protein structure, residue (node) features
  • Output: Gene Ontology labels
  • Gligorijević et al. 2021 Nature Communications

35 of 40

Predicting epigenetic state

  • Input: DNA sequence, Hi-C contact maps
  • Output: TF binding, histone modifications, accessibility
  • Lanchantin and Qi 2020 Bioinformatics

36 of 40

Predicting polypharmacy side effects

  • Input: 3 types of protein and drug interactions, node features
  • Output: Types of side effects for drug-drug pair
  • Zitnik et al. 2018 Bioinformatics

37 of 40

Predicting drug properties

  • Input: Chemical structure
  • Output: Chemical’s ability to inhibit a drug target
  • Chen et al. 2018 Drug Discovery Today

Does this chemical bind an important protein domain and affect bioactivity?

38 of 40

Predicting drug properties

  • IDG-DREAM Drug-Kinase Binding Prediction Challenge
  • Predict binding affinity between 25-70 compounds and ~200 kinases
  • One top performer used graph convolutional neural networks
    • Park et al. 2019 DMIS_DK Submission
    • Tested multiple graph architectures in ensemble
    • Graph Attention Networks most important in ensemble but random forest also competitive
    • 78 node features from RDKit: atom type, degree, number of bonded hydrogens, valence, aromaticity

39 of 40

Conclusions

  • Graph convolutional networks can be applied to many of the other biological tasks in this course
  • There can be advantages to coupling node representation learning with the final prediction task
  • Very flexible formulation, can tweak almost anything
  • Many hyperparameters, non-trivial to train
  • Limitations in how information is shared across graph
  • Critical to benchmark against simpler models

40 of 40

Recent GNN extensions

  • Wang and Cho 2024 Non-convolutional Graph Neural Networks
    • A new approach for overcoming some GNN limitations; we won’t have time to discuss it
  • Finkelshtein et al. 2024 Cooperative Graph Neural Networks
    • GNN extension where nodes play a game and select among different actions during message passing