1 of 49

Graph convolutional neural networks

Oct 6th, 2022

BMI/CS 775 Computational Network Biology, Fall 2022

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

2 of 49

Topics in this section

  • Representation learning on graphs
  • Graph neural networks
  • Graph transformers
  • Generative graph models

3 of 49

Goals for today

  • Neural network definitions and appeal
    • Fully connected neural networks
    • Convolutional neural networks
    • Graph convolutional neural networks
  • Problem settings with graph convolutional neural networks
    • Node or edge classification
    • Graph classification
    • Node, edge, or graph embedding
  • Biological applications
    • Protein-protein interaction interface prediction
    • Essential gene prediction

4 of 49

Power of deep learning on graphs

    • “A folded protein can be thought of as a “spatial graph”, where residues are the nodes and edges connect the residues in close proximity… For the latest version of AlphaFold, used at CASP14, we created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building.”

5 of 49

Supervised graph prediction task

  • Given:
    • Graph structure Gi = (Vi , Ei )
    • Optional features on the nodes Vi
    • Optional features on the edges Ei
  • Do:
    • Predict a label yi for the graph
    • Can be discrete (classification) or continuous (regression)

  • Goal:
    • Use the node relationships to influence the predictions
    • Avoid deciding how to represent a graph structure or graph-graph similarity in advance

6 of 49

One advantage of neural networks

  • Many supervised learning problems require non-linear decision boundaries
  • Example of explicit feature engineering

x     y     sin(x*y)   x*y     x²*y²   label
-1.2  -2.3   0.37       2.76    7.62   True
 0.6   2.1   0.95       1.26    1.59   True
 2.8  -0.7  -0.92      -1.96    3.84   False
-1.9   1.8   0.27      -3.42   11.69   False
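The engineered features in the table above are simple functions of the raw inputs; a minimal sketch (the function name is illustrative):

```python
import math

def engineer_features(x, y):
    """Hand-crafted non-linear features for the (x, y) inputs in the table."""
    return {
        "x*y": x * y,
        "sin(x*y)": math.sin(x * y),
        "x^2*y^2": (x * y) ** 2,
    }

# First row of the table: x = -1.2, y = -2.3
feats = engineer_features(-1.2, -2.3)
print({k: round(v, 2) for k, v in feats.items()})
```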

7 of 49

One advantage of neural networks

  • Many supervised learning problems require non-linear decision boundaries
  • Example of explicit feature engineering

  • Neural networks can approximate these (and other) functions
  • Do not specify kernels or feature transformations in advance
  • Learn encoding from training data

x     y     sin(x*y)   x*y     x²*y²   label
-1.2  -2.3   0.37       2.76    7.62   True
 0.6   2.1   0.95       1.26    1.59   True
 2.8  -0.7  -0.92      -1.96    3.84   False
-1.9   1.8   0.27      -3.42   11.69   False

8 of 49

How do neural networks learn feature transformations?

  • Combine many simple non-linear transformations

Actual neural network

Artificial neural network

(perceptron)

9 of 49

How a perceptron works

Inputs: 1, 0, 1, 1
Weights: -1, -2, 3, 2

Fire (output 1) if weighted sum > 1, otherwise output 0

Weighted sum = 1*-1 + 0*-2 + 1*3 + 1*2 = 4

Output: 1 (fire)
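The firing computation is just a dot product followed by a threshold; a minimal sketch:

```python
def perceptron(inputs, weights, threshold=1):
    """Fire (output 1) if the weighted sum of inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# The example above: inputs 1, 0, 1, 1 with weights -1, -2, 3, 2
# Weighted sum = 1*-1 + 0*-2 + 1*3 + 1*2 = 4 > 1, so the unit fires
print(perceptron([1, 0, 1, 1], [-1, -2, 3, 2]))  # 1
```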

10 of 49

Learning perceptron weights (parameters)

Inputs: x, y
Weights: wx, wy

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

We need to choose weights wx and wy

11 of 49

Learning perceptron weights

Random guess: wx = -1 and wy = 3

Inputs: x, y
Weights: -1, 3

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

12 of 49

Learning perceptron weights

Random guess: wx = -1 and wy = 3

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 3

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

13 of 49

Learning perceptron weights

Random guess: wx = -1 and wy = 3

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 3

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = 1*-1 + 1*3 = 2 (True), but the label is False

14 of 49

Learning perceptron weights

Update our guess: wx = -1 and wy = 1

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

15 of 49

Learning perceptron weights

Update our guess: wx = -1 and wy = 1

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = 1*-1 + 1*1 = 0 (False), matching the label

16 of 49

Learning perceptron weights

Does it work for another example?

Training data:
x  y  label
1  1  False
9  4  True

Inputs: 9, 4
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

17 of 49

Learning perceptron weights

Does it work for another example?

Training data:
x  y  label
1  1  False
9  4  True

Inputs: 9, 4
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = 9*-1 + 4*1 = -5 (False), but the label is True, so more updates are needed

18 of 49

Perceptron observations

  • Random guessing is fine for initialization
  • Need a way to formally update weights to fit training data
    • Use gradient of objective function
  • Not yet a non-linear classifier
    • Add activation functions
    • Combine many different perceptrons
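The by-hand weight updates on the previous slides can be automated. A minimal sketch of the classic perceptron update rule (the learning rate and resulting weights are illustrative; they need not match the weights guessed by hand above):

```python
def predict(w, x, threshold=1):
    """Output 1 (True) if the weighted sum exceeds the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

def perceptron_update(w, x, target, lr=1.0):
    """Classic perceptron rule: nudge weights toward misclassified examples."""
    error = target - predict(w, x)          # -1, 0, or +1
    return [wi + lr * error * xi for wi, xi in zip(w, x)]

w = [-1, 3]                                 # random initial guess
w = perceptron_update(w, [1, 1], 0)         # example (1, 1) labeled False
print(w, predict(w, [1, 1]))
```

With a gradient of a differentiable objective function (rather than this error-driven rule), the same loop generalizes to multi-layer networks.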

19 of 49

Activation functions

  • What makes the neuron “fire”?
    • Step function

    • Sigmoid function

    • Rectified linear unit (ReLU)
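The three activation functions above can be written in a few lines:

```python
import math

def step(z):
    """Step: fire fully once the input crosses zero."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Sigmoid: a smooth, differentiable version of the step."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """ReLU: pass positive inputs through, zero out the rest."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```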

20 of 49

From perceptrons to fully connected neural networks

  • Try varying weights, hidden units, and layers
    • What patterns can you learn with 0 hidden layers?
    • 1 hidden layer?
    • More?

21 of 49

From perceptrons to fully connected neural networks

  • If we stack perceptrons, we get a fully connected neural network
  • Has to learn weights independently for each input feature (e.g. node)
  • Not using edges of input graph

Input graph

Hidden layers

(perceptrons)

Prediction

22 of 49

Some datasets have special structure

  • 1D sequences

    One-hot encoding of DNA sequence:

       A G G C A G T G C G
    A  1 0 0 0 1 0 0 0 0 0
    C  0 0 0 1 0 0 0 0 1 0
    G  0 1 1 0 0 1 0 1 0 1
    T  0 0 0 0 0 0 1 0 0 0

    AG pattern repeats; GC pattern repeats

  • 2D images

    Bowling ball pattern appears in different locations (images 1, 2, 3)

23 of 49

Some datasets have special structure

  • 1D sequences
  • Convolutional neural networks
  • Share parameters instead of trying to recognize patterns independently in each position

    One-hot encoding of DNA sequence:

       A G G C A G T G C G
    A  1 0 0 0 1 0 0 0 0 0
    C  0 0 0 1 0 0 0 0 1 0
    G  0 1 1 0 0 1 0 1 0 1
    T  0 0 0 0 0 0 1 0 0 0

    AG pattern repeats; GC pattern repeats

    Width-2 convolutional filter over the 4 bases:

    w w
    w w
    w w
    w w

    8 independent parameters, shared across positions
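The parameter sharing above can be sketched in code: one width-2 filter (8 weights) slides along the one-hot encoded sequence. The filter below is a hypothetical choice that scores highest wherever the AG pattern occurs.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string into a (4, length) matrix."""
    mat = np.zeros((4, len(seq)))
    for pos, base in enumerate(seq):
        mat[BASES.index(base), pos] = 1
    return mat

def conv1d(x, filt):
    """Slide a (4, width) filter along the sequence, reusing its weights
    at every position (parameter sharing)."""
    width = filt.shape[1]
    return np.array([np.sum(x[:, i:i + width] * filt)
                     for i in range(x.shape[1] - width + 1)])

x = one_hot("AGGCAGTGCG")
# Hypothetical width-2 filter: weight 1 on A at offset 0 and G at offset 1
ag_filter = np.zeros((4, 2))
ag_filter[BASES.index("A"), 0] = 1
ag_filter[BASES.index("G"), 1] = 1
print(conv1d(x, ag_filter))  # score 2 at positions 0 and 4, where AG occurs
```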

24 of 49

Adapting convolutions for graphs

2D images: look for common patterns in local regions

Graphs: look for common patterns in node neighborhoods

25 of 49

Challenge for convolutions on graphs

2D images: fixed number of pixels

Graphs: graph size varies, and node degree varies

One solution is to share parameters for each neighbor

26 of 49

Graph convolution definitions

 

27 of 49

Basic graph convolutional neural network

 

 

 

28 of 49

Graph convolution example

 

 

 

 

Node features (3 per node):

  Center node: [1.1, -0.3, 2.1]
  Neighbors:   [3.0, 0.8, -2.6], [1.9, 2.4, 0.1], [-0.9, -1.4, -2.3], [0.3, 3.2, 2.7]

Weights for the node's own features (3 input features, 2 output features):

  [ 1  0  1 ]
  [-1  0  1 ]

Shared weights for neighbor features:

  [ 1  2  0 ]
  [-1  1  1 ]

29 of 49

Graph convolution example

 

 

 

Transform each neighbor's feature vector with the shared neighbor weights
[ 1  2  0 ]
[-1  1  1 ]

  [3.0, 0.8, -2.6]   ->  [4.6, -4.8]
  [1.9, 2.4, 0.1]    ->  [6.7, 0.6]
  [-0.9, -1.4, -2.3] ->  [-3.7, -2.8]
  [0.3, 3.2, 2.7]    ->  [6.7, 5.6]

The same weight matrix is applied to all four neighbors (parameter sharing)

30 of 49

Graph convolution example

 

 

 

 

 

 

 

 

Transformed neighbor features:

  [4.6, -4.8], [6.7, 0.6], [-3.7, -2.8], [6.7, 5.6]

Elementwise average  ->  [3.6, -1.4]

31 of 49

Graph convolution example

 

 

 

Transform the center node's own features:

  [ 1  0  1 ] applied to [1.1, -0.3, 2.1]  ->  [3.2, 1.0]
  [-1  0  1 ]

32 of 49

Graph convolution example

 

 

 

Add the aggregated neighbor features and the transformed self features:

  [3.6, -1.4] + [3.2, 1.0] = [6.8, -0.4]

Apply ReLU: [6.8, -0.4]  ->  [6.8, 0]

Interactive examples in Distill GNN articles by Sanchez-Lengeling et al. and Daigavane et al.
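The worked example can be sketched as a single layer that transforms self and neighbor features with two weight matrices (here called w_self and w_neigh, my naming), aggregates over neighbors, adds, and applies ReLU. A minimal NumPy sketch, assuming an elementwise average over neighbors; the printed values are unrounded, so they can differ from the slide numbers:

```python
import numpy as np

def gcn_layer(x_self, neighbors, w_self, w_neigh):
    """One graph convolution: transform self and neighbor features with
    shared weights, average over neighbors, add, then apply ReLU."""
    neigh_agg = np.mean([w_neigh @ x for x in neighbors], axis=0)
    return np.maximum(0, w_self @ x_self + neigh_agg)

# Features and weight matrices from the worked example above
x_self = np.array([1.1, -0.3, 2.1])
neighbors = [np.array([3.0, 0.8, -2.6]), np.array([1.9, 2.4, 0.1]),
             np.array([-0.9, -1.4, -2.3]), np.array([0.3, 3.2, 2.7])]
w_self = np.array([[1, 0, 1], [-1, 0, 1]])
w_neigh = np.array([[1, 2, 0], [-1, 1, 1]])

print(gcn_layer(x_self, neighbors, w_self, w_neigh))
```

Other aggregators (sum, max, degree-normalized mean) plug into the same structure and change the exact numbers.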

33 of 49

Multiple graph convolutional layers

Image from Gligorijevic et al. 2019 bioRxiv 786236

 

 

 

Stack K graph convolutional layers

34 of 49

What type of prediction to make?

  •  

35 of 49

Aggregation or readout layer

Can be followed by a multilayer perceptron (fully connected layers)
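A mean readout followed by one fully connected layer might look like this (the node embeddings, weights, and bias below are hypothetical):

```python
import numpy as np

def readout(node_embeddings, w, b):
    """Aggregate node embeddings (mean) into one graph-level vector,
    then apply a fully connected layer to produce a graph score."""
    graph_vec = np.mean(node_embeddings, axis=0)   # nodes -> one vector
    return w @ graph_vec + b                       # e.g. a class score

# Three node embeddings from the final graph convolutional layer
h = np.array([[6.8, 0.0], [1.2, 0.4], [0.1, 2.6]])
w = np.array([0.5, -0.25])   # hypothetical readout weights
print(readout(h, w, b=0.1))
```

Because the mean is over however many nodes the graph has, the same readout works for graphs of any size.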

 

36 of 49

Top Hat question

37 of 49

Example of biological graph convolutional neural networks

  • Many network algorithms require accurate protein-protein interaction networks
  • Experimental networks are incomplete, so many approaches exist to predict interactions

  • A related problem is predicting which parts of interacting proteins directly interface
  • Can frame protein-protein interface prediction as edge prediction
    • Represent each protein as a graph
    • Predict class label for a pair of residues

38 of 49

Protein Interface Prediction using Graph Convolutional Networks

Red: ligand

Blue: receptor

Yellow: interface

Image from Fout et al. NIPS 2017

Predict this: interacting residues

39 of 49

Protein Interface Prediction using Graph Convolutional Networks

Pair representations of residue x from graph 1 and residue y from graph 2

Image from Fout et al. NIPS 2017

40 of 49

Representing proteins as a graph

  • For each residue, search within fixed radius in 3D space
  • Or take k nearest residues
  • Define contact map (adjacency matrix)

Image from Gligorijevic et al. 2019 bioRxiv 786236, reviewed by Fasoulis et al. 2021
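The fixed-radius construction can be sketched directly (the coordinates and radius below are toy values, not from a real protein):

```python
import numpy as np

def contact_map(coords, radius=10.0):
    """Adjacency matrix: connect residues whose 3D distance is
    within a fixed radius."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))   # pairwise distance matrix
    adj = (dist <= radius).astype(int)
    np.fill_diagonal(adj, 0)              # no self edges
    return adj

# Toy 3D coordinates for four residues
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0], [30.0, 0.0, 0.0]])
print(contact_map(coords, radius=10.0))
```

The k-nearest-residues variant would instead keep the k smallest distances in each row of the distance matrix.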

41 of 49

Featurizing protein graphs

Image from Fout et al. NIPS 2017

42 of 49

Evaluating protein interface prediction

  • Benchmark with Docking Benchmark Dataset
  • Consider multiple graph neural networks
  • Best model with no graph convolution: 0.812 AUC
  • Best model with graph convolutions: 0.891 AUC
  • 3 graph convolution layers better than 4

43 of 49

Predicting gene essentiality

  • Yeast has approximately 6,000 genes
  • Roughly 20% are essential, or required for growth
  • Deletion strain is inviable

Figure: deletion examples for Gene 1 and Gene 2, one a non-essential gene and one an essential gene

44 of 49

Predicting gene essentiality

  • Relationship between yeast gene essentiality and centrality in protein interaction network

Image from Jeong et al. 2001 Nature

Red: lethal

Orange: slow-growth

Green: non-lethal

Yellow: unknown

45 of 49

Predicting gene essentiality

  • Can we predict yeast gene essentiality?
  • Node classification problem
  • Graph structure: protein-protein interactions
  • Node features: gene expression data, other phenotypes, etc.

  • EPGAT: Gene Essentiality Prediction With Graph Attention Networks (Schapke et al. 2021)

46 of 49

Conclusions

  • Graph convolutions generalize 1D sequence and 2D image convolutions
    • Take advantage of parameter sharing and invariances
  • Basic operation: get neighbors’ features, multiply by weight vector(s) to transform, aggregate, non-linear activation
  • One framework adapts to multiple problem settings
    • Unsupervised or supervised
    • Node, edge, or graph tasks
  • Unlike graph kernels, they customize (learn parameters) from training data

47 of 49

Resources

48 of 49

Kernels for graph classification

  • A kernel k is a function that maps pairs of objects to a real-valued space
  • It enables one to define very general similarity functions between pairs of objects
  • Let X denote a set of objects of a particular type
    • Such as an entire graph!
  • A kernel k is defined as k : X × X → ℝ

Slide from Prof. Sushmita Roy

49 of 49

Kernels for graph classification

  • Given a kernel function k we can solve the graph classification task
  • Many types of graph kernels
    • Shortest path-based
    • Random walk-based
    • Graph featurization-based
    • Kriege et al. 2019 A Survey on Graph Kernels
  • Support vector machines and other kernel methods are directly applicable after defining the kernel
  • Possible limitation: the type of kernel must be decided in advance
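A featurization-based kernel makes the definition concrete: map each graph to a feature vector and take an inner product. A degree histogram is one simple choice of featurization (this sketch assumes small graphs whose degrees fit the histogram length):

```python
import numpy as np

def degree_histogram(adj, max_degree=5):
    """Map a graph (adjacency matrix) to a fixed-length feature vector:
    counts of nodes with degree 0, 1, ..., max_degree."""
    degrees = adj.sum(axis=1).astype(int)
    return np.bincount(degrees, minlength=max_degree + 1)[:max_degree + 1]

def graph_kernel(adj1, adj2):
    """k(G1, G2): inner product of the two graphs' feature vectors."""
    return float(degree_histogram(adj1) @ degree_histogram(adj2))

# A triangle and a 3-node path
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(graph_kernel(triangle, path))
```

The kernel matrix over a set of graphs can then be handed directly to a support vector machine; the limitation noted above is that the featurization is fixed in advance rather than learned.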