1 of 49

Graph convolutional neural networks

Oct 6th, 2022

BMI/CS 775 Computational Network Biology, Fall 2022

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

2 of 49

Topics in this section

  • Representation learning on graphs
  • Graph neural networks
  • Graph transformers
  • Generative graph models

3 of 49

Goals for today

  • Neural network definitions and appeal
    • Fully connected neural networks
    • Convolutional neural networks
    • Graph convolutional neural networks
  • Problem settings with graph convolutional neural networks
    • Node or edge classification
    • Graph classification
    • Node, edge, or graph embedding
  • Biological applications
    • Protein-protein interaction interface prediction
    • Essential gene prediction

4 of 49

Power of deep learning on graphs

    • “A folded protein can be thought of as a “spatial graph”, where residues are the nodes and edges connect the residues in close proximity… For the latest version of AlphaFold, used at CASP14, we created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building.”

5 of 49

Supervised graph prediction task

  • Given:
    • Graph structure Gi = (Vi , Ei )
    • Optional features on the nodes Vi
    • Optional features on the edges Ei
  • Do:
    • Predict a label yi for the graph
    • Can be discrete (classification) or continuous (regression)

  • Goal:
    • Use the node relationships to influence the predictions
    • Avoid deciding how to represent a graph structure or graph-graph similarity in advance

6 of 49

One advantage of neural networks

  • Many supervised learning problems require non-linear decision boundaries
  • Example of explicit feature engineering

x     y     sin(x*y)   x*y     x²*y²   label
-1.2  -2.3   0.37       2.76    7.62   True
 0.6   2.1   0.95       1.26    1.59   True
 2.8  -0.7  -0.92      -1.96    3.84   False
-1.9   1.8   0.27      -3.42   11.69   False
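The engineered features in the table above are simple functions of the raw inputs; a minimal sketch (the function name is illustrative):

```python
import math

def engineer_features(x, y):
    """Hand-crafted non-linear features for the (x, y) inputs in the table."""
    return {
        "x*y": x * y,
        "sin(x*y)": math.sin(x * y),
        "x^2*y^2": (x * y) ** 2,
    }

# First row of the table: x = -1.2, y = -2.3
feats = engineer_features(-1.2, -2.3)
print({k: round(v, 2) for k, v in feats.items()})
```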

7 of 49

One advantage of neural networks

  • Many supervised learning problems require non-linear decision boundaries
  • Example of explicit feature engineering

  • Neural networks can approximate these (and other) functions
  • Do not specify kernels or feature transformations in advance
  • Learn encoding from training data

x     y     sin(x*y)   x*y     x²*y²   label
-1.2  -2.3   0.37       2.76    7.62   True
 0.6   2.1   0.95       1.26    1.59   True
 2.8  -0.7  -0.92      -1.96    3.84   False
-1.9   1.8   0.27      -3.42   11.69   False

8 of 49

How do neural networks learn feature transformations?

  • Combine many simple non-linear transformations

Actual neural network

Artificial neural network

(perceptron)

9 of 49

How a perceptron works

Inputs: 1, 0, 1, 1
Weights: -1, -2, 3, 2

Fire (output 1) if weighted sum > 1, otherwise output 0

Weighted sum = 1*-1 + 0*-2 + 1*3 + 1*2 = 4

Output: 1 (fire)
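The firing computation is just a dot product followed by a threshold; a minimal sketch:

```python
def perceptron(inputs, weights, threshold=1):
    """Fire (output 1) if the weighted sum of inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# The example above: inputs 1, 0, 1, 1 with weights -1, -2, 3, 2
# Weighted sum = 1*-1 + 0*-2 + 1*3 + 1*2 = 4 > 1, so the unit fires
print(perceptron([1, 0, 1, 1], [-1, -2, 3, 2]))  # 1
```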

10 of 49

Learning perceptron weights (parameters)

Inputs: x, y
Weights: wx, wy

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

We need to choose weights wx and wy

11 of 49

Learning perceptron weights

Random guess: wx = -1 and wy = 3

Inputs: x, y
Weights: -1, 3

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

12 of 49

Learning perceptron weights

Random guess: wx = -1 and wy = 3

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 3

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

13 of 49

Learning perceptron weights

Random guess: wx = -1 and wy = 3

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 3

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = 1*-1 + 1*3 = 2 (True), but the label is False

14 of 49

Learning perceptron weights

Update our guess: wx = -1 and wy = 1

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

15 of 49

Learning perceptron weights

Update our guess: wx = -1 and wy = 1

Training data:
x  y  label
1  1  False

Inputs: 1, 1
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = 1*-1 + 1*1 = 0 (False), matching the label

16 of 49

Learning perceptron weights

Does it work for another example?

Training data:
x  y  label
1  1  False
9  4  True

Inputs: 9, 4
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = ?

17 of 49

Learning perceptron weights

Does it work for another example?

Training data:
x  y  label
1  1  False
9  4  True

Inputs: 9, 4
Weights: -1, 1

Output 1 (True) if weighted sum > 1, otherwise 0 (False)

Weighted sum = 9*-1 + 4*1 = -5 (False), but the label is True, so more updates are needed

18 of 49

Perceptron observations

  • Random guessing is fine for initialization
  • Need a way to formally update weights to fit training data
    • Use gradient of objective function
  • Not yet a non-linear classifier
    • Add activation functions
    • Combine many different perceptrons
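The by-hand weight updates on the previous slides can be automated. A minimal sketch of the classic perceptron update rule (the learning rate and resulting weights are illustrative; they need not match the weights guessed by hand above):

```python
def predict(w, x, threshold=1):
    """Output 1 (True) if the weighted sum exceeds the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

def perceptron_update(w, x, target, lr=1.0):
    """Classic perceptron rule: nudge weights toward misclassified examples."""
    error = target - predict(w, x)          # -1, 0, or +1
    return [wi + lr * error * xi for wi, xi in zip(w, x)]

w = [-1, 3]                                 # random initial guess
w = perceptron_update(w, [1, 1], 0)         # example (1, 1) labeled False
print(w, predict(w, [1, 1]))
```

With a gradient of a differentiable objective function (rather than this error-driven rule), the same loop generalizes to multi-layer networks.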

19 of 49

Activation functions

  • What makes the neuron “fire”?
    • Step function

    • Sigmoid function

    • Rectified linear unit (ReLU)
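The three activation functions above can be written in a few lines:

```python
import math

def step(z):
    """Step: fire fully once the input crosses zero."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Sigmoid: a smooth, differentiable version of the step."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """ReLU: pass positive inputs through, zero out the rest."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```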

20 of 49

From perceptrons to fully connected neural networks

  • Try varying weights, hidden units, and layers
    • What patterns can you learn with 0 hidden layers?
    • 1 hidden layer?
    • More?

21 of 49

From perceptrons to fully connected neural networks

  • If we stack perceptrons, we get a fully connected neural network
  • Has to learn weights independently for each input feature (e.g. node)
  • Not using edges of input graph

Input graph

Hidden layers

(perceptrons)

Prediction

22 of 49

Some datasets have special structure

  • 1D sequences

    One-hot encoding of DNA sequence:

       A G G C A G T G C G
    A  1 0 0 0 1 0 0 0 0 0
    C  0 0 0 1 0 0 0 0 1 0
    G  0 1 1 0 0 1 0 1 0 1
    T  0 0 0 0 0 0 1 0 0 0

    AG pattern repeats; GC pattern repeats

  • 2D images

    Bowling ball pattern appears in different locations (images 1, 2, 3)

23 of 49

Some datasets have special structure

  • 1D sequences
  • Convolutional neural networks
  • Share parameters instead of trying to recognize patterns independently in each position

    One-hot encoding of DNA sequence:

       A G G C A G T G C G
    A  1 0 0 0 1 0 0 0 0 0
    C  0 0 0 1 0 0 0 0 1 0
    G  0 1 1 0 0 1 0 1 0 1
    T  0 0 0 0 0 0 1 0 0 0

    AG pattern repeats; GC pattern repeats

    Width-2 convolutional filter over the 4 bases:

    w w
    w w
    w w
    w w

    8 independent parameters, shared across positions
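The parameter sharing above can be sketched in code: one width-2 filter (8 weights) slides along the one-hot encoded sequence. The filter below is a hypothetical choice that scores highest wherever the AG pattern occurs.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string into a (4, length) matrix."""
    mat = np.zeros((4, len(seq)))
    for pos, base in enumerate(seq):
        mat[BASES.index(base), pos] = 1
    return mat

def conv1d(x, filt):
    """Slide a (4, width) filter along the sequence, reusing its weights
    at every position (parameter sharing)."""
    width = filt.shape[1]
    return np.array([np.sum(x[:, i:i + width] * filt)
                     for i in range(x.shape[1] - width + 1)])

x = one_hot("AGGCAGTGCG")
# Hypothetical width-2 filter: weight 1 on A at offset 0 and G at offset 1
ag_filter = np.zeros((4, 2))
ag_filter[BASES.index("A"), 0] = 1
ag_filter[BASES.index("G"), 1] = 1
print(conv1d(x, ag_filter))  # score 2 at positions 0 and 4, where AG occurs
```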

24 of 49

Adapting convolutions for graphs

2D images: look for common patterns in local regions

Graphs: look for common patterns in node neighborhoods

25 of 49

Challenge for convolutions on graphs

2D images: fixed number of pixels

Graphs: graph size varies, and node degree varies

One solution is to share parameters for each neighbor

26 of 49

Graph convolution definitions

 

27 of 49

Basic graph convolutional neural network

 

 

 

28 of 49

Graph convolution example

 

 

 

 

Node features (3 per node):

  Center node: [1.1, -0.3, 2.1]
  Neighbors:   [3.0, 0.8, -2.6], [1.9, 2.4, 0.1], [-0.9, -1.4, -2.3], [0.3, 3.2, 2.7]

Weights for the node's own features (3 input features, 2 output features):

  [ 1  0  1 ]
  [-1  0  1 ]

Shared weights for neighbor features:

  [ 1  2  0 ]
  [-1  1  1 ]

29 of 49

Graph convolution example

 

 

 

Transform each neighbor's feature vector with the shared neighbor weights
[ 1  2  0 ]
[-1  1  1 ]

  [3.0, 0.8, -2.6]   ->  [4.6, -4.8]
  [1.9, 2.4, 0.1]    ->  [6.7, 0.6]
  [-0.9, -1.4, -2.3] ->  [-3.7, -2.8]
  [0.3, 3.2, 2.7]    ->  [6.7, 5.6]

The same weight matrix is applied to all four neighbors (parameter sharing)

30 of 49

Graph convolution example

 

 

 

 

 

 

 

 

Transformed neighbor features:

  [4.6, -4.8], [6.7, 0.6], [-3.7, -2.8], [6.7, 5.6]

Elementwise average  ->  [3.6, -1.4]

31 of 49

Graph convolution example

 

 

 

Transform the center node's own features:

  [ 1  0  1 ] applied to [1.1, -0.3, 2.1]  ->  [3.2, 1.0]
  [-1  0  1 ]

32 of 49

Graph convolution example

 

 

 

Add the aggregated neighbor features and the transformed self features:

  [3.6, -1.4] + [3.2, 1.0] = [6.8, -0.4]

Apply ReLU: [6.8, -0.4]  ->  [6.8, 0]

Interactive examples in Distill GNN articles by Sanchez-Lengeling et al. and Daigavane et al.
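The worked example can be sketched as a single layer that transforms self and neighbor features with two weight matrices (here called w_self and w_neigh, my naming), aggregates over neighbors, adds, and applies ReLU. A minimal NumPy sketch, assuming an elementwise average over neighbors; the printed values are unrounded, so they can differ from the slide numbers:

```python
import numpy as np

def gcn_layer(x_self, neighbors, w_self, w_neigh):
    """One graph convolution: transform self and neighbor features with
    shared weights, average over neighbors, add, then apply ReLU."""
    neigh_agg = np.mean([w_neigh @ x for x in neighbors], axis=0)
    return np.maximum(0, w_self @ x_self + neigh_agg)

# Features and weight matrices from the worked example above
x_self = np.array([1.1, -0.3, 2.1])
neighbors = [np.array([3.0, 0.8, -2.6]), np.array([1.9, 2.4, 0.1]),
             np.array([-0.9, -1.4, -2.3]), np.array([0.3, 3.2, 2.7])]
w_self = np.array([[1, 0, 1], [-1, 0, 1]])
w_neigh = np.array([[1, 2, 0], [-1, 1, 1]])

print(gcn_layer(x_self, neighbors, w_self, w_neigh))
```

Other aggregators (sum, max, degree-normalized mean) plug into the same structure and change the exact numbers.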

33 of 49

Multiple graph convolutional layers

Image from Gligorijevic et al. 2019 bioRxiv 786236

 

 

 

Stack K graph convolutional layers

34 of 49

What type of prediction to make?

  •  

35 of 49

Aggregation or readout layer

Can be followed by a multilayer perceptron (fully connected layers)
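A mean readout followed by one fully connected layer might look like this (the node embeddings, weights, and bias below are hypothetical):

```python
import numpy as np

def readout(node_embeddings, w, b):
    """Aggregate node embeddings (mean) into one graph-level vector,
    then apply a fully connected layer to produce a graph score."""
    graph_vec = np.mean(node_embeddings, axis=0)   # nodes -> one vector
    return w @ graph_vec + b                       # e.g. a class score

# Three node embeddings from the final graph convolutional layer
h = np.array([[6.8, 0.0], [1.2, 0.4], [0.1, 2.6]])
w = np.array([0.5, -0.25])   # hypothetical readout weights
print(readout(h, w, b=0.1))
```

Because the mean is over however many nodes the graph has, the same readout works for graphs of any size.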

 

36 of 49

Top Hat question

37 of 49

Example of biological graph convolutional neural networks

  • Many network algorithms require accurate protein-protein interaction networks
  • Experimental networks are incomplete, so many approaches exist to predict interactions

  • A related problem is predicting which parts of interacting proteins directly interface
  • Can frame protein-protein interface prediction as edge prediction
    • Represent each protein as a graph
    • Predict class label for a pair of residues

38 of 49

Protein Interface Prediction using Graph Convolutional Networks

Red: ligand

Blue: receptor

Yellow: interface

Image from Fout et al. NIPS 2017

Predict this: interacting residues

39 of 49

Protein Interface Prediction using Graph Convolutional Networks

Pair representations of residue x from graph 1 and residue y from graph 2

Image from Fout et al. NIPS 2017

40 of 49

Representing proteins as a graph

  • For each residue, search within fixed radius in 3D space
  • Or take k nearest residues
  • Define contact map (adjacency matrix)

Image from Gligorijevic et al. 2019 bioRxiv 786236, reviewed by Fasoulis et al. 2021
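The fixed-radius construction can be sketched directly (the coordinates and radius below are toy values, not from a real protein):

```python
import numpy as np

def contact_map(coords, radius=10.0):
    """Adjacency matrix: connect residues whose 3D distance is
    within a fixed radius."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))   # pairwise distance matrix
    adj = (dist <= radius).astype(int)
    np.fill_diagonal(adj, 0)              # no self edges
    return adj

# Toy 3D coordinates for four residues
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0], [30.0, 0.0, 0.0]])
print(contact_map(coords, radius=10.0))
```

The k-nearest-residues variant would instead keep the k smallest distances in each row of the distance matrix.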

41 of 49

Featurizing protein graphs

Image from Fout et al. NIPS 2017

42 of 49

Evaluating protein interface prediction

  • Benchmark with Docking Benchmark Dataset
  • Consider multiple graph neural networks
  • Best model with no graph convolution: 0.812 AUC
  • Best model with graph convolutions: 0.891 AUC
  • 3 graph convolution layers better than 4

43 of 49

Predicting gene essentiality

  • Yeast has approximately 6,000 genes
  • Roughly 20% are essential, or required for growth
  • Deletion strain is inviable

Figure: deletion examples for Gene 1 and Gene 2, one a non-essential gene and one an essential gene

44 of 49

Predicting gene essentiality

  • Relationship between yeast gene essentiality and centrality in protein interaction network

Image from Jeong et al. 2001 Nature

Red: lethal

Orange: slow-growth

Green: non-lethal

Yellow: unknown

45 of 49

Predicting gene essentiality

  • Can we predict yeast gene essentiality?
  • Node classification problem
  • Graph structure: protein-protein interactions
  • Node features: gene expression data, other phenotypes, etc.

  • EPGAT: Gene Essentiality Prediction With Graph Attention Networks (Schapke et al. 2021)

46 of 49

Conclusions

  • Graph convolutions generalize 1D sequence and 2D image convolutions
    • Take advantage of parameter sharing and invariances
  • Basic operation: get neighbors’ features, multiply by weight vector(s) to transform, aggregate, non-linear activation
  • One framework adapts to multiple problem settings
    • Unsupervised or supervised
    • Node, edge, or graph tasks
  • Unlike graph kernels, they customize (learn parameters) from training data

47 of 49

Resources

48 of 49

Kernels for graph classification

  • A kernel k is a function that maps pairs of objects to a real-valued space
  • It enables one to define very general similarity functions between pairs of objects
  • Let X denote a set of objects of a particular type
    • Such as an entire graph!
  • A kernel k is defined as k : X × X → ℝ

Slide from Prof. Sushmita Roy

49 of 49

Kernels for graph classification

  • Given a kernel function k we can solve the graph classification task
  • Many types of graph kernels
    • Shortest path-based
    • Random walk-based
    • Graph featurization-based
    • Kriege et al. 2019 A Survey on Graph Kernels
  • Support vector machines and other kernel methods are directly applicable after defining the kernel
  • Possible limitation: the type of kernel must be decided in advance
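A featurization-based kernel makes the definition concrete: map each graph to a feature vector and take an inner product. A degree histogram is one simple choice of featurization (this sketch assumes small graphs whose degrees fit the histogram length):

```python
import numpy as np

def degree_histogram(adj, max_degree=5):
    """Map a graph (adjacency matrix) to a fixed-length feature vector:
    counts of nodes with degree 0, 1, ..., max_degree."""
    degrees = adj.sum(axis=1).astype(int)
    return np.bincount(degrees, minlength=max_degree + 1)[:max_degree + 1]

def graph_kernel(adj1, adj2):
    """k(G1, G2): inner product of the two graphs' feature vectors."""
    return float(degree_histogram(adj1) @ degree_histogram(adj2))

# A triangle and a 3-node path
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(graph_kernel(triangle, path))
```

The kernel matrix over a set of graphs can then be handed directly to a support vector machine; the limitation noted above is that the featurization is fixed in advance rather than learned.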