
Explaining Graph Neural Networks

Fabrizio Silvestri

Artificial Intelligence for Biomedical Applications - AI4BA 2024

25 June 2024


Introduction to Machine Learning

Where we quickly recap what ML is and why it is important…


Computer Programming

  • Computers are designed to be programmed by humans in order to solve a task/problem more quickly and reliably than humans can
  • Example:
    • Task/Problem: find the maximum element of a list of 1 million unsorted numbers
    • Solution/Algorithm: scan all the numbers in the list and keep track of the largest found "so far"
    • Code/Program: encode the algorithm above in one specific programming language (e.g., C/C++, Java, Python); a sketch follows below
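
A minimal Python sketch of that last step (find_max is an illustrative name; any language works):

    import random

    # Scan all the numbers once, keeping track of the largest found "so far".
    def find_max(numbers):
        largest = numbers[0]
        for x in numbers[1:]:
            if x > largest:
                largest = x
        return largest

    data = [random.random() for _ in range(1_000_000)]
    assert find_max(data) == max(data)  # agrees with Python's built-in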

[Diagram: Problem → Solution/Algorithm, explicitly designed by a human → Code/Program, the binary encoding of a computable function f]

[Diagram: the program computes Y = f(X), mapping a binary-encoded Input X to an Output Y]

Can We Always Do That?


Chihuahua or Muffin?


Well…


Programming vs. Learning

  • There exist problems, like the chihuahua vs. muffin one above, that are too hard to solve directly
  • It is hard to design an algorithm general enough to capture all the nuances of the problem and return the correct output for every input


Programming vs. "Training" a Computer

Machine Learning

[Diagram: from a Problem plus Labeled Data, the computer learns the Solution/Algorithm, i.e., the binary encoding of a computable function f, from (input, output) pairs]

Eventually, the function f is learned by the learning algorithm from a (large) set of labeled data

Machine Learning

  • A broad discipline concerned with how to teach machines to learn (i.e., extract knowledge) from data
  • Here are the two main definitions:

"The field of study that gives computers the ability to learn without being explicitly programmed"

Arthur Samuel

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"

Tom Mitchell


Machine Learning: A Taxonomy

Supervised Learning

Where we introduce the tasks that are among the most used …

The Pipeline

0. Be sure your problem actually needs to be tackled with Machine Learning techniques (i.e., there is no point in adopting a fancy ML solution if the problem can be solved “directly”!)

1. Data collection: get data from your domain of interest

2. Feature extraction: represent data in a “machine-friendly” format

3. Model training: “build” one (or more) learning models

4. Model selection/evaluation: pick the best-performing model according to some quality metrics

Feature Extraction

Collected data need to be encoded in a machine-readable format.

[Figure: Domain Objects translated into rows of numeric feature values; the i-th object yields a vector whose j-th entry is its j-th feature]

Each domain object is translated into an n-dimensional vector of features.
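
A small sketch of the result (the objects and feature names below are hypothetical):

    import numpy as np

    # Hypothetical domain objects with hand-picked attributes.
    patients = [
        {"age": 63, "weight_kg": 81.5, "smoker": 1},
        {"age": 47, "weight_kg": 66.0, "smoker": 0},
        {"age": 55, "weight_kg": 72.3, "smoker": 1},
    ]
    feature_names = ["age", "weight_kg", "smoker"]

    # X[i, j] = j-th feature of the i-th object.
    X = np.array([[p[f] for f in feature_names] for p in patients])
    print(X.shape)  # (3, 3): one n-dimensional vector per object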


The Manifold Hypothesis

High-dimensional data (e.g., images) lie on low-dimensional manifolds (i.e., sub-spaces) embedded in the high-dimensional space
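
A quick way to probe this on real data, as a sketch using scikit-learn's 8×8 digit images, which live in a 64-dimensional pixel space:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X = load_digits().data                # 1797 images, 64 dimensions each
    pca = PCA(n_components=10).fit(X)
    # A small number of directions captures much of the variation,
    # hinting that the digits lie near a low-dimensional manifold.
    print(pca.explained_variance_ratio_.sum())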

The Manifold Hypothesis

[Figure: random samples from a 400-d space vs. true digits living in a 400-d space]

Estimating Generalization Performance

  • More advanced techniques exist than a simple train/test split
  • train/valid/test splitting (sketched below):
    • Use train for learning the model, valid for tuning its hyperparameters, and test for assessing its performance
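
A minimal sketch with scikit-learn (X and y stand in for your features and labels): split once to hold out test, then split the remainder into train and valid.

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.rand(1000, 5), np.random.randint(0, 2, 1000)
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2,
                                                      random_state=42)
    X_train, X_valid, y_train, y_valid = train_test_split(X_rest, y_rest,
                                                          test_size=0.25,
                                                          random_state=42)
    print(len(X_train), len(X_valid), len(X_test))  # 600 200 200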


How Much Data?


Model Zoo

KNN, K-means, LogReg, LinReg, MLP, LSTM, CNN, Random Forest, SVM, GNN, DBSCAN


No Free Lunch Theorem


Deep Learning

Where we introduce (deep) neural networks and the features they learn …


Model Training

  • Learning f means "finding" another function h* that best approximates f using the data we observed
  • h* is chosen from a family of functions H, called the hypothesis space, by specifying two components:
    • loss function: measures the error made when a hypothesis h is used instead of the true f
    • learning algorithm: explores the hypothesis space to pick the function that minimizes the loss on the observed data


Hypothesis Space

  • The set of functions the learning algorithm will search through to pick the hypothesis h* that best approximates the true target f
  • The larger the hypothesis space:
    • the larger the set of functions that can be represented
    • the harder it is for the learning algorithm to pick h*

Trade-off: put some constraints on H, e.g., limit the search space to linear functions only


Loss Functions

  • A loss function measures the error we would make if a hypothesis h were used instead of the true (yet unknown) mapping f
  • It can be computed only on the data we observed; it therefore depends on both the hypothesis and the dataset
  • This in-sample error (a.k.a. empirical loss) is an estimate of the out-of-sample error (a.k.a. expected loss or risk); a concrete sketch follows
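
As a concrete instance, the empirical squared loss of a hypothesis h on n observed pairs (a sketch; squared error is just one common choice):

    import numpy as np

    # L(h) = (1/n) * sum_i (h(x_i) - y_i)^2  -- in-sample (empirical) loss
    def empirical_loss(h, X, y):
        return np.mean((h(X) - y) ** 2)

    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 1.0, 50)
    y = 2.0 * X + 0.1 * rng.standard_normal(50)     # noisy observations of f
    print(empirical_loss(lambda x: 2.0 * x, X, y))  # small: a good hypothesis
    print(empirical_loss(lambda x: -x, X, y))       # large: a bad hypothesis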


The Learning Algorithm

  • Defines the strategy used to search the hypothesis space H for our best hypothesis h*
  • Here, "best" means the hypothesis that minimizes the loss function on the observed data (Empirical Risk Minimization)
  • In other words, among all the hypotheses in H, the learning algorithm picks the one that minimizes L


Learning f as an Optimization Problem

  • We cast the supervised learning problem as an optimization problem
  • Each combination of loss function and hypothesis space yields a specific optimization problem to solve
  • Those choices are usually "mathematically convenient": e.g., convex objective functions are guaranteed to have a unique global minimum
  • Even though closed-form solutions to the optimization problem rarely exist, there are numerical methods that work: e.g., gradient descent


Gradient Descent

for t = 0, 1, 2, … (until convergence):
    ϕ ← ϕ − η ∇L(ϕ)
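
A runnable version of this loop on a toy least-squares problem (a sketch with one scalar parameter and a fixed learning rate η):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100)
    y = 3.0 * x + 0.1 * rng.standard_normal(100)   # true slope is 3

    # L(phi) = mean((phi*x - y)^2); its gradient is 2*mean(x*(phi*x - y)).
    phi, eta = 0.0, 0.1
    for t in range(10_000):
        grad = 2.0 * np.mean(x * (phi * x - y))
        phi -= eta * grad                          # phi <- phi - eta * dL/dphi
        if abs(grad) < 1e-10:                      # convergence test
            break
    print(phi)                                     # ~3.0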

Shallow Neural Networks

  • Shallow neural networks are functions y = f[x, ϕ] with parameters ϕ that map multivariate inputs x to multivariate outputs y.
  • We introduce the main ideas using an example network f[x, ϕ] that maps a scalar input x to a scalar output y and has ten parameters ϕ = {ϕ0, ϕ1, ϕ2, ϕ3, θ10, θ11, θ20, θ21, θ30, θ31}:

y = f[x, ϕ] = ϕ0 + ϕ1 a[θ10 + θ11 x] + ϕ2 a[θ20 + θ21 x] + ϕ3 a[θ30 + θ31 x]

where a[·] is the activation function and the three inner expressions θi0 + θi1 x are linear layers.
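
The same ten-parameter network in code (a minimal sketch with toy parameter values, using the ReLU of the next slide as a[·]):

    import numpy as np

    def a(z):                      # activation function (ReLU, next slide)
        return np.maximum(0.0, z)

    def shallow(x, phi, theta):
        # Three hidden units: h_i = a[theta_i0 + theta_i1 * x]
        h = a(theta[:, 0] + theta[:, 1] * x)
        # Output: y = phi0 + phi1*h1 + phi2*h2 + phi3*h3
        return phi[0] + phi[1:] @ h

    phi = np.array([-0.2, 0.5, -1.0, 0.7])         # phi0..phi3 (toy values)
    theta = np.array([[0.3, -1.0],                 # theta10, theta11
                      [-1.0, 2.0],                 # theta20, theta21
                      [0.5, 0.6]])                 # theta30, theta31
    print(shallow(1.5, phi, theta))                # a scalar output y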


ReLU

  • It’s the most common choice of activation function: ReLU(z) = max(0, z)


Shallow Neural Networks

[Figures: the same network in the usual graphical depictions, in increasingly compact form]

Deep Neural Networks


  • The parameters ϕ of this model comprise all of the weight matrices and bias vectors: ϕ = {βk, Ωk}, k = 0,…,K
  • If the k-th layer has Dk hidden units, then the bias vector βk−1 will be of size Dk.
  • The last bias vector βK has the size Do of the output.
  • The first weight matrix Ω0 has size D1 × Di, where Di is the size of the input.
  • The last weight matrix ΩK is Do × DK, and the remaining matrices Ωk are Dk+1 × Dk (a forward pass with these shapes is sketched below)
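
A forward pass with these shapes (a sketch; random values stand in for learned parameters):

    import numpy as np

    def a(z):
        return np.maximum(0.0, z)   # ReLU activation

    rng = np.random.default_rng(0)
    dims = [4, 8, 8, 3]             # Di, D1, D2, Do  (K = 2 hidden layers)
    Omegas = [rng.standard_normal((dims[k + 1], dims[k])) for k in range(3)]
    betas = [rng.standard_normal(dims[k + 1]) for k in range(3)]

    x = rng.standard_normal(dims[0])
    h = a(betas[0] + Omegas[0] @ x)            # h1 = a[beta0 + Omega0 x]
    for beta, Omega in zip(betas[1:-1], Omegas[1:-1]):
        h = a(beta + Omega @ h)                # hk+1 = a[beta_k + Omega_k h_k]
    y = betas[-1] + Omegas[-1] @ h             # linear output layer
    print(y.shape)                             # (3,) = Do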


Example: 3 Hidden Layer NN


Decision Boundary


The Effect of Depth in Learning


Inductive Biases

[Figure: a 1920×1080 image]

  • Imagine we have an input of size Di = 6,220,800 (a 1920×1080 RGB image)
  • Imagine we want to process this with a 3-layer MLP whose hidden layers have sizes D1 = 512, D2 = 1,024, D3 = 512
  • Imagine we have a single output, Do = 1
  • Assume we use biases
  • How many parameters, i.e., |ϕ|, does our 3-layer MLP use?
  • Solution: 3,186,100,736 parameters (computed below)
  • If each parameter is stored as a 4-byte float, just storing the model needs 12,744,402,944 bytes, i.e., ~13 GB!!!!!!!
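
Reproducing the count (note: the stated total counts the hidden-layer biases; also counting the single output bias would give 3,186,100,737):

    # Layer sizes: input Di, hidden D1..D3, output Do.
    dims = [1920 * 1080 * 3, 512, 1024, 512, 1]    # Di = 6,220,800
    weights = sum(a * b for a, b in zip(dims[:-1], dims[1:]))
    biases = sum(dims[1:-1])                       # 512 + 1024 + 512
    total = weights + biases
    print(f"{total:,}")                            # 3,186,100,736
    print(f"{total * 4:,} bytes")                  # 12,744,402,944 (~13 GB)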


Graph Learning

Where we show an important inductive bias in Neural Network design …

Data vs. Architectures

[Figures: matching data shapes to network architectures; graph data is made of nodes and edges]

Graph-shaped Data

[Figures: examples of graph-shaped data]

Some Notation

  • Graph G = (V, E) → Vertices, Edges
  • W ∈ ℝ|V| × |V| → adjacency matrix
  • Goal: learn low-dimensional vector representations {Zi}i∈[|V|] (embeddings) of the graph's nodes {vi}i∈[|V|], such that important graph properties (e.g., local or global structure) are preserved in the embedding space.
    • For example, if two nodes have similar connections in G, they should be close in the embedding space.
  • Let Z ∈ ℝ|V| × d denote the node embedding matrix (d ≪ |V|).


Node Features

  • A vertex (edge) field is a function defined on vertices (edges):
    • f: V → ℝ
    • F: E → ℝ
  • Graphs may have node attributes (gender, age, etc.), which can be represented as multiple vertex fields, commonly referred to as node features.
    • Node features are denoted as X ∈ ℝ|V| × d0
      • d0 is the input feature dimension


Node Representation with Node Features

  • Given
    • an adjacency matrix W ∈ ℝ|V| × |V|
    • a node feature matrix X ∈ ℝ|V| × d0
  • Goal: learn a mapping W, X → Z.
    • When no node features are available, we learn a mapping W → Z (see the previous slides).
  • Here, Z ∈ ℝ|V| × d denotes the node embedding matrix (d ≪ |V|)


Permutations of Node Indices

  • Given a permutation matrix P, we can change the ordering of the nodes in a graph with adjacency matrix A and data matrix X by
    • X’ = XP
    • A’ = PᵀAP
  • A permutation matrix is a matrix in which exactly one element in each row and each column is 1 (all other entries are 0); a quick check is sketched below
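
A quick numerical check (a sketch; here columns of X index nodes, matching the "each column of HK" convention on the GNN slide below):

    import numpy as np

    A = np.array([[0, 1, 0, 0],      # 4-node toy graph
                  [1, 0, 1, 1],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    X = np.arange(8, dtype=float).reshape(2, 4)   # 2 features x 4 nodes

    P = np.eye(4)[[2, 0, 3, 1]]      # permutation matrix: one 1 per row/column
    X_new = X @ P                    # columns (nodes) reordered
    A_new = P.T @ A @ P              # rows and columns reordered consistently

    # The relabeled graph has the same structure: e.g., the degree
    # sequence is unchanged.
    print(sorted(A.sum(axis=0)) == sorted(A_new.sum(axis=0)))  # True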


What is a Graph Neural Network

  • Given a graph with adjacency matrix A and data matrix X, we pass it through a series of GNN layers
  • Each layer creates an intermediate “hidden” representation of G, namely Hk for k = 1,…,K.
    • Thus HK is the final representation layer.
  • Each column of HK contains information about the corresponding node and its context.
  • Remember Transformers or CNNs?
    • Each layer outputs a representation modeling the input at a different scope (a one-layer sketch follows).
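
One possible layer of this kind (a sketch, not the only design: each node aggregates its own and its neighbours' embeddings, then a learned linear map and a nonlinearity are applied; columns index nodes):

    import numpy as np

    def gnn_layer(H, A, Omega, beta):
        agg = H @ (A + np.eye(A.shape[0]))    # aggregate self + neighbours
        return np.maximum(0.0, beta[:, None] + Omega @ agg)   # ReLU

    rng = np.random.default_rng(0)
    N, D0, D1 = 5, 3, 4
    A = (rng.random((N, N)) < 0.4).astype(float)
    A = np.triu(A, 1); A = A + A.T            # symmetric, no self-loops
    H0 = rng.standard_normal((D0, N))         # X plays the role of H0
    H1 = gnn_layer(H0, A, rng.standard_normal((D1, D0)),
                   rng.standard_normal(D1))
    print(H1.shape)                           # (4, 5): one column per node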


Tasks on Graphs

  • Graph Classification
  • Node Classification
  • Edge Classification


Graph-Level Tasks

  • The network assigns a label or estimates one or more values from the entire graph, exploiting both the structure and the node embeddings.
    • For example, we might want to predict the temperature at which a molecule becomes liquid (a regression task) or whether a molecule is poisonous to human beings (a classification task).
  • For binary classification, a typical readout is y = sig[βK + ωK HK 1/N], where the scalar βK and the 1×D vector ωK are learned parameters. Post-multiplying the output embedding matrix HK by the column vector 1 (all ones) sums the node embeddings; dividing by the number of nodes N computes their average, as coded below.
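
The readout in code (a sketch for binary graph classification):

    import numpy as np

    def graph_readout(H_K, omega_K, beta_K):
        N = H_K.shape[1]                          # number of nodes
        mean_emb = H_K @ np.ones(N) / N           # average node embedding
        return 1.0 / (1.0 + np.exp(-(beta_K + omega_K @ mean_emb)))  # sig

    H_K = np.random.default_rng(1).standard_normal((4, 5))  # D=4, N=5
    omega_K, beta_K = np.ones(4), 0.0             # toy learned parameters
    print(graph_readout(H_K, omega_K, beta_K))    # a probability in (0, 1)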


Node-Level Tasks

  • The network assigns a label (classification) or one or more values (regression) to each node of the graph, using both the graph structure and node embeddings.
    • For example, given a graph constructed from a 3D point cloud, the goal might be to classify the nodes according to which part of the object in the cloud they belong to.


Edge-Level Tasks

  • The network predicts whether or not there should be an edge between nodes n and m.
    • For example, in the social network setting, the network might predict whether two people know and like each other and, if so, suggest that they connect (one common scoring recipe is sketched below).
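
One common scoring recipe (an assumption, not the only option: the sigmoid of the dot product of the two node embeddings, with rows of Z indexing nodes as in the notation slide):

    import numpy as np

    def edge_probability(Z, n, m):
        return 1.0 / (1.0 + np.exp(-Z[n] @ Z[m]))   # sig of dot product

    Z = np.random.default_rng(2).standard_normal((6, 8))  # 6 nodes, d = 8
    print(edge_probability(Z, 0, 3))   # estimated probability of edge (0, 3)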


Take Home Message

What have we learnt, so far…


Remember Remember…

  • Learning from experience
  • Everything is a function
  • Neural Networks are “flexible” functions
    • Features are automatically extracted from data
  • Inductive biases are important
  • Graphs are ubiquitous
    • Graph Neural Networks are NNs applied to graph-structured data
