Predicting the Subject of Research Papers

Sauman Das and Arnav Jain

TJ Machine Learning Club

Task Definition

  • CORA Dataset
    • 2,708 Research Papers Classified into 7 Subject Categories
      • Case_Based
      • Genetic_Algorithms
      • Neural_Networks
      • Probabilistic_Methods
      • Reinforcement_Learning
      • Rule_Learning
      • Theory

GOAL: Determine which of the seven categories each research paper belongs to.


Feature Vector

  • Each paper in the CORA dataset is represented by a 1,433-dimensional feature vector
  • Each of the 1,433 binary values indicates whether a particular vocabulary word appears in the paper
  • See the CORA documentation for more details

            Feat 1   Feat 2   Feat 3   ...   Feat 1433   Subject
  Paper 1     1        0        1      ...       1       Case_Based
  Paper 2     0        0        1      ...       1       Theory
  Paper 3     1        1        0      ...       0       Neural_Networks
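In code, this table comes ready-made: a minimal sketch of loading CORA through PyTorch Geometric's Planetoid dataset (one common distribution of CORA; it downloads on first run):

    from torch_geometric.datasets import Planetoid

    dataset = Planetoid(root='data', name='Cora')
    data = dataset[0]       # the whole graph as one object
    print(data.x.shape)     # torch.Size([2708, 1433]) -- the feature table above
    print(data.y.shape)     # torch.Size([2708])       -- one subject label per paper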


How should we solve this task? What model should we use?


Let’s Add One Additional Feature… Citations!

  • The CORA dataset also provides information about which papers cite one another

Can we represent citations in our table? How else can we represent the citation relationship?
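One natural answer: rather than adding table columns, store citations as a list of (citing, cited) index pairs. A minimal sketch (the paper indices are made up for illustration):

    from collections import defaultdict

    # Each pair (i, j) means paper i cites paper j (illustrative indices only).
    citations = [(0, 2), (1, 2), (2, 0)]

    # Equivalent adjacency-list view: paper -> list of papers it cites.
    adj = defaultdict(list)
    for citing, cited in citations:
        adj[citing].append(cited)

    print(adj[0])   # [2]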


Graph Neural Networks


Can we represent this dataset as a graph?

  • The nodes are the individual papers, each with its associated feature vector indicating which words appear in the paper
  • The edges represent citation relationships: two papers are connected when one cites the other

How can this graph representation be useful in classifying the topic of a paper?

[Figure: a citation graph with papers as nodes (graph image from Wolfram MathWorld); an example node's word features, ML = 1, Biology = 0, Reinforcement = 0, Probability = 0, suggesting the label Neural Networks]
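Concretely, the feature table and the citation pairs combine into a single graph object. A sketch using PyTorch Geometric's Data class (toy tensors with CORA-like feature width):

    import torch
    from torch_geometric.data import Data

    x = torch.randint(0, 2, (4, 1433)).float()   # toy word features for 4 papers
    edge_index = torch.tensor([[0, 1, 2],        # citing papers
                               [2, 2, 0]])       # papers they cite
    graph = Data(x=x, edge_index=edge_index)     # nodes = papers, edges = citations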


Review: How Do Convolutional Neural Networks Work?

  • Convolutional neural networks apply kernels at each layer to propagate information forward
    • Each kernel combines information from an individual pixel with that of the surrounding pixels before passing it to the next layer

How can we apply this to a graph?

Just as CNNs use surrounding pixels to inform a prediction, we can use the neighboring cited papers to help predict the topic of a paper (see the toy sketch below).
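A toy numerical sketch of the analogy (illustrative arrays only): a CNN averages a pixel with its grid neighbors, while a GNN averages a paper's features with those of the papers it cites.

    import numpy as np

    # Grid case: average a pixel with its 4 neighbors (CNN-style local aggregation).
    img = np.arange(9.0).reshape(3, 3)
    pixel_agg = (img[1, 1] + img[0, 1] + img[2, 1] + img[1, 0] + img[1, 2]) / 5

    # Graph case: average a paper's (scalar) feature with its cited papers' features.
    feats = {0: 1.0, 1: 3.0, 2: 5.0}
    cited_by_0 = [1, 2]
    node_agg = (feats[0] + sum(feats[n] for n in cited_by_0)) / 3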


Overview of Graph Neural Networks

  • Each paper/node has an associated feature vector (words in the paper)
  • Goal of Graph Neural Networks: generate an embedding for each node that encodes its features and its neighborhood information in a low-dimensional vector
    • This embedding is then used to classify the topic of the paper
    • Embeddings can also be used for node-similarity calculations, clustering, and graph classification

X_v = [x1, x2, x3, x4, …]   (the feature vector associated with node v)


Basic Neighborhood Aggregation

  • For a single node, we want to aggregate the data from the surrounding nodes and use it to build a local understanding of the graph (see the sketch below)
    • This produces a more accurate embedding vector because information about the cited papers helps determine the topic of the target paper
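A minimal sketch of the aggregation step, using mean aggregation (one common choice) on toy 3-dimensional embeddings:

    import numpy as np

    h_target = np.array([1.0, 0.0, 1.0])        # target node's embedding
    h_neighbors = [np.array([0.0, 1.0, 1.0]),   # embeddings of cited papers
                   np.array([1.0, 1.0, 0.0])]

    # Mean-aggregate the neighborhood into a single vector.
    h_agg = np.mean(h_neighbors, axis=0)        # -> array([0.5, 1. , 0.5])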


Basic Neighborhood Aggregation (continued)

  • We define two weight matrices: one that multiplies the aggregated neighbor embeddings and one that multiplies the node's own embedding from the previous layer (sketched below)
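One common form of this update, from the Stanford SNAP tutorial credited at the end of this deck, is h_v(k) = σ(W_k · mean of neighbors' h(k−1) + B_k · h_v(k−1)). A minimal sketch, assuming ReLU as the nonlinearity:

    import numpy as np

    def gnn_layer(h_v, neighbor_hs, W, B):
        """One round of neighborhood aggregation for a single node.

        W multiplies the averaged neighbor embeddings; B multiplies the
        node's own embedding from the previous layer.
        """
        neigh_mean = np.mean(neighbor_hs, axis=0)
        return np.maximum(W @ neigh_mean + B @ h_v, 0.0)   # ReLU

    # Tiny check with identity weights: relu(I @ [0.5, 0.5] + I @ [1, 1]) = [1.5, 1.5]
    out = gnn_layer(np.ones(2), [np.zeros(2), np.ones(2)], np.eye(2), np.eye(2))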


Loss Function

  • Goal: develop the most accurate 7-dimensional embedding for each node (i.e., research paper)
    • Why 7-dimensional? One dimension per subject category, so the final embedding doubles as the class scores
  • Loss: categorical cross-entropy (sketched below)
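A minimal sketch of the loss for a single node (softmax over illustrative 7-dimensional logits, then categorical cross-entropy):

    import numpy as np

    def cross_entropy(logits, true_class):
        """Categorical cross-entropy for one node's 7-dimensional output."""
        exp = np.exp(logits - logits.max())   # numerically stabilized softmax
        probs = exp / exp.sum()
        return -np.log(probs[true_class])

    logits = np.array([2.0, 0.1, -1.0, 0.3, 0.0, -0.5, 1.2])  # one score per category
    loss = cross_entropy(logits, true_class=0)                # true label: Case_Based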


Training

  • Select a set of nodes for which you have feature vectors (and known labels)
  • Train using backprop on these nodes (this group of nodes is akin to a batch in traditional deep learning; see the loop below)
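A minimal training-loop sketch (a plain linear layer stands in for the GNN so the loop runs as-is; CORA's standard split trains on 140 labeled nodes):

    import torch
    import torch.nn.functional as F

    x = torch.randint(0, 2, (2708, 1433)).float()   # stand-in features
    y = torch.randint(0, 7, (2708,))                # stand-in labels
    train_mask = torch.zeros(2708, dtype=torch.bool)
    train_mask[:140] = True                         # the selected training nodes

    model = torch.nn.Linear(1433, 7)                # placeholder for a real GNN
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(200):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x)[train_mask], y[train_mask])
        loss.backward()                             # backprop only on selected nodes
        optimizer.step()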


Inductive Capabilities of Graph Neural Networks

  • No need to train separate weights for each node's computation graph
  • If a new node is added to the graph, we can compute its embedding on demand with the existing weights (see the sketch below)
  • We can even apply the same weights to an entirely new graph
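A toy illustration of computing a brand-new paper's embedding on demand (random stand-ins for the trained W and B; real values would come from training):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((7, 1433)) * 0.01        # stand-in trained weights
    B = rng.standard_normal((7, 1433)) * 0.01

    h_new = rng.integers(0, 2, 1433).astype(float)   # new paper's word features
    cited = [rng.integers(0, 2, 1433).astype(float) for _ in range(2)]

    # Same update rule as during training -- no new weights are learned.
    embedding = np.maximum(W @ np.mean(cited, axis=0) + B @ h_new, 0.0)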


Graph Convolutional Networks

  • Uses a slightly different aggregation technique that normalizes by node degree, lessening the effect of neighbors with many connections
  • Also uses the same weight matrix for the neighboring embeddings and the node's own embedding
  • Note that any aggregation technique will work as long as it's differentiable
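Written out, the GCN update from the Stanford SNAP tutorial credited at the end of this deck is:

    h_v^(k) = σ( W_k · Σ_{u ∈ N(v) ∪ {v}}  h_u^(k−1) / sqrt(|N(u)| · |N(v)|) )

A single matrix W_k now acts on the node and its neighbors alike, and the 1/sqrt(|N(u)| · |N(v)|) factor shrinks the contribution of heavily cited papers.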


Implementing GNNs

  • PyTorch Geometric (built on PyTorch)
  • Graph Nets (built on TensorFlow)
  • StellarGraph
  • DGL (framework-agnostic)
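As one example, a two-layer GCN for CORA in PyTorch Geometric, following the library's standard quick-start pattern (exact APIs may vary across versions; train it with a loop like the one on the Training slide):

    import torch
    import torch.nn.functional as F
    from torch_geometric.datasets import Planetoid
    from torch_geometric.nn import GCNConv

    dataset = Planetoid(root='data', name='Cora')

    class GCN(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = GCNConv(dataset.num_node_features, 16)  # 1433 -> 16
            self.conv2 = GCNConv(16, dataset.num_classes)        # 16 -> 7

        def forward(self, x, edge_index):
            x = F.relu(self.conv1(x, edge_index))
            return self.conv2(x, edge_index)

    model = GCN()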



Applications

  • Model and understand physical systems
  • Traffic prediction
  • Social Network Analysis
  • Several papers use GNNs to make headway on classic graph problems in computer science (e.g., the travelling salesman problem)

[Figure: using GNNs to model glass; both figures from DeepMind blog posts]


Credits

All uncited images from Stanford SNAP lecture slides:

http://snap.stanford.edu/proj/embeddings-www/files/nrltutorial-part2-gnns.pdf
