Predicting the Subject of Research Papers

Sauman Das and Arnav Jain

TJ Machine Learning Club

Task Definition

  • CORA Dataset
    • 2,708 Research Papers Classified into 7 Subject Categories
      • Case_Based
      • Genetic_Algorithms
      • Neural_Networks
      • Probabilistic_Methods
      • Reinforcement_Learning
      • Rule_Learning
      • Theory

GOAL: Determine which of the seven categories each research paper belongs to.


Feature Vector

  • Each paper in the CORA dataset is represented by a 1,433-dimensional feature vector
  • Each of the 1,433 binary values indicates whether a particular vocabulary word appears in the paper
  • See the CORA documentation for more details

            Feat 1   Feat 2   Feat 3   ...   Feat 1433   Subject
  Paper 1     1        0        1      ...       1       Case_Based
  Paper 2     0        0        1      ...       1       Theory
  Paper 3     1        1        0      ...       0       Neural_Networks
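In code, this table comes ready-made: a minimal sketch of loading CORA through PyTorch Geometric's Planetoid dataset (one common distribution of CORA; it downloads on first run):

    from torch_geometric.datasets import Planetoid

    dataset = Planetoid(root='data', name='Cora')
    data = dataset[0]       # the whole graph as one object
    print(data.x.shape)     # torch.Size([2708, 1433]) -- the feature table above
    print(data.y.shape)     # torch.Size([2708])       -- one subject label per paper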


How should we solve this task? What model should we use?


Let’s Add One Additional Feature… Citations!

  • The CORA dataset also provides information about which papers cite one another

Can we represent citations in our table? How else can we represent the citation relationship?
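One natural answer: rather than adding table columns, store citations as a list of (citing, cited) index pairs. A minimal sketch (the paper indices are made up for illustration):

    from collections import defaultdict

    # Each pair (i, j) means paper i cites paper j (illustrative indices only).
    citations = [(0, 2), (1, 2), (2, 0)]

    # Equivalent adjacency-list view: paper -> list of papers it cites.
    adj = defaultdict(list)
    for citing, cited in citations:
        adj[citing].append(cited)

    print(adj[0])   # [2]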


Graph Neural Networks


Can we represent this dataset as a graph?

  • The nodes are the individual papers, each with its associated feature vector indicating which words appear in the paper
  • The edges represent citation relationships: two papers are connected when one cites the other

How can this graph representation be useful in classifying the topic of a paper?

[Figure: a citation graph with papers as nodes (graph image from Wolfram MathWorld); an example node's word features, ML = 1, Biology = 0, Reinforcement = 0, Probability = 0, suggesting the label Neural Networks]
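Concretely, the feature table and the citation pairs combine into a single graph object. A sketch using PyTorch Geometric's Data class (toy tensors with CORA-like feature width):

    import torch
    from torch_geometric.data import Data

    x = torch.randint(0, 2, (4, 1433)).float()   # toy word features for 4 papers
    edge_index = torch.tensor([[0, 1, 2],        # citing papers
                               [2, 2, 0]])       # papers they cite
    graph = Data(x=x, edge_index=edge_index)     # nodes = papers, edges = citations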


Review: How Do Convolutional Neural Networks Work?

  • Convolutional neural networks apply kernels at each layer to propagate information forward
    • Each kernel combines information from an individual pixel with that of the surrounding pixels before passing it to the next layer

How can we apply this to a graph?

Just as CNNs use surrounding pixels to inform a prediction, we can use the neighboring cited papers to help predict the topic of a paper (see the toy sketch below).
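A toy numerical sketch of the analogy (illustrative arrays only): a CNN averages a pixel with its grid neighbors, while a GNN averages a paper's features with those of the papers it cites.

    import numpy as np

    # Grid case: average a pixel with its 4 neighbors (CNN-style local aggregation).
    img = np.arange(9.0).reshape(3, 3)
    pixel_agg = (img[1, 1] + img[0, 1] + img[2, 1] + img[1, 0] + img[1, 2]) / 5

    # Graph case: average a paper's (scalar) feature with its cited papers' features.
    feats = {0: 1.0, 1: 3.0, 2: 5.0}
    cited_by_0 = [1, 2]
    node_agg = (feats[0] + sum(feats[n] for n in cited_by_0)) / 3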


Overview of Graph Neural Networks

  • Each paper/node has an associated feature vector (words in the paper)
  • Goal of Graph Neural Networks: generate an embedding for each node that encodes its features and its neighborhood information in a low-dimensional vector
    • This embedding is then used to classify the topic of the paper
    • Embeddings can also be used for node-similarity calculations, clustering, and graph classification

X_v = [x1, x2, x3, x4, …]   (the feature vector associated with node v)


Basic Neighborhood Aggregation

  • For a single node, we want to aggregate the data from the surrounding nodes and use it to build a local understanding of the graph (see the sketch below)
    • This produces a more accurate embedding vector because information about the cited papers helps determine the topic of the target paper
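A minimal sketch of the aggregation step, using mean aggregation (one common choice) on toy 3-dimensional embeddings:

    import numpy as np

    h_target = np.array([1.0, 0.0, 1.0])        # target node's embedding
    h_neighbors = [np.array([0.0, 1.0, 1.0]),   # embeddings of cited papers
                   np.array([1.0, 1.0, 0.0])]

    # Mean-aggregate the neighborhood into a single vector.
    h_agg = np.mean(h_neighbors, axis=0)        # -> array([0.5, 1. , 0.5])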


Basic Neighborhood Aggregation (continued)

  • We define two weight matrices: one that multiplies the aggregated neighbor embeddings and one that multiplies the node's own embedding from the previous layer (sketched below)
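One common form of this update, from the Stanford SNAP tutorial credited at the end of this deck, is h_v(k) = σ(W_k · mean of neighbors' h(k−1) + B_k · h_v(k−1)). A minimal sketch, assuming ReLU as the nonlinearity:

    import numpy as np

    def gnn_layer(h_v, neighbor_hs, W, B):
        """One round of neighborhood aggregation for a single node.

        W multiplies the averaged neighbor embeddings; B multiplies the
        node's own embedding from the previous layer.
        """
        neigh_mean = np.mean(neighbor_hs, axis=0)
        return np.maximum(W @ neigh_mean + B @ h_v, 0.0)   # ReLU

    # Tiny check with identity weights: relu(I @ [0.5, 0.5] + I @ [1, 1]) = [1.5, 1.5]
    out = gnn_layer(np.ones(2), [np.zeros(2), np.ones(2)], np.eye(2), np.eye(2))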


Loss Function

  • Goal: develop the most accurate 7-dimensional embedding for each node (i.e., research paper)
    • Why 7-dimensional? One dimension per subject category, so the final embedding doubles as the class scores
  • Loss: categorical cross-entropy (sketched below)
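A minimal sketch of the loss for a single node (softmax over illustrative 7-dimensional logits, then categorical cross-entropy):

    import numpy as np

    def cross_entropy(logits, true_class):
        """Categorical cross-entropy for one node's 7-dimensional output."""
        exp = np.exp(logits - logits.max())   # numerically stabilized softmax
        probs = exp / exp.sum()
        return -np.log(probs[true_class])

    logits = np.array([2.0, 0.1, -1.0, 0.3, 0.0, -0.5, 1.2])  # one score per category
    loss = cross_entropy(logits, true_class=0)                # true label: Case_Based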


Training

  • Select a set of nodes for which you have feature vectors (and known labels)
  • Train using backprop on these nodes (this group of nodes is akin to a batch in traditional deep learning; see the loop below)
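A minimal training-loop sketch (a plain linear layer stands in for the GNN so the loop runs as-is; CORA's standard split trains on 140 labeled nodes):

    import torch
    import torch.nn.functional as F

    x = torch.randint(0, 2, (2708, 1433)).float()   # stand-in features
    y = torch.randint(0, 7, (2708,))                # stand-in labels
    train_mask = torch.zeros(2708, dtype=torch.bool)
    train_mask[:140] = True                         # the selected training nodes

    model = torch.nn.Linear(1433, 7)                # placeholder for a real GNN
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(200):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x)[train_mask], y[train_mask])
        loss.backward()                             # backprop only on selected nodes
        optimizer.step()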


Inductive Capabilities of Graph Neural Networks

  • No need to train separate weights for each node's computation graph
  • If a new node is added to the graph, we can compute its embedding on demand with the existing weights (see the sketch below)
  • We can even apply the same weights to an entirely new graph
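A toy illustration of computing a brand-new paper's embedding on demand (random stand-ins for the trained W and B; real values would come from training):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((7, 1433)) * 0.01        # stand-in trained weights
    B = rng.standard_normal((7, 1433)) * 0.01

    h_new = rng.integers(0, 2, 1433).astype(float)   # new paper's word features
    cited = [rng.integers(0, 2, 1433).astype(float) for _ in range(2)]

    # Same update rule as during training -- no new weights are learned.
    embedding = np.maximum(W @ np.mean(cited, axis=0) + B @ h_new, 0.0)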


Graph Convolutional Networks

  • Uses a slightly different aggregation technique that normalizes by node degree, lessening the effect of neighbors with many connections
  • Also uses the same weight matrix for the neighboring embeddings and the node's own embedding
  • Note that any aggregation technique will work as long as it's differentiable
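Written out, the GCN update from the Stanford SNAP tutorial credited at the end of this deck is:

    h_v^(k) = σ( W_k · Σ_{u ∈ N(v) ∪ {v}}  h_u^(k−1) / sqrt(|N(u)| · |N(v)|) )

A single matrix W_k now acts on the node and its neighbors alike, and the 1/sqrt(|N(u)| · |N(v)|) factor shrinks the contribution of heavily cited papers.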


Implementing GNNs

  • PyTorch Geometric (built on PyTorch)
  • Graph Nets (built on TensorFlow)
  • StellarGraph
  • DGL (framework-agnostic)
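As one example, a two-layer GCN for CORA in PyTorch Geometric, following the library's standard quick-start pattern (exact APIs may vary across versions; train it with a loop like the one on the Training slide):

    import torch
    import torch.nn.functional as F
    from torch_geometric.datasets import Planetoid
    from torch_geometric.nn import GCNConv

    dataset = Planetoid(root='data', name='Cora')

    class GCN(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = GCNConv(dataset.num_node_features, 16)  # 1433 -> 16
            self.conv2 = GCNConv(16, dataset.num_classes)        # 16 -> 7

        def forward(self, x, edge_index):
            x = F.relu(self.conv1(x, edge_index))
            return self.conv2(x, edge_index)

    model = GCN()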



Applications

  • Model and understand physical systems
  • Traffic prediction
  • Social Network Analysis
  • Several papers use GNNs to make headway on classic graph problems in computer science (e.g., the travelling salesman problem)

[Figure: using GNNs to model glass; both figures from DeepMind blog posts]


Credits

All uncited images from Stanford SNAP lecture slides:

http://snap.stanford.edu/proj/embeddings-www/files/nrltutorial-part2-gnns.pdf
