
Graph Random Neural Networks for Semi-Supervised Learning on Graphs

Presented by:

Gauransh Sawhney 2018A3PS0325P

Utkarsh Kumar Singh 2018A3PS0368P

In partial fulfillment of the requirements of the course:

BITS F464 Machine Learning

Submitted to: Dr. Kamlesh Tiwari


Semi-Supervised Learning

  • What is Semi-Supervised Learning?
  • Semi-Supervised Learning on Graphs


Semi-Supervised Learning on Graphs


Graph Neural Networks

  • What are Graph Neural Networks (GNNs)?
  • Problems with existing implementations


Graph Neural Network
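
For contrast with GRAND later on, a single GCN-style layer couples feature propagation with a learned non-linear transformation in the same step. This is our minimal PyTorch sketch of the standard design, not code from the paper:

```python
# Minimal sketch of one GCN-style layer: propagation (A_hat @ H) and a learned
# non-linear transformation happen together in every layer.
import torch
import torch.nn.functional as F

class GCNLayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, A_hat, H):
        # A_hat: normalized adjacency with self-loops; H: node features.
        # Propagate neighbor features, then transform them non-linearly.
        return F.relu(self.lin(A_hat @ H))
```

This coupling of propagation and transformation is exactly what the issues on the next slide refer to.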


Existing Issues

  • Over-smoothing
    • Stacking many GNN layers makes node representations indistinguishable; coupling the feature propagation and non-linear transformation steps aggravates this problem (see the sketch after this list)
  • Not robust to graph attacks
    • Each node's prediction depends heavily on its neighbors, so perturbing even a few edges or neighbor features can change it
  • Overfitting in the semi-supervised setting
    • Under standard semi-supervised training, the scarce node-label information is easily overfit
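
To make the over-smoothing point concrete, here is a small NumPy sketch (ours, not from the paper): repeatedly applying the normalized adjacency collapses all node features onto a single direction, which we can see from the second singular value of the propagated feature matrix decaying toward zero.

```python
# Over-smoothing demo: powers of the normalized adjacency make the propagated
# feature matrix numerically rank-1, i.e. nodes become indistinguishable.
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix."""
    A = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy graph: a 4-node path with 2-dimensional random node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 2)
A_hat = normalized_adjacency(A)

for k in (1, 4, 16, 64):
    Xk = np.linalg.matrix_power(A_hat, k) @ X
    # Second singular value -> 0: all node representations fall on one line.
    s = np.linalg.svd(Xk, compute_uv=False)
    print(f"k={k:3d}  second singular value = {s[1]:.2e}")
```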


GRAND: Graph Random Neural Network

  • Architecture
  • Algorithm
  • How does it tackle the issues faced by other GNNs?


GRAND

Algorithm
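
The two training-time steps GRAND performs per augmentation are DropNode and mixed-order random propagation, followed by an MLP classifier. Below is a minimal PyTorch sketch of these steps as described in the paper; the function names and the dense `A_hat` (normalized adjacency with self-loops) are our assumptions, not the authors' exact code.

```python
# Sketch of GRAND's per-augmentation forward pass.
import torch
import torch.nn.functional as F

def drop_node(X, delta):
    """DropNode: zero out whole node rows with probability delta and rescale
    survivors by 1/(1-delta) so the expectation of X is preserved."""
    mask = (torch.rand(X.size(0), 1, device=X.device) > delta).float()
    return X * mask / (1.0 - delta)

def random_propagate(A_hat, X, K, delta, training=True):
    """Mixed-order propagation: X_bar = (1/(K+1)) * sum_{k=0..K} A_hat^k X_tilde."""
    cur = drop_node(X, delta) if training else X
    out = cur.clone()
    for _ in range(K):
        cur = A_hat @ cur                      # one propagation step
        out = out + cur
    return out / (K + 1)

class GrandMLP(torch.nn.Module):
    """Two-layer MLP classifier applied to the propagated features."""
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.fc1 = torch.nn.Linear(in_dim, hidden)
        self.fc2 = torch.nn.Linear(hidden, n_classes)

    def forward(self, X_bar):
        return self.fc2(F.relu(self.fc1(X_bar)))  # class logits
```

Because propagation is parameter-free and the non-linear transformation happens only in the MLP afterwards, the two steps are decoupled, unlike the GCN layer sketched earlier.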


Loss Functions
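
GRAND trains with two terms: the supervised cross-entropy averaged over the S augmentations, plus λ times a consistency regularization (CR) term that pulls each augmentation's prediction toward the sharpened mean prediction (temperature T). The sketch below follows the paper's formulation; the exact function shape is ours.

```python
# Sketch of GRAND's training loss: L = L_sup + lam * L_con.
import torch
import torch.nn.functional as F

def grand_loss(logits_list, labels, train_mask, lam, T):
    """logits_list: S tensors of shape (N, C), one per DropNode augmentation."""
    # Supervised cross-entropy on labeled nodes, averaged over augmentations.
    sup = torch.stack([F.cross_entropy(z[train_mask], labels[train_mask])
                       for z in logits_list]).mean()

    # Consistency regularization over all nodes (labeled and unlabeled).
    probs = [F.softmax(z, dim=1) for z in logits_list]
    avg = torch.stack(probs).mean(dim=0)
    sharp = avg ** (1.0 / T)
    sharp = (sharp / sharp.sum(dim=1, keepdim=True)).detach()  # fixed target
    con = torch.stack([((p - sharp) ** 2).sum(dim=1).mean()
                       for p in probs]).mean()

    return sup + lam * con
```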


Results

Some of the results presented in the paper:

  • Comparison with existing architectures on benchmarks
  • Generalization analysis
  • Robustness analysis
  • Over-smoothing analysis
  • Results on large datasets


Dataset Description

Dataset     Nodes   Edges   Train/Valid/Test Nodes   Classes   Features   Default Label Rate
Cora         2708    5429   140/500/1000                   7       1433                0.052
Citeseer     3327    4732   120/500/1000                   6       3703                0.036
Pubmed      19717   44338    60/500/1000                   3        500                0.003

Three datasets were used to benchmark the results.


Comparison with existing architectures


Generalization Analysis

  (a) Without RP (random propagation)

  (b) Without CR (consistency regularization)

  (c) GRAND (with both RP and CR)


Robustness Analysis


Over-smoothing analysis


Other results presented in the paper

  • Over-smoothing of GRAND and its variants (on Cora)
  • Classification accuracy of GRAND on large datasets


Experiments

The following experiments were conducted:

  • MLP vs. GCN as the classification network
  • Classification accuracy vs. {K, S}
  • Sensitivity w.r.t. the CR loss coefficient λ


1. MLP vs. GCN

Effect of using an MLP vs. a GCN as the classification network: GRAND (with an MLP) very clearly outperforms GRAND_GCN in terms of classification accuracy.


2. Classification Accuracy vs. {K, S}

(i) Effect of K (propagation order) and S (number of data augmentations) on the classification accuracy of GRAND (DropNode data augmentation)
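
A hypothetical sweep over K and S could reuse the sketch functions above; `model`, `A_hat`, `X`, `labels`, and `train_mask` are assumed to come from an already-loaded dataset (e.g. Cora) and none of this is the paper's code.

```python
# Illustrative (K, S) sweep using random_propagate / GrandMLP / grand_loss
# from the earlier sketches; data loading and the optimizer step are omitted.
for K in (2, 4, 8):
    for S in (1, 2, 4):
        logits_list = [model(random_propagate(A_hat, X, K, delta=0.5))
                       for _ in range(S)]
        loss = grand_loss(logits_list, labels, train_mask, lam=1.0, T=0.5)
        # ...backprop here, then record validation accuracy for this (K, S)
```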


2. Classification Accuracy vs. {K, S}

(ii) Effect of K (propagation order) and S (number of data augmentations) on the classification accuracy of GRAND_dropout and GRAND_dropedge (alternative data augmentation techniques)


3. Sensitivity w.r.t. λ

Classification accuracy vs. λ (the consistency regularization loss coefficient), for an MLP and for a GCN classification network, both using DropNode for data augmentation


3. Sensitivity w.r.t. λ

Classification accuracy vs. λ with DropEdge and with Dropout as the data augmentation, both using MLPs as classification networks


Ablation Study

The effect of removing each of the following components was studied:

  • w/o consistency regularization (CR)
  • w/o multiple DropNode (mDN)
  • w/o sharpening
  • w/o consistency regularization (CR) and DropNode (DN)


Ablation Study

Method                    Cora    Pubmed   Citeseer
w/o CR (λ=0)              0.841   0.811    0.728
w/o mDN (S=1)             0.850   0.800    0.744
w/o sharpening (T=1)      0.844   0.816    0.578
w/o CR & DN (λ=0, δ=0)    0.835   0.787    0.597

Values are classification accuracies.



Thank You!