
Graph Random Neural Networks for Semi-Supervised Learning on Graphs

Presented by:

Gauransh Sawhney 2018A3PS0325P

Utkarsh Kumar Singh 2018A3PS0368P

In partial fulfillment of the requirements of the course:

BITS F464 Machine Learning

Submitted to: Dr. Kamlesh Tiwari


Semi-Supervised Learning

  • What is Semi-Supervised Learning?
  • Semi-Supervised Learning on Graphs


Semi-Supervised Learning on Graphs


Graph Neural Networks

  • What are Graph Neural Networks (GNNs)?
  • Problems with existing implementations


Graph Neural Network
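
For contrast with GRAND later on, a single GCN-style layer couples feature propagation with a learned non-linear transformation in the same step. This is our minimal PyTorch sketch of the standard design, not code from the paper:

```python
# Minimal sketch of one GCN-style layer: propagation (A_hat @ H) and a learned
# non-linear transformation happen together in every layer.
import torch
import torch.nn.functional as F

class GCNLayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, A_hat, H):
        # A_hat: normalized adjacency with self-loops; H: node features.
        # Propagate neighbor features, then transform them non-linearly.
        return F.relu(self.lin(A_hat @ H))
```

This coupling of propagation and transformation is exactly what the issues on the next slide refer to.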


Existing Issues

  • Over-smoothing
    • Stacking many GNN layers makes node representations indistinguishable; coupling the feature propagation and non-linear transformation steps aggravates this problem (see the sketch after this list)
  • Not robust to graph attacks
    • Each node's prediction depends heavily on its neighbors, so perturbing even a few edges or neighbor features can change it
  • Overfitting in the semi-supervised setting
    • Under standard semi-supervised training, the scarce node-label information is easily overfit
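
To make the over-smoothing point concrete, here is a small NumPy sketch (ours, not from the paper): repeatedly applying the normalized adjacency collapses all node features onto a single direction, which we can see from the second singular value of the propagated feature matrix decaying toward zero.

```python
# Over-smoothing demo: powers of the normalized adjacency make the propagated
# feature matrix numerically rank-1, i.e. nodes become indistinguishable.
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix."""
    A = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy graph: a 4-node path with 2-dimensional random node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 2)
A_hat = normalized_adjacency(A)

for k in (1, 4, 16, 64):
    Xk = np.linalg.matrix_power(A_hat, k) @ X
    # Second singular value -> 0: all node representations fall on one line.
    s = np.linalg.svd(Xk, compute_uv=False)
    print(f"k={k:3d}  second singular value = {s[1]:.2e}")
```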


GRAND: Graph Random Neural Network

  • Architecture
  • Algorithm
  • How does it tackle the issues faced by other GNNs?


GRAND

Algorithm
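
The two training-time steps GRAND performs per augmentation are DropNode and mixed-order random propagation, followed by an MLP classifier. Below is a minimal PyTorch sketch of these steps as described in the paper; the function names and the dense `A_hat` (normalized adjacency with self-loops) are our assumptions, not the authors' exact code.

```python
# Sketch of GRAND's per-augmentation forward pass.
import torch
import torch.nn.functional as F

def drop_node(X, delta):
    """DropNode: zero out whole node rows with probability delta and rescale
    survivors by 1/(1-delta) so the expectation of X is preserved."""
    mask = (torch.rand(X.size(0), 1, device=X.device) > delta).float()
    return X * mask / (1.0 - delta)

def random_propagate(A_hat, X, K, delta, training=True):
    """Mixed-order propagation: X_bar = (1/(K+1)) * sum_{k=0..K} A_hat^k X_tilde."""
    cur = drop_node(X, delta) if training else X
    out = cur.clone()
    for _ in range(K):
        cur = A_hat @ cur                      # one propagation step
        out = out + cur
    return out / (K + 1)

class GrandMLP(torch.nn.Module):
    """Two-layer MLP classifier applied to the propagated features."""
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.fc1 = torch.nn.Linear(in_dim, hidden)
        self.fc2 = torch.nn.Linear(hidden, n_classes)

    def forward(self, X_bar):
        return self.fc2(F.relu(self.fc1(X_bar)))  # class logits
```

Because propagation is parameter-free and the non-linear transformation happens only in the MLP afterwards, the two steps are decoupled, unlike the GCN layer sketched earlier.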


Loss Functions
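
GRAND trains with two terms: the supervised cross-entropy averaged over the S augmentations, plus λ times a consistency regularization (CR) term that pulls each augmentation's prediction toward the sharpened mean prediction (temperature T). The sketch below follows the paper's formulation; the exact function shape is ours.

```python
# Sketch of GRAND's training loss: L = L_sup + lam * L_con.
import torch
import torch.nn.functional as F

def grand_loss(logits_list, labels, train_mask, lam, T):
    """logits_list: S tensors of shape (N, C), one per DropNode augmentation."""
    # Supervised cross-entropy on labeled nodes, averaged over augmentations.
    sup = torch.stack([F.cross_entropy(z[train_mask], labels[train_mask])
                       for z in logits_list]).mean()

    # Consistency regularization over all nodes (labeled and unlabeled).
    probs = [F.softmax(z, dim=1) for z in logits_list]
    avg = torch.stack(probs).mean(dim=0)
    sharp = avg ** (1.0 / T)
    sharp = (sharp / sharp.sum(dim=1, keepdim=True)).detach()  # fixed target
    con = torch.stack([((p - sharp) ** 2).sum(dim=1).mean()
                       for p in probs]).mean()

    return sup + lam * con
```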


Results

Some of the results presented in the paper:

  • Comparison with existing architectures on benchmarks
  • Generalization analysis
  • Robustness analysis
  • Over-smoothing analysis
  • Results on large datasets


Dataset Description

Dataset     Nodes   Edges   Train/Valid/Test Nodes   Classes   Features   Default Label Rate
Cora         2708    5429   140/500/1000                   7       1433                0.052
Citeseer     3327    4732   120/500/1000                   6       3703                0.036
Pubmed      19717   44338    60/500/1000                   3        500                0.003

Three datasets were used to benchmark the results.


Comparison with existing architectures


Generalization Analysis

  (a) Without RP (random propagation)

  (b) Without CR (consistency regularization)

  (c) GRAND (with both RP and CR)


Robustness Analysis


Over-smoothing analysis


Other results presented in the paper

  • Over-smoothing of GRAND and its variants (on Cora)
  • Classification accuracy of GRAND on large datasets


Experiments

The following experiments were conducted:

  • MLP vs. GCN as the classification network
  • Classification accuracy vs. {K, S}
  • Sensitivity w.r.t. the CR loss coefficient λ


1. MLP vs. GCN

Effect of using an MLP vs. a GCN as the classification network: GRAND (with an MLP) very clearly outperforms GRAND_GCN in terms of classification accuracy.


2. Classification Accuracy vs. {K, S}

(i) Effect of K (propagation order) and S (number of data augmentations) on the classification accuracy of GRAND (DropNode data augmentation)
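
A hypothetical sweep over K and S could reuse the sketch functions above; `model`, `A_hat`, `X`, `labels`, and `train_mask` are assumed to come from an already-loaded dataset (e.g. Cora) and none of this is the paper's code.

```python
# Illustrative (K, S) sweep using random_propagate / GrandMLP / grand_loss
# from the earlier sketches; data loading and the optimizer step are omitted.
for K in (2, 4, 8):
    for S in (1, 2, 4):
        logits_list = [model(random_propagate(A_hat, X, K, delta=0.5))
                       for _ in range(S)]
        loss = grand_loss(logits_list, labels, train_mask, lam=1.0, T=0.5)
        # ...backprop here, then record validation accuracy for this (K, S)
```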


2. Classification Accuracy vs. {K, S}

(ii) Effect of K (propagation order) and S (number of data augmentations) on the classification accuracy of GRAND_dropout and GRAND_dropedge (alternative data augmentation techniques)


3. Sensitivity w.r.t. λ

Classification accuracy vs. λ (the consistency regularization loss coefficient), for an MLP and for a GCN classification network, both using DropNode for data augmentation


3. Sensitivity w.r.t. λ

Classification accuracy vs. λ with DropEdge and with Dropout as the data augmentation, both using MLPs as classification networks


Ablation Study

The effect of removing each of the following components was studied:

  • w/o consistency regularization (CR)
  • w/o multiple DropNode (mDN)
  • w/o sharpening
  • w/o consistency regularization (CR) and DropNode (DN)


Ablation Study

Method                    Cora    Pubmed   Citeseer
w/o CR (λ=0)              0.841   0.811    0.728
w/o mDN (S=1)             0.850   0.800    0.744
w/o sharpening (T=1)      0.844   0.816    0.578
w/o CR & DN (λ=0, δ=0)    0.835   0.787    0.597

Values are classification accuracies.



Thank You!