1 of 24

Challenges & Thinking in Go-production of GNN

AWS Shanghai AI Lab

AWS Machine Learning Solution Lab

Dr. Jian Zhang, Senior Data Scientist

2 of 24

Data

Model

Architecture

Explainability

Agenda

3 of 24

Does your graph data contain enough information?

4 of 24

Common Graphs Used in Academia

Cora

Citeseer

PubMed

Data and Graph Visualization are from https://gnnvis.github.io/

5 of 24

Real World Graph Data

On a very sparse graph, GNN models achieve only a 1.4% gain over existing XGBoost models

6 of 24

Real World Graph Data

Only 0.009% of nodes are labeled

7 of 24

48%

Data (features) determine the upper bound of a model’s performance; models can only approach that bound as closely as possible.

Information of Graph

8 of 24

What is the information in a graph?

Can it guide the use of GNN models, and if so, how?

How to quantify it?

When should we use graphs?

Challenges and Thinking
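
The question of how to quantify graph information is left open on this slide. As one possible starting point, and not an answer the talk itself gives, two simple proxies are edge density (how sparse the graph is) and edge homophily (how often connected, labeled nodes share a label). The function names and the toy graph below are hypothetical and exist only for illustration.

```python
def edge_density(num_nodes, edges):
    """Fraction of possible undirected edges that are actually present."""
    possible = num_nodes * (num_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


def edge_homophily(edges, labels):
    """Share of edges whose two endpoints carry the same label.

    Edges with an unlabeled endpoint are skipped, which matters when
    only a tiny fraction of nodes is labeled.
    """
    labeled = [(u, v) for u, v in edges if u in labels and v in labels]
    if not labeled:
        return None  # nothing can be said without labeled edges
    same = sum(labels[u] == labels[v] for u, v in labeled)
    return same / len(labeled)


# Toy graph (assumption): a 5-node path where node 4 has no label.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
toy_labels = {0: "A", 1: "A", 2: "B", 3: "B"}

density = edge_density(5, toy_edges)
homophily = edge_homophily(toy_edges, toy_labels)
```

A very low density or a homophily close to what random labeling would give may hint that the graph structure adds little beyond node features, but these are rough indicators only.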

9 of 24

In what cases are GNN models better than other ML models?

10 of 24

Dr. Zhang, what GNN models should we use for our graphs?

11 of 24

Design Space is huge

Design decision

specific business cases

Only Message Passing?

One more harsh question

The diagram was cited from the paper https://cs.stanford.edu/people/jure/pubs/gnndesign-neurips20.pdf

12 of 24

Dr. Zhang, our XGBoost models outperform your GNN models!!!

13 of 24

GNN works if connected

XGBoost

Traditional (feature-based) ML models require strong signals as inputs. Feature engineering is a must-have. But if there is no feature …

Many are featureless

14 of 24

When to use GNN models?

How to use features of nodes/edges?

Combine GNN with other ML models?

Challenges and Thinking

A GNN model <=> business case mapping sheet?

15 of 24

Can GNN models perform real-time inference, and if so, how?

16 of 24

Real-time Inference

First, save new nodes/edges into the existing graph; second, extract an N-hop subgraph and send it to the models
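
The two real-time steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: the graph is an in-memory adjacency dict, edges are treated as undirected, and `add_edges` and `n_hop_subgraph` are hypothetical helper names. A production system would do the same against a graph store.

```python
from collections import deque


def add_edges(adj, new_edges):
    """Step 1: save new edges (and their endpoints) into the existing graph."""
    for u, v in new_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)  # undirected in this sketch


def n_hop_subgraph(adj, seeds, n_hops):
    """Step 2: breadth-first search out to n_hops, then keep induced edges."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == n_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    # u < v keeps each undirected edge once (node ids must be comparable).
    edges = {(u, v) for u in nodes for v in adj.get(u, ()) if v in nodes and u < v}
    return nodes, edges


# Existing graph a-b-c-d; a new node "e" arrives, connected to "a".
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
add_edges(adj, [("e", "a")])
nodes, edges = n_hop_subgraph(adj, ["e"], n_hops=2)
```

The resulting `nodes` and `edges` are what would be handed to the model for inference on the new node.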

Batch Inference

Within a time window, aggregate new nodes/edges into a new graph and send it to the models
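
Batch inference with a time window can be sketched as bucketing incoming edge events by timestamp. The event format `(timestamp_seconds, src, dst)`, the 60-second window, and the function name `window_graphs` are all assumptions made for illustration.

```python
from collections import defaultdict


def window_graphs(events, window_seconds):
    """Group (timestamp, src, dst) events into one edge list per time window."""
    windows = defaultdict(list)
    for ts, src, dst in events:
        windows[int(ts // window_seconds)].append((src, dst))
    # Return the per-window edge lists in chronological order.
    return [windows[key] for key in sorted(windows)]


events = [(0, "a", "b"), (30, "b", "c"), (65, "c", "d"), (130, "a", "d")]
batches = window_graphs(events, window_seconds=60)
```

Each element of `batches` is the edge list of one new graph that would be sent to the models as a single batch.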

[Diagram labels: N-hop subgraph, GNN models, inference]

17 of 24

Data Extraction

Results analysis

Inference

Data Persistence

Data Pipeline

18 of 24

Is the existing data pipeline good enough for real-time GNN?

Architecture design for real-time GNN inference?

Are existing graph DBs fast enough for insertion and extraction?

Graph-based streaming tools/solutions?

Challenges and Thinking

19 of 24

Explainability becomes a must-have

20 of 24

“Wait a moment, how to explain the results?”

“Let’s go online first”

91%

21 of 24

Academia

Semi-Industrial

Diagrams come from: https://arxiv.org/pdf/1903.03894.pdf and https://arxiv.org/pdf/2011.12193.pdf

and DGL’s GNNExplainer example: https://github.com/dmlc/dgl/tree/master/examples/pytorch/gnn_explainer

22 of 24

Explanation of real world data

Exploration and Hypothesis Generation

Visually overwhelmed

A 2-hop subgraph from a heterogeneous graph, fanout = [20, 20]
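
To see why such a subgraph is visually overwhelming, here is a rough sketch of fanout-based neighbor sampling. It simplifies heavily: the graph is homogeneous, the fanout applies per node rather than per edge type, and `sample_subgraph` and `max_nodes` are hypothetical names, not a library API. On a heterogeneous graph, where a fanout may apply to each edge type, the subgraph can be larger.

```python
import random


def sample_subgraph(adj, seed, fanouts, rng):
    """Sample up to fanouts[i] neighbors per frontier node at hop i."""
    frontier, visited, edges = [seed], {seed}, []
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            nbrs = sorted(adj.get(node, ()))
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                edges.append((nbr, node))  # message flows nbr -> node
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return visited, edges


def max_nodes(fanouts):
    """Upper bound on sampled nodes: 1 + f1 + f1*f2 + ..."""
    total, layer = 1, 1
    for fanout in fanouts:
        layer *= fanout
        total += layer
    return total


# Toy graph (assumption): hub node 0 linked to 30 leaves.
adj = {0: set(range(1, 31))}
for leaf in range(1, 31):
    adj[leaf] = {0}

sampled_nodes, sampled_edges = sample_subgraph(adj, 0, [20, 20], random.Random(0))
bound = max_nodes([20, 20])  # 1 + 20 + 400 = 421
```

Under these assumptions, a single seed with fanout = [20, 20] can pull in up to 421 nodes, which is already too many to read comfortably in one picture.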

23 of 24

Can GNN models and results be explained?

Who explains: business analysts, data scientists, or algorithm engineers?

Any dedicated tools for interactive exploration?

……

Challenges and Thinking

24 of 24