Challenges & Thinking in Go-production of GNN
AWS Shanghai AI Lab
AWS Machine Learning Solution Lab
Dr. Jian Zhang, Senior Data Scientist
Data
Model
Architecture
Explainability
Agenda
Does your graph data contain enough information?
01
Common Graphs Used in Academia
Cora
Citeseer
PubMed
Data and Graph Visualization are from https://gnnvis.github.io/
Real World Graph Data
In a very sparse graph, GNN models achieve only a 1.4% gain over existing XGBoost models
Real World Graph Data
Only 0.009% of nodes are labeled
48%
Data (features) determine the upper bound of a model's performance; models can only approach that bound as closely as possible.
Information of Graph
What is the information in a graph?
Can it guide the use of GNN models, and if so, how?
How to quantify it?
When should we use graphs?
Challenges and Thinking
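One possible first look at "How to quantify it?" is a handful of descriptive statistics, in line with the sparsity and labeled-node figures on the earlier slides. This is only a sketch: the function name graph_summary and the chosen statistics are assumptions for illustration, not a definition of graph information.

```python
def graph_summary(num_nodes, edges, labels):
    """Descriptive statistics for an undirected graph.

    edges: iterable of (u, v) pairs; labels: dict mapping node -> label.
    All names here are hypothetical and chosen for illustration.
    """
    # Deduplicate undirected edges and drop self-loops.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    num_edges = len(unique_edges)
    max_edges = num_nodes * (num_nodes - 1) / 2
    connected = {n for e in unique_edges for n in e}
    return {
        "density": num_edges / max_edges if max_edges else 0.0,
        "avg_degree": 2 * num_edges / num_nodes if num_nodes else 0.0,
        "isolated_fraction": (num_nodes - len(connected)) / num_nodes if num_nodes else 0.0,
        "labeled_fraction": len(labels) / num_nodes if num_nodes else 0.0,
    }


# Toy graph: 5 nodes, one duplicated edge, one labeled node.
summary = graph_summary(5, [(0, 1), (1, 2), (1, 0)], {0: 1})
```

Very low density, a large isolated fraction, or a tiny labeled fraction would each be a hint that the graph structure may add little beyond the node features.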
In what cases are GNN models better than other ML models?
02
Dr. Zhang, what GNN models should we use for our graphs?
Design Space is huge
Design decision
vs
specific business cases
Only Message Passing?
One more harsh question
The diagram is from the paper https://cs.stanford.edu/people/jure/pubs/gnndesign-neurips20.pdf
Dr. Zhang, our XGBoost models outperform your GNN models!!!
GNNs work if nodes are connected
XGBoost
Traditional (feature-based) ML models require strong signals as inputs. Feature engineering is a must-have. But what if there are no features …
Many are featureless
When to use GNN models?
How to use features of nodes/edges?
Combine GNN with other ML models?
Challenges and Thinking
A GNN model <=> business case mapping sheet?
Can GNN models perform real-time inference, and if so, how?
03
Real-time Inference
First, save new nodes/edges into the existing graph; second, extract an N-hop subgraph and send it to the model
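The two real-time steps above can be sketched roughly as follows. The InMemoryGraph class is a hypothetical stand-in for a real graph database, and the breadth-first extraction is one simple way to get an N-hop subgraph, not necessarily what a production system would use.

```python
from collections import defaultdict, deque


class InMemoryGraph:
    """Toy undirected graph store (hypothetical stand-in for a graph DB)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        # Step 1: persist the new edge (and, implicitly, its endpoints).
        self.adj[u].add(v)
        self.adj[v].add(u)

    def n_hop_subgraph(self, seed, n_hops):
        # Step 2: breadth-first search outward from the seed node.
        dist = {seed: 0}
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            if dist[node] == n_hops:
                continue  # do not expand beyond the hop limit
            for nbr in self.adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        nodes = set(dist)
        # Keep only edges with both endpoints inside the subgraph.
        edges = {tuple(sorted((u, v)))
                 for u in nodes for v in self.adj[u] if v in nodes}
        return nodes, edges


graph = InMemoryGraph()
for u, v in [("a", "b"), ("b", "c"), ("c", "d")]:
    graph.add_edge(u, v)

graph.add_edge("e", "a")                      # a new event arrives
nodes, edges = graph.n_hop_subgraph("e", 2)   # subgraph to send to the model
```

In this toy case the 2-hop subgraph around "e" contains nodes e, a, b; the latency of both steps is what the real-time questions later in this section are about.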
Batch Inference
Within a time window, aggregate new nodes/edges into a new graph and send it to the model
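A minimal sketch of the time-window aggregation, assuming events arrive as (timestamp, src, dst) tuples and that windows are fixed-size and non-overlapping. The function name aggregate_by_window and the event layout are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_by_window(events, window_seconds):
    """Group (timestamp, src, dst) edge events into per-window edge lists."""
    windows = defaultdict(list)
    for ts, src, dst in events:
        # Tumbling windows: each event belongs to exactly one window.
        window_id = int(ts // window_seconds)
        windows[window_id].append((src, dst))
    return dict(sorted(windows.items()))


events = [
    (0.5, "a", "b"),
    (3.0, "b", "c"),
    (61.0, "c", "d"),
    (119.9, "d", "a"),
    (120.0, "a", "c"),
]
batches = aggregate_by_window(events, 60)
```

Each entry in batches would then be turned into a graph and sent to the model as one batch; the window size trades freshness against throughput.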
Diagram labels: inference, GNN models, N-hop subgraph
Data Extraction
Results analysis
Inference
Data Persistence
Data Pipeline
Are existing data pipelines good enough for real-time GNN inference?
Architecture design for real-time GNN inference?
Are existing graph DBs fast enough for insertion and extraction?
Graph-based streaming tools/solutions?
Challenges and Thinking
Explainability becomes a must-have
04
“Wait a moment, how do we explain the results?”
“Let’s go online first”
91%
91%
Academia
Semi-Industrial
Diagrams come from: https://arxiv.org/pdf/1903.03894.pdf and https://arxiv.org/pdf/2011.12193.pdf
and DGL’s GNNExplainer example: https://github.com/dmlc/dgl/tree/master/examples/pytorch/gnn_explainer
Explanation of real world data
Exploration and Hypothesis Generation
Visually overwhelmed
A 2-hop subgraph from a heterogeneous graph, fanout = [20, 20]
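To give a sense of scale, here is a rough sketch of fanout-based neighbor sampling and the size bound it implies. It assumes the fanout applies per node regardless of edge type (in a heterogeneous graph a sampler may instead apply it per edge type, which makes the subgraph larger); both function names are hypothetical.

```python
import random


def max_sampled_nodes(fanout):
    """Upper bound on distinct nodes: 1 seed plus the product of fanouts per hop."""
    total, frontier = 1, 1
    for f in fanout:
        frontier *= f
        total += frontier
    return total


def sample_neighbors(adj, seed, fanout, rng):
    """Sample up to fanout[i] neighbors per frontier node at hop i."""
    nodes = {seed}
    frontier = [seed]
    for f in fanout:
        next_frontier = []
        for node in frontier:
            nbrs = sorted(adj.get(node, ()))
            for nbr in rng.sample(nbrs, min(f, len(nbrs))):
                if nbr not in nodes:
                    nodes.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return nodes


bound = max_sampled_nodes([20, 20])  # 1 + 20 + 20 * 20

# Tiny path-like graph: every degree is below the fanout, so sampling
# returns the full 2-hop neighborhood of node 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
sampled = sample_neighbors(adj, 0, [20, 20], random.Random(0))
```

Under this assumption, fanout = [20, 20] can yield up to 421 nodes around a single seed, which helps explain why the plotted subgraph is visually overwhelming.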
Can GNN models and results be explained?
Who explains: business analysts, data scientists, or algorithm engineers?
Any dedicated tools for interactive exploration?
……
Challenges and Thinking
THANK YOU!