1 of 19

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

Hanqiu Chen and Cong Hao

Georgia Institute of Technology

School of Electrical and Computer Engineering

Presenter: Hanqiu Chen

Email: hchen799@gatech.edu

3 of 19

Dynamic Graph Neural Network

Dynamic graph modeling: a dynamic graph (e.g., a dynamic social network) evolves over time and is represented as a sequence of snapshots at different time steps.
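As a minimal illustration (hypothetical types and names, not code from the paper), a dynamic graph can be stored as an ordered sequence of snapshots, each holding the edge list and node features valid at that time step:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Hypothetical, minimal snapshot representation for one time step.
    struct Snapshot {
        std::vector<std::pair<int, int>> edges;        // (src, dst) pairs valid at this step
        std::vector<std::vector<float>> node_features; // one feature vector per node
    };

    // A dynamic graph is the ordered sequence of its snapshots.
    using DynamicGraph = std::vector<Snapshot>;

    // A DGNN processes the snapshots one time step after another.
    void dgnn_inference(const DynamicGraph& g) {
        for (std::size_t t = 0; t < g.size(); ++t) {
            // spatial encoding (GNN) + temporal encoding (RNN) on g[t]
        }
    }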

6 of 19

Different types of DGNNs

Stacked DGNN

  • Dependent GNN and RNN in the same time step.

  • Independent GNN at different time steps.

Integrated DGNN

  • Dependent GNN and RNN in adjacent time steps.

  • Dependent GNN at different time steps.

Weights-evolved DGNN

  • RNN evolves GNN weights.

  • Independent GNN at different time steps (see the sketch below).
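The sketch below is a rough, hedged contrast of the three variants; the placeholder types and function names (gnn, rnn, rnn_weights) are assumptions for illustration, not the paper's API.

    #include <vector>

    // Hypothetical placeholder types and models.
    using Emb = std::vector<float>;   // node embeddings for one snapshot
    using Wts = std::vector<float>;   // GNN weights
    struct Snapshot {};               // graph structure + features at one time step

    Emb gnn(const Snapshot&, const Wts&) { return {}; }             // spatial encoding
    Emb gnn(const Snapshot&, const Wts&, const Emb&) { return {}; } // spatial encoding with state
    Emb rnn(const Emb&, const Emb& state) { return state; }         // temporal encoding
    Wts rnn_weights(const Wts& w) { return w; }                     // weight-evolving RNN

    void stacked(const std::vector<Snapshot>& g, const Wts& w) {
        // Stacked DGNN: GNN feeds an RNN inside each time step;
        // GNNs of different time steps are independent.
        Emb state;
        for (const auto& s : g) state = rnn(gnn(s, w), state);
    }

    void integrated(const std::vector<Snapshot>& g, const Wts& w) {
        // Integrated DGNN: step t's GNN also consumes step t-1's RNN state,
        // so GNNs at different time steps depend on each other.
        Emb state;
        for (const auto& s : g) state = rnn(gnn(s, w, state), state);
    }

    void weights_evolved(const std::vector<Snapshot>& g, Wts w) {
        // Weights-evolved DGNN (e.g., EvolveGCN): the RNN evolves the GNN weights;
        // the GNNs themselves stay independent across time steps.
        for (const auto& s : g) { w = rnn_weights(w); gnn(s, w); }
    }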

9 of 19

Motivations

  • DGNN inference on CPU and GPU suffers from high energy consumption, low parallelism, and low GPU utilization [IISWC'22].

  • Previous works optimize the GNN or the RNN separately; we optimize GNN and RNN together.

[Figure: sequential computing (GNN and RNN execute one after another) vs. parallel computing (GNN and RNN of adjacent time steps overlap)]

[IISWC’22] Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU

12 of 19

Innovations

  • Better hardware performance with high flexibility
    • Lower latency and energy consumption than CPU and GPU
    • HLS allows easy integration of GNNs and RNNs
  • Multi-level parallelism (see the HLS-style sketch below)
    • High level: GNN and RNN in adjacent time steps are pipelined
    • Low level: computation inside the GNN and RNN is fully data-streamed
  • Hardware-efficient architecture design
    • Task scheduling, graph renumbering and different types of RAMs
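A heavily simplified HLS-style sketch of the two levels of parallelism; it assumes the Vitis HLS hls::stream library, and the functions gnn_stage, rnn_stage, and dgnn_top are hypothetical placeholders, not DGNN-Booster's actual code.

    #include <hls_stream.h>

    typedef float feat_t;
    const int T = 8;   // number of time steps (illustrative)

    // Low level: each stage is pipelined so it consumes/produces one element per cycle.
    void gnn_stage(hls::stream<feat_t>& in, hls::stream<feat_t>& to_rnn) {
        for (int t = 0; t < T; ++t) {
    #pragma HLS PIPELINE II=1
            to_rnn.write(in.read());   // placeholder for message passing + node transform
        }
    }

    void rnn_stage(hls::stream<feat_t>& from_gnn, hls::stream<feat_t>& out) {
        feat_t state = 0;
        for (int t = 0; t < T; ++t) {
    #pragma HLS PIPELINE II=1
            state += from_gnn.read();  // placeholder for the recurrent update
            out.write(state);
        }
    }

    // High level: DATAFLOW turns the two stages into concurrent processes linked by a
    // FIFO, so the RNN of time step t-1 can still be running while the GNN already
    // works on time step t.
    void dgnn_top(hls::stream<feat_t>& in, hls::stream<feat_t>& out) {
    #pragma HLS DATAFLOW
        hls::stream<feat_t> gnn_to_rnn;
    #pragma HLS STREAM variable=gnn_to_rnn depth=64
        gnn_stage(in, gnn_to_rnn);
        rnn_stage(gnn_to_rnn, out);
    }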

13 of 19

DGNN-Booster V1

Can be applied to: Stacked DGNN and Weights-evolved DGNN

14 of 19

DGNN-Booster V2

  • Node queues: implemented using FIFOs to overlap GNN and RNN in the same time step (node-level parallelism)

  • Message passing and node transformation in the GNN are data-streamed; different stages in the RNN are also data-streamed (see the FIFO sketch below)

Can be applied to: Stacked DGNN and Integrated DGNN
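Below is a hedged sketch of the node-queue idea (hypothetical names, again assuming Vitis HLS hls::stream as the FIFO): the GNN pushes each finished node embedding into a FIFO, and the RNN pops nodes as soon as they arrive, so the two overlap within one time step at node granularity.

    #include <hls_stream.h>

    typedef float feat_t;
    const int N = 1024;                       // nodes in one snapshot (illustrative)

    struct NodeMsg { int id; feat_t emb; };   // one finished node embedding

    // GNN: as soon as node v's embedding is ready, push it into the node queue (FIFO).
    void gnn_nodes(hls::stream<feat_t>& node_in, hls::stream<NodeMsg>& node_queue) {
        for (int v = 0; v < N; ++v) {
    #pragma HLS PIPELINE II=1
            NodeMsg m; m.id = v; m.emb = node_in.read();  // placeholder aggregate + transform
            node_queue.write(m);
        }
    }

    // RNN: pop nodes from the queue without waiting for the whole snapshot, so the
    // GNN and RNN of the same time step overlap (node-level parallelism).
    void rnn_nodes(hls::stream<NodeMsg>& node_queue, hls::stream<feat_t>& node_out) {
        static feat_t hidden[N];              // per-node hidden state kept on-chip
        for (int v = 0; v < N; ++v) {
    #pragma HLS PIPELINE II=1
            NodeMsg m = node_queue.read();
            hidden[m.id] += m.emb;            // placeholder per-node recurrent update
            node_out.write(hidden[m.id]);
        }
    }

    void v2_time_step(hls::stream<feat_t>& node_in, hls::stream<feat_t>& node_out) {
    #pragma HLS DATAFLOW
        hls::stream<NodeMsg> node_queue;      // the FIFO node queue
    #pragma HLS STREAM variable=node_queue depth=32
        gnn_nodes(node_in, node_queue);
        rnn_nodes(node_queue, node_out);
    }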

15 of 19

Experiment results

End-to-end on-board evaluation

  • Datasets: BC-Alpha and UCI
  • Base models: EvolveGCN (V1) and GCRN-M2 (V2)

  • Metrics: latency, total energy, and runtime energy

17 of 19

Design space exploration and ablation study

Baseline: Without applying optimizations

Pipeline-O1: Pipelines different stages inside RNN

Pipeline-O2: Overlaps GNN and RNN

Balance GNN and RNN computation with limited DSP resources on the ZCU102 board
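As a back-of-the-envelope illustration of the balancing idea (illustrative code and operation counts, not the paper's design space exploration tool), one can split a DSP budget so the GNN and RNN engines have roughly equal per-step latency, since the slower engine bounds the pipeline:

    #include <algorithm>
    #include <cstdio>

    int main() {
        const int dsp_budget = 2520;          // DSP slices on a ZCU102
        const long gnn_ops   = 40000000;      // MACs per time step in the GNN (illustrative)
        const long rnn_ops   = 10000000;      // MACs per time step in the RNN (illustrative)

        int best_gnn_dsp  = 1;
        long best_latency = -1;
        for (int gnn_dsp = 1; gnn_dsp < dsp_budget; ++gnn_dsp) {
            int rnn_dsp = dsp_budget - gnn_dsp;
            // Pipeline latency is bounded by the slower of the two engines.
            long latency = std::max(gnn_ops / gnn_dsp, rnn_ops / rnn_dsp);
            if (best_latency < 0 || latency < best_latency) {
                best_latency = latency;
                best_gnn_dsp = gnn_dsp;
            }
        }
        std::printf("GNN DSPs: %d, RNN DSPs: %d\n", best_gnn_dsp, dsp_budget - best_gnn_dsp);
        return 0;
    }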

18 of 19

Future work

  • Compare the similarity between snapshots in adjacent time steps to avoid redundant data communication and computation (see the sketch below).

  • Automatic, dynamic design configuration to balance computation resources between the spatial and temporal encoding parts based on the task.
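A hedged sketch of the first direction (illustrative only, not an implemented feature): score adjacent snapshots with, for example, the Jaccard similarity of their edge sets, and reuse the previous result when the score exceeds a threshold.

    #include <cstddef>
    #include <set>
    #include <utility>

    using Edge = std::pair<int, int>;
    using EdgeSet = std::set<Edge>;

    // Jaccard similarity of two snapshots' edge sets: |A ∩ B| / |A ∪ B|.
    double snapshot_similarity(const EdgeSet& a, const EdgeSet& b) {
        if (a.empty() && b.empty()) return 1.0;
        std::size_t inter = 0;
        for (const Edge& e : a) inter += b.count(e);
        return static_cast<double>(inter) / static_cast<double>(a.size() + b.size() - inter);
    }

    // Skip transferring and recomputing a snapshot when it is almost identical to
    // the previous one (threshold is illustrative).
    bool can_reuse_previous_result(const EdgeSet& prev, const EdgeSet& cur) {
        return snapshot_similarity(prev, cur) > 0.95;
    }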

19 of 19

Summary & Thanks!

Contact: hchen799@gatech.edu | Sharc-lab @ Georgia Tech (https://sharclab.ece.gatech.edu/)

  • DGNNs require acceleration and real-time processing
    • Complexity of time-evolving graphs and hardware bottlenecks

  • DGNN-Booster: our DGNN acceleration framework on FPGA
    • Generic: supports a wide range of GNN and RNN models
    • Two versions: support different dataflows
    • Open-source: High-Level Synthesis (HLS) based

  • Beats CPU and GPU and achieves computation balance
    • Future direction 1: avoid redundant computation on similar snapshots
    • Future direction 2: dynamic runtime reconfiguration for resource and workflow balancing