1 of 19

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

Hanqiu Chen and Cong Hao

Georgia Institute of Technology

School of Electrical and Computer Engineering

Presenter: Hanqiu Chen

Email: hchen799@gatech.edu

3 of 19

Dynamic Graph Neural Network

Dynamic graph modeling: a dynamic graph (e.g., a dynamic social network) evolves over time and is represented as a sequence of snapshots at different time steps.
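As a minimal illustration (hypothetical types and names, not code from the paper), a dynamic graph can be stored as an ordered sequence of snapshots, each holding the edge list and node features valid at that time step:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Hypothetical, minimal snapshot representation for one time step.
    struct Snapshot {
        std::vector<std::pair<int, int>> edges;        // (src, dst) pairs valid at this step
        std::vector<std::vector<float>> node_features; // one feature vector per node
    };

    // A dynamic graph is the ordered sequence of its snapshots.
    using DynamicGraph = std::vector<Snapshot>;

    // A DGNN processes the snapshots one time step after another.
    void dgnn_inference(const DynamicGraph& g) {
        for (std::size_t t = 0; t < g.size(); ++t) {
            // spatial encoding (GNN) + temporal encoding (RNN) on g[t]
        }
    }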

6 of 19

Different types of DGNNs

Stacked DGNN

  • Dependent GNN and RNN in the same time step.

  • Independent GNN at different time steps.

Integrated DGNN

  • Dependent GNN and RNN in adjacent time steps.

  • Dependent GNN at different time steps.

Weights-evolved DGNN

  • RNN evolves GNN weights.

  • Independent GNN at different time steps (see the sketch below).
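The sketch below is a rough, hedged contrast of the three variants; the placeholder types and function names (gnn, rnn, rnn_weights) are assumptions for illustration, not the paper's API.

    #include <vector>

    // Hypothetical placeholder types and models.
    using Emb = std::vector<float>;   // node embeddings for one snapshot
    using Wts = std::vector<float>;   // GNN weights
    struct Snapshot {};               // graph structure + features at one time step

    Emb gnn(const Snapshot&, const Wts&) { return {}; }             // spatial encoding
    Emb gnn(const Snapshot&, const Wts&, const Emb&) { return {}; } // spatial encoding with state
    Emb rnn(const Emb&, const Emb& state) { return state; }         // temporal encoding
    Wts rnn_weights(const Wts& w) { return w; }                     // weight-evolving RNN

    void stacked(const std::vector<Snapshot>& g, const Wts& w) {
        // Stacked DGNN: GNN feeds an RNN inside each time step;
        // GNNs of different time steps are independent.
        Emb state;
        for (const auto& s : g) state = rnn(gnn(s, w), state);
    }

    void integrated(const std::vector<Snapshot>& g, const Wts& w) {
        // Integrated DGNN: step t's GNN also consumes step t-1's RNN state,
        // so GNNs at different time steps depend on each other.
        Emb state;
        for (const auto& s : g) state = rnn(gnn(s, w, state), state);
    }

    void weights_evolved(const std::vector<Snapshot>& g, Wts w) {
        // Weights-evolved DGNN (e.g., EvolveGCN): the RNN evolves the GNN weights;
        // the GNNs themselves stay independent across time steps.
        for (const auto& s : g) { w = rnn_weights(w); gnn(s, w); }
    }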

9 of 19

Motivations

  • DGNN inference on CPU and GPU suffers from high energy consumption, low parallelism, and low GPU utilization [IISWC'22].

  • Previous works optimize the GNN or the RNN separately; we optimize GNN and RNN together.

[Figure: sequential computing (GNN and RNN execute one after another) vs. parallel computing (GNN and RNN of adjacent time steps overlap)]

[IISWC’22] Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU

12 of 19

Innovations

  • Better hardware performance with high flexibility
    • Lower latency and energy consumption than CPU and GPU
    • HLS allows easy integration of GNNs and RNNs
  • Multi-level parallelism (see the HLS-style sketch below)
    • High level: GNN and RNN in adjacent time steps are pipelined
    • Low level: computation inside the GNN and RNN is fully data-streamed
  • Hardware-efficient architecture design
    • Task scheduling, graph renumbering and different types of RAMs
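A heavily simplified HLS-style sketch of the two levels of parallelism; it assumes the Vitis HLS hls::stream library, and the functions gnn_stage, rnn_stage, and dgnn_top are hypothetical placeholders, not DGNN-Booster's actual code.

    #include <hls_stream.h>

    typedef float feat_t;
    const int T = 8;   // number of time steps (illustrative)

    // Low level: each stage is pipelined so it consumes/produces one element per cycle.
    void gnn_stage(hls::stream<feat_t>& in, hls::stream<feat_t>& to_rnn) {
        for (int t = 0; t < T; ++t) {
    #pragma HLS PIPELINE II=1
            to_rnn.write(in.read());   // placeholder for message passing + node transform
        }
    }

    void rnn_stage(hls::stream<feat_t>& from_gnn, hls::stream<feat_t>& out) {
        feat_t state = 0;
        for (int t = 0; t < T; ++t) {
    #pragma HLS PIPELINE II=1
            state += from_gnn.read();  // placeholder for the recurrent update
            out.write(state);
        }
    }

    // High level: DATAFLOW turns the two stages into concurrent processes linked by a
    // FIFO, so the RNN of time step t-1 can still be running while the GNN already
    // works on time step t.
    void dgnn_top(hls::stream<feat_t>& in, hls::stream<feat_t>& out) {
    #pragma HLS DATAFLOW
        hls::stream<feat_t> gnn_to_rnn;
    #pragma HLS STREAM variable=gnn_to_rnn depth=64
        gnn_stage(in, gnn_to_rnn);
        rnn_stage(gnn_to_rnn, out);
    }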

13 of 19

DGNN-Booster V1

Can be applied to: Stacked DGNN and Weights-evolved DGNN

14 of 19

DGNN-Booster V2

  • Node queues: implemented using FIFOs to overlap GNN and RNN in the same time step (node-level parallelism)

  • Message passing and node transformation in the GNN are data-streamed; different stages in the RNN are also data-streamed (see the FIFO sketch below)

Can be applied to: Stacked DGNN and Integrated DGNN
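Below is a hedged sketch of the node-queue idea (hypothetical names, again assuming Vitis HLS hls::stream as the FIFO): the GNN pushes each finished node embedding into a FIFO, and the RNN pops nodes as soon as they arrive, so the two overlap within one time step at node granularity.

    #include <hls_stream.h>

    typedef float feat_t;
    const int N = 1024;                       // nodes in one snapshot (illustrative)

    struct NodeMsg { int id; feat_t emb; };   // one finished node embedding

    // GNN: as soon as node v's embedding is ready, push it into the node queue (FIFO).
    void gnn_nodes(hls::stream<feat_t>& node_in, hls::stream<NodeMsg>& node_queue) {
        for (int v = 0; v < N; ++v) {
    #pragma HLS PIPELINE II=1
            NodeMsg m; m.id = v; m.emb = node_in.read();  // placeholder aggregate + transform
            node_queue.write(m);
        }
    }

    // RNN: pop nodes from the queue without waiting for the whole snapshot, so the
    // GNN and RNN of the same time step overlap (node-level parallelism).
    void rnn_nodes(hls::stream<NodeMsg>& node_queue, hls::stream<feat_t>& node_out) {
        static feat_t hidden[N];              // per-node hidden state kept on-chip
        for (int v = 0; v < N; ++v) {
    #pragma HLS PIPELINE II=1
            NodeMsg m = node_queue.read();
            hidden[m.id] += m.emb;            // placeholder per-node recurrent update
            node_out.write(hidden[m.id]);
        }
    }

    void v2_time_step(hls::stream<feat_t>& node_in, hls::stream<feat_t>& node_out) {
    #pragma HLS DATAFLOW
        hls::stream<NodeMsg> node_queue;      // the FIFO node queue
    #pragma HLS STREAM variable=node_queue depth=32
        gnn_nodes(node_in, node_queue);
        rnn_nodes(node_queue, node_out);
    }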

15 of 19

Experiment results

End-to-end on-board evaluation

  • Datasets: BC-Alpha and UCI
  • Base models: EvolveGCN (V1) and GCRN-M2 (V2)

  • Metrics: latency, total energy, and runtime energy

17 of 19

Design space exploration and ablation study

Baseline: Without applying optimizations

Pipeline-O1: Pipelines different stages inside RNN

Pipeline-O2: Overlaps GNN and RNN

Balance GNN and RNN computation with limited DSP resources on the ZCU102 board
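As a back-of-the-envelope illustration of the balancing idea (illustrative code and operation counts, not the paper's design space exploration tool), one can split a DSP budget so the GNN and RNN engines have roughly equal per-step latency, since the slower engine bounds the pipeline:

    #include <algorithm>
    #include <cstdio>

    int main() {
        const int dsp_budget = 2520;          // DSP slices on a ZCU102
        const long gnn_ops   = 40000000;      // MACs per time step in the GNN (illustrative)
        const long rnn_ops   = 10000000;      // MACs per time step in the RNN (illustrative)

        int best_gnn_dsp  = 1;
        long best_latency = -1;
        for (int gnn_dsp = 1; gnn_dsp < dsp_budget; ++gnn_dsp) {
            int rnn_dsp = dsp_budget - gnn_dsp;
            // Pipeline latency is bounded by the slower of the two engines.
            long latency = std::max(gnn_ops / gnn_dsp, rnn_ops / rnn_dsp);
            if (best_latency < 0 || latency < best_latency) {
                best_latency = latency;
                best_gnn_dsp = gnn_dsp;
            }
        }
        std::printf("GNN DSPs: %d, RNN DSPs: %d\n", best_gnn_dsp, dsp_budget - best_gnn_dsp);
        return 0;
    }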

18 of 19

Future work

  • Compare the similarity between snapshots in adjacent time steps to avoid redundant data communication and computation (see the sketch below).

  • Automatic, dynamic design configuration to balance computation resources between the spatial and temporal encoding parts based on the task.
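A hedged sketch of the first direction (illustrative only, not an implemented feature): score adjacent snapshots with, for example, the Jaccard similarity of their edge sets, and reuse the previous result when the score exceeds a threshold.

    #include <cstddef>
    #include <set>
    #include <utility>

    using Edge = std::pair<int, int>;
    using EdgeSet = std::set<Edge>;

    // Jaccard similarity of two snapshots' edge sets: |A ∩ B| / |A ∪ B|.
    double snapshot_similarity(const EdgeSet& a, const EdgeSet& b) {
        if (a.empty() && b.empty()) return 1.0;
        std::size_t inter = 0;
        for (const Edge& e : a) inter += b.count(e);
        return static_cast<double>(inter) / static_cast<double>(a.size() + b.size() - inter);
    }

    // Skip transferring and recomputing a snapshot when it is almost identical to
    // the previous one (threshold is illustrative).
    bool can_reuse_previous_result(const EdgeSet& prev, const EdgeSet& cur) {
        return snapshot_similarity(prev, cur) > 0.95;
    }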

19 of 19

Summary & Thanks!

Contact: hchen799@gatech.edu | Sharc-lab @ Georgia Tech (https://sharclab.ece.gatech.edu/)

  • DGNNs require acceleration and real-time processing
    • Complexity of time-evolving graphs and hardware bottlenecks

  • DGNN-Booster: our DGNN acceleration framework on FPGA
    • Generic: supports a wide range of GNN and RNN models
    • Two versions: support different dataflows
    • Open-source: High-Level Synthesis (HLS) based

  • Beats CPU and GPU and achieves computation balance
    • Future direction 1: avoid redundant computation on similar snapshots
    • Future direction 2: dynamic runtime reconfiguration for resource and workflow balancing