BNN-PYNQ overlays
Yaman Umuroglu, NTNU
QPYNQ workshop at RISE SICS
Stockholm, Sweden
Outline
Part 1: Introduction to BNN-PYNQ
What is BNN-PYNQ?
What can BNN-PYNQ do?
Overlay | Performance | Network | Accuracy |
LFC (fully connected, 28x28 monochrome) | 168 kFPS (974 GOPS), 102 us latency | MNIST | 98.4% |
 | | Fashion-MNIST | 85.5% |
 | | NIST SD-19 | 79.2% |
CNV (VGG-like, convolutional, 32x32 RGB) | 3 kFPS (341 GOPS), 1541 us latency | CIFAR-10 | 80.1% |
 | | SVHN | 96.7% |
 | | GTSRB | 97.7% |
All at 2-2.5 W of power consumption
(less if you do not need this much performance)
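The throughput, GOPS, and latency figures in the table can be related with a little arithmetic. The sketch below is an illustration only: it derives the work per image (GOPS divided by FPS) and, using Little's law, the average number of images inside the pipeline at once (throughput times latency). It assumes the quoted latency is the per-image latency at full throughput, which the slide does not state.

```python
# Figures quoted in the table above; nothing here is measured.
OVERLAYS = {
    "LFC": {"fps": 168e3, "gops": 974.0, "latency_us": 102.0},
    "CNV": {"fps": 3e3, "gops": 341.0, "latency_us": 1541.0},
}


def ops_per_frame(fps, gops):
    """Operations spent on one image: (ops per second) / (frames per second)."""
    return gops * 1e9 / fps


def frames_in_flight(fps, latency_us):
    """Little's law: average number of images in the pipeline at once."""
    return fps * latency_us * 1e-6


summary = {
    name: {
        "ops_per_frame": ops_per_frame(o["fps"], o["gops"]),
        "frames_in_flight": frames_in_flight(o["fps"], o["latency_us"]),
    }
    for name, o in OVERLAYS.items()
}

for name, s in summary.items():
    print(f"{name}: {s['ops_per_frame'] / 1e6:.1f} M ops per frame, "
          f"about {s['frames_in_flight']:.1f} frames in flight")
```

With these inputs the LFC overlay works out to roughly 5.8 M ops per frame with about 17 images in flight, and CNV to roughly 113.7 M ops per frame with about 4.6 images in flight.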
Typical HW Architecture for Inference
[Figure: off-chip main memory feeds a single on-chip, maximum-sized compute array of homogeneous processing elements that is reused for all layers, with an on-chip feedback path.]
FINN: Heterogeneous Streaming Architecture
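[Figure: on the FPGA, an image streams through one compute block per layer (Layer 0, Layer 1, ..., Layer N) and comes out as a result. Each block is sized to the BNN topology: a layer with 1M ops gets 1x PE and a layer with 10M ops gets 10x PE for 1x FPS; scaling them to 10x PE and 100x PE gives 10x FPS.]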
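The sizing rule in the figure (more PEs for layers with more ops, so that all layers run at the same frame rate) can be sketched as follows. This is a minimal illustration, not FINN's actual folding logic: `PE_RATE` and both function names are hypothetical, chosen so that one PE working on a 1M-op layer yields 1 FPS.

```python
def allocate_pes(layer_ops, target_fps, ops_per_pe_per_sec):
    """Smallest PE count per layer that sustains target_fps (ceiling division)."""
    return [-(-ops * target_fps // ops_per_pe_per_sec) for ops in layer_ops]


def pipeline_fps(layer_ops, pes, ops_per_pe_per_sec):
    """A streaming pipeline runs at the rate of its slowest layer."""
    return min(p * ops_per_pe_per_sec / ops for ops, p in zip(layer_ops, pes))


# Hypothetical PE speed: 1M ops per second per PE.
PE_RATE = 1_000_000
layer_ops = [1_000_000, 10_000_000]  # the two layers from the figure

pes_1x = allocate_pes(layer_ops, 1, PE_RATE)    # balanced for 1 FPS
pes_10x = allocate_pes(layer_ops, 10, PE_RATE)  # balanced for 10 FPS

print(pes_1x, pipeline_fps(layer_ops, pes_1x, PE_RATE))
print(pes_10x, pipeline_fps(layer_ops, pes_10x, PE_RATE))

# Giving every layer the same number of PEs wastes hardware on the small layer:
print(pipeline_fps(layer_ops, [10, 10], PE_RATE))
```

The last line shows why a homogeneous allocation is inefficient here: with 10 PEs on each layer, the 10M-op layer still limits the pipeline to 1 FPS.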
More gory hardware details?
Whirlwind Tour of Folder Structure
All Jupyter notebooks will be copied to:
/home/xilinx/jupyter_notebooks/bnn
All source code and prebuilt overlays will be copied to:
/opt/python3.6/lib/python3.6/site-packages/bnn/
Part 2: Existing Overlays and Networks
Overlay vs Network?
The Overlay is a...
The Network is a...
The LFC Overlay
Fully binarized, including inputs and outputs
168 kFPS, 0.1 ms latency
Topology:
input: 28x28, binary
fc:1024, threshold
fc:1024, threshold
fc:1024, threshold
fc:64, threshold
output: 64, binary
Available networks:
NIST SD-19: handwritten letters & digits
Fashion-MNIST: clothing
MNIST: handwritten digits
your network! ...
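To illustrate what a fully binarized fc stage followed by a threshold stage computes, here is a minimal software sketch using the common XNOR-popcount formulation. It is not the overlay's hardware implementation: the function names, the bit order, and the tiny 4-input example are assumptions made for illustration.

```python
def pack_bits(values):
    """Pack a list of {-1, +1} values into an int; bit i is 1 where value i is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bits.

    XNOR marks positions where the vectors agree; each agreement adds +1
    and each disagreement adds -1, so the result is 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


def fc_threshold(inputs, weight_rows, thresholds):
    """Binarized fully connected layer followed by per-neuron thresholding."""
    n = len(inputs)
    a_bits = pack_bits(inputs)
    outputs = []
    for row, t in zip(weight_rows, thresholds):
        acc = binary_dot(a_bits, pack_bits(row), n)
        outputs.append(1 if acc >= t else -1)
    return outputs


x = [1, -1, 1, 1]
W = [
    [1, -1, 1, 1],    # identical to x: dot product +4
    [-1, 1, -1, -1],  # opposite of x: dot product -4
    [1, 1, -1, 1],    # two agreements, two disagreements: dot product 0
]
y = fc_threshold(x, W, thresholds=[0, 0, 1])
print(y)
```

Because inputs, weights, and outputs are all single bits, each neuron reduces to an XNOR, a popcount, and a comparison, which is what makes this kind of layer cheap to implement.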
Clone and pip install BNN-PYNQ
From a Jupyter terminal (root by default), run:
pip3.6 install --upgrade git+https://github.com/maltanar/BNN-PYNQ.git
(It is already installed; reinstalling may take a few minutes.)
Note: this is the “workshop version” with some extras. The mainline version is at:
https://github.com/Xilinx/BNN-PYNQ
Hands-on: Minimal MNIST
Let’s go through the following Jupyter notebook:
bnn/minimal_mnist.ipynb
Under the Hood
[Figure: block diagram of the system: Cortex-A9 CPU, FPGA, and DRAM, with the software stack (Jupyter notebook, Python libraries, MLBP).]
Load Overlay Bitstream
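[Figure: system block diagram (Cortex-A9 CPU with Jupyter notebook, Python libraries, and MLBP; DRAM; FPGA). The FPGA now contains the pipeline fc:1024, threshold, fc:1024, threshold, fc:1024, threshold, fc:64, threshold, together with DMA in, DMA out, parameter memory banks, and control/status.]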
Load Network Parameters
[Figure: system block diagram (Cortex-A9 CPU with Jupyter notebook, Python libraries, and MLBP; DRAM; FPGA holding the fc:1024/threshold x3 and fc:64/threshold pipeline with DMA in, DMA out, parameter memory banks, and control/status).]
Resize and Pack Input Images
[Figure: system block diagram (Cortex-A9 CPU with Jupyter notebook, Python libraries, and MLBP; DRAM; FPGA holding the fc:1024/threshold x3 and fc:64/threshold pipeline with DMA in, DMA out, parameter memory banks, and control/status). The N input images pass through resize and pack steps.]
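As a rough picture of what resizing and packing involve for a binary-input overlay, the sketch below downsamples a grayscale image to 28x28, binarizes it, and packs the 784 bits into 64-bit words. It is an assumption-laden illustration: the nearest-neighbour resize, the threshold of 128, the bit order, and the word width are all hypothetical and do not describe the actual BNN-PYNQ input format.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of pixel values."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def binarize(pixels, threshold=128):
    """Map 8-bit grayscale pixels to bits: 1 if pixel >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in pixels]


def pack_bits(bits, word_bits=64):
    """Pack bits into unsigned words, first bit in the least significant position."""
    words = []
    for start in range(0, len(bits), word_bits):
        word = 0
        for offset, bit in enumerate(bits[start:start + word_bits]):
            word |= bit << offset
        words.append(word)
    return words


# Hypothetical 56x56 test image: left half black (0), right half white (255).
image = [[0] * 28 + [255] * 28 for _ in range(56)]

small = resize_nearest(image, 28, 28)
flat = [p for row in small for p in row]   # row-major flattening, 784 pixels
words = pack_bits(binarize(flat))

print(len(flat), "pixels packed into", len(words), "words")
```

Packing matters because each binarized pixel needs only one bit, so a 28x28 image shrinks from 784 bytes to 13 words of 64 bits before it is handed to the accelerator.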
Run Accelerator with N Images
[Figure: system block diagram (Cortex-A9 CPU with Jupyter notebook, Python libraries, and MLBP; DRAM; FPGA holding the fc:1024/threshold x3 and fc:64/threshold pipeline with DMA in, DMA out, parameter memory banks, and control/status). N input images go in and N output vectors come out.]
The CNV Overlay
Binarized except for the first layer (8-bit inputs)
and the last layer (16-bit outputs)
3 kFPS, 1.5 ms latency
Topology:
input: 32x32, 3x8-bit RGB
cnv:3x3:64, threshold
cnv:3x3:64, threshold
maxpool:2x2
cnv:3x3:128, threshold
cnv:3x3:128, threshold
maxpool:2x2
cnv:3x3:256, threshold
cnv:3x3:256, threshold
fc:512, threshold
fc:512, threshold
fc:64
output: 64, 16-bit
Available networks:
CIFAR-10: animals, vehicles..
GTSRB: traffic signs
SVHN: street view house numbers
your network! ...
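To see how a 32x32 input shrinks on its way to the fully connected layers, the sketch below traces feature-map sizes through the convolution and pooling stages. It rests on assumptions the slide does not state: unpadded 3x3 convolutions with stride 1, 2x2 pooling with stride 2, and one maxpool after each of the first two pairs of convolutions.

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output width of a non-overlapping max-pool."""
    return size // window


# Assumed layer order: (kind, output channels); pooling keeps the channel count.
LAYERS = [
    ("cnv", 64), ("cnv", 64), ("maxpool", None),
    ("cnv", 128), ("cnv", 128), ("maxpool", None),
    ("cnv", 256), ("cnv", 256),
]


def trace_shapes(size, channels, layers):
    """Return the (height, width, channels) shape after every stage."""
    shapes = [(size, size, channels)]
    for kind, out_channels in layers:
        if kind == "cnv":
            size, channels = conv_out(size), out_channels
        else:
            size = pool_out(size)
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(32, 3, LAYERS)
for shape in shapes:
    print(shape)

features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("features entering fc:512:", features)
```

Under these assumptions the spatial size goes 32, 30, 28, 14, 12, 10, 5, 3, 1, so the first fc:512 layer would see 256 features.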
Hands-on: CIFAR-10
Let’s go through the following Jupyter notebook:
bnn/Cifar10.ipynb
Part 3: A New Network on an Existing Overlay
Topology
The new network's topology must match the overlay exactly
When training new networks, use lfc.py and cnv.py in bnn/src/training to guarantee the correct topology for the overlays
Input and labels
CNV uses inputs in the range [-1, +1]
LFC uses binarized {-1, +1} inputs and outputs
Remember to rescale inputs, maybe enhance contrast
Use mnist.py and cifar10.py as templates
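The input conventions above can be sketched in a few lines. This is a hedged illustration, not code from mnist.py or cifar10.py: the helper names, the min-max contrast stretch, and the choice of mapping 0 to +1 when binarizing are assumptions.

```python
def stretch_contrast(pixels):
    """Min-max contrast stretch of 8-bit pixels onto the full 0..255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]


def rescale_to_unit_range(pixels):
    """Map 8-bit values 0..255 linearly onto [-1.0, +1.0] (CNV-style input)."""
    return [2.0 * p / 255.0 - 1.0 for p in pixels]


def binarize_bipolar(values):
    """Map real values to {-1, +1} (LFC-style input); 0 maps to +1 here."""
    return [1 if v >= 0 else -1 for v in values]


pixels = [50, 100, 150, 200]          # a low-contrast example
stretched = stretch_contrast(pixels)  # uses the full 8-bit range
scaled = rescale_to_unit_range(stretched)
binary = binarize_bipolar(scaled)

print(stretched)
print(binary)
```

Stretching contrast before binarizing matters most for the LFC overlay, where each pixel is reduced to a single bit and a poorly chosen range can wipe out most of the image.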
Parameter Generation
Once the network is trained, convert the .npz file to packed weights
The procedure is almost identical for networks on the same overlay; for example:
bnn/src/training/mnist-gen-binary-weights.py
bnn/src/training/cifar10-gen-binary-weights.py
Copy the packed weight folder into bnn/params
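One step a conversion like this typically involves is folding batch normalization and the sign activation into a single per-neuron threshold, which is what the "threshold" stages in the overlays consume. The sketch below shows the algebra under stated assumptions; it is not taken from the gen-binary-weights scripts, the function names are hypothetical, and it assumes gamma is nonzero.

```python
import math


def bn_sign_reference(a, gamma, beta, mean, var, eps=1e-4):
    """Reference: batch norm followed by sign, applied to accumulator a."""
    y = gamma * (a - mean) / math.sqrt(var + eps) + beta
    return 1 if y >= 0 else -1


def bn_to_threshold(gamma, beta, mean, var, eps=1e-4):
    """Fold batch norm + sign into (threshold, flip).

    Solving gamma * (a - mean) / std + beta >= 0 for a gives
    a >= mean - beta * std / gamma when gamma > 0; a negative gamma
    reverses the comparison, which is reported through flip.
    """
    std = math.sqrt(var + eps)
    return mean - beta * std / gamma, gamma < 0


def apply_threshold(a, threshold, flip):
    """Thresholding that reproduces bn_sign_reference without any multiply."""
    fires = a <= threshold if flip else a >= threshold
    return 1 if fires else -1


# Hypothetical trained parameters for two neurons (one with negative gamma).
params = [
    {"gamma": 0.5, "beta": 0.25, "mean": 1.5, "var": 4.0},
    {"gamma": -0.5, "beta": 0.25, "mean": 1.5, "var": 4.0},
]

for p in params:
    t, flip = bn_to_threshold(**p)
    agree = all(
        apply_threshold(a, t, flip) == bn_sign_reference(a, **p)
        for a in range(-8, 9)
    )
    print(f"threshold={t:.3f} flip={flip} matches reference: {agree}")
```

After this folding, inference needs only an integer accumulation and a comparison per neuron, so the floating-point batch norm parameters never have to be loaded onto the FPGA.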
Hands-on: Fashion-MNIST on LFC
Let’s go through the following Jupyter notebook:
bnn/new_params_for_overlay.ipynb
Part 4: Making New Overlays
Warning: Here Be Dragons
Where to Get Started?
Study the source code for existing overlays
New Overlay Tips for Current Version
Resources and Further Reading
Thank you for listening!