1 of 12

hls4ml Demo @ DEFCON30

FastML 2022

Ben Hawks et al. for the hls4ml team

@quantized_bits and @hls4ml on Twitter!

2 of 12

Demo #1 - Live Pokémon Inference

Using the Pynq software stack (Python API to interact with & program the FPGA; hosts Jupyter directly on the Pynq-Z2 board)

A live webcam runs inferences via the hls4ml accelerator, outputting to an HDMI display

Class: Pikachu

Confidence: 78.23%

3 of 12

DEFCON 30

DEFCON is one of the world's largest annual technology and information/cybersecurity conferences, and it is open to the public.

25,000 attendees this year!

Not exclusively information/cybersecurity based, though it has a strong emphasis on it and related topics (privacy, cryptography, hardware, etc.)
Contains multiple “Villages” (tracks w/ dedicated demo & booth space) for things such as AI, Quantum Computing, Aerospace, etc.
Very open, inclusive atmosphere, with a large number of technical professionals and hobbyists from all fields/backgrounds attending for professional and personal interests
Lots of “bleeding edge” technology, discussion, and experts present
Large US Govt presence to interact, recruit, and engage with the community

US MIL/DDS, DoD, DHS, NSA, and a number of US National Labs (INL, PNNL, etc.)
Some there “officially” with a booth etc., some attending (personally and professionally)

4 of 12

DEFCON 30 Demo Labs - hls4ml Live Demonstration

We submitted & were selected to present a live demo of hls4ml in the “Demo Labs” portion of DEFCON 30
Showed a live, real-time image classification demo, along with background and how to use the tool to generate FPGA firmware
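
As a rough sketch of what generating FPGA firmware with hls4ml can look like from Python: the snippet below assumes an already trained Keras/QKeras model object and targets the Pynq-Z2 through hls4ml's `VivadoAccelerator` backend. The helper names, the output directory, and the build flags are illustrative assumptions, not the exact settings used for the demo.

```python
def conversion_settings(output_dir="hls4ml_prj"):
    """Keyword arguments for the conversion call (values are assumptions)."""
    return {
        "output_dir": output_dir,        # hypothetical project directory
        "backend": "VivadoAccelerator",  # backend that wraps the NN for Pynq-style boards
        "board": "pynq-z2",
    }


def build_firmware(keras_model, output_dir="hls4ml_prj"):
    """Convert a trained Keras/QKeras model and build a bitfile.

    hls4ml is imported lazily so this file can be imported on a machine
    that does not have hls4ml and the Vivado toolchain installed.
    """
    import hls4ml

    # Derive a per-layer precision/reuse configuration from the model.
    config = hls4ml.utils.config_from_keras_model(keras_model, granularity="name")

    hls_model = hls4ml.converters.convert_from_keras_model(
        keras_model, hls_config=config, **conversion_settings(output_dir)
    )
    hls_model.compile()  # builds a C++ emulation library for bit-accurate checks
    # Runs HLS synthesis, then the Vivado flow that produces a bitstream.
    hls_model.build(csim=False, export=True, bitfile=True)
    return hls_model
```

The build step needs a working Vivado installation and can take a long time, so it is normally run on a workstation rather than on the board.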

5 of 12

DEFCON 30 Demo Labs - hls4ml Live Demonstration

Reception was good! Lots of interest from the attendees (~25-30, a full room), with follow-up questions and good feedback on future directions for the tool, on users' applications and interests, and on the perceived “strengths” of hls4ml

Industrial, IoT applications
Edge/Low Power
Interest in developing a fully open source toolchain (vs use of vendor tools)
Custom/flexible/open solutions are a main strength vs other tools

“Bring your own model” is appealing vs other “black box” (application specific) solutions

Lots of meaningful “chance” encounters waiting in line, at villages, etc. with people from industry & government, leading to demo attendees and follow-up contact post-conference!

6 of 12

Example Model - Image Classification

“RN07” (v0.7):

58,115 parameters

83.5% acc. on CIFAR-10*

(note: removed activations)

This is a 2D Convolutional Neural Network

Originally based on ResNet-8…
…but we removed the residual connections and changed the architecture a bit
Quantized weights, biases, and inputs to 8b (via QKeras)
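
To make the 8b quantization concrete, here is a minimal sketch of rounding values onto a signed 8-bit fixed-point grid. It is a generic illustration written in plain Python, not QKeras's own quantizer, and the split between integer and fractional bits (`integer=0`) is an assumption; the slides only state the 8-bit total width.

```python
import math


def quantize_fixed(x, bits=8, integer=0):
    """Round x to a signed fixed-point grid with `bits` total bits.

    One bit is the sign, `integer` bits sit left of the binary point, and
    the remaining bits are fractional. The default `integer=0` is an
    assumption made for illustration.
    """
    frac_bits = bits - 1 - integer
    step = 2.0 ** -frac_bits          # spacing between representable values
    lowest = -(2.0 ** integer)
    highest = 2.0 ** integer - step
    q = math.floor(x / step + 0.5) * step  # round to the nearest grid point
    return min(max(q, lowest), highest)    # saturate at the representable range


weights = [0.3, -0.7071, 1.5, -2.0]
print([quantize_fixed(w) for w in weights])
```

With these defaults the grid spacing is 2^-7, so values outside roughly [-1, 1) saturate, which is why the weight range matters when picking the integer width.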

Trained to distinguish between 10 classes, originally from CIFAR-10 (32x32 px, 24b RGB images)

…but we also retrained it on Pokémon for this live demo

7 of 12

Dataset - Pokémon

Training dataset: ~23k images for 151 Pokémon

Using a 0.25 train/val split during training
~2,600 images for the 10-class set

Bulbasaur, Charmander, Eevee, Gengar, Jigglypuff, Mewtwo, Onix, Pikachu, Snorlax, Squirtle
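
A small sketch of the train/validation split mentioned above, reading "0.25" as the fraction of images held out for validation (an assumption about the slide's wording). The file names, the fixed seed, and the helper name are placeholders for illustration.

```python
import random


def split_train_val(items, val_fraction=0.25, seed=0):
    """Shuffle items and hold out `val_fraction` of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the ~2,600-image 10-class set.
image_paths = [f"img_{i:04d}.png" for i in range(2600)]
train, val = split_train_val(image_paths)
print(len(train), len(val))
```

For a set this small, a per-class (stratified) split may be preferable so that every Pokémon is represented in validation; the sketch above does not do that.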

https://www.kaggle.com/datasets/unexpectedscepticism/11945-pokemon-from-first-gen
https://www.kaggle.com/datasets/thedagger/pokemon-generation-one
https://www.kaggle.com/datasets/lantian773030/pokemonclassification

Test dataset: ~525 images for 10 classes; Pokémon card images downloaded from online sources, with light processing (cropping)

Code to reproduce here: https://github.com/ben-hawks/pokedex_scraper

Example images of each (test) class

hls4ml tutorial

Aug 13, 2022

8 of 12

Demo Hardware - Pynq Z2

TUL Pynq-Z2 w/ Xilinx Zynq XC7Z020

ARM Cores (PS)

* Run OS (Ubuntu), Network, USB, etc.
* Host Jupyter Server w/ Python Code
* Image Capture & processing

FPGA (PL)

* Perform NN Inference
* Output HDMI
* Image Preprocessing**

AXI DMA

** Capable of accelerating some OpenCV operations, but we ran out of time :)

Zynq XC7Z020 Block Diagram
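
The PS/PL split can be sketched from the Python side roughly as follows: the ARM cores load a bitstream, move one image to the FPGA over AXI DMA, and read the scores back. This is only an assumed outline using the PYNQ `Overlay`, `allocate`, and DMA channel calls. The IP name `axi_dma_0`, the `uint8` buffer types, the class order, and both helper names are assumptions that depend on the actual block design and model.

```python
# Class names from the 10-class Pokémon set; the output order is an assumption.
CLASSES = ["Bulbasaur", "Charmander", "Eevee", "Gengar", "Jigglypuff",
           "Mewtwo", "Onix", "Pikachu", "Snorlax", "Squirtle"]


def top_class(scores):
    """Return the class name with the highest output score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]


def infer_on_fpga(bitfile, pixels):
    """Send one flattened 32x32x3 image to the PL and read back 10 scores.

    Must run on the board itself: pynq is imported lazily because it is
    only available there. "axi_dma_0" is an assumed IP name.
    """
    import numpy as np
    from pynq import Overlay, allocate

    overlay = Overlay(bitfile)  # programs the FPGA (PL) with the bitstream
    dma = overlay.axi_dma_0

    # DMA needs physically contiguous buffers shared between PS and PL.
    in_buf = allocate(shape=(len(pixels),), dtype=np.uint8)
    out_buf = allocate(shape=(len(CLASSES),), dtype=np.uint8)
    in_buf[:] = pixels

    dma.sendchannel.transfer(in_buf)   # PS -> PL
    dma.recvchannel.transfer(out_buf)  # PL -> PS
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return top_class(list(out_buf))
```

In a live loop the overlay and buffers would be created once and reused per frame rather than rebuilt on every call.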

9 of 12

Demo #1 - Image Processing Flow

Image is natively 640x480, 24 bit (3x8b) RGB

Flow diagram stages (each labeled CPU or FPGA): Crop, Resize to 32x32 px (Bilinear), Neural Network, Pred. Class, Image to Display, Add Text (Prediction), HDMI Out

Class: Pikachu
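
The crop and bilinear resize steps can be sketched in plain Python as below. The crop geometry is not given on the slide, so a centered square crop is assumed here, and the function names are illustrative. On the board this would more likely be a library call (for example an OpenCV resize) than hand-written loops.

```python
def center_crop(frame):
    """Crop the largest centered square from a frame (list of pixel rows)."""
    height, width = len(frame), len(frame[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in frame[top:top + side]]


def _source_coord(out_index, in_size, out_size):
    """Map an output pixel center to a clamped input coordinate."""
    pos = (out_index + 0.5) * in_size / out_size - 0.5
    pos = min(max(pos, 0.0), in_size - 1.0)
    low = int(pos)
    high = min(low + 1, in_size - 1)
    return low, high, pos - low


def resize_bilinear(image, out_h=32, out_w=32):
    """Bilinear resize of an RGB image stored as rows of (r, g, b) tuples."""
    in_h, in_w = len(image), len(image[0])
    result = []
    for oy in range(out_h):
        y0, y1, fy = _source_coord(oy, in_h, out_h)
        row = []
        for ox in range(out_w):
            x0, x1, fx = _source_coord(ox, in_w, out_w)
            pixel = []
            for c in range(3):
                # Blend horizontally on the two neighboring rows, then vertically.
                upper = (1 - fx) * image[y0][x0][c] + fx * image[y0][x1][c]
                lower = (1 - fx) * image[y1][x0][c] + fx * image[y1][x1][c]
                pixel.append(int(round((1 - fy) * upper + fy * lower)))
            row.append(tuple(pixel))
        result.append(row)
    return result


# Synthetic 640x480 frame (480 rows of 640 pixels) standing in for a webcam capture.
frame = [[(x % 256, y % 256, 128) for x in range(640)] for y in range(480)]
small = resize_bilinear(center_crop(frame))
print(len(small), len(small[0]))
```

Plain bilinear sampling reads only four source pixels per output pixel, so at this downscaling factor most of the frame is ignored; that keeps the step cheap but can alias on detailed images.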

10 of 12

Demo #2 - Inference Race

Using the Pynq software stack (Python API to interact with & program the FPGA; hosts Jupyter directly on the Pynq)

Run a sample model on the accelerator & an MCU, with a live on-screen display, to demo the speed of the accelerator vs a regular MCU

Board photo labels: DDR3 RAM, Power

ACCELERATOR
Inferences/Second: XX
Progress: 45/100 Images

MICROCONTROLLER
Inferences/Second: XX
Progress: 32/100 Images
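
A minimal sketch of how the inferences/second and progress numbers for such a race could be measured. The two inference functions are hypothetical stand-ins that only burn CPU time; they do not talk to an accelerator or a microcontroller, and the printed rates say nothing about the real hardware.

```python
import time


def inferences_per_second(infer, images):
    """Run `infer` on every image and return (count, inferences per second)."""
    start = time.perf_counter()
    count = 0
    for image in images:
        infer(image)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


# Hypothetical stand-ins for the two back ends.
def fake_accelerator(image):
    return sum(image[:10])


def fake_microcontroller(image):
    return sum(image)


images = [list(range(3072)) for _ in range(100)]  # 100 dummy 32x32x3 inputs
for name, fn in [("ACCELERATOR", fake_accelerator),
                 ("MICROCONTROLLER", fake_microcontroller)]:
    done, rate = inferences_per_second(fn, images)
    print(f"{name}: {rate:.0f} inferences/second, progress {done}/{len(images)} images")
```

For a live display, the same counter and timer would be sampled periodically inside the loop instead of once at the end.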

11 of 12

Demo #2 - Inference Race

https://www.youtube.com/watch?v=YnBcGMjU_bE

12 of 12

Dataset - CIFAR 10

The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images.
The test batch contains exactly 1000 randomly-selected images from each class.
The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another.
Between them, the training batches contain exactly 5000 images from each class.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

Example images of each class

Dataset, images, and text from: https://www.cs.toronto.edu/~kriz/cifar.html
