1 of 12

hls4ml Demo @ DEFCON30

FastML 2022

Ben Hawks et al. for the hls4ml team

@quantized_bits and @hls4ml on Twitter!

2 of 12

Using the Pynq Software stack

(Python API to interact with & program the FPGA; hosts Jupyter directly on the Pynq-Z2 board)

Have a live webcam running inferences via the hls4ml accelerator, outputting to an HDMI display

Demo #1 - Live Pokémon Inference

Class: Pikachu

Confidence 78.23%

“Pokémon” © 1995–2022 Nintendo/Creatures Inc./GAME FREAK inc

3 of 12

DEFCON 30

  • DEFCON is one of the world's largest annual technology and information/cybersecurity conferences, and it is open to the public.
    • 25,000 attendees this year!
  • Not exclusively information/cybersecurity based, though it has a strong emphasis on it and related topics (privacy, cryptography, hardware, etc.)
  • Contains multiple “Villages” (tracks w/ dedicated demo & booth space) for things such as AI, Quantum Computing, Aerospace, etc.
  • Very open, inclusive atmosphere with a large number of technical professionals and hobbyists from all fields/backgrounds, attending for professional and personal interests
  • Lots of “bleeding edge” technology, discussion, and experts present
  • Large US Govt presence to interact, recruit, and engage with the community
    • US MIL/DDS, DoD, DHS, NSA, and a number of US National Labs (INL, PNNL, etc.)
    • Some there “officially” with a booth etc., some attending (personally and professionally)

4 of 12

DEFCON 30 Demo Labs - hls4ml Live Demonstration

  • We submitted a proposal and were selected to present a live demo of hls4ml in the “Demo Labs” portion of DEFCON 30
  • Showed a live, real-time image classification demo, along with background and how to use the tool to generate FPGA firmware

5 of 12

DEFCON 30 Demo Labs - hls4ml Live Demonstration

  • Reception was good! Lots of interest from the attendees (~25-30, a full room), with follow-up questions and good feedback on future directions for the tool, on users' applications and interests, and on the perceived “strengths” of hls4ml:
    • Industrial, IoT applications
    • Edge/Low Power
    • Interest in developing a fully open source toolchain (vs use of vendor tools)
    • Custom/flexible/open solutions are a main strength vs other tools
      • “Bring your own model” is appealing vs other “black box” (application-specific) solutions
  • Lots of meaningful “chance” encounters while waiting in line, at villages, etc. with people from industry & government, leading to more demo attendees and follow-up contact post-conference!

6 of 12

“RN07” (v0.7):

58,115 parameters

83.5% acc. on CIFAR-10*

(note: removed activations)

Example Model - Image Classification

  • This is a 2D Convolutional Neural Network
    • Originally based on ResNet-8…
    • …but we removed the residual connections and changed the architecture a bit
    • Quantized weights, biases, and inputs to 8b (via QKeras)
  • Trained to distinguish between 10 classes, originally from CIFAR-10 (32x32 px, 24b RGB images)
    • Airplane, automobile, bird, cat, etc.
  • …but we also retrained it on Pokémon for this live demo
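
To give a feel for what quantizing weights, biases, and inputs to 8b means, here is a minimal pure-Python sketch that rounds a value onto a signed 8-bit fixed-point grid. The helper name `quantize_fixed` and the choice of zero integer bits are assumptions made for illustration; this is not the QKeras implementation, and the quantizer settings used for the demo model are not stated on the slide.

```python
def quantize_fixed(x, bits=8, integer=0):
    """Round x onto a signed fixed-point grid with `bits` total bits.

    One bit is the sign, `integer` bits sit left of the binary point, and
    the rest are fractional. Out-of-range values are clipped (saturated).
    """
    frac_bits = bits - 1 - integer
    scale = 2 ** frac_bits
    lo = -(2 ** (bits - 1))      # most negative integer code
    hi = 2 ** (bits - 1) - 1     # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


# Hypothetical example weights, not taken from the real model.
weights = [0.3, -0.71, 0.004, 1.5]
quantized = [quantize_fixed(w) for w in weights]
print(quantized)  # [0.296875, -0.7109375, 0.0078125, 0.9921875]
```

Every quantized value is a multiple of 1/128, and 1.5 saturates at the largest representable value. If each of the 58,115 parameters is stored in 8 bits, the whole model needs about 58 kB of parameter storage.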

7 of 12

Dataset - Pokémon

Example images of each (test) class

hls4ml tutorial

Aug 13, 2022

8 of 12

TUL Pynq-Z2 w/ Xilinx Zynq XC7Z020

ARM Cores (PS)

* Run OS (Ubuntu), Network, USB, etc.
* Host Jupyter Server w/ Python Code
* Image Capture & processing

FPGA (PL)

* Perform NN Inference
* Output HDMI
* Image Preprocessing**

AXI DMA

** Capable of accelerating some OpenCV operations, but we ran out of time :)

Demo Hardware - Pynq Z2

Zynq XC7Z020 Block Diagram

9 of 12

Demo #1 - Image Processing Flow

Image is natively 640x480, 24 bit (3x8b) RGB

Diagram blocks: Crop; Resize to 32x32px (Bilinear); Neural Network; Add Text (Prediction); HDMI Out

Other diagram labels: Image to Display; Pred. Class; CPU; FPGA

Class: Pikachu
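
The crop and bilinear-resize steps of this flow can be sketched in plain Python as follows. This is only an illustration under assumptions: the slide does not say which region is cropped (a centered square is assumed here), the function names are hypothetical, and the demo itself may have used a library such as OpenCV rather than hand-written loops.

```python
def center_crop_square(img):
    """Crop a row-major image (list of rows of RGB tuples) to a centered square."""
    h, w = len(img), len(img[0])
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def resize_bilinear(img, out_h=32, out_w=32):
    """Resize with bilinear interpolation, sampling at output pixel centers."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        # Map the output pixel center back into input coordinates, clamped.
        fy = min(max((oy + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for ox in range(out_w):
            fx = min(max((ox + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            pixel = []
            for c in range(3):
                # Blend horizontally on both rows, then vertically.
                upper = img[y0][x0][c] * (1 - wx) + img[y0][x1][c] * wx
                lower = img[y1][x0][c] * (1 - wx) + img[y1][x1][c] * wx
                pixel.append(int(round(upper * (1 - wy) + lower * wy)))
            row.append(tuple(pixel))
        out.append(row)
    return out


# Synthetic uniform-gray 640x480 frame standing in for a webcam capture.
frame = [[(128, 128, 128)] * 640 for _ in range(480)]
cropped = center_crop_square(frame)
small = resize_bilinear(cropped)
print(len(cropped), len(cropped[0]), len(small), len(small[0]))
```

The 640x480 frame is cropped to 480x480 and then reduced to the 32x32 input size the network expects.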

10 of 12

Using the Pynq Software stack

(Python API to interact with & program the FPGA; hosts Jupyter directly on the Pynq)

Run a sample model on the accelerator & MCU with a live representation on screen to demo the speed of the accelerator vs a regular MCU

DDR3 RAM

Power

ACCELERATOR
Inferences/Second: XX
Progress: 45/100 Images

MICROCONTROLLER
Inferences/Second: XX
Progress: 32/100 Images

Demo #2 - Inference Race
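
One way to produce the inferences/second and progress figures for such a race is a small timing harness like the sketch below. The two `fake_*` functions are hypothetical stand-ins for the real accelerator and microcontroller calls, which the slides do not show, so the printed rates say nothing about the actual hardware.

```python
import time


def inferences_per_second(run_inference, images):
    """Run `run_inference` on every image; return (images done, rate in inf/s)."""
    start = time.perf_counter()
    count = 0
    for image in images:
        run_inference(image)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


# Hypothetical stand-ins: real code would call the FPGA accelerator and the MCU.
def fake_accelerator(image):
    return sum(image) % 10


def fake_microcontroller(image):
    time.sleep(0.001)  # pretend the slower device needs about 1 ms more per image
    return sum(image) % 10


images = [[i, i + 1, i + 2] for i in range(100)]
for name, fn in [("ACCELERATOR", fake_accelerator),
                 ("MICROCONTROLLER", fake_microcontroller)]:
    done, rate = inferences_per_second(fn, images)
    print(f"{name}: {rate:.1f} inferences/second, progress {done}/{len(images)} images")
```

In a live display, the same counter would be read periodically while each device works through its 100 images, rather than only once at the end.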

11 of 12

Demo #2 - Inference Race

12 of 12

Dataset - CIFAR 10

  • The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
  • The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
    • The dataset is divided into five training batches and one test batch, each with 10000 images.
    • The test batch contains exactly 1000 randomly-selected images from each class.
    • The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another.
    • Between them, the training batches contain exactly 5000 images from each class.
    • The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
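
The dataset figures above are mutually consistent, which a few lines of arithmetic can confirm. The constant names below are chosen for illustration only; the numbers come from the description above, and the bytes-per-image value assumes 3 channels of 8 bits each.

```python
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_BATCHES, TEST_BATCHES, BATCH_SIZE = 5, 1, 10000

total = NUM_CLASSES * IMAGES_PER_CLASS   # all images
train = TRAIN_BATCHES * BATCH_SIZE       # training images
test = TEST_BATCHES * BATCH_SIZE         # test images
bytes_per_image = 32 * 32 * 3            # 32x32 px, 3 channels of 8 bits

assert total == train + test == 60000
assert train // NUM_CLASSES == 5000      # training images per class
assert test // NUM_CLASSES == 1000       # test images per class
print(total, train, test, bytes_per_image)  # 60000 50000 10000 3072
```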

Example images of each class

Dataset, images, and text from: https://www.cs.toronto.edu/~kriz/cifar.html
