1 of 171

Wakey-Wakey

A Low-Power Reconfigurable Wake Word Accelerator
{eldrick, mjpauly}@stanford.edu

Overview

2 of 171

Wake Word Recognition

Wake words are certain words or phrases that activate a system

“Hey Siri”, “Ok Google”, “Alexa”

Can be a standalone interface or paired with automatic speech recognition systems

Enables:

simple command parsing
isolation of commands
power-saving

“Cooper, go fetch!”

3 of 171

Motivation

Human speech interfaces enable...

Accessible, hands-free interaction
Use of natural speech and language
Many IoT / smart home applications

But face challenges in...

Power-consumption
Latency
Accuracy
Configurability

4 of 171

System Goals

Months-Long Longevity

on AA / AAA / Coin Cell
Nominal Usage (Toggling Lights)

01

02

03

Conversational Robustness

< 1s Response Time
Works in “office noise”

Accuracy & Reconfigurability

Can change wake word
> 90% accuracy on test set

We’re building a wake-word recognition system targeting:

5 of 171

System Architecture

This is our proposal for an accelerator that can achieve those system-level goals.
Let’s walk through it from left to right.
Our system interfaces with a microphone and ADC to capture and digitize audio.
The digital audio signal goes through our three-stage processing pipeline, starting with the Digital Front End, which conditions the output of the ADC for speech processing.
The Acoustic Featurization block then converts the conditioned audio signal into Mel-Frequency Cepstral Coefficients (MFCCs),
which are fed into our DNN accelerator for classification.
If the audio matches someone speaking the wake word, the WAKE pin is asserted high for the length of that window (about 100 ms).
There are two extra features to note. First, the DNN accelerator’s weights can be reconfigured via an I2C interface, shown in red in the top right corner.
Second, we can use an external pin to gate our processing pipeline: many microphones on the market today include a voice activity detection pin that we can optionally use to improve our energy efficiency.
Matthew will talk more about that later.

6 of 171

Signal Processing Pipeline

Digital Front End

Conditions incoming digital audio signal with FIR filter.
Contains gating controller

Acoustic Featurization

Transforms conditioned audio signal into Mel-Frequency Cepstral Coefficients (MFCC), a type of acoustic feature

Word Recognition

Accelerates a 1D CNN network for recognizing the chosen wake word

DFE

ACO

WRD

7 of 171

External I/O

1) I2C to/from Caravel or Arduino
Read/Write Model Parameters

2) I2C / SPI / PDM to ADC + Microphone

Digitized Audio
(VM3011: “Zero-Power Listening” Microphone + ADC)

8 of 171

Software Backend: Edge Impulse

check it out at https://www.edgeimpulse.com/

Targets Embedded Platforms
Full ML Pipeline
Exports as C++ Gold Model

9 of 171

Neural Network Architecture

Estimate for Cortex-M4F @ 80MHz

1347 Parameters

10 of 171

11 of 171

Potential Extensions

Additional NN arch or MFCC parameter configurability

Audio passthrough after wake word detection

01

02

TBD! Drop us a suggestion :)

03

12 of 171

Timeline

Spring Break - Software Modeling (Gold Model), Existing IP Search

Spring Week 1 - WRD RTL + TB, ACO RTL + TB, Initial flow up to Simulation and Compilation

Spring Week 2 - WRD RTL + TB, ACO RTL + TB, Initial flow up to Synthesis, Order ADC/MIC

Spring Week 3 - WRD RTL + TB, ACO Verified, DFE RTL + TB, ADC/MIC Physical Testing

Spring Week 4 - WRD Verified, CFG RTL + TB, DFE Verified, ADC/MIC Verified

Spring Week 5 - CFG Verified, Initial Full System Synthesis, initial flow up to floorplan

Spring Week 6 - Full system verified, full system synthesis, initial flow up to signoff

Spring Week 7 - Floorplanning, power design, clocking and STA, physical design iteration

Spring Week 8 - Physical design iteration, RTL improvements

Spring Week 9 - Pre-sign off checks and improvements

Spring Week 10 - Sign-off, Send to Foundry, Tapeout!

13 of 171

Questions?

{eldrick, mjpauly}@stanford.edu

14 of 171

Motivation

- One slide recap of the motivation for your project, application areas

Your Design

- This is your complete verilog/schematic design

- Show top level block diagram and component level diagrams

- Explain how each component works

Functional Verification

- Describe how you created your gold model

- Show a list of tests you ran and a plot/screenshot showing that each test passed. This includes component level basic unit tests, end to end application tests, tests considering noise, PVT corners, variation etc.

- What further tests will you run in the remaining weeks, such as post layout validation, integration testing with caravel, more applications etc.

Design Space Exploration

Show any experiments you did to make design choices, such as sweeps of component values, parameters etc.

Evaluation

Evaluation metrics include estimates of

- area (must fit in caravel user project area)

- max frequency (with no timing violations), throughput, latency

- power, energy

- circuit specific figures of merit (efficiency, accuracy/error, temperature/voltage sensitivity etc)

For digital designs, post synthesis numbers are okay.

Plan

Plan for the remaining 5 weeks.

We will be happy to review your presentations during office hours!

15 of 171

Wakey-Wakey

A Low-Power Reconfigurable Wake Word Accelerator
{eldrick, mjpauly}@stanford.edu

Design Review 2021-05-03

16 of 171

Motivation

Human speech interfaces enable...

Accessible, hands-free interaction
Use of natural speech and language
Many IoT / smart home applications

But face challenges in...

Power-consumption
Latency
Accuracy

17 of 171

Wake Word Recognition

Wake words are certain words or phrases that activate a system

“Hey Siri”, “Ok Google”, “Alexa”

Can be a standalone interface or paired with automatic speech recognition systems

Enables:

simple command parsing
isolation of commands
power-saving

“Cooper, go fetch!”

18 of 171

System Goals

Months-Long Longevity

on AA Battery
Nominal Usage (Toggling Lights)

01

02

03

Conversational Robustness

< 250 ms Response Time
Works in “office noise”

Accuracy & Reconfigurability

Can change wake word
> 90% accuracy on test set

We’re building a wake-word recognition system targeting:

19 of 171

System Goals - Targets

Metric	Target	Achieved	Notes
Area	< 10 mm²		MPW-TWO User Project Area Constraint
Latency	< 250 ms		Word Utterance to Wake Pin Assert
Freq.	4 MHz		Determined by sampling rate needed for PDM
Inference Energy Efficiency	? pJ/Op
Idle Power Consumption	0.89 mW		6 Months on Alkaline AA Battery (3.9 Wh)
Test Set Accuracy	> 90%		Google Speech Commands Dataset + Microsoft Scalable Noisy Speech Dataset (Yes vs No/Unknown/Noise - 25 mins. per class)
Model Size			Target set by Area & Test Set Accuracy
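
The idle power target follows from the battery budget in the Notes column. A minimal sketch of that arithmetic, assuming an average month of 30.4 days (the constant names are ours, for illustration only):

```python
# Battery-life arithmetic behind the idle power target.
BATTERY_WH = 3.9               # alkaline AA capacity quoted in the table
IDLE_POWER_MW = 0.89           # idle power target from the table
HOURS_PER_MONTH = 30.4 * 24    # assumption: average month length

hours = BATTERY_WH / (IDLE_POWER_MW / 1000.0)   # Wh / W = hours
months = hours / HOURS_PER_MONTH

print(f"{hours:.0f} h, about {months:.1f} months")
```

The result lands at roughly six months, which is where the 0.89 mW figure comes from.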

20 of 171

Dev Notes

cocotb - testbenches in Python
DFFRAM memories
Analog Frontend (ADC, Voice Activity Detector) scrapped due to time limits. Off-chip microphone with voice activity detector used instead
AXI-Stream Interfaces for Datapath

Data / Valid / Last / Ready
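
The Data / Valid / Last / Ready handshake can be sketched as a cycle-by-cycle model in plain Python. This is an illustrative model only, not the project's cocotb testbench; `stream_transfer` and its arguments are hypothetical names, and the source is assumed to always have data available.

```python
from typing import List, Tuple

def stream_transfer(data: List[int],
                    ready_pattern: List[bool]) -> List[Tuple[int, bool]]:
    """Cycle-by-cycle model of a valid/ready stream.

    The source holds `valid` high and keeps each beat stable until the sink
    raises `ready`; `last` is set on the final beat of the packet.
    """
    received = []
    idx = 0
    for ready in ready_pattern:            # one loop iteration per clock cycle
        if idx >= len(data):
            break
        valid = True                       # assumption: source always has data
        last = idx == len(data) - 1
        if valid and ready:                # a beat moves only when both are high
            received.append((data[idx], last))
            idx += 1
    return received

beats = stream_transfer([10, 20, 30],
                        [True, False, True, False, False, True])
print(beats)
```

Because a beat only moves when both sides agree, any stage can stall its upstream neighbor without dropping data.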

cocotb block diagram

Xilinx AXI Stream tutorial - Part 1 (fpgasite.blogspot.com)

21 of 171

System Architecture

22 of 171

Design Progress

RTL Complete and Verified

RTL Work In Progress

Revised schedule at end of presentation
PD flow is simple: all standard cells

Software Gold Models Complete

23 of 171

DFE: The Digital Front-End!

24 of 171

DFE Architecture

Pulse Density Modulation (PDM) Signal

Microphone output: 1b data @ 4MHz

DFE output: 8b data @ 16kHz

Vesper VM3011
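
Going from 1b data @ 4MHz to 8b data @ 16kHz means a decimation factor of 250. A minimal sketch of that conversion, assuming the simplest possible FIR filter (a boxcar sum over each block of 250 bits). The actual DFE filter coefficients are not given in these slides, and `pdm_to_pcm8` is a hypothetical name.

```python
from typing import List

PDM_RATE_HZ = 4_000_000
PCM_RATE_HZ = 16_000
DECIMATION = PDM_RATE_HZ // PCM_RATE_HZ    # 250 PDM bits per output sample

def pdm_to_pcm8(bits: List[int]) -> List[int]:
    """Boxcar-filter and decimate a 1-bit PDM stream to signed 8-bit samples."""
    samples = []
    for start in range(0, len(bits) - DECIMATION + 1, DECIMATION):
        ones = sum(bits[start:start + DECIMATION])   # density of 1s in the block
        centered = 2 * ones - DECIMATION             # range -250 .. +250
        samples.append(centered * 127 // DECIMATION) # range -127 .. +127
    return samples

# 10 ms of "silence": alternating bits have a 50% density of 1s.
silence = [0, 1] * 20_000
pcm = pdm_to_pcm8(silence)
print(len(pcm), pcm[:4])
```

A boxcar is a poor low-pass filter, so a real front end would likely use a better-shaped FIR or a CIC stage; the sketch only shows the rate and bit-width change.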

25 of 171

DFE Model

Software model

PDM microphone
Filtering
Wake word detection accuracy impact: -1.0%

Processing Quality

Audible difference in quality is small
Some higher frequency noise

Original Audio Sample

PDM modeled + filtered

26 of 171

Microphone output: 1b data @ 4MHz

DFE output: 8b data @ 16kHz

Vesper VM3011

DFE Architecture Detail

27 of 171

DFE Verification Plan

RTL work in progress
Verify RTL accuracy to gold model
Capture waveform from mic sample boards, test DFE pipeline on it using FPGAs

28 of 171

DFE: The Digital Front-End!

29 of 171

ACO: The Acoustic Featurizer!

30 of 171

ACO Architecture

Mel-frequency cepstral coefficients (MFCCs)
Collect 1-second of audio features

20ms long frames (50 per sec)
13 coefficients per frame

Mel Scale
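
The frame sizes and the mel mapping can be worked through in a few lines. This sketch assumes the 16 kHz DFE output rate, non-overlapping frames (implied by 50 frames of 20 ms per second), and a widely used Hz-to-mel formula; whether the gold model uses exactly this mel variant is not stated in the slides.

```python
import math

SAMPLE_RATE_HZ = 16_000      # DFE output rate
FRAME_MS = 20                # frame length from the slide
COEFFS_PER_FRAME = 13        # MFCCs per frame from the slide

def hz_to_mel(f_hz: float) -> float:
    """A common Hz-to-mel mapping; several variants are in use."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000        # 320 samples
frames_per_second = 1000 // FRAME_MS                         # 50 frames
features_per_window = frames_per_second * COEFFS_PER_FRAME   # 650 values

print(samples_per_frame, frames_per_second, features_per_window)
print(round(hz_to_mel(1000.0)))
```

One second of audio therefore reaches the classifier as a 50 x 13 feature map.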

31 of 171

ACO Model

Started with SpeechPy MFCCs
Then quantized pipeline to get our bit-accurate gold model
Impact on detection accuracy: -1%
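
To illustrate what quantizing a floating-point pipeline involves, here is a generic symmetric 8-bit scheme. It is an assumption for illustration: the slides do not state the number formats used in the gold model, and `quantize_int8` / `dequantize` are hypothetical helper names.

```python
from typing import List, Tuple

def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, worst_error)
```

The rounding error per value is bounded by half a quantization step, which is the kind of loss that shows up as a small accuracy drop.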

32 of 171

ACO Architecture Detail

33 of 171

ACO Architecture Detail

34 of 171

ACO Architecture Detail

35 of 171

ACO Verification

RTL work in progress
Verify RTL accuracy to gold model
Test with mic sample waveform

36 of 171

ACO: The Acoustic Featurizer!

37 of 171

WRD: The DNN Accelerator!

38 of 171

WRD Model

NN Architecture Exploration using Edge Impulse
Built our own PyTorch/NumPy model for bit accuracy

Google Speech Commands Dataset + Microsoft Scalable Noisy Speech Dataset

(Yes vs No/Unknown/Noise - 25 mins. per class)

Self-Quantized
Google Colab Notebook is publicly available! Train your own wake word :)

https://colab.research.google.com/drive/11s4RKhQOqi4lxJz2K83RSuqHdArnLfA0?usp=sharing

39 of 171

WRD Architecture

3 Layer NN

2 Conv Layers
1 Fully Connected Layer

2 Output Classes

Wake Word
Not Wake Word

Noise
Other Words
Silence

40 of 171

WRD Architecture

41 of 171

WRD Architecture

42 of 171

WRD Input

WRD 1D Convolution

WRD Convolution Layer HW

Cycles through 8 filters to complete

64 of 171

WRD Convolution Layer HW

Cycles through 8 filters to complete

“Recycles” the input MFCCs 8 times

65 of 171

WRD Convolution Layer HW

Cycles through 8 filters to complete

“Recycles” the input MFCCs 8 times

8 Filters * 50 Frames = 400 total cycles
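
The filter-by-filter schedule can be modeled as two nested loops. This is a behavioral sketch, not the RTL: the kernel width of 3, the zero padding at the edges, and one output per cycle are assumptions chosen to match the 400-cycle count on the slide.

```python
from typing import List, Tuple

NUM_FILTERS = 8       # from the slide
NUM_FRAMES = 50       # from the slide
NUM_MFCC = 13         # coefficients per frame (ACO slides)
KERNEL_WIDTH = 3      # assumption: not stated in the slides

def conv1d_time_multiplexed(
    frames: List[List[int]],
    weights: List[List[List[int]]],
) -> Tuple[List[int], int]:
    """weights[f][k][c]: filter f, kernel tap k, input channel c."""
    outputs = []                  # serial output stream, filter-major order
    cycles = 0
    pad = KERNEL_WIDTH // 2
    for f in range(NUM_FILTERS):              # one full pass per filter
        for t in range(NUM_FRAMES):           # "recycle" every input frame
            acc = 0
            for k in range(KERNEL_WIDTH):
                src = t + k - pad
                if 0 <= src < NUM_FRAMES:     # zero padding at the edges
                    acc += sum(w * x for w, x in zip(weights[f][k], frames[src]))
            outputs.append(acc)
            cycles += 1                       # assumption: one output per cycle
    return outputs, cycles

frames = [[1] * NUM_MFCC for _ in range(NUM_FRAMES)]
weights = [[[1] * NUM_MFCC for _ in range(KERNEL_WIDTH)]
           for _ in range(NUM_FILTERS)]
outputs, cycles = conv1d_time_multiplexed(frames, weights)
print(cycles, outputs[:2])
```

Sharing one set of multipliers across all filters trades cycles for area, which suits a design with a large latency margin.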

66 of 171

WRD 1D Convolution

67 of 171

WRD Max Pool

68 of 171

WRD Max Pool

Output of Conv -> Max Pool is serial data

Next conv layer expects 25 frames, each frame with 8 output channels - need to reshape data from serial to parallel
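
The serial-to-parallel reshape can be sketched as follows. Two assumptions are made for illustration: the serial stream is filter-major (all 50 frames of filter 0, then filter 1, and so on), and the pooling window is 2, as implied by 50 frames becoming 25.

```python
from typing import List

NUM_FILTERS = 8
FRAMES_IN = 50
POOL = 2                          # implied by 50 -> 25 frames
FRAMES_OUT = FRAMES_IN // POOL

def maxpool_and_reshape(serial: List[int]) -> List[List[int]]:
    """Turn a filter-major serial stream into 25 frames of 8 channels each."""
    frames = [[0] * NUM_FILTERS for _ in range(FRAMES_OUT)]
    for i in range(0, len(serial), POOL):
        pooled = max(serial[i:i + POOL])      # max over adjacent time steps
        out_index = i // POOL
        channel = out_index // FRAMES_OUT     # which filter produced the value
        frame = out_index % FRAMES_OUT        # which pooled time step it is
        frames[frame][channel] = pooled
    return frames

# Tag each value with its filter (hundreds) and frame (units) to trace it.
serial = [f * 100 + t for f in range(NUM_FILTERS) for t in range(FRAMES_IN)]
frames = maxpool_and_reshape(serial)
print(frames[0])
```

In effect the block is a transpose: values arrive grouped by filter and leave grouped by time step.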

69 of 171

WRD Architecture

78 of 171

WRD Fully Connected Layer

Upon completion, compare 2 class values.

Class 1 > Class 2 = WAKE!

79 of 171

WRD Fully Connected Layer

Sustain WAKE for N cycles
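
The compare-and-hold behavior of the last two slides can be modeled per cycle. This is a behavioral sketch under assumptions: `wake_pin` is a hypothetical name, a new classification result may arrive on any cycle, and a new detection restarts the hold counter.

```python
from typing import List, Optional, Tuple

def wake_pin(results: List[Optional[Tuple[int, int]]],
             hold_cycles: int) -> List[int]:
    """Per-cycle WAKE level; each entry is (wake_score, other_score) or None."""
    remaining = 0
    pin = []
    for result in results:
        if result is not None:
            wake_score, other_score = result
            if wake_score > other_score:      # Class 1 > Class 2 = WAKE
                remaining = hold_cycles       # (re)start the hold counter
        pin.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
    return pin

trace = wake_pin([None, (5, 9), (12, 3), None, None, None, None],
                 hold_cycles=3)
print(trace)
```

Holding the pin for N cycles gives a downstream microcontroller a pulse wide enough to catch without needing an edge-triggered interrupt.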

80 of 171

WRD Detailed Architecture

81 of 171

WRD Detailed Architecture

82 of 171

WRD Architecture

83 of 171

WRD Verification

Significant unit and full WRD testing with random inputs, directed tests, and real audio examples
Direct integration with our Python model using cocotb!

84 of 171

WRD Verification

Significant unit and full WRD testing with random inputs, directed tests, and real audio examples
Direct integration with our Python model using cocotb!

85 of 171

WRD: The DNN Accelerator!

86 of 171

CFG: The DNN Configurator!

87 of 171

CFG Architecture

Wishbone Compliant Interface
Controlled via Caravel SoC
Considered external interfaces (I2C / SPI / UART)
6 Wishbone Addressable 32b Registers

4 Data
1 Address
1 Control

4 Data Words Needed due to Conv 1 Memory Size (13 weights x 8b -> 104b -> 4 32b words)

Load / Store Operation via Control Register (self clearing)
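
The four-word requirement comes from rounding 104 bits up to whole 32-bit words. A minimal packing sketch, assuming weight 0 sits in the lowest byte of data word 0 (the actual byte ordering is not stated in the slides; `pack_row` and `unpack_row` are hypothetical helpers):

```python
from typing import List

WEIGHTS_PER_ROW = 13
BITS_PER_WEIGHT = 8
WORD_BITS = 32
ROW_BITS = WEIGHTS_PER_ROW * BITS_PER_WEIGHT               # 104 bits
WORDS_PER_ROW = (ROW_BITS + WORD_BITS - 1) // WORD_BITS    # ceil(104 / 32) = 4

def pack_row(weights: List[int]) -> List[int]:
    """Pack signed 8-bit weights into 32-bit words, weight 0 in the lowest byte."""
    assert len(weights) == WEIGHTS_PER_ROW
    row = 0
    for i, w in enumerate(weights):
        row |= (w & 0xFF) << (i * BITS_PER_WEIGHT)   # two's-complement byte
    return [(row >> (n * WORD_BITS)) & 0xFFFFFFFF for n in range(WORDS_PER_ROW)]

def unpack_row(words: List[int]) -> List[int]:
    """Inverse of pack_row: recover the 13 signed weights."""
    row = 0
    for n, word in enumerate(words):
        row |= word << (n * WORD_BITS)
    out = []
    for i in range(WEIGHTS_PER_ROW):
        byte = (row >> (i * BITS_PER_WEIGHT)) & 0xFF
        out.append(byte - 256 if byte >= 128 else byte)
    return out

row_weights = [1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7]
words = pack_row(row_weights)
print([hex(w) for w in words])
```

Only the lowest byte of the fourth word carries data; the remaining 24 bits are padding.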

88 of 171

CFG Write Transaction

Write to Conv 1 Filter 0 Weight 0

Write Addr Register [0x3000_0000]

Parameter Address: 0x0000

Write Data Register 0 [0x3000_0008]
Write Data Register 1 [0x3000_000C]
Write Data Register 2 [0x3000_0010]
Write Data Register 3 [0x3000_0014]
Write Ctrl Register [0x3000_0004]

Store Command: 0x1

89 of 171

CFG Read Transaction

Read Conv 1 Filter 0 Weight 0

Write Addr Register [0x3000_0000]

Parameter Address: 0x0000

Write Ctrl Register [0x3000_0004]

Load Command: 0x02

Read Data Register 0 [0x3000_0008]
Read Data Register 1 [0x3000_000C]
Read Data Register 2 [0x3000_0010]
Read Data Register 3 [0x3000_0014]
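
The store and load sequences can be exercised against a software stand-in for the register file. The register addresses and command codes are taken from the slides; `MockCfg`, `store_param`, and `load_param` are hypothetical names, and the mock only approximates the block's behavior (it is not the RTL or the Caravel firmware).

```python
from typing import Dict, List

ADDR_REG = 0x3000_0000
CTRL_REG = 0x3000_0004
DATA_REGS = [0x3000_0008, 0x3000_000C, 0x3000_0010, 0x3000_0014]
CMD_STORE = 0x1
CMD_LOAD = 0x2

class MockCfg:
    """Software stand-in for the CFG block (illustrative only)."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {a: 0 for a in [ADDR_REG, CTRL_REG] + DATA_REGS}
        self.params: Dict[int, List[int]] = {}    # parameter memory

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFFFFFF
        if addr == CTRL_REG:
            param = self.regs[ADDR_REG]
            if value == CMD_STORE:                # data registers -> memory
                self.params[param] = [self.regs[d] for d in DATA_REGS]
            elif value == CMD_LOAD:               # memory -> data registers
                for d, word in zip(DATA_REGS, self.params.get(param, [0] * 4)):
                    self.regs[d] = word
            self.regs[CTRL_REG] = 0               # control register self-clears

    def read(self, addr: int) -> int:
        return self.regs[addr]

def store_param(bus: MockCfg, param_addr: int, words: List[int]) -> None:
    bus.write(ADDR_REG, param_addr)               # 1. select the parameter
    for reg, word in zip(DATA_REGS, words):       # 2. fill the data registers
        bus.write(reg, word)
    bus.write(CTRL_REG, CMD_STORE)                # 3. issue the store command

def load_param(bus: MockCfg, param_addr: int) -> List[int]:
    bus.write(ADDR_REG, param_addr)               # 1. select the parameter
    bus.write(CTRL_REG, CMD_LOAD)                 # 2. issue the load command
    return [bus.read(reg) for reg in DATA_REGS]   # 3. read the data registers

bus = MockCfg()
words = [0xFE02FF01, 0x11223344, 0x55667788, 0x00000007]
store_param(bus, 0x0000, words)
for reg in DATA_REGS:
    bus.write(reg, 0)                             # clear so the load is observable
loaded = load_param(bus, 0x0000)
print([hex(w) for w in loaded])
```

Writing the control register last means the address and data are stable before the command takes effect.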

90 of 171

CFG Model

Synthetic Wishbone transaction testing is ongoing
Requires in-depth integration testing with Caravel and the PicoRV32

91 of 171

CFG: The DNN Configurator!

92 of 171

Let’s Zoom Back Out...

93 of 171

System Goals - Achieved

Metric	Target	Achieved	Notes
Area	< 10 mm²	WRD = 2 mm², Total = 4 mm² (est)	MPW-TWO User Project Area Constraint
Latency	< 250 ms	~ 152 µs	Word Utterance to Wake Pin Assert, ~ 2447 Cycles
Freq.	4 MHz	16 MHz	Determined by sampling rate needed for PDM
Inference Energy Efficiency	? pJ/Op		TODO: GL Sim via PrimeTime
Idle Power Consumption	0.89 mW	WRD = 0.794 mW	6 Months on Alkaline AA Battery (3.9 Wh), OpenSTA Leakage Estimate
Test Set Accuracy	> 90%	96%	Google Speech Commands Dataset + Microsoft Scalable Noisy Speech Dataset (Yes vs No/Unknown/Noise - 25 mins. per class)
Model Size		1,140 Parameters (1,168 Bytes)	Target set by Area & Test Set Accuracy
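
The latency entry can be checked from the cycle count and clock frequency in the table. A short sketch of that arithmetic (the constant names are ours):

```python
# Latency arithmetic from the table: cycles / clock frequency.
CYCLES = 2447                 # cycle count from the Notes column
CLOCK_HZ = 16_000_000         # achieved clock frequency
TARGET_S = 0.250              # latency target

latency_s = CYCLES / CLOCK_HZ
latency_us = latency_s * 1e6
margin = TARGET_S / latency_s  # how many times faster than the target

print(f"{latency_us:.1f} us, {margin:.0f}x under the target")
```

The computed value is just under 153 microseconds, in line with the table's approximate figure and more than three orders of magnitude inside the 250 ms target.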

94 of 171

Microcontroller Comparison

Processing time on a Cortex-M4F @ 80MHz

MFCCs:

DNN:

Power comparison coming

95 of 171

System Goals - Post Route Power for WRD

15mW 4.8mW 0.79mW 20mW

96 of 171

Fri, May 7 - CFG RTL Completed & Verified
Wed, May 12 - DFE RTL Completed & Verified
Fri, May 21 - ACO RTL Completed & Verified
Wed, May 26 - Caravel Integration Complete

CFG Wishbone Transactions Verified
Clock Gating Implementation
Logic Analyzer Probe Integration

Fri, May 28 - GDS Streaming of Full Design
Fri, Jun 4 - Tapeout
???, Dec ? - Post-Silicon Testing and Debug

Compare to microcontroller-only solution
Deploy to various use cases

Upcoming Development Schedule

WRD, post-route!

97 of 171

{eldrick, mjpauly}@stanford.edu

Questions?

98 of 171

Wakey-Wakey

A Low-Power Reconfigurable Wake Word Accelerator
{eldrick, mjpauly}@stanford.edu

Final Presentation 2021-05-25

99 of 171

High Level Overview

100 of 171

Recap: Where we were 3 weeks ago...

RTL Complete and Verified

RTL Work In Progress

Revised schedule at end of presentation
PD flow is simple: all standard cells

Software Gold Models Complete

101 of 171

Now

102 of 171

Our Progress to Date

103 of 171

Architecture Progress To Date

Custom Model Training + Quantization

104 of 171

Architecture Progress To Date

Custom Training + Quantization Pipeline

So Many Block Diagrams!

105 of 171

Architecture Progress To Date

Custom Training + Quantization Pipeline

So Many Block Diagrams!

Microphone Part Selection + Acquisition

106 of 171

Implementation Progress to Date

39 custom verilog modules

107 of 171

Implementation Progress to Date

6,362 lines of custom RTL code

39 custom verilog modules

108 of 171

Verification Progress to Date

System Goals

Metric	Target	Achieved	Notes
Area	< 10 mm²	4.5mm²	MPW-TWO User Project Area Constraint
Latency	< 250 ms	~ 152 µs	Word Utterance to Wake Pin Assert, ~ 2447 Cycles
Freq.	4 MHz	16 MHz	Determined by sampling rate needed for PDM
Inference Energy Efficiency	? pJ/Op		TODO: GL Sim via PrimeTime
Idle Power Consumption	0.89 mW		6 Months on Alkaline AA Battery (3.9 Wh), OpenSTA Leakage Estimate
Test Set Accuracy	> 90%	96%	Google Speech Commands Dataset + Microsoft Scalable Noisy Speech Dataset (Yes vs No/Unknown/Noise - 25 mins. per class)
Model Size		1,140 Parameters (1,168 Bytes)	Target set by Area & Test Set Accuracy

116 of 171

Fri, May 7 - CFG RTL Completed & Verified
Wed, May 12 - DFE RTL Completed & Verified
Fri, May 21 - ACO RTL Completed & Verified
Wed, May 26 - Caravel Integration Complete

CFG Wishbone Transactions Verified
Clock Gating Implementation
Logic Analyzer Probe Integration (DFT)

Fri, May 28 - GDS Streaming of Full Design
Fri, Jun 4 - Tapeout
???, Dec ? - Post-Silicon Testing and Debug

Compare to microcontroller-only solution
Deploy to various use cases

Recap: Development Schedule

117 of 171

{eldrick, mjpauly}@stanford.edu

Questions?

127 of 171

1. Start with a brief recap of what your project was about, show the system block diagram of the chip you taped out, and how it works.

2. Describe your chip bringup process: what software you wrote, tests you ran and how you debugged the board and the chip. Include some pictures of the working board + chip showing signs of life.

3. Present your measured results, and compare them with the results from your pre-tapeout simulations. If you were not able to test your chip fully, describe the partial testing you were able to do, and what issues you are running into.

4. Finally, summarize the key contributions and takeaways of the project, including things you think you should have done differently pre-tapeout to make bringup easier.

128 of 171

Wakey-Wakey

A Low-Power Reconfigurable Wake Word Accelerator
{eldrick, mjpauly}@stanford.edu

Bring-Up Presentation 2022-06-01

129 of 171

Wake Word Recognition

Wake words are certain words or phrases that activate a system

“Hey Siri”, “Ok Google”, “Alexa”

Can be a standalone interface or paired with automatic speech recognition systems

Enables:

simple command parsing
isolation of commands
power-saving

“Cooper, go fetch!”

130 of 171

Motivation

Human speech interfaces enable...

Accessible, hands-free interaction
Use of natural speech and language
Many IoT / smart home applications

But face challenges in...

Power-consumption
Latency
Accuracy
Configurability

131 of 171

System Goals

Months-Long Longevity

on AA / AAA / Coin Cell
Nominal Usage (Toggling Lights)

01

02

03

Conversational Robustness

< 1s Response Time
Works in “office noise”

Accuracy & Reconfigurability

Can change wake word
> 90% accuracy on test set

We’re building a wake-word recognition system targeting:

132 of 171

Hardware <> Software Co-Design

HW Conscious Quantization and Network Design

Fully Custom Verilog Modules

Low Power Microphone HW

133 of 171

High Level Overview

134 of 171

The Bring Up Process - Challenges

Only 1 Copy of Testboard

Received Test Board on Feb 1

Distributed / Remote Bring Up

Reliable GPIO Config

It’s Alive!

139 of 171

CFG Architecture

Wishbone Compliant Interface
Controlled via Caravel SoC
Considered external interfaces (I2C / SPI / UART)
6 Wishbone Addressable 32b Registers

4 Data
1 Address
1 Control

4 Data Words Needed due to Conv 1 Memory Size (13 weights x 8b -> 104b -> 4 32b words)

Load / Store Operation via Control Register (self clearing)

140 of 171

CFG Write Transaction

Write to Conv 1 Filter 0 Weight 0

Write Addr Register [0x3000_0000]

Parameter Address: 0x0000

Write Data Register 0 [0x3000_0008]
Write Data Register 1 [0x3000_000C]
Write Data Register 2 [0x3000_0010]
Write Data Register 3 [0x3000_0014]
Write Ctrl Register [0x3000_0004]

Store Command: 0x1

141 of 171

CFG Read Transaction

Read Conv 1 Filter 0 Weight 0

Write Addr Register [0x3000_0000]

Parameter Address: 0x0000

Write Ctrl Register [0x3000_0004]

Load Command: 0x02

Read Data Register 0 [0x3000_0008]
Read Data Register 1 [0x3000_000C]
Read Data Register 2 [0x3000_0010]
Read Data Register 3 [0x3000_0014]

142 of 171

CFG Model

Synthetic Wishbone transaction testing is ongoing
Requires in-depth integration testing with Caravel and the PicoRV32

143 of 171

It’s Alive!

144 of 171

The Bring Up Process

Vesper Microphone voice activity detection

145 of 171

Current Challenge: MPRJ GPIO Config

We have 2 places where we should be able to verify successful GPIO configuration

VAD input -> triggers PDM clock output
Write WAKE output using our logic analyzer / debug utilities

However, we haven’t been able to configure the IOs such that either of these can be confirmed to work

Latest attempted lead was to use a slower clock, as per the ChipIgnite One Silicon Notes: https://github.com/efabless/caravel_board/blob/main/docs/chipignite_1_silicon_notes.md

Code: https://github.com/eldrickm/caravel_board/blob/main/firmware/wakey/wakey.c

any ideas? 🥺

146 of 171

DFE Architecture

Pulse Density Modulation (PDM) Signal

Microphone output: 1b data @ 4MHz

DFE output: 8b data @ 16kHz

Vesper VM3011

147 of 171

Current Challenge: GPIO Config

Our attempted slow clock workaround:

Used a 5V Arduino Uno, level-shifted through a voltage divider, running a sketch that toggles a GPIO pin at ½ MHz (Eldrick doesn’t have a lot of equipment at his apartment)

Boots but still can’t configure IO - suggests that a slower system clock will not fix our issue
Another board?

148 of 171

Current Challenge: GPIO Config

149 of 171

System Goals - ALL TBD!

Metric	Target	Achieved	Observed In Practice:	Notes
Area	< 10 mm²	4.5mm²	N/A	MPW-TWO User Project Area Constraint
Latency	< 250 ms	~ 152 µs	TBD	Word Utterance to Wake Pin Assert, ~ 2447 Cycles
Freq.	4 MHz	16 MHz	TBD	Determined by sampling rate needed for PDM
Inference Energy Efficiency	? pJ/Op		TBD
Idle Power Consumption	0.89 mW		TBD	6 Months on Alkaline AA Battery (3.9 Wh), OpenSTA Leakage Estimate
Test Set Accuracy	> 90%	96%	TBD	Google Speech Commands Dataset + Microsoft Scalable Noisy Speech Dataset (Yes vs No/Unknown/Noise - 25 mins. per class)
Model Size		1,140 Parameters (1,168 Bytes)	N/A	Target set by Area & Test Set Accuracy

150 of 171

Key Contributions and Takeaways

Big endeavor in hw/sw co-design

Neural network architecture adapted to hw constraints
Fully custom neural network accelerator in Verilog

Experimenting with new tools to make dev experience better (cocotb)
Tried to push through the full open source flow

Takeaway: large gap between open source and proprietary tools

Bringup takeaway: the importance of being able to see inside

Did a pretty good job at making GPIO config testable, but could have made it even easier

Codesign

Hardware

Software

151 of 171

1 Year Rewind: What to Do Differently

Make GPIO config even easier to test

simple input/output loopback

More floor planning to make the layout prettier

152 of 171

Glamour Shots

153 of 171

Glamour Shots

Matthew’s housewarming gift for Eldrick :)

154 of 171

eldrick@alumni.stanford.edu

mjpauly@stanford.edu

Questions?

155 of 171

Wakey-Wakey

156 of 171

157 of 171

158 of 171

159 of 171
