CS-773 Paper Presentation: Hardware-Software Co-Design for Brain-Computer Interfaces
Prakhar Diwan | Misfits (#2)
180100083@iitb.ac.in
Pictures adapted from "Hardware-Software Co-Design for Brain-Computer Interfaces" unless otherwise mentioned
Outline
Introduction
What are BCIs?
Non-invasive BCIs
Invasive BCIs
Advancing research on Brain Functions
Prosthesis Control using Motor Cortex
Treatment of Neurological Diseases
FDA approves treatment via BCIs for:
And this list is growing quickly
BCI Research: A promising field
Relevance for CS 773
Software
Architecture
Ultra low-power multi-accelerator SoC
Background & BCI Tasks
BCI: High-Level View
~1 cm, ~1 cm
~1 mm
Microelectrode: record/stimulate 5-10 neurons
Microelectrode array: record/stimulate hundreds of neurons
8-16 bits/sample at 20-50 kHz
2.4 GHz
Non-rechargeable
Rechargeable
15 years
Wireless, low power
Supported BCI Tasks
HALO
Movement intent
Seizure prediction
Spike detection
Compression
Encryption
Seizure prediction: analysis of neuronal firing patterns; FFTs, cross-correlation, and bandpass filters
Movement intent: neuronal firing pattern indicates use of a limb; stimulate motor cortex
Spike detection: detect spikes on the BCI itself; less data transmitted
Compression: lossless compression; LZ4, LZMA, DWT
Encryption: future-proof; AES, 128-bit
BCI Requirements
Power consumption < 15 mW: safety critical
Sensor data rate ~ 46 Mbps: high processing rate
Response time ~ 5-10 ms: real-time
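The ~46 Mbps sensor data rate can be reproduced with simple arithmetic over channel count, sample width, and sampling rate. The sketch below is illustrative only: the 96-channel array and the specific 16-bit / 30 kHz operating point are assumptions chosen from within the ranges quoted earlier (8-16 bits/sample, 20-50 kHz, hundreds of neurons), not values stated on the slides.

```python
def sensor_data_rate_mbps(channels: int, bits_per_sample: int, sample_rate_hz: int) -> float:
    """Raw sensor data rate in megabits per second (1 Mbps = 1e6 bits/s)."""
    return channels * bits_per_sample * sample_rate_hz / 1e6


# Assumed operating point (hypothetical, not taken from the slides).
ASSUMED_CHANNELS = 96
ASSUMED_BITS_PER_SAMPLE = 16
ASSUMED_SAMPLE_RATE_HZ = 30_000

total_mbps = sensor_data_rate_mbps(
    ASSUMED_CHANNELS, ASSUMED_BITS_PER_SAMPLE, ASSUMED_SAMPLE_RATE_HZ
)

# Per-channel bounds implied by the quoted ranges, in kilobits per second.
per_channel_min_kbps = 8 * 20_000 / 1e3
per_channel_max_kbps = 16 * 50_000 / 1e3

print(f"total: {total_mbps:.2f} Mbps")
print(f"per channel: {per_channel_min_kbps:.0f}-{per_channel_max_kbps:.0f} kbps")
```

Under these assumptions the total lands close to the ~46 Mbps figure, which is why on-implant processing that reduces transmitted data matters for a wireless, low-power device.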
Low Flexibility for High Data BW: current
Why is flexibility a concern for BCIs?
Tradeoff between Data BW and Flexibility
Ideal Scenario
HALO
HW Architecture for LOw Power BCIs
Flexibility
Performance
Power-Efficiency
HALO
Supporting common workloads
Hence, a total of 8 distinct flows
Observations on two ends
Key Observations
Two ideas to meet requirements
HALO Architecture
HALO: Compression (LZ4)
HALO: Compression (LZMA)
HALO: Compression (DWTMA)
HALO: Spike Detection (DWT)
HALO: Spike Detection (NEO)
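For readers unfamiliar with NEO, the nonlinear energy operator is commonly defined as psi[n] = x[n]^2 - x[n-1] * x[n+1], with a spike flagged when psi exceeds a threshold. The sketch below shows that textbook form only. The threshold rule (a multiple of the mean of psi) and the `scale` value are assumptions for illustration and are not taken from the slides.

```python
from typing import List, Sequence


def neo(x: Sequence[float]) -> List[float]:
    """Nonlinear energy operator for interior samples: x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def detect_spikes(x: Sequence[float], scale: float = 8.0) -> List[int]:
    """Return sample indices whose NEO value exceeds scale * mean(NEO).

    `scale` is a hypothetical tuning parameter.
    """
    psi = neo(x)
    if not psi:
        return []
    threshold = scale * sum(psi) / len(psi)
    # psi[i] corresponds to sample index i + 1 because the first sample is skipped.
    return [i + 1 for i, value in enumerate(psi) if value > threshold]


# A flat signal with a single large sample at index 10.
signal = [0.0] * 10 + [5.0] + [0.0] * 10
spike_indices = detect_spikes(signal)
print(spike_indices)
```

Each output needs only three neighbouring samples, so the operator can run sample by sample as data arrives.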
HALO: Movement Intent
HALO: Seizure Prediction
HALO: Encryption (AES)
HW-SW Codesign Techniques Applied
Technique | Direction |
Kernel PE Decomposition | SW -> HW |
PE Reuse Generalization | SW -> HW |
PE Locality Refactoring | HW -> SW |
Spatial Reprogramming | SW -> HW |
Counter Saturation | HW <-> SW |
NoC Route Selection | SW -> HW |
Kernel PE decomposition: Two ends
COARSER GRANULARITY
Monolithic ASICs
Each operator receives its own PE
HALO
Example: Seizure Prediction Task
function seizure_prediction(input):
    fft_out = fft(input, NUM_FFT_POINTS)
    bbf_out = bbf(input, LOW_FREQ, HIGH_FREQ)
    xcorr_out = xcorr(input)
    p1 = svm_prediction(fft_out)
    p2 = svm_prediction(bbf_out)
    p3 = svm_prediction(xcorr_out)
    return ((p1 + p2 + p3) > 0)
Signal-processing kernels -> natural PE boundaries
Benefit: 3x lower power consumption
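The seizure-prediction pseudocode can be turned into a small runnable sketch in which each kernel is its own function, mirroring the idea that signal-processing kernels form natural PE boundaries. Everything numeric here is an assumption for illustration: the 8-point transform, the bin indices used as a pass band, the lag-1 autocorrelation standing in for cross-correlation, and the SVM weights and biases. The kernels are simple stand-ins, not HALO's PEs.

```python
import cmath
import math
from typing import List, Sequence

NUM_FFT_POINTS = 8         # hypothetical; tiny so the naive DFT stays fast
LOW_BIN, HIGH_BIN = 1, 2   # hypothetical pass band, given as DFT bin indices


def fft(x: Sequence[float], n: int) -> List[float]:
    """Magnitude spectrum via a naive O(n^2) DFT (stand-in for the FFT kernel)."""
    padded = (list(x) + [0.0] * n)[:n]
    return [
        abs(sum(padded[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def bbf(x: Sequence[float], low_bin: int, high_bin: int) -> List[float]:
    """Energy inside a band of DFT bins (crude stand-in for the band-pass kernel)."""
    spectrum = fft(x, NUM_FFT_POINTS)
    return [sum(m * m for m in spectrum[low_bin:high_bin + 1])]


def xcorr(x: Sequence[float]) -> List[float]:
    """Lag-1 autocorrelation (stand-in for the cross-correlation kernel)."""
    return [sum(a * b for a, b in zip(x, x[1:]))]


def svm_prediction(features: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Linear decision function: +1 votes 'seizure', -1 votes 'no seizure'."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1


def seizure_prediction(x: Sequence[float]) -> bool:
    # Weights and biases below are made up for the example.
    p1 = svm_prediction(fft(x, NUM_FFT_POINTS), [1.0] * NUM_FFT_POINTS, -10.0)
    p2 = svm_prediction(bbf(x, LOW_BIN, HIGH_BIN), [1.0], -5.0)
    p3 = svm_prediction(xcorr(x), [1.0], -1.0)
    return (p1 + p2 + p3) > 0   # majority vote of three +/-1 predictions


active = [2.0 * math.sin(2 * math.pi * t / 8) for t in range(8)]
quiet = [0.0] * 8
print(seizure_prediction(active), seizure_prediction(quiet))
```

Because each prediction is +1 or -1, the sum of three is never zero, so the final comparison is a majority vote.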
PE Locality Refactoring
Example: LZMA Compression Task
function LZMA_COMPRESSION(input):
    output = list(lzma_header)
    while data = input.get() do
        best_match = find_best_match(data)
        match_prob = count(match_table, best_match) / count_total(match_table)
        r1 = range_encode(match_prob)
        output.push(r1)
        increment_counter(match_table, best_match)
    end while
    return output
Choosing kernel boundaries by data locality works
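One way to read the locality point in the LZMA example: `count`, `count_total`, and `increment_counter` all touch `match_table`, so grouping them in one unit keeps the table local to that unit. The sketch below illustrates only that grouping. The `MatchTable` class is hypothetical, each input symbol is treated as its own best match, range encoding is omitted, and an empty table returns probability 0.0 to avoid division by zero (the pseudocode does not say how that case is handled).

```python
from collections import Counter
from typing import Hashable, Iterable, List


class MatchTable:
    """Hypothetical unit that owns the match counters and their running total."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._total = 0

    def probability(self, match: Hashable) -> float:
        """count(match) / count_total, or 0.0 while the table is still empty."""
        if self._total == 0:
            return 0.0
        return self._counts[match] / self._total

    def increment(self, match: Hashable) -> None:
        """Equivalent of increment_counter(match_table, best_match)."""
        self._counts[match] += 1
        self._total += 1


def match_probabilities(symbols: Iterable[Hashable]) -> List[float]:
    """Probability seen for each symbol just before its counter is updated."""
    table = MatchTable()
    probabilities = []
    for symbol in symbols:
        probabilities.append(table.probability(symbol))
        table.increment(symbol)
    return probabilities


probs = match_probabilities("abab")
print(probs)
```

Only a single probability value crosses the unit's boundary per input, while all counter reads and writes stay inside it.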
PE Reuse Generalization
function seizure_prediction(input):
    fft_out = fft(input, NUM_POINTS=1024)
    …………
    return ……
function movement_intent(input):
    fft_out = fft(input, NUM_POINTS=25)
    …………
    return ……
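The two calls above differ only in the number of points, which suggests a single transform unit parameterized by size rather than one unit per task. A minimal sketch of that idea follows, using a naive DFT so that any length is accepted. The function names `seizure_features` and `movement_features` are hypothetical, and the naive O(n^2) transform is chosen for brevity, not because the slides describe it.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float], num_points: int) -> List[complex]:
    """num_points-point DFT of x, zero-padding or truncating x as needed."""
    padded = (list(x) + [0.0] * num_points)[:num_points]
    # Precompute the twiddle factors once; index (k * t) mod N selects each one.
    twiddle = [cmath.exp(-2j * math.pi * i / num_points) for i in range(num_points)]
    return [
        sum(padded[t] * twiddle[(k * t) % num_points] for t in range(num_points))
        for k in range(num_points)
    ]


def seizure_features(x: Sequence[float]) -> List[complex]:
    return dft(x, num_points=1024)


def movement_features(x: Sequence[float]) -> List[complex]:
    return dft(x, num_points=25)


window = [1.0] * 25
small = movement_features(window)
large = seizure_features(window)
print(len(small), len(large))
```

Note that 25 is not a power of two, so a plain radix-2 FFT would not handle it directly. A reusable unit has to be general enough to cover both sizes.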
Summary
Evaluation
“Lower is better”
HALO stays within the 15 mW power budget, unlike RISC-V and ASICs.
Safe, chronic use is possible.
Impact of HW-SW co-design techniques on XCOR and LZMA workloads
“Lower is better”
For LZMA and DWTMA compression, the optimal block size is chosen
“Higher is better”
LZMA gives a better compression ratio but also consumes more power
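To make "compression ratio" and its dependence on block size concrete, the sketch below compresses a synthetic signal in independent blocks and reports uncompressed size divided by compressed size. It uses Python's standard `lzma` module, which writes an XZ container, so the fixed per-block overhead is larger than a hardware LZMA would likely have. The synthetic signal and the block sizes are assumptions, and the sketch says nothing about power.

```python
import lzma
import math


def compression_ratio(data: bytes, block_size: int) -> float:
    """Uncompressed bytes / compressed bytes, compressing each block independently."""
    compressed_bytes = sum(
        len(lzma.compress(data[start:start + block_size]))
        for start in range(0, len(data), block_size)
    )
    return len(data) / compressed_bytes


# Synthetic, highly periodic 8-bit signal (hypothetical stand-in for neural data).
signal = bytes(int(127 + 100 * math.sin(2 * math.pi * t / 50)) for t in range(8192))

ratio_small_blocks = compression_ratio(signal, 256)
ratio_large_blocks = compression_ratio(signal, 4096)
print(f"256-byte blocks:  {ratio_small_blocks:.1f}x")
print(f"4096-byte blocks: {ratio_large_blocks:.1f}x")
```

Larger blocks give the compressor more history and amortize per-block overhead, at the cost of more buffering, which is the kind of tradeoff a block-size choice has to balance.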
Conclusion
HALO outperforms other devices
Discussion
BACKUP
Spatial reprogramming helps! E.g., XCOR
Block-based processing leads to lethal burst-mode computation
Burst-mode computation is avoided by spatially reprogramming the algorithm to process data as it arrives
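The contrast between block-based and as-it-arrives processing can be sketched for a simple cross-correlation. The block version waits for all samples and then computes every lag in one burst. The streaming version does a small, fixed amount of work per incoming sample pair and reaches the same result. This is a generic illustration with assumed names (`xcorr_block`, `StreamingXcorr`) and an assumed lag convention, not a description of HALO's XCOR PE.

```python
from collections import deque
from typing import List, Sequence


def xcorr_block(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[float]:
    """Block-based: buffer everything, then compute sum of x[t-k]*y[t] for each lag k."""
    return [
        sum(x[t - k] * y[t] for t in range(k, len(y)))
        for k in range(max_lag + 1)
    ]


class StreamingXcorr:
    """Streaming: update one accumulator per lag as each sample pair arrives."""

    def __init__(self, max_lag: int) -> None:
        self.acc = [0.0] * (max_lag + 1)
        # Most recent x samples, newest first; older samples fall off the right.
        self.history: deque = deque(maxlen=max_lag + 1)

    def push(self, x_sample: float, y_sample: float) -> None:
        self.history.appendleft(x_sample)
        for lag, past_x in enumerate(self.history):
            self.acc[lag] += past_x * y_sample


x = [1.0, 2.0, 3.0, 4.0]
y = [4.0, 3.0, 2.0, 1.0]

streaming = StreamingXcorr(max_lag=2)
for xs, ys in zip(x, y):
    streaming.push(xs, ys)

print(xcorr_block(x, y, 2), streaming.acc)
```

The streaming form needs only `max_lag + 1` stored samples and spreads the arithmetic evenly over time instead of concentrating it at the end of a block.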
Detailed task-wise power split across PEs
PE data-locality-based refactoring