WES 237B: Master of Advanced Study in Wireless Embedded Systems
Pat Pannuto
Department of Computer Science and Engineering
University of California, San Diego
Summer Session 2023
WELCOME to WES 237B!
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
2
Designing and Optimizing Embedded Systems (Signal/Image/Video Processing) Applications on SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
3
Designing and Optimizing Embedded Systems (Signal/Image/Video Processing) Applications on SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
4
What is an embedded system?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
5
What is an embedded system?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Properties of embedded systems
What is an embedded system?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Properties of embedded systems
What is an embedded system?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
A domain-specific + high-performance system
What is an embedded system?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
A domain-specific + high-performance system
Achieving high-performance while reducing power consumption is a key, and how to do that?
There is no single answer! Usually, involves multiple factors (tradeoff) latency, throughput, power, cost,...
What is an embedded system?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
A domain-specific + high-performance system
Achieving high-performance while reducing power consumption is a key, and how to do that?
There is no single answer! Usually, involves multiple factors (tradeoff) latency, throughput, power, cost,... What is ”performance” anyway??
How to design an embedded system for this app?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
void matrix_vector_multiply(int in1[SIZE][SIZE], int in2[SIZE], int out[SIZE]) {
for(int i = 0; i < SIZE; i++){
for(int j = 0; j < SIZE; j++){
acc = acc + in1[i][j] * in2[j];
}
out[i] = acc;
}
}
void prefix_sum(int in[SIZE], int out[SIZE]) {
for(int i = 1; i < SIZE; ++i){
out[i] = out[i-1] + in[i];
}
}
int main() {
matrix_vector_multiply(A, B, C);
prefix_sum(C, D);
return 0;
}
0
2
1
6
3
1
2
5
0
2
3
9
12
13
15
20
Embedded systems design 🡪 Trade-offs
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Matrix Vector Multiplication
Prefix sum
CPU
Embedded systems design 🡪 Trade-offs
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Matrix Vector Multiplication
Prefix sum
Parallel
Sequential
FPGA + CPU
Embedded systems design 🡪 Trade-offs
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Matrix Vector Multiplication
Prefix sum
Parallel
Sequential
FPGA + CPU
What if memory is bottleneck?
Embedded systems design 🡪 Trade-offs
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Matrix Vector Multiplication
Prefix sum
Parallel
Sequential
GPU ?
Embedded systems design 🡪 Trade-offs
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Matrix Vector Multiplication
Prefix sum
Parallel
Sequential
GPU ?
Can you re-write the Prefix sum (seq code) to tailor parallel architecture?
Designing and Optimizing Embedded Systems (Signal/Image/Video Processing) Applications on SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
17
Why study signal/image/video processing?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Why study signal/image/video processing?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Designing and Optimizing Embedded Systems (Signal/Image/Video Processing) Applications on SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
20
What is a System-on-chip (SoC)?
System on a Chip (SoC)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
The Zynq Book http://www.zynqbook.com/
System on a Chip (SoC)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
The Zynq Book http://www.zynqbook.com/, http://www.pynq.io/
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
UART
DSP
Memory Controller
DMA
USB
ADC/DAC
CPU
Softcore
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
UART
DSP
Memory Controller
DMA
USB
ADC/DAC
Softcore
CPU
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
UART
DSP
Memory Controller
DMA
USB
ADC/DAC
Softcore
CPU
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
UART
DSP
Memory Controller
DMA
USB
ADC/DAC
Softcore
CPU
Vendors buy ARM IP Cores and develop SoCs
A protocol
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
UART
DSP
Memory Controller
DMA
USB
ADC/DAC
Softcore
CPU
Different protocols…
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
CPU
Co-processor
Peripherals
SoC
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
CPU/GPU
Co-processor (Accelerator Design)
Peripherals
WES 237B
WES 237C
ARM
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Apple A8 (iPhone 6) … Apple A15 (iPhone 13)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
ARM Technology
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
ARM Technology
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
ARM Processor Families
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
ARM Technology
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
ARM Technology
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Specification
e.g., ARMv7
ARM IP core
Cortex-A9
SoC
e.g., Zynq
Hardware Platform
e.g., Pynq, Zedboard
WES237B Hardware
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
WES237B Hardware
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
WES 237B
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
What to expect from WES 237B?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Grading and Grading Policy
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Group Research Presentation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Student Group Presentation Topics (Old list…)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Assignments
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
WES 237B does not teach you C!
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Questions ?
Let’s take a short break
Stretch your legs, get some water, meet your neighbors…
48
Embedded Compute
Processors
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Software
Hardware
ISA
An Instruction Set Architecture is an abstraction of a computational machine
I/O system
Instr. Set Proc.
Compiler
Operating
System
Application
Digital Design
Circuit Design
Instruction Set
Architecture
Computers do not speak English�And they do not speak C or Java or Python or Haskell (or…) either
Compiler
High Level Language
(C, Java, Rust, etc)
Assembly Language
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
int temp = array[index];
array[index] = array[index + 1];
array[index + 1] = temp;
ALUOP[0:3] <= InstReg[9:11] & MASK
Machine Language
1000110001100010000000000000000
1000110011110010000000000000100
1010110011110010000000000000000
1010110001100010000000000000100
Control Signal Spec
Assembler
Machine Interpretation
Specification
Programmer
“Swap two array elements.”
The Instruction Set Architecture
Embedded Compute Units
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Processor | Description | Applications |
Microcontroller | Microcontrollers is a small computer (CPU, memory I/O) | Suitable for application that require small memory and less computation |
DSP | Processors specifically designed for signal processing Repetitive numeric computations High memory bandwidth (arrays) Real-time processing DSP objectives Power Development time Throughput (Processing) | Audio Coding/Decoding Video/Image/ Computer Vision Convolution Communications FIR, FFT |
GPU | Graphics processors (GPGPU) | General (embedded 🡪 data center) |
FPGA | Programmable Custom Hardware | General (embedded 🡪 data center) |
Design tradeoffs…
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
CPU
DSP
GPU
FPGA
ASIC
Efficiency, performance (given workload..)
Generality, flexibility
CPU (Latency) vs. GPU (Throughput)
A field-programmable gate array (FPGA)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
What is an FPGA?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
A field-programmable gate array (FPGA)
What is an FPGA?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
A field-programmable gate array (FPGA)
FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Why FPGAs?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
A field-programmable gate array (FPGA)
Vision (image) 🡪 High performance and low (er) power
Motivation: Applications of FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Data centers[1]
[1] Andrew Putnam et al. “ A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services,”, ISCA, 2014
Motivation: Applications of FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Data centers[1]
Database[2]
[1] Andrew Putnam et al. “ A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services,”, ISCA, 2014
[2] Casper et al. "Hardware acceleration of database operations", FPGA''14
Motivation: Applications of FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Data centers[1]
Database[2]
CNN [3]
[4] Rouhani et al. "SSketch: An Automated Framework for Streaming Sketch-based Analysis of Big Data on FPGA.", FCCM 2015
[1] Andrew Putnam et al. “ A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services,”, ISCA, 2014
[2] Casper et al. "Hardware acceleration of database operations", FPGA''14
[3] Zhang et al. "Optimizing fpga-based accelerator design for deep convolutional neural networks.", FPGA 2015
Big Data[4]
Motivation: Applications of FPGA
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Data centers[1]
Database[2]
Big Data[4]
[4] Rouhani et al. "SSketch: An Automated Framework for Streaming Sketch-based Analysis of Big Data on FPGA.", FCCM 2015
[1] Andrew Putnam et al. “ A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services,”, ISCA, 2014
[2] Casper et al. "Hardware acceleration of database operations", FPGA''14
[3] Zhang et al. "Optimizing fpga-based accelerator design for deep convolutional neural networks.", FPGA 2015
CNN [3]
Applications
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Designing an FPGA?
You will learn in WES 237C!�
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Designing an FPGA ?
VHDL,
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Designing an FPGA?
VHDL,
Verilog,
SystemVerilog,
….
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
VHDL???
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
High-Level Synthesis: A way to program FPGA from C
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Magic
VHDL (Verilog)
High-Level Synthesis
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Directives
High-Level Synthesis
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Directives
HLS
High-Level Synthesis
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Directives
VHDL (Verilog)
HLS
High-Level Synthesis
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Directives
VHDL (Verilog)
HLS
void HelloWorld(
int A[8],
int B[8]) {
#pragma unroll
for(int i=0;i<8;i++){
B[i]=A[i]*3;
}
}
High-Level Synthesis
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Directives
VHDL (Verilog)
HLS
void HelloWorld(
int A[8],
int B[8]) {
#pragma unroll
for(int i=0;i<8;i++){
B[i]=A[i]*3;
}
}
High-Level Synthesis
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
C/C++
Directives
VHDL (Verilog)
Vivado HLS
Vivado
IP
Bitstream
void HelloWorld(
int A[8],
int B[8]) {
#pragma unroll
for(int i=0;i<8;i++){
B[i]=A[i]*3;
}
}
Summary of Embedded Systems and Trends
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
System on a Chip (SoC): Zynq
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
System on a Chip (SoC)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
System on a Chip (SoC): Zynq
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
PL Special Resources: BRAM & DSP48E1
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
PS and PL interface
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
SoC Design Flow
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Defining Requirements
Defining Specification
Software/ Hardware Partition
Software Development & Testing
Hardware Development & Testing
System Integration & Testing
Application
Let’s take a short break
Stretch your legs, get some water, meet your neighbors…
87
Compression: Huffman Encoding
Compression
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Encoder
Decoder
Huffman Encoding
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Data Compression: Huffman Encoding
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Histogram
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Input:
AABACDCDDDDEEEEEF
Histogram
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
A | 3 |
B | 1 |
C | 2 |
D | 5 |
E | 5 |
F | 1 |
Input:
AABACDCDDDDEEEEEF
Histogram
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
A | 3 |
B | 1 |
C | 2 |
D | 5 |
E | 5 |
F | 1 |
Input:
AABACDCDDDDEEEEEF
Huffman Key Idea: Use the fewest number of bits to encode the most common symbols
Sorting by frequencies
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Radix Sort
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
1 | 2 | 3 |
0 | 0 | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
1 | 1 | 1 |
0 | 0 | 2 |
1 | 2 | 3 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 2 | 3 |
| | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
Radix Sort
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
1 | 2 | 3 |
0 | 0 | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
1 | 1 | 1 |
0 | 0 | 2 |
1 | 2 | 3 |
9 | 9 | 9 |
6 | 0 | 9 |
6 | 0 | 9 |
0 | 0 | 2 |
1 | 1 | 1 |
1 | 2 | 3 |
9 | 9 | 9 |
1 | 2 | 3 |
| | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
Radix Sort
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
1 | 2 | 3 |
0 | 0 | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
1 | 1 | 1 |
0 | 0 | 2 |
1 | 2 | 3 |
9 | 9 | 9 |
6 | 0 | 9 |
6 | 0 | 9 |
0 | 0 | 2 |
1 | 1 | 1 |
1 | 2 | 3 |
9 | 9 | 9 |
0 | 0 | 2 |
1 | 1 | 1 |
1 | 2 | 3 |
6 | 0 | 9 |
9 | 9 | 9 |
1 | 2 | 3 |
| | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
Radix Sort
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Stage 1
Stage 2
Stage 3
1 | 1 | 1 |
0 | 0 | 2 |
1 | 2 | 3 |
9 | 9 | 9 |
6 | 0 | 9 |
6 | 0 | 9 |
0 | 0 | 2 |
1 | 1 | 1 |
1 | 2 | 3 |
9 | 9 | 9 |
0 | 0 | 2 |
1 | 1 | 1 |
1 | 2 | 3 |
6 | 0 | 9 |
9 | 9 | 9 |
1 | 2 | 3 |
0 | 0 | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
1 | 2 | 3 |
| | 2 |
9 | 9 | 9 |
6 | 0 | 9 |
1 | 1 | 1 |
Histogram
Prefixsum
H
P
H
Counting Sort
CS
CS
CS
Radix Sort
RS
RS
New Radix Sort
There are a number of ways of designing parallel algorithms.
Counting Sort
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
[0, 3, 2, 2, 3, 2, 0, 3, 2, 1]
(C0, C1, C2, C3) = [2, 1, 4, 3]
(O0, O1, O2, O3) = (0, 2, 3, 7)
Calculate occurrence of each number. C0 is the count of 0
We take the prefix sum of the counts transforms it to an offset table.
[0, 0, 1, 2, 2, 2, 2, 3, 3, 3]
Input array
Re-order (copy out) step where values are
Copied into the right location using prefix table.
Counting sort/Radix sort are (parallel) architecture friendly algorithms
🡺 Suitable for parallel architectures such as GPU (including multi core parallelization)
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Create Tree
Sort
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
0
0
0
1
1
1
D
E
2
17
F
A
1
0
1
B
C
4
7
10
0
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
Huffman’s Algorithm:
Caveat: Huffman is underspecified!
(Indeed, the ‘real’ algorithm only says make-and-merge trees)
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
B
1
n1
2
F
1
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
| |
| |
| |
| |
n1
2
B
1
F
1
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
| |
| |
| |
| |
n1
2
B
1
F
1
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
| |
| |
| |
| |
B
1
n1
2
F
1
n2
4
C
2
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
n2 | 4 |
| |
| |
| |
n1
2
n2
4
B
1
F
1
C
2
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
n2 | 4 |
| |
| |
| |
B
1
n1
2
F
1
n2
4
C
2
A
3
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
n2 | 4 |
n3 | 7 |
| |
| |
n1
2
n2
4
n3
7
B
1
F
1
C
2
A
3
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
n2 | 4 |
n3 | 7 |
n4 | 10 |
| |
B
1
n1
2
F
1
n2
4
C
2
n3
7
A
3
E
5
n4
10
D
5
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
n1 | 2 |
n2 | 4 |
n3 | 7 |
n4 | 10 |
n5 | 17 |
B
1
n1
2
F
1
n2
4
C
2
n3
7
A
3
E
5
n4
10
D
5
n5
17
Huffman’s Algorithm:
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
Create Tree
Sort
B
1
n1
2
F
1
n2
4
C
2
n3
7
A
3
E
5
n4
10
D
5
n5
17
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
Create Tree
Sort
B
1
n1
2
F
1
n2
4
C
2
n3
7
A
3
E
5
n4
10
D
5
n5
17
0
0
0
0
0
1
1
1
1
Huffman Tree Creation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
Create Tree
Sort
B
1
n1
2
F
1
n2
4
C
2
n3
7
A
3
E
5
n4
10
D
5
n5
17
0
0
0
0
0
1
1
1
1
Symbol | Code |
A | 00 |
F | 0100 |
B | 0101 |
C | 011 |
D | 10 |
E | 11 |
1
Huffman Tree Creation is Sequential Process
🡺 Suitable for CPU architecture.
Huffman Trees Must Save Chosen Encoding!
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Caveat: Huffman is underspecified!
(Indeed, the ‘real’ algorithm only says make-and-merge trees)
Text compression with Huffman Encoding
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Encoder
Decoder
Huffman Encoding
Codebook
Encoded Data
Canonical Huffman Encoding
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
Encoder
Decoder
Bit
Length
Parallel Bit Length Calculation
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
S | F |
F | 1 |
B | 1 |
C | 2 |
A | 3 |
D | 5 |
E | 5 |
Sort
Compute Bit Len
2
4
4
3
2
2
Create Tree
1 | 2 | 4 | 4 | 0 |
F | n1 | A | D | n3 |
B | C | n2 | E | n4 |
0 | 1 | 2 | 3 | 4 |
Parent Address
Left
Right
Address
B
1
n1
2
F
1
n2
4
C
2
n3
7
A
3
E
5
n4
10
D
5
n5
17
Software Optimization
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
119
Optimizations
Optimize 🡺 Performance
Measuring Performance
The bottom line: Performance
Ferrari
Bus
Speed
160 mph
65 mph
Time to Bay Area
3.1 hours
7.7 hours
Passengers
2
60
Throughput (pmph)
320
3900
Measures of “Performance”
Why Study Computer Performance?
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
125
Performance is Affected by,…
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
126
Performance is Affected by,…
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
127
Performance is Affected by,…
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
128
Performance is Affected by,…
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
129
Principles of Measuring Performance
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
130
Performance Metrics
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
131
There are many ways to measure program execution time
$ time make # cargo build
Compiling hail v0.1.0 (/tock/boards/hail)
Finished release [optimized + debuginfo] target(s) in 19.96s
real 0m21.146s
user 0m30.388s
sys 0m2.032s
Execution Time
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
133
How architects define Performance
PerformanceX =
1
Execution TimeX
, for program X
Relative Performance
A runs in 12 seconds
B runs in 20 seconds
Relative Performance (Speedup), the Definition
PerformanceX
Execution TimeX
PerformanceY
Speedup
Execution TimeY
=
=
=
n
(X/Y)
Example
What is Time?
CPU Execution Time = CPU clock cycles * Clock cycle time
Cycle Time
MHz = millions of cycles/second, GHz = billions of cycles/second
X MHz = 1000/X nanoseconds cycle time
Y GHz = 1/Y nanoseconds cycle time
CPU time
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
139
How many clock cycles?
Number of CPU clock cycles =
[Instruction count] * [Average Clock Cycles per Instruction (CPI)]
Exercise:
Computer A runs program C in 3.6 billion cycles.
Program C requires 2 billion dynamic instructions.
What is the CPI?
Clock Cycles Per Instruction (CPI)
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
141
The CPI of ith class
The number of instructions of ith class
Putting it all together
CPU Execution Time = [CPU clock cycles] * [Clock cycle time]
CPU clock cycles = [Instruction count] * [Average Clock Cycles per Instruction (CPI)]
CPU Execution Time
Instruction Count
CPI
Clock Cycle Time
=
X
X
instructions
cycles/instruction
seconds/cycle
seconds
CPU Execution Time
= NI*CPI*Clock Period
CC BY-NC-ND Pat Pannuto – Many slides adapted from Janarbek Matai
143
Cycle Time/Clock Rate is no longer fixed
Who Affects Performance? How?
CPU Execution Time
Instruction Count
CPI
Clock Cycle Time
=
X
X
Which programs are best, are “most fair”, to run when measuring performance?
Amdahl’s Law
Execution time
after improvement
=
Execution Time Affected
Amount of Improvement
Execution Time Unaffected
+
Amdahl’s Law and Massive Parallelism
.9
.1
Amdahl’s Law and Massive Parallelism
.9
.1
.1
.45
Speedup
1.0
1/.55 = 1.82
Amdahl’s Law and Massive Parallelism
.9
.1
.1
.1
.45
.225
Speedup
1.0
1/.55 = 1.82
1/.325 = 3.07
Amdahl’s Law and Massive Parallelism
.9
.1
.1
.1
.1
.45
.225
Speedup
1.0
1/.55 = 1.82
1/.325 = 3.07
< 10
Key Points