2022 IEEE International Conference on Artificial Intelligence Circuits and Systems
Virtual & Hybrid Conference
Optimizing Accelerator Configurability for Mobile Transformer Networks
Steven Colleman*, Peter Zhu**, Wei Sun**, Marian Verhelst*
(*) Dept. of Electrical Engineering, MICAS-ESAT, KU Leuven, Belgium
(**) OPPO Electronics
2
Outline
4
What is SU & TU?
[Figure: accelerator block diagram with I mem, PE array, O mem, and FSM]
5
Impact of SU: spatial utilization
[Figure: C × K workload mapped onto the PE array in four passes: (C 1-40, K 1-40), (C 1-40, K 41-64), (C 41-64, K 1-40), (C 41-64, K 41-64)]
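The tile boundaries in this figure (1-40 and 41-64 for both C and K) suggest a layer with C = K = 64 running on an array that unrolls 40 PEs along each of C and K. Those sizes are inferred from the labels, not stated on the slide. Under that assumption, a minimal sketch of the resulting spatial utilization (the function name and dictionary layout are hypothetical):

```python
from math import ceil

def spatial_utilization(layer_dims, array_dims):
    """Fraction of offered PE slots that do useful work over all passes.

    layer_dims: loop name -> loop size in the layer (e.g. {"C": 64}).
    array_dims: loop name -> number of PEs unrolled along that loop.
    """
    useful = 1    # work the layer actually needs along the unrolled loops
    offered = 1   # PE slots offered once every pass has been executed
    for dim, unroll in array_dims.items():
        size = layer_dims.get(dim, 1)   # a loop absent from the layer has size 1
        passes = ceil(size / unroll)    # the last pass may be partially filled
        useful *= size
        offered *= passes * unroll
    return useful / offered

# Assumed sizes: C = K = 64 on a 40 x 40 (C x K) array -> four passes.
util = spatial_utilization({"C": 64, "K": 64}, {"C": 40, "K": 40})
print(f"{util:.2f}")  # 0.64
```

Only the first pass fills the array completely; the other three leave rows, columns, or both idle, which is why utilization drops to 64%. A loop that is unrolled in the array but absent from the layer is treated as size 1, so it wastes almost the entire dimension.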
9
Impact of SU: spatial utilization, illustration
Layer | Spatial utilization, first SU | Spatial utilization, second SU |
ResNet101 layer (K = 256, C = 1024, Ox = Oy = 14) | 96.2% (C, K present in layer) | 0.7% (G, Fx, Fy absent from layer) |
MobileNetv2 layer (G = 32, Fx = Fy = 3, Ox = Oy = 112) | 0.7% (C, K absent from layer) | 100% (G, Fx, Fy present in layer) |
10
Impact of TU: temporal utilization
Required memory BW → equations in the paper
[Figure: W mem, I mem, and O mem feeding the PE array]
Are the BWs large enough to provide data every clock cycle?
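The question above can be illustrated with a deliberately simplified stall model. This is an assumption for illustration only and not the equations from the paper: the array is taken to run at the pace of the most bandwidth-starved memory. All bandwidth values below are hypothetical.

```python
def temporal_utilization(required_bw, available_bw):
    """Simplified stall model (an assumption, not the paper's equations).

    required_bw / available_bw: memory name -> bits per clock cycle.
    Returns 1.0 when every memory keeps up, otherwise the worst ratio.
    """
    ratios = [
        min(1.0, available_bw[mem] / required_bw[mem])
        for mem in required_bw
        if required_bw[mem] > 0
    ]
    return min(ratios, default=1.0)

# Hypothetical numbers: the I memory delivers only half of what is needed.
required = {"W": 1024, "I": 4096, "O": 512}
available = {"W": 2048, "I": 2048, "O": 2048}
tu = temporal_utilization(required, available)
print(tu)  # 0.5
```

In this sketch a single undersized memory port is enough to halve the temporal utilization, even though the W and O memories have bandwidth to spare.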
11
Problem statement
13
Used framework
to derive optimal set of SUs
(*): S. Colleman, M. Shi, and M. Verhelst, “Hyper-flexible single core CNN execution,” arXiv.
14
Two extensions
network
16
Used hardware architecture
bigger PE arrays?
1 memory level
17
Results
combining 2 SUs (flexible PE array), for all BWs
Why? The more PEs there are, the harder it is to keep them all busy.
PEs | BW I/O memory [bits] | | Optimal individual SU |
256 | 128 | 1.557 | |
256 | 512 | 1.409 | |
256 | 2048 | 1.415 | |
8192 | 2048 | 4.124 | |
8192 | 8192 | 3.839 | |
19
Used hardware architecture
their impact
Architecture | BW W | BW I | BW O |
Architecture 1 | 2048 | 2048 | 2048 |
Architecture 2 | 4096 | 1024 | 1024 |
Architecture 3 | 1024 | 4096 | 1024 |
Architecture 4 | 1024 | 1024 | 4096 |
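A quick check on the table values shows that the four architectures split the same total bandwidth differently across the W, I, and O memories. The dictionary layout below is just one convenient encoding:

```python
# Bandwidths per memory, copied from the table above.
architectures = {
    "Architecture 1": {"W": 2048, "I": 2048, "O": 2048},
    "Architecture 2": {"W": 4096, "I": 1024, "O": 1024},
    "Architecture 3": {"W": 1024, "I": 4096, "O": 1024},
    "Architecture 4": {"W": 1024, "I": 1024, "O": 4096},
}

# Sum the three memory bandwidths of each architecture.
totals = {name: sum(bw.values()) for name, bw in architectures.items()}
for name, total in totals.items():
    print(name, total)  # every architecture totals 6144
```

Because the total is constant, any difference between the architectures comes from how the bandwidth is distributed, not from how much of it there is.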
20
Results & selected SUs
EDP (energy-delay product) is lowest when all BWs are equal, which makes sense → more options
Gain is highest when BW I is highest, but all gains are lower than with 1 shared memory level
For big PE array size
1 SU:
Combination of 2 SUs:
23
Results
SU | | | | | | non-DW | DW |
1 SU | 1 | 1 | 16 | 4 | 128 | For C | For Fx |
2 SUs, nr 1 | 1 | 64 | 2 | 1 | 64 | For K | For Fx |
2 SUs, nr 2 | 1 | 2 | 4 | 16 | 64 | For C | For Ox |
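The five unlabeled numeric columns are not named in the source. Assuming they are per-loop unrolling factors of each SU, their product should equal the number of PEs in the array. A short check of that reading:

```python
from math import prod

# Values of the five unlabeled columns, copied row by row from the table.
# Assumption: each value is the unrolling factor of one loop dimension.
su_factors = {
    "1 SU": [1, 1, 16, 4, 128],
    "2 SUs, nr 1": [1, 64, 2, 1, 64],
    "2 SUs, nr 2": [1, 2, 4, 16, 64],
}

pe_counts = {name: prod(factors) for name, factors in su_factors.items()}
for name, count in pe_counts.items():
    print(name, count)  # each row multiplies to 8192
```

All three rows multiply to 8192, which matches the larger PE array size listed in the earlier results table and is consistent with the unrolling-factor interpretation.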
25
Results