CS-773 Paper Presentation
REDUCT: Keep it Close, Keep it Cool! Efficient Scaling of DNN Inference on Multi-core CPUs with Near-Cache Compute
Pranshu S Negi, ArchMages (#0)
180050076@iitb.ac.in
Contents
Motivation
CPU
GPU
DSA
FPGA
Motivation
Characterization of DNN Primitives
Convolution
Characterization of DNN Primitives
Inner-product
Performance Characterization
REDUCT: Overview
REDUCT Support Extensions (rSX) ISA
New set of rSX instructions
Kernel instructions decoded and allocated new TFU Code Registers
REDUCT Support Extensions (rSX)
Example:
parallel for:
        Loadrsx Weight R1 <- [A1 + Δ1]
        Loadrsx Weight R2 <- [A2 + Δ1]
        Loadrsx Weight R3 <- [A3 + Δ1]
        Loadrsx Weight R4 <- [A4 + Δ1]
        Loadrsx Input  R5 <- [A5 + Δ2 + Δ3]
        MACrsx R{6 + δ + θ} += R1, R5
        MACrsx R{7 + δ + θ} += R2, R5
        MACrsx R{8 + δ + θ} += R3, R5
        MACrsx R{9 + δ + θ} += R4, R5
      Loop
    Loop
end parallel for
Store Outputs (R6, R7 … )
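The loop nest above can be sketched in plain Python to show the dataflow it expresses: each inner iteration streams four weight elements and one shared input element, and the MACrsx instructions accumulate four partial sums in place until the final Store. This is an illustrative sketch only; the names (weights, inputs, ROWS, COLS) and sizes are assumptions, not from the paper.

```python
# Illustrative sketch of the rSX kernel's dataflow (names/sizes assumed).
ROWS = 4                      # weight rows per iteration (R1-R4)
COLS = 8                      # reduction (inner-product) length

# Toy data: weights[r][c] = (r+1)*(c+1), inputs[c] = (c+1).
weights = [[(r + 1) * (c + 1) for c in range(COLS)] for r in range(ROWS)]
inputs = [c + 1 for c in range(COLS)]

acc = [0] * ROWS              # accumulator "registers" (R6, R7, ...)
for c in range(COLS):         # the inner Loop over the reduction dimension
    x = inputs[c]             # Loadrsx Input: one element reused ROWS times
    for r in range(ROWS):     # the four unrolled MACrsx instructions
        acc[r] += weights[r][c] * x   # multiply-accumulate into acc[r]

# acc now holds one dot product per weight row (the Store Outputs step)
```

The point of the pattern is input reuse: one loaded input element feeds four multiply-accumulates, which is what lets the TFUs sustain MAC throughput without proportionally more loads.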
REDUCT Support Extensions (rSX)
Tensor Functional Units
Leveraging SMT
Micro-Architectural Support
Micro-Architectural Support (contd.)
Programming Model Support
Simulation Parameters
REDUCT Configurations
Convolution
Inner Product
Performance and Power
DNN Inference Summary
Simple ISA enhancements
Near-Cache Compute
REDUCT
Better Performance/Watt
Thank you for listening!
Any Questions?