1 of 5

[Figure: GEMM and mGEMM shown as sequences of multiply-accumulate (MAC) steps. Panel titles: (B) K=0; (C) K=1; (E) Cin=0, Wf=0; (F) Cin=0, Wf=1; (G) Cin=0, Wf=2; (H) Cin=1, Wf=0; (F) Cin=1, Wf=1; (G) Cin=1, Wf=2.]

2 of 5

[Figure: Convolution computed with (A) GEMM and (B) mGEMM. Tensors: Input, Filter, Output. Dimension labels: Cin, co, wf, wo. Sub-panels: (A), (B), (C), (D).]

[Figure: (a) Im2col + GEMM and (b) GEMM Plus. Tensors: Input, Filter, Output. Dimension labels: m, n, k for the GEMM operands; ci, co, wf, and an input width of wo + wf - 1. The input is marked "Duplication & Shift".]

[Figure: Step-by-step comparison of Im2col + GEMM and mGEMM, each step drawn as MAC operations. Panel titles: (a) Input and Output, with the input marked "Duplicated & Shifted"; (b) k=0; (c) k=1; (d) k=2; (e) k=3; (f) Filter; (g) ci=0, wf=0; (h) ci=0, wf=1; (i) ci=0, wf=2; (j) ci=1, wf=0. Dimension labels: m, n, k; ci, co, wf; input width wo + wf - 1.]

3 of 5


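[Figure: Roofline plot. X-axis: Operational Intensity (Flop/Byte). Y-axis: Throughput (Flop/s). Ceiling: Max Flop/s. Series: Ours, XNNPACK, OpenBlas, ARMNN.]

[Figure: Convolution tensors. Input: Cin, Hin, Win. Filter: Cout, Cin, Hfil, Wfil. Output: Cout, Hout, Wout.]

[Figure: Placement of operands in the memory hierarchy (Main Memory, L2, L1, Register). Convolution view: Input, Filter, Output, labeled "Wout Iteration". GEMM view: Filter (A), indexed i,k; Input (B), indexed k,j; Output (C), indexed i,j; dimensions M, N, K; labeled "N Iteration".]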

4 of 5


[Figure: Data movement between Memory, the Shared Last Level Cache, and the L1 Cache (CPU0). Labels: Input, Filter, Output, ci, Hout x Wout, "Streamed from memory", "Streamed into memory".]

5 of 5


[Figure: Convolution lowered to GEMM with kn2col and im2col. Panels: (A), (B), (C). Operands: Filter (A), Input (B), Output (C). GEMM dimensions: Cout (M); B x Hout x Wout (N); Cin x Hfil x Wfil (K). Other dimension labels: Cout x Hfil x Wfil, Hfil x Wfil, Cin, Hfil, Wfil, Wout.]
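
The im2col labels above map a convolution onto a GEMM with M = Cout, N = B x Hout x Wout, and K = Cin x Hfil x Wfil. A minimal pure-Python sketch of that mapping follows. It is an illustration only: stride 1, no padding, and nested lists as tensors are assumptions, and the function names (im2col, gemm, conv2d_im2col, conv2d_direct) are hypothetical rather than taken from the slides.

```python
def im2col(x, hfil, wfil):
    """Unfold x[b][cin][h][w] into a K x N matrix (assumes stride 1, no padding).

    K = Cin * Hfil * Wfil, N = B * Hout * Wout.
    """
    batch, cin = len(x), len(x[0])
    hin, win = len(x[0][0]), len(x[0][0][0])
    hout, wout = hin - hfil + 1, win - wfil + 1
    cols = []  # one row per (cin, filter row, filter column) triple
    for c in range(cin):
        for i in range(hfil):
            for j in range(wfil):
                cols.append([x[b][c][oh + i][ow + j]
                             for b in range(batch)
                             for oh in range(hout)
                             for ow in range(wout)])
    return cols, (batch, hout, wout)


def gemm(a, b):
    """Triple-loop C = A * B for an M x K and a K x N list of lists."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]  # one MAC per (i, p, j)
    return c


def conv2d_im2col(x, w):
    """Convolve x with w[cout][cin][hfil][wfil] via im2col followed by GEMM."""
    hfil, wfil = len(w[0][0]), len(w[0][0][0])
    # Flatten each filter to a row of length K, in the same (cin, row, col)
    # order that im2col uses for its rows.
    a = [[v for ch in f for r in ch for v in r] for f in w]  # M x K
    b, out_shape = im2col(x, hfil, wfil)                      # K x N
    return gemm(a, b), out_shape                              # M x N


def conv2d_direct(x, w):
    """Reference: direct convolution, returned in the same M x N layout."""
    batch, cin = len(x), len(x[0])
    hfil, wfil = len(w[0][0]), len(w[0][0][0])
    hout = len(x[0][0]) - hfil + 1
    wout = len(x[0][0][0]) - wfil + 1
    out = []
    for f in w:
        row = []
        for b in range(batch):
            for oh in range(hout):
                for ow in range(wout):
                    row.append(sum(f[c][i][j] * x[b][c][oh + i][ow + j]
                                   for c in range(cin)
                                   for i in range(hfil)
                                   for j in range(wfil)))
        out.append(row)
    return out
```

Note that the K x N matrix built here repeats each input element up to Hfil x Wfil times, which corresponds to the input duplication that the im2col diagrams depict.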