Mobile Vision Learning: TensorFlow Lite, Model Compression, and Efficient Convolution
Jaewook Kang, Ph.D.
jwkang10@gmail.com
June 12th, 2018
© 2018
MoT Labs
All Rights Reserved
TensorFlow for Everyone!
J. Kang Ph.D.
Introduction
Jaewook Kang et al., "Bayesian Hypothesis Test using Nonparametric Belief Propagation for Noisy Sparse Recovery," IEEE Trans. on Signal Processing, Feb. 2015
Jaewook Kang et al., "Fast Signal Separation of 2D Sparse Mixture via Approximate Message-Passing," IEEE Signal Processing Letters, Nov. 2015
Jaewook Kang (강재욱)
1. What It Means to Do Machine Learning on Mobile
- Why on-device ML?
- What needs to be solved
Mobile Machine Learning
- From presentation material by 신범준 (Hyperconnect) -
Mobile Machine Learning in a Mobile App

[Figure: mobile ML frameworks combined with the NN API, with release years 2015-2018]
2. TensorFlow Lite Preview
- About TensorFlow Lite
- Android Neural Network API
- Model conversion to tflite
About TensorFlow Lite
- An API that translates tflite models so each platform's kernels can run them
- A per-platform op set for optimizing tflite models
- Optimized allocation of on-device HW compute resources
Run on device!
Android Neural Network API
This slide is powered by J. Lee
2) The model is handed over to the NNAPI classes through the C++ tflite Kernel Interpreter.
3) Using the C++ NNAPI op set, a low-level version of the tflite model is built internally.
4) The low-level tflite model is executed through NNAPI.
NNAPI
Hardware acceleration via Android NN API
The low-level model generated from the Android NN API op set is composed of the following NNAPI ops.
Model creation: building and compiling an NNAPI model into lower-level code
Run inference
Wait for completion
From TF Model to Android App Build
Model Conversion to tflite
Get a model
→ Export the inference graph (.pb)
→ Freeze the exported graph with the checkpoint (.pb + .ckpt → frozen .pb)
→ Convert the frozen inference graph to TFLITE (.tflite)

This slide is powered by S. Yang
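The freeze-and-convert steps above can be sketched with the TensorFlow 1.x-era command-line tools; all file names and node names below are placeholders for your own model.

```shell
# Sketch of the freeze + convert pipeline (TensorFlow 1.x-era tools).
# File names and node names are placeholders, not real model artifacts.
freeze_graph \
  --input_graph=inference_graph.pb \
  --input_checkpoint=model.ckpt \
  --output_node_names=output \
  --output_graph=frozen_graph.pb

tflite_convert \
  --graph_def_file=frozen_graph.pb \
  --input_arrays=input \
  --output_arrays=output \
  --output_file=model.tflite
```
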
TensorFlow Lite Remarks
2. Making CNN Models Lightweight, Part 1
- Model compression
How to compress the model!
Model Compression
Network Pruning with the remaining weights
1. Train all weights in the network.
2. Threshold the weights: make the net sparse!
3. Fine-tune the remaining weights.
4. Is performance satisfactory? If not, tune the threshold and repeat; if yes, release!
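The thresholding step above can be sketched in a few lines of plain Python (an illustrative toy, not an actual TensorFlow pruning implementation):

```python
# Minimal sketch of magnitude-based pruning (illustrative, pure Python).
def prune(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.03, 0.2, -0.02]
pruned = prune(weights, threshold=0.1)
print(pruned)            # small-magnitude weights become 0.0
print(sparsity(pruned))  # 0.5
```

In a real pipeline the fine-tuning step retrains only the surviving nonzero weights, and the threshold is tuned until accuracy is acceptable.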
Drawbacks:
- The pipeline is long (it takes a long time).
- A separate implementation is required, and performance depends heavily on it.
- Performance is not verified.
Iterative pruning: the threshold → fine-tune loop is applied repeatedly.
[Figure: pruning results. Model: VGG-16]
Network Pruning: Remarks
- The pipeline is long (it takes a long time).
- A separate implementation is required, and performance depends heavily on it.
- Most papers only show good results on over-parameterized models.
Weight Sharing
- … can be reduced.
- A separate implementation using … is required.

[Figure: original weights → clustered weights + weight mapping table]
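A minimal sketch of the clustering idea, assuming a 1D weight list and a hand-picked codebook (a real implementation would learn the centers, e.g. with k-means):

```python
# Illustrative weight sharing: each weight is replaced by the index of
# its nearest cluster center; only the small codebook plus the index
# table needs to be stored.
def share_weights(weights, centers):
    """Map each weight to the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda i: abs(w - centers[i]))
            for w in weights]

def reconstruct(indices, centers):
    """Recover approximate weights from the index table + codebook."""
    return [centers[i] for i in indices]

centers = [-0.5, 0.0, 0.5]                 # shared codebook (assumed)
weights = [0.45, -0.48, 0.02, 0.51, -0.1]
idx = share_weights(weights, centers)
print(idx)                        # [2, 0, 1, 2, 1]
print(reconstruct(idx, centers))  # [0.5, -0.5, 0.0, 0.5, 0.0]
```

With 256 centers, each index fits in one byte, so 32-bit weights compress roughly 4x before any further entropy coding.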
Weight Quantization
[Figure: quantized inference. A float32 input is quantized to uint8, ops such as ReLU run on uint8, and the result is dequantized back to float32.]
Two steps:
- Scale down: uint8 representation
- Cast down: uint32 representation
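The scale-down step can be illustrated with a toy affine quantizer; the scale and zero-point values here are made up for the example:

```python
# Sketch of affine (asymmetric) uint8 quantization:
#   q = round(x / scale) + zero_point, clamped to [0, 255].
def quantize(x, scale, zero_point):
    q = round(x / scale) + zero_point
    return max(0, min(255, q))            # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zero_point = 0.05, 128             # example parameters (assumed)
x = 1.0
q = quantize(x, scale, zero_point)
print(q)                                  # 148
print(dequantize(q, scale, zero_point))   # 1.0 (up to quantization error)
```

Products of two uint8 values are accumulated in a wider integer type before being cast back down, which is why the hardware's integer units matter.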
[Benchmarks: MobileNet v1, tflite
- Google Pixel 2 (Snapdragon 835 LITTLE cores)
- Google Pixel 1 (Snapdragon 821 LITTLE cores)]
Ultimately, integer-only arithmetic delivers speedups only when the hardware supports it.
Model Compression Remarks
3. Making CNN Models Lightweight, Part 2
- Efficient convolution layers
Let's reduce the computation and parameter count of convolution!
Small but Strong Convolutional Layers!
Cross Channel Pooling
[Figure: 1x1 convolution as cross-channel pooling. An output map of 28 x 28 x 1 is computed as Y from input X and weights W. Figure from Stanford CS231n material.]
X: 3x3xL input features; W: a single 1x1xL conv filter (L = 3, M = 1), where L is the number of input channels.
At each spatial position, the 1x1xL filter computes a weighted sum across the L input channels: Z1 = w11·x1 + w12·x2 + w13·x3, giving logit features Z: 3x3x1. With two 1x1xL filters (M = 2), the logit features become Z: 3x3x2; in general, M filters give Z: 3x3xM. Applying a ReLU activation to Z yields the output features Y: 3x3xM.
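A toy sketch of this cross-channel weighted sum, using plain Python lists (illustrative only; shapes and weights are made up):

```python
# 1x1 convolution as cross-channel pooling: each output pixel is a
# weighted sum over the L input channels at that position.
def conv_1x1(x, w):
    """x: H x W x L input feature map; w: one 1x1xL filter (L weights)."""
    H, W_, L = len(x), len(x[0]), len(x[0][0])
    return [[sum(w[l] * x[i][j][l] for l in range(L))
             for j in range(W_)] for i in range(H)]

# A 2 x 2 input with L = 3 channels (every position holds [1, 2, 3]).
x = [[[1.0, 2.0, 3.0] for _ in range(2)] for _ in range(2)]
w = [0.5, 0.25, 0.25]          # example 1x1x3 filter weights
z = conv_1x1(x, w)
print(z)                        # [[1.75, 1.75], [1.75, 1.75]]
```

Using M such filters stacks M of these maps, producing the 3x3xM logit features described above.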
Combination of Parallel Conv Paths

[Figure: the visual processing hierarchy. Retina → Area V1 → V2 → V3 → V4: edges → object parts → entire objects.]
Dimensionality reduction → capturing correlation from local clusters → dimensionality reduction
Residual Learning
Does depth matter for deep learning?
As the number of layers increases: an overfitting problem? The validation-training error gap grows!
[Analogy: two routes (A and B) from a starting point to a destination. First conclusion: the roads get congested. Better conclusion: no, build a highway!]
It is easier to optimize the residual mapping than the original mapping.
[Figure: a residual block, with a residual path and a shortcut path.]
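The shortcut idea can be sketched in a few lines; the residual mapping `f` below is a made-up stand-in for the block's conv layers:

```python
# Minimal sketch of a residual block: the layers learn F(x), and the
# shortcut path adds the input back, so the block outputs F(x) + x.
def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, f):
    """f: the residual mapping (stand-in for conv layers); x: input vector."""
    return [fi + xi for fi, xi in zip(f(x), x)]   # shortcut addition

# Toy residual mapping: scale by 0.1, then ReLU (assumed for illustration).
f = lambda x: relu([0.1 * v for v in x])
y = residual_block([1.0, -2.0], f)
print(y)   # [1.1, -2.0]: F(x) = [0.1, 0.0] plus the shortcut x
```

If the optimal mapping is close to the identity, the layers only need to push F(x) toward zero, which is easier than learning the identity from scratch.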
Depthwise Separable Conv
Recent advances do not necessarily make networks more efficient with respect to size and speed!
High cross-channel correlation!

Very(?) low cross-channel correlation!
[Figure: conv filters under low vs. high cross-channel correlation.]
Depthwise conv + pointwise conv:
- Dwise filter size: K x K x 1 (x M), K = 3
- Pwise filter size: 1 x 1 x L (x M), L = 3
- Depthwise: a K x K x 1 2D filter is applied to each N x N x 1 input channel.
- Pointwise: 2D convolution with 1x1xL (x M) filters produces the N x N x M output (M < L).
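The efficiency gain is easy to check by counting parameters (a back-of-the-envelope sketch; the K, L, M values are example choices):

```python
# Parameter counts: standard conv vs. depthwise separable conv,
# for a K x K kernel, L input channels, M output channels.
def standard_conv_params(K, L, M):
    return K * K * L * M

def depthwise_separable_params(K, L, M):
    return K * K * L + L * M      # depthwise part + pointwise part

K, L, M = 3, 32, 64                       # example layer shape
std = standard_conv_params(K, L, M)        # 18432
sep = depthwise_separable_params(K, L, M)  # 2336
print(std, sep, round(std / sep, 1))       # about 7.9x fewer parameters
```

The multiply-accumulate count shrinks by the same factor, roughly K² for large M, which is where the mobile speedup comes from.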
Depthwise Convolution
Linear Bottleneck

[Recap: a single 1x1xL conv filter (L = 3, M = 1) computes Z1 as a weighted sum (w11, w12, w13) over the L input channels of X: 3x3xL.]
[Figure: an Lx1x1 local patch vector produces two different logit scalars via two filters, 1x1xL conv1 and 1x1xL conv2.]
A set of M 1x1xL conv filters forms a 1x1xLxM filter matrix W. At each position, the Lx1x1 patch vector X is mapped to the Mx1 output logit Z = WX (before activation); the rows of W run along the output-channel direction and the columns along the input-channel direction.
where "dim" denotes the dimension of the activation space spanned by W. (Note: the activation space is the feature space after the linear transformation.)
With a set of four 1x1xL conv filters (a 1x1xLxM filter matrix W, M = 4), the features after the depthwise conv are mapped to the Mx1 output logit Z; applying ReLU then gives the Mx1 output Y.
Dimension of the X manifold <= dimension of the activation space (WX)
- Mark Sandler et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks," CoRR, 2018.
Depthwise separable block:
X (NxNxL) → Dwise conv 3x3x1 → BN → ReLU6 → feature maps (NxNxL) → Pwise conv 1x1xLxM (M < L) → BN → ReLU6 → Y (NxNxM)
(Dwise conv: spatial feature extraction; Pwise conv: channel pooling.)
Linear bottleneck block:
X (NxNxL) → Dwise conv 3x3x1 → BN → ReLU6 → feature maps (NxNxL) → Pwise conv 1x1xLxL → BN → ReLU6 → feature maps (NxNxL) → Linear bottleneck 1x1xLxM (L > M) → BN → Y (NxNxM)
(Dwise conv: spatial feature extraction; linear bottleneck: channel pooling with no activation.)
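A toy sketch of why the last projection is kept linear: once the features are squeezed into few channels, a ReLU6 after the projection can wipe out the information the bottleneck just computed (all values below are made up):

```python
# Linear-bottleneck sketch: the final 1x1 projection to M < L channels
# is left linear; applying ReLU6 to its output can destroy information.
def relu6(v):
    return [min(6.0, max(0.0, a)) for a in v]

def project(x, W):
    """1x1 conv as a matrix product: W is M x L, x is an L-vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

x = [1.0, -2.0, 3.0]                       # L = 3 channels at one position
W = [[0.5, 0.5, 0.0], [0.0, 1.0, -1.0]]    # project to M = 2 channels
z = project(x, W)
print(z)          # [-0.5, -5.0] -> the linear bottleneck keeps these
print(relu6(z))   # [0.0, 0.0]   -> ReLU6 would zero both channels out
```

Inside the block, ReLU6 is safe because it acts on the wider (higher-dimensional) representations, where clipping loses little information.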
Linear Bottleneck in MobileNet v2
Efficient Convolution Remarks
Introducing the Modulabs MoT Lab
Google Deep Learning Jeju Camp 2018
Seoyoen Yang (SNU)
Taekmin Kim (SNU)
Jaewook Kang (Modulabs)
MoT Contributors
Jaewook Kang (Modulabs)
Joon ho Lee (Neurophet)
Yonggeun Lee
Jay Lee (KakaoPay)
SungJin Lee (DU)
Seoyoen Yang (SNU)
Taekmin Kim (SNU)
Jihwan Lee (SNU)
Doyoung Kwak (PU)
Yunbum Beak (신호시스템)
Joongwon Jwang (위메프)
Jeongah Shin (Hanyang Univ.)
The "Save Everyone from Turtle Neck" Project (모두의 거북목을 지켜줘)
The End
Mobile Vision Learning 2018
- All rights reserved @ Jaewook Kang 2018