1 of 79

Turtle, Turtle, Are You Working?

The Turtle Neck Project: beyond Google Camp 2018

https://github.com/motlabs/dont-be-turtle

Jaewook Kang

Nov. 23rd, 2018

MoT Labs

J. Kang Ph.D. presents

3 of 79

Bio.

GIST EEC Ph.D. (2015)
Research Lead (until May 2018)
MoT Labs Director (since Jan. 2018)
Jeju DL Camp (June-July 2018)

(since Oct. 2018)

Recent interests:

Mobile machine learning
Pose estimation
TensorFlow / Cloud TPU
Swimming!

Lately, a little natural language processing too...

Jaewook Kang (강재욱)

누구나 TensorFlow! (TensorFlow for Everyone!)

J. Kang Ph.D.

4 of 79

Project Contributors

Jaewook Kang (principal trial-and-error specialist)
Taekmin Kim (hackathon addict)
Doyoung Kwak (iOS, Core ML guru)
Jeongah Shin (Android, TFLite guru)
DongSeok Yang (ambitious developer 2)
Yonggeun Lee (ambitious developer 1)

5 of 79

Today's Story

Stop just reimplementing papers! Start a project!

How the Turtle Neck Project began, and Google Camp

Pose estimation: a 10-minute tour

Shrinking the model for a mobile app

6 of 79

Stop just reimplementing papers! Start a project!
- How the Turtle Neck Project began, and the deep learning camp

Let's do work that brings value to the world!

7 of 79

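How to Grow Your Deep Learning Skills

Textbook study

PRML, Hands-On ML, Ian Goodfellow's Deep Learning book

Read papers and implement them, or do Kaggle

Read a paper, then implement it while consulting an existing repo (Google's official code is recommended)
Read a paper, then implement it on your own

Personal or team project
Come up with a project you want to do, implement it, and turn it into a service
You are not the real deal until you have tasted dirty real-world data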

8 of 79

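How to Grow Your Deep Learning Skills

Textbook study

PRML, Hands-On ML, Ian Goodfellow's Deep Learning book

→ Acquire the basic knowledge needed for deep learning

Read papers and implement them, or do Kaggle

Read a paper, then implement it while consulting an existing repo (Google's official code is recommended)
Read a paper, then implement it on your own

→ Follow the latest deep learning trends and master the deep learning tools

Personal or team project
Come up with a project you want to do, implement it, and turn it into a service

→ Gain experience solving problems with deep learning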

9 of 79

How the Turtle Neck Project Began

Stop just reading papers

and do a project that brings value to the world!

10 of 79

How the Turtle Neck Project Began

A typical person working in the IT field

This was a project for all of you...

11 of 79

Project Outline

Goal: a solution for keeping the right posture!

12 of 79

Approach Sketch

Estimate the coordinates of four body parts:

Head
Neck
Right shoulder
Left shoulder

15 of 79

First, play

16 of 79

Work like crazy during the day

17 of 79

Stay up all night playing (and bother everyone)

18 of 79

Play on the weekend

19 of 79

A tough month of development

(actually two months)

June

July

20 of 79

Frameworks:

- TensorFlow

iPhone X + Core ML
FPS: 25~30
Model size (mlmodel): 1.42 MB
This model was trained on 870 images

Jaewook Kang et al.

Don't Be Turtle Project

https://github.com/motlabs/dont-be-turtle

Prototype App v0.4 (18.Aug)

22 of 79

The app will reach the app store before spring comes!

Get to work, MoT mobile development team!

23 of 79

Pose Estimation: 10-Minute Tour

for the Turtle Neck Project

In the end, the Turtle Neck Project is R&D too

Jaewook Kang

24 of 79

The Core of the Project

Jeohyeon High School AI Special Lecture

25 of 79

The Core of the Project: Pose Estimation!

26 of 79

Human Pose Estimation

How would you sum up human pose estimation in one sentence?

27 of 79

Human Pose Estimation

How would you sum up human pose estimation in one sentence?

Finding the structure of the human body from camera input

28 of 79

The Turtle Neck Project is built on the most basic form,

2D single pose estimation

(in other words, stick-figure estimation).

29 of 79

Human Pose Estimation

Pose estimation =

Localization + Classification


30 of 79

Human Pose Estimation

Pose estimation =

Localization + Classification

(x, y) = (0.34, 0.92)

31 of 79

Human Pose Estimation

Pose estimation =

Localization + Classification

Head
Neck
Rshoulder
Lshoulder

32 of 79

Human Pose Estimation

Pose estimation

Localization: finding the positions of body parts (keypoints) in the input image
Classification: identifying which body part each detected keypoint is

Pose coordinate prediction

head=(0.1, 0.3)
neck=(0.2, 0.6)
RShoulder=(0.3, 0.1)
LShoulder=(0.1, 0.9)

33 of 79

DeepPose (Toshev '14)

Conventional approach: localization → classification

It seems to find body parts reasonably well...

Image credit: https://arxiv.org/abs/1312.4659

34 of 79

DeepPose (Toshev '14)

Conventional approach: localization → classification

It seems to find body parts reasonably well...
...but it cannot tell which part is which

Image credit: https://arxiv.org/abs/1312.4659

35 of 79

DeepPose (Toshev '14)

Like the blind men touching an elephant...

Image credit: http://www.1000ventures.com/design_elements/selfmade/elephant_holistic-6perceptions.png

36 of 79

DeepPose (Toshev '14)

Conventional approach: localization → classification

When part of the body is occluded or out of view...

Image credit: https://arxiv.org/abs/1312.4659

37 of 79

DeepPose (Toshev '14)

Conventional approach: localization → classification

When part of the body is occluded or out of view...
classification fails

Is this an arm or a leg? / Is the human body symmetric?

Image credit: https://arxiv.org/abs/1312.4659

38 of 79

DeepPose (Toshev '14)

Conventional approach: localization → classification

When part of the body is occluded or out of view...
classification fails

Is this an arm or a leg? / Is the human body symmetric?
The approach lacks an understanding of human body structure

Image credit: https://arxiv.org/abs/1312.4659

39 of 79

DeepPose (Toshev '14)

Key idea: holistic reasoning + coordinate regression

Use deep learning to see the trees and the forest at the same time!

Perform localization and classification simultaneously

Toshev, A., Szegedy, C., "DeepPose: Human Pose Estimation via Deep Neural Networks," CVPR 2014

40 of 79

DeepPose (Toshev '14)

Key idea: holistic reasoning + coordinate regression

Use deep learning to see the trees and the forest at the same time!

Perform localization and classification simultaneously

A model that understands global context!

The head sits on top of the neck!
The human body is symmetric!
So that even invisible body parts can be found!
Better accuracy on hard-to-find body parts!

41 of 79

DeepPose (Toshev '14)

Key idea: holistic reasoning + coordinate regression

Goal: learn a function that regresses an input image X to a vector Y of 2D coordinates

42 of 79

DeepPose (Toshev '14)

Key idea: holistic reasoning + coordinate regression

Goal: learn a function that regresses an input image X to a vector Y of 2D coordinates

Components: a regression function, a preprocessing function, and a postprocessing function

43 of 79

DeepPose (Toshev '14)

Key idea: holistic reasoning + coordinate regression

Goal: learn a function that regresses an input image X to a vector Y of 2D coordinates

Diagram: input image X → preprocessing → AlexNet → postprocessing → coordinate prediction (Head = (0.1,0.3), Neck = (0.2,0.6), Rshoulder = (0.8,0.1), ….)
Training: L2 loss between the coordinate prediction and the true coordinates Y = (y0,y1,....,yk)
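As a rough illustration of the coordinate-regression setup on this slide, the sketch below normalizes keypoint coordinates relative to a bounding box (preprocessing), maps a normalized prediction back to image coordinates (postprocessing), and computes an L2 loss. The function names, the box format, and all numbers are assumptions made for this example; they are not taken from the talk or from the DeepPose code, and the "prediction" is a stand-in for a network output.

```python
# Minimal sketch of coordinate-regression targets and loss (all names are assumed).

def normalize(points, box):
    """Preprocessing: map image coordinates into box-relative coordinates."""
    cx, cy, w, h = box  # assumed box format: center x, center y, width, height
    return [((x - cx) / w, (y - cy) / h) for x, y in points]

def denormalize(points, box):
    """Postprocessing: map box-relative coordinates back to image coordinates."""
    cx, cy, w, h = box
    return [(x * w + cx, y * h + cy) for x, y in points]

def l2_loss(pred, true):
    """Sum of squared coordinate differences over all keypoints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, true))

box = (100.0, 100.0, 200.0, 200.0)
# Head, neck, right shoulder, left shoulder in pixels (made-up values).
true_px = [(100.0, 40.0), (100.0, 80.0), (60.0, 100.0), (140.0, 100.0)]
true_norm = normalize(true_px, box)

# Stand-in for a network output: the target shifted by a small error.
pred_norm = [(x + 0.01, y - 0.01) for x, y in true_norm]

loss = l2_loss(pred_norm, true_norm)
pred_px = denormalize(pred_norm, box)
```

Training would minimize `loss` over many images; at inference time only `denormalize` is applied to the network output.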

44 of 79

DeepPose (Toshev '14)

Problem: localization accuracy is lacking

Global context!

In other words, an insufficient understanding of body structure!

Special thanks to Picasso! :-)

45 of 79

DeepPose (Toshev '14)

Improvement: pose displacement regression

Iterative refinement through a cascade of regressors
People and machines alike rarely get it after being taught just once

46 of 79

DeepPose (Toshev '14)

Improvement: pose displacement regression

1st regression: (y0,y1)^(S=0)
2nd regression: (y0,y1)^(S=1)

47 of 79

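DeepPose (Toshev '14)

Improvement: pose displacement regression

Iterative refinement through a cascade of regressors

Refine by predicting the difference (displacement) between the previous prediction and the true value

Single stage regressor: X with the i-th box from the previous stage → preprocessing → AlexNet → postprocessing → displacement prediction
Training: L2 loss between the displacement prediction and the displacement from the previous stage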

48 of 79

DeepPose (Toshev '14)

Improvement: pose displacement regression

Iterative refinement through a cascade of regressors

Refine by predicting the difference (displacement) between the previous prediction and the true value

Cascade: input image X → 1st stage regressor → ... → S-th stage regressor → coordinate prediction (Head = (0.1,0.3), Neck = (0.2,0.6), Rshoulder = (0.8,0.1), ….)
Later stages take the input image X together with the displacement
True coordinate: Y=(y0,y1,....,yk)
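The cascade idea can be sketched in a few lines: each stage predicts a displacement, which is added to the previous estimate. The stage below is a toy stand-in (hypothetical, not a trained network): it simply returns half of the remaining offset to the true point, which mimics a regressor that has learned to predict the displacement imperfectly.

```python
# Sketch of cascaded displacement regression with toy stages (assumed names).

def make_stage(true_point, step):
    """Toy stand-in for a trained stage regressor.

    A real stage would look at an image crop around the current estimate;
    this one returns a fraction `step` of the remaining offset to the target.
    """
    def stage(current):
        return ((true_point[0] - current[0]) * step,
                (true_point[1] - current[1]) * step)
    return stage

def cascade(initial, stages):
    """Apply the stages in series, adding each predicted displacement."""
    estimate = initial
    history = [estimate]
    for stage in stages:
        dx, dy = stage(estimate)
        estimate = (estimate[0] + dx, estimate[1] + dy)
        history.append(estimate)
    return history

true_neck = (0.2, 0.6)
stages = [make_stage(true_neck, 0.5) for _ in range(3)]
history = cascade((0.5, 0.5), stages)

# Absolute error after each stage; it shrinks as stages are added.
errors = [abs(x - true_neck[0]) + abs(y - true_neck[1]) for x, y in history]
```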

49 of 79

DeepPose (Toshev '14)

Improvement: pose displacement regression

Repeated refinement increases localization accuracy
Global context understanding!:

Feeding the previously estimated coordinates back in as input helps the model understand body structure better

50 of 79

Convolutional Heatmap Regressor

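Limits of coordinate regression

Because the model outputs only coordinates, the prediction Y carries little representational power (information)

Even with a cascade and repeated coordinate prediction, it is hard to improve localization accuracy enough

There was little information to work with in the first place!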

51 of 79

Convolutional Heatmap Regressor

Key idea: use heatmap regression

Heatmap: a 2D pixel-wise likelihood map for part locations
Each heatmap pixel gives a confidence value for the part's location

It provides the full information about part localization

Image credit: https://arxiv.org/abs/1609.01743

52 of 79

Convolutional Heatmap Regressor

Key idea: use heatmap regression

Use heatmaps as the model output
Use argmax() to extract the coordinates

Diagram: input image X → model → heatmap prediction
Target: true coordinates Y = (y0,y1,....,yk) → heatmap generator → true heatmap
Training: some loss function between the heatmap prediction and the true heatmap
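A small sketch of the two pieces named on this slide: a heatmap generator that turns a true coordinate into a target heatmap, and an argmax that turns a heatmap back into a coordinate. The Gaussian shape, the `sigma` value, and the mean-squared-error loss are assumptions for illustration; the slide only says "some loss fn".

```python
import math

def make_heatmap(height, width, center, sigma=1.0):
    """Heatmap generator: a 2D Gaussian bump centered on the keypoint (row, col)."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def argmax_2d(heatmap):
    """Return the (row, col) of the largest heatmap value."""
    best_value, best_pos = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (y, x)
    return best_pos

def mse(pred, true):
    """One possible loss: mean squared error over all pixels."""
    total = sum((p - t) ** 2
                for prow, trow in zip(pred, true)
                for p, t in zip(prow, trow))
    return total / (len(true) * len(true[0]))

true_heatmap = make_heatmap(8, 8, center=(2, 5))
# Stand-in for a model output: the same bump, one pixel off.
pred_heatmap = make_heatmap(8, 8, center=(3, 5))

loss = mse(pred_heatmap, true_heatmap)
keypoint = argmax_2d(pred_heatmap)
```

Unlike a bare coordinate, the predicted heatmap keeps a confidence value for every pixel, and the coordinate is read out only at the end.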

53 of 79

Convolutional Heatmap Regressor

Key idea: use heatmap regression!!

Many kinds of heatmap regressors have been proposed
Including, but not limited to, the following:

54 of 79

Multiscale Understanding

Problem: people appear at many different scales, and all of them must be handled!

Image credit: FLIC dataset

55 of 79

Multiscale Understanding

Problem: people appear at many different scales, and all of them must be handled!

The model needs receptive fields of various sizes!

Image credit: FLIC dataset

57 of 79

Multiscale Understanding

Problem: people appear at many different scales, and all of them must be handled!
Approach 1: Convolutional Pose Machines (Wei et al., CVPR 2016)

Multi-stage feature learning gradually enlarges the effective receptive field.
Uses a variety of conv filter sizes

Image credit: https://arxiv.org/abs/1602.00134
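To make "gradually enlarging the receptive field" concrete, the sketch below computes the theoretical receptive field of a stack of conv/pooling layers using the standard recurrence (the receptive field grows by `(kernel - 1)` times the accumulated stride). The layer lists are made-up examples, not the actual CPM architecture, and the effective receptive field of a trained network is typically smaller than this theoretical value.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of stacked layers.

    layers: list of (kernel_size, stride) tuples, in forward order.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Hypothetical stage: 9x9 conv, 2x2 max pooling, 9x9 conv.
one_stage = [(9, 1), (2, 2), (9, 1)]

rf_small = receptive_field([(3, 1)] * 3)   # three 3x3 convs
rf_one = receptive_field(one_stage)        # one hypothetical stage
rf_two = receptive_field(one_stage * 2)    # the same stage applied twice
```

Adding stages, larger kernels, or pooling each pushes the receptive field up, which is why a multi-stage design can cover both small and large people.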

58 of 79

Multiscale Understanding

Problem: people appear at many different scales, and all of them must be handled!
Approach 2: Stacked Hourglass (Newell et al., ECCV 2016)

Multi-stage encoder-decoder structure
Single channel pipeline

skip connections + max pooling + conv filters of the same size
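The encoder-decoder-with-skip structure can be sketched as a short recursion. To stay self-contained, this toy version works on a 1D list instead of a 2D feature map, and `block` is a placeholder for the conv/residual block (identity by default); both simplifications are assumptions of this example, not the authors' implementation.

```python
def max_pool(signal):
    """Halve the resolution by taking the max of each adjacent pair."""
    return [max(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]

def upsample(signal):
    """Double the resolution by repeating each value (nearest neighbor)."""
    return [value for value in signal for _ in range(2)]

def hourglass(signal, depth, block=lambda s: s):
    """One hourglass module; len(signal) must be divisible by 2**depth."""
    skip = block(signal)                    # skip branch keeps this resolution
    if depth == 0:
        return skip
    low = max_pool(signal)                  # encoder: go down one scale
    low = hourglass(low, depth - 1, block)  # process the lower scales
    up = upsample(low)                      # decoder: come back up
    return [a + b for a, b in zip(skip, up)]

features = [1, 2, 3, 4, 5, 6, 7, 8]
out = hourglass(features, depth=2)

# Stacking: feed the output of one hourglass into the next.
stacked = hourglass(out, depth=2)
```

The output has the same resolution as the input but mixes information from every scale visited on the way down.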

59 of 79

Multiscale Understanding

Problem: people appear at many different scales, and all of them must be handled!
Approach 2: Stacked Hourglass (Newell et al., ECCV 2016)

Repeated hourglass stacking to understand global context!
Iterative heatmap updates in a feedforward manner

60 of 79

Beyond...

Beyond 2D single pose estimation

Multi pose estimation

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields - Cao, Z., Simon, T., Wei, S., & Sheikh, Y. (CVPR 2017)

3D pose estimation

3D Human Pose Estimation in the Wild by Adversarial Learning - Yang, W., Ouyang, W., Wang, X., Ren, J.S., Li, H., & Wang, X. (2018)
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera - Mehta, Dushyant et al. (SIGGRAPH 2017)

Person generation

Deformable GANs for Pose-based Human Image Generation - Siarohin, A., Sangineto, E., Lathuilière, S., & Sebe, N. (CVPR 2018)
Dense Pose Transfer - Neverova, N., Guler, R.A., & Kokkinos, I. (ECCV 2018)

Image credit: https://arxiv.org/abs/1312.4659

61 of 79

Shrinking the Model to Fit in a Mobile App

Let's put it into practice!

62 of 79

Our Baseline Model

Stacking 4 hourglass (HG) layers

For understanding global context

Each HG layer has 4 stages

For containing four feature maps


63 of 79

Our Baseline Model

Stacking 4 hourglass (HG) layers

For understanding global context
Residual learning conv layer with 256 channels

Each HG layer has 4 stages

For containing four feature maps


64 of 79

Our Baseline Model

Far too big to ship in a mobile app

Would you download an 855.8 MB app?
Which matters more, your turtle neck or your battery?

66 of 79

Directions for Reducing the Size

Solve it with technical methods

Pruning, quantization, others...

Solve it by analyzing the mobile user environment

67 of 79

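Directions for Reducing the Size

Solve it with technical methods

Pruning, quantization, others...

Complex, slow, degrades accuracy, and risks a lot of wasted effort

Solve it by analyzing the mobile user environment!!!
1) Analyze the user environment
2) Restrict the range of input data
3) The model can then be simplified

A classic model selection problem!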

68 of 79

Directions for Reducing the Size

Two questions!

Is global context understanding needed?
Is multi-scale understanding needed?

69 of 79

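Directions for Reducing the Size

Usage environment of the Turtle Neck app: the place where we work!
The smartphone is mounted at an angle looking up at the person
The camera view covers the person down to the shoulder line

→ Four body parts in total (Head / Neck / Lshoulder / Rshoulder)

Assumed distance from the smartphone camera to the person's head: 0.3 to 1.0 m
People barely move their bodies while working → keyboard warriors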

70 of 79

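Does the Turtle Neck App Need Global Context Understanding?

The camera angle is always fixed / there is little body movement
All four body parts are always visible to the camera

Global context understanding is for hard-to-detect or occluded body parts

→ Not needed

1/4 the size

Proposed: Single HG model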

71 of 79

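Does the Turtle Neck App Need Multi-scale Understanding?

The scale of the person in the camera input varies little!

Body-part size differs only within a limited range from input to input

Heavy-duty multi-scale understanding is not needed

Conventional: four stages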

72 of 79

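Does the Turtle Neck App Need Multi-scale Understanding?

The scale of the person in the camera input varies little!

Body-part size differs only within a limited range from input to input

Heavy-duty multi-scale understanding is not needed

Proposed: two stages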

73 of 79

How We Shrank the Turtle Neck Project Model

Steps (Step1 to Step4): Baseline; Single HG model + Inverted bottleneck; Feature Space Optimization; HG Stage Reduction
Model sizes shown: 855.8MB, 220.3MB, 258.4MB, 4.56MB, 1.55MB, 3.83MB, 2.17MB, <1MB ??
Reduction factors shown: X0.3, X0.006

74 of 79

Sample Prediction

Batchsize=64, lr=1e-3, Alpha=0.0625, Expansion rate=10: 2.19MB
Batchsize=64, lr=1e-3, Alpha=0.0625, Expansion rate=7: 1.55MB
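To give a feel for why a lower expansion rate yields a smaller model, the sketch below counts the weights of one inverted bottleneck block and compares it with a plain 3x3 convolution. It rests on assumptions the slides do not spell out: that Alpha is a width multiplier applied to the 256-channel baseline, that Expansion rate is the MobileNetV2-style expansion factor, and that biases and batch-norm parameters can be ignored. The counts are per block and do not reproduce the MB figures above.

```python
def conv_weights(c_in, c_out, kernel):
    """Weight count of a standard conv layer (biases ignored)."""
    return c_in * c_out * kernel * kernel

def inverted_bottleneck_weights(channels, expansion, kernel=3):
    """Weights of 1x1 expand -> depthwise kxk -> 1x1 project (biases/BN ignored)."""
    hidden = channels * expansion
    expand = channels * hidden            # 1x1 pointwise expansion
    depthwise = hidden * kernel * kernel  # one kxk filter per hidden channel
    project = hidden * channels           # 1x1 pointwise projection
    return expand + depthwise + project

base_channels = 256                        # baseline width from the slides
alpha = 0.0625                             # assumed to be a width multiplier
channels = int(base_channels * alpha)

baseline = conv_weights(base_channels, base_channels, 3)
block_t10 = inverted_bottleneck_weights(channels, expansion=10)
block_t7 = inverted_bottleneck_weights(channels, expansion=7)

print(channels, baseline, block_t10, block_t7)
```

Under these assumptions the block weight count scales linearly with the expansion rate, so going from 10 to 7 removes about 30% of each block's weights.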

78 of 79

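Today's Story

Stop just reimplementing papers! Start a project!
Messages:

Do a project that brings value to the world!
Google Camp is the best AI education program.

Pose estimation: a 10-minute tour
Messages:

Use heatmaps
Iterative regression for global context understanding
Hourglass / CPM for multi-scale understanding

Shrinking the model for a mobile app
Messages:

If you observe the mobile UX closely, a better and simpler solution always exists!
A simple solution always beats a complex one!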

79 of 79

Hooray! It's over!

Now let's enjoy DevFest!