DEEP LEARNING FOR VISUAL TRACKING

F4B305

Agustin Picard

Ricardo Andreasen

Tomas Volker

Amadou Agne

Gautier Cosne

Chayan Toufan Tabrizi

CONTENTS

1. INTRODUCTION
2. CONVOLUTIONAL NEURAL NETWORK
3. GOTURN
4. YOLO
5. APPLICATION TO OUR PROJECT

5.1 GOTURN IMPLEMENTATION

5.2 YOLO IMPLEMENTATION

6. RESULTS AND CONCLUSION

1. INTRODUCTION

GOAL: Track an object across successive video frames

CONSTRAINTS:

Object motion
Changes in viewpoint
Lighting changes
Partial occlusion
Deformation

METHOD: Use a convolutional neural network, trained on a large image dataset, to extract features, then regress the coordinates of the bounding box

21/01/2018

Image → CNN regression → Coordinates of the bounding box

2. CONVOLUTIONAL NEURAL NETWORK

Classic Neural Network	Convolutional Neural Network
Flattening the image: loss of spatial structure	Feature maps
Computationally expensive: 512 × 512 × 3 × (number of neurons) weights	Convolution & pooling
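
To make the cost comparison concrete, the sketch below counts the first-layer weights of a fully connected layer on a flattened 512 × 512 × 3 image and of a convolutional layer on the same image. The layer sizes (1000 neurons, 96 filters of size 11 × 11) are illustrative assumptions, not values taken from the slides, and biases are ignored.

```python
def dense_weights(height, width, channels, neurons):
    """Weights of a fully connected layer applied to the flattened image."""
    return height * width * channels * neurons


def conv_weights(kernel_size, in_channels, filters):
    """Weights of a convolutional layer: one small kernel per filter,
    shared across every spatial position of the image."""
    return kernel_size * kernel_size * in_channels * filters


# Hypothetical layer sizes, chosen only for illustration.
dense = dense_weights(512, 512, 3, neurons=1000)
conv = conv_weights(kernel_size=11, in_channels=3, filters=96)

print(f"fully connected: {dense:,} weights")
print(f"convolutional:   {conv:,} weights")
```

Because the convolutional weight count does not depend on the image size, the gap widens as the input resolution grows.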

3. GOTURN*

Previous frame: what to track (crop centred at C, size w × h)

Current frame: search region (crop centred at C, size λw × λh)

Both crops: first 5 convolutional layers of CaffeNet

Fully-connected layers

Output: predicted location of the target within the search region

* Generic Object Tracking Using Regression Networks
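
The search-region geometry above can be sketched as follows. This is a minimal illustration, not the project's code: boxes are assumed to be (x1, y1, x2, y2) tuples in pixels, λ = 2 is an assumed default, and the network output is assumed to be expressed as fractions of the search region. The function names are hypothetical.

```python
def search_region(cx, cy, w, h, lam=2.0):
    """Region centred at (cx, cy) with size lam*w by lam*h, as (x1, y1, x2, y2)."""
    half_w = lam * w / 2.0
    half_h = lam * h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def to_image_coords(pred, region):
    """Map a box predicted inside the search region back to image coordinates.

    Assumption: pred is (x1, y1, x2, y2) given as fractions (0..1) of the region.
    """
    rx1, ry1, rx2, ry2 = region
    rw, rh = rx2 - rx1, ry2 - ry1
    px1, py1, px2, py2 = pred
    return (rx1 + px1 * rw, ry1 + py1 * rh, rx1 + px2 * rw, ry1 + py2 * rh)


# Previous target: centre (100, 80), width 40, height 20.
region = search_region(100, 80, 40, 20)
# Hypothetical network output: the target sits in the middle of the region.
box = to_image_coords((0.25, 0.25, 0.75, 0.75), region)
print(region, box)
```

A real implementation would also need to handle regions that extend past the image border, for example by padding the crop.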

4. YOLO*

Divides the input image into a grid

For each grid cell, we predict:

B bounding boxes, each with a confidence score
C class probabilities

Multiply each box confidence by the class probabilities to get class-specific confidence scores for each box

Threshold the class-specific confidence scores to obtain the final detections

* You Only Look Once
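
The scoring and thresholding steps for a single grid cell can be sketched as below. The values for the cell (B = 2 boxes, C = 3 classes) and the threshold of 0.5 are made up for illustration, and the function names are hypothetical.

```python
def class_specific_scores(box_confidences, class_probs):
    """For one grid cell: scores[b][c] = confidence of box b * probability of class c."""
    return [[conf * p for p in class_probs] for conf in box_confidences]


def detections(box_confidences, class_probs, threshold):
    """Return (box index, class index, score) for every score at or above the threshold."""
    scores = class_specific_scores(box_confidences, class_probs)
    return [
        (b, c, s)
        for b, row in enumerate(scores)
        for c, s in enumerate(row)
        if s >= threshold
    ]


# Hypothetical cell with B = 2 boxes and C = 3 classes.
cell_conf = [0.9, 0.2]
cell_probs = [0.1, 0.8, 0.1]
print(detections(cell_conf, cell_probs, threshold=0.5))
```

Only the first box paired with the second class survives here, since every other product falls below the threshold.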

5. APPLICATION TO OUR PROJECT

5.1 GOTURN IMPLEMENTATION

Kotlin implementation
TensorFlow for Java
Based on a model imported from the GOTURN-Tensorflow implementation: https://github.com/tangyuhao/GOTURN-Tensorflow

Pre-trained model: import the graph and weights
The Java library must be wrapped, and its incomplete functionality supplemented
Fast, reliable workflow that enables flexible data visualisation and prototyping, with high performance

Pre-trained on:

Frames from videos: 13,082 images of 251 objects
Augmented still images: 239,283 annotations from 134,821 images

5.2 YOLO IMPLEMENTATION

Use YOLO9000: trained on 9000 classes

Assuming our target object is among the trained classes

Adapting the detection algorithm to tracking:

Frame t predicted (ground truth for t = 0)

Frame t+1 original → YOLO9000 object detection → Threshold → Nearest box to the previous one → Frame t+1 predicted
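
The detection-to-tracking adaptation can be sketched as below: at each frame, keep the detections that pass the threshold and pick the one whose centre is nearest to the previous box. This is a minimal sketch under stated assumptions: detections are given as (box, score) pairs with boxes as (x1, y1, x2, y2), the threshold of 0.3 is arbitrary, and falling back to the previous box when nothing passes the threshold is a choice made here for illustration, not something the slides specify.

```python
import math


def centre(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def nearest_box(previous_box, detections, min_score=0.3):
    """Pick the thresholded detection whose centre is closest to previous_box.

    detections is a list of (box, score). If no detection passes min_score,
    the previous box is returned unchanged (an assumption of this sketch).
    """
    px, py = centre(previous_box)
    candidates = [box for box, score in detections if score >= min_score]
    if not candidates:
        return previous_box
    return min(
        candidates,
        key=lambda b: math.hypot(centre(b)[0] - px, centre(b)[1] - py),
    )


def track(initial_box, detections_per_frame, min_score=0.3):
    """Start from the ground-truth box at t = 0 and follow it frame by frame."""
    boxes = [initial_box]
    for dets in detections_per_frame:
        boxes.append(nearest_box(boxes[-1], dets, min_score))
    return boxes
```

Selecting by distance alone ignores the predicted class; filtering detections by the target's class first would be a natural refinement.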

6. RESULTS AND CONCLUSION

SEQUENCE	YOLO Mean Centroid Distance	GOTURN Mean Centroid Distance	GOTURN Mean IOU
Bear	6.5	19.8 ; 20	0.47
Octopus	32	36 ; 131.76	0.073
Fish	17	3 ; 2.57	0.66
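
For reference, the two metrics in the table can be computed as sketched below. The slides do not state how the values were obtained, so this is an assumed formulation: boxes are (x1, y1, x2, y2) tuples in pixels, the centroid distance is Euclidean, and both metrics are averaged over all frames of a sequence.

```python
import math


def centroid_distance(a, b):
    """Euclidean distance between the centres of two (x1, y1, x2, y2) boxes."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return math.hypot(ax - bx, ay - by)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sequence_metrics(predicted, ground_truth):
    """Mean centroid distance and mean IoU over paired per-frame boxes."""
    pairs = list(zip(predicted, ground_truth))
    mean_dist = sum(centroid_distance(p, g) for p, g in pairs) / len(pairs)
    mean_iou = sum(iou(p, g) for p, g in pairs) / len(pairs)
    return mean_dist, mean_iou
```

A low centroid distance with a low IoU would indicate a box that is well centred but wrongly sized, which is why the two metrics are worth reporting together.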

METHOD	PROS	CONS
GOTURN	Works on any type of target; feed-forward pass is fast	Sensitive to camera motion and large movements; focuses on the main part of the object
YOLO	Confidence score for each box; great performance on objects from a trained class	Needs a model trained on the class of the target object; many objects are detected, so the target must be singled out

6. RESTITUTION CHALLENGE