1 of 20

DEEP LEARNING FOR VISUAL TRACKING

F4B305

Agustin Picard

Ricardo Andreasen

Tomas Volker

Amadou Agne

Gautier Cosne

Chayan Toufan Tabrizi

2 of 20

CONTENTS

  1. INTRODUCTION
  2. CONVOLUTIONAL NEURAL NETWORK
  3. GOTURN
  4. YOLO
  5. APPLICATION TO OUR PROJECT
     5.1 GOTURN IMPLEMENTATION
     5.2 YOLO IMPLEMENTATION
  6. RESULTS AND CONCLUSION

3 of 20

1. INTRODUCTION

        • GOAL: Track an object across successive video frames

        • CONSTRAINTS:
          • Object motion
          • Changes in viewpoint
          • Lighting changes
          • Partial occlusion
          • Deformation

        • METHOD: Use a convolutional neural network, trained on a large image dataset, to extract features and regress the bounding-box coordinates

21/01/2018

Image → CNN regression → Coordinates of the bounding box

4 of 20

2. CONVOLUTIONAL NEURAL NETWORK

Classic Neural Network

  • Flattening the image: loss of the spatial aspect
  • Computationally expensive: 512 × 512 × 3 × (number of neurons) weights

Convolutional Neural Network

  • Convolution & pooling
  • Feature map
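
To make the cost comparison concrete, here is a minimal sketch that counts the weights of a fully connected layer applied to a flattened 512 × 512 × 3 image against those of a single convolutional layer. The 1000 neurons, the 3 × 3 kernel and the 64 filters are illustrative assumptions, not values from the slides, and biases are ignored.

```python
def dense_weights(height, width, channels, neurons):
    """Weights of a fully connected layer fed with the flattened image."""
    return height * width * channels * neurons


def conv_weights(kernel_size, in_channels, out_channels):
    """Weights of a convolutional layer; the kernel is shared across all positions."""
    return kernel_size * kernel_size * in_channels * out_channels


# Assumed sizes, chosen only for illustration.
dense = dense_weights(512, 512, 3, neurons=1000)
conv = conv_weights(3, in_channels=3, out_channels=64)

print(dense)  # 786432000
print(conv)   # 1728
```

Because the convolution kernel is reused at every image position, its weight count does not depend on the image size, while the fully connected count grows with every pixel.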

 

5 of 20

3. GOTURN*

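Previous frame: crop centred on the target (centre C, width w, height h) = what to track

Current frame: crop centred on the same point (centre C, width λ·w, height λ·h) = search region

Both crops → first 5 convolutional layers of CaffeNet → fully-connected layers → predicted location of the target within the search region

* Generic Object Tracking Using Regression Networks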
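
The search-region geometry (centre C, size λ·w by λ·h) can be sketched as follows. This is a minimal illustration, not the reference GOTURN code: the function name `search_region`, the value λ = 2 and the choice to clip the crop to the frame borders are assumptions made for the example.

```python
def search_region(cx, cy, w, h, lam, frame_w, frame_h):
    """Return (left, top, right, bottom) of a crop centred on (cx, cy).

    The crop measures lam * w by lam * h and is clipped to the frame
    (clipping is an assumption; padding is another possible choice).
    """
    half_w = lam * w / 2.0
    half_h = lam * h / 2.0
    left = max(0.0, cx - half_w)
    top = max(0.0, cy - half_h)
    right = min(float(frame_w), cx + half_w)
    bottom = min(float(frame_h), cy + half_h)
    return left, top, right, bottom


# Target of 40 x 20 pixels centred at (100, 80) in a 640 x 480 frame.
region = search_region(cx=100, cy=80, w=40, h=20, lam=2.0,
                       frame_w=640, frame_h=480)
print(region)  # (60.0, 60.0, 140.0, 100.0)
```

A larger λ tolerates faster motion between frames but leaves the network with more background to ignore.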

6 of 20

4. YOLO*

        • Divides the input image into a grid

        • For each grid cell, predicts:
          • B bounding boxes, each with a confidence
          • C class probabilities

        • Multiplies each box confidence by the class probabilities to obtain class-specific confidence scores for each box

        • Thresholds the class-specific confidence scores to obtain the final detections

* You Only Look Once
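
The scoring steps listed above can be sketched for a single grid cell as follows. This is a minimal illustration in plain Python, not the YOLO implementation; the function names, the values B = 2 and C = 3, and the 0.2 threshold are assumptions chosen for the example.

```python
def class_specific_scores(box_confidences, class_probs):
    """scores[b][c] = confidence of box b * probability of class c."""
    return [[conf * p for p in class_probs] for conf in box_confidences]


def final_detections(scores, threshold):
    """Keep (box index, class index, score) triples that reach the threshold."""
    return [(b, c, s)
            for b, row in enumerate(scores)
            for c, s in enumerate(row)
            if s >= threshold]


box_confidences = [0.5, 0.25]     # B = 2 boxes in this cell (assumed values)
class_probs = [0.5, 0.25, 0.25]   # C = 3 classes (assumed values)

scores = class_specific_scores(box_confidences, class_probs)
kept = final_detections(scores, threshold=0.2)
print(kept)  # [(0, 0, 0.25)]
```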

7 of 20

5. APPLICATION TO OUR PROJECT

5.1 GOTURN IMPLEMENTATION

  • Kotlin implementation
  • TensorFlow for Java
  • Based on a model imported from the GOTURN-Tensorflow implementation: https://github.com/tangyuhao/GOTURN-Tensorflow

  • Pre-trained model: the graph and weights are imported
  • The Java library must be wrapped, and its missing functionality supplemented
  • Fast, reliable workflow that enables flexible data visualisation and prototyping. High performance.

Pre-trained:

  • Frames from videos: 13,082 images of 251 objects
  • Augmented still images: 239,283 from 134,821 images

8 of 20

5. APPLICATION TO OUR PROJECT

5.1 GOTURN IMPLEMENTATION - CONTINUED

9 of 20

5. APPLICATION TO OUR PROJECT

5.1 GOTURN IMPLEMENTATION - CONTINUED

10 of 20

5. APPLICATION TO OUR PROJECT

5.1 GOTURN IMPLEMENTATION - CONTINUED

11 of 20

5. APPLICATION TO OUR PROJECT

5.2 YOLO IMPLEMENTATION

        • Use YOLO9000: trained on 9000 classes

        • Assumes our target object is in the trained classes

        • Adapting the detection algorithm to tracking

Frame t+1 (original) → YOLO9000 object detection → threshold → nearest box to the previous one (frame t predicted; ground truth for t=0) → frame t+1 predicted
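
The detection-to-tracking adaptation can be sketched as one tracking step: threshold the detections of frame t+1, then keep the box nearest to the box of frame t. This is a minimal sketch under stated assumptions: boxes are (x_min, y_min, x_max, y_max) tuples, "nearest" is measured between box centres, and falling back to the previous box when no detection passes the threshold is an assumption, not something stated in the slides.

```python
import math


def centroid(box):
    """Centre of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def track_step(previous_box, detections, threshold):
    """Pick the thresholded detection whose centre is nearest to the previous box.

    detections is a list of (box, score) pairs for frame t+1.
    If nothing passes the threshold, the previous box is kept (assumed fallback).
    """
    px, py = centroid(previous_box)
    candidates = [box for box, score in detections if score >= threshold]
    if not candidates:
        return previous_box

    def distance(box):
        cx, cy = centroid(box)
        return math.hypot(cx - px, cy - py)

    return min(candidates, key=distance)


previous_box = (10, 10, 30, 30)           # frame t (ground truth when t = 0)
detections = [
    ((12, 11, 32, 31), 0.9),              # close to the previous box
    ((100, 100, 120, 120), 0.95),         # confident but far away
    ((11, 10, 31, 30), 0.1),              # closest, but below the threshold
]
new_box = track_step(previous_box, detections, threshold=0.5)
print(new_box)  # (12, 11, 32, 31)
```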

12 of 20

5. APPLICATION TO OUR PROJECT

5.2 YOLO IMPLEMENTATION - CONTINUED

13 of 20

5. APPLICATION TO OUR PROJECT

5.2 YOLO IMPLEMENTATION - CONTINUED

14 of 20

5. APPLICATION TO OUR PROJECT

5.2 YOLO IMPLEMENTATION - CONTINUED

15 of 20

6. RESULTS AND CONCLUSION

SEQUENCE   YOLO Mean Centroid Distance   GOTURN Mean Centroid Distance   GOTURN Mean IOU
Bear       6.5                           19.8 ; 20                       0.47
Octopus    32                            36 ; 131.76                     0.073
Fish       17                            3 ; 2.57                        0.66
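
The slides do not state how the two metrics were computed. The sketch below assumes the usual definitions: centroid distance is the Euclidean distance between box centres, IOU is the intersection area divided by the union area, and both are averaged over the frames of a sequence. Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and the sample boxes are made up for the example.

```python
import math


def centroid_distance(a, b):
    """Euclidean distance between the centres of boxes a and b."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return math.hypot(ax - bx, ay - by)


def iou(a, b):
    """Intersection over union of boxes a and b."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean(values):
    return sum(values) / len(values)


# Made-up predictions and ground truth for a two-frame sequence.
predicted = [(0, 0, 10, 10), (0, 0, 10, 10)]
truth = [(0, 0, 10, 10), (5, 0, 15, 10)]

mean_distance = mean([centroid_distance(p, t) for p, t in zip(predicted, truth)])
mean_iou = mean([iou(p, t) for p, t in zip(predicted, truth)])
print(mean_distance)  # 2.5
```

A lower mean centroid distance and a higher mean IOU both indicate better tracking; IOU additionally penalises boxes of the wrong size.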

METHOD: GOTURN

  PROS
  • Works on any type of target
  • The feed-forward pass is fast

  CONS
  • Sensitive to camera motion and large movements
  • Focuses on the main part of the object

METHOD: YOLO

  PROS
  • Confidence score for each box
  • Great performance on objects from a trained class

  CONS
  • Needs a model trained on the class of the target object
  • Many objects are detected, so the target one must be singled out

20 of 20

6. RESTITUTION CHALLENGE
