1 of 22

Object Detection

With Adam, Shreyas, Rohan, Joon, and Sky

2 of 22

Table of Contents

01 Background

What is Object Detection, and what are its possible applications?

02 Data

To train/test our model, what data was used, and how did we use it?

03 Detection Models

What types of models are at our disposal, and which ones should we use?

04 Classification Models

How can we use more advanced models to make even better predictions?

3 of 22

4 of 22

01

Object Detection

(Background)

5 of 22

What is Object Detection?

6 of 22

7 of 22

8 of 22

02

Data

9 of 22

Our Data

Inputs

Images/Pixels
Normalization

Outputs

Labels/Index
Probability of image classification

[B, C, T]
[0.1, 0.1, 0.8] TRUCK
[0.1, 0.7, 0.2] CAR

Shreyas 1 min 30 sec

For our data, the inputs were images. However, computers cannot process an image directly; they work with its pixels. The intensity value of each pixel is what the computation uses to classify the image. Also, the image is in RGB, so there are three channels, each filled with pixel values running from 0 to 255, which combine into one complete image. These pixel values are then normalized to allow for easier and faster computation. The car shown is an example of an image that would be processed by an algorithm to be classified. We can also see the matrix of pixel values for that image.

Now, moving to the outputs, this is where our image is classified by an algorithm. Each image can be classified with an index: 0 being background, 1 being car, and 2 being truck. The output of our algorithm is the probability of each classification. This means that we get the probability, out of 1, that an image is a background, car, or truck. An example of this is shown at the bottom. The first number is the background probability, next is car, and last is truck. Here, there is a 10 percent chance for background, 10 percent for car, and 80 percent for truck; this image would be classified as a truck. For the next one, there is a 10 percent chance for background, 70 percent for car, and 20 percent for truck; this image would be classified as a car.
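The two steps described above, normalizing pixel values and turning output probabilities into a label, can be sketched in a few lines of plain Python. This is only an illustration: the function names, the `LABELS` list, and the tiny 2x2 channel are assumptions made for the example, not the code used in the project.

```python
# Assumed label order, following the talk: 0 = background, 1 = car, 2 = truck.
LABELS = ["BACKGROUND", "CAR", "TRUCK"]


def normalize(pixels):
    """Scale 8-bit pixel intensities (0-255) into the range 0.0-1.0."""
    return [[value / 255 for value in row] for row in pixels]


def classify(probabilities):
    """Return the label whose probability is the highest."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best_index]


# A made-up 2x2 slice of one color channel.
tiny_channel = [[0, 51], [102, 255]]
print(normalize(tiny_channel))      # [[0.0, 0.2], [0.4, 1.0]]

# The two probability vectors from the slide.
print(classify([0.1, 0.1, 0.8]))    # TRUCK
print(classify([0.1, 0.7, 0.2]))    # CAR
```

Dividing by 255 is one common way to normalize; the talk does not say which scheme was actually used.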

10 of 22

03

Sliding Windows

11 of 22

Sliding Window Algorithm

The sliding window algorithm crops fixed-size regions out of the image and runs each crop through the image classifier described earlier. However, it has some problems…
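As a rough sketch of the idea, the generator below lists the crop coordinates a sliding window would visit. The function name, the `stride` parameter, and the toy image size are assumptions for illustration; the talk does not state the window size or step that was used.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield (left, top, right, bottom) for every window that fits inside the image."""
    for top in range(0, image_height - window_size + 1, stride):
        for left in range(0, image_width - window_size + 1, stride):
            yield (left, top, left + window_size, top + window_size)


# Toy example: an 8x6 image, 4x4 windows, moving 2 pixels at a time.
windows = list(sliding_windows(image_width=8, image_height=6, window_size=4, stride=2))
print(len(windows))   # 6
print(windows[0])     # (0, 0, 4, 4)
print(windows[-1])    # (4, 2, 8, 6)
```

Each coordinate tuple corresponds to one crop, and every crop needs its own classifier call.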

12 of 22

04

Classification Models

13 of 22

Our Methods of Classification

Neural Networks

Convolutional NN

Transfer Learning

14 of 22

Neural Networks

Terms:

Weights: Connection between neurons signifying importance of the input
Activation function (non-linear):

ReLU (Rectified Linear Unit)
Softmax

Goal:

Adjust weights to get optimal accuracy
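To make the terms above concrete, here is a minimal sketch of a single neuron with a ReLU activation, plus a softmax that turns raw scores into probabilities. The function names and the sample numbers are assumptions chosen for the example; a real network would stack many such neurons and learn the weights during training.

```python
import math


def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a ReLU activation."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


print(neuron([1.0, 2.0], [0.5, 0.25], 0.0))   # 1.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.0))   # 0.0 (negative sum is clamped)

probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)  # three probabilities, largest last
```

A larger weight makes its input count for more in the sum, which is the sense in which weights signify importance.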

15 of 22

Our Model

Results

16 of 22

Convolutional Neural Networks

How it works:

A kernel (filter) slides over the input data and converts it into a feature map

Highlights features
Repeated multiple times

Activation function
Pooling: Shrinks the size of the data

Helps prevent overfitting (learning from unimportant data)
Faster calculations
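The kernel and pooling steps above can be sketched in plain Python. This is an illustrative toy, not the project's model: the function names, the 2x2 edge-detecting kernel, and the 5x5 image are all assumptions, and the kernel is applied without flipping, as deep learning libraries typically do.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1) to build a feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A made-up kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)
print(feature_map)            # four rows of [0, 2, 0, 0]
print(max_pool(feature_map))  # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the edge sits, and pooling shrinks the 4x4 map to 2x2 while keeping that response.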

17 of 22

Our Model

Results

18 of 22

Transfer Learning

Transfer learning is the use of “expert” models trained on other tasks to make predictions on a new task.
Some transfer models we can use are:

VGG16
VGG19
ResNet50
DenseNet121

For our predictions, we used the VGG16 model (pictured below).
The VGG16 model reached an accuracy of about 95%, which was almost 10% higher than the CNN model!

19 of 22

05

YOLO

20 of 22

YOLO

“You Only Look Once”

How it works: Divide the image into a grid of cells -> For each cell, predict the bounding box, what object is in the bounding box, and the probability that an object is present -> Label the boxes and the objects inside them
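The first step, dividing the image into a grid, determines which cell is responsible for each object: the cell that contains the center of the object's bounding box. The sketch below shows only that assignment. The function name is hypothetical, and the 448x448 image and 7x7 grid are assumed values for illustration.

```python
def responsible_cell(center_x, center_y, image_width, image_height, grid_size):
    """Return the (row, col) of the grid cell that contains a box center."""
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


# Assumed setup: a 448x448 image split into a 7x7 grid (64-pixel cells).
print(responsible_cell(100, 200, 448, 448, 7))  # (3, 1)
print(responsible_cell(120, 210, 448, 448, 7))  # (3, 1)
print(responsible_cell(400, 50, 448, 448, 7))   # (0, 6)
```

Note that the first two centers are close together and land in the same cell, so that one cell has to account for both objects.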

21 of 22

YOLO

Pros

Unlike previous object detection models, which repurpose classifiers to perform detection on many separate regions, YOLO uses a single neural network that predicts bounding boxes and class probabilities in one pass over the image.
This makes YOLO one of the fastest detection models: its fastest variant can process images at up to 155 frames per second.

Cons

YOLO struggles to detect small objects that are clustered in a group, since each grid cell is constrained to detect a single object.
This also means it struggles to detect objects that are close together.
YOLO also has lower recall and localizes objects less precisely than other models.

https://www.youtube.com/watch?v=69Ii3HjUiTM&t=1s&ab_channel=JosephRedmon

22 of 22

THANKS

Do you have any questions?
