1 of 26

Computer Vision Part I

Arvind Ramaswami

Image Classification with Convolutional Neural Networks

2 of 26

What is Computer Vision?

Enabling programs to gain a high-level understanding of images

Three problems in computer vision:

https://research.fb.com/learning-to-segment/

3 of 26

Where we are in Computer Vision

https://github.com/facebookresearch/Detectron

4 of 26

Shortcomings of Computer Vision

Explaining and Harnessing Adversarial Examples. Goodfellow et al.

Robust Physical-World Attacks on Deep Learning Visual Classification. Eykholt, Evtimov et al.

5 of 26

Adversarial Patch. Brown, Mané et al.

6 of 26

Neural networks

  • Similar to the brain? Not really
  • Function approximator

Universal approximation theorem: A feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of Rⁿ to arbitrary accuracy, given a suitable activation function

https://en.wikipedia.org/wiki/Universal_approximation_theorem
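
One common way to state the theorem (roughly the Cybenko/Hornik form; the exact conditions on the activation σ vary between versions): for any continuous f on a compact set K ⊂ Rⁿ and any ε > 0, there exist N, weights wᵢ ∈ Rⁿ, and scalars αᵢ, bᵢ such that

\[ \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\big(w_i^{\top} x + b_i\big) \Big| < \varepsilon \]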

7 of 26

Fully-connected neural network

-Each layer applies a linear operation followed by a nonlinear activation function (sketched below)

-Successive layers stretch and squash the input space to form decision boundaries for prediction

(Figure: network outputs P(dog) and P(cat))

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
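
A minimal sketch of one such layer in NumPy (the sizes and the tanh activation are illustrative assumptions, not from the slides):

import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    # Linear operation (stretches/rotates space) followed by a nonlinearity (squashes it)
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # 3 input features
W = rng.normal(size=(4, 3))   # 4 hidden units
b = np.zeros(4)
h = dense_layer(x, W, b)      # hidden representation, shape (4,)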

8 of 26

Example of a neural network

(Figure: example network producing class probabilities)

9 of 26

Capacity

  • Many functions can fit the training data, but only a subset of them generalize.
  • Bias: error on the training set itself. Variance: error on data outside the training set, caused by sensitivity to noise in the training data.
  • Too many parameters lead to high variance: the neural network overfits to the training set.

http://scott.fortmann-roe.com/docs/BiasVariance.html

10 of 26

Flaw of a fully-connected neural network

Neural networks are difficult to train on images because they must account for shifts in the image.

Fully-connected neural networks are not spatially invariant.

11 of 26

Convolutions

(Figure: convolution with filter size f = 3 and stride s = 1)
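
A minimal NumPy sketch of this operation, assuming a single-channel image and a 3x3 filter with stride 1 (deep-learning "convolution" is really cross-correlation):

import numpy as np

def conv2d(image, kernel, stride=1):
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the kernel with each image patch and sum
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0          # f = 3, s = 1 (simple averaging filter)
print(conv2d(image, kernel).shape)      # (4, 4)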

12 of 26

Sparse connections

Fully connected layer

Convolutional layer: far fewer connections and computations
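
A rough parameter count illustrating the difference, using illustrative sizes (a 32x32x3 input, a 1000-unit dense layer versus 64 filters of size 3x3):

# Fully connected: every input pixel connects to every unit
fc_params = (32 * 32 * 3) * 1000 + 1000      # 3,073,000 weights + biases

# Convolutional: each filter only sees a 3x3x3 patch, and its weights are shared
conv_params = (3 * 3 * 3) * 64 + 64          # 1,792 weights + biases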

13 of 26

Spatial invariance

The kernel weights are shared among different parts of the image, so moving the image around should produce similar results.

14 of 26

Edge detection

Sobel operator

Prewitt operator

https://en.wikipedia.org/wiki/Sobel_operator

https://en.wikipedia.org/wiki/Prewitt_operator
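
A sketch of edge detection with the Sobel kernels using SciPy (the toy image with a single vertical edge is an illustrative assumption):

import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

def sobel_edges(image):
    # Horizontal and vertical gradients combined into a gradient-magnitude map
    gx = convolve(image, sobel_x)
    gy = convolve(image, sobel_y)
    return np.hypot(gx, gy)

image = np.zeros((8, 8))
image[:, 4:] = 1.0            # image with a vertical edge down the middle
edges = sobel_edges(image)    # large values along the edge, near zero elsewhere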

15 of 26

Pooling

-Reduces spatial dimensionality while preserving structure

-Two common variants: max pooling and average pooling (sketched below)
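
A minimal NumPy sketch of non-overlapping 2x2 max pooling (the patch size is an illustrative choice):

import numpy as np

def max_pool2d(x, size=2):
    # Keep the largest value in each non-overlapping size x size patch
    h, w = x.shape
    h2, w2 = h - h % size, w - w % size       # crop to a multiple of the patch size
    x = x[:h2, :w2]
    return x.reshape(h2 // size, size, w2 // size, size).max(axis=(1, 3))

a = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(a))    # [[ 5.  7.]
                        #  [13. 15.]]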

16 of 26

Spatial invariance in pooling

Shifting the image slightly should retain the max values over patches.

17 of 26

Convolutional neural network

-Combine convolution and pooling layers

-ReLU activation: typically converges faster than saturating activations such as sigmoid or tanh

18 of 26

Training the model

  • Data is split into batches to reduce the memory and computation needed per update.
  • The network processes one batch at a time, updating its weights after each batch.
  • Once all batches have been processed, one epoch is complete. Training is generally repeated for several epochs (see the sketch below).
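
A minimal Keras sketch of batched training; the model, the random toy data, and all hyperparameters here are illustrative assumptions:

import numpy as np
from tensorflow import keras

# Toy data: 100 samples with 8 features each, 10 classes
x_train = np.random.rand(100, 8).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, size=100), 10)

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# batch_size samples are fed per weight update; one epoch = one full pass over the data
model.fit(x_train, y_train, batch_size=32, epochs=5)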

19 of 26

Keras

-High-level API: lets you build a model quickly in Python

-Builds the model as a computation graph

-Runs an optimizer to minimize the loss function

20 of 26

Coding example in Keras of a basic CNN
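
A minimal sketch of such a model, assuming 28x28 grayscale inputs and 10 output classes (layer sizes are illustrative choices):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # convolution
    layers.MaxPooling2D(pool_size=2),                      # pooling
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                  # fully-connected head
    layers.Dense(10, activation="softmax"),                # class probabilities
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()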

21 of 26

Useful models

AlexNet

VGG19
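
Keras ships pretrained ImageNet weights for VGG19 (AlexNet is not bundled with keras.applications); a sketch of loading it as a frozen feature extractor:

from tensorflow.keras.applications import VGG19

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False    # freeze the pretrained convolutional base for transfer learning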

22 of 26

Visualization of features

Zeiler and Fergus, Visualizing and Understanding Convolutional Networks

Extraction of higher-level features in later layers of convolutional neural networks

23 of 26

24 of 26

25 of 26

Larger CNNs

Residual neural networks: skip connections let networks grow much deeper while still generalizing well to new data

Lower risk of vanishing gradients
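
A minimal sketch of one residual block in Keras (the filter count and input shape are illustrative; real ResNets also use batch normalization and projection shortcuts):

from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    # The block learns a residual F(x); its output is F(x) + x, so gradients
    # can flow through the identity shortcut even in very deep networks
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)

inputs = keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs, 64)   # works when filters == input channels
model = keras.Model(inputs, outputs)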

26 of 26

Questions?

Contact: aramaswami32@gatech.edu