Computer Vision Part I
Arvind Ramaswami
Image Classification with Convolutional Neural Networks
What is Computer Vision?
Enabling programs to gain a high-level understanding of images
Three problems in computer vision: classification, detection, and segmentation
https://research.fb.com/learning-to-segment/
Where we are in Computer Vision
https://github.com/facebookresearch/Detectron
Shortcomings of Computer Vision
Explaining and Harnessing Adversarial Examples. Goodfellow et al.
Robust Physical-World Attacks on Deep Learning Visual Classification. Eykholt, Evtimov et al.
Adversarial Patch. Brown, Mané et al.
Neural networks
Universal approximation theorem: a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of R^n, given a suitable nonconstant activation function
https://en.wikipedia.org/wiki/Universal_approximation_theorem
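One standard formal statement (a sketch of the Cybenko/Hornik form, not on the original slide: f is a continuous target function on a compact set K in R^n, and sigma is a suitable nonconstant activation):

\[
\forall \varepsilon > 0 \;\; \exists\, N,\; v_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n :\quad
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} v_i\, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon
\]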
Fully-connected neural network
-Each layer: contains a linear operation and a nonlinear activation function
-Involves stretching and squashing of space to create decision boundaries for prediction
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Example of a neural network
Outputs a probability for each class, e.g., P(dog) and P(cat)
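A minimal NumPy sketch of such a network (an illustration, not from the slides; all sizes and weight values are arbitrary placeholders): each layer is a linear map plus a nonlinearity, and a softmax output turns scores into probabilities like P(dog) and P(cat).

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)  # hidden layer: 4 inputs -> 5 neurons
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)  # output layer: 2 classes (dog, cat)

def relu(z):
    return np.maximum(z, 0.0)        # nonlinear activation ("squashing")

def softmax(z):
    e = np.exp(z - z.max())          # numerically stable exponentials
    return e / e.sum()

x = rng.normal(size=4)               # toy input features
h = relu(W1 @ x + b1)                # linear operation + activation
p = softmax(W2 @ h + b2)             # [P(dog), P(cat)]
print(p, p.sum())                    # probabilities sum to 1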
Capacity
-A model's ability to fit complex functions: too little capacity underfits, too much overfits (the bias-variance tradeoff)
http://scott.fortmann-roe.com/docs/BiasVariance.html
Flaw of a fully-connected neural network
Fully-connected networks are difficult to train on images because they must separately learn to account for shifted inputs.
Fully-connected neural networks are not spatially invariant.
Convolutions
Example: filter size f = 3, stride s = 1
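A minimal NumPy sketch (an illustration, not from the slides) of a single-channel 2D convolution with filter size f = 3 and stride s = 1; note that deep learning libraries actually compute cross-correlation (no kernel flip), as here.

import numpy as np

def conv2d(image, kernel, stride=1):
    # "Valid" cross-correlation: slide the kernel over the image
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel)   # dot product with the patch
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter
print(conv2d(image, kernel).shape)                # (3, 3) = ((5-3)//1 + 1, ...)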
Sparse connections
Fully connected layer
Convolutional layer: far fewer computations and parameters
Spatial invariance
The kernel weights are shared across different parts of the image, so shifting the image produces similar feature responses.
Edge detection
Sobel operator
Prewitt operator
https://en.wikipedia.org/wiki/Sobel_operator
https://en.wikipedia.org/wiki/Prewitt_operator
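A short sketch of applying the horizontal-gradient kernels of both operators to a toy image with a vertical edge (the image and values are illustrative; scipy.signal.convolve2d does the filtering):

import numpy as np
from scipy.signal import convolve2d

# Horizontal-gradient kernels (they respond to vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # dark left half, bright right half

for name, k in [('Sobel', sobel_x), ('Prewitt', prewitt_x)]:
    g = convolve2d(image, k, mode='same')
    print(name, np.abs(g).max())        # strongest response at the edge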
Pooling
-Reduces dimensionality while maintaining the structure
-Two common variants: max pooling and average pooling
Spatial invariance in pooling
Shifting the image slightly should retain the max values over patches.
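A small sketch of 2x2 max pooling with stride 2 on a toy array (values are arbitrary), plus a one-pixel shift to illustrate the approximate invariance:

import numpy as np

def max_pool(x, size=2, stride=2):
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i*stride:i*stride+size,
                          j*stride:j*stride+size].max()
    return out

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
print(max_pool(x))                       # 4x4 -> 2x2: structure kept, size reduced
print(max_pool(np.roll(x, 1, axis=1)))   # shifted input: largely similar maxima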
Convolutional neural network
-Involves convolution and pooling layers
-ReLU: leads to faster convergence than saturating activations such as sigmoid and tanh
Training the model
Keras
-High-level API: allows you to create a model quickly in Python
-Builds the model using a graph
-Runs an optimization function to minimize the loss
Coding example in Keras of a basic CNN
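A minimal sketch of such a model in Keras (tf.keras), assuming 32x32 RGB inputs and 10 classes (CIFAR-10-style data); x_train and y_train are placeholders for your own arrays:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),   # per-class probabilities
])

# Keras builds the computation graph, then an optimizer minimizes the loss
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)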
Useful models
AlexNet
VGG19
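For example, a pretrained VGG19 can be loaded from Keras Applications in a few lines (a sketch; 'elephant.jpg' is a hypothetical filename):

import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG19(weights='imagenet')           # downloads pretrained ImageNet weights

img = image.load_img('elephant.jpg', target_size=(224, 224))  # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 (class id, name, probability)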
Visualization of features
https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
http://cs231n.github.io/understanding-cnn/
http://kvfrans.com/visualizing-features-from-a-convolutional-neural-network/
Zeiler and Fergus, Visualizing and Understanding Convolutional Networks
Extraction of higher-level features in later layers of convolutional neural networks
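One simple way to inspect this in Keras is to build a second model that returns an inner layer's activations (a sketch: `model` is assumed to be a trained CNN such as the one above, 'conv2d' stands in for whatever layer name model.summary() reports, and x_batch is a placeholder batch of images):

import tensorflow as tf
import matplotlib.pyplot as plt

layer_output = model.get_layer('conv2d').output       # layer name from model.summary()
feature_extractor = tf.keras.Model(inputs=model.input, outputs=layer_output)

activations = feature_extractor.predict(x_batch)      # x_batch: sample images
plt.imshow(activations[0, :, :, 0], cmap='viridis')   # first feature map of first image
plt.show()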
Larger CNNs
Residual neural networks: identity skip connections let networks grow much deeper without degrading, and they generalize well to new data
Smaller chance of vanishing gradients, since gradients can flow directly through the skip connections
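A minimal sketch of a residual block in tf.keras (an illustration of the idea from He et al., not code from the slides; it assumes the input already has `filters` channels so the identity shortcut matches):

from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                       # identity skip connection
    y = layers.Conv2D(filters, (3, 3), padding='same')(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.Add()([shortcut, y])                    # output = F(x) + x
    return layers.ReLU()(y)                            # gradient flows through the shortcut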
Questions?
Contact: aramaswami32@gatech.edu