Convolutional neural networks
CS5670: Computer Vision
Slides from Fei-Fei Li, Justin Johnson, Serena Yeung
Readings
Slide credit: Fei-Fei Li & Andrej Karpathy & Serena Yeung
(Cornell University)
Hinton and Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.
Fast-forward to today: ConvNets* are everywhere
* and other recent architectures, like Transformers
Fast-forward to today: ConvNets are everywhere
Self-driving cars (video courtesy Tesla)
Cloud TPU v4 Pods
Text-to-image
“A computer vision class watching a cool lecture, crayon drawing”
“A computer vision class watching a cool lecture, album cover”
What is a ConvNet?
Motivation – Feature Learning
Life Before Deep Learning
Input pixels → extract hand-crafted features → concatenate into a vector x → linear classifier (SVM) → answer
Figure: Karpathy 2016
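As a rough sketch of the hand-crafted stage, the Python below computes one classic kind of feature, a coarse per-channel color histogram, and concatenates the histograms into a single vector x. The bin count, the function names, and the tiny image are illustrative assumptions, not the features used in the figure; in the pre-deep-learning pipeline, x would then be fed to a linear classifier such as an SVM.

```python
def channel_histogram(image, channel, bins=4, max_value=256):
    """Count pixels of one color channel falling into `bins` equal-width bins."""
    hist = [0] * bins
    for row in image:
        for pixel in row:
            hist[pixel[channel] * bins // max_value] += 1
    return hist


def hand_crafted_features(image):
    """Concatenate the R, G and B histograms into one feature vector x."""
    x = []
    for c in range(3):
        x.extend(channel_histogram(image, c))
    return x


# Hypothetical 2 x 2 RGB image with 8-bit values, stored as image[y][x] = (r, g, b).
image = [[(255, 0, 0), (250, 10, 5)],
         [(0, 0, 255), (128, 128, 128)]]
x = hand_crafted_features(image)
print(x)  # [1, 0, 1, 2, 3, 0, 1, 0, 2, 0, 1, 1]
```

The feature designer, not the learning algorithm, decides what x contains; the classifier only sees these 12 numbers.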
Why use features? Why not pixels?
Slide from Karpathy 2016
Q: What would be a very hard set of classes for a linear classifier to distinguish (assuming x = pixels)?
Goal: linearly separable classes
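To make the goal concrete, here is a toy Python example, assuming made-up 2D points: class 0 sits near the origin and class 1 surrounds it, so no line in raw (x, y) coordinates separates them. After a hand-chosen feature transform to polar coordinates (r, θ), a single linear rule does. The data, weights, and function names are hypothetical.

```python
import math

# Hypothetical toy data: class 0 lies inside the unit circle, class 1 outside it.
# The three class-1 points enclose the class-0 points, so no straight line in
# raw (x, y) coordinates can separate the two classes.
points = [(0.2, 0.1), (-0.3, 0.4), (0.0, -0.5),   # class 0 (inner)
          (1.5, 0.2), (-1.2, 1.1), (0.3, -1.8)]   # class 1 (outer)
labels = [0, 0, 0, 1, 1, 1]


def polar_features(x, y):
    """Map Cartesian (x, y) to polar (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)


def linear_classify(features, weights, bias):
    """Predict class 1 if w . f + b > 0, else class 0."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


# In polar coordinates one linear rule suffices: 1 * r + 0 * theta - 1 > 0.
weights, bias = (1.0, 0.0), -1.0
predictions = [linear_classify(polar_features(x, y), weights, bias) for x, y in points]
print(predictions)  # [0, 0, 0, 1, 1, 1]
```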
Aside: Image Features
Image Features: Motivation
Last layer of many CNNs is a linear classifier
Input pixels → answer
Perform everything with a big neural network, trained end-to-end
This piece is just a linear classifier
Key: perform enough processing so that by the time you get to the end of the network, the classes are linearly separable
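A minimal sketch of that last step, assuming the network has already produced a feature vector h: the final layer computes one score per class as s = W h + b and predicts the highest-scoring class. The numbers for h, W, and b are invented for illustration; in a trained CNN they come from the earlier layers and from learning.

```python
def linear_scores(W, b, h):
    """Return class scores s = W h + b (one row of W per class)."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]


def predict(W, b, h):
    """Predicted class = index of the highest score."""
    scores = linear_scores(W, b, h)
    return max(range(len(scores)), key=scores.__getitem__)


h = [0.5, 2.0, 0.0]            # hypothetical features from the earlier layers
W = [[1.0, 0.0, 0.0],          # class 0
     [0.0, 1.0, 0.0],          # class 1
     [0.0, 0.0, 1.0]]          # class 2
b = [0.0, 0.0, 0.5]

print(linear_scores(W, b, h))  # [0.5, 2.0, 0.5]
print(predict(W, b, h))        # 1
```

If the earlier layers have done their job, h for different classes falls on different sides of these linear decision boundaries.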
(GoogLeNet)
Visualizing AlexNet in 2D with t-SNE
[Donahue et al., “DeCAF: A Deep Convolutional …”, arXiv 2013]
(2D visualization using t-SNE)
Linear Classifier
Convolutional neural networks
Layer types:
Number of weights: 5 x 5 x 3 + 1 = 76 (+1 for bias term)
(vs. 32 x 32 x 3 = 3072 weights for a single neuron of a fully-connected layer on a 32 x 32 x 3 image)
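The count above follows from sliding a single F x F x C filter over the image and taking a dot product (plus bias) at each location. The pure-Python sketch below assumes stride 1, no padding, and a 32 x 32 x 3 input, which yields a 28 x 28 activation map. The function names are hypothetical, and like common deep learning libraries it computes a cross-correlation (the filter is not flipped).

```python
def conv_single_filter(image, weights, bias):
    """Slide one F x F x C filter over an H x W x C image (stride 1, no padding).

    image[y][x][c] and weights[dy][dx][c] are nested lists; the result is one
    (H - F + 1) x (W - F + 1) activation map.
    """
    H, W, C = len(image), len(image[0]), len(image[0][0])
    F = len(weights)
    out = []
    for y in range(H - F + 1):
        row = []
        for x in range(W - F + 1):
            # Dot product between the filter and the F x F x C patch at (y, x), plus bias.
            s = bias
            for dy in range(F):
                for dx in range(F):
                    for c in range(C):
                        s += weights[dy][dx][c] * image[y + dy][x + dx][c]
            row.append(s)
        out.append(row)
    return out


def num_filter_params(F, C):
    """F * F * C weights plus one bias."""
    return F * F * C + 1


# Hypothetical all-ones 32 x 32 x 3 image and all-ones 5 x 5 x 3 filter.
image = [[[1.0] * 3 for _ in range(32)] for _ in range(32)]
weights = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]
amap = conv_single_filter(image, weights, bias=1.0)
print(len(amap), len(amap[0]))   # 28 28
print(amap[0][0])                # 76.0 (75 products of 1 * 1, plus bias 1)
print(num_filter_params(5, 3))   # 76
print(32 * 32 * 3)               # 3072 weights for one fully-connected neuron
```

The same 76 numbers are reused at all 28 x 28 positions, which is where the savings over a fully-connected neuron come from.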
Adapted from Fei-Fei Li & Andrej Karpathy & Serena Yeung
(total number of parameters to learn: 6 x (75 + 1) = 456)
How many parameters are in a convolutional layer consisting of three 3x3x1 filters (each with a bias term)?
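The layer-level count is the per-filter count times the number of filters. A small helper, with a hypothetical name, reproduces the 456 figure and applies the same rule to the poll question:

```python
def conv_layer_params(num_filters, F_h, F_w, C_in, bias=True):
    """Each filter has F_h * F_w * C_in weights (+1 bias); the layer has num_filters of them."""
    per_filter = F_h * F_w * C_in + (1 if bias else 0)
    return num_filters * per_filter


print(conv_layer_params(6, 5, 5, 3))  # 456 = 6 x (75 + 1)
print(conv_layer_params(3, 3, 3, 1))  # 30 = 3 x (9 + 1)
```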
“1x1 convolutions”
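A 1x1 convolution looks at a single pixel at a time, so it acts like a small fully-connected layer applied independently at every spatial location: it mixes the C_in channels into K outputs and leaves height and width unchanged. The sketch below assumes an H x W x C_in nested-list layout and uses made-up weights (one filter sums the channels, the other takes channel 0 minus channel 2).

```python
def conv1x1(volume, weights, biases):
    """1x1 convolution: at every (y, x), map the C_in channel values to K outputs.

    volume[y][x][c] has shape H x W x C_in, weights[k][c] has shape K x C_in,
    and the result has shape H x W x K.
    """
    return [[[sum(w_kc * v_c for w_kc, v_c in zip(w_k, pixel)) + b_k
              for w_k, b_k in zip(weights, biases)]
             for pixel in row]
            for row in volume]


# Hypothetical 2 x 2 x 3 input mapped to 2 x 2 x 2 by K = 2 filters.
volume = [[[3.0, 6.0, 9.0], [0.0, 0.0, 3.0]],
          [[1.0, 1.0, 1.0], [2.0, 4.0, 6.0]]]
weights = [[1.0, 1.0, 1.0],    # filter 0: sum of channels
           [1.0, 0.0, -1.0]]   # filter 1: channel 0 minus channel 2
biases = [0.0, 0.0]
out = conv1x1(volume, weights, biases)
print(out)  # [[[18.0, -6.0], [3.0, -3.0]], [[3.0, 0.0], [12.0, -4.0]]]
```

Choosing K smaller than C_in makes a 1x1 convolution a cheap way to reduce depth before a more expensive layer.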
Convolutional layer—properties
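One common way to summarize these properties is as a function of the input size W1 x H1 x D1 and four hyperparameters: the number of filters K, the filter size F, the stride S, and the zero-padding P (these names follow a widespread convention and are assumptions here). The output size is W2 = (W1 - F + 2P)/S + 1 (likewise for H2), the depth is D2 = K, and the layer has F·F·D1·K weights plus K biases. This sketch raises an error when the division is not exact, whereas many libraries instead round down.

```python
def conv_output_shape(W1, H1, D1, K, F, S, P):
    """Return the output volume (W2, H2, D2) and parameter count of a conv layer
    with K filters of size F x F x D1, stride S and zero-padding P."""
    if F > W1 + 2 * P or F > H1 + 2 * P:
        raise ValueError("filter is larger than the padded input")
    if (W1 - F + 2 * P) % S or (H1 - F + 2 * P) % S:
        raise ValueError("filter does not tile the padded input evenly with this stride")
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    num_params = (F * F * D1) * K + K   # weights + one bias per filter
    return (W2, H2, D2), num_params


print(conv_output_shape(32, 32, 3, K=10, F=5, S=1, P=2))      # ((32, 32, 10), 760)
print(conv_output_shape(227, 227, 3, K=96, F=11, S=4, P=0))   # ((55, 55, 96), 34944)
```

With P = (F - 1)/2 and S = 1, the spatial size is preserved, as in the first call.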
AlexNet (2012)
Output: 1,000-D vector (probabilities over 1,000 ImageNet categories)
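AlexNet's final layer produces 1,000 raw scores, and a softmax, p_i = exp(s_i) / Σ_j exp(s_j), turns them into the probabilities mentioned above. A minimal sketch with three hypothetical scores in place of 1,000:

```python
import math


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]                     # hypothetical scores for 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs])          # [0.659, 0.242, 0.099]
```

Subtracting the maximum score does not change the result, since it cancels between numerator and denominator, but it keeps exp from overflowing for large scores.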
Elgendy, Deep Learning for Vision Systems, https://livebook.manning.com/book/grokking-deep-learning-for-computer-vision/chapter-5/v-3/
~60M parameters in total
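As a sanity check on that total, the sketch below adds up weights and biases layer by layer. It assumes the commonly cited single-tower AlexNet configuration (96, 256, 384, 384 and 256 filters of sizes 11, 5, 3, 3 and 3, then fully-connected layers of 4096, 4096 and 1000 units, with a 6 x 6 x 256 input to the first fully-connected layer) and ignores the two-GPU grouping of the original paper, which lowers the total to about 61M. Helper names are hypothetical.

```python
def conv_params(K, F, C_in):
    """K filters of size F x F x C_in, each with one bias."""
    return K * (F * F * C_in + 1)


def fc_params(n_out, n_in):
    """Fully-connected layer: n_out x n_in weights plus n_out biases."""
    return n_out * (n_in + 1)


# Assumed AlexNet layer configuration (single-tower variant, no grouping).
layers = {
    "conv1": conv_params(96, 11, 3),
    "conv2": conv_params(256, 5, 96),
    "conv3": conv_params(384, 3, 256),
    "conv4": conv_params(384, 3, 384),
    "conv5": conv_params(256, 3, 384),
    "fc6": fc_params(4096, 6 * 6 * 256),   # flattened 6 x 6 x 256 volume after conv5 + pooling
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(1000, 4096),          # 1,000 ImageNet classes
}
total = sum(layers.values())
print(f"{total:,}")  # 62,378,344
```

Nearly all of the parameters sit in the fully-connected layers: fc6 alone accounts for about 37.8M.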
Big picture
Data is key—enter ImageNet
Figure: Performance improvements on ILSVRC, from the pre-deep learning era to the deep learning era (starting with AlexNet)
Image credit: Zaid Alyafeai, Lahouari Ghouti
Questions?