Training Deep Learning Models for Vision
Day 2
Convolutional Neural Networks
Convolutional layers
Disadvantages of fully connected layers: the number of parameters grows with the image size, and the spatial structure of the image is ignored.
Use translation equivariance: apply the same weights at every position in the image.
Convolutional layers
Input: Image
Output: Transform of the image
[Figure: a small weight filter slides over the input image to produce the 1st hidden layer]
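As a sketch of what the sliding-window figure computes (deep-learning "convolution" is really cross-correlation; `conv2d` is a hypothetical pure-Python helper, not a library call):

```python
def conv2d(img, kernel):
    # Slide the kernel over the image ("valid" positions only) and take a
    # weighted sum at each location; the same weights are reused everywhere,
    # which is what gives translation equivariance.
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[di][dj] * img[i + di][j + dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

A 3×3 input with a 2×2 filter yields a 2×2 output, i.e. a transformed version of the image.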
Convolutional layers
#parameters: filter_size * input_channels * output_channels
Independent of image size
[Figure: weight filter slides over the input image to produce the 1st hidden layer]
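The parameter count above can be checked with a tiny helper (a sketch; `conv2d_params` is a hypothetical name, and `filter_size` from the slide is expanded here into the filter's height × width):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    # One weight per (filter position, input channel, output channel);
    # the count does not depend on the image's height or width.
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)
```

For example, a 3×3 convolution from 3 input channels to 64 output channels has 3·3·3·64 + 64 = 1792 parameters, whether the image is 32×32 or 1024×1024.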
Building a network
[Figure: convolutional layers are stacked — input image → weights → hidden layer → weights → next layer]
Pooling layers
Reduce layer size by a "simple" operation via sliding window
Usually maxpooling
[Figure: some layer → down-sampling → next layer]
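Max pooling can be sketched in a few lines (a hypothetical stdlib-only helper, assuming a square window with stride equal to the window size, as is typical):

```python
def maxpool2d(img, k=2):
    # Slide a k x k window with stride k and keep only the maximum value in
    # each window, halving (for k=2) both spatial dimensions.
    h, w = len(img), len(img[0])
    return [[max(img[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, w - k + 1, k)]
            for i in range(0, h - k + 1, k)]
```

A 4×4 input becomes a 2×2 output; no learned parameters are involved, which is why pooling is a "simple" operation.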
Architectures for Vision
Architectures for vision
Adaptations for training deeper models
ImageNet
Image classification dataset
Important benchmark in computer vision
https://www.researchgate.net/figure/Performance-of-different-approaches-in-ImageNet-2015-competition_fig2_309392322
AlexNet
VGG
http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture09.pdf
ResNet
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
More architectures
Advanced training
Learning Rate
https://www.jeremyjordan.me/nn-learning-rate/
Choosing a Learning Rate
MLP exercise with different learning rates
lr=1e-4
Choosing a Learning Rate
Learning rate schedulers:
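One common scheduler is step decay, which can be sketched as follows (a hypothetical helper mirroring what e.g. PyTorch's `StepLR` does, with assumed default values):

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    # Step decay: multiply the learning rate by gamma every step_size epochs,
    # so training takes progressively smaller steps as it converges.
    return base_lr * gamma ** (epoch // step_size)
```

With `base_lr=0.1`, epochs 0–29 train at 0.1, epochs 30–59 at 0.01, and so on.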
Optimizers
https://ruder.io/optimizing-gradient-descent/
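The simplest refinement over plain SGD, momentum, can be sketched as a single update step (a hypothetical scalar-valued helper; `lr` and `mu` defaults are illustrative, not prescribed by the slides):

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    # Plain SGD moves against the gradient; momentum additionally carries an
    # exponentially decaying average of past updates, which damps oscillations
    # and speeds up progress along consistent directions.
    v = mu * velocity - lr * grad
    return w + v, v
```

Adaptive optimizers such as Adam extend this idea with per-parameter learning rates; see the linked overview for details.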
Data Augmentation
Increasing the amount of labeled data is expensive!
Idea: use transformations to get different but valid data points
Original Image
Flipped horizontally
Color jitter
Rotate 90 deg
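Two of the transformations above can be sketched on a plain 2D array (hypothetical stdlib-only helpers; in practice a library such as torchvision provides these as composable random transforms):

```python
def hflip(img):
    # Horizontal flip: mirror each row left-to-right.
    return [row[::-1] for row in img]

def rot90(img):
    # Rotate 90 degrees counter-clockwise: transpose, then reverse the rows.
    return [list(row) for row in zip(*img)][::-1]
```

Applied to labeled images, each transform yields a new, equally valid training example at no labeling cost.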
Normalization layers
At initialization, activations are roughly zero-mean with unit variance
-> this is lost with parameter updates
Normalization layers keep activations normalized during training
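The core of such a layer can be sketched as follows (a minimal sketch of the normalization step only; real batch norm also has learnable scale/shift parameters and running statistics, omitted here):

```python
def batch_norm(batch, eps=1e-5):
    # Normalize a batch of scalar activations to zero mean and unit variance;
    # eps avoids division by zero when the batch variance is tiny.
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]
```

Applying this between layers restores the normalized statistics that initialization provided, regardless of how the parameters have drifted.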
More techniques
Best Practices
Exercises
Some critique of yesterday's exercises
The gist of the exercises
"Classical" vision pipeline: fixed (convolutional) filters + classifier
Convolutional filters can be expressed as convolutional layers -> learnable
Today: learn it end to end via a CNN!
DL architectures on CIFAR
Send a link to your notebook on gitter (or adlcourse2020@gmail.com)