Training Deep Learning Models for Vision

Day 1


Course Overview


Organisation

  • Monday - Thursday
    • 09:30 - 12:30 Introductory lecture, overview of exercises, work on exercises
    • 13:30 - 17:30 Work on exercises, we will be in the room for questions and discussions
  • Friday: full day to work on exercises
  • 2 CP for the course; solve the exercises to be eligible


Organisation

  • Hygiene Rules:
    • Please don’t come if you have typical symptoms or had contact with someone who tested positive
    • Use the dedicated entrance / exit
    • Stay at your assigned seat during the course
    • Wear a mask in the building
    • Fill in the data collection sheet on day 1 (we will keep a list of attendance for the other days)

Exercises

  • Monday, Tuesday, Wednesday
    • Prepared exercises in Jupyter notebooks
    • We recommend using Google Colab (offers a free GPU)
    • We will be available in the room for questions and discussions
  • Thursday, Friday
    • Larger exercise without prepared notebooks
    • We will provide some ideas for exercises
    • You are welcome to work on your own computer vision related problem!


Content

  • Machine Learning in Computer Vision
  • Convolutional Neural Networks for image classification
  • Image-to-image networks for segmentation and denoising
  • Object detection


Machine Learning Recap


Science at large

  • Observe a phenomenon
  • Construct a model
    • E.g. a physical law
  • Make predictions from the model


Machine Learning

  • Observe a phenomenon
  • Construct a model automatically
  • Make predictions from the model

xkcd.com/1838


Learning techniques

  • Supervised Learning
    • Input data and correct predictions (“ground-truth”) available
  • Unsupervised Learning
    • Only input data available
  • Reinforcement Learning
    • Input data and sparse rewards available


Supervised learning

Two main tasks

  • Classification
    • Predict one out of a discrete set of classes
  • Regression
    • Predict continuous value


Supervised learning

  • Define a model with parameters
    • From tens of parameters to millions, from a linear model to a CNN
  • Estimate parameters from the observations
    • Training set consists of input data and correct predictions (labels)
    • Common notation: X, Y
  • Check the model predictions
    • On different observations: test set

Train and test set

  • Don’t train on test data
  • Don’t validate on train data

  • Separate training / validation sets (see the sketch below)
    • The full model has additional hyperparameters (e.g. for post-processing) -> these need data that was not used for training
    • Validate model performance during training to find the best set of parameters -> need a validation set
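A minimal splitting sketch in PyTorch (the data here is random stand-in data; in practice X and Y come from your dataset):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in data: 1000 samples with 3 features and binary labels
X = torch.randn(1000, 3)
Y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, Y)

# e.g. 80% training / 20% validation; the test set stays held out separately
train_set, val_set = random_split(dataset, [800, 200])
```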


Shallow Models

  • Nearest Neighbor Classifier, Random Forest, ...
  • Logistic Regression


Shallow Models

Logistic Regression

  • Given input vector x, predict probability for classes in y
  • Input is multiplied by weight vector w
  • Non-linearity to determine class probability:
    • Sigmoid (binary classification)
    • Softmax (multiple classes)


Shallow Models

Logistic Regression

  • p(y=1) = sigmoid(x * w + b)

Very simple artificial neural network!

[Diagram: inputs x1, x2, x3 are multiplied by weights w1, w2, w3, summed (+), and passed through a sigmoid (sig) to give the prediction p(y)]
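As a minimal sketch of this network in PyTorch (the input values and sizes are made up for illustration):

```python
import torch

# p(y=1) = sigmoid(x . w + b) for a 3-dimensional input
x = torch.tensor([0.2, 1.5, -0.3])      # input vector (x1, x2, x3)
w = torch.tensor([0.1, 0.1, 0.1])       # weight vector (w1, w2, w3)
b = torch.tensor(0.0)                   # bias
p = torch.sigmoid(torch.dot(x, w) + b)  # predicted probability p(y=1)
```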


Going deeper

  • Single layer network
    • Add one “hidden” layer
  • Multi-layer network
    • Add multiple hidden layers
    • Apply a non-linearity to each layer output

[Diagram: input layer, hidden layers, prediction]
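A multi-layer network of this kind can be sketched with torch.nn.Sequential (the layer sizes here are arbitrary):

```python
import torch.nn as nn

# Two hidden layers, a non-linearity (ReLU) after each hidden layer,
# and a sigmoid on the output for a binary prediction.
model = nn.Sequential(
    nn.Linear(3, 16),   # input: 3 features -> 16 hidden units
    nn.ReLU(),
    nn.Linear(16, 16),  # second hidden layer
    nn.ReLU(),
    nn.Linear(16, 1),   # output layer: a single logit
    nn.Sigmoid(),       # squash to a probability
)
```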


Training

  • Minimize loss between prediction y and target t
  • Classification: cross entropy, e.g. for the binary case L = -[t * log(y) + (1 - t) * log(1 - y)] (assumes y and t in [0, 1]; see the snippet below)
  • Minimize via (stochastic) gradient descent
  • Update parameters based on the derivative of the loss function w.r.t. the parameters
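A quick check of the binary cross entropy in PyTorch (the values are made up):

```python
import torch
import torch.nn.functional as F

y = torch.tensor([0.8])  # predicted probability, in [0, 1]
t = torch.tensor([1.0])  # target, in [0, 1]
# binary cross entropy: -(t * log(y) + (1 - t) * log(1 - y))
loss = F.binary_cross_entropy(y, t)
```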


Gradient Descent: Simplest Example

  • Addition of two inputs:
    y = w1 * x1 + w2 * x2
  • L2 loss:
    L = ½ (y(w) - t)^2

  • Derivatives:
    dL/dw1 = (y(w) - t) * x1
    dL/dw2 = (y(w) - t) * x2

[Diagram: computation graph with inputs x1, x2, weights w1, w2, output y, and loss L]

  • Random weight initialization: w1 = 0.1, w2 = 0.1
  • Forward pass for a data point from the training set:
    x1 = 0, x2 = 1, t = 0.5  ->  y = 0.1 * 0 + 0.1 * 1 = 0.1
  • Compute the loss: ½ * (0.1 - 0.5)^2 = 0.08
  • Compute the gradients:
    dL/dw1 = (0.1 - 0.5) * 0 = 0
    dL/dw2 = (0.1 - 0.5) * 1 = -0.4
  • Update the weights with learning rate lr = 0.1:
    w_new = w - lr * dL/dw  ->  w1 = 0.1, w2 = 0.1 - 0.1 * (-0.4) = 0.14
  • Forward pass for the same point: y = 0.14, loss = ½ * (0.14 - 0.5)^2 ≈ 0.065
    The loss decreased (reproduced in code below).
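The same worked example can be reproduced with automatic differentiation in PyTorch (introduced later in this deck); a minimal sketch:

```python
import torch

# y = w1 * x1 + w2 * x2, L = 0.5 * (y - t)^2
w = torch.tensor([0.1, 0.1], requires_grad=True)
x = torch.tensor([0.0, 1.0])
t = torch.tensor(0.5)

y = (w * x).sum()          # forward pass: y = 0.1
loss = 0.5 * (y - t) ** 2  # loss = 0.08
loss.backward()            # backward pass: w.grad = [0.0, -0.4]

with torch.no_grad():
    w -= 0.1 * w.grad      # update with lr = 0.1: w = [0.1, 0.14]
```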


Gradient Descent

  • Batch gradient descent
    • Compute gradients for all training samples, then update
  • Stochastic gradient descent
    • Compute gradients for a single sample and update
  • Mini-batch stochastic gradient descent
    • Compute gradients for several samples (a mini-batch) and update (sketched below)
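In PyTorch the three variants differ only in the batch size handed to the data loader; a sketch with stand-in data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset of 100 samples
dataset = TensorDataset(torch.randn(100, 3), torch.randn(100, 1))

full_batch = DataLoader(dataset, batch_size=len(dataset))  # batch gradient descent
single     = DataLoader(dataset, batch_size=1)             # stochastic gradient descent
mini_batch = DataLoader(dataset, batch_size=16)            # mini-batch SGD
```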


Computer Vision Recap

Why is vision difficult?

Summer project: program a computer to use a camera to identify objects
(Marvin Minsky, 1966)

[Images: several cats in very different poses, lighting, and appearance, plus a similar-looking dog; labeled Cat, Cat, Cat, Dog]


Computer Vision tasks

[Images of a cat and a dog illustrating three tasks:]

  • Image classification (Cat / Dog)
  • Object detection
  • Semantic segmentation


Example: Semantic Segmentation

Rule based

Cells vs background segmentation [Image: Gerlich Lab]

  • Is the pixel white? -> Cell!
  • Are all neighbors white? -> Cell!
  • Is the pixel near an edge? -> Not Cell!
  • Is the texture smooth? -> ...

Learn it instead!

Cells vs background segmentation [Image: Gerlich Lab]

Instead of hand-crafting more and more rules like the ones above, learn the decision from labeled examples.

Shallow learning

[Diagram: training data feeds a black-box model]

The workflow, step by step (sketched in code below):

  • Compute features
  • Add (sparse) training data
  • Train classifier
  • Predict all pixels or new data
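A hypothetical sketch of this workflow with scikit-learn and scipy (random data stands in for a real microscopy image; the feature choices are illustrative):

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

image = np.random.rand(128, 128)           # stand-in for a microscopy image
labels = np.zeros((128, 128), dtype=int)   # 0 = unlabeled, 1 = cell, 2 = background
labels[40:50, 40:50] = 1                   # a few sparse annotations
labels[0:10, 0:10] = 2

# 1. Compute per-pixel features: raw intensity, smoothing, edges
features = np.stack([
    image,
    ndimage.gaussian_filter(image, sigma=2),
    ndimage.gaussian_gradient_magnitude(image, sigma=2),
], axis=-1).reshape(-1, 3)

# 2. + 3. Train the classifier on the sparsely labeled pixels only
mask = labels.reshape(-1) > 0
rf = RandomForestClassifier(n_estimators=50)
rf.fit(features[mask], labels.reshape(-1)[mask])

# 4. Predict all pixels
segmentation = rf.predict(features).reshape(image.shape)
```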


Deep Learning

Convolutional Neural Network (here: a U-Net, covered later in the course)


Deep Learning Libraries and PyTorch Basics


Why do we need a deep learning framework?

  • Gradient descent toy example
  • Real applications:
    • Millions / billions of parameters
    • Different model architectures

[Diagram: the toy computation graph with x1, x2, w1, w2, y, L]


Why do we need a deep learning framework?

  • Specifying update rules (gradients) by hand is not feasible
  • Need fast application of the model (forward pass) and of the gradients (backward pass)

Deep learning frameworks provide:

  • Auto-diff: you only need to specify the model’s forward pass
  • Efficient implementation on the GPU


Most popular libraries

  • TensorFlow - low-level framework (Google)
  • Keras - high-level framework based on TensorFlow (Google)
  • PyTorch - low-level framework (Facebook)


Torch tensor

n-dimensional array, like numpy.ndarray, but supports auto-differentiation
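A small sketch of both aspects:

```python
import torch

a = torch.ones(2, 3)                            # 2x3 tensor, numpy-style ops
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = (a * b).sum()                               # broadcasting, like numpy
c.backward()                                    # auto-differentiation
print(b.grad)                                   # dc/db = [2., 2., 2.]
```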


Torch nn

Functionality for neural networks: layers, activations, loss functions


Torch nn.Module

Base class for defining custom models
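A minimal custom model, here the logistic regression from earlier (the layer size is illustrative):

```python
import torch
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)  # holds w and b

    def forward(self, x):
        # p(y=1) = sigmoid(x . w + b)
        return torch.sigmoid(self.linear(x))

model = LogisticRegression(n_features=3)
p = model(torch.randn(4, 3))  # batch of 4 inputs -> 4 probabilities
```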


Torch Basics

  • torch.nn.functional: functional interface for torch.nn
  • torch.optim: stochastic gradient descent and other optimizers
  • torch.utils.data: datasets and data loaders (see the training-loop sketch below)

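A sketch that ties these pieces together into a training loop (random stand-in data; the hyperparameters are arbitrary):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

X, Y = torch.randn(100, 3), torch.randint(0, 2, (100, 1)).float()
loader = DataLoader(TensorDataset(X, Y), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()        # clear old gradients
        loss = loss_fn(model(x), y)  # forward pass
        loss.backward()              # backward pass (autograd)
        optimizer.step()             # parameter update
```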


Implementing from scratch ...

Torch is low level: a lot of (basic) functionality still needs to be implemented yourself.

Educational: we implement everything in torch to understand how it works.

In practice: it is a good idea to use a suitable library on top of torch:

  • torchvision, torchaudio, torchtext
  • ignite - like Keras for torch
  • and many more


Exercises


Machine learning and computer vision with PyTorch

Dataset: CIFAR10

  • 60,000 32x32 pixel images
  • Image classification with 10 classes

https://www.cs.toronto.edu/~kriz/cifar.html
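Loading the dataset with torchvision might look like this (the root path is a placeholder):

```python
import torchvision
from torchvision import transforms

# Download CIFAR10 and convert images to normalized tensors
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
image, label = train_set[0]  # a 3x32x32 tensor and an integer class label
```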


First steps in machine learning for vision

https://github.com/constantinpape/training-deep-learning-models-for-vison