Training Deep Learning Models for Vision

Day 1


Course Overview


Organisation

  • Monday - Thursday
    • 09:30 - 12:30 Introductory lecture, overview of exercises, work on exercises
    • 13:30 - 17:30 Work on exercises, we will be in the room for questions and discussions
  • Friday: full day to work on exercises
  • 2 CP for the course; solve the exercises to be eligible


Organisation

  • Hygiene Rules:
    • Please don’t come if you have typical symptoms or had contact with someone who tested positive
    • Use the dedicated entrance / exit
    • Stay at your assigned seat during the course
    • Wear a mask in the building
    • Fill in the data collection sheet on day 1 (we will keep a list of attendance for the other days)

Exercises

  • Monday, Tuesday, Wednesday
    • Prepared exercises in Jupyter notebooks
    • We recommend using Google Colab (offers a free GPU)
    • We will be available in the room for questions and discussions
  • Thursday, Friday
    • Larger exercise without prepared notebooks
    • We will provide some ideas for exercises
    • You are welcome to work on your own computer vision related problem!


Content

  • Machine Learning in Computer Vision
  • Convolutional Neural Networks for image classification
  • Image-to-image networks for segmentation and denoising
  • Object detection


Machine Learning Recap


Science at large

  • Observe a phenomenon
  • Construct a model
    • E.g. a physical law
  • Make predictions from the model


Machine Learning

  • Observe a phenomenon
  • Construct a model automatically
  • Make predictions from the model

xkcd.com/1838


Learning techniques

  • Supervised Learning
    • Input data and correct predictions (“ground-truth”) available
  • Unsupervised Learning
    • Only input data available
  • Reinforcement Learning
    • Input data and sparse rewards available


Supervised learning

Two main tasks

  • Classification
    • Predict one out of a discrete set of classes
  • Regression
    • Predict continuous value


Supervised learning

  • Define a model with parameters
    • From tens of parameters to millions, from a linear model to a CNN
  • Estimate parameters from the observations
    • Training set consists of input data and correct predictions (labels)
    • Common notation: X, Y
  • Check the model predictions
    • On different observations: test set

Train and test set

  • Don’t train on test data
  • Don’t validate on train data

  • Separate training / validation sets (see the sketch below)
    • The full model has additional hyperparameters (e.g. for post-processing) -> these need data that was not used for training
    • Validate model performance during training to find the best set of parameters -> need a validation set
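A minimal splitting sketch in PyTorch (the data here is random stand-in data; in practice X and Y come from your dataset):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in data: 1000 samples with 3 features and binary labels
X = torch.randn(1000, 3)
Y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, Y)

# e.g. 80% training / 20% validation; the test set stays held out separately
train_set, val_set = random_split(dataset, [800, 200])
```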


Shallow Models

  • Nearest Neighbor Classifier, Random Forest, ...
  • Logistic Regression


Shallow Models

Logistic Regression

  • Given input vector x, predict probability for classes in y
  • Input is multiplied by weight vector w
  • Non-linearity to determine class probability:
    • Sigmoid (binary classification)
    • Softmax (multiple classes)


Shallow Models

Logistic Regression

  • p(y=1) = sigmoid(x * w + b)

Very simple artificial neural network!

[Diagram: inputs x1, x2, x3 are multiplied by weights w1, w2, w3, summed (+), and passed through a sigmoid (sig) to give the prediction p(y)]
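As a minimal sketch of this network in PyTorch (the input values and sizes are made up for illustration):

```python
import torch

# p(y=1) = sigmoid(x . w + b) for a 3-dimensional input
x = torch.tensor([0.2, 1.5, -0.3])      # input vector (x1, x2, x3)
w = torch.tensor([0.1, 0.1, 0.1])       # weight vector (w1, w2, w3)
b = torch.tensor(0.0)                   # bias
p = torch.sigmoid(torch.dot(x, w) + b)  # predicted probability p(y=1)
```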


Going deeper

  • Single layer network
    • Add one “hidden” layer
  • Multi-layer network
    • Add multiple hidden layers
    • Apply a non-linearity to each layer output

[Diagram: input layer, hidden layers, prediction]
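A multi-layer network of this kind can be sketched with torch.nn.Sequential (the layer sizes here are arbitrary):

```python
import torch.nn as nn

# Two hidden layers, a non-linearity (ReLU) after each hidden layer,
# and a sigmoid on the output for a binary prediction.
model = nn.Sequential(
    nn.Linear(3, 16),   # input: 3 features -> 16 hidden units
    nn.ReLU(),
    nn.Linear(16, 16),  # second hidden layer
    nn.ReLU(),
    nn.Linear(16, 1),   # output layer: a single logit
    nn.Sigmoid(),       # squash to a probability
)
```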


Training

  • Minimize loss between prediction y and target t
  • Classification: cross entropy, e.g. for the binary case L = -[t * log(y) + (1 - t) * log(1 - y)] (assumes y and t in [0, 1]; see the snippet below)
  • Minimize via (stochastic) gradient descent
  • Update parameters based on the derivative of the loss function w.r.t. the parameters
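A quick check of the binary cross entropy in PyTorch (the values are made up):

```python
import torch
import torch.nn.functional as F

y = torch.tensor([0.8])  # predicted probability, in [0, 1]
t = torch.tensor([1.0])  # target, in [0, 1]
# binary cross entropy: -(t * log(y) + (1 - t) * log(1 - y))
loss = F.binary_cross_entropy(y, t)
```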


Gradient Descent: Simplest Example

  • Addition of two inputs:
    y = w1 * x1 + w2 * x2
  • L2 loss:
    L = ½ (y(w) - t)^2

  • Derivatives:
    dL/dw1 = (y(w) - t) * x1
    dL/dw2 = (y(w) - t) * x2

[Diagram: computation graph with inputs x1, x2, weights w1, w2, output y, and loss L]

  • Random weight initialization: w1 = 0.1, w2 = 0.1
  • Forward pass for a data point from the training set:
    x1 = 0, x2 = 1, t = 0.5  ->  y = 0.1 * 0 + 0.1 * 1 = 0.1
  • Compute the loss: ½ * (0.1 - 0.5)^2 = 0.08
  • Compute the gradients:
    dL/dw1 = (0.1 - 0.5) * 0 = 0
    dL/dw2 = (0.1 - 0.5) * 1 = -0.4
  • Update the weights with learning rate lr = 0.1:
    w_new = w - lr * dL/dw  ->  w1 = 0.1, w2 = 0.1 - 0.1 * (-0.4) = 0.14
  • Forward pass for the same point: y = 0.14, loss = ½ * (0.14 - 0.5)^2 ≈ 0.065
    The loss decreased (reproduced in code below).
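The same worked example can be reproduced with automatic differentiation in PyTorch (introduced later in this deck); a minimal sketch:

```python
import torch

# y = w1 * x1 + w2 * x2, L = 0.5 * (y - t)^2
w = torch.tensor([0.1, 0.1], requires_grad=True)
x = torch.tensor([0.0, 1.0])
t = torch.tensor(0.5)

y = (w * x).sum()          # forward pass: y = 0.1
loss = 0.5 * (y - t) ** 2  # loss = 0.08
loss.backward()            # backward pass: w.grad = [0.0, -0.4]

with torch.no_grad():
    w -= 0.1 * w.grad      # update with lr = 0.1: w = [0.1, 0.14]
```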


Gradient Descent

  • Batch gradient descent
    • Compute gradients for all training samples, then update
  • Stochastic gradient descent
    • Compute gradients for a single sample and update
  • Mini-batch stochastic gradient descent
    • Compute gradients for several samples (a mini-batch) and update (sketched below)
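In PyTorch the three variants differ only in the batch size handed to the data loader; a sketch with stand-in data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset of 100 samples
dataset = TensorDataset(torch.randn(100, 3), torch.randn(100, 1))

full_batch = DataLoader(dataset, batch_size=len(dataset))  # batch gradient descent
single     = DataLoader(dataset, batch_size=1)             # stochastic gradient descent
mini_batch = DataLoader(dataset, batch_size=16)            # mini-batch SGD
```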


Computer Vision Recap

Why is vision difficult?

Summer project: program a computer to use a camera to identify objects
(Marvin Minsky, 1966)

[Images: several cats in very different poses, lighting, and appearance, plus a similar-looking dog; labeled Cat, Cat, Cat, Dog]


Computer Vision tasks

[Images of a cat and a dog illustrating three tasks:]

  • Image classification (Cat / Dog)
  • Object detection
  • Semantic segmentation


Example: Semantic Segmentation

Rule based

Cells vs background segmentation [Image: Gerlich Lab]

  • Is the pixel white? -> Cell!
  • Are all neighbors white? -> Cell!
  • Is the pixel near an edge? -> Not Cell!
  • Is the texture smooth? -> ...

Learn it instead!

Cells vs background segmentation [Image: Gerlich Lab]

Instead of hand-crafting more and more rules like the ones above, learn the decision from labeled examples.

Shallow learning

[Diagram: training data feeds a black-box model]

The workflow, step by step (sketched in code below):

  • Compute features
  • Add (sparse) training data
  • Train classifier
  • Predict all pixels or new data
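A hypothetical sketch of this workflow with scikit-learn and scipy (random data stands in for a real microscopy image; the feature choices are illustrative):

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

image = np.random.rand(128, 128)           # stand-in for a microscopy image
labels = np.zeros((128, 128), dtype=int)   # 0 = unlabeled, 1 = cell, 2 = background
labels[40:50, 40:50] = 1                   # a few sparse annotations
labels[0:10, 0:10] = 2

# 1. Compute per-pixel features: raw intensity, smoothing, edges
features = np.stack([
    image,
    ndimage.gaussian_filter(image, sigma=2),
    ndimage.gaussian_gradient_magnitude(image, sigma=2),
], axis=-1).reshape(-1, 3)

# 2. + 3. Train the classifier on the sparsely labeled pixels only
mask = labels.reshape(-1) > 0
rf = RandomForestClassifier(n_estimators=50)
rf.fit(features[mask], labels.reshape(-1)[mask])

# 4. Predict all pixels
segmentation = rf.predict(features).reshape(image.shape)
```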


Deep Learning

Convolutional Neural Network (here: a U-Net, covered later in the course)


Deep Learning Libraries and PyTorch Basics


Why do we need a deep learning framework?

  • Gradient descent toy example
  • Real applications:
    • Millions / billions of parameters
    • Different model architectures

[Diagram: the toy computation graph with x1, x2, w1, w2, y, L]


Why do we need a deep learning framework?

  • Specifying update rules (gradients) by hand is not feasible
  • Need fast application of the model (forward pass) and of the gradients (backward pass)

Deep learning frameworks provide:

  • Auto-diff: you only need to specify the model’s forward pass
  • Efficient implementation on the GPU


Most popular libraries

  • TensorFlow - low-level framework (Google)
  • Keras - high-level framework based on TensorFlow (Google)
  • PyTorch - low-level framework (Facebook)


Torch tensor

n-dimensional array, like numpy.ndarray, but supports auto-differentiation
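A small sketch of both aspects:

```python
import torch

a = torch.ones(2, 3)                            # 2x3 tensor, numpy-style ops
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = (a * b).sum()                               # broadcasting, like numpy
c.backward()                                    # auto-differentiation
print(b.grad)                                   # dc/db = [2., 2., 2.]
```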


Torch nn

Functionality for neural networks: layers, activations, loss functions


Torch nn.Module

Base class for defining custom models
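A minimal custom model, here the logistic regression from earlier (the layer size is illustrative):

```python
import torch
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)  # holds w and b

    def forward(self, x):
        # p(y=1) = sigmoid(x . w + b)
        return torch.sigmoid(self.linear(x))

model = LogisticRegression(n_features=3)
p = model(torch.randn(4, 3))  # batch of 4 inputs -> 4 probabilities
```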


Torch Basics

  • torch.nn.functional: functional interface for torch.nn
  • torch.optim: stochastic gradient descent and other optimizers
  • torch.utils.data: datasets and data loaders (see the training-loop sketch below)

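A sketch that ties these pieces together into a training loop (random stand-in data; the hyperparameters are arbitrary):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

X, Y = torch.randn(100, 3), torch.randint(0, 2, (100, 1)).float()
loader = DataLoader(TensorDataset(X, Y), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()        # clear old gradients
        loss = loss_fn(model(x), y)  # forward pass
        loss.backward()              # backward pass (autograd)
        optimizer.step()             # parameter update
```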


Implementing from scratch ...

Torch is low level: a lot of (basic) functionality still needs to be implemented yourself.

Educational: we implement everything in torch to understand how it works.

In practice: it is a good idea to use a suitable library on top of torch:

  • torchvision, torchaudio, torchtext
  • ignite - like Keras for torch
  • and many more


Exercises


Machine learning and computer vision with PyTorch

Dataset: CIFAR10

  • 60,000 32x32 pixel images
  • Image classification with 10 classes

https://www.cs.toronto.edu/~kriz/cifar.html
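Loading the dataset with torchvision might look like this (the root path is a placeholder):

```python
import torchvision
from torchvision import transforms

# Download CIFAR10 and convert images to normalized tensors
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
image, label = train_set[0]  # a 3x32x32 tensor and an integer class label
```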


First steps in machine learning for vision

https://github.com/constantinpape/training-deep-learning-models-for-vison