
Deep Learning

Abir Das

Assistant Professor

Computer Science and Engineering Department

Indian Institute of Technology Kharagpur

http://cse.iitkgp.ac.in/~adas/


Agenda


  • Introduce the concepts of
    • Regularization
    • Dropout
    • Batch normalization

  • Resource: Goodfellow Book (Chapter 7)



Regularization

  • Machine learning is concerned more with performance on the test data than on the training data

  • According to the Goodfellow book, Chapter 7: "Many strategies used in machine learning are explicitly designed to reduce the test error, possibly at the expense of increased training error. These strategies are known collectively as regularization."

  • The book also defines regularization as "any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error"



Regularization Strategies

  • Adding restrictions on parameter values

  • Adding constraints that are designed to encode specific kinds of prior knowledge

  • Use of ensemble methods/dropout

  • Dataset augmentation

  • In practical deep learning scenarios, we almost always find that the best-fitting model (in the sense of minimizing generalization error) is a large model that has been regularized appropriately



Parameter Norm Penalties
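The penalized objective can be stated as follows (a minimal restatement of the standard formulation in Goodfellow, Ch. 7; the slide's own notation may differ). A norm penalty Ω(θ), weighted by a hyperparameter α ≥ 0, is added to the training objective J:

    \tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \, \Omega(\theta)

Setting α = 0 recovers the unregularized objective; larger values of α penalize large parameters more heavily.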



Parameter Norm Penalties
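Following Goodfellow, Ch. 7, the penalty is typically applied only to the weights of each affine transformation and not to the biases: a bias requires less data to fit accurately, and regularizing it can introduce significant underfitting. For a layer with weights w and bias b,

    \Omega(\theta) = \Omega(w), \qquad b \text{ left unregularized}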

 



L-2 Parameter Norm Regularization
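A minimal summary of the standard L2 (weight decay) analysis, consistent with Goodfellow, Ch. 7; the slide's notation may differ. With Ω(w) = ½‖w‖₂², the regularized objective and its gradient are

    \tilde{J}(w; X, y) = \frac{\alpha}{2} w^\top w + J(w; X, y), \qquad \nabla_w \tilde{J} = \alpha w + \nabla_w J(w; X, y)

so a single gradient step with learning rate ε becomes

    w \leftarrow w - \epsilon (\alpha w + \nabla_w J) = (1 - \epsilon \alpha)\, w - \epsilon \nabla_w J

i.e., the weight vector is multiplicatively shrunk toward zero before the usual gradient update.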

 



Regularization Strategies: Dataset Augmentation

  • One way to get better generalization is to train on more data.
  • But under most circumstances, data is limited. Furthermore, labelling is an extremely tedious task.
  • Dataset Augmentation provides a cheap and easy way to increase the amount of training data.

Examples: color jitter, horizontal flip, and many, many more
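A minimal numpy sketch of two such augmentations (horizontal flip and brightness jitter); the function name and jitter range are illustrative, not from the slides:

    import numpy as np

    def augment(img, rng):
        # img: H x W x C array with values in [0, 1]
        if rng.random() < 0.5:
            img = img[:, ::-1, :]          # random horizontal flip
        scale = rng.uniform(0.8, 1.2)      # random brightness jitter
        return np.clip(img * scale, 0.0, 1.0)

    rng = np.random.default_rng(0)
    augmented = augment(np.ones((32, 32, 3)) * 0.5, rng)  # dummy image

Each call produces a slightly different training example from the same underlying image, which is what makes augmentation an inexpensive source of extra data.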



Regularization Strategies: Dropout

  • Bagging is a technique for reducing generalization error by combining several models (Breiman, 1994)
  • Bagging: (1) Train k different models on k different subsets of the training data, each constructed to have the same number of examples as the original dataset by sampling from it with replacement
  • Bagging: (2) Have all of the models vote on the output for test examples
  • Dropout is a computationally inexpensive but powerful extension of bagging
  • Training with dropout consists of training the sub-networks that can be formed by removing non-output units from an underlying base network

Images courtesy: Goodfellow et al., Karpathy et al.
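A minimal numpy sketch of the two bagging steps above; model training itself is elided, and all names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 5                       # dataset size, number of models

    # Step (1): k bootstrap index sets, each of size n,
    # drawn with replacement from the original dataset.
    subsets = [rng.integers(0, n, size=n) for _ in range(k)]

    # Step (2): average (vote over) the k trained models' predictions.
    def bagged_predict(models, x):
        return np.mean([m(x) for m in models], axis=0)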


Dropout – At Test Time

  • Ideally, the randomness would have to be integrated out.
  • Monte Carlo approximation: do many forward passes, each with a different random set of neurons dropped out, then average all the predictions.
  • An approximation to this approximation:
    • Can this be done in a single forward pass?
    • Can this be done without dropping out any neuron during the forward pass at test time?
    • 1st way: get the output of the network at test time with all neurons on, and scale it down by multiplying by the probability with which neurons are kept during training.
    • 2nd way: during training, drop out neurons with probability p and scale the surviving activations up by 1/(1 − p); at test time, simply take the output with all the neurons on (see the numpy sketch below).
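A minimal numpy sketch of both schemes applied to an activation vector h; p_drop and all names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    p_drop = 0.5                                   # probability of dropping a unit

    # 1st way: plain dropout during training ...
    def train_forward(h):
        return h * (rng.random(h.shape) >= p_drop)

    # ... then scale down by the keep probability at test time.
    def test_forward_scaled(h):
        return h * (1.0 - p_drop)

    # 2nd way ("inverted" dropout): scale up by 1/(1 - p_drop) during training ...
    def train_forward_inverted(h):
        mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
        return h * mask

    # ... so the test-time forward pass needs no change at all.
    def test_forward(h):
        return h

In both cases, the expected activation seen at test time matches the expected activation during training.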

Images courtesy: Goodfellow et al., Karpathy et al.


Dropout (Fun Intuition)



Batch Normalization
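As a minimal statement of the transform (following Ioffe and Szegedy, 2015; the slide's own notation may differ): for a mini-batch B = {x_1, ..., x_m} of activations of one unit, batch normalization computes

    \mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

    \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta

where γ and β are learned per-unit parameters and ε is a small constant added for numerical stability.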


Implementing BatchNorm

Given some intermediate values x^{(1)}, ..., x^{(m)} in the NN, normalize them with the batch statistics and apply the learned scale and shift as above:

    \hat{x}^{(i)} = \frac{x^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y^{(i)} = \gamma \, \hat{x}^{(i)} + \beta

If

    \gamma = \sqrt{\sigma_B^2 + \epsilon}, \qquad \beta = \mu_B

then

    y^{(i)} = x^{(i)}

i.e., batch normalization can represent the identity transform, so normalization never reduces what the layer can express.
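A minimal numpy sketch of a batch-norm forward pass, including the running statistics commonly used at test time; all names and the momentum value are illustrative:

    import numpy as np

    def batchnorm_forward(x, gamma, beta, run_mu, run_var,
                          train=True, eps=1e-5, momentum=0.9):
        # x: (batch, features); gamma, beta: (features,)
        if train:
            mu, var = x.mean(axis=0), x.var(axis=0)    # batch statistics
            run_mu = momentum * run_mu + (1 - momentum) * mu     # track running
            run_var = momentum * run_var + (1 - momentum) * var  # statistics
        else:
            mu, var = run_mu, run_var                  # stored statistics at test time
        x_hat = (x - mu) / np.sqrt(var + eps)          # normalize
        return gamma * x_hat + beta, run_mu, run_var   # scale and shift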


Effect of Batch Normalization on Biases
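The standard observation here, stated as a minimal sketch (notation assumed, not from the slide): when batch normalization immediately follows an affine layer z = Wx + b, the bias b is removed by the mean subtraction,

    \mu_B = \frac{1}{m} \sum_i (W x^{(i)} + b) \quad \Rightarrow \quad z^{(i)} - \mu_B = W x^{(i)} - \frac{1}{m} \sum_j W x^{(j)}

so b cancels out and the learned shift β takes over its role. In practice, the bias is therefore often omitted from layers that are immediately followed by batch normalization.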


Thank you
