1 of 35

Deep Learning (DEEP-0001)

2 – Supervised Learning

2 of 35

Supervised learning

  • Overview
  • Notation
    • Model
    • Loss function
    • Training
    • Testing
  • 1D Linear regression example
    • Model
    • Loss function
    • Training
    • Testing
  • Where are we going?

3 of 35

Supervised learning overview

  • Supervised learning model = mapping from one or more inputs to one or more outputs
  • Model is a family of equations
  • Computing the outputs from the inputs = inference
  • Model also includes parameters
  • Parameters affect outcome of equation
  • Training a model = finding parameters that predict outputs “well” from inputs for a training dataset of input/output pairs

4 of 35

Supervised learning

  • Overview
  • Notation
    • Model
    • Loss function
    • Training
    • Testing
  • 1D Linear regression example
    • Model
    • Loss function
    • Training
    • Testing
  • Where are we going?

6 of 35

Notation:

  • Input: x

  • Output: y

  • Model: y = f[x]

Variables are always Roman letters:

  • Normal = scalar
  • Bold = vector
  • Capital Bold = matrix

Functions always use square brackets:

  • Normal = returns scalar
  • Bold = returns vector
  • Capital Bold = returns matrix

7 of 35

Notation example:

  • Input: x (a vector, e.g. one row of a table)

  • Output: y

  • Model: y = f[x]

Structured or tabular data

8 of 35

Model

  • Parameters: φ

  • Model: y = f[x, φ]

Parameters always Greek letters

9 of 35

Loss function

  • Training dataset of I pairs of input/output examples: {xᵢ, yᵢ}, i = 1…I

  • Loss function or cost function measures how badly the model describes this data:

    L[φ, {xᵢ, yᵢ}]

or for short:

    L[φ]

Returns a scalar that is smaller when the model maps inputs to outputs better

10 of 35

Training

  • Loss function: L[φ]

  • Find the parameters that minimize the loss:

    φ̂ = argmin_φ L[φ]

The result φ̂ is the choice of parameters for which the loss is smallest

11 of 35

Testing

  • To test the model, run it on a separate test dataset of input/output pairs

  • See how well it generalizes to new data
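As a sketch (the line model, squared-error loss, and data values here are illustrative assumptions, not fixed by the slides), testing just means evaluating the loss on pairs that were never used during training:

```python
import numpy as np

def loss(phi, x, y):
    """Scalar loss: sum of squared prediction errors for a line model."""
    return np.sum((phi[0] + phi[1] * x - y) ** 2)

phi = np.array([1.0, 2.0])        # parameters found during training

x_test = np.array([4.0, 5.0])     # held-out pairs, never seen in training
y_test = np.array([9.0, 11.0])    # consistent with y = 1 + 2x

print(loss(phi, x_test, y_test))  # low test loss = good generalization
```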

12 of 35

Supervised learning

  • Overview
  • Notation
    • Model
    • Loss function
    • Training
    • Testing
  • 1D Linear regression example
    • Model
    • Loss function
    • Training
    • Testing
  • Where are we going?

13 of 35

Example: 1D Linear regression model

  • Model: y = f[x, φ] = φ₀ + φ₁x

  • Parameters: φ = [φ₀, φ₁]

    φ₀: y-offset
    φ₁: slope
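The line model above can be sketched in a few lines of NumPy (the function and variable names are illustrative assumptions):

```python
import numpy as np

def f(x, phi):
    """1D linear regression model: y = phi0 + phi1 * x."""
    phi0, phi1 = phi           # phi0 = y-offset, phi1 = slope
    return phi0 + phi1 * x

# Each choice of parameters selects one line from the family.
phi = np.array([1.0, 2.0])     # y-offset 1, slope 2
print(f(0.0, phi))             # → 1.0: the line crosses the y-axis at phi0
```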

17 of 35

Example: 1D Linear regression training data

18 of 35

Example: 1D Linear regression loss function

Loss function:

    L[φ] = Σᵢ (φ₀ + φ₁xᵢ − yᵢ)²    (sum over the I training pairs)

“Least squares loss function”: the sum of squared deviations between the model predictions and the observed outputs
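The least squares loss can be computed as follows (the training values below are made up for illustration):

```python
import numpy as np

def loss(phi, x, y):
    """Least squares loss: sum of squared deviations between the
    model predictions phi[0] + phi[1]*x and the observed outputs y."""
    residuals = phi[0] + phi[1] * x - y
    return np.sum(residuals ** 2)

# A made-up training set of I = 3 input/output pairs (illustrative only).
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([1.0, 3.0, 5.0])

print(loss(np.array([1.0, 2.0]), x_train, y_train))  # 0.0: perfect fit
print(loss(np.array([0.0, 0.0]), x_train, y_train))  # 35.0: a worse fit
```

A smaller value means this choice of parameters maps the training inputs to the training outputs better.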

27 of 35

Example: 1D Linear regression training

31 of 35

Example: 1D Linear regression training

Start with an initial guess for the parameters and repeatedly adjust them to reduce the loss. This technique is known as gradient descent
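A minimal sketch of gradient descent for this least squares problem (the step size and data values are illustrative assumptions; the gradient expressions follow from differentiating the loss with respect to the y-offset and slope):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])    # roughly y = 1 + 2x, made-up data

phi = np.zeros(2)                     # initial guess: offset 0, slope 0
alpha = 0.02                          # step size (learning rate)

for step in range(2000):
    residuals = phi[0] + phi[1] * x - y
    # Gradient of the least squares loss with respect to each parameter.
    grad = np.array([2 * np.sum(residuals),
                     2 * np.sum(residuals * x)])
    phi -= alpha * grad               # move downhill in the loss

print(phi)   # close to the generating offset ~1 and slope ~2
```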

32 of 35

Possible objections

  • But you can fit the line model in closed form!
    • Yes – but we won’t be able to do this for more complex models
  • But we could exhaustively try every slope and intercept combo!
    • Yes – but we won’t be able to do this when there are a million parameters
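For reference, the closed-form fit mentioned in the first objection can be sketched via the normal equations (again with illustrative data):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 1 + 2x

# Design matrix: a column of ones (for the offset) and a column of x (slope).
A = np.stack([np.ones_like(x), x], axis=1)

# Closed-form least squares solution; equivalent to solving the
# normal equations (A^T A) phi = A^T y.
phi, *_ = np.linalg.lstsq(A, y, rcond=None)
print(phi)   # offset 1, slope 2
```

This works for a line, but no such closed form exists for the deep networks considered later, which is why gradient descent is the tool that scales.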

33 of 35

Example: 1D Linear regression testing

  • Test with a different set of paired input/output data
    • Measure performance
    • The degree to which performance matches that on the training data = generalization
  • Might not generalize well because:
    • Model too simple
    • Model too complex
      • fits to statistical peculiarities of the training data
      • this is known as overfitting
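A sketch of the overfitting point above (the data, noise level, and polynomial degrees are illustrative assumptions): a model that is too complex can fit the training points essentially exactly yet generalize poorly.

```python
import numpy as np

# Illustrative data: a noisy line for training, the clean line for testing.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 8)
y_train = 1 + 2 * x_train + 0.1 * rng.standard_normal(8)
x_test = np.linspace(0.05, 0.95, 8)
y_test = 1 + 2 * x_test

results = {}
for degree in (1, 7):                 # simple line vs. very flexible polynomial
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.sum((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.sum((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_err, test_err)
    print(degree, train_err, test_err)

# The degree-7 polynomial interpolates all 8 training points (train error
# near zero) by fitting the noise itself; the simple line typically has
# the lower test error. Fitting the noise is what overfitting means.
```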

34 of 35

Supervised learning

  • Overview
  • Notation
    • Model
    • Loss function
    • Training
    • Testing
  • 1D Linear regression example
    • Model
    • Loss function
    • Training
    • Testing
  • Where are we going?

35 of 35

Where are we going?

  • Shallow neural networks (a more flexible model)
  • Deep neural networks (an even more flexible model)
  • Loss functions (where did least squares come from?)
  • How to train neural networks (gradient descent and variants)
  • How to measure performance of neural networks (generalization)