
Linear Regression

How does a computer draw a line that best fits the data?

By Eric Honer

Recap of Machine Learning

Types of Learning

  • Supervised Learning: learns from labeled data
  • Unsupervised Learning: finds patterns in unlabeled data
  • Linear Regression is a supervised method

Classification vs Regression

Classification predicts a discrete class, while regression predicts a continuous numeric value

What is Linear Regression?

Linear Regression is a form of supervised learning and is used for regression

What is Linear Regression?

We want to find a line that fits the data

(remember line of best fit?)

Linear meaning:

  • A straight line (not bendy)

Which of these is a Line of Best Fit?

[Figure: three candidate lines through the same data, labeled Line 1, Line 2, and Line 3]

How Good is our Line?

Find the Residual

A residual is the difference between a point's actual value and the value the line predicts: residual = actual - predicted.

[Figures omitted: three practice plots, each showing a data point and a fitted line]

Answers: 0, 10, and -7
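
To make that concrete, here is a minimal sketch in Python (the line and the data points are made up, not the ones pictured on the slides):

# residual = actual y-value minus the y-value the line predicts
def predict(x, m, b):
    return m * x + b

m, b = 2.0, 1.0                    # hypothetical line: y = 2x + 1
points = [(1, 3), (2, 8), (3, 5)]  # hypothetical (x, y) data
for x, y in points:
    print(y - predict(x, m, b))    # prints 0.0, 3.0, -2.0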

What to do with Residuals

Residuals: [0, -5, 10, -3, 1, 7, -2]

  • Option 1: Sum the residuals
    • BAD! Positive and negative residuals cancel each other out
  • Option 2: Sum the squared residuals (SSR)
    • GOOD! Squaring makes every residual non-negative, so nothing cancels

SSR = 0² + (-5)² + 10² + (-3)² + 1² + 7² + (-2)² = 188

Mean Squared Error (MSE)

  • SSR grows as you add data: more points -> larger total error, even for an equally good line
  • MSE answers: on average, how large is the squared error per point?

MSE = 188/7 ≈ 26.86
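
Both quantities are one line of Python each. A minimal sketch using the residuals from the slide:

residuals = [0, -5, 10, -3, 1, 7, -2]
ssr = sum(r ** 2 for r in residuals)  # sum of squared residuals
mse = ssr / len(residuals)            # average squared error per point
print(ssr)  # 188
print(mse)  # 26.857142...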

How do we get here?

[Figure: the three candidate lines again, labeled Line 1, Line 2, and Line 3]

Equation of a Line

y = mx + b

y: output value

m: slope

x: input value

b: y-intercept

Our goal: find the values of m and b that minimize the loss over all our (x, y) data points
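
Written as code, the loss (here, MSE) is a function of m and b, and we want the pair that makes it smallest. A minimal sketch with made-up data:

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # these happen to lie exactly on y = 2x + 1

def mse(m, b):
    return sum(((m * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(mse(2.0, 1.0))  # 0.0 -> the perfect line has zero loss
print(mse(1.0, 0.0))  # 13.5 -> a worse line has higher loss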

Gradient Descent

Let’s pick a random y-intercept

[Figure: a sequence of plots of loss vs. y-intercept; each step moves the y-intercept downhill toward the minimum of the loss curve]

Gradient Descent

  • We also need to find the right slope value
    • We find it the same way, and at the same time, as the y-intercept

Gradient Descent

Let’s pick a random slope

[Figure: a sequence of plots of loss vs. slope; again, each step moves the slope downhill toward the minimum]

What does the SSR look like when we adjust one parameter?

The better our line of best fit, the lower the loss

[Figure: SSR plotted against a single parameter; it forms a bowl shape with its minimum at the best parameter value]
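
You can see that bowl by sweeping one parameter in code. A minimal sketch with made-up data and the slope held fixed:

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # exactly y = 2x + 1
m = 2.0            # hold the slope fixed, vary only the y-intercept

for b in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    ssr = sum(((m * x + b) - y) ** 2 for x, y in zip(xs, ys))
    print(b, ssr)  # SSR is smallest (0.0) at b = 1.0 and rises on both sides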

Gradient Descent

At the minimum of the loss, the gradient is zero. So once the gradient is as close to zero as we can get, we have found (approximately) the optimal parameters.
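
In code, that idea becomes a stopping rule. A minimal sketch (the tolerance is an arbitrary choice):

def has_converged(grad_m, grad_b, tol=1e-6):
    # The gradient never hits exactly zero in practice,
    # so we stop once it is smaller than some tolerance.
    return abs(grad_m) < tol and abs(grad_b) < tol

print(has_converged(0.5, -0.1))   # False: still rolling downhill
print(has_converged(1e-9, 1e-8))  # True: effectively at the minimum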

How Much do we Adjust?

  • Find the gradient of the loss with respect to the parameter
  • Multiply the gradient by the learning rate
  • Subtract the result from the original parameter value (a sketch of one step follows below)
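
For MSE loss, those gradients can be worked out with calculus (the derivation isn't on the slides): with error = (m*x + b) - y, we get dLoss/dm = (2/n) * sum(error * x) and dLoss/db = (2/n) * sum(error). A minimal sketch of one update step, with made-up data and learning rate:

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, b = 0.0, 0.0  # starting guesses
lr = 0.05        # learning rate
n = len(xs)

errors = [(m * x + b) - y for x, y in zip(xs, ys)]
grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # find gradient
grad_b = (2 / n) * sum(errors)
m -= lr * grad_m  # multiply by learning rate, subtract from parameter
b -= lr * grad_b
print(m, b)       # one step moves us to m = 1.75, b ≈ 0.6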

Learning Rate

  • The learning rate determines how much to adjust a parameter
  • Like a step size
  • You choose the learning rate (common values: 0.1 or 0.01)
    • Too large: you overshoot the minimum
    • Too small: it takes too long to converge
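
A toy experiment (not from the slides) makes the tradeoff concrete, using a one-parameter loss (b - 5)**2 whose minimum is at b = 5:

def gradient(b):
    return 2 * (b - 5)  # derivative of (b - 5)**2

for lr in (1.1, 0.001, 0.4):
    b = 0.0
    for _ in range(20):
        b -= lr * gradient(b)  # 20 gradient descent steps
    print(lr, b)
# lr = 1.1   -> overshoots and diverges; b flies away from 5
# lr = 0.001 -> barely moves; still far from 5 after 20 steps
# lr = 0.4   -> lands almost exactly on 5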

Updating a Parameter

  • Current slope: 1.5
  • Current y-intercept: 40
  • Gradient of the loss with respect to the slope: -5
  • Gradient of the loss with respect to the y-intercept: 100
  • Learning rate: 0.1
  • Find the new slope and y-intercept after one iteration

Remember: new weight = weight - learning rate * gradient of loss with respect to weight

New slope: 1.5 - 0.1 * (-5) = 2

New y-intercept: 40 - 0.1 * 100 = 30
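
The same arithmetic, checked in Python:

m, b = 1.5, 40.0
grad_m, grad_b = -5.0, 100.0
lr = 0.1
m -= lr * grad_m
b -= lr * grad_b
print(m, b)  # 2.0 30.0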

It’s Just a Ball Rolling Down a Hill

Higher dimensions?!?!

What if we want to predict a student’s grade based on the # of hours they studied AND the # of hours of sleep they got? Is that possible?

YES!

Now, our equation is y = m₁x₁ + m₂x₂ + b

We use the same gradient descent approach, just with one extra parameter
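
A minimal sketch of the two-feature version, with made-up (hours studied, hours slept, grade) rows:

data = [(2, 8, 75), (4, 7, 85), (1, 5, 60), (5, 8, 95)]  # hypothetical students
m1, m2, b = 0.0, 0.0, 0.0
lr = 0.01
n = len(data)

for _ in range(10000):
    errors = [(m1 * x1 + m2 * x2 + b) - y for x1, x2, y in data]
    grad_m1 = (2 / n) * sum(e * x1 for e, (x1, _, _) in zip(errors, data))
    grad_m2 = (2 / n) * sum(e * x2 for e, (_, x2, _) in zip(errors, data))
    grad_b = (2 / n) * sum(errors)
    m1 -= lr * grad_m1  # exactly the same update rule as before,
    m2 -= lr * grad_m2  # just applied to one more parameter
    b -= lr * grad_b

print(m1, m2, b)  # the learned parameters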

Let’s put all of that knowledge together with Python

Go to the code!
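
The code itself isn't reproduced in this deck, so here is a minimal from-scratch sketch of what it might look like, with made-up data:

# Fit y = m*x + b by gradient descent, minimizing MSE.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]       # e.g., hours studied (hypothetical)
ys = [52.0, 60.0, 71.0, 80.0, 89.0]  # e.g., grades (hypothetical)

m, b = 0.0, 0.0  # initial guesses
lr = 0.01        # learning rate
n = len(xs)

for _ in range(20000):
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
    grad_b = (2 / n) * sum(errors)
    m -= lr * grad_m
    b -= lr * grad_b

mse = sum(((m * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"m = {m:.2f}, b = {b:.2f}, MSE = {mse:.2f}")
# Expect roughly m = 9.40, b = 42.20 for this data.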

Congrats! You’ve Learned Linear Regression!

You’re already ahead of the masses! This is the first step toward learning about the advanced, cutting-edge models that shape the world today.

Thank you!

Keep in Touch:

discord.gg/santacruzai

linktr.ee/santacruzai