1 of 7

Regularization

Jason Lu

2 of 7

Overfitting

  • When a model memorizes the training data instead of learning from it
    • The model fits the training data too closely and fails to generalize
  • This is a serious problem because the model needs to work on real-world data, which is not part of our training set

[Figure: overfitting]

3 of 7

Recap - Linear Regression

  • In the previous slides we talked about linear regression

  • Loss function (MSE): $J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2$

  • Gradient update rule: $\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla_{\mathbf{w}} J(\mathbf{w})$ (implemented in the sketch below)

What if we have too many features or very large coefficients?
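
Not part of the slides, but as a concrete reference, here is a minimal NumPy sketch of this recap; the function name, learning rate, and synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.01, n_iters=1000):
    """Plain linear regression trained by gradient descent on the MSE loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residuals = X @ w - y                  # prediction errors
        grad = (2.0 / n) * (X.T @ residuals)   # gradient of the MSE w.r.t. w
        w -= alpha * grad                      # gradient update rule
    return w

# Tiny usage example: recover a known linear relationship from noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)
print(fit_linear_regression(X, y))   # close to [2.0, -1.0, 0.5]
```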

4 of 7

Why Regularization?

  • As the coefficients of our model grow larger, the model’s variance increases
  • Larger weights ➡️ High variance ➡️ Overfitting ➡️ Poor test performance
  • Regularization adds a penalty for large weights/coefficients
  • The key idea is that simpler models generalize better

5 of 7

Ridge (L2) Regularization

  • We modify the loss function to include a squared penalty term: $J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 + \lambda \sum_{j} w_j^2$
  • Coefficients shrink toward zero but rarely become exactly zero (see the sketch below)
  • λ is the regularization strength (can be tuned)
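
Not part of the slides: a hedged sketch of ridge as the same gradient-descent loop with the squared penalty added; lam stands in for λ, and the default values are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0, alpha=0.01, n_iters=1000):
    """Linear regression with an L2 penalty: MSE + lam * sum(w_j ** 2)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residuals = X @ w - y
        # The penalty adds 2 * lam * w to the MSE gradient, pulling every weight toward 0.
        grad = (2.0 / n) * (X.T @ residuals) + 2.0 * lam * w
        w -= alpha * grad
    return w
```

Increasing lam shrinks the coefficients more aggressively; scikit-learn’s Ridge implements the same idea, with the regularization strength passed as its alpha argument.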

6 of 7

Lasso (L1) Regularization

  • We modify the loss function to include an absolute-value penalty term: $J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 + \lambda \sum_{j} |w_j|$
  • Produces sparse models: some weights become exactly zero (see the example below)
  • Lasso performs feature selection by removing unnecessary features
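
Not part of the slides: a hedged illustration of the sparsity claim using scikit-learn, where the synthetic data and the alpha values (playing the role of λ) are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Ten features, but only the first two actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.round(ridge.coef_, 3))  # every coefficient shrunk, but nonzero
print(np.round(lasso.coef_, 3))  # irrelevant features driven to exactly 0.0
```

The zeroed-out coefficients are what “feature selection” means here: those columns simply drop out of the fitted model.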

7 of 7

Summary

  • Overfitting is memorization, not learning
  • Regularization penalizes model complexity (large weights, in the case of linear regression)
  • Ridge
    • Shrinkage
    • Squared penalty term
  • Lasso
    • Sparsity
    • Absolute value penalty term