1 of 7

Regularization

Jason Lu

2 of 7

Overfitting

  • When a model memorizes the training data instead of learning from it
    • The model fits the training data too closely and fails to generalize
  • This is a serious problem because the model needs to work on real-world data, which is not part of our training set

[Figure: overfitting]

3 of 7

Recap - Linear Regression

  • In the previous slides we talked about linear regression

  • Loss function (MSE): $J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2$

  • Gradient update rule: $\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla_{\mathbf{w}} J(\mathbf{w})$ (implemented in the sketch below)

What if we have too many features or very large coefficients?
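
Not part of the slides, but as a concrete reference, here is a minimal NumPy sketch of this recap; the function name, learning rate, and synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.01, n_iters=1000):
    """Plain linear regression trained by gradient descent on the MSE loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residuals = X @ w - y                  # prediction errors
        grad = (2.0 / n) * (X.T @ residuals)   # gradient of the MSE w.r.t. w
        w -= alpha * grad                      # gradient update rule
    return w

# Tiny usage example: recover a known linear relationship from noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)
print(fit_linear_regression(X, y))   # close to [2.0, -1.0, 0.5]
```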

4 of 7

Why Regularization?

  • As the coefficients of our model grow larger, the model’s variance increases
  • Larger weights ➡️ High variance ➡️ Overfitting ➡️ Poor test performance
  • Regularization adds a penalty for large weights/coefficients
  • The key idea is that simpler models generalize better

5 of 7

Ridge (L2) Regularization

  • We modify the loss function to include a squared penalty term: $J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 + \lambda \sum_{j} w_j^2$
  • Coefficients shrink toward zero but rarely become exactly zero (see the sketch below)
  • λ is the regularization strength (can be tuned)
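
Not part of the slides: a hedged sketch of ridge as the same gradient-descent loop with the squared penalty added; lam stands in for λ, and the default values are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0, alpha=0.01, n_iters=1000):
    """Linear regression with an L2 penalty: MSE + lam * sum(w_j ** 2)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residuals = X @ w - y
        # The penalty adds 2 * lam * w to the MSE gradient, pulling every weight toward 0.
        grad = (2.0 / n) * (X.T @ residuals) + 2.0 * lam * w
        w -= alpha * grad
    return w
```

Increasing lam shrinks the coefficients more aggressively; scikit-learn’s Ridge implements the same idea, with the regularization strength passed as its alpha argument.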

6 of 7

Lasso (L1) Regularization

  • We modify the loss function to include an absolute-value penalty term: $J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 + \lambda \sum_{j} |w_j|$
  • Produces sparse models: some weights become exactly zero (see the example below)
  • Lasso performs feature selection by removing unnecessary features
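
Not part of the slides: a hedged illustration of the sparsity claim using scikit-learn, where the synthetic data and the alpha values (playing the role of λ) are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Ten features, but only the first two actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.round(ridge.coef_, 3))  # every coefficient shrunk, but nonzero
print(np.round(lasso.coef_, 3))  # irrelevant features driven to exactly 0.0
```

The zeroed-out coefficients are what “feature selection” means here: those columns simply drop out of the fitted model.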

7 of 7

Summary

  • Overfitting is memorization, not learning
  • Regularization penalizes model complexity (large weights, in the case of linear regression)
  • Ridge
    • Shrinkage
    • Squared penalty term
  • Lasso
    • Sparsity
    • Absolute value penalty term