(Slides adapted from Sandrine Dudoit and Joey Gonzalez)
UC Berkeley Data 100 Summer 2019
Sam Lau
Learning goals:
Announcements
Cross-Validation
Simple Validation
The sample is split into three sets:
Training Set (Training Error): used to fit a model.
Validation Set (Validation Error): used to choose a model.
Test Set (Test Error): used to report final accuracy.
Assessing Model Risk
The sample is split into the same three sets; each error estimates a different notion of risk:
Training Set (Training Error): used to fit a model; minimizes empirical risk.
Validation Set (Validation Error): used to choose a model; estimates population risk.
Test Set (Test Error): used to report final accuracy; a “clean” estimate of population risk.
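A minimal sketch (not from the slides) of making this split with scikit-learn; the synthetic data, the 60/20/20 proportions, and the random seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic sample: features X and labels y (assumed for demonstration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)

# First carve off the test set, then split the rest into training and validation (60/20/20).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Training set: fit the model (minimizes empirical risk).
# Validation set: choose among models (estimates population risk).
# Test set: report final accuracy exactly once (a "clean" estimate of population risk).
```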
Model Selection
K-Fold CV
3-Fold CV
The sample is split into a test set and the rest of the sample.
The rest of the sample is divided into three folds; each fold holds out a different third as the validation set and trains on the other two thirds:
Fold 1: Validation Set | Training Set | Training Set
Fold 2: Training Set | Validation Set | Training Set
Fold 3: Training Set | Training Set | Validation Set
We repeat this entire process for each model we want to try out.
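A minimal sketch (not from the slides) of 3-fold CV for a single model using scikit-learn's KFold; the data and the choice of plain linear regression are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic "rest of sample" (the test set is assumed to be held out already).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=300)

# Each fold trains on two thirds of the data and validates on the remaining third.
val_errors = []
for train_idx, val_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    val_errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# The average validation error across the 3 folds estimates this model's risk.
print(np.mean(val_errors))
```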
K-Fold CV Analysis
(Demo)
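The demo itself isn't reproduced here; below is a minimal sketch in the same spirit, repeating k-fold CV for each candidate model and keeping the one with the lowest average validation error. The candidate models (polynomial degrees 1 through 3) and the data are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data with a quadratic trend (assumed for demonstration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# Run 3-fold CV for each candidate model; lower average validation MSE is better.
for degree in [1, 2, 3]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error")
    print(degree, -scores.mean())
```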
Estimating Risk, Bias, and Variance
Regularization
Weighty Issues
Large model weights create complicated models.
Idea: Penalize large weights to get simpler models.
Regularization
Regularized loss = the same ol’ loss as usual + a penalty for large θ values.
λ: regularization parameter (non-negative) that controls how heavily the penalty counts.
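A sketch of the objective these annotations describe, written here for squared-error loss with a generic penalty R(θ); the exact loss used in the slides may differ.

```latex
\hat{\theta}
  = \arg\min_{\theta}\;
    \underbrace{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f_{\theta}(x_i)\bigr)^2}_{\text{same ol' loss as usual}}
    \;+\;
    \underbrace{\lambda\, R(\theta)}_{\text{penalty for } \theta \text{ values}},
  \qquad \lambda \ge 0
```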
Ridge and Lasso Regression
Ridge regression penalizes the L2 norm of θ; lasso regression penalizes the L1 norm.
(Demo)
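A minimal sketch (not from the slides) of the two penalties in scikit-learn, where ridge uses the squared L2 norm and lasso uses the L1 norm; the data, true coefficients, and alpha values are assumptions. Note that scikit-learn calls the regularization parameter alpha rather than λ.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data where two of the five true coefficients are exactly 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)

X_std = StandardScaler().fit_transform(X)  # normalize features before regularizing

ridge = Ridge(alpha=1.0).fit(X_std, y)     # L2 norm penalty
lasso = Lasso(alpha=0.1).fit(X_std, y)     # L1 norm penalty

print(ridge.coef_)  # shrunk toward 0, but typically none exactly 0
print(lasso.coef_)  # some coefficients driven exactly to 0
```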
Regularization Parameter
For both the L2 and L1 penalties: what happens when λ is very small? When λ is very large?
Don’t regularize the bias
Normalize Data Before Using Regularization
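A minimal sketch (not from the slides) of what the regularization parameter does: as λ grows, coefficients shrink, and under an L1 penalty they hit exactly 0. The grid of λ values and the data are assumptions; scikit-learn's Ridge and Lasso leave the intercept unpenalized, and the features are standardized before fitting, matching the two tips above.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic standardized data (assumed for demonstration).
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 4)))
y = X @ np.array([3.0, -2.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Sweep λ (called alpha in scikit-learn): larger values push more coefficients to exactly 0.
for lam in [0.01, 0.1, 1.0, 10.0]:
    coefs = Lasso(alpha=lam).fit(X, y).coef_
    print(lam, np.round(coefs, 2))
```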
Exercise to take home:
Why two kinds of regularization?
A more sophisticated explanation
Suppose we have a linear model with two parameters and no intercept term.
As we tweak the two parameters, loss changes.
Without regularization, we just pick θ̂.
[Plot: loss contours in the (θ1, θ2) plane, with the unregularized minimizer θ̂ at the center.]
A more sophisticated explanation
Regularization balances loss with the regularization penalty.
For L2 regularization, we have circular contours for the penalty. Why?
[Plot: the (θ1, θ2) plane with circular L2 penalty contours, showing the unregularized θ̂ and θ̂ with L2 regularization pulled toward the origin.]
A more sophisticated explanation
For L1 regularization, we have diamond-shaped contours for the penalty. Why?
Notice that this sets one parameter = 0!
This idea extends to multiple dimensions.
[Plot: the (θ1, θ2) plane with diamond-shaped L1 penalty contours, showing the unregularized θ̂ and θ̂ with L1 regularization landing on an axis.]
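A sketch of the algebra behind these pictures: each penalty is constant along its contours, so the regularized solution sits where a loss contour first meets a penalty contour. The L1 diamond has corners on the axes, which is why that meeting point often has a coordinate exactly equal to 0.

```latex
\text{L2 penalty: } \lambda\,(\theta_1^2 + \theta_2^2) = c
  \quad\text{(circles centered at the origin)}
\qquad
\text{L1 penalty: } \lambda\,(|\theta_1| + |\theta_2|) = c
  \quad\text{(diamonds with corners on the axes)}
```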
A tuning knob for bias-variance
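A minimal sketch (not from the slides) of turning that knob with cross-validation: scikit-learn's RidgeCV tries a grid of λ values and keeps the one with the lowest average validation error. The alpha grid and the data are assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

# Synthetic standardized data (assumed for demonstration).
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 4)))
y = X @ np.array([3.0, -2.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Cross-validate over a grid of λ (alpha) values: small λ means low bias / high variance,
# large λ means high bias / low variance.
model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=3).fit(X, y)
print(model.alpha_)  # the λ with the lowest average validation error
```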
Summary