LO 4.2.2.G
Learning Objective: Describe the advantage of the lasso over ridge regression.
Review:
- Unlike ridge regression, the lasso can perform variable selection.
- Ridge regression will shrink all of the coefficients towards zero, but it will not set any of them exactly to zero (unless λ = ∞).
- In the case of the lasso, the loss function is

  $$\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2} + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \,=\, \text{RSS} + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert,$$

  and its ℓ1 component, the penalty λ Σⱼ |βⱼ|, will force some of the coefficient estimates to be exactly equal to zero when the tuning parameter λ is sufficiently large (see the code sketch below).
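A minimal code sketch, not part of the assigned reading, makes the contrast concrete. It assumes scikit-learn and synthetic data; the penalty strength alpha plays the role of λ, though scikit-learn scales the penalty slightly differently for Lasso and Ridge, so the two alphas are not on an identical scale.

```python
# Sketch (assumes scikit-learn; data and alpha are illustrative choices):
# fit lasso and ridge with the same penalty strength and count how many
# coefficients each sets exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 candidate predictors, only 3 of which truly drive the response
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # alpha plays the role of λ
ridge = Ridge(alpha=1.0).fit(X, y)

# The lasso typically zeros out the uninformative predictors;
# ridge shrinks them but leaves every coefficient nonzero.
print("lasso coefficients set to zero:", int(np.sum(lasso.coef_ == 0.0)))
print("ridge coefficients set to zero:", int(np.sum(ridge.coef_ == 0.0)))
```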
EXTRA

With reference to the lasso loss function and the coefficient-path figure described below:
- When λ = 0, the lasso simply gives the least-squares fit.
- When λ becomes sufficiently large, the lasso gives the null model, in which all coefficient estimates equal zero.
- In between these two extremes, depending on the value of λ, the lasso can produce a model involving any number of variables.
- The lasso can therefore generate a model involving only a subset of the p predictors: if q coefficient estimates are forced to zero, the fitted model involves the remaining p − q predictors (see the code sketch below).

Source: Assigned reading
[Figure: lasso coefficient estimates plotted against λ. Curves: Income (black), Limit (red), Rating (blue), Student (yellow)]
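To trace these three regimes numerically, here is a minimal sketch, again assuming scikit-learn and synthetic data rather than the data behind the figure; lasso_path computes the whole sequence of fits over a decreasing grid of λ values.

```python
# Sketch (assumes scikit-learn; data are illustrative): the number of
# selected predictors grows from 0 (null model, large λ) toward the full
# least-squares fit as λ shrinks toward 0.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# alphas come back in decreasing order; coefs has shape (n_features, n_alphas)
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)

for a, c in zip(alphas, coefs.T):
    print(f"λ = {a:10.3f} -> {int(np.sum(c != 0.0))} nonzero coefficients")
```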

With reference to the ridge loss function, $\text{RSS} + \lambda\sum_{j=1}^{p}\beta_j^{2}$, and the corresponding coefficient-path figure:
- When λ = 0, ridge regression simply gives the least-squares fit.
- When λ is extremely large, all of the ridge coefficient estimates are essentially zero; this corresponds to the null model that contains no predictors.
- In between these two extremes, increasing the value of λ tends to reduce the magnitudes of the coefficients, but it never results in the exclusion of any variable.
- Ridge regression will therefore always generate a model involving all p predictors (see the code sketch below).

Source: Assigned reading
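A minimal sketch of this behavior, under the same scikit-learn and synthetic-data assumptions as above: as λ grows, every ridge coefficient shrinks toward zero, yet all p of them remain in the model.

```python
# Sketch (assumes scikit-learn; the λ grid and data are illustrative):
# ridge shrinks coefficient magnitudes as λ grows but never produces
# an exact zero, so every predictor stays in the model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for lam in [0.01, 1.0, 100.0, 10000.0]:
    coef = Ridge(alpha=lam).fit(X, y).coef_
    print(f"λ = {lam:8g}: max |β| = {np.max(np.abs(coef)):9.3f}, "
          f"nonzero = {int(np.sum(coef != 0.0))} of {coef.size}")
```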