1 of 96

Regression 1

2 of 96

Assumption: Linear Model


3 of 96

Assumption: Linear Model


4 of 96

Linear Regression


5 of 96

Linear Regression as Optimization


6 of 96

Re-cast Problem as Least Squares


7 of 96

Optimization


8 of 96

Optimization: Note

The same principle applies in higher dimensions.

9 of 96

Revisit: Least-Square Solution


10 of 96

1. Solve using Linear Algebra

  • known as the least-squares solution (see the sketch below)
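The slide's equations did not survive extraction; as a minimal NumPy sketch (toy data made up here), the closed-form least-squares solve looks like this:

    import numpy as np

    # Hypothetical toy data: a bias column of ones plus one input feature.
    rng = np.random.default_rng(0)
    A = np.column_stack([np.ones(20), rng.uniform(-1, 1, 20)])   # design matrix
    b = 2.0 + 3.0 * A[:, 1] + 0.1 * rng.standard_normal(20)      # noisy targets

    # Normal equations: theta = (A^T A)^{-1} A^T b
    theta_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # Numerically preferable: a dedicated least-squares solver
    theta_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(theta_normal, theta_lstsq)   # both are close to [2, 3]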

11 of 96

1. Solve using Linear Algebra

  • known as the least-squares solution

12 of 96

2. Solve using Gradient Descent
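As a hedged sketch of this approach (reusing the same kind of toy design matrix A and targets b as above), plain gradient descent on the squared error might look like:

    import numpy as np

    def gradient_descent(A, b, lr=0.1, n_iters=1000):
        """Minimize ||A @ theta - b||^2 with fixed-step gradient descent."""
        theta = np.zeros(A.shape[1])
        for _ in range(n_iters):
            grad = 2.0 * A.T @ (A @ theta - b) / len(b)   # gradient of the mean squared error
            theta -= lr * grad                            # step against the gradient
        return theta

    # theta_gd = gradient_descent(A, b)   # converges toward the least-squares solution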


13 of 96

3. Solve using CVXPY Optimization
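A minimal CVXPY sketch of the same least-squares problem (toy data made up here, as before):

    import cvxpy as cp
    import numpy as np

    # Toy data as in the earlier sketches
    rng = np.random.default_rng(0)
    A = np.column_stack([np.ones(20), rng.uniform(-1, 1, 20)])
    b = 2.0 + 3.0 * A[:, 1] + 0.1 * rng.standard_normal(20)

    theta = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.sum_squares(A @ theta - b))).solve()
    print(theta.value)   # matches the closed-form least-squares solution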


14 of 96

3. Solve using CVXPY Optimization



16 of 96

Regression with Outliers


17 of 96

Regression with Outliers



19 of 96

Think About What Makes Them Different

  • It is important to understand what makes them different

20 of 96

Scikit-Learn

  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html#
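A minimal usage sketch of scikit-learn's LinearRegression (toy data made up here), just to illustrate the fit/predict API:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical 1-D data: y ≈ 1.5x + 2 plus noise
    x = np.linspace(0, 10, 50).reshape(-1, 1)
    y = 1.5 * x.ravel() + 2.0 + np.random.default_rng(0).normal(0, 0.5, 50)

    model = LinearRegression().fit(x, y)
    print(model.coef_, model.intercept_)   # roughly [1.5] and 2.0
    print(model.predict([[12.0]]))         # prediction at a new input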


21 of 96

Scikit-Learn


22 of 96

Scikit-Learn: Regression


23 of 96

Scikit-Learn: Regression


24 of 96

Multivariate Linear Regression

  • Linear regression for multivariate data

  • Same in matrix representation
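The slide's matrix equations did not survive extraction; in standard (generic) notation, with X the design matrix whose rows are the input vectors, the model and its least-squares solution are:

$$ \hat{y} = X\theta, \qquad \theta^{\ast} = \arg\min_{\theta}\,\lVert X\theta - y\rVert_2^2 = (X^{\top}X)^{-1}X^{\top}y $$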


25 of 96

Multivariate Linear Regression


26 of 96

Multivariate Linear Regression


27 of 96

Nonlinear Regression (Actually Linear Regression)

  • Linear regression for non-linear data

  • Same as linear regression, just with non-linear features

  • Method 1: constructing explicit feature vectors (sketched after this list)
    • polynomial features
    • Radial basis function (RBF) features

  • Method 2: implicit feature vectors, kernel trick (optional)
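As a sketch of Method 1 with polynomial features (toy data and coefficients made up), note that the model stays linear in its parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 30)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(30)

    # Explicit feature map phi(x) = [1, x, x^2]; linear regression on these features
    Phi = np.column_stack([np.ones_like(x), x, x**2])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    print(theta)   # roughly [1, -2, 3]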


28 of 96

Nonlinear Regression (Actually Linear Regression)

  • Polynomial features (here, a quadratic is used as an example)

29 of 96

Polynomial Regression


30 of 96

Polynomial Regression


31 of 96

Summary: Linear Regression

  • Though linear regression may seem limited, it is very powerful, since the input features can themselves include non-linear features of data

  • Linear regression on non-linear features of data

  • For the least-squares loss, the optimal parameters still have the closed form θ* = (ΦᵀΦ)⁻¹ Φᵀ y, where Φ is the matrix of (non-linear) features

32 of 96

Regression 2

33 of 96

Linear Regression: Advanced


  • Overfitting
  • Linear Basis Function Models
  • Regularization (Ridge and Lasso)
  • Evaluation

34 of 96

Overfitting: Start with Linear Regression


35 of 96

Recap: Nonlinear Regression

  • Polynomial features (here, a quadratic is used as an example)

36 of 96

Nonlinear Regression

10 input points fit with a polynomial of degree 9 (or 10)

37 of 96

Polynomial Fitting with Different Degrees

Low error on input data points, but high error nearby

38 of 96

Loss

  • Loss: Residual Sum of Squares (RSS)
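In generic notation (not necessarily the slide's symbols), the RSS of a linear model with parameters θ over N training pairs (x_i, y_i) is:

$$ \mathrm{RSS}(\theta) = \sum_{i=1}^{N}\bigl(y_i - \theta^{\top}x_i\bigr)^{2} $$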

Minimizing the loss on the training data is often not the best goal: low error on the input data points, but high error nearby

39 of 96

Issue with Rich Representation

  • Low error on input data points, but high error nearby
  • Low error on training data, but high error on testing data


40 of 96

Function Approximation: Linear Basis Function Model

41 of 96

Function Approximation

  • Select coefficients for a well-defined set of basis functions so that their combination closely matches a target function in a task-specific way

42 of 96

Recap: Nonlinear Regression

  • Polynomial features (here, a quadratic is used as an example)

Different perspective:

  - Approximate a target function as a linear combination of basis functions

43 of 96

Construct Explicit Feature Vectors

  • Consider linear combinations of fixed nonlinear functions
    • Polynomial
    • Radial Basis Function (RBF)
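A minimal sketch of the Gaussian RBF feature map (the centers, width, and data below are made-up choices):

    import numpy as np

    def rbf_features(x, centers, sigma=0.3):
        """phi_j(x) = exp(-(x - mu_j)^2 / (2 sigma^2)) for each center mu_j."""
        return np.exp(-(x[:, None] - centers[None, :])**2 / (2 * sigma**2))

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

    centers = np.linspace(0, 1, 10)                  # 10 RBF centers on [0, 1]
    Phi = rbf_features(x, centers)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # still ordinary least squares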


44 of 96

Polynomial Basis

1) Polynomial functions


45 of 96

RBF Basis


46 of 96

Linear Regression with RBF

  • With many features, our prediction function becomes very expensive
  • Can lead to overfitting


47 of 96

Regularization


48 of 96

Issue with Rich Representation

  • Low error on input data points, but high error nearby
  • Low error on training data, but high error on testing data


49 of 96

Generalization Error


50 of 96

Representational Difficulties


51 of 96

With Fewer Basis Functions: Fewer RBF Centers

52 of 96

With Fewer Basis Functions: Fewer RBF Centers

  • Least-squares fits for different numbers of RBFs

53 of 96

Representational Difficulties


54 of 96

Regularization (Shrinkage Methods)

  • Often, overfitting is associated with very large estimated parameters
  • We want to balance
    • how well the function fits the data
    • the magnitude of the coefficients

    • This is a multi-objective optimization
    • 𝜆 is a tuning parameter that controls the trade-off
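As one concrete instance, ridge regression adds 𝜆 times the squared L2 norm of the coefficients to the least-squares loss; a hedged scikit-learn sketch (alpha plays the role of 𝜆, and the RBF setup below is made up):

    import numpy as np
    from sklearn.linear_model import Ridge

    # Overfitting-prone toy setup: more RBF features than data points
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 15)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(15)
    centers = np.linspace(0, 1, 30)
    Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))

    # The L2 penalty shrinks the coefficients toward zero
    ridge = Ridge(alpha=1.0).fit(Phi, y)
    print(np.abs(ridge.coef_).max())   # much smaller than an unregularized fit would give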

55 of 96

Regularization (Shrinkage Methods)


56 of 96

RBF: Start from Rich Representation


57 of 96

RBF with Regularization



60 of 96

RBF with LASSO

  • The approximated function looks similar to that of ridge regression

61 of 96

 

  • Non-zero coefficients indicate ‘selected’ features

(Figure: coefficient plots for LASSO and Ridge)

62 of 96

 

  • Non-zero coefficients indicate ‘selected’ features

(Figure: coefficient plot for LASSO)

63 of 96

Sparsity for Feature Selection using Lasso

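A hedged sketch of LASSO-based feature selection on a made-up RBF design matrix: the L1 penalty drives many coefficients to exactly zero, and the surviving ones mark the selected features.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)
    centers = np.linspace(0, 1, 30)
    Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))

    lasso = Lasso(alpha=0.01).fit(Phi, y)
    selected = np.flatnonzero(lasso.coef_)   # indices of non-zero coefficients
    print(f"{selected.size} of {centers.size} RBF centers selected:", selected)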

64 of 96

Regression with Selected Features


65 of 96

LASSO vs. Ridge

  • An equivalent form of each optimization problem
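The equations on this slide did not survive extraction; the standard equivalent constrained formulations (with a budget t corresponding to the penalty weight 𝜆) are:

$$ \hat{\theta}_{\text{lasso}} = \arg\min_{\theta}\,\lVert y - X\theta\rVert_2^2 \ \text{s.t.}\ \lVert\theta\rVert_1 \le t, \qquad \hat{\theta}_{\text{ridge}} = \arg\min_{\theta}\,\lVert y - X\theta\rVert_2^2 \ \text{s.t.}\ \lVert\theta\rVert_2^2 \le t $$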

66 of 96

LASSO vs. Ridge

  • An equivalent form of each optimization problem

67 of 96

L2 Regularizers: Simple Example


68 of 96

L1 Regularizers: Simple Example



77 of 96

Evaluation

  • Adding more features will always decrease the training loss
  • How do we determine when an algorithm achieves “good” performance?

  • A better criterion: split the data into
    • a training set (e.g., 70%)
    • a testing set (e.g., 30%)

  • Performance on the testing set is called generalization performance
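A minimal scikit-learn sketch of the 70/30 split (data made up here); the held-out error is the generalization performance referred to above:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LinearRegression().fit(X_train, y_train)

    print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
    print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))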

78 of 96

Regression 3

79 of 96

Linear Regression Examples


  • De-noising
  • Total Variation

80 of 96

De-noising Signal


81 of 96

Transform it to an Optimization Problem

Source:

      • Boyd & Vandenberghe's book "Convex Optimization"
      • http://cvxr.com/cvx/examples/ (Figures 6.8-6.10: Quadratic smoothing)
      • Week 4 of "Linear and Integer Programming" on Coursera (Univ. of Colorado)

82 of 96

Transform it to an Optimization Problem

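The slide's formulation did not survive extraction; the standard quadratic-smoothing objective from the cited Boyd & Vandenberghe example, with x_corr the noisy signal and μ a smoothing weight, is:

$$ \min_{\hat{x}}\ \lVert \hat{x} - x_{\text{corr}} \rVert_2^2 \;+\; \mu \sum_{i=1}^{n-1} (\hat{x}_{i+1} - \hat{x}_i)^2 $$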

83 of 96

Least-Square Problems


84 of 96

Coded in Python



86 of 96

CVXPY Implementation
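The original code did not survive extraction; a minimal CVXPY sketch of quadratic smoothing (with a made-up noisy signal and smoothing weight) might look like this:

    import cvxpy as cp
    import numpy as np

    # Hypothetical noisy signal standing in for the lecture's data
    n = 200
    t = np.linspace(0, 4 * np.pi, n)
    x_corr = np.sin(t) + 0.3 * np.random.default_rng(0).standard_normal(n)

    mu = 10.0                                         # smoothing weight (made up)
    x_hat = cp.Variable(n)
    fit = cp.sum_squares(x_hat - x_corr)              # stay close to the data
    smooth = cp.sum_squares(x_hat[1:] - x_hat[:-1])   # penalize successive differences
    cp.Problem(cp.Minimize(fit + mu * smooth)).solve()
    x_smoothed = x_hat.value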



91 of 96

Signal with Sharp Transition + Noise

Chapter 6.3 of Boyd & Vandenberghe's book "Convex Optimization"

92 of 96

 

  • Quadratic smoothing smooths out both the noise and the sharp transitions in the signal, but this is not what we want
  • We will not be able to preserve the signal's sharp transitions.

  • Any ideas?


94 of 96

 

  • Total Variation (TV) smoothing preserves the sharp transitions in the signal, which is what we want

  • Note how the TV reconstruction does a better job of preserving the sharp transitions in the signal while removing the noise.
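A hedged CVXPY sketch of TV smoothing (piecewise-constant toy signal and weight made up): replacing the squared differences of quadratic smoothing with their L1 norm is what preserves the jumps.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    x_true = np.concatenate([np.zeros(100), np.ones(100), -0.5 * np.ones(100)])
    x_corr = x_true + 0.2 * rng.standard_normal(x_true.size)

    mu = 5.0                                      # regularization weight (made up)
    x_hat = cp.Variable(x_corr.size)
    fit = cp.sum_squares(x_hat - x_corr)
    tv = cp.norm(x_hat[1:] - x_hat[:-1], 1)       # total variation of the estimate
    cp.Problem(cp.Minimize(fit + mu * tv)).solve()
    x_tv = x_hat.value                            # sharp transitions are preserved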

95 of 96

Total Variation Image


Idea comes from http://www2.compute.dtu.dk/~pcha/mxTV/

96 of 96

Total Variation Image


Idea comes from http://www2.compute.dtu.dk/~pcha/mxTV/