1 of 96

Regression 1

2 of 96

Assumption: Linear Model


3 of 96

Assumption: Linear Model


4 of 96

Linear Regression


5 of 96

Linear Regression as Optimization


6 of 96

Re-cast Problem as Least Squares


7 of 96

Optimization


8 of 96

Optimization: Note

The same principle applies in higher dimensions.

9 of 96

Revisit: Least-Square Solution


10 of 96

1. Solve using Linear Algebra

  • known as the least-squares solution (see the sketch below)
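The slide's equations did not survive extraction; as a minimal NumPy sketch (toy data made up here), the closed-form least-squares solve looks like this:

    import numpy as np

    # Hypothetical toy data: a bias column of ones plus one input feature.
    rng = np.random.default_rng(0)
    A = np.column_stack([np.ones(20), rng.uniform(-1, 1, 20)])   # design matrix
    b = 2.0 + 3.0 * A[:, 1] + 0.1 * rng.standard_normal(20)      # noisy targets

    # Normal equations: theta = (A^T A)^{-1} A^T b
    theta_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # Numerically preferable: a dedicated least-squares solver
    theta_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(theta_normal, theta_lstsq)   # both are close to [2, 3]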

11 of 96

1. Solve using Linear Algebra

  • known as the least-squares solution

12 of 96

2. Solve using Gradient Descent
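As a hedged sketch of this approach (reusing the same kind of toy design matrix A and targets b as above), plain gradient descent on the squared error might look like:

    import numpy as np

    def gradient_descent(A, b, lr=0.1, n_iters=1000):
        """Minimize ||A @ theta - b||^2 with fixed-step gradient descent."""
        theta = np.zeros(A.shape[1])
        for _ in range(n_iters):
            grad = 2.0 * A.T @ (A @ theta - b) / len(b)   # gradient of the mean squared error
            theta -= lr * grad                            # step against the gradient
        return theta

    # theta_gd = gradient_descent(A, b)   # converges toward the least-squares solution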


13 of 96

3. Solve using CVXPY Optimization
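A minimal CVXPY sketch of the same least-squares problem (toy data made up here, as before):

    import cvxpy as cp
    import numpy as np

    # Toy data as in the earlier sketches
    rng = np.random.default_rng(0)
    A = np.column_stack([np.ones(20), rng.uniform(-1, 1, 20)])
    b = 2.0 + 3.0 * A[:, 1] + 0.1 * rng.standard_normal(20)

    theta = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.sum_squares(A @ theta - b))).solve()
    print(theta.value)   # matches the closed-form least-squares solution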


14 of 96

3. Solve using CVXPY Optimization



16 of 96

Regression with Outliers


17 of 96

Regression with Outliers



19 of 96

Think About What Makes Them Different

  • It is important to understand what makes them different

20 of 96

Scikit-Learn

  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html#
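A minimal usage sketch of scikit-learn's LinearRegression (toy data made up here), just to illustrate the fit/predict API:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical 1-D data: y ≈ 1.5x + 2 plus noise
    x = np.linspace(0, 10, 50).reshape(-1, 1)
    y = 1.5 * x.ravel() + 2.0 + np.random.default_rng(0).normal(0, 0.5, 50)

    model = LinearRegression().fit(x, y)
    print(model.coef_, model.intercept_)   # roughly [1.5] and 2.0
    print(model.predict([[12.0]]))         # prediction at a new input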


21 of 96

Scikit-Learn


22 of 96

Scikit-Learn: Regression


23 of 96

Scikit-Learn: Regression


24 of 96

Multivariate Linear Regression

  • Linear regression for multivariate data

  • Same in matrix representation
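The slide's matrix equations did not survive extraction; in standard (generic) notation, with X the design matrix whose rows are the input vectors, the model and its least-squares solution are:

$$ \hat{y} = X\theta, \qquad \theta^{\ast} = \arg\min_{\theta}\,\lVert X\theta - y\rVert_2^2 = (X^{\top}X)^{-1}X^{\top}y $$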


25 of 96

Multivariate Linear Regression


26 of 96

Multivariate Linear Regression


27 of 96

Nonlinear Regression (Actually Linear Regression)

  • Linear regression for non-linear data

  • Same as linear regression, just with non-linear features

  • Method 1: constructing explicit feature vectors (sketched after this list)
    • polynomial features
    • Radial basis function (RBF) features

  • Method 2: implicit feature vectors, kernel trick (optional)
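As a sketch of Method 1 with polynomial features (toy data and coefficients made up), note that the model stays linear in its parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 30)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(30)

    # Explicit feature map phi(x) = [1, x, x^2]; linear regression on these features
    Phi = np.column_stack([np.ones_like(x), x, x**2])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    print(theta)   # roughly [1, -2, 3]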


28 of 96

Nonlinear Regression (Actually Linear Regression)

  • Polynomial features (here, a quadratic is used as an example)

29 of 96

Polynomial Regression


30 of 96

Polynomial Regression


31 of 96

Summary: Linear Regression

  • Though linear regression may seem limited, it is very powerful, since the input features can themselves include non-linear features of data

  • Linear regression on non-linear features of data

  • For the least-squares loss, the optimal parameters still have the closed form θ* = (ΦᵀΦ)⁻¹ Φᵀ y, where Φ is the matrix of (non-linear) features

32 of 96

Regression 2

33 of 96

Linear Regression: Advanced


  • Overfitting
  • Linear Basis Function Models
  • Regularization (Ridge and Lasso)
  • Evaluation

34 of 96

Overfitting: Start with Linear Regression


35 of 96

Recap: Nonlinear Regression

  • Polynomial features (here, a quadratic is used as an example)

36 of 96

Nonlinear Regression

10 input points fit with a polynomial of degree 9 (or 10)

37 of 96

Polynomial Fitting with Different Degrees

Low error on input data points, but high error nearby

38 of 96

Loss

  • Loss: Residual Sum of Squares (RSS)
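In generic notation (not necessarily the slide's symbols), the RSS of a linear model with parameters θ over N training pairs (x_i, y_i) is:

$$ \mathrm{RSS}(\theta) = \sum_{i=1}^{N}\bigl(y_i - \theta^{\top}x_i\bigr)^{2} $$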

Minimizing the loss on the training data is often not the best goal: low error on the input data points, but high error nearby

39 of 96

Issue with Rich Representation

  • Low error on input data points, but high error nearby
  • Low error on training data, but high error on testing data


40 of 96

Function Approximation: Linear Basis Function Model

41 of 96

Function Approximation

  • Select coefficients for a well-defined set of basis functions so that their combination closely matches a target function in a task-specific way

42 of 96

Recap: Nonlinear Regression

  • Polynomial features (here, a quadratic is used as an example)

Different perspective:

  - Approximate a target function as a linear combination of basis functions

43 of 96

Construct Explicit Feature Vectors

  • Consider linear combinations of fixed nonlinear functions
    • Polynomial
    • Radial Basis Function (RBF)
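A minimal sketch of the Gaussian RBF feature map (the centers, width, and data below are made-up choices):

    import numpy as np

    def rbf_features(x, centers, sigma=0.3):
        """phi_j(x) = exp(-(x - mu_j)^2 / (2 sigma^2)) for each center mu_j."""
        return np.exp(-(x[:, None] - centers[None, :])**2 / (2 * sigma**2))

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

    centers = np.linspace(0, 1, 10)                  # 10 RBF centers on [0, 1]
    Phi = rbf_features(x, centers)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # still ordinary least squares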


44 of 96

Polynomial Basis

1) Polynomial functions


45 of 96

RBF Basis


46 of 96

Linear Regression with RBF

  • With many features, our prediction function becomes very expensive
  • Can lead to overfitting


47 of 96

Regularization


48 of 96

Issue with Rich Representation

  • Low error on input data points, but high error nearby
  • Low error on training data, but high error on testing data


49 of 96

Generalization Error


50 of 96

Representational Difficulties


51 of 96

With Fewer Basis Functions: Fewer RBF Centers

52 of 96

With Fewer Basis Functions: Fewer RBF Centers

  • Least-squares fits for different numbers of RBFs

53 of 96

Representational Difficulties


54 of 96

Regularization (Shrinkage Methods)

  • Often, overfitting is associated with very large estimated parameters
  • We want to balance
    • how well the function fits the data
    • the magnitude of the coefficients

    • This is a multi-objective optimization
    • 𝜆 is a tuning parameter that controls the trade-off
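As one concrete instance, ridge regression adds 𝜆 times the squared L2 norm of the coefficients to the least-squares loss; a hedged scikit-learn sketch (alpha plays the role of 𝜆, and the RBF setup below is made up):

    import numpy as np
    from sklearn.linear_model import Ridge

    # Overfitting-prone toy setup: more RBF features than data points
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 15)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(15)
    centers = np.linspace(0, 1, 30)
    Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))

    # The L2 penalty shrinks the coefficients toward zero
    ridge = Ridge(alpha=1.0).fit(Phi, y)
    print(np.abs(ridge.coef_).max())   # much smaller than an unregularized fit would give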

55 of 96

Regularization (Shrinkage Methods)


56 of 96

RBF: Start from Rich Representation


57 of 96

RBF with Regularization



60 of 96

RBF with LASSO

  • The approximated function looks similar to that of ridge regression

61 of 96

 

  • Non-zero coefficients indicate ‘selected’ features

(Figure: coefficient plots for LASSO and Ridge)

62 of 96

 

  • Non-zero coefficients indicate ‘selected’ features

(Figure: coefficient plot for LASSO)

63 of 96

Sparsity for Feature Selection using Lasso

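A hedged sketch of LASSO-based feature selection on a made-up RBF design matrix: the L1 penalty drives many coefficients to exactly zero, and the surviving ones mark the selected features.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)
    centers = np.linspace(0, 1, 30)
    Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))

    lasso = Lasso(alpha=0.01).fit(Phi, y)
    selected = np.flatnonzero(lasso.coef_)   # indices of non-zero coefficients
    print(f"{selected.size} of {centers.size} RBF centers selected:", selected)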

64 of 96

Regression with Selected Features


65 of 96

LASSO vs. Ridge

  • An equivalent form of each optimization problem
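The equations on this slide did not survive extraction; the standard equivalent constrained formulations (with a budget t corresponding to the penalty weight 𝜆) are:

$$ \hat{\theta}_{\text{lasso}} = \arg\min_{\theta}\,\lVert y - X\theta\rVert_2^2 \ \text{s.t.}\ \lVert\theta\rVert_1 \le t, \qquad \hat{\theta}_{\text{ridge}} = \arg\min_{\theta}\,\lVert y - X\theta\rVert_2^2 \ \text{s.t.}\ \lVert\theta\rVert_2^2 \le t $$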

66 of 96

LASSO vs. Ridge

  • An equivalent form of each optimization problem

67 of 96

L2 Regularizers: Simple Example


68 of 96

L1 Regularizers: Simple Example



77 of 96

Evaluation

  • Adding more features will always decrease the training loss
  • How do we determine when an algorithm achieves “good” performance?

  • A better criterion: split the data into
    • a training set (e.g., 70%)
    • a testing set (e.g., 30%)

  • Performance on the testing set is called generalization performance
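A minimal scikit-learn sketch of the 70/30 split (data made up here); the held-out error is the generalization performance referred to above:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LinearRegression().fit(X_train, y_train)

    print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
    print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))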

78 of 96

Regression 3

79 of 96

Linear Regression Examples


  • De-noising
  • Total Variation

80 of 96

De-noising Signal


81 of 96

Transform it to an Optimization Problem

Source:

      • Boyd & Vandenberghe's book "Convex Optimization"
      • http://cvxr.com/cvx/examples/ (Figures 6.8-6.10: Quadratic smoothing)
      • Week 4 of "Linear and Integer Programming" on Coursera (Univ. of Colorado)

82 of 96

Transform it to an Optimization Problem

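The slide's formulation did not survive extraction; the standard quadratic-smoothing objective from the cited Boyd & Vandenberghe example, with x_corr the noisy signal and μ a smoothing weight, is:

$$ \min_{\hat{x}}\ \lVert \hat{x} - x_{\text{corr}} \rVert_2^2 \;+\; \mu \sum_{i=1}^{n-1} (\hat{x}_{i+1} - \hat{x}_i)^2 $$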

83 of 96

Least-Square Problems


84 of 96

Coded in Python



86 of 96

CVXPY Implementation
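The original code did not survive extraction; a minimal CVXPY sketch of quadratic smoothing (with a made-up noisy signal and smoothing weight) might look like this:

    import cvxpy as cp
    import numpy as np

    # Hypothetical noisy signal standing in for the lecture's data
    n = 200
    t = np.linspace(0, 4 * np.pi, n)
    x_corr = np.sin(t) + 0.3 * np.random.default_rng(0).standard_normal(n)

    mu = 10.0                                         # smoothing weight (made up)
    x_hat = cp.Variable(n)
    fit = cp.sum_squares(x_hat - x_corr)              # stay close to the data
    smooth = cp.sum_squares(x_hat[1:] - x_hat[:-1])   # penalize successive differences
    cp.Problem(cp.Minimize(fit + mu * smooth)).solve()
    x_smoothed = x_hat.value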



91 of 96

Signal with Sharp Transition + Noise

Chapter 6.3 of Boyd & Vandenberghe's book "Convex Optimization"

92 of 96

 

  • Quadratic smoothing smooths out both the noise and the sharp transitions in the signal, but this is not what we want
  • We will not be able to preserve the signal's sharp transitions.

  • Any ideas?


94 of 96

 

  • Total Variation (TV) smoothing preserves the sharp transitions in the signal, which is what we want

  • Note how the TV reconstruction does a better job of preserving the sharp transitions in the signal while removing the noise.
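A hedged CVXPY sketch of TV smoothing (piecewise-constant toy signal and weight made up): replacing the squared differences of quadratic smoothing with their L1 norm is what preserves the jumps.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    x_true = np.concatenate([np.zeros(100), np.ones(100), -0.5 * np.ones(100)])
    x_corr = x_true + 0.2 * rng.standard_normal(x_true.size)

    mu = 5.0                                      # regularization weight (made up)
    x_hat = cp.Variable(x_corr.size)
    fit = cp.sum_squares(x_hat - x_corr)
    tv = cp.norm(x_hat[1:] - x_hat[:-1], 1)       # total variation of the estimate
    cp.Problem(cp.Minimize(fit + mu * tv)).solve()
    x_tv = x_hat.value                            # sharp transitions are preserved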

95 of 96

Total Variation Image


Idea comes from http://www2.compute.dtu.dk/~pcha/mxTV/

96 of 96

Total Variation Image


Idea comes from http://www2.compute.dtu.dk/~pcha/mxTV/