
Classification: Perceptron

Classification

  • We will learn
    • Perceptron
    • Support vector machine (SVM)
    • Logistic regression

  • To find a classification boundary


Perceptron
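  • A standard form of the perceptron's decision rule (using the sign convention introduced below):

    h(x) = sign(ω^T x + ω_0)

    i.e., the predicted class is determined by which side of the line (hyperplane) the point falls on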



Distance from a Line
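  • A standard result (assuming the line is ω^T x + ω_0 = 0): the distance from a point x to the line is

    d(x) = |ω^T x + ω_0| / ||ω||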


Sign

  • Sign with respect to a line
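  • Dropping the absolute value in the distance formula gives a signed distance: the sign of ω^T x + ω_0 tells on which side of the line x lies,

    sign(ω^T x + ω_0) = +1 on one side, −1 on the other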


Perceptron Algorithm

  • The perceptron implements a linear classifier: it predicts a class from the sign of a linear function of the features

  • Given the training set, repeat until no training point is misclassified:

  1. pick a misclassified point
  2. update the weight vector using that point (see the update rule below)
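  • A common form of the update (assuming labels y_n ∈ {−1, +1} and the bias folded into ω): for a misclassified point (x_n, y_n), i.e., sign(ω^T x_n) ≠ y_n,

    ω ← ω + y_n x_n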


Iterations of Perceptron


Diagram of Perceptron


Perceptron Loss Function
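  • A standard form of the perceptron loss (assuming y_n ∈ {−1, +1}): zero for correctly classified points, and proportional to the violation otherwise,

    ℓ(ω) = Σ_n max(0, −y_n ω^T x_n)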


Perceptron Algorithm in Python
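  A minimal NumPy sketch of the update loop described above (names are illustrative):

    import numpy as np

    def perceptron(X, y, max_iter=1000):
        """Perceptron on data X (n x d) with labels y in {-1, +1}."""
        n, d = X.shape
        Xb = np.hstack([np.ones((n, 1)), X])       # prepend a bias column
        w = np.zeros(d + 1)
        for _ in range(max_iter):
            mis = np.where(np.sign(Xb @ w) != y)[0]  # misclassified points
            if len(mis) == 0:
                break                              # all points classified: done
            i = np.random.choice(mis)              # pick a misclassified point
            w += y[i] * Xb[i]                      # perceptron update
        return w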


Scikit-Learn for Perceptron
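  A minimal usage sketch with scikit-learn's built-in Perceptron (toy data for illustration):

    import numpy as np
    from sklearn.linear_model import Perceptron

    # toy data: two separable Gaussian clusters
    X = np.vstack([np.random.randn(50, 2) + [2, 2],
                   np.random.randn(50, 2) - [2, 2]])
    y = np.hstack([np.ones(50), -np.ones(50)])

    clf = Perceptron()
    clf.fit(X, y)
    print(clf.coef_, clf.intercept_, clf.score(X, y))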


The Best Hyperplane Separator?

  • The perceptron finds one of the many possible hyperplanes separating the data, if one exists
  • Of the many possible choices, which one is the best?

  • Utilize distance information
  • Intuitively, we want the hyperplane with the maximum margin
  • A large margin leads to good generalization on the test data
    • we will see this formally when we discuss the support vector machine (SVM)

  • Utilize distance information from all data samples
    • we will see this formally when we discuss logistic regression

  • The perceptron will later be shown to be the basic unit of neural networks and deep learning


Support Vector Machine


Classification (Linear)

  • Autonomously figure out which category (or class) an unknown item belongs to

  • Number of categories / classes
    • Binary: 2 different classes
    • Multiclass: more than 2 classes

  • Feature
    • The measurable parts that make up the unknown item (i.e., the information available for categorizing it)


Distance from a Line


Illustrative Example


Hyperplane
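  • In d dimensions the separating line generalizes to a hyperplane, the set

    { x ∈ R^d : ω^T x + ω_0 = 0 }

    with ω the normal vector; the distance formula above carries over unchanged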


Decision Making


Decision Boundary or Band


Data Generation for Classification
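  A minimal sketch of generating two labeled Gaussian clusters (cluster centers and sizes are illustrative):

    import numpy as np

    np.random.seed(0)
    n = 100
    X1 = np.random.randn(n, 2) + np.array([2, 2])    # class +1 cluster
    X0 = np.random.randn(n, 2) + np.array([-2, -2])  # class -1 cluster
    X = np.vstack([X1, X0])
    y = np.hstack([np.ones(n), -np.ones(n)])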


Optimization Formulation 1
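  • A standard feasibility formulation for the separable case (assuming y_i ∈ {−1, +1}):

    find ω, ω_0
    subject to y_i (ω^T x_i + ω_0) ≥ 1, i = 1, …, n

    Any feasible (ω, ω_0) is a separating hyperplane; the constant 1 fixes the scale of ω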


CVXPY 1
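  A minimal CVXPY sketch of this feasibility problem (assumes X, y from the data-generation step above):

    import cvxpy as cp

    w = cp.Variable(2)
    w0 = cp.Variable()
    constraints = [cp.multiply(y, X @ w + w0) >= 1]
    prob = cp.Problem(cp.Minimize(0), constraints)   # pure feasibility problem
    prob.solve()
    print(w.value, w0.value)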


Linear Classification: Outlier

  • Note that in the real world, you may have noise, errors, or outliers that do not accurately represent the actual phenomena

  • Linearly non-separable case


Outliers

  • No solution (separating hyperplane) exists

  • We have to allow some training examples to be misclassified!
    • but we want their number to be as small as possible


Optimization Formulation 2


  • The optimization problem for the non-separable case
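  • A standard slack formulation (the slack u_i measures the violation of point i):

    minimize   1^T u
    subject to y_i (ω^T x_i + ω_0) ≥ 1 − u_i
               u_i ≥ 0, i = 1, …, n

    Minimizing the total slack is a linear-programming surrogate for minimizing the number of misclassified points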


Expressed in a Matrix Form


CVXPY 2
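  A minimal CVXPY sketch with slack variables (assumes X, y as before):

    import cvxpy as cp

    n = X.shape[0]
    w = cp.Variable(2)
    w0 = cp.Variable()
    u = cp.Variable(n)                      # slack: one per data point

    obj = cp.Minimize(cp.sum(u))
    constraints = [cp.multiply(y, X @ w + w0) >= 1 - u, u >= 0]
    cp.Problem(obj, constraints).solve()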


Further Improvement

  • Notice that the hyperplane does not represent the division accurately, due to the outlier

  • Can we do better when there are noisy data or outliers?
  • Yes, but we need to look beyond linear programming

  • Idea: a large margin leads to good generalization on the test data


Maximize Margin
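  • With the constraint y_i (ω^T x_i + ω_0) ≥ 1, the margin (the width of the band) is 2/||ω||, so maximizing the margin is equivalent to

    minimize   ||ω||^2
    subject to y_i (ω^T x_i + ω_0) ≥ 1, i = 1, …, n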


Support Vector Machine

  • In a more compact form
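  • A standard compact (soft-margin) form, combining the margin objective with the slack penalty (γ is a trade-off parameter):

    minimize   ||ω||^2 + γ 1^T u
    subject to y_i (ω^T x_i + ω_0) ≥ 1 − u_i,  u_i ≥ 0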


Scikit-learn
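  A minimal scikit-learn sketch (C plays the role of the trade-off parameter γ above):

    from sklearn import svm

    clf = svm.SVC(kernel='linear', C=1.0)
    clf.fit(X, y)
    print(clf.coef_, clf.intercept_)
    print(clf.support_vectors_)   # the points that determine the margin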


Classifying Non-linearly Separable Data


Kernel

  • Often we want to capture nonlinear patterns in the data
    • nonlinear regression: the input-output relationship may not be linear
    • nonlinear classification: classes may not be separable by a linear boundary

  • Linear models (e.g., linear regression, linear SVM) are just not rich enough
    • fix: map the data to a higher-dimensional space where it exhibits linear patterns
    • then apply the linear model in the new feature space
    • mapping = changing the feature representation

  • Kernels: make linear models work in nonlinear settings (see the sketch below)
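  A minimal sketch of the explicit-mapping idea (the 1-D data and the map φ(x) = [x, x²] are illustrative):

    import numpy as np
    from sklearn import svm

    # 1-D data: class +1 inside [-1, 1], not separable by a single threshold
    x = np.linspace(-3, 3, 61).reshape(-1, 1)
    y2 = np.where(np.abs(x.ravel()) < 1, 1, -1)

    # map to the 2-D feature space phi(x) = [x, x^2]; there a line separates
    # the classes (a threshold on x^2)
    Phi = np.hstack([x, x**2])
    clf = svm.SVC(kernel='linear').fit(Phi, y2)
    print(clf.score(Phi, y2))   # expect 1.0 after the mapping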


Nonlinear Classification
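  The same effect can be obtained without constructing the features explicitly: an implicit mapping through a kernel (here the RBF kernel, a common default):

    from sklearn import svm

    clf = svm.SVC(kernel='rbf', gamma=1.0)   # implicit feature map via kernel
    clf.fit(X, y)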


Logistic Regression


Linear Classification: Logistic Regression

  • Logistic regression is a classification algorithm
    • don't be confused by the name

  • Perceptron: makes use of the sign of the data only

  • SVM: makes use of the margin (minimum distance)
    • distance from the closest data points

  • We want to use the distance information of all data points
    • logistic regression


Using Distances


Using all Distances


Using all Distances with Outliers

  • SVM vs. Logistic Regression

(figure: decision boundaries of SVM vs. logistic regression in the presence of outliers)

Sigmoid Function
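  • A standard definition: the sigmoid (logistic) function

    σ(z) = 1 / (1 + e^{−z})

    is a smooth approximation of the step function, squashing the signed distance ω^T x into (0, 1) so it can be read as a probability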

(figure: step function vs. sigmoid function)

Goal: We Need to Fit 𝜔 to Data


  • It is easier to work with the log-likelihood

  • The logistic regression problem can then be solved as a (convex) optimization problem:

  • Again, it is an optimization problem
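  • A standard form of that problem (assuming labels y_i ∈ {0, 1} and the sigmoid σ above):

    maximize over ω:  Σ_i [ y_i log σ(ω^T x_i) + (1 − y_i) log(1 − σ(ω^T x_i)) ]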


Logistic Regression using GD


Gradient Descent

  • To use the gradient descent method, we need the gradient of the objective

  • We need to compute the derivative of the log-likelihood with respect to ω
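  • For the log-likelihood above, the gradient takes the standard form

    ∂ℓ/∂ω = Σ_i ( y_i − σ(ω^T x_i) ) x_i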


Gradient Descent for Logistic Regression

  • It is a maximization problem (equivalently, minimize the negative log-likelihood)
  • Be careful with matrix shapes (see the sketch below)


Python Implementation
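  A minimal NumPy sketch of gradient ascent on the log-likelihood (assumes y ∈ {0, 1}; step size and iteration count are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def logistic_gd(X, y, lr=0.1, n_iter=5000):
        n, d = X.shape
        Xb = np.hstack([np.ones((n, 1)), X])     # bias column
        w = np.zeros(d + 1)
        for _ in range(n_iter):
            grad = Xb.T @ (y - sigmoid(Xb @ w))  # gradient of log-likelihood
            w += lr / n * grad                   # ascent step (maximization)
        return w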


Logistic Regression using CVXPY
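  A minimal CVXPY sketch; cp.logistic(z) is log(1 + e^z), so the objective below is the log-likelihood (assumes y ∈ {0, 1}):

    import cvxpy as cp
    import numpy as np

    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    w = cp.Variable(d + 1)

    obj = cp.Maximize(y @ (Xb @ w) - cp.sum(cp.logistic(Xb @ w)))
    cp.Problem(obj).solve()
    print(w.value)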


Probabilistic Approach (or MLE)


CVXPY Implementation


In a More Compact Form
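  • With labels recoded to y_i ∈ {−1, +1}, the same problem collapses to a single term per data point:

    minimize over ω:  Σ_i log( 1 + e^{ −y_i ω^T x_i } )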


Logistic Regression using Scikit-Learn
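  A minimal scikit-learn sketch:

    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression()
    clf.fit(X, y)
    print(clf.coef_, clf.intercept_)
    print(clf.predict_proba(X[:5]))   # per-class probabilities (sigmoid)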


Non-linear Classification


  • Same idea as in non-linear regression: non-linear features
    • explicit or implicit kernel


Explicit Kernel
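  A minimal sketch of the explicit route: build the non-linear features yourself, then fit an ordinary linear classifier in that space (degree 2 is illustrative):

    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import PolynomialFeatures

    poly = PolynomialFeatures(degree=2)
    Z = poly.fit_transform(X)   # e.g. [1, x1, x2, x1^2, x1*x2, x2^2]
    clf = LogisticRegression().fit(Z, y)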


Multiclass Classification


  • Generalization to more than 2 classes is straightforward (see the sketch below)
    • one vs. all (one vs. rest)
    • one vs. one
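  A minimal sketch of both strategies using scikit-learn's wrappers (X_multi and y_multi stand for a hypothetical multiclass dataset):

    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

    ovr = OneVsRestClassifier(LogisticRegression()).fit(X_multi, y_multi)
    ovo = OneVsOneClassifier(LogisticRegression()).fit(X_multi, y_multi)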
