1 of 48

ME5990

Seminar: Historical Terms in Machine Learning (Pattern Recognition)

2 of 48

Outline

  • Supervised Learning
    • K-nearest neighbors
    • Principal Component Analysis
    • Perceptron
    • Support Vector Machine
    • Ensemble Methods

3 of 48

K-nearest neighbors

  • Non-parametric classification
  • First developed in 1951 by Evelyn Fix and Joseph Hodges 
  • Can my neighbor vote?

https://brilliantmaps.com/2020-county-election-map/

4 of 48

A Problem to Start With

  • Suppose we are given a dataset with 2 classes:

  • Can we propose a voting mechanism to determine which class the green point belongs to?

 

 

[Figure: two-class scatter plot with an unlabeled green point at (0,1)]

5 of 48

K: number of neighbors

  • Measure the distance from the green point to a labeled point, e.g. the Euclidean distance between (0,1) and (-1,2): d = √((0-(-1))² + (1-2)²) = √2

[Figure: distance d between the green point (0,1) and the neighbor (-1,2)]

6 of 48

Some other distances
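The slide only names the alternatives; as a small illustration (mine, not from the slides), here is how a few common distance metrics between the two points labeled earlier, (0,1) and (-1,2), could be computed in Python:

    import numpy as np

    # Illustrative only: common distance metrics between the two points
    # labeled on the previous slide, (0, 1) and (-1, 2).
    p = np.array([0.0, 1.0])
    q = np.array([-1.0, 2.0])

    euclidean = np.sqrt(np.sum((p - q) ** 2))           # L2 distance: sqrt(2) ~ 1.414
    manhattan = np.sum(np.abs(p - q))                    # L1 (city-block) distance: 2
    chebyshev = np.max(np.abs(p - q))                    # L-infinity distance: 1
    minkowski3 = np.sum(np.abs(p - q) ** 3) ** (1 / 3)   # Minkowski distance with p = 3

    print(euclidean, manhattan, chebyshev, minkowski3)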

7 of 48

K: number of neighbors

[Figure: distances from the green point (0,1) to the labeled points, including (-1,2)]

8 of 48

K: number of neighbors

  • K neighbors: the K closest points

  • The green point belongs to the red class because, among its 3 nearest neighbors:
    • 3 out of 3 are red
    • 0 out of 3 are blue

 

 

 

[Figure: K = 3 nearest neighbors of the green point (0,1); all three are red]

9 of 48

K: number of neighbors

  • K neighbors: the K closest points

  • The green point belongs to the blue class because, among its 3 nearest neighbors:
    • 1 out of 3 is red
    • 2 out of 3 are blue

 

 

 

[Figure: nearest neighbors of the green point (0,1); two blue, one red]

10 of 48

The value of K matters

  • For different values of K, the classification result may differ

11 of 48

Decision Boundary

  • The decision boundary is the surface separating the regions of feature space assigned to each class

https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

12 of 48

K-nearest Neighbor

  • If you are near red, you are red
  • If you are near blue, you are blue

  • The people you surround yourself with matter
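A minimal sketch of the idea in Python (the toy points, labels, and helper name are hypothetical, not from the slides):

    import numpy as np

    def knn_predict(train_X, train_y, query, k=3):
        """Classify `query` by majority vote among its k nearest training points."""
        dists = np.linalg.norm(train_X - query, axis=1)        # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]                        # indices of the k closest points
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        return labels[np.argmax(counts)]                       # the majority label wins

    # Toy data: three "red" and three "blue" points
    train_X = np.array([[-1, 2], [-2, 1], [0, 3], [2, -1], [3, 0], [2, 2]])
    train_y = np.array(["red", "red", "red", "blue", "blue", "blue"])

    print(knn_predict(train_X, train_y, np.array([0, 1]), k=1))   # only the single nearest point votes
    print(knn_predict(train_X, train_y, np.array([0, 1]), k=5))   # five neighbors vote

Changing k changes who gets to vote, which is exactly why the value of K matters.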

13 of 48

Outline

  • Supervised Learning
    • K-nearest neighbors
    • Naïve Bayes Classifier
    • Principal Component Analysis
    • Perceptron
    • Support Vector Machine
    • Ensemble Methods

14 of 48

Outline

  • Supervised Learning
    • K-nearest neighbors
    • Naïve Bayes Classifier
    • Principal Component Analysis
    • Perceptron
    • Support Vector Machine
    • Ensemble Methods

15 of 48

Principal Component Analysis

  • Which feature is more important?

  • The direction along which the data "spread the most" is the most important
    • Looks like it is the “lucky color code”

16 of 48

PCA

  • The original features can be projected onto a new set of features along the direction of widest spread

17 of 48

PCA technique

  • What is the maximum principal stress?

18 of 48

PCA

  • You were probably taught to solve this problem with Mohr's circle

19 of 48

PCA

  •  

20 of 48

PCA

  •  

21 of 48

PCA cookbook (via eigenvector decoupling)

  • Step 1: Center the data (subtract the mean of each feature)
  • Step 2: Form the covariance matrix of the centered data
  • Step 3: Find its eigenvalues and eigenvectors; the eigenvector with the largest eigenvalue is the first principal direction
  • Step 4: Project the data onto the leading eigenvectors

22 of 48

Demonstration of PCA

  • Data points: (-1,-1), (1,1), (3,3)

23 of 48

Demonstration of PCA

  • Center the data: the mean of (-1,-1), (1,1), (3,3) is (1,1); the centered points are (-2,-2), (0,0), (2,2)

24 of 48

Demonstration of PCA

  • Sample covariance of the centered points: [[4, 4], [4, 4]]; its eigenvalues are 8 and 0, with principal direction (1,1)/√2

25 of 48

Demonstration of PCA

  • Projected onto the principal direction, the points (-1,-1), (1,1), (3,3) map to -2√2, 0, 2√2; the extra point (0, 0.2) maps to about -1.27

26 of 48

PCA

  • Too many features and you don't know which one is most important? Do principal component analysis!
  • PCA uses the same concept as the maximum principal stress problem
  • Don't use Mohr's circle anymore; use eigenvalues and eigenvectors!

(https://amesweb.info/StressStrainTransformations/PlaneStressPrincipalStressCal/PrincipalStressCalculator.aspx)

(https://www.emathhelp.net/en/calculators/linear-algebra/eigenvalue-and-eigenvector-calculator/?i=%5B%5B100%2C40%5D%2C%5B40%2C70%5D%5D)
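A minimal NumPy sketch (my illustration, not slide content) of the eigenvalue route, reusing the demonstration points (-1,-1), (1,1), (3,3) and the extra point (0, 0.2) from the earlier slides:

    import numpy as np

    # Demonstration points from the PCA slides
    X = np.array([[-1.0, -1.0],
                  [ 1.0,  1.0],
                  [ 3.0,  3.0]])

    X_centered = X - X.mean(axis=0)           # subtract the mean (1, 1)
    cov = np.cov(X_centered, rowvar=False)    # sample covariance matrix: [[4, 4], [4, 4]]

    eigvals, eigvecs = np.linalg.eigh(cov)    # eigen-decomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    print(eigvals)          # approximately [8, 0]: all of the variance lies along one direction
    print(eigvecs[:, 0])    # first principal direction, proportional to (1, 1)/sqrt(2) up to sign

    # Project the data and the extra point (0, 0.2) onto the first principal component
    scores = X_centered @ eigvecs[:, 0]
    extra = (np.array([0.0, 0.2]) - X.mean(axis=0)) @ eigvecs[:, 0]
    print(scores, extra)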

27 of 48

Outline

  • Supervised Learning
    • K-nearest neighbors
    • Naïve Bayes Classifier
    • Principal Component Analysis
    • Perceptron
    • Support Vector Machine
    • Ensemble Methods

28 of 48

Perceptron

  • One of the oldest classifiers
    • The underlying neuron model dates to 1943, before modern electronic computers were built

Mark I Perceptron machine, image from Wikipedia

29 of 48

Decision boundary for LDF

  • A linear discriminant function (LDF): g(x) = wᵀx + w0; the decision boundary is the set of points where g(x) = 0

Image from: https://www.csd.uwo.ca/~oveksler/Courses/CS434a_541a/Lecture9.pdf

30 of 48

LDF for 2 classes

  • Example discriminant: g(x) = x1 - x2; decide class 1 if g(x) > 0 and class 2 if g(x) < 0

[Figure: the line g(x) = 0 with points (0,1) and (1,0) on opposite sides]

Point    g(x)
(0,1)    -1 < 0
(1,0)     1 > 0
(2,1)     ?

31 of 48

Hyperplane

  • The decision boundary g(x) = wᵀx + w0 = 0 is a hyperplane; the weight vector w is normal to it

[Figure: hyperplane separating the points (0,1) and (1,0)]

32 of 48

LDF: augmented feature vector

  • Augment the features with a constant 1: y = (1, x1, ..., xd) and a = (w0, w1, ..., wd), so that g(x) = aᵀy

33 of 48

LDF: Augmented feature vector

  •  

34 of 48

Demonstration of “augmented feature vector”

  •  

35 of 48

Summary

  • Traditionally, the perceptron is a linear discriminant for binary classification; the labels are encoded as (+1, -1).
  • The perceptron now goes by a new name: the fully connected layer
  • Multi-layer perceptron: a feedforward network built from multiple fully connected layers
    • Proposed as an extension of the perceptron in the 1960s
    • A practical training method was only developed in the 1970s
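A minimal sketch of the classic perceptron learning rule (the toy data and function name are hypothetical), using an augmented feature vector and (+1, -1) labels as described above:

    import numpy as np

    def train_perceptron(X, y, epochs=100, lr=1.0):
        """Classic perceptron rule: update the weights whenever a sample is misclassified."""
        X_aug = np.hstack([np.ones((len(X), 1)), X])   # augmented feature vector: prepend a constant 1
        a = np.zeros(X_aug.shape[1])                   # augmented weight vector [w0, w1, ..., wd]
        for _ in range(epochs):
            errors = 0
            for xi, yi in zip(X_aug, y):
                if yi * np.dot(a, xi) <= 0:            # misclassified (labels are +1 / -1)
                    a += lr * yi * xi                  # nudge the boundary toward the sample
                    errors += 1
            if errors == 0:                            # all training points classified correctly
                break
        return a

    # Toy linearly separable data, labels encoded as +1 / -1
    X = np.array([[0.0, 1.0], [0.5, 2.0], [1.0, 0.0], [2.0, 0.5]])
    y = np.array([-1, -1, +1, +1])

    a = train_perceptron(X, y)
    g = lambda x: np.dot(a, np.r_[1.0, x])             # g(x) = a . [1, x]; the sign gives the class
    print(a, g(np.array([2.0, 1.0])))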

36 of 48

Outline

  • Supervised Learning
    • K-nearest neighbors
    • Naïve Bayes Classifier
    • Principal Component Analysis
    • Perceptron
    • Support Vector Machine
    • Ensemble Methods

37 of 48

Support Vector Machine

  • Another powerful linear classifier
  • Can be used to classify multiple classes


38 of 48

Support vector machine

  • We want our separating line to lie at the "center" of the two classes
    • What is the "center"?
  • We define the margin by expanding the line until it hits a point
  • We wish the margin to be as large as possible

 

 

 

 

39 of 48

Support vector machine

  • What is a support vector?

  • What does "we want the margin as large as possible" mean?
    • We want the distance from the support vectors to the boundary to be as large as possible!

 

 

[Figure: the support vectors are the labeled points closest to the decision boundary]

40 of 48

Support vector machine: some maths

  • For a boundary wᵀx + w0 = 0 with labels yi ∈ {+1, -1}, require yi(wᵀxi + w0) ≥ 1 for every training point
  • The margin is then m = 2/||w||, so maximizing the margin is the same as minimizing ||w||²

[Figure: margin of width m between the two supporting hyperplanes]

41 of 48

One-vs-one, one-vs-rest

  • One-vs-one: train a binary SVM for every pair of classes and predict by voting

42 of 48

One-vs-one, one-vs-rest

  • One-vs-rest: train one binary SVM per class against all remaining classes and predict with the most confident one

https://www.baeldung.com/cs/svm-multiclass-classification

43 of 48

Support Vector Machine

  • A really powerful linear discriminant method
  • Reported to perform well on handwriting recognition, on par with LeNet
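A brief sketch using scikit-learn (the library choice and toy data are mine, not the slides'): it fits a linear SVM, reports the support vectors and margin width, and handles three classes with a one-vs-rest wrapper:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.multiclass import OneVsRestClassifier

    # Toy two-class data
    X = np.array([[0.0, 1.0], [0.5, 2.0], [1.0, 2.5],
                  [1.0, 0.0], [2.0, 0.5], [2.5, 1.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="linear", C=1e3)        # a large C approximates a hard margin
    clf.fit(X, y)

    w = clf.coef_[0]                         # normal vector of the separating hyperplane
    print(clf.support_vectors_)              # the points closest to the boundary
    print(2.0 / np.linalg.norm(w))           # margin width m = 2 / ||w||

    # Multi-class: SVC is one-vs-one internally; an explicit one-vs-rest wrapper also works
    X3 = np.array([[0, 0], [0, 1], [3, 0], [3, 1], [0, 4], [1, 4]], dtype=float)
    y3 = np.array([0, 0, 1, 1, 2, 2])
    ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X3, y3)
    print(ovr.predict([[3.0, 0.5], [0.5, 3.5]]))   # queries near the class-1 and class-2 clusters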

44 of 48

Outline

  • Supervised Learning
    • K-nearest neighbors
    • Principal Component Analysis
    • Perceptron
    • Support Vector Machine
    • Ensemble Methods

45 of 48

Ensemble methods

  • Bagging (Breiman 1994,...): Bootstrap aggregating
  • Random forests (Breiman 2001,...)
  • Boosting (Freund and Schapire 1995, Friedman et al. 1998,...)

Predict the class label of unseen data by aggregating the predictions of a set of classifiers learned from the training data

46 of 48

General Idea

47 of 48

Ensemble methods

All ensemble methods consist of three steps (sketched in code below):

  • Step 1: Generate multiple sub-datasets via a re-sampling technique
    • Bootstrapping (sampling with replacement)
    • Pasting (sampling without replacement)
  • Step 2: Train a classifier on each sub-dataset
    • Use the same type of model for every sub-dataset (e.g. SVM, Naïve Bayes)
  • Step 3: Aggregate (combine) all classifiers
    • Voting is the most popular choice for classification problems
    • Averaging the model outputs (especially for regression problems)
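A minimal sketch of these three steps (my own illustration, assuming scikit-learn decision trees as the base model): bootstrap re-sampling, one classifier per sub-dataset, and aggregation by majority vote:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def bagging_fit(X, y, n_models=10):
        """Steps 1-2: draw bootstrap samples and train one classifier per sample."""
        models = []
        n = len(X)
        for _ in range(n_models):
            idx = rng.integers(0, n, size=n)                   # bootstrap: sample n indices with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Step 3: aggregate by majority vote across all classifiers."""
        votes = np.array([m.predict(X) for m in models])       # shape (n_models, n_queries)
        return np.array([np.bincount(col).argmax() for col in votes.T.astype(int)])

    # Toy data: two noisy clusters centered at (0, 0) and (3, 3)
    X = rng.normal(size=(40, 2)) + np.repeat([[0.0, 0.0], [3.0, 3.0]], 20, axis=0)
    y = np.repeat([0, 1], 20)

    models = bagging_fit(X, y)
    print(bagging_predict(models, np.array([[0.2, 0.1], [2.8, 3.1]])))   # queries near each cluster center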

48 of 48

Summary on Bagging

  • If the majority vote is correct, the ensemble is guaranteed to pick the correct answer
  • If the majority vote is wrong, the ensemble is guaranteed to pick the wrong answer