Machine Learning - Basic Principles & Practice

3. Nearest Neighbor Classification

Cong Li 李聪

Rote Learning Is Impractical

Machine Learning – Basic Principles & Practice: 3. Nearest Neighbor Classification

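Recall the example:

Table (cell values not preserved): columns Patient, Fever, Nasal congestion, Runny nose, Chills, Headache, Cold; rows for patients 1 to 9 plus a new patient *, whose Cold entry is marked ???.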

Can We Make a Simple Improvement?

Idea (1)

He who walks with wise men will be wise, but the companion of fools will suffer harm

- Proverbs 13:20

Near vermilion, one turns red; near ink, one turns black.

- Taizi Shaofu Zhen (《太子少傅箴》)

Show me who your friends are, and I’ll tell you who you are.

- Old adage

Idea (2)

An Abstract Example

A Concrete Example

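Table (cell values not preserved): columns Patient, Fever, Nasal congestion, Runny nose, Chills, Headache, Cold; rows for patients 1 to 9 plus a new patient *, whose Cold entry is marked ?.

Now we have improved on rote learning!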

Nearest Neighbor Classification

  • Training
    • Memorize all the training data
      • Like rote learning
  • Classification
    • Find the closest neighbor (the most similar training data sample)
    • Take the neighbor’s class
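The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's reference code: the tiny training set, the function names `train`, `distance`, and `classify`, and the mismatch-counting distance are all assumptions made for the example.

```python
# Hypothetical training data: (symptom vector, class label).
# Each vector lists five yes/no symptoms as 1/0; the values are invented.
TRAINING_DATA = [
    ((1, 1, 1, 0, 1), "cold"),
    ((0, 0, 0, 0, 1), "no cold"),
    ((1, 0, 1, 1, 0), "cold"),
    ((0, 0, 0, 0, 0), "no cold"),
]


def train(samples):
    """Training: simply memorize every sample, as in rote learning."""
    return list(samples)


def distance(a, b):
    """Assumed distance: the number of attributes on which a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, query):
    """Classification: find the closest memorized sample, take its class."""
    _, nearest_label = min(
        memory, key=lambda sample: distance(sample[0], query)
    )
    return nearest_label


memory = train(TRAINING_DATA)
# The query differs from the first training sample in one attribute only.
print(classify(memory, (1, 1, 0, 0, 1)))  # cold
```

Unlike rote learning, the query does not have to match a memorized sample exactly; the closest one decides the class.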

Do You Still Remember?

No assumption, no learning!

Now where is the assumption?

Assumption

  • Similar data samples come from the same class
  • Embedded (implicit) assumption
    • How do you define similarity?
      • Different definitions imply different assumptions
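To make the last point concrete, here is a small sketch in which two common distance definitions disagree about which training sample is nearest, and therefore about the predicted class. The points, labels, and function names are invented for illustration; the two definitions used are the sum of absolute differences and the straight-line (Euclidean) distance.

```python
import math

# Hypothetical training samples: (attribute vector, class label).
SAMPLES = [((0.0, 3.0), "A"), ((2.0, 2.0), "B")]
QUERY = (0.0, 0.0)


def sum_of_absolute_differences(a, b):
    """Add up the per-attribute absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(samples, query, dist):
    """Class of the sample that is closest to query under dist."""
    return min(samples, key=lambda s: dist(s[0], query))[1]


# Sum of absolute differences: A is at 3, B is at 4, so A is nearer.
print(nearest_label(SAMPLES, QUERY, sum_of_absolute_differences))  # A
# Euclidean: A is at 3, B is at sqrt(8), about 2.83, so B is nearer.
print(nearest_label(SAMPLES, QUERY, euclidean))  # B
```

The data are identical in both calls; only the definition of similarity changed, and with it the answer.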

Similarity/Distance

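Figure out the difference in values for each attribute, and then aggregate the differences.

Example with symptom attributes (patients a and b; the individual symptom values were not preserved, only the per-attribute differences):

Attributes: Fever, Nasal congestion, Runny nose, Chills, Headache

Differences: 0 + 0 + 1 + 0 + 1 = 2

Example with numeric attributes (Attribute 1, Attribute 2, Attribute 3, ..., Attribute n):

First sample: 2.1, 6.1, -1.7, ..., 7.2

Second sample: 5.6, -1.3, -3.3, ..., 0

Absolute differences: 3.5 + 7.4 + 1.6 + ... + 7.2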
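The "difference per attribute, then aggregate" recipe can be sketched as follows, assuming the aggregation is a plain sum of absolute differences, which is what the numbers on this slide suggest. The function name `sum_abs_diff` is hypothetical. The numeric example uses only the four attribute values that are visible and ignores any attributes elided between 3 and n; the patient values are invented so that they reproduce the per-attribute differences 0, 0, 1, 0, 1.

```python
def sum_abs_diff(a, b):
    """Sum the per-attribute absolute differences between two samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of attributes")
    return sum(abs(x - y) for x, y in zip(a, b))


# Numeric example: only the four attribute values visible on the slide.
sample_1 = (2.1, 6.1, -1.7, 7.2)
sample_2 = (5.6, -1.3, -3.3, 0.0)
# Adds up 3.5 + 7.4 + 1.6 + 7.2 (rounded to hide floating-point noise).
print(round(sum_abs_diff(sample_1, sample_2), 1))

# Symptom example: hypothetical values (1 = present, 0 = absent)
# chosen so that the per-attribute differences are 0, 0, 1, 0, 1.
patient_a = (1, 0, 1, 0, 1)
patient_b = (1, 0, 0, 0, 0)
print(sum_abs_diff(patient_a, patient_b))  # 2
```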

Practice 3.1

Practice time: try nearest neighbor classification

Use nearest neighbor classification to recognize handwritten ZIP codes
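As a starting point for this exercise, the sketch below applies nearest neighbor classification to images by flattening each pixel grid into one attribute vector. The ZIP code data set used in the course is not included here, so the invented 3x3 black-and-white "images" are stand-ins only, and all names are hypothetical.

```python
# Invented 3x3 black-and-white "images" standing in for scanned digits.
# 1 = ink, 0 = blank. Real ZIP code images are much larger.
TRAINING_IMAGES = [
    ([[1, 1, 1],
      [1, 0, 1],
      [1, 1, 1]], 0),
    ([[0, 1, 0],
      [0, 1, 0],
      [0, 1, 0]], 1),
    ([[1, 1, 0],
      [0, 1, 0],
      [0, 1, 0]], 1),
]


def flatten(image):
    """Turn a 2-D pixel grid into one flat attribute vector."""
    return [pixel for row in image for pixel in row]


def distance(a, b):
    """Sum of per-pixel absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(training_images, image):
    """Return the digit label of the most similar training image."""
    query = flatten(image)
    _, best_label = min(
        training_images, key=lambda item: distance(flatten(item[0]), query)
    )
    return best_label


# A slightly damaged "0": one corner pixel is missing.
unknown = [[1, 1, 1],
           [1, 0, 1],
           [1, 1, 0]]
print(classify(TRAINING_IMAGES, unknown))  # 0
```

The damaged image matches no training image exactly, yet it is only one pixel away from the stored "0", so that label is returned.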

Practice 3.2

Practice time: accelerate nearest neighbor classification

Use the NumPy library
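One way the speed-up might look, assuming the training samples fit in a single 2-D NumPy array: broadcasting computes the distance from the query to every training sample in one vectorized step, replacing the Python-level loop. The array names and values are invented, and the distance is the sum of absolute differences.

```python
import numpy as np

# Hypothetical data: one training sample per row, one label per row.
train_x = np.array([
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
train_y = np.array(["cold", "no cold", "cold", "no cold"])


def classify(train_x, train_y, query):
    """Label of the training row with the smallest summed absolute difference."""
    # Broadcasting subtracts the query from every row in one step.
    distances = np.abs(train_x - query).sum(axis=1)
    # argmin returns the index of the first smallest distance.
    return train_y[np.argmin(distances)]


print(classify(train_x, train_y, np.array([1, 1, 0, 0, 1])))  # cold
```

The gain comes from moving the per-attribute arithmetic out of the Python interpreter and into NumPy's compiled array operations; the algorithm itself is unchanged.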

Euclidean Distance (1)

Figure: data sample 1 at (-3, 1) and data sample 2 at (1, -2), plotted against Attribute 1 and Attribute 2. The two samples are 4 apart along Attribute 1 and 3 apart along Attribute 2.
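The formula on this slide did not survive extraction. Assuming it is the usual Pythagorean computation, with the gaps 4 and 3 as the two legs of a right triangle, the distance between the two samples works out as:

```latex
d = \sqrt{(1 - (-3))^2 + (-2 - 1)^2} = \sqrt{4^2 + 3^2} = \sqrt{25} = 5
```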

 

 

Euclidean Distance (2)

Figure: data sample 1 at (-3, 1, -4) and data sample 2 at (1, -2, 8), plotted against Attribute 1, Attribute 2, and Attribute 3. The two samples are 4 apart along Attribute 1, 3 apart along Attribute 2, and 12 apart along Attribute 3.
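The formula on this slide was also lost. Assuming the same Pythagorean reasoning extended to a third attribute, the distance would be:

```latex
d = \sqrt{(1 - (-3))^2 + (-2 - 1)^2 + (8 - (-4))^2} = \sqrt{4^2 + 3^2 + 12^2} = \sqrt{169} = 13
```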

Euclidean Distance (3)

Spaces with more than 3 dimensions are beyond what we can visualize.

However, the calculation is similar.
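Because the calculation does not depend on the number of attributes, it can be written once for any dimension: sum the squared per-attribute differences and take the square root. The sketch below assumes samples are given as equal-length sequences of numbers; the function name `euclidean_distance` and the 5-attribute pair are made up for illustration.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared per-attribute differences."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The 2-D and 3-D samples from the earlier Euclidean distance slides.
print(euclidean_distance((-3, 1), (1, -2)))         # 5.0
print(euclidean_distance((-3, 1, -4), (1, -2, 8)))  # 13.0
# A made-up 5-attribute pair with differences 1, 1, 1, 1, 0.
print(euclidean_distance((1, 2, 3, 4, 5), (2, 3, 4, 5, 5)))  # 2.0
```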

 

Practice 3.3

Practice time: try nearest neighbor classification with Euclidean distance

Use Euclidean distance to recognize handwritten ZIP codes
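A possible shape for this exercise, assuming each image has already been flattened into one row of a NumPy array of pixel intensities. The data below are invented stand-ins, not the ZIP code data set, and the names are hypothetical. `np.linalg.norm` with `axis=1` computes one Euclidean distance per training row.

```python
import numpy as np

# Hypothetical stand-in data: each row is one flattened image.
train_x = np.array([
    [0.0, 0.9, 0.1, 0.8],
    [1.0, 0.1, 0.9, 0.0],
    [0.1, 1.0, 0.0, 0.7],
])
train_y = np.array([1, 7, 1])


def classify_euclidean(train_x, train_y, query):
    """Label of the training row closest to query in Euclidean distance."""
    distances = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(distances)]


query = np.array([0.9, 0.0, 1.0, 0.1])
print(classify_euclidean(train_x, train_y, query))  # 7
```

Since the square root is monotonic, comparing squared distances would pick the same neighbor, so the root can be skipped when only the nearest sample is needed.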


The End