Machine Learning – Basic Principles & Practice
7. Occam’s Razor

Cong Li 李聪

Should You Still Remember …

Sense                              Example
interest (a hobby)                 Reading is one of my interests.
interest (a stake, benefit)        In this case, my interests conflict with yours.
interest (a charge on a loan)      The Fed cut interest rates again.

The best performance was achieved using a selected set of the most useful attributes.

Reality

  • The ‘Best’ Performance
    • Typically not achievable in practice
  • Reasons
    • Difficult to rate attribute importance
    • Difficult to determine the number of attributes to keep
      • Too few attributes: poor performance even w/ a large amount of training data

Can We Do Better?

  • It’s Not Easy
  • Introduce a New Principle
    • Occam’s Razor, attributed to the philosopher William of Ockham

Different Formulations

Entities should not be multiplied without necessity.

The simplest solution is most likely the right one.

When presented with competing hypotheses that make the same predictions, one should select the one with the fewest assumptions.

In Machine Learning

  • Make the Same Predictions
    • Perform similarly on the training data
  • The Fewest Assumptions
    • Assumptions that make the candidate classifiers ‘simple’, e.g.,
      • Use the fewest attributes
      • The ‘classification capability’ of the candidate classifier set is weak

Illustration: Fewer Attributes (1)

[Figure: data plotted against Attribute 1 and Attribute 2; a classifier that uses both attributes]

Illustration: Fewer Attributes (1)

[Figure: the same data; a classifier that uses only one attribute]

Illustration: Fewer Attributes (1)

[Figure: the two categories of classifiers side by side]

Which category of classifiers is simpler?

Which category’s classification capability is weaker?

Illustration: Fewer Attributes (2)

[Figure: training data plotted against Attribute 1 and Attribute 2, with two lines that both separate it]

Both lines separate the training data correctly. Which one do we choose?

Illustration: Fewer Attributes (2)

[Figure: the same training data plus additional data]

More data reveals the answer.

Unfortunately

  • Given More Attributes
    • It is intractable to exhaust the combinations of a certain number of attributes
  • The Alternative
    • A broader definition of the ‘classification capability’ of the candidate classifier set
      • Starting from something intuitive: the margin
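To see why exhausting attribute combinations is intractable: with n attributes there are 2^n subsets to try, and even the subsets of a single fixed size number in the trillions for modest n. A quick illustration (the attribute counts are made up for this sketch):

```python
from math import comb

n = 50  # an illustrative, modest number of attributes

# Subsets of exactly 25 attributes out of 50:
fixed_size = comb(n, 25)

# All subsets of any size, summed over sizes 0..n:
all_subsets = sum(comb(n, k) for k in range(n + 1))

print(fixed_size)          # already over 10^14 combinations
print(all_subsets == 2**n) # True: the subsets double with every added attribute
```

Exhaustive search is therefore hopeless once the attribute count grows, which motivates the broader notion of capability introduced next.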

Margin to a Data Sample

[Figure: a linear classifier in the Attribute 1–Attribute 2 plane; the margin is the distance from a data sample to the classifier’s boundary]
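For a linear classifier w · x + b = 0, the margin of a sample x is its distance to the boundary, |w · x + b| / ‖w‖. A minimal sketch in plain Python (the weight and sample values are made up for illustration):

```python
import math

def margin(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# The line x1 + x2 - 1 = 0; the point (1, 1) lies at distance 1/sqrt(2) from it.
print(margin([1.0, 1.0], -1.0, [1.0, 1.0]))  # 0.7071...
```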

Large Margin Classification

[Figure: training data separated by a line that keeps a clear distance from every sample]

Not only separates the training data correctly

But also maintains a certain margin to all the data samples

Large Margin Classification

[Figure: training data separated by a line that passes very close to one sample]

Only separates the training data correctly

Very small margin to one particular sample

Large Margin Classification

[Figure: the large-margin classifier shown with unseen data added]

Which one performs better?

Assuming an ‘even’ data distribution, the large margin provides a ‘safe’ buffer area.

Large Margin Classification

[Figure: the small-margin classifier shown with unseen data added]

Which one performs better?

The ‘safe’ buffer area is lost here.

Learning Linear Classifiers (1)

  • Classifier Learned
    • Usually depends on the training samples near the classification boundary

[Figure: training samples near the classification boundary highlighted]

Learning Linear Classifiers (2)

  • Learning
    • Separate the training data correctly
    • Leave a certain margin
      • Classify the training data ‘confidently’
  • Intuitive Explanation
    • More likely to tolerate unseen samples that lie close to the training samples near the classification boundary

Perceptron Algorithm w/ Margins
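A common formulation of the perceptron w/ margins updates on any training sample whose functional margin falls below a required threshold γ, not only on outright mistakes as the plain perceptron does. A minimal sketch of that idea (the data, learning rate, and γ here are illustrative, not taken from the slides):

```python
def perceptron_with_margin(samples, labels, gamma=1.0, lr=0.1, epochs=100):
    """Learn a linear classifier w . x + b, updating whenever a sample's
    functional margin y * (w . x + b) is <= gamma."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= gamma:  # wrong side, or inside the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy separable data: class +1 around (2, 2), class -1 around (-2, -2).
X = [(2.0, 2.0), (2.5, 1.5), (-2.0, -2.0), (-1.5, -2.5)]
y = [+1, +1, -1, -1]
w, b = perceptron_with_margin(X, y)
print(all(yi * (w[0] * xi[0] + w[1] * xi[1] + b) > 0 for xi, yi in zip(X, y)))
```

Setting gamma = 0 recovers the simple perceptron; a positive gamma forces the learned boundary to keep a buffer around the training samples.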

Practice 7.1

Practice time: use the perceptron w/ margins on the synthetic problem.

How does it perform compared w/ the simple perceptron?

Result

The perceptron w/ margins performs better.

Example (1)

[Figure, four boundaries: the ground-truth classification boundary; learned w/ 7 attributes w/o margins; learned w/ 11 attributes w/o margins; learned w/ 11 attributes w/ margins]

Example (2)

        w1     w2     w3     w4     w5     w6     w7     w8     w9     w10    w11    b
T       1.02   0.93  -0.86   0.46   0.33   0.30   0.26   0.16   0.04   0.03   0.00   1.00
7       0.47   1.17  -0.48   0.51   0.36  -0.12   1.12   0.00   0.00   0.00   0.00   1.00
11      0.44   0.99  -0.32   0.70   0.55   0.29   1.55  -0.76   0.36   0.35   1.09   1.00
11-M    0.40   0.69  -0.69   0.62   0.58  -0.19   0.56  -0.65   0.12  -0.10   0.72   1.00

(T: ground truth; 7: learned w/ 7 attributes; 11: learned w/ 11 attributes; 11-M: learned w/ 11 attributes w/ margins)

When learning w/ 11 attributes, margins alleviate the problem of overrated or incorrectly rated attribute weights.

What’s the Insight?

  • Perceptron w/ Margins
    • Performs better on the synthetic problem
  • The Question
    • How is that relevant to Occam’s Razor, in terms of the classification capability of the candidate classifier set?
    • We start from a math theorem

Math Theorem

A classifier is learned that correctly classifies all the training samples.

Upper Bound of Error

The upper bound on the error increases when we tolerate more exceptions (training errors).

Upper Bound of Error

The upper bound decreases as the amount of training data grows: having seen & adapted to more data, we are less likely to encounter unexpected cases in the future.

Upper Bound of Error

The upper bound grows with the classification capability of the candidate classifier set: Occam’s Razor in machine learning, proved in math.

We are not going through the proof or the exact definition of the VC dimension (classification capability), but will provide an intuitive explanation.
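One standard textbook statement of such a bound (Vapnik’s VC generalization bound; supplied here as the usual form, which may differ in constants from the version on the slide) is: with probability at least 1 − η over N training samples, a classifier drawn from a candidate set of VC dimension h satisfies

```latex
R \;\le\; R_{\mathrm{emp}} \;+\; \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}
```

where R is the expected (test) error and R_emp the training error. The bound grows with the training error tolerated and with the capability h, and shrinks as N grows, matching the three observations above.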

Classification Capability

  • Candidate Classifier Set
    • Contains many candidate classifiers
  • Its Capability
    • If we can always handle various complex classification problems w/ the right candidate classifiers from the set
    • Then its capability is strong, & vice versa

VC Dimension

  • Roughly: the largest number of data points for which the candidate classifier set can realize every possible labeling (‘shatter’ them); it measures the set’s classification capability
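The shattering idea can be checked by brute force for lines in the plane: three non-collinear points can be labeled in all 2³ = 8 ways by linear classifiers, so the VC dimension of lines in 2D is at least 3. A small sketch (the points are illustrative) that tests each labeling with a plain perceptron, which converges whenever the labeling is linearly separable:

```python
import itertools

# Three non-collinear points in the plane (illustrative choice).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

def separable(labels, max_epochs=1000):
    """Run the perceptron; on linearly separable data it converges
    (a full epoch with no updates), otherwise we give up."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        updated = False
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) <= 0:
                w1 += y * x1
                w2 += y * x2
                b += y
                updated = True
        if not updated:
            return True  # every point classified correctly
    return False

# Every one of the 8 labelings is realizable by some line:
shattered = all(separable(labels)
                for labels in itertools.product([-1, +1], repeat=3))
print(shattered)  # True
```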

Margin & VC Dimension (1)

[Figure: given some data in the Attribute 1–Attribute 2 plane, any line is a candidate classifier when no margin is required]

Margin & VC Dimension (1)

[Figure: once a required margin is set, the lines passing too close to the data are eliminated from the candidate set; fewer candidates remain in the set]

Margin & VC Dimension (2)
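A standard result connecting the margin to the VC dimension (the radius–margin bound; given here as context rather than as the slide’s exact statement): for linear classifiers that achieve margin at least γ on data contained in a ball of radius R in d dimensions, the VC dimension h of that restricted candidate set satisfies

```latex
h \;\le\; \min\!\left(\left\lceil \frac{R^{2}}{\gamma^{2}} \right\rceil,\; d\right) + 1
```

so requiring a larger γ directly caps the capability of the remaining candidates.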

Margin & VC Dimension (3)

Larger margin → weaker classification capability of the remaining candidate set → lower upper bound of error

Practice 7.2

Practice time: use the perceptron w/ margins on word sense disambiguation.

Does it perform better?

Result

Methods                   Accuracy
Nearest neighbor          69.3%
Perceptron                83.6%
Perceptron w/ margins     85.7%

The End