Machine Learning - Basic Principles & Practice
7. Occam's Razor
Cong Li 李聪
Should You Still Remember …
Sense | Example |
Personal interest (hobby) | Reading is one of my interests |
Benefit / stake | In this case, my interests conflict with yours |
Financial interest | The Fed cut interest rates again |
… | … |
Best performance was achieved using a selected subset of the most useful attributes
Reality
Can We Do Better?
Different Formulations
Entities should not be multiplied without necessity.
The simplest solution is most likely the right one.
When presented with competing hypotheses that make the same predictions, select the one with the fewest assumptions.
In Machine Learning
Illustration: Fewer Attributes (1)
[Figure: training data plotted in the Attribute 1-Attribute 2 plane; the classifier uses both attributes]
Illustration: Fewer Attributes (1)
[Figure: the same training data in the Attribute 1-Attribute 2 plane; the classifier uses only one attribute]
Illustration: Fewer Attributes (1)
[Figure: the two kinds of classifiers shown side by side in the Attribute 1-Attribute 2 plane]
Which category of classifiers is simpler?
Which category's classification capability is weaker?
Illustration: Fewer Attributes (2)
[Figure: training data in the Attribute 1-Attribute 2 plane, separated by two different lines]
Both lines separate the training data correctly. Which one do we choose?
Illustration: Fewer Attributes (2)
[Figure: the same plot with additional data points added]
With more data, the answer becomes clear.
Unfortunately
Margin to a Data Sample
[Figure: a linear classifier in the Attribute 1-Attribute 2 plane, a data sample, and the margin between them]
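For reference, the margin in this picture can be written in one line; this is the standard formula for a linear classifier, with the symbols $w$, $b$, $x$, $y$ chosen here rather than taken from the slide. For a classifier $w \cdot x + b = 0$ and a labeled sample $(x, y)$ with $y \in \{-1, +1\}$, the signed margin is

$$\gamma(x, y) = \frac{y \, (w \cdot x + b)}{\lVert w \rVert},$$

which is positive exactly when the sample is classified correctly, and whose magnitude is the Euclidean distance from $x$ to the decision boundary.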
Large Margin Classification
[Figure: training data in the Attribute 1-Attribute 2 plane with a large-margin separating line]
Not only separates the training data correctly,
but also maintains a certain margin to all the data samples.
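Written as an objective (a sketch using the margin notation above; the slide states this goal only in words), a large-margin learner looks for the separating line whose smallest margin over the training samples $(x_i, y_i)$ is as large as possible:

$$\max_{w, b} \; \min_i \; \frac{y_i \, (w \cdot x_i + b)}{\lVert w \rVert}.$$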
Large Margin Classification
[Figure: the same training data with a separating line that passes very close to one sample]
Only separates the training data correctly;
the margin to one of the samples is very small.
Large Margin Classification
[Figure: training data and unseen data in the Attribute 1-Attribute 2 plane, with the large-margin classifier]
Which one performs better?
Assuming an 'even' data distribution, a large margin provides a 'safe' buffer area.
Large Margin Classification
[Figure: the same training and unseen data, with the small-margin classifier]
Which one performs better?
The 'safe' buffer area is lost here.
Learning Linear Classifiers (1)
Training samples around the classification boundary
Learning Linear Classifiers (2)
Perceptron Algorithm w/ Margins
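A minimal sketch of the perceptron-with-margins idea (not the course's reference implementation; the names train_perceptron, margin, epochs, and lr are illustrative). The only change from the simple perceptron is that an update fires whenever the functional margin falls below a required threshold, not only on a misclassification:

import numpy as np

def train_perceptron(X, y, margin=0.0, epochs=100, lr=1.0):
    """Perceptron training on attributes X (n x d) and labels y in {-1, +1}.
    With margin > 0, an update is made whenever y * (w.x + b) <= margin,
    so margin = 0.0 recovers the simple perceptron."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        updated = False
        for i in range(n):
            # functional margin of sample i under the current classifier
            if y[i] * (X[i] @ w + b) <= margin:
                w += lr * y[i] * X[i]
                b += lr * y[i]
                updated = True
        if not updated:   # every sample already clears the required margin
            break
    return w, b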
Practice 7.1
Practice time: use the perceptron w/ margins on the synthetic problem.
How does it perform compared with the simple perceptron?
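A self-contained sketch of the comparison this practice asks for, on a made-up two-dimensional dataset standing in for the course's synthetic problem (the data, the margin value 1.0, and all names here are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

# Stand-in synthetic problem: two Gaussian clusters, labels in {-1, +1}.
n = 200
X = np.vstack([rng.normal([+1.0, +1.0], 0.8, size=(n, 2)),
               rng.normal([-1.0, -1.0], 0.8, size=(n, 2))])
y = np.hstack([np.ones(n), -np.ones(n)])
idx = rng.permutation(2 * n)                 # shuffle, then split
X_tr, y_tr, X_te, y_te = X[idx[:300]], y[idx[:300]], X[idx[300:]], y[idx[300:]]

def train(X, y, margin=0.0, epochs=200):
    # margin = 0.0 is the simple perceptron; margin > 0 requires a buffer.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= margin:
                w, b = w + yi * xi, b + yi
    return w, b

for m in (0.0, 1.0):                          # simple vs. margin perceptron
    w, b = train(X_tr, y_tr, margin=m)
    acc = np.mean(np.sign(X_te @ w + b) == y_te)
    print(f"margin={m}: test accuracy = {acc:.3f}")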
Result
Perceptron w/ margins performs better
Example (1)
Ground truth classification boundary
Learned w/ 7 attributes, w/o margins
Learned w/ 11 attributes, w/o margins
Learned w/ 11 attributes, w/ margins
Example (2)
Model | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | b |
Ground truth (T) | 1.02 | 0.93 | -0.86 | 0.46 | 0.33 | 0.30 | 0.26 | 0.16 | 0.04 | 0.03 | 0.00 | 1.00 |
7 attributes, w/o margins (7) | 0.47 | 1.17 | -0.48 | 0.51 | 0.36 | -0.12 | 1.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
11 attributes, w/o margins (11) | 0.44 | 0.99 | -0.32 | 0.70 | 0.55 | 0.29 | 1.55 | -0.76 | 0.36 | 0.35 | 1.09 | 1.00 |
11 attributes, w/ margins (11-M) | 0.40 | 0.69 | -0.69 | 0.62 | 0.58 | -0.19 | 0.56 | -0.65 | 0.12 | -0.10 | 0.72 | 1.00 |
When learning with 11 attributes, adding margins alleviates the overrating / incorrect rating of weights (compare the rows with and without margins).
What's the Insight?
Math Theorem
Suppose a classifier has been learned that classifies all training samples correctly.
Upper Bound of Error
Because we tolerate more exceptions
Upper Bound of Error
Because we have seen and adapted to more data, we are less likely to encounter unexpected cases in the future.
Upper Bound of Error
Occam's Razor in machine learning (mathematically proven)
We will not go through the proof or the exact definition of the VC dimension (classification capability), but we will give an intuitive explanation.
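For reference, one common form of the bound being alluded to, stated from standard VC theory rather than from the course's own notation: with probability at least $1 - \delta$ over $n$ i.i.d. training samples, every classifier $h$ in a family of VC dimension $d$ satisfies

$$R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}},$$

where $R(h)$ is the true error and $\hat{R}(h)$ the training error. The first term grows when we tolerate more training exceptions; the second term shrinks as the amount of data $n$ grows and as the classification capability $d$ shrinks.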
Classification Capability
VC Dimension
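A standard concrete value worth keeping in mind (quoted from the literature, not derived on these slides): for linear classifiers on $k$ attributes,

$$\mathrm{VCdim}\big(\{\, x \mapsto \operatorname{sign}(w \cdot x + b) \,\}\big) = k + 1,$$

so using fewer attributes directly lowers the classification capability that enters the error bound above.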
Margin & VC Dimension (1)
[Figure: some data points in the Attribute 1-Attribute 2 plane with many candidate separating lines]
Given some data, without a margin requirement any line is a candidate classifier.
Margin & VC Dimension (1)
[Figure: the same data with a required margin drawn; lines that violate the margin are removed]
Given some data, setting a required margin eliminates some lines from the candidate set.
Fewer candidates remain in the set.
Margin & VC Dimension (2)
Margin & VC Dimension (3)
Larger margin
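The quantitative version of this intuition, quoted in its standard textbook form (a hedged statement, not something proved on these slides): if all data lie in a ball of radius $R$ and only linear classifiers achieving margin at least $\gamma$ on that data are admitted, then roughly

$$\mathrm{VCdim} \;\le\; \min\!\left(\frac{R^2}{\gamma^2},\; k\right) + 1,$$

with $k$ the number of attributes. A larger required margin therefore means a smaller candidate set, a lower classification capability, and a lower upper bound on error.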
Practice 7.2
Practice time: use the perceptron w/ margins on word sense disambiguation.
Does it perform better?
Result
Method | Accuracy |
Nearest neighbor | 69.3% |
Perceptron | 83.6% |
Perceptron w/ margin | 85.7% |
The End