Machine Learning – Basic Principles & Practice: Other Topic: Modern Paradigm
Cong Li 李聪
机器学习 – 基础原理与实践
番外:现代范式
Deep Learning 深度学习
Machine Learning – Basic Principles & Practice: Other Topic: Modern Paradigm
机器学习 – 基础原理与实践:番外:现代范式
High Dimensional Features Learned 学习到的高维特征
Recall 回忆
16x16
14x14
1 3x3 convolution → 1 new feature
一个3x3卷积
→ 一个新特征
Recall 回忆
16x16
32 convolutions
→ 32 new features
32个卷积
→ 32个新特征
14x14
Recall 回忆
16x16
14x14
14x14x32
32 features for each 14x14 location
14x14的每个位置上都有32个特征
Each 3x3 convolution operator learned, not prescribed
每个3x3卷积算子都是学习到的,而非人工指定
Recall 回忆
16x16
14x14
14x14x32
1 3x3x32 convolution → 1 new feature
一个3x3x32卷积
→ 一个新特征
12x12
Recall 回忆
16x16
14x14
14x14x32
64 convolutions
→ 64 new features
64个卷积
→ 64个新特征
12x12
Recall 回忆
16x16
14x14
14x14x32
12x12
12x12x64
64 features for each 12x12 location
12x12的每个位置上都有64个特征
Each 3x3x32 convolution operator learned, not prescribed
每个3x3x32卷积算子都是学习到的,而非人工指定
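The shape bookkeeping in the Recall slides can be checked with a minimal sketch. It assumes no padding and stride 1 (which is what the 16x16 → 14x14 → 12x12 sizes imply), a single input channel, and random numbers standing in for the learned operators. The function name `conv_valid` is hypothetical, and the operation is the sliding-window product-and-sum that deep learning calls convolution.

```python
import numpy as np

def conv_valid(x, kernels):
    """Slide K x K x C_in operators over x with no padding and stride 1.

    x:       (H, W, C_in) input features
    kernels: (K, K, C_in, C_out), one operator per new feature
    returns: (H-K+1, W-K+1, C_out)
    """
    H, W, _ = x.shape
    K, _, _, C_out = kernels.shape
    out = np.empty((H - K + 1, W - K + 1, C_out))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = x[i:i + K, j:j + K, :]  # K x K x C_in window
            # Contract the window with every operator: one number per new feature.
            out[i, j, :] = np.tensordot(patch, kernels, axes=3)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16, 1))      # 16x16 input
ops_1 = rng.normal(size=(3, 3, 1, 32))    # 32 operators of size 3x3 (random, not learned)
ops_2 = rng.normal(size=(3, 3, 32, 64))   # 64 operators of size 3x3x32

h1 = conv_valid(image, ops_1)
h2 = conv_valid(h1, ops_2)
print(h1.shape, h2.shape)  # (14, 14, 32) (12, 12, 64)
```

In training, the entries of `ops_1` and `ops_2` would be the quantities being learned.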
Another Example 另一个例子
| | Film 1 电影1 | Film 2 电影2 | Film 3 电影3 |
| User 1 用户1 | 8 | 3 | |
| User 2 用户2 | 7 | | 9 |
| User 3 用户3 | | 5 | 8 |
| User 4 用户4 | 2 | | 3 |
Another Example 另一个例子
| | Film 1 电影1 | Film 2 电影2 | Film 3 电影3 |
| User 1 用户1 | 8 | 3 | ? |
| User 2 用户2 | 7 | ? | 9 |
| User 3 用户3 | ? | 5 | 8 |
| User 4 用户4 | 2 | ? | 3 |
High Dimensional Features 高维特征属性
| # | Film 电影本身 | User preference 用户关注 |
| 1 | Horror film? 是否为恐怖片? | Prefer horror films? 是否喜欢恐怖片? |
| 2 | A certain actor's presence & performance 某个演员出镜率和表现 | Favor the actor? 是否关注某个演员 |
| 3 | Directed by a certain director 为某个导演执导 | Favor the director? 偏好某个导演的影片 |
| … | … | … |
The features above are imagined for illustration only. In practice the feature semantics are learned automatically, not prescribed. 这里只是想象。特征语义自动习得,并非事先预设
Model 模型
| | Film 1 电影1 | Film 2 电影2 | Film 3 电影3 |
| User 1 用户1 | 8 | 3 | |
| User 2 用户2 | 7 | | 9 |
| User 3 用户3 | | 5 | 8 |
| User 4 用户4 | 2 | | 3 |
Rating of user 3 for film 2
用户3对电影2的评分
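The extracted slide text does not spell out how a rating is computed from the features. The sketch below assumes the common matrix-factorization choice: each user and each film has a feature vector, and the predicted rating is their dot product. The number of features (4), the random values, and the name `predict_rating` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_films, n_features = 4, 3, 4  # n_features is an arbitrary choice

# One feature vector per user and per film (random here; learned in training).
user_features = rng.normal(size=(n_users, n_features))
film_features = rng.normal(size=(n_films, n_features))

def predict_rating(user, film):
    """Assumed model: rating = dot product of the two feature vectors (0-based indices)."""
    return float(user_features[user] @ film_features[film])

# Rating of user 3 for film 2 (indices 2 and 1).
r = predict_rating(2, 1)

# All user/film pairs at once: a 4x3 matrix of predicted ratings.
all_ratings = user_features @ film_features.T
print(r, all_ratings.shape)
```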
Training 训练
| | Film 1 电影1 | Film 2 电影2 | Film 3 电影3 |
| User 1 用户1 | 8 | 3 | |
| User 2 用户2 | 7 | | 9 |
| User 3 用户3 | | 5 | 8 |
| User 4 用户4 | 2 | | 3 |
Prediction 预测
| | Film 1 电影1 | Film 2 电影2 | Film 3 电影3 |
| User 1 用户1 | 8 | 3 | ? |
| User 2 用户2 | 7 | | 9 |
| User 3 用户3 | | 5 | 8 |
| User 4 用户4 | 2 | | 3 |
Prediction 预测
| | Film 1 电影1 | Film 2 电影2 | Film 3 电影3 |
| User 1 用户1 | 8 | 3 | |
| User 2 用户2 | 7 | | 9 |
| User 3 用户3 | | 5 | 8 |
| User 4 用户4 | 2 | ? | 3 |
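Training and prediction on the rating table can be sketched as follows. The slides do not state the loss or the optimizer settings, so this is only one plausible setup: squared error on the observed cells, plain gradient descent, two features per user and film, and a dot-product rating model. The learning rate and step count are arbitrary.

```python
import numpy as np

# Observed ratings from the table; np.nan marks the empty cells.
R = np.array([[8, 3, np.nan],
              [7, np.nan, 9],
              [np.nan, 5, 8],
              [2, np.nan, 3]], dtype=float)
observed = ~np.isnan(R)

rng = np.random.default_rng(0)
k = 2                                      # assumed number of features
U = rng.normal(scale=0.1, size=(4, k))     # user feature vectors
V = rng.normal(scale=0.1, size=(3, k))     # film feature vectors

def loss(U, V):
    """Squared error over the observed cells only."""
    err = np.where(observed, U @ V.T - R, 0.0)
    return float((err ** 2).sum())

initial_loss = loss(U, V)
lr = 0.005
for _ in range(5000):
    err = np.where(observed, U @ V.T - R, 0.0)   # empty cells contribute nothing
    grad_U, grad_V = 2 * err @ V, 2 * err.T @ U  # gradients of the squared error
    U -= lr * grad_U
    V -= lr * grad_V
final_loss = loss(U, V)

# Prediction: read the model's value for the cells marked "?".
pred = U @ V.T
print("user 1, film 3:", pred[0, 2])
print("user 4, film 2:", pred[3, 1])
```

The feature vectors are never told what they mean; they are only adjusted until the known ratings are reproduced.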
Learning Residual 学习残差 (1)
Learning Residual 学习残差 (2)
Residual Network 残差网络
Text processing 句子处理: 我 | 来 | 举 | 一个 | 例子 ("Let me give an example", split into five tokens)
Initial feature representation from a table 初始特征表示来自一张表格 (one feature vector per token)
Initial feature representation → Complex network 复杂网络 → Residual 残差
Initial feature representation + Residual 残差 → Updated feature representation 更新后的特征表示
More Residual Layers 更多残差层
Feature representation → Complex network 复杂网络 → + (residual added to the layer input) → Complex network 复杂网络 → + → …
Each layer iteratively updates the feature representation
每一层都递进地更新了特征表示
The number of layers is flexible; there is no need to decide which layer corresponds to which level of feature representation
灵活的层数,无需考虑哪一层对应于哪一级别的特征表示
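A minimal sketch of stacked residual layers follows. The slides only call each block a "complex network", so a small two-layer MLP with random weights is used here as a hypothetical stand-in, and the feature dimension (8) and layer count (3) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["我", "来", "举", "一个", "例子"]
dim = 8                                               # arbitrary feature dimension
embedding_table = rng.normal(size=(len(tokens), dim)) # one row per token

def complex_network(x, W1, W2):
    """Stand-in for the 'complex network': a two-layer MLP with ReLU."""
    return np.maximum(x @ W1, 0.0) @ W2

# Initial feature representation: look each token up in the table.
x0 = embedding_table[np.arange(len(tokens))]          # shape (5, dim)

x = x0.copy()
n_layers = 3                                          # can be changed freely
for _ in range(n_layers):
    W1 = rng.normal(scale=0.1, size=(dim, dim))
    W2 = rng.normal(scale=0.1, size=(dim, dim))
    residual = complex_network(x, W1, W2)
    x = x + residual       # updated representation = layer input + residual

# A layer whose network outputs zero leaves the representation unchanged.
zeros = np.zeros((dim, dim))
unchanged = x0 + complex_network(x0, zeros, zeros)
print(x.shape, np.allclose(unchanged, x0))
```

Because every layer maps a representation to a representation of the same shape, layers can be added or removed without redesigning the rest.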
Language Model 语言模型
Predict the next word with probability 预测下一个词及其概率
Context: 气候变化是全世界面临的 ("Climate change is a ... that the whole world faces")
Candidate next words: 幸运 (luck), …, 挑战 (challenge), …, 机遇 (opportunity), …, 问题 (problem), …
Probabilities shown: 0.3, 0.0015, 0.4, 0.001
Why Predict the Next Word 为什么要预测下一个词 (1)
Why Predict the Next Word 为什么要预测下一个词 (2)
Iterative Generation 反复生成
… In 1969, Neil Armstrong became the first person to → Complex network 复杂网络 (with Preceding text information 前文信息) → walk
Iterative Generation 反复生成
walk → Complex network 复杂网络 (with Preceding text information 前文信息) → on
Iterative Generation 反复生成
on → Complex network 复杂网络 (with Preceding text information 前文信息) → the
Iterative Generation 反复生成
the → Complex network 复杂网络 (with Preceding text information 前文信息) → moon
Iterative Generation 反复生成
moon → Complex network 复杂网络 (with Preceding text information 前文信息) → End-of-sequence
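The generation loop in these slides can be sketched with a toy stand-in for the network. `toy_next_token` is hypothetical: it is a lookup table keyed on the last token, whereas a real model would compute the next token from all the preceding text. The `<eos>` marker name is also an assumption.

```python
def toy_next_token(tokens):
    """Hypothetical stand-in for the complex network: pick the next token."""
    continuation = {"to": "walk", "walk": "on", "on": "the",
                    "the": "moon", "moon": "<eos>"}
    return continuation[tokens[-1]]

def generate(prompt_tokens, next_token_fn, eos="<eos>", max_new_tokens=20):
    """Feed each generated token back in until end-of-sequence is produced."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = next_token_fn(tokens)
        if nxt == eos:
            break
        tokens.append(nxt)  # the output becomes part of the next input
    return tokens

prompt = "In 1969 , Neil Armstrong became the first person to".split()
out = generate(prompt, toy_next_token)
print(" ".join(out[len(prompt):]))  # walk on the moon
```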
Attention Mechanism 注意力机制
Target word query 目标词查询: walk
Preceding text information 前文信息: Source word keys 源词键名
Dot product 点乘 of the query with each source word key:
| In | 1969 | Neil | Armstrong | became | the | first | person | to | walk |
| 0.02 | 1.2 | 0.9 | 1.3 | 0.5 | 0.01 | 1.1 | 0.6 | 0.05 | 0.3 |
Attention Mechanism 注意力机制
Softmax to probabilities 指数归一化为概率
| | In | 1969 | Neil | Armstrong | became | the | first | person | to | walk |
| Dot product | 0.02 | 1.2 | 0.9 | 1.3 | 0.5 | 0.01 | 1.1 | 0.6 | 0.05 | 0.3 |
| Probability | 0.0501 | 0.1631 | 0.1208 | 0.1803 | 0.0810 | 0.0496 | 0.1476 | 0.0895 | 0.0516 | 0.0663 |
Attention Mechanism 注意力机制
Another kind of preceding text information 另一种前文信息: Source word values 源词键值
Each source word's value vector is paired with its probability:
| In | 1969 | Neil | Armstrong | became | the | first | person | to | walk |
| 0.0501 | 0.1631 | 0.1208 | 0.1803 | 0.0810 | 0.0496 | 0.1476 | 0.0895 | 0.0516 | 0.0663 |
Attention Mechanism 注意力机制
| In | 1969 | Neil | Armstrong | became | the | first | person | to | walk |
| 0.0501 | 0.1631 | 0.1208 | 0.1803 | 0.0810 | 0.0496 | 0.1476 | 0.0895 | 0.0516 | 0.0663 |
Contribution from preceding text to 'walk' 来自前文的对'walk'的贡献: the source word values summed with the probabilities above as weights
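The attention steps can be traced numerically with a short sketch. The dot products are taken from the slide; the value vectors are not given there, so random 5-dimensional vectors are used as stand-ins. As on the slide, no scaling is applied to the dot products, although Transformer implementations usually divide them by the square root of the key dimension.

```python
import numpy as np

words = ["In", "1969", "Neil", "Armstrong", "became",
         "the", "first", "person", "to", "walk"]
# Dot products of the query for "walk" with each source word key (from the slide).
scores = np.array([0.02, 1.2, 0.9, 1.3, 0.5, 0.01, 1.1, 0.6, 0.05, 0.3])

def softmax(z):
    e = np.exp(z - z.max())  # subtracting the max does not change the result
    return e / e.sum()

weights = softmax(scores)
for w, p in zip(words, weights):
    print(f"{w:10s} {p:.4f}")

# Source word values: random stand-ins, one 5-dimensional vector per word.
rng = np.random.default_rng(0)
values = rng.normal(size=(len(words), 5))

# Contribution from the preceding text to "walk": probability-weighted sum of values.
contribution = weights @ values
print(contribution.shape)  # (5,)
```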
Contemporary LLMs 当代大规模语言模型
Input word token(s) 输入词符
→ Embedding layer 特征表示层 → High dimensional features 高维特征
→ Transformer module Transformer模块 → High dimensional features 高维特征
→ Transformer module Transformer模块 → High dimensional features 高维特征
→ … → High dimensional features 高维特征
→ Transformer module Transformer模块 → High dimensional features 高维特征
→ Output layer 输出层
→ Next token probabilities 下一个词符概率
Transformer Module Transformer模块
Attention sub-module 注意力子模块: incorporates information from the preceding text 整合前文的信息
High dimensional features 高维特征 → Normalization layer 归一化层 → Fully connected layer 全连通层 → Current query 当前查询 and New key/value 新的键名/键值
New key/value 新的键名/键值 joins All the keys/values 所有的键名/键值 (Preceding text information 前文信息)
Current query 当前查询 with All the keys/values 所有的键名/键值 → Attention 注意力 → Fully connected layer 全连通层 → + Residual 残差
Then: Normalization layer 归一化层 → Fully connected layer 全连通层 → Higher dimensional features 更高维特征 → Fully connected layer 全连通层 → + Residual 残差 → High dimensional features 高维特征
Transformer Module Transformer模块
Multi-layer perceptron sub-module 多层感知器子模块: further transforms the features 进一步变换特征
Output of the attention sub-module → Normalization layer 归一化层 → Fully connected layer 全连通层 → Higher dimensional features 更高维特征 → Fully connected layer 全连通层 → + → High dimensional features 高维特征
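One Transformer module processing a single token can be sketched as below. Several details are assumptions rather than something the slides fix: normalization is applied before each sub-module, there is a single attention head, the MLP uses ReLU, the normalization has no learnable scale, and the parameter names (`Wq`, `Wk`, `Wv`, `Wo`, `W1`, `W2`) and sizes are made up for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_module(x, cache_k, cache_v, p):
    """Update one token's features x (shape (d,)) using cached keys/values."""
    # Attention sub-module: incorporate information from the preceding text.
    h = layer_norm(x)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]  # current query, new key/value
    cache_k.append(k)                                # new key/value joins
    cache_v.append(v)                                # all the keys/values
    K, V = np.stack(cache_k), np.stack(cache_v)
    attended = softmax(K @ q / np.sqrt(len(q))) @ V
    x = x + attended @ p["Wo"]                       # residual connection
    # Multi-layer perceptron sub-module: further transform the features.
    h = layer_norm(x)
    hidden = np.maximum(h @ p["W1"], 0.0)            # higher dimensional features
    return x + hidden @ p["W2"]                      # residual connection

rng = np.random.default_rng(0)
d, d_hidden = 8, 32
shapes = {"Wq": (d, d), "Wk": (d, d), "Wv": (d, d), "Wo": (d, d),
          "W1": (d, d_hidden), "W2": (d_hidden, d)}
params = {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}

cache_k, cache_v = [], []
outputs = [transformer_module(rng.normal(size=d), cache_k, cache_v, params)
           for _ in range(3)]                        # three tokens, one at a time
print(len(cache_k), outputs[-1].shape)
```

The key/value cache is what carries the preceding text information from one token to the next.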
Reasoning w/ a Long Output Path 基于长输出路径的推理
DeepSeek’s input template DeepSeek的输入模板
Reasoning w/ a Long Output Path 基于长输出路径的推理
A DeepSeek example 一个DeepSeek的例子
Reasoning Model 推理模型
Reinforcement Learning 强化学习
Sampling Output 输出采样
Generate the next word based on probability
依据概率生成下一个词
Context: 气候变化是全世界面临的 ("Climate change is a ... that the whole world faces")
Candidate next words: 幸运 (luck), …, 挑战 (challenge), …, 机遇 (opportunity), …, 问题 (problem), …
Probabilities shown: 0.3, 0.0015, 0.4, 0.001
Marks shown: X, √, X, √
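Sampling by probability, as opposed to always taking the most likely word, can be sketched as follows. The tokens and probabilities here are hypothetical placeholders, since the slide's list is elided and does not cover the whole vocabulary.

```python
import numpy as np

# Hypothetical next-token distribution (illustrative values that sum to 1).
tokens = ["A", "B", "C", "D"]
probs = np.array([0.1, 0.4, 0.2, 0.3])

rng = np.random.default_rng(0)

# Sampling: each token is drawn with its own probability, so repeated runs differ.
samples = rng.choice(tokens, size=10000, p=probs)
freq = {t: float(np.mean(samples == t)) for t in tokens}

# Greedy decoding for comparison: always the single most likely token.
greedy = tokens[int(np.argmax(probs))]

print(freq)
print(greedy)  # B
```

Over many draws the observed frequencies approach the model's probabilities, which is what lets the same prompt produce different output sequences.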
Policy Gradient 策略梯度
Training task 训练任务
→ Sampling output sequences based on the current model probabilities 基于当前模型概率采样得到输出序列
→ Output sequence 1 输出序列1, Output sequence 2 输出序列2, …, Output sequence n 输出序列n
→ Task judge 任务裁判 marks each sequence √ or X
√ → Encourage all the steps 赞赏所有步骤
X → Discourage all the steps 不赞赏所有步骤
→ Synthesize a loss function value 综合成一个损失函数值
→ Update model with gradient descent 用梯度下降更新模型
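The loop on this slide can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: the "model" is one independent softmax distribution per step, the task judge accepts exactly one hypothetical target sequence, a tick counts as reward +1 and a cross as -1, and the loss is the negative reward times the log-probability of the sampled steps.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_tokens = 2, 3
logits = np.zeros((n_steps, n_tokens))  # toy model: one distribution per step
target = [2, 0]                         # hypothetical task: only this sequence passes

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def judge(seq):
    """Task judge: +1 for a tick, -1 for a cross."""
    return 1.0 if list(seq) == target else -1.0

lr, n_samples = 0.5, 32
for _ in range(200):
    probs = softmax(logits)
    grad = np.zeros_like(logits)
    for _ in range(n_samples):
        # Sample an output sequence from the current model probabilities.
        seq = [rng.choice(n_tokens, p=probs[t]) for t in range(n_steps)]
        reward = judge(seq)
        for t, a in enumerate(seq):
            # Gradient of -reward * log p(a) w.r.t. the logits of step t:
            # all steps of a sequence are encouraged or discouraged together.
            grad[t] += -reward * (np.eye(n_tokens)[a] - probs[t])
    logits -= lr * grad / n_samples     # gradient descent on the synthesized loss

best = softmax(logits).argmax(-1).tolist()
print(best)
```

No step is labelled individually; the judge's single verdict on the whole sequence is spread over every step that produced it.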
Summary 总结
The End