劉晉良
Jinn-Liang Liu
清華大學動力機械工程學系
Department of Power Mechanical Engineering, National Tsing Hua University, Taiwan
Dec 17, 2024 - Dec 8, 2025
Attention, Transformer, LLM
2024/12
203627, 2025/11
No Recurrence, No Convolution,
Global Dependency, Highly Parallel
Tokens, Embeddings, Positional Encoding
"dog bites man" vs "man bites dog"
N Tokens
D Features (Vectors)
Vector dimensions can range from a few hundred (e.g., 384 for all-MiniLM-L6-v2) to several thousand (e.g., 3072 for text-embedding-3-large).
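A minimal sketch of how a sentence becomes an N × D matrix of token embeddings plus positional encodings, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model mentioned above (D = 384); the sinusoidal encoding follows Vaswani et al. (2017), and the word-by-word "tokenization" is only a toy illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Token-level embeddings: N tokens x D features (D = 384 for all-MiniLM-L6-v2).
model = SentenceTransformer("all-MiniLM-L6-v2")
tokens = "dog bites man".split()             # toy tokenization: 3 "tokens"
X = model.encode(tokens)                     # shape (3, 384)

# Sinusoidal positional encoding (Vaswani et al., 2017): makes
# "dog bites man" and "man bites dog" produce different model inputs.
def positional_encoding(n_tokens, d_model):
    pos = np.arange(n_tokens)[:, None]                    # (N, 1)
    i = np.arange(d_model)[None, :]                       # (1, D)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((n_tokens, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dims
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dims
    return pe

X_in = X + positional_encoding(*X.shape)                  # Transformer input
print(X.shape, X_in.shape)                                # (3, 384) (3, 384)
```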
Old
Transformer
New
Retrieval-Augmented Generation (RAG)
Vector Indexing for DB
Hierarchical Navigable Small World (HNSW)
Layered nearest-neighbor graph
IVF (Inverted File)
Clustering
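A hedged sketch of the two indexing schemes named above, using the FAISS library as one possible backend (an assumption; any vector database exposing HNSW or IVF indexes works similarly).

```python
import numpy as np
import faiss

d = 384                                             # embedding dimension
xb = np.random.rand(10_000, d).astype("float32")    # database vectors
xq = np.random.rand(5, d).astype("float32")         # query vectors

# HNSW: layered nearest-neighbor graph, no training step required.
hnsw = faiss.IndexHNSWFlat(d, 32)                   # 32 = graph connectivity (M)
hnsw.add(xb)
D, I = hnsw.search(xq, 5)                           # top-5 neighbors per query

# IVF (inverted file): cluster the database, then search only a few clusters.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 100)         # 100 clusters
ivf.train(xb)                                       # k-means clustering
ivf.add(xb)
ivf.nprobe = 8                                      # clusters visited per query
D, I = ivf.search(xq, 5)
print(I.shape)                                      # (5, 5) neighbor ids
```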
Tokens, Embeddings, Positional Encoding
Transformer: Model Architecture
Attention
Self-Attention
Symmetric
Asymmetric
(caulking iron vs tool)
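A small NumPy sketch of scaled dot-product self-attention, illustrating why the score matrix is asymmetric: queries and keys come from different learned projections (W_Q ≠ W_K), so the attention token i pays to token j generally differs from the attention j pays to i. The random weights below are hypothetical, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, d_k = 4, 8, 8                     # 4 tokens, toy dimensions
X = rng.normal(size=(N, D))             # token embeddings

W_Q = rng.normal(size=(D, d_k))         # learned projections (random here)
W_K = rng.normal(size=(D, d_k))
W_V = rng.normal(size=(D, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
scores = Q @ K.T / np.sqrt(d_k)         # (N, N) attention scores

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

A = softmax(scores)                     # each row sums to 1
out = A @ V                             # attended representation

# Asymmetry: attention of token 0 to token 1 != attention of token 1 to token 0.
print(A[0, 1], A[1, 0], np.allclose(A, A.T))   # ... ... False
```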
Perceptron (Feed Forward Network)
Frank Rosenblatt 1958
Mark I Perceptron (Wikipedia): First AI Machine
Warren McCulloch, Walter Pitts 1943
"Devices of this sort are expected ultimately to be capable of concept formation, language translation, collation of military intelligence, and the solution of problems through inductive logic." Rosenblatt, 1957
Transformer Block
Figure: encoder-decoder Transformer with multi-head self-attention (Q, K, V), masked self-attention in the decoder, and cross-attention linking encoder and decoder.
Google (2024). Neural machine translation with a Transformer and Keras
Mask
2015 Luong et al.
2017 Vaswani et al.
Attention Matrix
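A sketch of the causal (look-ahead) mask applied to the N × N attention matrix in a decoder: entries above the diagonal are set to -inf before the softmax, so each token attends only to itself and earlier tokens (NumPy, continuing the toy setup above).

```python
import numpy as np

N = 4
scores = np.random.default_rng(1).normal(size=(N, N))   # raw Q.K^T / sqrt(d_k)

mask = np.triu(np.ones((N, N), dtype=bool), k=1)   # True above the diagonal
masked = np.where(mask, -np.inf, scores)           # block future positions

# Row-wise softmax: each row is a distribution over allowed (past) tokens.
e = np.exp(masked - masked.max(axis=-1, keepdims=True))
A = e / e.sum(axis=-1, keepdims=True)
print(np.round(A, 2))   # upper triangle is 0: no attention to future tokens
```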
Performance, Complexity
Understanding
The GPT-2 model contains N Transformer decoder blocks. Each block includes a multi-head masked attention layer, a multi-layer perceptron (feed-forward) layer, normalization, and dropout layers. The residual connection (the branch feeding the addition operator) adds the block's input back to its output, which eases optimization and lets each block refine the previous block's representation. The multi-head masked attention layer (right panel) computes attention scores from Q, K, and V vectors to capture sequential relationships in the input sequence. Transformers are typically pre-trained on enormous corpora in a self-supervised manner before being fine-tuned.
No Encoder!
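A hedged PyTorch sketch of one decoder-only (GPT-2-style) block as described above: LayerNorm, masked multi-head self-attention, dropout, residual additions, and a two-layer MLP. The layer sizes and pre-norm placement follow GPT-2 conventions but are assumptions here, not the exact OpenAI code.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One GPT-2-style Transformer decoder block (pre-norm, no encoder)."""
    def __init__(self, d_model=768, n_heads=12, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(                 # feed-forward (perceptron) part
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: position i may not attend to positions j > i.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                          # residual connection 1
        x = x + self.mlp(self.ln2(x))             # residual connection 2
        return x

x = torch.randn(2, 10, 768)                       # (batch, tokens, features)
print(DecoderBlock()(x).shape)                    # torch.Size([2, 10, 768])
```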
00:00 - Introduction to RAG
00:24 - Why Traditional Search Methods Don't Work
00:55 - The RAG Method Explained
01:54 - Step 1: Retrieval Process
02:25 - Step 2: Augmentation Explained
03:15 - Step 3: Generation Process
03:54 - Strategies for RAG Calibration
05:01 - Practical Lab Demo Introduction
05:27 - Demo - Set up Development Environment
06:10 - Demo - Initialize Vector Database
06:29 - Demo - Chunking Strategy and Embedding
07:19 - Demo - Feed AI Brain
07:50 - Demo - Semantic Search
08:16 - Demo - Launch a Simple Web Interface
09:43 - Conclusion & Free Lab Access
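A minimal end-to-end sketch of the retrieve-augment-generate loop demonstrated in the steps above, assuming the same all-MiniLM-L6-v2 embedder; the toy chunks, cosine-similarity search, and prompt template are illustrative stand-ins for a real vector database and LLM call.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Chunk and embed the knowledge base (toy chunks; real systems split documents).
chunks = [
    "The Transformer uses self-attention instead of recurrence or convolution.",
    "HNSW and IVF are common vector-index structures for similarity search.",
    "GPT-2 is a decoder-only Transformer pre-trained on large text corpora.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

# 2. Retrieval: embed the query and take the most similar chunks (cosine similarity).
query = "What does GPT-2's architecture look like?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
scores = chunk_vecs @ q_vec
top = np.argsort(-scores)[:2]

# 3. Augmentation: stuff the retrieved chunks into the prompt.
context = "\n".join(chunks[i] for i in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# 4. Generation: pass the augmented prompt to any LLM (placeholder here).
print(prompt)
```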
Help Desk
Processing
Expertise
Computational Complexity
Maintenance
Catastrophic Forgetting