1 of 102


劉晉良

Jinn-Liang Liu

清華大學動力機械工程學系

Department of Power Mechanical Engineering, National Tsing Hua University, Taiwan

Dec 17, 2024 - Dec 8, 2025

Attention, Transformer, LLM

2 of 102

2024/12

203627, 2025/11

3 of 102

No Recurrence, No Convolution,

Global Dependency, Highly Parallel

4 of 102

Tokens, Embedding, Position Encoding

"dog bites man" vs "man bites dog"

N Tokens

D Features (Vectors)

5 of 102

Vector dimensions can range from a few hundred (e.g., 384 for all-MiniLM-L6-v2) to several thousand (e.g., 3072 for text-embedding-3-large)
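
A minimal sketch of how a sentence becomes the N×D matrix fed to a Transformer: token IDs index an embedding table, and the sinusoidal positional encoding of Vaswani et al. (2017) is added so that "dog bites man" and "man bites dog" give different inputs. The toy vocabulary, D = 8, and the random embedding table are illustrative assumptions, not values from any real model.

```python
import numpy as np

# Toy vocabulary and sizes (illustrative only)
vocab = {"man": 0, "bites": 1, "dog": 2}
N, D = 3, 8                                  # N tokens, D features per token

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), D))   # one D-vector per token (normally learned)

def positional_encoding(n_positions, d_model):
    """Sinusoidal PE: PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

tokens = ["dog", "bites", "man"]             # vs. "man bites dog": same embeddings, different positions
ids = [vocab[t] for t in tokens]
X = embedding_table[ids] + positional_encoding(N, D)   # (N, D) input matrix for the Transformer
print(X.shape)                               # (3, 8)
```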

Old

6 of 102

Transformer

New

7 of 102

8 of 102

Retrieval, Augmented, Generation

Vector Indexing for DB

Hierarchical Navigable Small World (HNSW)

NN Layers

IVF (Inverted File)

Clustering
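
The sketch below illustrates the IVF idea in plain NumPy: vectors are clustered (here with a tiny k-means), each cluster keeps an inverted list of its members, and a query scans only the few nearest clusters. Sizes, cluster counts, and function names are made up for illustration; this is not the API of any particular vector database, and HNSW (which instead builds layered nearest-neighbor graphs) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))        # 1000 toy embeddings, 64-D

def kmeans(x, k, iters=10):
    """Very small Lloyd's k-means, enough to partition the vectors."""
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        centroids = np.array([x[assign == j].mean(0) if np.any(assign == j) else centroids[j]
                              for j in range(k)])
    return centroids, assign

k = 16
centroids, assign = kmeans(vectors, k)
inverted_lists = {j: np.where(assign == j)[0] for j in range(k)}   # the "inverted file"

def search(query, n_probe=2, top_k=5):
    # 1) pick the n_probe closest clusters, 2) scan only their inverted lists
    nearest_clusters = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
    candidates = np.concatenate([inverted_lists[j] for j in nearest_clusters])
    dists = ((vectors[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:top_k]]

print(search(rng.normal(size=64)))           # indices of the approximate nearest neighbors
```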

9 of 102

Tokens, Embedding, Position Encoding

10 of 102

Transformer: Model Architecture

11 of 102

Attention

12 of 102

Attention

13 of 102

Attention

14 of 102

Attention

15 of 102

Self-Attention

Symmetric

Asymmetric

(caulking iron vs tool)
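
As a concrete reference, here is single-head scaled dot-product self-attention, Attention(Q, K, V) = softmax(Q Kᵀ / sqrt(d_k)) V, in NumPy. The shapes and random weights are toy assumptions; the point is that separate Wq and Wk projections make the N×N attention matrix asymmetric, so token i can attend strongly to token j without the reverse holding, which is presumably the point of the caulking-iron vs. tool example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, d_k = 4, 8, 8                          # N tokens with D features (toy sizes)
X = rng.normal(size=(N, D))                  # token embeddings (plus positions)
Wq, Wk, Wv = (rng.normal(size=(D, d_k)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = X @ Wq, X @ Wk, X @ Wv
A = softmax(Q @ K.T / np.sqrt(d_k))          # N x N attention matrix; rows sum to 1, A is not symmetric
out = A @ V                                  # each output row is a weighted mix of the value vectors
print(A.shape, out.shape)                    # (4, 4) (4, 8)
```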

16 of 102

Perceptron (Feed Forward Network)

Mark I Perceptron (Wikipedia): First AI Machine

"Devices of this sort are expected ultimately to be capable of concept formation, language translation, collation of military intelligence, and the solution of problems through inductive logic." Rosenblatt, 1957

17 of 102

Transformer Block


18 of 102

Multi-Head Self-Attention

(Figure: the Q, K, V inputs to scaled dot-product attention, with self-attention, masked self-attention, and cross-attention variants)
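
To make the figure's labels concrete: multi-head attention runs h scaled dot-product attentions in parallel on learned projections of Q, K, and V and concatenates the results (Vaswani et al., 2017):

```latex
\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}\!\left(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\right)
```

In self-attention Q, K, and V all come from the same sequence; the decoder's masked self-attention additionally blocks attention to future positions; in cross-attention Q comes from the decoder while K and V come from the encoder output.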

19 of 102

Multi-Head Self-Attention

20 of 102

(Figure: self-attention in the encoder, masked self-attention and cross-attention in the decoder)

21 of 102

2015 Luong et al.

2017 Vaswani et al.

Attention Matrix
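
For comparison, the two formulations side by side (a summary of the cited papers, with h_t a decoder state and h̄_s an encoder state):

```latex
% Luong et al. (2015): alignment score between decoder state h_t and encoder state \bar{h}_s
\mathrm{score}(h_t,\bar{h}_s) =
\begin{cases}
h_t^{\top}\bar{h}_s & \text{(dot)} \\
h_t^{\top} W_a \bar{h}_s & \text{(general)} \\
v_a^{\top}\tanh\!\left(W_a[h_t;\bar{h}_s]\right) & \text{(concat)}
\end{cases}

% Vaswani et al. (2017): the whole attention matrix at once
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```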

22 of 102

Performance, Complexity
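
The usual comparison here is Table 1 of Vaswani et al. (2017), with n the sequence length, d the representation dimension, and k the convolution kernel width:

```latex
\begin{array}{lccc}
\text{Layer type} & \text{Complexity per layer} & \text{Sequential ops} & \text{Max path length} \\
\text{Self-attention} & O(n^{2} \cdot d) & O(1) & O(1) \\
\text{Recurrent} & O(n \cdot d^{2}) & O(n) & O(n) \\
\text{Convolutional} & O(k \cdot n \cdot d^{2}) & O(1) & O(\log_{k} n)
\end{array}
```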

23 of 102

Understanding

24 of 102

The GPT-2 model contains N Transformer decoder blocks. Each block includes a multi-head masked attention layer, a multi-layer perceptron (feed-forward) layer, normalization, and dropout layers. The residual connection (the line branching to the addition operator) lets the block build on the previous block's output. The multi-head masked attention layer (right panel) computes attention scores from Q, K, and V vectors to capture sequential relationships in the input sequence. Transformers are typically pre-trained on enormous corpora in a self-supervised manner before being fine-tuned.

No Encoder!
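
A compact NumPy sketch of the data flow in one such GPT-2-style decoder block follows. It keeps the pieces named above (layer normalization, masked multi-head self-attention, a two-layer MLP, residual connections) but omits dropout, uses ReLU instead of GPT-2's GELU, and uses random weights, so it illustrates the architecture rather than reproducing a trained GPT-2.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def masked_multi_head_attention(x, n_heads, Wqkv, Wo):
    N, D = x.shape
    d_h = D // n_heads
    q, k, v = np.split(x @ Wqkv, 3, axis=-1)              # joint QKV projection, each (N, D)
    causal_mask = np.triu(np.full((N, N), -1e9), k=1)     # block attention to future tokens
    heads = []
    for h in range(n_heads):
        sl = slice(h * d_h, (h + 1) * d_h)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(d_h) + causal_mask
        heads.append(softmax(scores) @ v[:, sl])
    return np.concatenate(heads, axis=-1) @ Wo            # concat heads, project back to D

def mlp(x, W1, b1, W2, b2):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2           # two-layer perceptron (ReLU here for simplicity)

def gpt2_block(x, p, n_heads=4):
    x = x + masked_multi_head_attention(layer_norm(x), n_heads, p["Wqkv"], p["Wo"])   # residual 1
    x = x + mlp(layer_norm(x), p["W1"], p["b1"], p["W2"], p["b2"])                    # residual 2
    return x

N, D = 5, 16
params = {"Wqkv": rng.normal(size=(D, 3 * D)) * 0.02, "Wo": rng.normal(size=(D, D)) * 0.02,
          "W1": rng.normal(size=(D, 4 * D)) * 0.02, "b1": np.zeros(4 * D),
          "W2": rng.normal(size=(4 * D, D)) * 0.02, "b2": np.zeros(D)}
print(gpt2_block(rng.normal(size=(N, D)), params).shape)   # (5, 16): same shape in, same shape out
```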

51 of 102

00:00 - Introduction to RAG

00:24 - Why Traditional Search Methods Don't Work

00:55 - The RAG Method Explained

01:54 - Step 1: Retrieval Process

02:25 - Step 2: Augmentation Explained

03:15 - Step 3: Generation Process

03:54 - Strategies for RAG Calibration

05:01 - Practical Lab Demo Introduction

05:27 - Demo - Set up Development Environment

06:10 - Demo - Initialize Vector Database

06:29 - Demo - Chunking Strategy and Embedding

07:19 - Demo - Feed AI Brain

07:50 - Demo - Semantic Search

08:16 - Demo - Launch a Simple Web Interface

09:43 - Conclusion & Free Lab Access
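
A toy end-to-end version of the pipeline in these chapters, written in plain NumPy so it stays self-contained: chunking, embedding, retrieval, augmentation, and generation. The embed() and generate() functions are hypothetical stand-ins; a real system would call an embedding model and an LLM, and would use a proper vector database instead of a NumPy array.

```python
import numpy as np

documents = ["HNSW builds layered proximity graphs for fast nearest-neighbor search.",
             "IVF clusters vectors and scans only the closest inverted lists."]

def chunk(text, size=8):                       # chunking strategy: fixed window of words
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, dim=64):                       # placeholder embedding: hashed bag-of-words
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

chunks = [c for d in documents for c in chunk(d)]          # feed the "AI brain"
index = np.stack([embed(c) for c in chunks])               # toy vector database

def retrieve(query, top_k=2):                  # step 1: retrieval (cosine similarity)
    sims = index @ embed(query)
    return [chunks[i] for i in np.argsort(-sims)[:top_k]]

def generate(prompt):                          # step 3: generation (an LLM call would go here)
    return f"[LLM answer conditioned on]\n{prompt}"

query = "How does IVF indexing work?"
context = "\n".join(retrieve(query))           # step 2: augmentation of the prompt
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```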

97 of 102

Help Desk

98 of 102

Processing

99 of 102

Expertise

Complexity

Computational

Maintenance

Catastrophic Forgetting
