1 of 96

AI Agent

李宏毅 (Hung-yi Lee)

Disclaimer: "AI Agent" is a widely used term, so what this course calls an AI Agent may not match how the term is used elsewhere.

2 of 96

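Translating "AI Agent"

How we use AI today

AI Agent

Humans give explicit instructions

Humans give a goal

The AI figures out on its own how to achieve it

「人工智慧代理人」 ("artificial intelligence agent", the Chinese rendering of "AI Agent")

(e.g., solving a research problem)

Hypothesis … experiment … analysis

Requires multiple steps and flexible adjustment of the plan

The AI performs one action per command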

3 of 96

AI Agent

Goal

Action

Observation

4 of 96

AI Agent (AlphaGo)

Goal

Action

Win the game

“5-5”

Observation

Sound familiar? This is the standard opening of a Reinforcement Learning (RL) lecture.

5 of 96

How to build an AI Agent? RL?

Goal

Action

Win the game

RL: Learn to Maximize Reward

Reward

(RL: Reinforcement Learning)

“5-5”

Limitation: a separate model must be trained with RL for every task

Observation

6 of 96

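How to build an AI Agent? Just use an LLM!

Goal

Action

LLM

“You must win”

“I will play 5-5”

Described in text

Translated into an action

(option)

Observation

Using LLMs to directly fulfill the long-held human desire for agents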

7 of 96

Can LLMs play chess?

BIG-bench

https://arxiv.org/abs/2206.04615

8 of 96

Can LLMs play chess?

https://youtu.be/JHq4EKMg7fI?si=izKsH-GCVnZkooq_

9 of 96

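How to build an AI Agent? Just use an LLM!

Goal

Action

LLM

How far are we from this?

What more can be done?

“You must win”

“I will play 5-5”

Described in text

Translated into an action

(option)

Observation

Using LLMs to directly fulfill the long-held human desire for agents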

10 of 96

The problem an agent solves, from the LLM's point of view

goal

obs 1

obs 2

action 1

action 2

obs 3

action 3

LLM

It is still just continuing the text, one piece after another

AI agents rely on abilities the language model already has
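From the LLM's side, the whole agent loop is one growing transcript that the model keeps extending. A minimal sketch of that loop, assuming hypothetical `llm` and `env` callables that stand in for a real model API and a real environment:

```python
from typing import Callable, List


def run_agent(goal: str, first_obs: str,
              llm: Callable[[str], str],
              env: Callable[[str], str],
              max_steps: int = 3) -> List[str]:
    """Run the goal -> action -> observation loop as one growing transcript."""
    transcript = [f"goal: {goal}", f"obs 1: {first_obs}"]
    for step in range(1, max_steps + 1):
        # The LLM only sees the text so far and produces the next piece of it.
        action = llm("\n".join(transcript))
        transcript.append(f"action {step}: {action}")
        if step == max_steps:
            break
        # The environment reacts, and its observation is appended as more text.
        transcript.append(f"obs {step + 1}: {env(action)}")
    return transcript


if __name__ == "__main__":
    # Toy stand-ins: the "LLM" numbers its moves, the "environment" echoes them.
    toy_llm = lambda text: f"move-{text.count('action') + 1}"
    toy_env = lambda action: f"saw {action}"
    print("\n".join(run_agent("win", "empty board", toy_llm, toy_env)))
```

Nothing in this loop updates model weights; any adaptation has to come from what is placed in the transcript.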

11 of 96

Note: no model is trained anywhere in this lecture

12 of 96

AI agents are not a new trend

They already went viral once, in spring 2023

https://youtu.be/eQNADlR0jSs?si=4yGZEluAUzKK2VD0

AutoGPT, AgentGPT, BabyAGI, Godmode …

13 of 96

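Advantages of running AI agents on LLMs

Typical Agent

LLM Agent

AlphaGo

A limited set of predefined behaviors

Nearly unlimited possibilities

Can only place a stone on one of the 19x19 points of the board

Can use tools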

14 of 96

Advantages of running AI agents on LLMs

AI programmer

Reward = -1

Typical Agent

LLM Agent

Compile Error

Why -1???

More information

AI programmer

15 of 96

AI Agent example: a virtual village of AI villagers

https://arxiv.org/abs/2304.03442

https://youtu.be/G44Lkj7XDsA?si=cMbKG3tqPbIgnnBq

16 of 96

Goal

Action

Observation

Throwing a Valentine's Day party, preparing for an exam ……

"getting ready for bed"

17 of 96

AI Agent example: AI NPCs in Minecraft

https://www.youtube.com/watch?v=2tbaCn0Kl90

18 of 96

AI Agent example: letting AI use a computer

Computer Use, Operator

19 of 96

AI Agent example: letting AI use a computer

Goal

Action

Observation

Ordering pizza, shopping online …

20 of 96

AI Agent example: letting AI use a computer

World of Bits: An Open-Domain Platform for Web-Based Agents (ICML, 2017)

21 of 96

AI Agent example: letting AI use a computer

Mind2Web: https://arxiv.org/abs/2306.06070

WebArena: https://arxiv.org/abs/2307.13854

VisualWebArena: https://arxiv.org/abs/2401.13649

22 of 96

AI Agent example: using AI to train models

goal

obs 1

obs 2

action 1

action 2

obs 3

action 3

LLM

Goal: beat the strong baseline

AIDE: The Machine Learning Engineer Agent

https://arxiv.org/abs/2502.13138

https://arxiv.org/abs/2410.20424

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

23 of 96

AI Agent example: using AI to do research

https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

24 of 96

Toward more realistic interaction settings

goal

obs 1

obs 2

action 1

action 2

obs 3

action 3

Turn-based interaction

Real-time interaction

goal

obs 1

action 1

obs 2

action 2

Switch actions immediately

E.g., spoken conversation

25 of 96

Toward more realistic interaction settings

User

tell

me

a

story

ok

stop I don’t like the story

obs 1

action 1

action 2

obs 3

obs 2

Once upon a time in a small village

Sorry ……

26 of 96

Toward more realistic interaction settings

https://arxiv.org/abs/2503.04721v1

Guan-Ting Lin

(with collaborators from Berkeley, UW, and MIT)

27 of 96

Key abilities of AI agents

How AI adjusts its behavior based on experience

How AI uses tools

Can AI make plans?

28 of 96

Adjusting behavior based on experience

29 of 96

Adjusting behavior based on experience

goal

obs 1

obs 2

action 1

action 2

LLM

Write a …

You are a software engineer …

Update

……

Update Parameters

Feedback

(Not Today)

30 of 96

Adjusting behavior based on experience

goal

obs 1

obs 2

action 1

action 2

LLM

Write a …

You are a software engineer …

Update

……

Feedback

31 of 96

Adjusting behavior based on experience

goal

obs 1

action 1

LLM

obs 10000

……

Re-reading the agent's entire lifetime of experience at every step … ☹

?????

Highly Superior Autobiographical Memory (HSAM)

Hyperthymesia

32 of 96

Adjusting behavior based on experience

obs 10000

Agent’s Memory

?????

goal

obs 1

action 1

……

obs 9999

action 9999

Read

Relevant Experience

This is essentially RAG

Retrieval

Query

Database

(one's own experience vs. other people's experience)
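Reading from memory is retrieval over the agent's own past records. The sketch below assumes a toy bag-of-words cosine similarity in place of a real embedding model; `AgentMemory`, `write`, and `read` are hypothetical names, not a library API.

```python
import math
from collections import Counter
from typing import List, Tuple


def _vector(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real agent would use a learned embedding model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AgentMemory:
    """Stores (observation, action) records and reads back the most relevant ones."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str]] = []

    def write(self, obs: str, action: str) -> None:
        self.records.append((obs, action))

    def read(self, query: str, k: int = 2) -> List[Tuple[str, str]]:
        # Retrieval: rank past observations by similarity to the current observation.
        q = _vector(query)
        ranked = sorted(self.records, key=lambda r: _cosine(q, _vector(r[0])), reverse=True)
        return ranked[:k]
```

The records returned by `read` would then be placed in the prompt as "Relevant Experience" ahead of the current observation, instead of the agent's whole history.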

33 of 96

StreamBench

https://arxiv.org/abs/2406.08747

https://stream-bench.github.io/

(done by Appier Researchers)

Goal: Maximize the accuracy over the sequence

……

Q1

Q2

Q1000

Q3

34 of 96

StreamBench

https://arxiv.org/abs/2406.08747

……

Q1

Q2

Q100

Q3

Read

Retrieval

……

Q65

Q78

Q99

35 of 96

StreamBench

https://arxiv.org/abs/2406.08747

36 of 96

StreamBench

https://arxiv.org/abs/2406.08747

……

Q1

Q2

Q100

Q3

Read

Retrieval

……

Q59

Q78

Q99

Negative feedback is unhelpful.
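The StreamBench setting can be sketched as a loop over a question stream in which only positively judged answers enter memory, in line with the observation that negative feedback is unhelpful. This is a simplified illustration, not the benchmark's code: `answer_fn` is a hypothetical model call, correctness is checked against a gold answer, and "retrieval" here simply takes the k most recent successes rather than the most similar ones.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer that received positive feedback)


def stream_eval(stream: List[Tuple[str, str]],
                answer_fn: Callable[[str, List[Example]], str],
                k: int = 3) -> Tuple[float, List[Example]]:
    """Answer (question, gold) pairs in order; return accuracy over the whole
    sequence and the memory of correctly answered examples."""
    memory: List[Example] = []
    correct = 0
    for question, gold in stream:
        examples = memory[-k:]  # simplification: k most recent successes, not the most similar
        prediction = answer_fn(question, examples)
        if prediction == gold:  # positive feedback: keep it as a future in-context example
            correct += 1
            memory.append((question, prediction))
        # Negative feedback is discarded rather than stored.
    accuracy = correct / len(stream) if stream else 0.0
    return accuracy, memory
```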

37 of 96

StreamBench

https://arxiv.org/abs/2406.08747

38 of 96

Adjusting behavior based on experience

obs 10000

Agent’s Memory

goal

obs 1

action 1

……

obs 9999

action 9999

Relevant Experience

action 10000

Write everything down?

(memory gets flooded with trivial details)

obs 10001

39 of 96

Adjusting behavior based on experience

obs 10000

goal

obs 1

action 1

……

obs 9999

action 9999

Relevant Experience

action 10000

obs 10001

Write

Agent’s Memory

Should this be written down?
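A write module can be approximated as a gate that scores each new event and stores only those above a threshold. In this sketch the scorer `keyword_importance` is a hypothetical keyword rule standing in for asking the LLM itself how important the event is; `make_writer` and the threshold of 5 are likewise assumptions.

```python
from typing import Callable, List


def keyword_importance(event: str) -> int:
    # Hypothetical stand-in for asking the LLM "how important is this event, 1-10?"
    important = ("deadline", "password", "prefers", "allergic")
    return 8 if any(word in event.lower() for word in important) else 2


def make_writer(importance: Callable[[str], int], threshold: int = 5):
    """Build a write module that stores an event only if its importance reaches threshold."""
    def write(memory: List[str], event: str) -> bool:
        if importance(event) >= threshold:
            memory.append(event)
            return True
        return False  # trivial event: not worth remembering
    return write
```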

40 of 96

Adjusting behavior based on experience

obs 10000

Relevant Experience

action 10000

obs 10001

Write

Read

Reflection

thought 1

thought 2

thought 3

thought 4

goal

obs 1

action 1

……

obs 9999

action 9999

Reorganizes the information stored in memory
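Reflection can be sketched as repeatedly condensing raw records into higher-level thoughts. The `summarize` callable is a hypothetical placeholder for an LLM call, and grouping records into fixed-size windows is an assumption made for simplicity.

```python
from typing import Callable, List


def reflect(records: List[str], summarize: Callable[[List[str]], str],
            window: int = 3) -> List[str]:
    """Condense records into higher-level thoughts, one thought per window of records."""
    thoughts = []
    for start in range(0, len(records), window):
        chunk = records[start:start + window]
        thoughts.append(summarize(chunk))  # in practice: an LLM call over this chunk
    return thoughts
```

Because thoughts are ordinary strings, `reflect` can be applied again to its own output, which mirrors how a later thought (such as thought 4) can be derived from earlier thoughts rather than directly from observations.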

41 of 96

Adjusting behavior based on experience

obs 10000

Relevant Experience

action 10000

obs 10001

Write

Read

Reflection

goal

obs 1

action 1

……

obs 9999

action 9999

Knowledge Graph

GraphRAG: https://arxiv.org/abs/2404.16130

HippoRAG: https://arxiv.org/abs/2405.14831

42 of 96

ChatGPT with memory

The Write module decides to record this

43 of 96

ChatGPT with memory

44 of 96

ChatGPT with memory

45 of 96

ChatGPT with memory

The Read module kicks in

46 of 96

To learn more …

MemGPT: https://arxiv.org/abs/2310.08560

Agent Workflow Memory: https://arxiv.org/abs/2409.07429

A-MEM: Agentic Memory for LLM Agents: https://arxiv.org/abs/2502.12110

47 of 96

How AI uses tools

48 of 96

Tools commonly used by language models

Python

Search Engine

Other AI

(Different capabilities, stronger but costly)

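A tool can be treated as a function; using a tool means calling that function

Tool use is therefore also called a “Function Call”

Tools: you only need to know how to use them, not how they work internally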

49 of 96

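How to use tools

If you encounter a question you cannot answer from your own knowledge, use a tool.

Put the tool-use command between <tool> and </tool>. After the tool runs, you will receive its output between <output> and </output>.

Language model

The tools currently available to you are:

Temperature(location, time): a function that returns the temperature at a given place and time. Example: Temperature('台北', '2025.02.22 14:26')

What was the temperature in Kaohsiung (高雄) at 2:00 PM on March 10, 2025?

How to use tools in general

How to use this specific tool

User Prompt

System Prompt

This is just a string of text; it cannot actually call the function

<tool>Temperature('高雄', '2025.03.10 14:00')</tool>

gpt-4o-mini

(There are many ways to use tools; this is just one generic approach)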
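Since the model's output is only text, surrounding code has to find the <tool> … </tool> span and run the call. The sketch below parses the call with Python's standard `ast` module instead of `eval`, so only registered tool names with literal arguments can run; `extract_tool_call`, `run_tool_call`, and the tool registry are assumptions for illustration.

```python
import ast
import re
from typing import Callable, Dict, Optional

TOOL_PATTERN = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def extract_tool_call(text: str) -> Optional[str]:
    """Return the text between <tool> and </tool>, or None if the model made no call."""
    match = TOOL_PATTERN.search(text)
    return match.group(1).strip() if match else None


def run_tool_call(call: str, tools: Dict[str, Callable[..., object]]) -> str:
    """Execute a call such as Temperature('高雄', '2025.03.10 14:00') without eval():
    only registered function names with literal arguments are accepted."""
    node = ast.parse(call, mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError(f"not a simple function call: {call!r}")
    if node.func.id not in tools:
        raise ValueError(f"unknown tool: {node.func.id}")
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords if kw.arg}
    return str(tools[node.func.id](*args, **kwargs))
```

The returned string is what would be wrapped in <output> … </output> and appended to the model's context.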

50 of 96

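How to use tools

Language model

What was the temperature in Kaohsiung (高雄) at 2:00 PM on March 10, 2025?

<tool>Temperature('高雄', '2025.03.10 14:00')</tool>

User Prompt

System Prompt

Tool usage instructions ……

Not shown to the user

Workflow preset by the agent developer

Temperature

Not shown to the user

At 2:00 PM on March 10, 2025, the temperature in Kaohsiung was 32°C.

(keeps on continuing the text ……)

Output seen by the user

gpt-4o-mini

(There are many ways to use tools; this is just one generic approach)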
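The developer-defined workflow can be sketched as a loop: let the model continue the text, run any tool call it emits, append the result inside <output> tags, and let the model continue again, returning only the final text to the user. `agent_turn`, `llm`, and the tool registry are hypothetical stand-ins, and the comma-split argument parsing assumes quoted arguments that contain no commas.

```python
import re
from typing import Callable, Dict

TOOL_RE = re.compile(r"<tool>(\w+)\((.*?)\)</tool>", re.DOTALL)


def agent_turn(system_prompt: str, user_prompt: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[..., object]],
               max_calls: int = 3) -> str:
    """Continue the text; whenever a <tool> call appears, run it, append <output>, repeat."""
    context = system_prompt + "\n" + user_prompt + "\n"
    for _ in range(max_calls + 1):
        continuation = llm(context)
        match = TOOL_RE.search(continuation)
        if match is None:
            return continuation  # no tool call: this is the text the user sees
        name, raw_args = match.group(1), match.group(2)
        # Naive parsing: assumes quoted string arguments that contain no commas.
        args = [a.strip().strip("'\"") for a in raw_args.split(",")] if raw_args.strip() else []
        result = tools[name](*args)
        # The call and its output stay in the context but are never shown to the user.
        context += continuation[: match.end()] + f"<output>{result}</output>"
    return "Sorry, too many tool calls."
```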

51 of 96

The most commonly used tool: the search engine

Retrieval Augmented Generation (RAG)

52 of 96

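Using other AIs as tools

Language model

Text instruction

Text response

Language model

What is this person saying?

He says “Hello, everyone”

Language model

How is this person feeling?

Probably in a pretty good mood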

53 of 96

https://arxiv.org/abs/2407.09886

54 of 96

Using other AIs as tools

https://arxiv.org/abs/2407.09886

Chih-Kai Yang

Chun-Yi Kuan

Results on Dynamic SUPERB

55 of 96

What if there are a great many tools?

obs 1

action 1

Tool Use

Hundreds of Tool Descriptions

56 of 96

What if there are a great many tools?

https://arxiv.org/abs/2310.03128

https://arxiv.org/abs/2502.11271

Tool Selection

selected tools

obs 1

action 1

Tool Use

Hundreds of Tool Descriptions

Agent’s Memory
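Tool selection can be sketched as retrieval over tool descriptions, so that only a few descriptions enter the prompt. Word overlap here is a toy stand-in for embedding similarity, and `select_tools` is a hypothetical helper, not the method of the cited papers.

```python
from typing import Dict, List


def select_tools(query: str, descriptions: Dict[str, str], k: int = 2) -> List[str]:
    """Pick up to k tools whose descriptions share the most words with the query."""
    q = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(q & set(descriptions[name].lower().split()))

    ranked = sorted(descriptions, key=overlap, reverse=True)
    # Drop tools with no overlap at all rather than padding the prompt with them.
    return [name for name in ranked[:k] if overlap(name) > 0]
```

Only the descriptions of the selected tools would then be written into the system prompt, keeping it short even when hundreds of tools exist.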

57 of 96

Models build their own tools

Tool Selection

selected tools

obs 1

action 1

Make Tools

Hundreds of Tool Descriptions

Agent’s Memory

TroVE: https://arxiv.org/pdf/2401.12869

LATM: https://arxiv.org/abs/2305.17126

CREATOR: https://arxiv.org/abs/2305.14318

CRAFT: https://arxiv.org/abs/2309.17428
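A model-made tool can be kept as source code plus a description, so later tasks can select and call it like any other tool. This sketch assumes the model has already produced the function source; `Toolbox` is a hypothetical class, and running model-written code with `exec` is shown only for brevity (a real system would sandbox it).

```python
from typing import Callable, Dict


class Toolbox:
    """Keeps tools the model wrote for itself so later tasks can reuse them.
    WARNING: exec() runs model-written code; a real system would sandbox it."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}
        self.docs: Dict[str, str] = {}

    def add_from_source(self, name: str, source: str, doc: str) -> None:
        namespace: dict = {}
        exec(source, namespace)  # define the function the model generated
        self.tools[name] = namespace[name]
        self.docs[name] = doc  # description used later for tool selection

    def call(self, name: str, *args):
        return self.tools[name](*args)
```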

58 of 96

Tools

Mistakes caused by trusting tools too much …

59 of 96

What if the tool itself is wrong … taking RAG as an example

Source of image: https://www.linkedin.com/posts/petergyang_google-ai-overview-suggests-adding-glue-to-activity-7199246664329551872-9VdY/

60 of 96

Tools

Mistakes caused by trusting tools too much …

Don't trust tools blindly; exercise your own judgment

61 of 96

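Do language models have their own judgment?

Language model

What was the temperature in Kaohsiung (高雄) at 2:00 PM on March 10, 2025?

<tool>Temperature('高雄', '2025.03.10 14:00')</tool>

User Prompt

System Prompt

Tool usage instructions ……

Not shown to the user

At 2:00 PM on March 10, 2025, the forecast temperature in Kaohsiung is 100°C.

(keeps on continuing the text ……)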

gpt-4o-mini

62 of 96

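Do language models have their own judgment?

Language model

What was the temperature in Kaohsiung (高雄) at 2:00 PM on March 10, 2025?

<tool>Temperature('高雄', '2025.03.10 14:00')</tool>

User Prompt

System Prompt

Tool usage instructions ……

Not shown to the user

At 2:00 PM on March 10, 2025, the temperature in Kaohsiung was 10,000°C. This value is clearly unreasonable and may be a tool output error. If you need other information or another query, please let me know.

(keeps on continuing the text ……)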

gpt-4o-mini
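Besides relying on the model's own judgment, an agent developer can add a cheap plausibility check before tool output reaches the model. The sketch assumes a rough range of -90°C to 60°C for air temperatures on Earth; `check_temperature` is a hypothetical helper, not part of any tool API shown here.

```python
def check_temperature(value_c: float, low: float = -90.0, high: float = 60.0) -> str:
    """Format a tool's temperature output, flagging values outside a plausible range."""
    if low <= value_c <= high:
        return f"{value_c}°C"
    # Keep the value but warn, so the model can decide how to report it.
    return f"{value_c}°C (implausible; the tool output may be wrong)"
```

With such a guard, both the 100°C and the 10,000°C outputs would reach the model already flagged instead of depending on the model to notice.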

63 of 96

When a language model does RAG ……

Internal Knowledge

External Knowledge

What kind of external knowledge is more likely to persuade the AI ……

64 of 96

What kind of external knowledge is more likely to persuade the AI ……

The likelihood of the LLM to adhere to the retrieved information presented in context is inversely correlated with the model’s confidence in its response without context.

LLMs will increasingly revert to their priors when the original context is progressively modified with unrealistic values.

https://arxiv.org/abs/2404.10198v1

65 of 96

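What kind of external knowledge is more likely to persuade the AI ……

https://arxiv.org/abs/2401.11911

The answer is A

The answer is B

The answer is A

Tends to believe what fellow AIs say

The answer is C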

66 of 96

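What kind of external knowledge is more likely to persuade the AI ……

Influence of metadata

https://aclanthology.org/2024.blackboxnlp-1.24/

Cheng-Han Chiang

Language models trust newer articles more
The stated source of the data has no effect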

67 of 96

What kind of external knowledge is more likely to persuade the AI ……

https://aclanthology.org/2024.blackboxnlp-1.24/

Cheng-Han Chiang

Claude 3 sides more with the article below

Identical content

Influence of metadata

68 of 96

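Even a reliable tool … does not mean the AI won't make mistakes

Even if all the retrieved information is correct, the answer is not guaranteed to be correct

ChatGPT Search

(The same input no longer produces this problem)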

69 of 96

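Balancing tool use against the model's own abilities

Using a tool is not always more efficient.
For arithmetic, is a calculator always faster than an ordinary person's mental math?

Problem: 3 x 4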

12

70 of 96

Can AI make plans?

71 of 96

Planning

goal

obs 1

obs 2

action 1

action 2

obs 3

action 3

Reactive Response?

Planning

72 of 96

Planning

obs 1

obs 2

action 1

action 2

obs 3

action 3

action 1

action 2

action 3

plan

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

https://arxiv.org/abs/2305.04091

Things are never that easy

Plans are made to be changed

73 of 96

Planning

obs 1

obs 2

action 1

action 2’

action 1

action 2

action 3

plan

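Board games: the opponent's move differs from what was anticipated

Computer use: an ad window suddenly pops up

The outcome differs from what was expected, so the original plan no longer works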

action 2’

action 3’

plan'
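Plan-then-execute with re-planning can be sketched as follows. `make_plan` and `step` are hypothetical stand-ins for an LLM call and an environment; the assumption that `step` reports whether the outcome matched expectations is a simplification (in practice the model itself might judge this).

```python
from typing import Callable, List, Tuple


def plan_and_execute(goal: str,
                     make_plan: Callable[[str, str], List[str]],
                     step: Callable[[str], Tuple[str, bool]],
                     obs: str,
                     max_steps: int = 10) -> List[str]:
    """Make a plan, execute it action by action, and re-plan from the latest
    observation whenever an outcome differs from what the plan expected."""
    plan = make_plan(goal, obs)
    executed: List[str] = []
    while plan and len(executed) < max_steps:
        action = plan.pop(0)
        obs, as_expected = step(action)
        executed.append(action)
        if not as_expected:
            plan = make_plan(goal, obs)  # plan' built from the new observation
    return executed
```

For example, if opening a shopping site unexpectedly shows a pop-up ad, the new plan starts by closing it and then continues with the remaining steps.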

74 of 96

Are language models capable of planning?

gpt-4o

75 of 96

https://arxiv.org/abs/2201.07207

76 of 96

PlanBench

https://arxiv.org/abs/2206.10498

https://arxiv.org/abs/2305.15771

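Available operations:

1. Pick up a block from the table
2. Unstack a block from on top of another block
3. Put down a block on the table
4. Stack a block on top of another block

Initial state: the blue block is on top of the orange block, the red block is on the table, the orange block is on the table, and the yellow block is also on the table.

Goal: have the orange block on top of the blue block.

Unstack the blue block from the orange block
Put the blue block down on the table
Pick up the orange block from the table
Stack the orange block on top of the blue block

Could the LLM have already seen similar problems?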
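Because Blocksworld has precise rules, a plan like the one above can be found or checked mechanically. A minimal breadth-first-search sketch over the four operations follows; the state encoding (a set of (block, support) pairs plus the held block) is an assumption for illustration, and this classical search is a point of contrast with, not a model of, how an LLM plans.

```python
from collections import deque
from typing import FrozenSet, Iterator, List, Optional, Tuple

# State: ({(block, what it sits on)}, block currently held or None)
State = Tuple[FrozenSet[Tuple[str, str]], Optional[str]]
Action = Tuple[str, ...]


def successors(state: State) -> Iterator[Tuple[Action, State]]:
    """Apply each of the four operations wherever its preconditions hold."""
    on, held = state
    support = dict(on)
    clear = sorted(b for b in support if b not in support.values())
    if held is None:
        for b in clear:
            rest = frozenset(pair for pair in on if pair[0] != b)
            if support[b] == "table":
                yield ("pickup", b), (rest, b)               # operation 1
            else:
                yield ("unstack", b, support[b]), (rest, b)  # operation 2
    else:
        yield ("putdown", held), (on | {(held, "table")}, None)  # operation 3
        for c in clear:
            yield ("stack", held, c), (on | {(held, c)}, None)   # operation 4


def plan_bfs(start: State, goal: Tuple[str, str]) -> Optional[List[Action]]:
    """Breadth-first search: returns a shortest action sequence that makes `goal` true."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal in state[0] and state[1] is None:
            return path
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


START: State = (frozenset({("blue", "orange"), ("orange", "table"),
                           ("red", "table"), ("yellow", "table")}), None)

if __name__ == "__main__":
    for step in plan_bfs(START, ("orange", "blue")) or []:
        print(step)
```

The shortest plan has four steps, matching the plan on the slide; renaming the operations (as in the "mystery" variant) changes nothing for such a search, which is exactly what makes it a useful check on whether an LLM is planning or recalling.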

77 of 96

PlanBench

https://arxiv.org/abs/2206.10498

https://arxiv.org/abs/2305.15771

Mystery Blocksworld

Attack

Feast

Succumb

Overcome

78 of 96

(make object c crave object a)

79 of 96

https://arxiv.org/abs/2305.15771

80 of 96

https://arxiv.org/abs/2409.13373

81 of 96

https://arxiv.org/abs/2402.01622

TravelPlanner

82 of 96

https://osu-nlp-group.github.io/TravelPlanner/

83 of 96

https://arxiv.org/abs/2402.01622

84 of 96

https://osu-nlp-group.github.io/TravelPlanner/

85 of 96

https://arxiv.org/abs/2404.11891

86 of 96

https://arxiv.org/abs/2404.11891

87 of 96

Strengthening the planning ability of AI agents

obs 1

action 1-1

action 1-2

action 1-3

obs 2-1

obs 2-2

obs 2-3

action 2-1-1

action 2-1-2

obs 2-1-1

obs 2-1-2

action 2-2-1

obs 2-2-1

action 2-3-1

action 2-3-2

obs 2-3-1

obs 2-3-2

What if the paths are too long?

Just try them out for real?
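"Trying it out for real" amounts to a depth-first search that executes actions and backtracks on failure. The sketch assumes `act` is a pure transition function, so backtracking is free; real environments rarely allow this, and the number of paths grows as (branching factor)^depth. `try_all_paths` and its parameters are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence


def try_all_paths(state: str,
                  actions: Callable[[str], Sequence[str]],
                  act: Callable[[str, str], str],
                  is_goal: Callable[[str], bool],
                  depth: int = 3) -> Optional[List[str]]:
    """Depth-first search that tries every action sequence up to `depth`, backtracking
    on failure. Assumes `act` is pure, i.e. every state can be revisited for free."""
    if is_goal(state):
        return []
    if depth == 0:
        return None
    for a in actions(state):
        rest = try_all_paths(act(state, a), actions, act, is_goal, depth - 1)
        if rest is not None:
            return [a] + rest
    return None
```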

88 of 96

obs 1

action 1-1

action 1-2

obs 2-1

obs 2-2

action 2-2-1

action 2-2-2

obs 2-2-1

obs 2-2-2

Any chance of success?

No ☹

Any chance of success?

Yes ☺

Prune unnecessary search

Tree Search for Language Model Agents

https://arxiv.org/abs/2407.01476
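Pruning can be sketched as best-first search in which each new state is scored (for example, by asking the LLM whether there is still a chance of success) and states below a threshold are never expanded. `score`, `threshold`, and `budget` are hypothetical parameters; this is a generic sketch rather than the exact algorithm of the cited paper.

```python
import heapq
from typing import Callable, List, Optional, Sequence


def best_first_search(start: str,
                      actions: Callable[[str], Sequence[str]],
                      act: Callable[[str, str], str],
                      is_goal: Callable[[str], bool],
                      score: Callable[[str], float],
                      threshold: float = 0.3,
                      budget: int = 50) -> Optional[List[str]]:
    """Expand the most promising state first; never expand states scored below threshold."""
    tie = 0  # tie-breaker so heapq never has to compare states or paths
    frontier = [(-score(start), tie, start, [])]
    expanded = 0
    while frontier and expanded < budget:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        expanded += 1
        for a in actions(state):
            nxt = act(state, a)
            s = score(nxt)  # e.g. ask the LLM: "is there still a chance of success?"
            if s < threshold:
                continue  # pruned: this branch is never explored
            tie += 1
            heapq.heappush(frontier, (-s, tie, nxt, path + [a]))
    return None
```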

89 of 96

https://arxiv.org/abs/2407.01476

Tree Search for Language Model Agents

90 of 96

obs 1

action 1-1

action 1-2

action 1-3

obs 2-1

obs 2-2

obs 2-3

action 2-1-1

action 2-1-2

obs 2-1-1

obs 2-1-2

action 2-2-1

obs 2-2-1

action 2-3-1

action 2-3-2

obs 2-3-1

obs 2-3-2

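Drawback: some actions cannot be undone

Order pizza

Order a bento box

Once it's ordered, it's ordered; there's no taking it back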
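One way to respect irreversible actions during search is to try only reversible candidates for real and merely estimate the outcome of irreversible ones before committing. `choose_action`, `try_for_real`, and `estimate` are hypothetical names in this sketch.

```python
from typing import Callable, Dict, List, Set


def choose_action(candidates: List[str], irreversible: Set[str],
                  try_for_real: Callable[[str], float],
                  estimate: Callable[[str], float]) -> str:
    """Score every candidate, but never execute an irreversible one just to see what happens."""
    scores: Dict[str, float] = {}
    for action in candidates:
        if action in irreversible:
            scores[action] = estimate(action)      # e.g. the LLM predicts the outcome
        else:
            scores[action] = try_for_real(action)  # execute, observe, then undo
    return max(candidates, key=lambda a: scores[a])
```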

91 of 96

obs 1

action 1-1

action 1-2

obs 2-1

obs 2-2

action 2-2-1

action 2-2-2

Any chance of success?

No ☹

Any chance of success?

Yes ☺

Prune unnecessary search

……

92 of 96

obs 1

action 1-1

action 1-2

obs 2-1

obs 2-2

action 2-2-1

action 2-2-2

Prune unnecessary search

……

We need a World Model

Can the AI itself serve as the World Model?
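With a world model, the agent imagines the outcome of each candidate action instead of executing it. This one-step lookahead sketch assumes hypothetical `predict` (the LLM acting as world model) and `value` callables; only the chosen action would then be executed for real.

```python
from typing import Callable, List, Tuple


def choose_by_imagination(obs: str, candidates: List[str],
                          predict: Callable[[str, str], str],
                          value: Callable[[str], float]) -> Tuple[str, str]:
    """One-step lookahead with a world model: imagine each action's result, score the
    imagined state, and return the best action with its predicted outcome.
    Nothing is executed here; only the returned action would be run for real."""
    imagined = {a: predict(obs, a) for a in candidates}
    best = max(candidates, key=lambda a: value(imagined[a]))
    return best, imagined[best]
```

Because nothing is executed during the lookahead, irreversible actions such as placing an order can be evaluated safely before committing.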

93 of 96

https://arxiv.org/abs/2411.06559

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

94 of 96

The ability to "think", seen from the AI agent's perspective

Input

Output

Inner monologue

(Reasoning)

(Observation)

(Action)

95 of 96

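Available operations:

1. Pick up a block from the table
2. Unstack a block from on top of another block
3. Put down a block on the table
4. Stack a block on top of another block

Current state: the blue block is on top of the orange block, the red block is on the table, the orange block is on the table, and the yellow block is also on the table.

Goal: have the orange block on top of the blue block.

Tell me your next step

Next step: use operation 2 to unstack the blue block from the orange block.

(about 1,500 words of reasoning omitted above)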

DeepSeek-R1

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

https://arxiv.org/abs/2502.08235

96 of 96

Key abilities of AI agents

How AI adjusts its behavior based on experience

How AI uses tools

Can AI make plans?