Harnessing Large Language Models for Planning:
A Lab on Strategies for Success and Mitigation of Pitfalls
Vishal Pallagani, Keerthiram Murugesan, Biplav Srivastava, Francesca Rossi, Lior Horesh
AAAI 2024
Lab Forum
LQ2
Contents
2
Introduction
01.
3
Why Explore LLMs for Planning?
4
Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse domains, excelling in natural language understanding, text generation, and even code generation tasks.
Planning problems involve the structured representation of tasks and goals, commonly expressed using the Planning Domain Definition Language (PDDL). PDDL has a logical, symbolic syntax similar to that of Lisp.
Given LLMs' proven proficiency in understanding and generating programming code, it is natural to explore their capabilities in comprehending PDDL and generating plans, tasks that demand even greater levels of reasoning.
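To make PDDL's Lisp-like flavor concrete, here is a standard blocksworld-style action written as a Python string (a minimal sketch for illustration; it is not taken from the lab materials):

```python
# A standard blocksworld-style PDDL action, shown as a Python string to
# illustrate PDDL's parenthesized, Lisp-like syntax (illustrative only).
PICKUP_ACTION = """
(:action pick-up
  :parameters (?b - block)
  :precondition (and (clear ?b) (ontable ?b) (handempty))
  :effect (and (holding ?b)
               (not (clear ?b)) (not (ontable ?b)) (not (handempty))))
"""
print(PICKUP_ACTION)
```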
Goals of the lab
5
Structure of the Lab
6
Topic | Objective | Time |
Brief overview of symbolic planners and plan validators | Introduce symbolic planning problems, planners, and validators | 10:55 am - 11:05 am |
Deep-dive into LLMs | Introduce various training and architectural paradigms in language modeling | 11:05 am - 11:20 am |
Overview of LLMs in Planning | Understand the different categories in which LLMs are being applied in Planning | 11:20 am - 11:30 am |
Plansformer | Overview of Plansformer and hands-on session to use various LLMs for plan generation | 11:30 am - 12:00 pm |
Neuro-symbolic Approaches | Understand the "Thinking Fast and Slow in AI" (SOFAI) framework for Planning | 12:00 pm - 12:15 pm |
Summary and Q/A | Summarize the learnings in the lab and answer questions | 12:15 pm - 12:30 pm |
Brief Overview of Symbolic Planners and Plan Validators
02.
7
Complex Decisions
8
Figure. Dimensions that make decisions complex:
Perception (full vs. partially observable; perfect vs. imperfect)
Action (deterministic vs. stochastic; instantaneous vs. durative)
Environment (static vs. dynamic; single vs. multiple agent)
Goals (full vs. partial satisfaction)
9
Illustration of a Planning Scenario
Blocks World
Figure. A robot arm and two blocks, A and B: in the initial state both blocks are on the table; in the goal state block A is stacked on block B. All robots are equivalent.
Illustration of Problem Representation
10
States: ((On-Table A) (On-Table B) …)
Actions: ((Name: (Pickup ?block ?robot)
          Precondition: ((Clear ?block) (Arm-Empty ?robot) (On-Table ?block))
          Add: ((Holding ?block ?robot))
          Delete: ((Clear ?block) (Arm-Empty ?robot))) …)
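As a hedged illustration of how such an action schema operates on a state, here is a minimal Python sketch that applies the Pickup action above as a STRIPS-style add/delete update (predicate names follow the slide; this is not the lab's actual code):

```python
# Minimal STRIPS-style sketch: a state is a set of ground facts (tuples),
# and Pickup applies the slide's add/delete lists when its preconditions hold.

def pickup(state, block, robot):
    """Apply (Pickup ?block ?robot); return the successor state, or None if inapplicable."""
    precondition = {("Clear", block), ("Arm-Empty", robot), ("On-Table", block)}
    if not precondition <= state:
        return None
    add = {("Holding", block, robot)}
    # The slide's delete list; a full blocksworld encoding would typically
    # also delete ("On-Table", block).
    delete = {("Clear", block), ("Arm-Empty", robot)}
    return (state - delete) | add

initial = {("On-Table", "A"), ("On-Table", "B"), ("Clear", "A"), ("Clear", "B"),
           ("Arm-Empty", "R1"), ("Arm-Empty", "R2")}
print(pickup(initial, "A", "R1"))  # successor state contains ("Holding", "A", "R1")
```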
Illustration of Reasoning for Planning
11
Figure. Forward expansion of the search space for the two-block problem with robots R1 and R2.
Initial state (Level P-0): Clear A, Clear B, On-Table A, On-Table B, Arm-Empty R1, Arm-Empty R2.
Applicable actions (Level A-0): Pick-up A R1, Pick-up A R2.
Reachable propositions (Level P-1): Clear A, Clear B, On-Table A, On-Table B, Holding A R1, Holding A R2, Arm-Empty R1, Arm-Empty R2.
Applicable actions (Level A-1): Stack A B R1, Stack A B R2, Put-down A R1, Put-down A R2, Pick-up B R1, Pick-up B R2.
Goal state (Level P-2): On A B.
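The alternating state and action levels in the figure are exactly what forward search produces. A self-contained Python sketch of breadth-first forward search over the two-block problem (the action semantics are the standard blocksworld ones, slightly richer than the slide's abbreviated delete lists; illustrative only, not the lab's code):

```python
# Toy breadth-first forward search for the two-block, two-robot problem.
from collections import deque

BLOCKS, ROBOTS = ["A", "B"], ["R1", "R2"]

def actions(state):
    """Yield (action-name, successor-state) pairs for all applicable ground actions."""
    for b in BLOCKS:
        for r in ROBOTS:
            # (Pick-up ?block ?robot)
            if {("Clear", b), ("Arm-Empty", r), ("On-Table", b)} <= state:
                nxt = (state - {("Clear", b), ("Arm-Empty", r), ("On-Table", b)}) \
                      | {("Holding", b, r)}
                yield (("Pick-up", b, r), nxt)
            # (Stack ?block ?target ?robot): put the held block on a clear block.
            for t in BLOCKS:
                if t != b and {("Holding", b, r), ("Clear", t)} <= state:
                    nxt = (state - {("Holding", b, r), ("Clear", t)}) \
                          | {("On", b, t), ("Clear", b), ("Arm-Empty", r)}
                    yield (("Stack", b, t, r), nxt)

def bfs(initial, goal):
    """Return the first (hence shortest) action sequence whose final state contains the goal."""
    frontier = deque([(frozenset(initial), [])])
    seen = {frozenset(initial)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, nxt in actions(state):
            nxt = frozenset(nxt)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))

initial = {("On-Table", "A"), ("On-Table", "B"), ("Clear", "A"), ("Clear", "B"),
           ("Arm-Empty", "R1"), ("Arm-Empty", "R2")}
print(bfs(initial, goal={("On", "A", "B")}))
# -> [('Pick-up', 'A', 'R1'), ('Stack', 'A', 'B', 'R1')]
```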
12
Figure. Demonstration of an automated planning problem with a blocksworld domain example
Illustration of a Larger Planning Scenario
Active Areas of Research
Considerations
13
Plan Validation
14
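Plan validators such as VAL check a generated plan against the domain and problem definitions. A minimal sketch of calling an external validator from Python (the executable name `Validate` and the file names are assumptions; binary names and flags vary across VAL builds and your setup):

```python
import subprocess

# Assumes a VAL-style validator binary named "Validate" on PATH and local copies
# of domain.pddl, problem.pddl, and plan.txt (all of these are assumptions).
result = subprocess.run(["Validate", "domain.pddl", "problem.pddl", "plan.txt"],
                        capture_output=True, text=True)
print(result.stdout)  # the validator's report states whether the plan achieves the goal
```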
15
>> It is time for some coding
Source: r/ProgrammerHumor
Deep-dive into LLMs
03.
16
Deep-dive into LLMs
In this lab, we focus our discussion on three language modeling techniques: masked, sequence-to-sequence (seq2seq), and causal.
17
Credits: Google Cloud Skills Boost
Figure. The Transformer - model architecture.
Masked Language Modeling (MLM)
18
Credits: Cameron R. Wolfe
Source: Language Model Training and Inference: From Concept to Code (substack.com)
Seq2Seq Language Modeling (Seq2Seq)
19
Causal Language Modeling (CLM)
20
Credits: Cameron R. Wolfe
Source: Language Model Training and Inference: From Concept to Code (substack.com)
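A minimal sketch contrasting the three objectives at inference time, assuming the Hugging Face transformers library and the standard checkpoints bert-base-uncased, t5-small, and gpt2 (these are generic examples, not the models used in the lab):

```python
from transformers import pipeline

# Masked LM: predicts a token hidden anywhere in the input (bidirectional context).
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The robot arm picks up the [MASK].")[0]["token_str"])

# Seq2seq LM: encodes a full input sequence and decodes a separate output sequence.
t5 = pipeline("text2text-generation", model="t5-small")
print(t5("translate English to German: The arm is empty.")[0]["generated_text"])

# Causal LM: continues the input left to right, one token at a time.
gpt = pipeline("text-generation", model="gpt2")
print(gpt("To stack block A on block B, first", max_new_tokens=20)[0]["generated_text"])
```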
Given this basic description of the three language modeling techniques used to build LLMs, which of them do you think is a good fit for plan generation?
21
Think in terms of the input and output.
22
>> It is time for some coding
Source: r/ProgrammerHumor
Plansformer
[Hands-on session]
04.
23
Plansformer
24
What: Plansformer [1,2] is an LLM-based planner capable of generating valid and optimal plans for classical planning problems.
Why: Traditional planners, although sound and complete, often cannot solve problems with large search spaces within a stipulated time, nor do they generalize across problems. A learning-based planner such as Plansformer harnesses learned representations to generalize to unseen problems and produces candidate plans in roughly constant inference time.
How: Plansformer is obtained by fine-tuning CodeT5 on planning problems and their corresponding optimal plans. CodeT5 is an LLM pre-trained on programming code and associated natural language.
[1] Pallagani, V., Muppasani, B., Murugesan, K., Rossi, F., Horesh, L., Srivastava, B., Fabiano, F. and Loreggia, A., 2023. Plansformer: Generating symbolic plans using transformers. Generalized Planning (GenPlan) Workshop at NeurIPS.
[2] Pallagani, V., Muppasani, B., Srivastava, B., Rossi, F., Horesh, L., Murugesan, K., Loreggia, A., Fabiano, F., Joseph, R. and Kethepalli, Y., 2023, August. Plansformer tool: demonstrating generation of symbolic plans using transformers. In IJCAI (Vol. 2023, pp. 7158-7162). International Joint Conferences on Artificial Intelligence.
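For the hands-on session, plan generation with a CodeT5-style seq2seq model looks roughly like the sketch below. The base checkpoint Salesforce/codet5-base stands in for a fine-tuned Plansformer checkpoint, and the prompt encoding is illustrative, not necessarily the format Plansformer was trained on:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace with a Plansformer-style fine-tuned checkpoint for meaningful plans;
# the base CodeT5 model will not produce valid plans out of the box.
model_name = "Salesforce/codet5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Illustrative problem encoding (goal, initial state, available actions).
problem = ("<GOAL> on a b <INIT> ontable a, ontable b, clear a, clear b, handempty "
           "<ACTION> pick-up, put-down, stack, unstack")
inputs = tokenizer(problem, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```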
25
>> It is time for some coding
Source: r/ProgrammerHumor
Plansformer’s Architecture
26
Figure. Plansformer model architecture showing the modeling and evaluation phases. The modeling phase involves fine-tuning CodeT5 with data from the planning domain. The evaluation phase covers both planner testing and model testing.
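A compressed sketch of the modeling phase, assuming the Hugging Face transformers and datasets libraries; the toy data, hyperparameters, and input/output format are placeholders, not the settings used to train Plansformer:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

# Toy (problem, plan) pair; the real dataset pairs planning problems with optimal plans.
pairs = Dataset.from_dict({
    "problem": ["<GOAL> on a b <INIT> ontable a, ontable b, clear a, clear b, handempty"],
    "plan": ["pick-up a, stack a b"],
})

def tokenize(batch):
    # Encode the problem as the input and the plan as the target labels.
    return tokenizer(batch["problem"], text_target=batch["plan"], truncation=True)

train_ds = pairs.map(tokenize, batched=True, remove_columns=pairs.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="plansformer-sketch",
                                  per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```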
Is Plansformer a Good Model?
27
Table. Results of model testing (best performance in bold). Plansformer-[x] denotes Plansformer for a specific domain.
Is Plansformer a Good Planner?
28
Table. Results of plan validation.
Can Plansformer Adapt to Another Domain?
29
Table. Plansformer-bw as the base model, further fine-tuned on and tested against (a) hanoi, (b) grippers, and (c) driverlog; (d) compares the valid plans generated by models derived from Plansformer-bw-hn against Plansformer-hn trained using similar data points.
Overview of LLMs in Planning
05.
30
Applications of LLMs in Planning
31
Table. Comprehensive description of the eight categories utilizing LLMs in Planning [1]
[1] Pallagani, V., Roy, K., Muppasani, B., Fabiano, F., Loreggia, A., Murugesan, K., Srivastava, B., Rossi, F., Horesh, L. and Sheth, A., 2024. On the prospects of incorporating large language models (llms) in automated planning and scheduling (aps). International Conference on Automated Planning and Scheduling (ICAPS).
Applications of LLMs in Planning
32
Figure. Taxonomy of recent research in the intersection of LLMs and Planning with (#) mentioning the number of scholarly papers in each category [1].
In this tutorial, we focus on using LLMs for plan generation
[1] Pallagani, V., Roy, K., Muppasani, B., Fabiano, F., Loreggia, A., Murugesan, K., Srivastava, B., Rossi, F., Horesh, L. and Sheth, A., 2024. On the prospects of incorporating large language models (llms) in automated planning and scheduling (aps). International Conference on Automated Planning and Scheduling (ICAPS).
Capabilities of LLMs for Plan Generation
33
Fine-tuning is a method that updates the parameters of an LLM using a labeled dataset for the target task.
Prompting is a method that modifies the input of an LLM using a template or a cue to elicit the desired output.
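For instance, a prompting approach might supply a few solved problems in the input and let the model complete the last one. An illustrative few-shot template is sketched below; the study's actual prompt format is not reproduced here:

```python
# Illustrative few-shot prompt for plan generation with a pre-trained causal LLM.
few_shot_prompt = """\
Problem: init = (clear a) (ontable a) (clear b) (ontable b) (handempty); goal = (on a b)
Plan: (pick-up a) (stack a b)

Problem: init = (clear c) (ontable c) (clear d) (ontable d) (handempty); goal = (on d c)
Plan:"""
# Feeding this prompt to a pre-trained LLM elicits a plan for the second problem,
# whereas fine-tuning would instead update the model's weights on many such pairs.
print(few_shot_prompt)
```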
Capabilities of LLMs for Plan Generation
We focus on answering four research questions [1]:
(a) To what extent can LLMs solve planning problems?
(b) What pre-training data is effective for plan generation?
(c) Do fine-tuning and prompting improve LLMs' plan generation?
(d) Are LLMs capable of plan generalization?
34
[1] Pallagani, V., Muppasani, B., Murugesan, K., Rossi, F., Srivastava, B., Horesh, L., Fabiano, F. and Loreggia, A., 2023. Understanding the Capabilities of Large Language Models for Automated Planning. arXiv preprint arXiv:2305.16151.
Capabilities of LLMs for Plan Generation
For this study, we constructed a dataset comprising 18,000 planning problems along with their corresponding optimal plans across 6 domains. The dataset was divided into an 80%-20% train-test split, with the purpose of fine-tuning and evaluating the performance of the LLMs.
35
Table. Difficulty of planning domains
Capabilities of LLMs for Plan Generation
The LLMs considered for this study, along with their architectures, are as follows:
36
Table. Architecture of benchmark LLMs
Research Question 1: To what extent can LLMs solve planning problems?
37
Table. Evaluation of plan generation capabilities of LLMs (both prompting pre-trained model and fine-tuned model). For each model, we report the inference time (Inf. Time), the percentage of satisficing plans (Sat. Plans), the percentage of optimal plans (Opt. Plans), and the degree of correctness (Deg. Corr.).
38
Research Question 2: What pre-training data is effective for plan generation?
LLMs pre-trained on programming code outperform those trained solely on textual corpora.
Research Question 3: Do fine-tuning and prompting improve LLMs' plan generation?
Fine-tuning is superior to prompting for solving planning problems with LLMs. Overall, fine-tuned LLMs generate outputs for planning problems roughly four times faster than prompted pre-trained LLMs.
39
Research Question 4: Are LLMs capable of plan generalization?
There are three tasks that help measure the capability of LLMs in plan generalization:
40
Research Question 4: Are LLMs capable of plan generalization?
Task 1: Plan length generalization
Figure. Fine-tuned CodeT5 and code-davinci with few-shot prompting show poor plan length generalization capabilities: plans from fine-tuned CodeT5 overall have a higher degree of correctness. The x-axis represents the plan length and the y-axis represents the degree of correctness. The training plan lengths are highlighted in grey.
The fine-tuned CodeT5 model can generalize to unseen plan lengths to some extent, while few-shot prompting of code-davinci generates only a single valid plan.
41
Research Question 4: Are LLMs capable of plan generalization?
Task 2: Object name randomization
Figure. Evaluating the capabilities of LLMs in handling randomized object names. In version 1, we used only single-digit numeric values as object names. In version 2, we used alphanumeric strings of length 2 (similar to the convention followed by IPC generators), where the combinations of letters and digits were unseen during training. Version 3 used object names consisting of three alphabetic characters.
Fine-tuned models generalize only to object names drawn from the same vocabulary as the training data, while few-shot prompting of code-davinci handles the randomized object names but has poor plan generation capabilities.
42
Research Question 4: Are LLMs capable of plan generalization?
Task 3: Unseen domain generalization
Neither fine-tuned CodeT5 nor few-shot-prompted code-davinci shows any plan generation capability for unseen domains.
Figure. Example of an incorrect generation from LLMs for a problem from an unseen domain.
Invalid Generations from Plansformer
43
Figure. Different types of invalid plans generated by Plansformer on Blocksworld domain
We notice that even in cases of incorrect generations, LLMs often generate partially correct action sequences. In our upcoming research, we introduce a neuro-symbolic approach where these partially correct plans are employed to inform the heuristics of symbolic planners, enabling faster replanning as opposed to starting the planning process from scratch.
Discussion Pointers
44
[1] Subbarao Kambhampati. Can LLMs Really Reason and Plan?
https://cacm.acm.org/blogs/blog-cacm/276268-can-llms-really-reason-and-plan/fulltext [Last accessed on Feb 17, 2024]
Neuro-symbolic Approaches
06.
45
Thinking Fast and Slow in Humans
46
System 1 and System 2 in AI
47
System 1 Solvers | System 2 Solvers |
Rely on past experiences | Rely on procedures |
React to new problems | Called by metacognition |
Generate solutions with a certain confidence | Generate correct solutions |
Complexity independent of input size | Complexity dependent on input size |
Meta-cognition
48
The Role: to monitor and control.
Main Goal: to improve the quality of the system's decisions.
Our Choice:
SOFAI = System 1 + System 2 + Metacognition
49
Ganapini, M.B., Campbell, M., Fabiano, F., Horesh, L., Lenchner, J., Loreggia, A., Mattei, N., Rossi, F., Srivastava, B. and Venable, K.B., 2022, September. Thinking fast and slow in AI: The role of metacognition. In International Conference on Machine Learning, Optimization, and Data Science (pp. 502-509). Cham: Springer Nature Switzerland.
Table. The SOFAI architecture, supporting System 1/System 2 agents and meta-cognition.
Plan-SOFAI: Instantiating SOFAI for Planning
(a) System 1
Plansformer
(b) System 2
A traditional planning system, such as FastDownward (for plan generation from scratch) or LPG (for partial plan completion).
(c) Metacognition Module
A rule-based function that chooses when to adopt S1 or S2 based on previous performance, expected cost, and expected accuracy on similar problems. It also includes a plan evaluator and exposes several hyperparameters that can be tuned for efficiency; a rule-of-thumb sketch follows below.
50
Fabiano, F., Pallagani, V., Ganapini, M.B., Horesh, L., Loreggia, A., Murugesan, K., Rossi, F. and Srivastava, B., 2023, December. Plan-SOFAI: A Neuro-Symbolic Planning Architecture. In Neuro-Symbolic Learning and Reasoning in the era of Large Language Models.
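A rule-of-thumb sketch of such a metacognitive arbiter (the thresholds and features are illustrative assumptions, not the actual Plan-SOFAI rules):

```python
# Illustrative SOFAI-style arbitration between the fast (S1) and slow (S2) solver.
def choose_solver(s1_confidence, past_s1_accuracy, s2_expected_cost, remaining_budget):
    """Return "S1" to keep the fast Plansformer answer, "S2" to invoke the symbolic planner."""
    if s1_confidence >= 0.9 and past_s1_accuracy >= 0.8:
        return "S1"   # the fast answer looks trustworthy enough on similar problems
    if s2_expected_cost > remaining_budget:
        return "S1"   # cannot afford the slow solver; accept the fast answer
    return "S2"       # otherwise fall back to the sound-and-complete planner

print(choose_solver(s1_confidence=0.95, past_s1_accuracy=0.9,
                    s2_expected_cost=30.0, remaining_budget=10.0))  # -> "S1"
```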
Experimental Results
51
Other Neuro-symbolic Approaches to Combine LLMs and Planning
52
[1] Silver, T., Dan, S., Srinivas, K., Tenenbaum, J.B., Kaelbling, L.P. and Katz, M., 2023. Generalized Planning in PDDL Domains with Pretrained Large Language Models. arXiv preprint arXiv:2305.11014.
[2] Liu, B., Jiang, Y., Zhang, X., Liu, Q., Zhang, S., Biswas, J. and Stone, P., 2023. Llm+ p: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477.
Summary
07.
53
To Learn More About SOFAI
54
THANK YOU ALL
Contact Information
Vishal Pallagani – vishalp@mailbox.sc.edu
55
Questions?
56