LLM for Embodied AI
Team 12
Edouard Albert-Roulhac
&
Abdelhakim Sehad
13-03-2024
Motivation
Image: Qualcomm
Embodied AI – Definition
Build AI agents that perceive and act in the world
Embodied AI – Challenges
VOYAGER: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang et al.
Oct. 2023
Voyager – Context
Minecraft as a virtual world
New paradigm
Voyager – Curriculum, Skill Library, Iterative Prompting
Skill Library
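The skill library can be sketched as a store of verified, natural-language-described skills that are retrieved when a new task arrives. This is a minimal illustrative sketch only: the class name and the keyword-overlap retrieval are assumptions — Voyager itself stores GPT-4-written JavaScript skills and retrieves them by embedding similarity of their descriptions.

```python
# Hedged sketch of a Voyager-style skill library.
# Assumptions: keyword-overlap retrieval stands in for the paper's
# embedding-based retrieval; skill code strings are placeholders.

class SkillLibrary:
    def __init__(self):
        self.skills = {}  # description -> code string

    def add(self, description, code):
        """Store a verified skill, keyed by its description."""
        self.skills[description] = code

    def retrieve(self, task, k=2):
        """Return the k skills whose descriptions best overlap the task."""
        task_words = set(task.lower().split())
        scored = sorted(
            self.skills.items(),
            key=lambda item: len(task_words & set(item[0].lower().split())),
            reverse=True,
        )
        return [code for _, code in scored[:k]]

library = SkillLibrary()
library.add("craft a wooden pickaxe", "craftItem('wooden_pickaxe')")
library.add("mine iron ore", "mineBlock('iron_ore')")
library.add("fight a zombie", "attackEntity('zombie')")

print(library.retrieve("mine some iron ore near the base", k=1))
# → ["mineBlock('iron_ore')"]
```

Storing skills as code (rather than weights) is what makes Voyager's learning compositional and interpretable: old skills are reused verbatim inside new ones.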
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu et al.
Sept. 2023
EmbodiedGPT - Overview
Main approach – Key Points
EgoCOT
EgoCOT - Data Preparation
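The EgoCOT dataset pairs egocentric video captions with chain-of-thought plans generated by prompting a language model. A minimal sketch of that prompt-construction step, assuming a template wording of our own (the paper's exact prompt may differ):

```python
# Hedged sketch of EgoCOT-style data preparation: turning a video
# caption into a chain-of-thought planning prompt for an LLM.
# The template text below is an assumption, not the paper's prompt.

def build_cot_prompt(caption: str) -> str:
    return (
        "You are given a caption from an egocentric video.\n"
        f"Caption: {caption}\n"
        "Break the described task into a numbered plan of sub-goals, "
        "each naming the object acted on (chain of thought)."
    )

prompt = build_cot_prompt("a person makes a cup of coffee")
print(prompt)
```

Each (caption, generated plan) pair then becomes one training example for embodied-chain-of-thought pre-training.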
How does EmbodiedGPT work?
Training process
Evaluation
Evaluation
Results
Results
Strengths and Weaknesses
Reflections
Embodied Learning vs. Pre-Training
| Paper | Focus | Embodiment | Chain of Thought | Dataset likely open-source? |
|---|---|---|---|---|
| PaLM-E | Develop an embodied multimodal language model | Yes (can interact with physical objects and perceive the world) | Not a core focus (the paper emphasizes multimodal capabilities) | No (proprietary robot data) |
| Voyager | Develop an open-ended embodied agent using LLMs | Yes (acts in the Minecraft environment) | Not explicitly used (learns through trial and error) | N/A (no dataset required) |
| EmbodiedGPT | Pre-train LLMs for better visual-language understanding and reasoning | No (trained on data, not through embodied interaction) | Yes | Yes (built on an open-source large-scale dataset, hence good scalability) |
Thank you !
Appendix
Appendix