Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
University of Hamburg
Department of Informatics
Knowledge Technology
How do humans and robots perceive their surroundings to uncover latent properties? [1]
Visual monitoring
Auditory monitoring
…
(Epistemic) uncertainty
Weigh
Knock on
Touch
…
Low-resolution sensing
…
Ambiguity in human instructions
Information insufficiency in modalities
[1] Kroemer, Oliver, Scott Niekum, and George Konidaris. "A review of robot learning for manipulation: Challenges, representations, and algorithms." The Journal of Machine Learning Research 22.1 (2021): 1395-1476.
Bridge the gap with LLMs
Robots with hand-crafted design
Humans
Robots with LLMs
Matcha* agent
(Multimodal environment chatting agent)
* Named after a type of East Asian green tea. To fully appreciate matcha, one must engage multiple senses to perceive its appearance, aroma, taste, texture, and other sensory nuances.
Start with a vision module to describe the scene
Yellow block
Orange block
Gray block
…
Matcha agent
(Structure)
The scene description and the task instruction, together with few-shot examples, are fed into a large language model (LLM), which actively chooses the next perception action.
Yellow block
Orange block
Gray block
…
Pick up the plastic cube.
Few-shot examples
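The prompt assembly described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual template: the function name `build_prompt`, the `Scene:` / `Instruction:` / `Next action:` layout, and the few-shot example text are all assumptions made for this sketch.

```python
from typing import List


def build_prompt(scene_objects: List[str], instruction: str,
                 few_shot_examples: List[str]) -> str:
    """Assemble a text prompt for the LLM (hypothetical layout)."""
    # Few-shot examples come first, so the model sees the expected format.
    parts = list(few_shot_examples)
    # The vision module's output is reduced to a comma-separated object list.
    scene = ", ".join(scene_objects)
    # The prompt ends with an open slot for the next perception action.
    parts.append(f"Scene: {scene}\nInstruction: {instruction}\nNext action:")
    return "\n\n".join(parts)


# Hypothetical few-shot example; the action name follows the "Knock on" skill.
example = (
    "Scene: red block, blue block\n"
    "Instruction: Pick up the metal cube.\n"
    "Next action: knock_on(red block)"
)

prompt = build_prompt(
    ["yellow block", "orange block", "gray block"],
    "Pick up the plastic cube.",
    [example],
)
print(prompt)
```

Ending the prompt on an open `Next action:` slot is one simple way to nudge a completion-style model toward emitting a single action; a chat-style model would more likely receive the same pieces as separate messages.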
The chosen action is then carried out with motion planning.
The multimodal response is fed back to the LLM, and the loop repeats until the task is done.
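The overall perceive, decide, act, and feed-back loop can be sketched as below. This is only an illustration under stated assumptions: `stub_policy` stands in for the LLM, `stub_execute` stands in for motion planning plus the multimodal perception modules, and the `MATERIALS` table, action names, and feedback sentences are hypothetical rather than taken from the actual system.

```python
from typing import Callable, List, Tuple

# Hypothetical ground truth, used only by the stub environment below.
MATERIALS = {"yellow block": "metal", "orange block": "plastic", "gray block": "glass"}


def stub_execute(action: str, target: str) -> str:
    """Stand-in for motion planning plus a multimodal perception module."""
    if action == "knock_on":
        return f"The {target} sounds like {MATERIALS[target]}."
    if action == "pick_up":
        return f"Picked up the {target}."
    return "Nothing happened."


def stub_policy(history: List[str], objects: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: explore by knocking, then pick up the plastic object."""
    for obj in objects:
        if any(f"The {obj} sounds like plastic" in h for h in history):
            return ("pick_up", obj)
    for obj in objects:
        if not any(f"knock_on({obj})" in h for h in history):
            return ("knock_on", obj)
    return ("done", "")


def run_episode(objects: List[str], instruction: str,
                policy: Callable[[List[str], List[str]], Tuple[str, str]],
                execute: Callable[[str, str], str],
                max_steps: int = 10) -> List[str]:
    """Alternate between choosing an action and feeding its result back."""
    history = [f"Scene: {', '.join(objects)}", f"Instruction: {instruction}"]
    for _ in range(max_steps):
        action, target = policy(history, objects)
        if action == "done":
            break
        feedback = execute(action, target)
        # The textual feedback becomes part of the context for the next decision.
        history.append(f"{action}({target}) -> {feedback}")
        if action == "pick_up":
            break  # task finished
    return history


history = run_episode(
    ["yellow block", "orange block", "gray block"],
    "Pick up the plastic cube.",
    stub_policy,
    stub_execute,
)
for line in history:
    print(line)
```

In this toy run the stub policy knocks on the yellow block, then on the orange block, and picks up the orange block once the feedback mentions plastic. The `max_steps` cap is an added safeguard for the sketch so that a policy that never finishes cannot loop forever.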
Matcha agent
(In simulation)
[2] Kerzel, Matthias, et al. "NICOL: A Neuro-inspired Collaborative Semi-humanoid Robot that Bridges Social Interaction and Reliable Manipulation." arXiv preprint arXiv:2305.08528 (2023).
https://youtu.be/rMMeMTWmT0k
Experiment results
* Chance level (random guess): 33.33%
Generalization, Limitation and Future Work
Thank You for Your Attention!