Generalist Models for Robotic Manipulation
Martin Sedláček
1
Intro
Main Goal: Get useful generalist robots that can robustly do tasks in the real world.
4
[figure from RT-2]
Set-up
“Pick up the apple”
+
+
5
[Franka Emika Panda robot], [RealSense camera]
6
7
[demo from Pi0]
8
Adding other modalities?
9
[x, y, z, 𝛉]
[x, y, z, 𝛉]
[x, y, z, 𝛉]
Adding other modalities?
10
[figure from Brás & Neto]
Adding other modalities?
11
[figure from Richard Savery]
12
Many Different Robots!
What’s in the black box?
15
?
Planning?
ML?
Two ways:
Recent example - image matching:
18
[figure from MASt3R]
X
One NN (and A LOT more data)
[figure from CVPR 2017 tutorial]
+30% (~2.5x)
Are image and language enough?
Multi-Modal + lots of data = generalization capabilities = Foundation Model
20
Vision-Language Model (VLM)
21
[figure from PaliGemma]
Vision-Language Model (VLM)
22
[figure from CLIP]
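A minimal sketch of the image-text matching idea behind a CLIP-style VLM: embed the image and each caption, compare them by cosine similarity, and turn the scores into probabilities with a softmax. The embeddings below are toy vectors, not encoder outputs, and the function names and the temperature of 0.07 are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def caption_probabilities(image_emb, caption_embs, temperature=0.07):
    """Score each caption against one image and return softmax probabilities.

    temperature is an assumed value; real models typically learn it.
    """
    img = l2_normalize(image_emb)
    logits = []
    for cap in caption_embs:
        cap = l2_normalize(cap)
        cosine = sum(a * b for a, b in zip(img, cap))
        logits.append(cosine / temperature)
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical): the image is closest to the first caption.
image_emb = [0.9, 0.1, 0.0]
caption_embs = [
    [1.0, 0.0, 0.0],  # e.g. "a photo of an apple"
    [0.0, 1.0, 0.0],  # e.g. "a photo of a bag"
]
probs = caption_probabilities(image_emb, caption_embs)
```

In a real system the vectors would come from trained image and text encoders; only the scoring step is shown here.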
Vision-Language Model (VLM) for robotics?
23
“Pick up the bag near the edge of the table.”
?
Vision-Language-Action (VLA)
24
“Pick up the bag near the edge of the table.”
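One common way a VLA can emit robot actions from a language-model backbone is to discretize each continuous action dimension into a fixed number of bins and treat the bin indices as tokens. The sketch below assumes a 4-D action [x, y, z, θ] and 256 uniform bins per dimension (the scheme reported for RT-2-style models). The workspace bounds and function names are hypothetical, and this is not necessarily how every VLA shown in this talk works.

```python
NUM_BINS = 256  # assumption: 256 uniform bins per action dimension


def action_to_tokens(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action value to an integer bin index in [0, num_bins)."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        value = min(max(value, lo), hi)          # clip to the valid range
        frac = (value - lo) / (hi - lo)          # normalize to [0, 1]
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def tokens_to_action(tokens, low, high, num_bins=NUM_BINS):
    """Map bin indices back to continuous values at the bin centers."""
    return [
        lo + (tok + 0.5) / num_bins * (hi - lo)
        for tok, lo, hi in zip(tokens, low, high)
    ]


# Hypothetical bounds for [x, y, z, theta] (meters, meters, meters, radians).
LOW = [-0.5, -0.5, 0.0, -3.1416]
HIGH = [0.5, 0.5, 1.0, 3.1416]

action = [0.10, -0.25, 0.30, 1.57]
tokens = action_to_tokens(action, LOW, HIGH)
recovered = tokens_to_action(tokens, LOW, HIGH)
```

The round trip loses at most half a bin width per dimension, which is the price of letting the model predict actions the same way it predicts text tokens.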
Secret sauce?
25
26
[slide from Kevin Black]
Beyond manipulation?
27
[figure from CrossFormer]
Thank you!
28