1 of 81

Embodied Language Models

From multi-task learning towards the generalist robot

Thomas Dooms, Alexander Belooussov

2 of 81

Context

Research project 2 (12 sp)

  • Transformers United
  • Deep multi-task & meta-learning

CS 330: Deep Multi-Task and Meta Learning

3 of 81

Content

A brief overview of this lecture

The Basics

Reinforcement Learning

Language Conditioning

4 of 81

Multi-Task Learning

5 of 81

Definition

Let’s refresh the basics

Supervised learning

6 of 81

Motivation

Some data modalities are very hard to acquire

datapoints

NLP

CV

RL

Medical imaging

Audio

7 of 81

Transfer learning

Solve target task 𝒯b

By transferring knowledge learned from 𝒯a

Without access to 𝒟a

8 of 81

Limitations

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Kumar, Ananya et al. (2022)

9 of 81

Multi-task learning

CS330: Deep multi-task & meta-learning

Chelsea Finn (2021)

Assumption

Tasks share some structure (and in practice, they most often do)

Learning this structure is beneficial for both tasks

Definitions

Example

10 of 81

Multi-task architectures

Hard Sharing

All Sharing

Soft Sharing
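
A minimal sketch of hard parameter sharing (names and dimensions are illustrative, not a reference implementation): a shared trunk learns the common structure, while small per-task heads hold the task-specific parameters.

```python
import torch
import torch.nn as nn

class HardSharingNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_tasks, out_dim):
        super().__init__()
        # Shared trunk: every task updates these weights.
        self.trunk = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific heads: only the active task updates its head.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, out_dim) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        return self.heads[task_id](self.trunk(x))
```

Soft sharing instead keeps one network per task and penalizes the distance between their parameters.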

11 of 81

Summary

Short summary of multi-task learning

Train multiple tasks together

No tradeoff between specificity and generality

Less overfitting

Higher accuracy

12 of 81

Meta learning

13 of 81

Learning to Learn

Who can figure it out?

4 🍕 6 = ?

3 🍕 5 = 18

1 🍕 2 = 3

2 🍕 3 = 8

6 🍕 1 = 12

14 of 81

Definitions

Learning to Learn with Gradients

Finn, Chelsea (2018)

Mathematically

Given data from 𝒯1 , ..., 𝒯n, quickly solve new task 𝒯test

Supervised learning: θ* = arg max_θ log p(θ | 𝒟)

Meta-learning: θ* = arg max_θ log p(θ | 𝒟_meta-train), where 𝒟_meta-train = {𝒟_1, …, 𝒟_n}

Intuitively

Find the set of parameters θ such that new tasks can be learned quickly.

15 of 81

Support & Query

[Figure: few-shot episodes. Each task provides a support set of labeled examples (classes 1 and 2) and a query set of unlabeled examples (?) to classify.]

16 of 81

Black-box

meta learning

17 of 81

Black-box

meta learning

A concrete example using RNNs
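
One way to realize this (a hedged sketch with illustrative names): an RNN reads the support set as a sequence of (x, y) pairs, then predicts labels for the query inputs. Adaptation happens in the hidden state, not through weight updates.

```python
import torch
import torch.nn as nn

class BlackBoxMetaLearner(nn.Module):
    """LSTM that reads (x, y) support pairs, then predicts y for query x."""
    def __init__(self, x_dim, y_dim, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(x_dim + y_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, y_dim)

    def forward(self, support_x, support_y, query_x):
        # Concatenate inputs with their labels; queries get a zero label.
        zeros = query_x.new_zeros(query_x.shape[0], query_x.shape[1],
                                  support_y.shape[-1])
        seq = torch.cat([
            torch.cat([support_x, support_y], dim=-1),
            torch.cat([query_x, zeros], dim=-1),
        ], dim=1)
        h, _ = self.rnn(seq)
        # Hidden states at the query positions carry the adapted "parameters".
        k = support_x.shape[1]
        return self.out(h[:, k:, :])
```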

18 of 81

Optimisation based meta learning

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Finn, Chelsea et al. (2017)

19 of 81

Quiz Time

Which marker is the best candidate for θ?

20 of 81

Summary

What kind of learning algorithms have we discussed?

Transfer learning

Solve 𝒯b by transferring knowledge from 𝒯a

Multi-task learning

Solve multiple tasks 𝒯1, 𝒯2, … , 𝒯n at once

Meta-learning

Given data from 𝒯1, 𝒯2, … , 𝒯n quickly solve 𝒯test

21 of 81

Questions?

22 of 81

Reinforcement Learning

23 of 81

Multi-Task RL

Cross-task generalization

Performance increases for all tasks

Why multi-task?

24 of 81

Multi-Task RL

Cross-task generalization

Easier exploration

Tasks share knowledge

Why multi-task?

25 of 81

Multi-Task RL

Cross-task generalization

Easier exploration

Sequencing for long-horizon tasks

Long tasks can be split into easier sub-tasks

Why multi-task?

26 of 81

Multi-Task RL

Cross-task generalization

Easier exploration

Sequencing for long-horizon tasks

Reset-free learning

No intervention needed

Generalize to different starting states and goals

Why multi-task?

27 of 81

Multi-Task RL

Cross-task generalization

Easier exploration

Sequencing for long-horizon tasks

Reset-free learning

Per-task sample-efficiency gains

Fewer examples per task needed

Why multi-task?

28 of 81

Multi-Task RL

Task can be defined by

New State/Action space

Different dynamics

Different reward function

(Optional) Task identifier inside the state (see the sketch below)

One-hot

Language description

Goal state ⇒ Goal-conditioned RL

Task Specification
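
A minimal sketch of what such task identifiers look like in practice (names are illustrative):

```python
import numpy as np

def task_conditioned_obs(obs, task_id, num_tasks):
    """Append a one-hot task identifier to the raw observation."""
    one_hot = np.zeros(num_tasks, dtype=obs.dtype)
    one_hot[task_id] = 1.0
    return np.concatenate([obs, one_hot])

def goal_conditioned_obs(obs, goal):
    """Goal-conditioned RL: the identifier is a goal state instead."""
    return np.concatenate([obs, goal])
```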

29 of 81

Defining tasks

30 of 81

Defining tasks

31 of 81

Defining tasks

32 of 81

Multi-Task RL

Hindsight Relabeling, or Hindsight Experience Replay (HER)

Hindsight experience replay Andrychowicz, Marcin, et al. (2017)

33 of 81

Multi-Task RL

Hindsight Relabeling, or Hindsight Experience Replay (HER)

Hindsight experience replay Andrychowicz, Marcin, et al. (2017)

34 of 81

Multi-Task RL

Hindsight Relabeling, or Hindsight Experience Replay (HER)

Hindsight experience replay Andrychowicz, Marcin, et al. (2017)

35 of 81

Goal-Conditioned RL

Pretend the achieved goal was the intended one, even if the real goal was not reached

Hindsight experience replay Andrychowicz, Marcin, et al. (2017)

36 of 81

Goal-Conditioned RL

Pretend the achieved goal was the intended one, even if the real goal was not reached

Hindsight experience replay Andrychowicz, Marcin, et al. (2017)

37 of 81

Goal-Conditioned RL

Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Chebotar, Yevgen, et al. (2021)

38 of 81

Goal-Conditioned RL

Pretend the achieved goal was the intended one, even if the real goal was not reached

Use any state in the trajectory as a goal, not just the final one

Many more samples = more exploration

Always optimal data

Goal is always reached after relabeling

Use unstructured dataset

Human play

→ Turn into a supervised dataset (see the relabeling sketch below)

Imitation Learning
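
A minimal sketch of hindsight relabeling under these assumptions (goal_fn and reward_fn are hypothetical helpers that map a state to the goal it achieves and compute the reward for a state-goal pair):

```python
import numpy as np

def her_relabel(trajectory, goal_fn, reward_fn):
    """Relabel a trajectory so a state actually reached becomes the goal.

    trajectory: list of (state, action, next_state) tuples
    """
    relabeled = []
    for t, (s, a, s_next) in enumerate(trajectory):
        # Use ANY later state as the goal, not just the final one.
        future = np.random.randint(t, len(trajectory))
        g = goal_fn(trajectory[future][2])
        # After relabeling, the data is optimal: the goal is always reached.
        relabeled.append((s, a, s_next, g, reward_fn(s_next, g)))
    return relabeled
```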

39 of 81

Meta-RL

Problem Statement

Inputs

Current state sₜ

K previous timesteps or rollouts from policy

Outputs

Action aₜ

Goal

Given a small amount of experience

Adapt to a new task

40 of 81

Meta RL example

Meta Train

Meta Test

41 of 81

Meta-RL

Black-Box Meta-RL

Training

  1. Sample task 𝒯ᵢ
  2. Roll out the policy for N episodes
  3. Store the sequence in the buffer for task 𝒯ᵢ
  4. Update the policy to maximize the discounted return across all tasks
  5. Repeat

+ General & expressive
+ Variety in design choices/architectures
- Hard to optimize
~ Inherits sample efficiency from the RL optimizer

42 of 81

Meta-RL

Optimisation-Based Meta-RL

Training

  1. Sample task 𝒯ᵢ
  2. Collect 𝒟ᵢᵗʳ by rolling out the policy π_θ
  3. Inner-loop adaptation: φᵢ = θ − α ∇_θ L(θ, 𝒟ᵢᵗʳ)
  4. Collect 𝒟ᵢᵗˢ by rolling out the adapted policy π_φᵢ
  5. Outer-loop update: θ ← θ − β ∇_θ Σᵢ L(φᵢ, 𝒟ᵢᵗˢ)
  6. Repeat (a code sketch follows below)
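
A supervised sketch of the inner/outer loop (illustrative; Meta-RL replaces the MSE loss with a policy-gradient objective estimated from the collected rollouts):

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def task_loss(model, params, batch):
    x, y = batch
    return F.mse_loss(functional_call(model, params, (x,)), y)

def maml_step(model, tasks, alpha=0.01, beta=0.001):
    theta = dict(model.named_parameters())
    meta_loss = 0.0
    for support, query in tasks:
        # Inner loop: phi_i = theta - alpha * grad_theta L(theta, D_i_train)
        grads = torch.autograd.grad(task_loss(model, theta, support),
                                    theta.values(), create_graph=True)
        phi = {n: p - alpha * g for (n, p), g in zip(theta.items(), grads)}
        # Evaluate the adapted parameters phi_i on the query data
        meta_loss = meta_loss + task_loss(model, phi, query)
    # Outer loop: theta <- theta - beta * grad_theta sum_i L(phi_i, D_i_test)
    outer = torch.autograd.grad(meta_loss, theta.values())
    with torch.no_grad():
        for p, g in zip(theta.values(), outer):
            p -= beta * g
```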

43 of 81

Meta-RL

Optimization-Based Meta-RL

Learning to Adapt in Dynamic Environments through Meta-RL

Nagabandi*, Clavera*, Liu, Fearing, Abbeel, Levine, Finn (2019)

+ Inductive bias
+ Easy to combine with policy gradients and model-based methods
- Hard to combine with value-based methods
- Policy gradients are very noisy

~ Inherits sample efficiency from outer RL optimizer

44 of 81

Questions?

45 of 81

Language Models

46 of 81

Language Prediction

Improving Language Understanding by Generative Pre-Training

Radford, Alec et al. (2018)

The Eiffel tower is in Paris

xᵢ → yᵢ

The Eiffel tower is in → Paris
The Eiffel tower is → in
The Eiffel tower → is
The Eiffel → tower
The → Eiffel
⟨start⟩ → The
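
The same shifted-pair construction, sketched in code:

```python
sentence = ["The", "Eiffel", "tower", "is", "in", "Paris"]

# Each prefix of the sentence predicts the next token; the empty
# prefix (start of text) predicts the first word.
pairs = [(sentence[:i], sentence[i]) for i in range(len(sentence))]

for x, y in reversed(pairs):   # longest context first, as on the slide
    print(" ".join(x) or "<start>", "->", y)
```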

47 of 81

Transformers

Improving Language Understanding by Generative Pre-Training

Radford, Alec et al. (2018)

Sequential data

A stream of tokens

Can vary in length

Three main components (the attention + FFN block is stacked N times)

Positional encoding

Attention mechanism

Position-wise FFN

48 of 81

Attention
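
The core operation, scaled dot-product attention from Vaswani et al. (2017), as a minimal sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q Kᵀ / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (..., seq_q, seq_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```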

49 of 81

Encoders & Decoders

Attention is all you need

Vaswani, Ashish et al. (2017)

Decoders → look at previous tokens

Useful for text generation

Does not leak future tokens

Encoders → look at all tokens

Useful for text classification (sentiment analysis)

Global Attention

Masked Attention

50 of 81

Cross-Attention

Attention is all you need

Vaswani, Ashish et al. (2017)

Self-attention

Query the data itself

Keys & values from original tokens

Cross-attention

Query other keys and values

Keys & values source can vary
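
A small illustration of the difference using PyTorch's nn.MultiheadAttention (dimensions are arbitrary):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(1, 10, 64)    # sequence of 10 tokens
ctx = torch.randn(1, 7, 64)   # e.g. encoder outputs or image patches

self_out, _ = attn(query=x, key=x, value=x)        # self-attention: Q, K, V from x
cross_out, _ = attn(query=x, key=ctx, value=ctx)   # cross-attention: K, V from elsewhere
```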


51 of 81

The Full Transformer

Attention is all you need

Vaswani, Ashish et al. (2017)

52 of 81

Foundation models

PaLM: Scaling Language Modeling with Pathways�Chowdhery, Aakanksha et al. (2022)

  • Trained on huge amounts of data
  • Largest models in existence
  • Very versatile

53 of 81

Foundation Model Overview

Name         Company    Release   Parameters      Corpus size (tokens)
GPT-2        OpenAI     2019      1.5 billion     10 billion
GPT-3        OpenAI     2020      175 billion     300 billion
PaLM         Google     2022      540 billion     768 billion
GPT-4        OpenAI     2023      ~ 1 trillion*   unknown
Chinchilla   DeepMind   2022      70 billion      1.4 trillion
Llama        Meta       2023      65 billion      1.4 trillion
PaLM 2       Google     2023      340 billion     3.5 trillion

*not actually known

54 of 81

Emergent Abilities

Emergent Abilities of Large Language Models

Wei, Jason et al. (2022)

GPT-4 Technical Report

OpenAI (2023)

55 of 81

Few-Shot Abilities

Language Models are Few-Shot Learners�Brown, Tom et al. (2020)

Fine-tuned Language Models are Zero-Shot Learners

Wei, Jason et al. (2022)

In-Context Learning (ICL)

“Wish You Were Here: 1975, The Dark Side of the Moon:”

56 of 81

Attention as a Meta-Optimiser

A Survey on In-context Learning

Dong, Qingxiu et al. (2023)

Language Models Implicitly Perform Gradient Descent as

Meta-Optimizers

Dai, Damai et al. (2023)

Attention updates the latent states much like fine-tuning (FT) updates the weights

These implicit modifications resemble explicit fine-tuning

Latent states show high similarity between ICL and FT

57 of 81

Fine-Tuning

Training language models to follow instructions with human feedback

Ouyang, Long et al. (2022)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, Jacob et al. (2018)

Transfer knowledge to specific domains

Ask open-ended questions about a set of documents

Models retain their grasp of language

Knowledge of idioms

Models retain their few-shot and ICL capabilities

Respond correctly to never-seen inputs

Tolerance to slight distribution shift

58 of 81

Language Conditioning

59 of 81

Goal

Interactive Language: Talking to Robots in Real Time

Lynch, Corey et al. (2022)

Guide robot using natural language

Complete long-term objectives

Give short-term instructions

Rectify mistakes

Free vocabulary

60 of 81

Dataset Gathering

Play dataset

Human operators

Optional objective prompts

Explore the environment as much as possible

Actions are more meaningful than random exploration

Prompts discarded ⇒ Unlabeled dataset

Interactive Language: Talking to Robots in Real Time

Lynch, Corey et al. (2022)

61 of 81

Dataset Gathering

Event-selectable Hindsight Relabeling

Select a few-second fragment from the replay

→ Better than using random windows

Add natural language annotation

→ Open vocabulary

Annotations ⇒ Labeled dataset

Interactive Language: Talking to Robots in Real Time

Lynch, Corey et al. (2022)

62 of 81

Imitation Learning

Always-optimal labeled dataset

Use simple objectives

Base imitation learning objective

Condition on language (see the loss sketch below)

Interactive Language: Talking to Robots in Real Time

Lynch, Corey et al. (2022)
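
A minimal sketch of a language-conditioned behavioral-cloning loss; the policy interface and names are assumptions, not the paper's code:

```python
def lcbc_loss(policy, batch):
    """Maximize log pi(a_t | s_t, language) over the relabeled play data."""
    states, actions, lang_emb = batch   # lang_emb: e.g. a CLIP text embedding
    dist = policy(states, lang_emb)     # assumed to return an action distribution
    return -dist.log_prob(actions).mean()   # negative log-likelihood
```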

63 of 81

Architecture: LAVA (Language Attends to Vision to Act)

Image encoding

Pre-trained ResNet + learned convolutions

Text encoding

Fine-tuned CLIP encoder

CLIP is adapted from (image, language) pairs to (video, language) pairs

64 of 81

Relating text with images (CLIP)

Learning Transferable Visual Models From Natural Language Supervision

Radford, Alec et al. (2021)

65 of 81

Relating text with images (CLIP)

Learning Transferable Visual Models From Natural Language Supervision

Radford, Alec et al. (2021)
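
CLIP's symmetric contrastive objective over a batch of matched (image, text) pairs, as a compact sketch:

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Matched pairs sit on the diagonal of the similarity matrix."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarities
    targets = torch.arange(logits.shape[0], device=logits.device)
    # Classify the right text for each image, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```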

66 of 81

Architecture: LAVA (Language Attends to Vision to Act)

Decoder-only Transformer

Cross-attention

Multi-Layer Perceptron

Output

Vision-language embedding for certain timestep

67 of 81

Architecture: LAVA (Language Attends to Vision to Act)

Encoder-only Transformer

Self-attention

Multi-Layer Perceptron

Input

Sequence of vision-language embeddings

68 of 81

Architecture: LAVA (Language Attends to Vision to Act)

Policy

MLP blocks with residual connections

Outputs action
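
Putting the three LAVA slides together: a rough, simplified sketch (module choices, pooling, and dimensions are our assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

class LavaSketch(nn.Module):
    def __init__(self, d=256, action_dim=8):
        super().__init__()
        # Per-timestep fusion: language attends to vision.
        self.cross = nn.MultiheadAttention(d, 8, batch_first=True)
        # Temporal encoder: self-attention over the fused sequence.
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, 8, batch_first=True), num_layers=2)
        # Policy head: MLP that outputs the action.
        self.policy = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                    nn.Linear(d, action_dim))

    def forward(self, vision_tokens, text_tokens):
        # vision_tokens: (B, T, P, d) patch tokens per timestep
        # text_tokens:   (B, L, d) language tokens
        B, T, P, d = vision_tokens.shape
        per_step = []
        for t in range(T):
            fused, _ = self.cross(text_tokens,
                                  vision_tokens[:, t], vision_tokens[:, t])
            per_step.append(fused.mean(dim=1))   # pool to one embedding per step
        seq = torch.stack(per_step, dim=1)       # (B, T, d)
        h = self.temporal(seq)
        return self.policy(h[:, -1])             # action for the current timestep
```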

69 of 81

Demonstrations

70 of 81

Summary

What do we learn from this?

Language embeddings are useful task descriptors

Progress in NLP and CV provides powerful architectures

Another use case for attention/transformers

71 of 81

PaLM-E

72 of 81

Goal

Visual-language embodied robot

Grounding: connect LLM representations to the real world

Interleave multi-modal inputs into same input embedding space

PaLM-E: An Embodied Multimodal Language Model

Driess, Danny et al. (2023)

73 of 81

Architecture

Input

Language, images, sensor readings, …

Encoders embed all inputs into same space

Output

Language

Can be used to give short-term instructions to language-conditioned policies
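
A minimal sketch of this interleaving (encoder choice and dimensions are illustrative):

```python
import torch
import torch.nn as nn

vocab_size, d = 32000, 512
word_emb = nn.Embedding(vocab_size, d)
image_encoder = nn.Linear(2048, d)   # stand-in for a ViT projecting patches to d

def build_input(prompt_ids, image_features, insert_at):
    # prompt_ids: (L,) token ids; image_features: (P, 2048) patch features
    text = word_emb(prompt_ids)           # (L, d) text-token embeddings
    img = image_encoder(image_features)   # (P, d) continuous "multimodal tokens"
    # Splice the image tokens into the text sequence at the <img> position,
    # so the LLM consumes one mixed sequence in its embedding space.
    return torch.cat([text[:insert_at], img, text[insert_at:]], dim=0)
```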

74 of 81

Example Tasks: Visual QA

Task: Object-table relation
Given: <img>. Q: Is the red object left, right, or center of the table?
Target: A: The red object is in the center of the table.

Task: Object-object relation
Given: <img>. Q: Is the yellow object below the blue object?
Target: A: No, the yellow object is not below the blue object.

Task: Plan feasibility
Given: <img>. Q: Is it possible to first grasp the blue object, then place it on the yellow object, and then grasp the yellow object?
Target: A: No, this is not possible.

75 of 81

Example Tasks: Planning

Task: Grasping
Given: <img>. Q: How to grasp the green object?
Target: A: First grasp the orange object and place it on the table, then grasp the green object.

Task: Stacking
Given: <img>. Q: How to stack the white object on top of the red object?
Target: A: First grasp the green object and place it on the table, then grasp the white object and place it on the red object.

76 of 81

Positive Transfer

Model benefits from multi-task learning

PaLM-E: An Embodied Multimodal Language Model

Driess, Danny et al. (2023)

77 of 81

Positive Transfer

Model benefits from multi-task learning

Training on language tasks helps control

Only a few robotics examples are needed

PaLM-E: An Embodied Multimodal Language Model

Driess, Danny et al. (2023)

78 of 81

Positive Transfer

Model benefits from multi-task learning

Training on language tasks helps control

Only a few robotics examples are needed

Large models preserve NLP capabilities

PaLM-E: An Embodied Multimodal Language Model

Driess, Danny et al. (2023)

79 of 81

Demos

80 of 81

Demos

81 of 81

Questions?