2 of 26

General Tips

Read the paper

You don’t have to remember much from the paper; just remember which sections discuss what

If you are running out of revision time, don’t skip the paper entirely: watch the video or read the website/blog associated with the paper (linked to this slide)

4 of 26

Diffusion Models for Imitation and Model-based RL

5 of 26

Understanding Diffusion Models: A Unified Perspective

Diffusion models start with an example piece of data.
Add noise to it.
Train a model that tries to get the original piece of data back.
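
The three steps above can be sketched in a few lines. This is a minimal sketch, assuming a DDPM-style setup in which the data is mixed with Gaussian noise using a weight `alpha_bar` and the model is trained to predict the noise that was added. The function names and the `predict_noise` callable are hypothetical, not taken from the paper.

```python
import math
import random

def add_noise(x0, alpha_bar, rng):
    """Step 2: mix the clean data x0 with Gaussian noise.

    alpha_bar in (0, 1] is the share of signal kept (assumed schedule value).
    Returns the noised sample and the noise that was added.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps

def denoising_loss(predict_noise, x0, alpha_bar, rng):
    """Step 3: mean squared error between the true and predicted noise."""
    xt, eps = add_noise(x0, alpha_bar, rng)
    eps_hat = predict_noise(xt, alpha_bar)
    return sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat)) / len(x0)

def recover_x0(xt, eps_hat, alpha_bar):
    """Invert the noising step given an estimate of the noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_hat)]
```

If the noise estimate is perfect, `recover_x0` returns the original piece of data exactly, which is why predicting the noise is enough to get the data back.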

6 of 26

Planning with Diffusion for Flexible Behavior Synthesis

In this case our example piece of data is a trajectory with a certain length, N. We noise this trajectory and then we train a model to denoise it (i.e. to generate valid trajectories).

7 of 26

Planning with Diffusion for Flexible Behavior Synthesis

Then we learn a model of the reward we expect to get from a trajectory and use it to guide sampling towards higher-reward trajectories.
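
One way to picture the guidance is as a small nudge along the reward gradient after each denoising step. This is a toy sketch under stated assumptions, not the paper's implementation: the trajectory is a list of 1-D states, and `toy_denoise`, `toy_reward`, `toy_reward_grad`, and `GOAL` are hypothetical stand-ins for the learned denoiser and the learned reward model.

```python
GOAL = 1.0  # hypothetical target state used by the toy reward

def guided_step(traj, denoise_step, reward_grad, guide_scale):
    """One sampling step: denoise, then nudge towards higher reward."""
    proposal = denoise_step(traj)
    grad = reward_grad(proposal)
    return [p + guide_scale * g for p, g in zip(proposal, grad)]

def toy_denoise(traj):
    # Stand-in for a learned denoiser: smooth the trajectory towards its mean.
    mean = sum(traj) / len(traj)
    return [0.5 * x + 0.5 * mean for x in traj]

def toy_reward(traj):
    # Stand-in for a learned reward model: higher when states are near GOAL.
    return -sum((x - GOAL) ** 2 for x in traj)

def toy_reward_grad(traj):
    # Gradient of toy_reward with respect to each state.
    return [-2.0 * (x - GOAL) for x in traj]

def sample(traj, steps, guide_scale):
    """Run several guided steps; guide_scale=0 disables the guidance."""
    for _ in range(steps):
        traj = guided_step(traj, toy_denoise, toy_reward_grad, guide_scale)
    return traj
```

With `guide_scale=0.0` the sampler only smooths the trajectory; with a positive scale the same sampler ends up at trajectories that the reward model scores higher.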

8 of 26

Imitating Human Behaviour with Diffusion Models

Before we get into this paper...

9 of 26

Imitation Learning: GAIL

How do we solve the multi-modality of state conditioned action distributions?

Use any of the generative models we defined before!

11 of 26

Imitating Human Behaviour with Diffusion Models

This paper addresses how we can sample from diffusion models to imitate distributions well.

12 of 26

Language and Robot Control

13 of 26

Learning Transferable Visual Models From Natural Language Supervision

Contrastive Language-Image Pre-training

Train a text encoder and an image encoder on text-image pairs. Make the two encoders map matching pairs to nearby points in a shared latent space.
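
The contrastive objective can be sketched as a symmetric cross-entropy over a similarity matrix, where row `r` of the text batch is paired with row `r` of the image batch. This is a minimal sketch on precomputed embeddings: the encoders themselves are omitted, and the function names and the `temperature` default are illustrative assumptions.

```python
import math

def normalise(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric loss: each text should pick its image, and vice versa."""
    t = [normalise(v) for v in text_emb]
    i = [normalise(v) for v in image_emb]
    n = len(t)
    logits = [[sum(a * b for a, b in zip(t[r], i[c])) / temperature
               for c in range(n)] for r in range(n)]
    loss_text = sum(cross_entropy(logits[r], r) for r in range(n)) / n
    loss_image = sum(cross_entropy([logits[r][c] for r in range(n)], c)
                     for c in range(n)) / n
    return 0.5 * (loss_text + loss_image)
```

The loss is small when matching pairs have the highest similarity in their row and column, and large when a text embedding is closer to some other image.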

14 of 26

CLIPort: What and Where Pathways for Robotic Manipulation

CLIP + Transporter Networks

CLIP is frozen and then we use it to train a transporter network that can follow natural language commands.

15 of 26

Language Models are Few-Shot Learners

Use pre-trained language models to answer reasoning questions.

Zero-shot means just ask it the question.

One-shot means also give it one example.

Few-shot means also give it multiple examples.
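
The only difference between the three settings is how many worked examples go into the prompt. A minimal sketch, assuming a simple `Q:`/`A:` prompt layout (the layout and the `build_prompt` helper are illustrative, not the paper's exact format):

```python
def build_prompt(question, examples=()):
    """Build a prompt from (question, answer) examples plus the new question.

    No examples gives zero-shot, one gives one-shot, several give few-shot.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

zero_shot = build_prompt("What is 3 + 4?")
one_shot = build_prompt("What is 3 + 4?", [("What is 1 + 1?", "2")])
few_shot = build_prompt("What is 3 + 4?",
                        [("What is 1 + 1?", "2"), ("What is 2 + 5?", "7")])
```

In all three cases the model's weights are unchanged; the examples only appear in the input text.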

16 of 26

Code as Policies: Language Model Programs for Embodied Control

Pass the user’s instruction to a large language model, together with a prompt that describes how to use a custom API to interact with objects.

Incredibly simple idea that’s very general.
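
The loop can be sketched as: describe the API in the prompt, let the model write code against it, then run that code. Everything here is a hypothetical stand-in: `get_pos` and `put` are an invented two-function API, and `fake_llm` returns canned code in place of a real language model call.

```python
API_DOC = """# Available functions (hypothetical robot API):
# get_pos(name) -> (x, y) position of the named object
# put(name, xy) -> move the named object to position xy
"""

def build_prompt(instruction):
    """Prepend the API description to the user's instruction."""
    return f"{API_DOC}\n# Instruction: {instruction}\n"

def fake_llm(prompt):
    # Stand-in for a real language model; always returns the same program.
    return "put('red block', get_pos('blue bowl'))"

def run_policy(instruction, llm, world):
    """Ask the model for code, then execute it against the toy world."""
    def get_pos(name):
        return world[name]

    def put(name, xy):
        world[name] = xy

    code = llm(build_prompt(instruction))
    # Running generated code is unsafe in general; here only the two API
    # functions are exposed and the builtins are removed.
    exec(code, {"__builtins__": {}}, {"get_pos": get_pos, "put": put})
    return world
```

Swapping the task only requires changing the API description and the functions exposed to the generated code, which is what makes the idea so general.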

17 of 26

Offline Reinforcement Learning

18 of 26

Off-Policy Deep Reinforcement Learning without Exploration

Asks the question “how off-policy can off-policy learning actually be?”

The answer is “not very”.

“[L]earning a value estimate with off-policy data can result in large amounts of extrapolation error if the policy selects actions which are not similar to the data found in the batch.”

19 of 26

Off-Policy Deep Reinforcement Learning without Exploration

They propose the BCQ algorithm, which basically restricts the policy from choosing actions that are very different from what was observed in the batch.
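
The restriction can be sketched as "only score actions that look like the data, then take the best of those". This is a simplified illustration rather than the full algorithm (the learned generative model and perturbation network are replaced by a hand-written sampler), and `sample_behaviour_action` and `q_value` are hypothetical stand-ins.

```python
import random

def bcq_style_action(state, sample_behaviour_action, q_value, n_candidates, rng):
    """Pick the highest-value action among candidates similar to the batch."""
    candidates = [sample_behaviour_action(state, rng)
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda a: q_value(state, a))

def sample_behaviour_action(state, rng):
    # Stand-in for a model of the batch: the data only has actions near 0.
    return rng.uniform(-0.2, 0.2)

def q_value(state, action):
    # Stand-in value estimate that extrapolates badly: it keeps growing
    # for large actions that never appear in the batch.
    return action

rng = random.Random(0)
constrained = bcq_style_action(0.0, sample_behaviour_action, q_value, 10, rng)

# An unconstrained argmax over the full action range trusts the
# extrapolated values and picks the largest action available.
unconstrained = max([i / 10.0 for i in range(-10, 11)],
                    key=lambda a: q_value(0.0, a))
```

The constrained choice stays inside the range seen in the data, while the unconstrained one jumps to an action whose value estimate is pure extrapolation.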

20 of 26

IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

21 of 26

Visual Imitation Learning

22 of 26

SFV: Reinforcement Learning of Physical Skills from Videos

23 of 26

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

24 of 26

Example Paper (will not be the paper on the exam)

https://arxiv.org/abs/2111.00210

What problem does this aim to fix?
How does it fix this problem?
What problems does this fix introduce?
How did they run MCTS faster?
Why is there only one learner?
What is a value prefix?

25 of 26

How to Approach the Paper

How to read a paper:

Do it in multiple passes
First read the title, think about what you think it should be doing, then repeat with the abstract and conclusion

When what they say disagrees with, or doesn’t make sense given, what you think the paper should be doing, update your mental model of the paper, or take note of what they are saying if you can’t work out why they do it that way

Read their methods section (or whatever comes after the Background section)
Read the Background if you need context for something they are doing

26 of 26

How to Approach the Paper

How to read a paper:

What are they trying to do

Title
Abstract
Background
Conclusion

Why are they doing this

Title
Abstract
Background

How do they do it

Methods (or whatever comes after background)
Appendix