Quiz 3 Review 2
General Tips
Diffusion Models for Imitation and Model-based RL
Understanding Diffusion Models: A Unified Perspective
Planning with Diffusion for Flexible Behavior Synthesis
Here each training example is a trajectory of fixed length N. We add noise to the trajectory and train a model to denoise it, i.e. to generate valid trajectories.
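The noising step can be sketched as follows. This is a minimal sketch assuming a standard DDPM-style forward process with a linear beta schedule; the shapes, schedule values, and the name `noise_trajectory` are illustrative assumptions, not details from the paper.

```python
import numpy as np

def noise_trajectory(traj, t, alpha_bar, rng):
    """Sample a noised trajectory x_t from a clean trajectory x_0.

    Uses the closed-form forward process
    x_t = sqrt(alpha_bar[t]) * x_0 + sqrt(1 - alpha_bar[t]) * eps.
    """
    eps = rng.standard_normal(traj.shape)
    noisy = np.sqrt(alpha_bar[t]) * traj + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps

# Hypothetical setup: N = 8 time steps, each a 3-dim (state, action) vector.
rng = np.random.default_rng(0)
N, dim, T = 8, 3, 100
trajectory = rng.standard_normal((N, dim))

betas = np.linspace(1e-4, 0.02, T)    # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)   # fraction of signal kept after t steps

noisy, eps = noise_trajectory(trajectory, t=50, alpha_bar=alpha_bar, rng=rng)
# A denoiser would then be trained to predict `eps` from (noisy, t).
```

Note that the whole trajectory is noised at once, so the model denoises all N time steps jointly rather than one step at a time.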
We then learn a model of the reward a trajectory is expected to earn and use it to guide sampling toward higher-reward trajectories.
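One way to picture the guidance is a classifier-guidance-style update, where the mean proposed by a denoising step is nudged along the gradient of the reward. The sketch below assumes that form. The quadratic `toy_reward` stands in for a learned reward model, and `guide_scale` and all function names are hypothetical.

```python
import numpy as np

def toy_reward(traj, goal):
    """Stand-in for a learned reward model: higher when states are near `goal`."""
    return -np.sum((traj - goal) ** 2)

def toy_reward_grad(traj, goal):
    """Analytic gradient of `toy_reward` with respect to the trajectory."""
    return -2.0 * (traj - goal)

def guided_mean(denoised_mean, goal, guide_scale):
    """Shift the denoiser's proposed mean in the direction of higher reward."""
    return denoised_mean + guide_scale * toy_reward_grad(denoised_mean, goal)

rng = np.random.default_rng(0)
mean = rng.standard_normal((8, 2))   # pretend output of one denoising step
goal = np.ones(2)

shifted = guided_mean(mean, goal, guide_scale=0.1)
# `shifted` scores higher under `toy_reward` than `mean` does.
```

With a learned reward model the gradient would come from automatic differentiation instead of a closed form.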
Imitating Human Behaviour with Diffusion Models
Before we get into this paper...
Imitation Learning: GAIL
Imitating Human Behaviour with Diffusion Models
This paper addresses how we can sample from diffusion models to imitate distributions well.
Language and Robot Control
Learning Transferable Visual Models From Natural Language Supervision
Contrastive Language-Image Pre-training (CLIP)
Train a text encoder and an image encoder on text-image pairs so that both map into a shared latent space, with matching pairs embedded close together.
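The training objective can be sketched as a symmetric contrastive loss over a batch of paired embeddings. This is a minimal NumPy sketch, not the paper's implementation: the encoders are omitted, the embeddings are random stand-ins, and the temperature is fixed here for simplicity (in CLIP it is a learned parameter).

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities of a paired batch."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature   # (batch, batch)
    idx = np.arange(len(logits))                    # matching pairs lie on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 16))

aligned = clip_style_loss(emb, emb)         # every pair matches: low loss
shuffled = clip_style_loss(emb, emb[::-1])  # pairs mismatched: high loss
```

The loss is low only when each image is most similar to its own caption and vice versa, which is what pushes the two encoders toward one shared representation.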
CLIPort: What and Where Pathways for Robotic Manipulation
CLIP + Transporter Networks
CLIP is kept frozen, and its features are used to train a Transporter Network that can follow natural language commands.
Language Models are Few-Shot Learners
Use pre-trained language models to answer reasoning questions.
Zero-shot means just ask it the question.
One-shot means also give it one example.
Few-shot means also give it multiple examples.
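The three settings differ only in how many solved examples are placed in the prompt. A minimal sketch, assuming a simple "Q:/A:" prompt format (the format and the name `build_prompt` are illustrative, not from the paper):

```python
def build_prompt(question, examples=()):
    """Format a prompt with zero, one, or several solved examples."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")   # the model completes this answer
    return "\n\n".join(blocks)

demos = [("2 + 2", "4"), ("3 + 5", "8"), ("7 + 1", "8")]

zero_shot = build_prompt("6 + 3")             # just the question
one_shot = build_prompt("6 + 3", demos[:1])   # one example first
few_shot = build_prompt("6 + 3", demos)       # several examples first
```

In all three cases the model's weights are unchanged; the examples only appear as context.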
Code as Policies: Language Model Programs for Embodied Control
Pass the user's command to a large language model along with a prompt that describes how to use a custom API for interacting with objects.
Incredibly simple idea that’s very general.
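The loop can be sketched in a few lines. Everything here is hypothetical: the API functions `get_position` and `move_to`, the prompt layout, and `fake_llm`, which returns canned code in place of a real language model call.

```python
API_DOC = """# Hypothetical robot API available to generated code:
# get_position(name) -> (x, y)
# move_to(name, xy)  -> None
"""

def build_policy_prompt(command):
    """Combine the API description with the user's instruction."""
    return f"{API_DOC}\n# Instruction: {command}\n"

def fake_llm(prompt):
    """Stand-in for a real language model; returns canned policy code."""
    return "move_to('red block', get_position('blue bowl'))"

# Toy world state that the hypothetical API reads and writes.
positions = {"red block": (0.0, 0.0), "blue bowl": (1.0, 2.0)}

def get_position(name):
    return positions[name]

def move_to(name, xy):
    positions[name] = xy

prompt = build_policy_prompt("put the red block in the blue bowl")
generated = fake_llm(prompt)

# Run the generated code with access to the API functions only.
exec(generated, {"get_position": get_position, "move_to": move_to})
```

Executing model-written code is risky outside a toy setting, so a real system would need sandboxing or validation before running it.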
Offline Reinforcement Learning
Off-Policy Deep Reinforcement Learning without Exploration
Asks the question “how off policy can off policy actually be?”
The answer is “not very”.
“[L]earning a value estimate with off-policy data can result in large amounts of extrapolation error if the policy selects actions which are not similar to the data found in the batch.”
They propose the BCQ (Batch-Constrained deep Q-learning) algorithm, which restricts the policy from choosing actions that are very different from those observed in the batch.
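The restriction can be illustrated with a toy discrete-action filter. Note this is a simplification: the original BCQ handles continuous actions with a generative model of the batch, whereas the sketch below just masks out actions the data rarely contains. The threshold value and the name `constrained_greedy_action` are assumptions.

```python
import numpy as np

def constrained_greedy_action(q_values, behavior_probs, threshold=0.3):
    """Pick the highest-Q action among those the batch plausibly contains.

    An action is allowed when its estimated probability under the data is
    at least `threshold` times that of the most likely action.
    """
    allowed = behavior_probs / behavior_probs.max() >= threshold
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))

q_values = np.array([1.0, 5.0, 2.0])          # action 1 looks best...
behavior_probs = np.array([0.6, 0.01, 0.39])  # ...but is almost never in the data

unconstrained = int(np.argmax(q_values))                            # 1
constrained = constrained_greedy_action(q_values, behavior_probs)   # 2
```

The high Q-value for action 1 is exactly the kind of estimate that is likely to be extrapolation error, since the batch gives almost no evidence about it.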
IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data
Visual Imitation Learning
SFV: Reinforcement Learning of Physical Skills from Videos
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Example Paper (will not be the paper on the exam)
https://arxiv.org/abs/2111.00210
How to Approach the Paper
How to read a paper: