Quiz 3 Review
Part 1 of Quiz 3 Review
Lectures #18-22
⇒ Part 2 will cover the rest
Some more MBRL
Variational Autoencoders (VAE)
Can also condition the decoder on other variables (conditional VAE)
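A minimal PyTorch sketch of a conditional VAE (all layer sizes and names are illustrative, not from the lecture). This variant conditions both the encoder and the decoder on c; conditioning the decoder is the essential part:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Conditional VAE: encoder and decoder both see the condition c."""
    def __init__(self, x_dim=784, c_dim=10, z_dim=20, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization trick
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar  # decoder conditioned on c

def elbo_loss(x_logits, x, mu, logvar):
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x,c) || N(0,I))
    return recon + kl

model = CVAE()
x = torch.rand(8, 784)                                  # batch of flattened images
c = F.one_hot(torch.randint(0, 10, (8,)), 10).float()  # condition, e.g. class label
x_logits, mu, logvar = model(x, c)
loss = elbo_loss(x_logits, x, mu, logvar)
```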
Dreamer
1. Learn a model of the environment (predict next state)
2. Train the policy on imagined trajectories!
3. Act in the environment to get more observations for step 1 (see the loop sketch below)
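A skeleton of that three-step loop. Everything below is a hypothetical interface sketched for illustration, not Dreamer's actual API:

```python
# Skeleton of the Dreamer-style loop. `world_model`, `policy`, `replay_buffer`
# and every method called on them are hypothetical stand-ins for illustration.
def dreamer_loop(env, world_model, policy, replay_buffer, iters=1000):
    obs = env.reset()
    for _ in range(iters):
        # 1. Learn a model of the environment from replayed experience
        batch = replay_buffer.sample()
        world_model.train_step(batch)        # e.g. predict next latent state and reward

        # 2. Train the policy purely on imagined (latent-space) trajectories
        start = world_model.encode(batch.obs)
        imagined = world_model.imagine(start, policy, horizon=15)
        policy.train_step(imagined)          # actor-critic on imagined returns

        # 3. Act in the real environment to collect data for step 1
        action = policy.act(world_model.encode(obs))
        next_obs, reward, done, info = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
```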
Discrete latent variables better capture multi-modal distributions
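One standard way to keep discrete latents trainable, used by DreamerV2, is a straight-through gradient through a one-hot sample. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def sample_categorical_st(logits):
    """Sample a one-hot categorical latent with straight-through gradients:
    the forward pass uses the discrete sample, the backward pass uses the
    softmax probabilities."""
    probs = F.softmax(logits, dim=-1)
    idx = torch.multinomial(probs, num_samples=1)
    one_hot = F.one_hot(idx.squeeze(-1), num_classes=logits.shape[-1]).float()
    # Numerically equal to one_hot; gradient flows through `probs` only.
    return one_hot + probs - probs.detach()

logits = torch.randn(4, 32, requires_grad=True)  # batch of 4, 32 classes
z = sample_categorical_st(logits)
```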
Intelligent Exploration
Exploration via modeling uncertainty of the Q-function
Model the distribution over Q-values itself (difficult); in practice, ensembles are a common approximation (sketch below)
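A minimal sketch of the ensemble approximation, in the spirit of Bootstrapped DQN (sizes are illustrative): sample one Q-head per episode and act greedily with it, so disagreement between heads drives exploration:

```python
import random
import torch
import torch.nn as nn

class BootstrappedQ(nn.Module):
    """Ensemble of K Q-heads; disagreement across heads approximates
    uncertainty in the Q-function."""
    def __init__(self, obs_dim, n_actions, k=10, h=128):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, h), nn.ReLU(), nn.Linear(h, n_actions))
            for _ in range(k))

    def forward(self, obs, head):
        return self.heads[head](obs)

q = BootstrappedQ(obs_dim=8, n_actions=4)
head = random.randrange(len(q.heads))  # sample one head per episode
obs = torch.randn(1, 8)
action = q(obs, head).argmax(dim=-1)   # act greedily w.r.t. the sampled head
```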
State counting
Map each state to a hash code and count visits per code; encourage visiting states whose hash codes have low counts (see the sketch below)
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, Tang et al.
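A minimal sketch of a SimHash-style count bonus in the spirit of Tang et al. (constants and names are illustrative): project the state with a fixed random matrix, keep the sign pattern as the hash code, and reward rarely seen codes:

```python
import numpy as np

class HashCounter:
    """Count-based exploration bonus over hashed states: states that map
    to the same k-bit code share a visit count."""
    def __init__(self, obs_dim, k=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(k, obs_dim))  # fixed random projection
        self.counts = {}
        self.beta = beta

    def bonus(self, obs):
        code = tuple((self.A @ obs > 0).astype(np.int8))  # k-bit hash code
        n = self.counts.get(code, 0) + 1
        self.counts[code] = n
        return self.beta / np.sqrt(n)  # higher bonus for low-count codes

counter = HashCounter(obs_dim=4)
r_intrinsic = counter.bonus(np.array([0.1, -0.3, 0.7, 0.0]))
```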
Prediction error
Curiosity-driven Exploration by Self-supervised Prediction, Pathak et al.
Large-Scale Study of Curiosity-Driven Learning, Burda et al.
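A minimal sketch of prediction-error curiosity in the spirit of Pathak et al. (the real ICM also trains the feature encoder with an inverse-dynamics loss; here the features are assumed given, and all sizes are illustrative):

```python
import torch
import torch.nn as nn

class PredictionErrorCuriosity(nn.Module):
    """Intrinsic reward = error of a learned forward model that predicts the
    next state's features from the current features and action. States the
    model predicts poorly (novel/unexplored) yield high reward."""
    def __init__(self, feat_dim=32, act_dim=4, h=128):
        super().__init__()
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, h), nn.ReLU(), nn.Linear(h, feat_dim))

    def intrinsic_reward(self, feat, action, next_feat):
        pred = self.forward_model(torch.cat([feat, action], dim=-1))
        return 0.5 * (pred - next_feat).pow(2).sum(dim=-1)  # also the model's training loss

cur = PredictionErrorCuriosity()
r_int = cur.intrinsic_reward(torch.randn(1, 32), torch.randn(1, 4), torch.randn(1, 32))
```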
Reachability: Episodic Curiosity through Reachability, Savinov et al.
Go-Explore: a New Approach for Hard-Exploration Problems
Failures of intrinsic motivation stem from two issues:
Detachment: an agent driven by intrinsic motivation can become detached from the frontiers of high intrinsic reward (IR).
Derailment: once an agent has discovered a promising state that would be worth returning to and exploring from, its own exploratory mechanisms can prevent it from reliably getting back there.
Go-Explore
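A skeleton of Go-Explore's return-then-explore loop. The `env` methods `get_state`/`set_state` and the `cell_fn` argument are assumptions for illustration; they correspond to the paper's "restore simulator state" variant and its downsampled-state cell representation:

```python
import random

def go_explore(env, cell_fn, iterations=1000):
    """Archive interesting cells, return to one, then explore from it."""
    obs = env.reset()
    archive = {cell_fn(obs): env.get_state()}  # cell -> saved simulator state
    for _ in range(iterations):
        cell = random.choice(list(archive))    # select a cell (the paper weights by novelty)
        env.set_state(archive[cell])           # return deterministically: avoids derailment
        for _ in range(50):                    # explore from it, e.g. with random actions
            obs, reward, done, info = env.step(env.action_space.sample())
            c = cell_fn(obs)
            if c not in archive:               # remember new frontiers: avoids detachment
                archive[c] = env.get_state()
            if done:
                break
    return archive
```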
Learning Montezuma’s Revenge from a Single Demonstration
Sim2Real Transfer
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World, Tobin et al.
Solving Rubik’s Cube with a Robot Hand
ADR (Automatic Domain Randomization): 1. gradually expands the range of training environments (a curriculum); 2. removes the need for manual domain randomization, since expansion is driven by the policy's performance
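A minimal sketch of the expand-on-success idea behind ADR (thresholds, step sizes, and names are illustrative): each randomized simulator parameter has a range that widens when the policy performs well at the boundary and shrinks when it fails:

```python
import random

class ADRParam:
    """One randomized simulator parameter under ADR-style control."""
    def __init__(self, lo, hi, step=0.05):
        self.lo, self.hi, self.step = lo, hi, step

    def sample(self):
        return random.uniform(self.lo, self.hi)  # randomize each episode

    def update(self, boundary_perf, hi_thresh=0.8, lo_thresh=0.4):
        if boundary_perf > hi_thresh:    # succeeding at the boundary -> harder envs
            self.lo -= self.step
            self.hi += self.step
        elif boundary_perf < lo_thresh:  # failing at the boundary -> back off
            self.lo += self.step
            self.hi -= self.step

friction = ADRParam(0.9, 1.1)
sim_friction = friction.sample()
friction.update(boundary_perf=0.85)  # range expansion based on measured performance
```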
Driving Policy Transfer via Modularity and Abstraction
RMA: Rapid Motor Adaptation for Legged Robots