Inspired by CS332: Advanced Survey of RL, Key Papers from OpenAI, the Reinforcement Learning Summer School, STA 4273: Minimizing Expectations, CS 6789: Foundations of Reinforcement Learning, COMS E6998: Bandits and RL, CS 542: Statistical Reinforcement Learning, and my own selections.
Updated: May 13, 2024
Please don't hesitate to make recommendations!
Category | Paper | Year | Note | Algorithm Name | Must-read (*)
Book | Reinforcement Learning: Theory and Algorithms
| Dynamic Programming and Optimal Control
| Bayesian RL: A Survey
| A Tutorial on Thompson Sampling
| Algorithms for Reinforcement Learning
| Adaptive Algorithms and Stochastic Approximations
| From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning
Bandit | Stochastic Linear Optimization under Bandit Feedback
Exploration | Provably Efficient Reinforcement Learning with Linear Function Approximation | 2019
| Contextual Decision Processes with Low Bellman Rank are PAC-Learnable | 2016 | Bellman rank
| Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches | 2019 | witness rank
| Provably Efficient Exploration in Policy Optimization | 2024 | Optimistic PPO
| Provably Efficient Maximum Entropy Exploration | 2018 | Maximum-entropy policy computation
| Learning Montezuma's Revenge from a Single Demonstration | 2018 | Demonstration-Initialized Rollout Worker
| Go-Explore: a New Approach for Hard-Exploration Problems | 2019 | Go-Explore
| Episodic Curiosity through Reachability | 2019 | Bonus Computation
| Curiosity-driven Exploration by Self-supervised Prediction | 2017 | ICM
| Large-Scale Study of Curiosity-Driven Learning | 2019
| Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward | 2023
| Model-based Reinforcement Learning and the Eluder Dimension | 2014 | PSRL
| Near-Optimal Reinforcement Learning in Polynomial Time | 1998 | E3
| R-max – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning | 2002 | R-max
| Near-optimal Regret Bounds for Reinforcement Learning | 2010 | UCRL2
| PAC Model-Free Reinforcement Learning | 2006 | Delayed Q-Learning
Introduction and Evaluating RL Progress | Deep Reinforcement Learning at the Edge of the Statistical Precipice
Models and Representation Learning | Decoupling Representation Learning from Reinforcement Learning
Model-Free | Playing Atari with Deep Reinforcement Learning | 2013 | Deep Q-Learning with Experience Replay
| Deep Recurrent Q-Learning for Partially Observable MDPs | 2015 | Deep Recurrent Q-Network
| Dueling Network Architectures for Deep Reinforcement Learning | 2015 | Double DQN
| Deep Reinforcement Learning with Double Q-learning | 2015 | Double DQN
| Prioritized Experience Replay | 2015 | Double DQN with proportional prioritization
| Rainbow: Combining Improvements in Deep Reinforcement Learning | 2017 | Double DQN + prioritized replay + multi-step learning + distributional RL + noisy nets
| Asynchronous Methods for Deep Reinforcement Learning | 2016 | A3C
| Trust Region Policy Optimization | 2015 | TRPO
| High-Dimensional Continuous Control Using Generalized Advantage Estimation | 2016 | GAE
| Proximal Policy Optimization Algorithms | 2017 | PPO
| Emergence of Locomotion Behaviours in Rich Environments | 2017 | Distributed PPO
| Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation | 2017 | AC with KTR
| Sample efficient actor-critic with experience replay | 2017 | AC with ER
| Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | 2018 | SAC
| Deterministic Policy Gradient Algorithms | 2014 | DPG
| Continuous control with deep reinforcement learning | 2016 | DDPG
| Addressing Function Approximation Error in Actor-Critic Methods | 2018
| Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | 2017 | Adaptive Q-Prop
| Action-dependent Control Variates for Policy Optimization via Stein's Identity | 2018 | PPO with Control Variate through Stein’s Identity
| The Mirage of Action-Dependent Baselines in Reinforcement Learning | 2018
| Bridging the Gap Between Value and Policy Based Reinforcement Learning | 2017 | Unified PCL
| Trust-PCL: An Off-Policy Trust Region Method for Continuous Control | 2018 | Trust PCL
| A Natural Policy Gradient | 2001 | NPG
| Eligibility Traces for Off-Policy Policy Evaluation | 2000 | Eligibility(Lambda)
| Maximum a Posteriori Policy Optimisation | 2018 | MPO
| V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control | 2019 | V-MPO
| Reinforcement Learning with Deep Energy-Based Policies | 2017 | soft Q-learning
| Diversity is All You Need: Learning Skills without a Reward Function | 2018 | DIAYN
| The Value Function Polytope in Reinforcement Learning | 2019
| An operator view of policy gradient methods | 2020
| Mirror Descent Policy Optimization | 2021 | MDPO
| Combining policy gradient and Q-learning | 2017 | PGQL
| The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning | 2018 | Reactor
| Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning | 2017 | Interpolated PG
| Equivalence Between Policy Gradients and Soft Q-Learning | 2017
| Evolution Strategies as a Scalable Alternative to Reinforcement Learning | 2017 | Evolution Strategy
| Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes | 2019
| Model-Free Linear Quadratic Control via Reduction to Expert Prediction | 2019
Model-based | Imagination-Augmented Agents for Deep Reinforcement Learning | 2017 | I2A
| Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning | 2017
| Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning | 2018 | MVE-AC
| Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion | 2018 | STEVE
| Model-Ensemble Trust-Region Policy Optimization | 2018 | ME-TRPO
| Model-Based Reinforcement Learning via Meta-Policy Optimization | 2018 | MB-MPO
| Recurrent World Models Facilitate Policy Evolution | 2018 | MDN-RNN
| Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm | 2017
| Thinking Fast and Slow with Deep Learning and Tree Search | 2017 | Expert Iteration
| Model-based Reinforcement Learning for Atari | 2020 | SimPLe
| Dual Representations for Dynamic Programming | 2008
| Learning to Simulate Complex Physics with Graph Networks | 2020 | GNS
| Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models | 2018 | PETS
| Planning with Diffusion for Flexible Behavior Synthesis | 2022 | Guided diffusion planning
| Action-Conditional Video Prediction using Deep Networks in Atari Games | 2015 | Encoding-Transformation-Decoding
| Temporal Difference Learning for Model Predictive Control | 2022 | MPC
| Mastering Atari, Go, chess and shogi by planning with a learned model | 2020 | MuZero
| Dream to Control: Learning Behaviors by Latent Imagination | 2019 | Dreamer
| Adaptive Discretization for Model-Based Reinforcement Learning | 2020
| Model-based Reinforcement Learning and the Eluder Dimension | 2014
Linear MDP | Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound | 2019 | MatrixRL