Inspired by the course CS332: Advanced Survey of RL, Key papers from OpenAI, Reinforcement Learning Summer School, my personal intake, STA 4273: Minimizing Expectations, CS 6789: Foundations of Reinforcement Learning, COMS E6998: Bandits and RL, and CS 542: Statistical Reinforcement Learning.

Update: May 13, 2024

Please don't hesitate to make recommendations!!

| Category | Paper | Year | Note | Algorithm Name | Must-read (*) |
|---|---|---|---|---|---|
| Book | Reinforcement Learning: Theory and Algorithms | | | | |
| | Dynamic Programming and Optimal Control | | | | |
| | Bayesian RL: A Survey | | | | |
| | A Tutorial on Thompson Sampling | | | | |
| | Algorithms for Reinforcement Learning | | | | |
| | Adaptive Algorithms and Stochastic Approximations | | | | |
| | From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning | | | | |
| | | | | | |
| Bandit | Stochastic Linear Optimization under Bandit Feedback | | | | |
| | | | | | |
| | | | | | |
| Exploration | Provably Efficient Reinforcement Learning with Linear Function Approximation | 2019 | | | |
| | Contextual Decision Processes with Low Bellman Rank are PAC-Learnable | 2016 | Bellman rank | | |
| | Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches | 2019 | witness rank | | |
| | Provably Efficient Exploration in Policy Optimization | 2024 | | Optimistic PPO | |
| | Provably Efficient Maximum Entropy Exploration | 2018 | Maximum-entropy policy computation | | |
| | Learning Montezuma's Revenge from a Single Demonstration | 2018 | Demonstration-Initialized Rollout Worker | | |
| | Go-Explore: a New Approach for Hard-Exploration Problems | 2019 | | Go-Explore | |
| | Episodic Curiosity through Reachability | 2019 | Bonus computation | | |
| | Curiosity-driven Exploration by Self-supervised Prediction | 2017 | | ICM | |
| | Large-Scale Study of Curiosity-Driven Learning | 2019 | | | |
| | Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward | 2023 | | | |
| | Model-based Reinforcement Learning and the Eluder Dimension | 2014 | | PSRL | |
| | Near-Optimal Reinforcement Learning in Polynomial Time | 1998 | | E3 | |
| | R-max – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning | 2002 | | R-max | |
| | Near-optimal Regret Bounds for Reinforcement Learning | 2010 | | UCRL2 | |
| | PAC Model-Free Reinforcement Learning | 2006 | | Delayed Q-Learning | |
| | | | | | |
| | | | | | |
| Introduction and Evaluating RL Progress | Deep Reinforcement Learning at the Edge of the Statistical Precipice | | | | |
| | | | | | |
| Models and Representation Learning | Decoupling Representation Learning from Reinforcement Learning | | | | |
| | | | | | |
| | | | | | |
| Model-Free | Playing Atari with Deep Reinforcement Learning | 2013 | Deep Q-Learning with Experience Replay | | |
| | Deep Recurrent Q-Learning for Partially Observable MDPs | 2015 | | Deep Recurrent Q-Network | |
| | Dueling Network Architectures for Deep Reinforcement Learning | 2015 | | Dueling DQN | |
| | Deep Reinforcement Learning with Double Q-learning | 2015 | | Double DQN | |
| | Prioritized Experience Replay | 2015 | Double DQN with proportional prioritization | | |
| | Rainbow: Combining Improvements in Deep Reinforcement Learning | 2017 | Double DQN + prioritized replay + multi-step learning + distributional RL + noisy nets | | |
| | Asynchronous Methods for Deep Reinforcement Learning | 2016 | | A3C | |
| | Trust Region Policy Optimization | 2015 | | TRPO | |
| | High-Dimensional Continuous Control Using Generalized Advantage Estimation | 2016 | | GAE | |
| | Proximal Policy Optimization Algorithms | 2017 | | PPO | |
| | Emergence of Locomotion Behaviours in Rich Environments | 2017 | | Distributed PPO | |
| | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation | 2017 | | ACKTR | |
| | Sample efficient actor-critic with experience replay | 2017 | | ACER | |
| | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | 2018 | | SAC | |
| | Deterministic Policy Gradient Algorithms | 2014 | | DPG | |
| | Continuous control with deep reinforcement learning | 2016 | | DDPG | |
| | Addressing Function Approximation Error in Actor-Critic Methods | 2018 | | TD3 | |
| | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | 2017 | | Adaptive Q-Prop | |
| | Action-dependent Control Variates for Policy Optimization via Stein's Identity | 2018 | PPO with control variate through Stein's identity | | |
| | The Mirage of Action-Dependent Baselines in Reinforcement Learning | 2018 | | | |
| | Bridging the Gap Between Value and Policy Based Reinforcement Learning | 2017 | | Unified PCL | |
| | Trust-PCL: An Off-Policy Trust Region Method for Continuous Control | 2018 | | Trust-PCL | |
| | A Natural Policy Gradient | 2001 | | NPG | |
| | Eligibility Traces for Off-Policy Policy Evaluation | 2000 | | Eligibility(Lambda) | |
| | Maximum a Posteriori Policy Optimisation | 2018 | | MPO | |
| | V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control | 2019 | | V-MPO | |
| | Reinforcement Learning with Deep Energy-Based Policies | 2017 | | Soft Q-learning | |
| | Diversity is All You Need: Learning Skills without a Reward Function | 2018 | | DIAYN | |
| | The Value Function Polytope in Reinforcement Learning | 2019 | | | |
| | An operator view of policy gradient methods | 2020 | | | |
| | Mirror Descent Policy Optimization | 2021 | | MDPO | |
| | Combining policy gradient and Q-learning | 2017 | | PGQL | |
| | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning | 2018 | | Reactor | |
| | Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning | 2017 | | Interpolated PG | |
| | Equivalence Between Policy Gradients and Soft Q-Learning | 2017 | | | |
| | Evolution Strategies as a Scalable Alternative to Reinforcement Learning | 2017 | | Evolution Strategies | |
| | Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes | 2019 | | | |
| | Model-Free Linear Quadratic Control via Reduction to Expert Prediction | 2019 | | | |
| | | | | | |
| | | | | | |
| Model-based | Imagination-Augmented Agents for Deep Reinforcement Learning | 2017 | | I2A | |
| | Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning | 2017 | | | |
| | Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning | 2018 | | MVE-AC | |
| | Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion | 2018 | | STEVE | |
| | Model-Ensemble Trust-Region Policy Optimization | 2018 | | ME-TRPO | |
| | Model-Based Reinforcement Learning via Meta-Policy Optimization | 2018 | | MB-MPO | |
| | Recurrent World Models Facilitate Policy Evolution | 2018 | | MDN-RNN | |
| | Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm | 2017 | | AlphaZero | |
| | Thinking Fast and Slow with Deep Learning and Tree Search | 2017 | | Expert Iteration | |
| | Model-based Reinforcement Learning for Atari | 2020 | | SimPLe | |
| | Dual Representations for Dynamic Programming | 2008 | | | |
| | Learning to Simulate Complex Physics with Graph Networks | 2020 | | GNS | |
| | Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models | 2018 | | PETS | |
| | Planning with Diffusion for Flexible Behavior Synthesis | 2022 | Guided diffusion planning | | |
| | Action-Conditional Video Prediction using Deep Networks in Atari Games | 2015 | Encoding-Transformation-Decoding | | |
| | Temporal Difference Learning for Model Predictive Control | 2022 | | TD-MPC | |
| | Mastering Atari, Go, chess and shogi by planning with a learned model | 2020 | | MuZero | |
| | Dream to Control: Learning Behaviors by Latent Imagination | 2019 | | Dreamer | |
| | Adaptive Discretization for Model-Based Reinforcement Learning | 2020 | | | |
| | Model-based Reinforcement Learning and the Eluder Dimension | 2014 | | | |
| | | | | | |
| | | | | | |
| Linear MDP | Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound | 2019 | | MatrixRL | |