MLC RL Reading Group
Proximal Policy Optimization Algorithms
Reviewer: Perusha
Summary - Start with Policy Gradient (PG) methods
Most common gradient estimator used in PG methods (shown below)
Corresponding objective function for optimisation (also below)
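The two expressions referred to above, as written in the PPO paper (equations 1 and 2), are:

\hat{g} = \hat{\mathbb{E}}_t \left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, \hat{A}_t \right]

L^{PG}(\theta) = \hat{\mathbb{E}}_t \left[ \log \pi_\theta(a_t \mid s_t) \, \hat{A}_t \right]

where \pi_\theta is the stochastic policy and \hat{A}_t is an estimate of the advantage function at timestep t.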
Summary - Motivation for PPO
Summary - TRPO
http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_13_advanced_pg.pdf
Quick explanation of TRPO also here: https://jonathan-hui.medium.com/rl-proximal-policy-optimization-ppo-explained-77f014ec3f12
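For reference, TRPO maximises a surrogate objective under a KL-divergence trust-region constraint (equations 3-4 in the PPO paper):

\max_\theta \; \hat{\mathbb{E}}_t \left[ \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \, \hat{A}_t \right]
\quad \text{subject to} \quad \hat{\mathbb{E}}_t \left[ \mathrm{KL}\!\left[ \pi_{\theta_{\text{old}}}(\cdot \mid s_t), \; \pi_\theta(\cdot \mid s_t) \right] \right] \le \delta

PPO keeps the same probability-ratio surrogate but replaces the hard constraint with either an adaptive KL penalty or clipping, described next.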
Summary - PPO Adaptive KL Penalty Coefficient
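In this variant the objective is L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta) \hat{A}_t - \beta \, \mathrm{KL}[\pi_{\theta_{\text{old}}}(\cdot \mid s_t), \pi_\theta(\cdot \mid s_t)] \right], and after each policy update the coefficient \beta is adapted so the measured KL tracks a target d_targ. A minimal sketch of that update rule (the function name and arguments below are illustrative, not from any library):

```python
def update_kl_penalty(beta, measured_kl, kl_target):
    """Adaptive KL penalty coefficient rule from the PPO paper (Section 4):
    halve beta if the measured KL undershot the target by a factor of 1.5,
    double it if the KL overshot the target by a factor of 1.5."""
    if measured_kl < kl_target / 1.5:
        beta /= 2.0   # policy barely moved -> relax the penalty
    elif measured_kl > kl_target * 1.5:
        beta *= 2.0   # policy moved too far -> tighten the penalty
    return beta
```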
Summary - PPO Clipped
Summary - PPO Clipped (continued)
http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_13_advanced_pg.pdf
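A minimal PyTorch-style sketch of the clipped surrogate objective L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\hat{A}_t, \; \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t \right) \right]. The default \epsilon = 0.2 is the value used in the paper; the tensor names are illustrative:

```python
import torch

def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """L^CLIP: pessimistic minimum of the unclipped and clipped ratio terms,
    negated so it can be minimised with gradient descent."""
    ratio = torch.exp(new_log_probs - old_log_probs)                       # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```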
Summary - Implementation
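The paper's implementation combines the clipped surrogate with a value-function error term and an entropy bonus (L^{CLIP+VF+S}), runs several parallel actors for T timesteps each, and optimises with minibatch Adam for K epochs per data batch. Advantages are computed with truncated generalised advantage estimation; a minimal NumPy sketch of that step (assuming the T-step segment contains no episode boundary):

```python
import numpy as np

def truncated_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Truncated GAE over a T-step segment, as described in the paper's
    implementation section. `values` holds V(s_0..s_{T-1}) and `last_value`
    is the bootstrap value V(s_T)."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values[t]   # TD residual delta_t
        gae = delta + gamma * lam * gae                        # discounted sum of deltas
        advantages[t] = gae
        next_value = values[t]
    return advantages
```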
Summary - Experiments - First, comparing the clipped surrogate objective to other versions of the surrogate objective
Summary - Experiments - Next, comparing PPO (clipped) to other popular algorithms
Summary - Experiments - Humanoid high-dimensional continuous control tasks
Review Questions:
Resources and References
Researcher (Prashant) - Bot for Gardenscapes
Applications in a puzzle game:
Archaeologist
Past Papers:
Policy Gradient Methods
Trust Region Policy Optimization
Proximal Policy Optimization
Comparison of PPO with other methods
Archaeologist
Future Papers:
Hacker