Jiachen Li*, Edwin Zhang*, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang
Offline RL with Closed-Form Policy Improvement Operators
TIMELINE
01 Motivation
02 Our Method
03 Results
04 Conclusion
Motivation
https://bair.berkeley.edu/blog/2020/12/07/offline/
OOD policy shift and training instability
Our Method
Intractable…
Tractable!
Results
THANK YOU!