1 of 11

Jiachen Li*, Edwin Zhang*, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang

Offline RL with Closed-Form Policy Improvement Operators

2 of 11

TIMELINE

01 Motivation
02 Our Method
03 Results
04 Conclusion

3 of 11

Motivation

https://bair.berkeley.edu/blog/2020/12/07/offline/

4 of 11

Motivation

OOD policy shift and training instability

5 of 11

Motivation

6 of 11

Our Method

7 of 11

Our Method

8 of 11

Our Method

Intractable…

Tractable!

9 of 11

Results

10 of 11

Results

11 of 11

THANK YOU!