1 of 28

Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep MBRL

Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen, Sarath Chandar

2 of 28

Adaptivity is a Key Feature of Model-Based Learning

Van Seijen, Harm, et al. "The LoCA regret: a consistent metric to evaluate model-based behavior in reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 6562-6572.

3 of 28

The Local Change Adaptation (LoCA) Setup

Van Seijen, Harm, et al. "The LoCA regret: a consistent metric to evaluate model-based behavior in reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 6562-6572.

6 of 28

Current Deep Model-Based RL Methods Are Not Adaptive!

Wan, Yi, et al. "Towards evaluating adaptivity of model-based reinforcement learning methods." International Conference on Machine Learning. PMLR, 2022.

7 of 28

Replay Buffer Against Catastrophic Forgetting

10 of 28

Interference-Forgetting Dilemma

12 of 28

Local Forgetting (LoFo) Replay Buffer

16 of 28

LoFo Buffer

17 of 28

Experiments

18 of 28

Dreamer w/ the LoFo Buffer

20 of 28

Dreamer’s Reward Estimates

27 of 28

Limitations and Future Work

Demonstration of the Possibility of Local Adaptivity ✅
General Strategy - LoFo ✅

Forgetting Samples that are Spatially Close, but Temporally Far

More Complex Domains

Initial Dataset with a Random Policy ❌

Quadratic Time Complexity

Simplest form of Nearest-Neighbours ❌
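
The forgetting rule named on this slide (drop samples that are spatially close to a new sample but temporally far from it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the slides: the class name `LocalForgettingBuffer`, the `radius` and `min_age` parameters, Euclidean distance on raw state vectors as the notion of "close", and the brute-force scan standing in for the simplest form of nearest-neighbour search.

```python
import math


class LocalForgettingBuffer:
    """Toy replay buffer that forgets old samples near each new sample."""

    def __init__(self, radius, min_age):
        self.radius = radius    # hypothetical: max distance counted as "spatially close"
        self.min_age = min_age  # hypothetical: min age (in steps) counted as "temporally far"
        self.samples = []       # list of (timestep, state, reward)
        self.t = 0              # number of samples added so far

    def add(self, state, reward):
        def stale(sample):
            timestep, old_state, _ = sample
            close = math.dist(old_state, state) <= self.radius
            old = self.t - timestep >= self.min_age
            return close and old

        # Brute-force scan of the whole buffer on every insert.
        self.samples = [s for s in self.samples if not stale(s)]
        self.samples.append((self.t, tuple(state), reward))
        self.t += 1


buf = LocalForgettingBuffer(radius=0.5, min_age=3)
buf.add((0.0, 0.0), 1.0)  # t=0: old reward observed near the origin
buf.add((5.0, 5.0), 0.0)  # t=1: far from the origin
buf.add((6.0, 5.0), 0.0)  # t=2: far from the origin
buf.add((0.1, 0.0), 4.0)  # t=3: near the origin, so the t=0 sample is forgotten
rewards = [r for _, _, r in buf.samples]
print(rewards)  # [0.0, 0.0, 4.0]
```

Because each insert scans the entire buffer, n inserts cost on the order of n^2 distance computations in the worst case, which is one way to read the quadratic time complexity listed above. A spatial index or approximate neighbour search is a possible alternative; the slides do not specify one.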

28 of 28

Thank you!

Contact Info: ali-rahimi.kalahroudi@mila.quebec