1 of 28

Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep MBRL

Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen, Sarath Chandar

2 of 28

Adaptivity is a Key Feature of Model-Based Learning

  • Van Seijen, Harm, et al. "The LoCA regret: a consistent metric to evaluate model-based behavior in reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 6562-6572.

3 of 28

The Local Change Adaptation (LoCA) Setup

  • Van Seijen, Harm, et al. "The LoCA regret: a consistent metric to evaluate model-based behavior in reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 6562-6572.

6 of 28

Current Deep Model-Based RL Methods Are Not Adaptive!

  • Wan, Yi, et al. "Towards evaluating adaptivity of model-based reinforcement learning methods." International Conference on Machine Learning. PMLR, 2022.

7 of 28

Replay Buffer Against Catastrophic Forgetting

8 of 28

Stale Data

10 of 28

Interference-Forgetting Dilemma

12 of 28

Local Forgetting (LoFo) Replay Buffer

16 of 28

LoFo Buffer

17 of 28

Experiments

18 of 28

Dreamer w/ the LoFo Buffer

20 of 28

Dreamer’s Reward Estimates

27 of 28

Limitations and Future Work

  • Demonstration of the Possibility of Local Adaptivity ✅
  • General Strategy - LoFo ✅
    • Forgetting Samples that are Spatially Close, but Temporally Far
  • More Complex Domains
    • Initial Dataset with a Random Policy ❌
  • Quadratic Time Complexity
    • Simplest form of Nearest-Neighbours ❌
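
The two ideas in the list above, forgetting samples that are spatially close but temporally far and paying a quadratic cost for the simplest nearest-neighbour search, can be illustrated with a toy sketch. This is not the authors' implementation. The class name `LocalForgettingBuffer`, the Euclidean distance test, and the `radius` and `min_age` parameters are all assumptions chosen for illustration. A real system would likely need a better notion of locality than raw Euclidean distance.

```python
import math


class LocalForgettingBuffer:
    """Toy replay buffer that drops old samples near each newly added state.

    Hypothetical sketch: Euclidean distance stands in for whatever notion
    of state locality a real system would use.
    """

    def __init__(self, capacity, radius, min_age):
        self.capacity = capacity  # maximum number of stored samples (assumed > 0)
        self.radius = radius      # assumed locality threshold ("spatially close")
        self.min_age = min_age    # assumed age threshold ("temporally far")
        self.step = 0             # number of samples added so far
        self.samples = []         # list of (timestep, state, transition)

    def add(self, state, transition):
        state = tuple(state)
        # Brute-force neighbour search: every insert scans the whole buffer,
        # so N inserts cost O(N^2) distance computations in total.
        kept = [
            (t, s, tr)
            for (t, s, tr) in self.samples
            if not (
                math.dist(s, state) <= self.radius
                and self.step - t >= self.min_age
            )
        ]
        kept.append((self.step, state, transition))
        # Fall back to first-in-first-out eviction when over capacity.
        if len(kept) > self.capacity:
            kept = kept[len(kept) - self.capacity:]
        self.samples = kept
        self.step += 1

    def transitions(self):
        """Return the stored transitions, oldest first."""
        return [tr for (_, _, tr) in self.samples]


if __name__ == "__main__":
    buf = LocalForgettingBuffer(capacity=100, radius=0.5, min_age=2)
    buf.add((0.0, 0.0), "old")  # later overwritten by a nearby observation
    buf.add((5.0, 5.0), "far")  # far away, so never affected
    buf.add((0.1, 0.0), "new")  # near "old" and 2 steps later: "old" is dropped
    print(buf.transitions())
```

In this sketch, old data is discarded only where the agent has fresh observations, while samples from unvisited regions stay in the buffer. The linear scan on every insert is the source of the quadratic cost. A spatial index or approximate nearest-neighbour structure would be one way to reduce it.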

28 of 28

Thank you!

Contact Info: ali-rahimi.kalahroudi@mila.quebec