1 of 12

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

James MacGlashan, Evan Archer*, Alisa Devlic*, Takuma Seno*, Craig Sherstan*, Peter R. Wurman, Peter Stone

* Equal Contribution

2 of 12

Why isn’t my RL agent working?

Insufficient state features?

Imbalanced reward objectives?

Poor exploration?

Values are not propagating?

Insufficient network capacity?

Difference between training and test?

Unstable value bootstrapping?

Environment bugs?

Sharp value space?

3 of 12

Ask the Q-function?

Why did you take action <action>?

Its predicted value is 243.74839

Okay, but why not action <other action>?

Its predicted value was only 239.25245

4 of 12

Q-functions summarize many long-term future outcomes into a single (uninformative) number

5 of 12

Value decomposition

In many environments, reward functions are a weighted sum of components:

Resulting in the relationship
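The two equations on this slide were images and did not survive in the text. A reconstruction consistent with the symbols on the later slides (component rewards R_i, component values Q_i, weights w_1, …, w_k) might look as follows. The policy π and discount factor γ are assumptions; the slides do not name them.

```latex
\begin{align}
R(s, a) &= \sum_{i=1}^{k} w_i \, R_i(s, a) \\
Q_i^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_i(s_t, a_t) \,\middle|\, s_0 = s,\; a_0 = a \right] \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \sum_{i=1}^{k} w_i \, R_i(s_t, a_t) \,\middle|\, s_0 = s,\; a_0 = a \right]
               = \sum_{i=1}^{k} w_i \, Q_i^{\pi}(s, a)
\end{align}
```

The last equality is linearity of expectation: the weights are constants, so the weighted sum can be pulled outside the expected discounted return.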

Actor-critic algorithms can be adapted to learn each component Q-function
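As a minimal sketch of what "learning each component Q-function" could mean, the snippet below computes one-step TD targets per reward component and checks that their weighted sum equals the usual scalar target. The function names, the discount factor, and all numbers are hypothetical; this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def component_td_targets(rewards, next_q, gamma=0.99, done=False):
    """One-step TD target per component: y_i = R_i + gamma * Q_i'(s', a').

    rewards: component rewards R_1..R_k for one transition
    next_q:  target-network component values Q_1'..Q_k' at the next state
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    bootstrap = 0.0 if done else 1.0  # no bootstrapping at terminal states
    return rewards + gamma * bootstrap * next_q

def combine(weights, components):
    """Weighted sum of component quantities (rewards, values, or targets)."""
    return float(np.dot(weights, components))

# Hypothetical transition with k = 3 reward components.
weights = np.array([1.0, 0.5, 2.0])
rewards = np.array([0.2, -1.0, 0.0])
next_q = np.array([10.0, -4.0, 3.0])

targets = component_td_targets(rewards, next_q, gamma=0.9)

# The scalar target a conventional critic would regress toward.
scalar_target = combine(weights, rewards) + 0.9 * combine(weights, next_q)

# Weighted component targets reproduce the scalar target.
assert np.isclose(combine(weights, targets), scalar_target)
```

Each critic head would regress toward its own entry of `targets`, so no information is lost relative to the scalar critic; the scalar value is still recoverable as the weighted sum.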

6 of 12

Conventional actor critic

[Diagram: two panels. Critic training: Critic NN, with targets Q’, R. Actor training: Policy NN, Critic NN, Policy loss.]

7 of 12

Actor critic with value decomposition

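[Diagram: two panels. Critic training: Critic NN with outputs Q₁, Q₂, …, Q_k and targets (Q₁’, R₁), (Q₂’, R₂), …, (Q_k’, R_k). Actor training: Policy NN with output π(s), Critic NN with outputs Q₁, Q₂, …, Q_k, weights w₁, w₂, …, w_k, Policy loss.]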

Key idea: these multiple predictions facilitate the diagnosis and correction of RL problems
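One way the component predictions could help with diagnosis is by answering the earlier "why not the other action?" question per component. Because the overall value is a weighted sum, the gap between two actions splits exactly into weighted per-component gaps. The sketch below assumes that the component Q-values for both actions are already available; the component names and all numbers are hypothetical, and this is not necessarily the diagnostic used by the authors.

```python
import numpy as np

def explain_preference(weights, q_chosen, q_other, names):
    """Split Q(chosen) - Q(other) into weighted per-component contributions.

    Returns (name, contribution) pairs sorted by absolute contribution,
    largest first. Positive values favor the chosen action.
    """
    weights = np.asarray(weights, dtype=float)
    gap = np.asarray(q_chosen, dtype=float) - np.asarray(q_other, dtype=float)
    contrib = weights * gap
    order = np.argsort(-np.abs(contrib))
    return [(names[i], float(contrib[i])) for i in order]

# Hypothetical components and values for two candidate actions.
names = ["progress", "collision", "energy"]
weights = [1.0, 1.0, 0.5]
q_a = [250.0, -5.0, -2.0]   # component values of the chosen action
q_b = [255.0, -15.0, -3.0]  # component values of the alternative

report = explain_preference(weights, q_a, q_b, names)
total_gap = sum(c for _, c in report)

for name, c in report:
    print(f"{name:>10}: {c:+.2f}")
print(f"{'total':>10}: {total_gap:+.2f}")
```

In this made-up case the chosen action is slightly worse on "progress" but much better on "collision", which a single scalar value would not reveal.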

8 of 12

Determining influence of different rewards

Influence per environment step

Influence summaries over training

9 of 12

Detect and control exploration-inhibiting rewards

10 of 12

No sacrifice in overall performance

12 of 12

9/9/21