Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
James MacGlashan, Evan Archer*, Alisa Devlic*, Takuma Seno*, Craig Sherstan*, Peter R. Wurman, Peter Stone
* Equal Contribution
Why isn’t my RL agent working?
Insufficient state features?
Imbalanced reward objectives?
Poor exploration?
Values are not propagating?
Insufficient network capacity?
Difference between training and test?
Unstable value bootstrapping?
Environment bugs?
Sharp value space?
© 2022, Sony AI
Ask the Q-function?
Why did you take action <action>?
Its predicted value is 243.74839
Okay, but why not action <other action>?
Its predicted value was only 239.25245
Q-functions summarize many long-term future outcomes into a single (uninformative) number
Value decomposition
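In many environments, the reward function is a weighted sum of components:
$$R(s, a, s') = \sum_{i=1}^{k} w_i R_i(s, a, s')$$
This results in the same relationship between the Q-function and its component Q-functions:
$$Q^{\pi}(s, a) = \sum_{i=1}^{k} w_i Q_i^{\pi}(s, a)$$
Actor-critic algorithms can be adapted to learn each component Q-function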
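As a rough illustration of why component Q-functions can be learned separately, the sketch below runs tabular TD evaluation of one fixed policy, once per reward component and once on the weighted composite reward, then checks that the weighted sum of the component tables matches the composite table. It assumes the relationship Q(s, a) = w1 Q1(s, a) + … + wk Qk(s, a). The toy environment, the weights, the policy, and every function name are hypothetical and are not taken from the talk.

```python
import random

# Hypothetical toy problem: 3 states in a ring, 2 actions, 2 reward components.
N_STATES, N_ACTIONS = 3, 2
WEIGHTS = [1.0, 0.5]      # w_1, w_2 (assumed values)
GAMMA, ALPHA = 0.9, 0.1   # discount factor and learning rate (assumed values)


def step(s, a):
    """Deterministic toy dynamics: returns the next state and the reward vector."""
    s_next = (s + 1 + a) % N_STATES
    progress = 1.0 if s_next == 0 else 0.0  # component 1: reaching state 0
    effort = -float(a)                      # component 2: cost of taking action 1
    return s_next, [progress, effort]


def policy(s):
    """The single fixed policy that every Q-table below evaluates."""
    return 0 if s == 0 else 1


def td_evaluate(reward_fn, steps=5000, seed=0):
    """SARSA-style TD(0) evaluation of `policy` for one scalar reward signal."""
    rng = random.Random(seed)  # same seed, so every run sees the same samples
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(steps):
        s, a = rng.randrange(N_STATES), rng.randrange(N_ACTIONS)
        s_next, rewards = step(s, a)
        # Bootstrap from the action of the shared policy, not from a
        # per-component greedy action; this keeps the decomposition valid.
        target = reward_fn(rewards) + GAMMA * q[s_next][policy(s_next)]
        q[s][a] += ALPHA * (target - q[s][a])
    return q


# One Q-table per reward component, plus one for the composite reward.
component_q = [td_evaluate(lambda r, i=i: r[i]) for i in range(len(WEIGHTS))]
composite_q = td_evaluate(lambda r: sum(w * x for w, x in zip(WEIGHTS, r)))


def recombine(s, a):
    """Weighted sum of the component Q-values for one state-action pair."""
    return sum(w * q[s][a] for w, q in zip(WEIGHTS, component_q))
```

Because the TD update is linear in the reward, `recombine(s, a)` agrees with `composite_q[s][a]` up to floating-point error when all tables are trained on the same samples. A neural critic with k output heads would only satisfy this approximately.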
Conventional actor critic
[Diagram]
Critic training: S, A → Critic NN → Q; TD loss against the target (Q', R)
Actor training: S → Policy NN → π; S, π → Critic NN → Q → Policy loss
Actor critic with value decomposition
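[Diagram]
Critic training: S, A → Critic NN → Q1, Q2, …, Qk; one TD loss per component, against the targets (Q1', R1), (Q2', R2), …, (Qk', Rk)
Actor training: S → Policy NN → π(s); S, π(s) → Critic NN → Q1, Q2, …, Qk; multiplied (x) by the weights w1, w2, …, wk → Policy loss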
Key idea: these multiple predictions facilitate the diagnosis and correction of RL problems
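One way such predictions might support diagnosis is sketched below: given component Q-values for two actions, it splits the gap in overall value into a weighted contribution per component. This is only an illustration of the idea, not the influence measure used in the talk. The component names, weights, and Q-values are all made up.

```python
# All names and numbers below are made up for illustration.
WEIGHTS = {"progress": 1.0, "collision": 5.0, "energy": 0.1}


def composite_q(component_q):
    """Overall Q-value as the weighted sum of the component Q-values."""
    return sum(WEIGHTS[name] * q for name, q in component_q.items())


def explain_gap(chosen, other):
    """Per-component weighted contribution to Q(chosen) - Q(other)."""
    return {name: WEIGHTS[name] * (chosen[name] - other[name]) for name in WEIGHTS}


# Hypothetical component predictions for the chosen action and an alternative.
q_chosen = {"progress": 12.0, "collision": -0.5, "energy": -3.0}
q_other = {"progress": 14.0, "collision": -1.5, "energy": -2.0}

gap = explain_gap(q_chosen, q_other)

# Rank components by how strongly they favor the chosen action.
for name, contribution in sorted(gap.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10}: {contribution:+.2f}")
```

In this made-up case the alternative action promises more progress, but the collision component outweighs it. A single scalar Q-value would hide that trade-off.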
Determining influence of different rewards
Influence per environment step
Influence summaries over training
Detect and control exploration-inhibiting rewards
No sacrifice in overall performance
© 2021 Sony AI, Confidential
9/9/21