RBE595 RL Research Paper Review
Keith Chester, Bob DeMont
Emergent tool use from multi-agent interaction
Worcester Polytechnic Institute
Hide and Seek through Reinforcement Learning
Background Challenges
Related Previous Work
environment learned emergent behaviors
like passing in soccer simulations
were incentivized for the use
Key Concepts
What's New
Setup
play area
Hiders | Seekers
+1 if none are seen | +1 if they find a hider
-1 if any hider is seen | -1 if they don’t find a hider
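The reward table above can be sketched as a small function. This is a minimal illustration, not the paper's implementation: the name `team_rewards`, the boolean visibility list, and the idea that the reward is evaluated once per timestep are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def team_rewards(hider_seen: Sequence[bool]) -> Tuple[int, int]:
    """Return (hider_reward, seeker_reward) for one evaluation step.

    hider_seen[i] is True when hider i is visible to at least one seeker.
    Hypothetical helper: the per-step evaluation is an assumption.
    """
    any_seen = any(hider_seen)
    # Hiders share one reward: +1 only if no hider is seen, otherwise -1.
    hider_reward = -1 if any_seen else 1
    # Seekers get the opposite sign: +1 if any hider is found, otherwise -1.
    seeker_reward = 1 if any_seen else -1
    return hider_reward, seeker_reward


print(team_rewards([False, False]))  # all hiders hidden
print(team_rewards([False, True]))   # one hider spotted
```

Because the two team rewards always have opposite signs, any improvement by one team directly penalizes the other, which is the pressure behind the strategy and counter-strategy sequence in the Results slide.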
Agent Actions
Policy
Results
Run away and chase
Hiders lock boxes
Seekers use ramps
Hiders hide ramps
Seekers perform block surfing
Hiders lock blocks to prevent surfing
Timeline markers: 10MM, 25MM, 100MM, 110MM, 380MM, 450MM
Emerging Strategies
Strategy 1: Run and Chase
Strategy 2: Tools to Hide
Emerging Strategies
Strategy 3: Seeker Tool Use
to Counter Hider Tool Use
Strategy 4: Hider Counter Defense
to Seeker Tool Use
Emerging Strategies
Strategy 5: Seekers Counter
to Hiders Counter
Strategy 6: Hiders Counter
to Seekers Counter
Sensitivity Analysis
“providing further evidence that multi-agent interaction is a promising path towards self-supervised skill acquisition”
Evaluation
Comparison
Conclusions
Future Research