Cooperation and Fairness in Multi-Agent Reinforcement Learning
Jasmine Jerry Aloor
Dynamics, Infrastructure Networks, and Mobility (DINaMo) Research Group
Department of Aeronautics and Astronautics
MIT
August 2, 2024
Multi-agent vehicular systems coordinating complex missions
August 2 2024 NASA ULI Safe Aviation Autonomy Monthly Research Seminar
Advanced air mobility (AAM)
Sensing and monitoring
Ridesharing
Spacecraft operations
(Photo: Business Wire)
Multi-Agent Reinforcement Learning
Fast inference times
Source: TowardsDataScience
MARL optimized for efficiency alone
Agents are resource-constrained, so they minimize shared costs
Task distribution can become inequitable
Optimizing for efficiency alone often sacrifices fairness
MARL optimized for fairness and efficiency
Can agents learn to complete tasks fairly?
Without significantly sacrificing efficiency
E.g., improving fairness without increasing total distance traveled
Research Questions
MARL Agent Training Framework
Environment
Obstacles
* Based on the InforMARL framework
Nayak, Siddharth, et al. "Scalable multi-agent reinforcement learning through intelligent information aggregation." International Conference on Machine Learning. PMLR, 2023.
Observations for a scalable, decentralized system
Agent 1’s
observation vector
Environment
Our approach
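As a rough sketch of the per-agent observation described above (illustrative only — the actual InforMARL observations also include graph-aggregated information about neighboring entities), each agent can carry a small local vector holding its own velocity and its goal position expressed relative to itself, normalized by a sensing radius so the policy generalizes across environment sizes:

```python
import numpy as np

def local_observation(agent_pos, agent_vel, goal_pos, sensing_radius):
    """Sketch of a per-agent, decentralized observation vector:
    own velocity plus the goal position *relative* to the agent,
    normalized by the sensing radius."""
    rel_goal = (goal_pos - agent_pos) / sensing_radius
    return np.concatenate([agent_vel, rel_goal])

obs = local_observation(np.array([0.0, 0.0]), np.array([0.1, 0.0]),
                        np.array([1.0, 1.0]), sensing_radius=2.0)
# obs holds: velocity (2 entries) + normalized relative goal (2 entries)
```

Because every quantity is relative to the observing agent, the same policy can be deployed on any team size — a key ingredient of the scalable, decentralized execution shown later.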
Agent-Goal assignment schemes
Random (RA)
Optimal distance cost (OA)
Min-max fair (FA)
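The three schemes can be sketched by brute force for the small team sizes used in training (RA is simply a uniformly random permutation of goals, so only OA and FA are shown). Here `dist[i, j]` is the distance from agent `i` to goal `j`. This is an illustrative reconstruction, not the paper's implementation — for larger teams OA would use a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment`, and FA a bottleneck-assignment solver:

```python
import itertools
import numpy as np

def optimal_assignment(dist):
    """OA: permutation minimizing the *total* agent-goal distance."""
    n = dist.shape[0]
    return min(itertools.permutations(range(n)),
               key=lambda p: sum(dist[i, p[i]] for i in range(n)))

def minmax_fair_assignment(dist):
    """FA: permutation minimizing the *worst single agent's* distance."""
    n = dist.shape[0]
    return min(itertools.permutations(range(n)),
               key=lambda p: max(dist[i, p[i]] for i in range(n)))

# Toy case where the two schemes disagree: OA accepts one long trip
# to shave the total, while FA equalizes the burden instead.
D = np.array([[1.0, 4.0],
              [3.0, 5.0]])
print(optimal_assignment(D))      # (0, 1): total 6, worst agent travels 5
print(minmax_fair_assignment(D))  # (1, 0): total 7, worst agent travels 4
```

The toy matrix makes the efficiency-fairness tension concrete: FA pays one extra unit of total distance to cut the worst agent's travel from 5 to 4.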
At every time step, for each agent, compute the fairness metric
Environment
Agent distances
We consider the distance traveled by agents to reach their goals as the resource to be treated fairly
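The slides do not spell out the metric itself; a common choice for fairness over a shared resource, assumed here purely for illustration, is the coefficient of variation (CV) of per-agent distances — 0 when all agents travel equally far, larger when a few agents shoulder most of the travel:

```python
import numpy as np

def fairness_metric(distances):
    """Coefficient of variation of per-agent distances (an assumed
    metric, not necessarily the paper's): std/mean, so 0 means a
    perfectly even split of travel across agents."""
    d = np.asarray(distances, dtype=float)
    return float(d.std() / d.mean())

print(fairness_metric([5.0, 5.0, 5.0]))  # 0.0 — perfectly even split
print(fairness_metric([1.0, 9.0]))       # 0.8 — one agent does most of the work
```

Being scale-free (normalized by the mean), such a metric compares environments of different sizes on an equal footing.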
Reward computation
Compute each agent’s rewards
Environment
At every timestep, each agent gets the following rewards:
Goal Reaching Reward
Once an agent reaches its goal, it is given the goal reward and marked “done”
Environment
When an individual agent reaches its assigned goal, it receives a one-time goal-reaching reward
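Putting the reward slides together, a per-step reward might combine a fairness shaping term with the one-time goal bonus; the coefficient values below are placeholders for illustration, not the trained model's:

```python
def step_reward(newly_reached_goal, fairness_term,
                goal_reward=5.0, time_penalty=-0.05):
    """Per-step reward sketch (placeholder coefficients): a small time
    penalty encourages efficiency, the fairness term shapes equitable
    travel, and a one-time bonus fires on the step the agent first
    reaches its assigned goal."""
    r = time_penalty + fairness_term
    if newly_reached_goal:
        r += goal_reward  # paid once; the agent is then marked "done"
    return r

print(step_reward(False, 0.0))  # ordinary step: just the time penalty
print(step_reward(True, 0.0))   # goal-reaching step: penalty + bonus
```

Because the goal bonus is paid exactly once and the agent is then marked done, agents cannot farm the reward by hovering at their goals.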
Overall MARL Agent Training Framework
We train models using three agents, three goals, and three obstacles in a fixed environment size
Model variants
We use all goal assignment schemes and add the fairness reward to create four models:
Decentralized execution framework
Difference from training setup
Example experiment of trained models
The agents start from the upper half and navigate to goals located on the bottom left part of the environment
FairAssign, FairRew (FA, FR)
FairAssign, NoFairRew (FA, nFR)
OptAssign, NoFairRew (OA, nFR)
RandAssign, NoFairRew (RA, nFR)
Key Takeaway
Agents learn fair behavior without a significant decrease in efficiency
when trained with a fair goal assignment and a fairness reward
Effect of goal assignments and fairness in reward
We compare the test performance of our four models trained with 3 agents
We calculate the median fairness metric and distance traveled relative to the Random Assignment (RA) model
3 agent environment
10 agent environment
Effect of goal assignments and fairness in reward
3 agent environment
10 agent environment
Key Takeaways
Impact of congestion on overall fairness and efficiency
Models trained with 3 agents, 3 obstacles, and 2 walls
3 agent environment
10 agent environment
Multi-Agent Formation Scenarios
Our approach can be extended to agents coordinating and forming various shapes
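In the formation scenario, each agent must occupy a landmark position on the target shape rather than a pre-assigned goal. A hypothetical generator for one such shape — `circle_landmarks` is an illustrative helper, not from the paper — places n landmarks evenly on a circle:

```python
import math

def circle_landmarks(n, radius=1.0, center=(0.0, 0.0)):
    """Place n formation landmarks evenly spaced on a circle
    (illustrative; the experiments may use other shapes)."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]

pts = circle_landmarks(4)  # 4 landmarks at angles 0°, 90°, 180°, 270°
```

The same goal-assignment schemes (RA, OA, FA) then decide which agent claims which landmark, so fairness carries over unchanged to formation flying.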
Multi-Agent Formation Scenario: Test performance
Best fairness metric
With 3 agents
With 10 agents
Comparable distances to efficient model
Limitations and Future Work
Acknowledgements
Joint work by: Sid Nayak, Sydney Dolan, Victor Qin, and Hamsa Balakrishnan
This work was supported in part by NASA under grant #80NSSC23M0220 and the University Leadership Initiative (grants #80NSSC21M0071 and #80NSSC20M0163), but this article solely reflects the opinions and conclusions of its authors and not any NASA entity.
J. Aloor was also supported in part by a MathWorks Fellowship. The authors would like to thank the MIT SuperCloud and the Lincoln Laboratory Supercomputing Center for providing high-performance computing resources.
Summary and Takeaways
Compared to Random Assignment: 14% increase in efficiency, 5% increase in fairness
Compared to Optimal Assignment: 7% decrease in efficiency, 21% increase in fairness
Thank you!
Questions?