1 of 25

Cooperation and Fairness in Multi-Agent Reinforcement Learning

Jasmine Jerry Aloor

Dynamics, Infrastructure Networks, and Mobility (DINaMo) Research Group

Department of Aeronautics and Astronautics

MIT

August 2, 2024

NASA ULI Safe Aviation Autonomy Monthly Research Seminar

2 of 25

Multi-agent vehicular systems coordinating complex missions


  • Advanced air mobility (AAM)
  • Sensing and monitoring
  • Ridesharing
  • Spacecraft operations


3 of 25

Multi-Agent Reinforcement Learning


  • Ability to model cooperative behaviors
  • Efficient in unstructured environments
  • Advantages over optimization-based methods: fast inference times
  • Centralized Training, Decentralized Execution (CTDE) for cooperative tasks

4 of 25

MARL optimized for efficiency alone


Agents are resource-constrained, so training minimizes the shared cost

The resulting task distribution could be inequitable:

  • Some agents receive an unfair advantage
  • Others are starved of resources

The quest for efficiency alone can often mean that fairness is sacrificed

5 of 25

MARL optimized for fairness and efficiency


Research Questions

Can agents learn to complete tasks fairly, without significantly sacrificing efficiency?

E.g., improving fairness without increasing the total distance traveled

6 of 25

MARL Agent Training Framework


  • Our environment comprises agents navigating to different goals*
  • Episodes start with all entities initialized at random positions
  • In the first step, the policy provides an action for each agent i
  • Each agent executes its action and observes the change in state
[Figure: environment with agents, goals, and obstacles]

* Based on the InforMARL framework

Nayak, Siddharth, et al. "Scalable multi-agent reinforcement learning through intelligent information aggregation." International Conference on Machine Learning. PMLR, 2023.
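A minimal sketch of this interaction loop, assuming a generic multi-agent environment and policy interface (the `env` and `policy` objects and their methods are illustrative placeholders, not the InforMARL API):

```python
import numpy as np

def run_episode(env, policy, max_steps=100):
    """Illustrative training-time rollout: entities are initialized randomly,
    the policy picks an action for each agent, and agents observe the
    resulting change in state."""
    obs = env.reset()                                   # random initialization
    episode_rewards = np.zeros(env.num_agents)
    for _ in range(max_steps):
        actions = [policy.act(obs[i], agent_id=i)       # one action per agent i
                   for i in range(env.num_agents)]
        obs, rewards, dones, _ = env.step(actions)      # execute and observe
        episode_rewards += np.asarray(rewards)
        if all(dones):
            break
    return episode_rewards
```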

7 of 25

Observations for a scalable, decentralized system


  • Centralized MARL agents are provided with all goal positions
  • Not feasible here:
    • Fixed input sizes, not scalable
    • Lower level of privacy
    • Assumes global knowledge

[Figure: Agent 1's local observation vector within the environment]

Our approach

  • Use the observation range to include information about the two closest goals
  • Provide a goal occupancy flag that
    • informs agents how close any agent is to that goal
    • indicates which goals are still available
  • Utilize a GNN to process neighboring agents’ information
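A minimal sketch of how such a local observation could be assembled for one agent, assuming 2D positions; the layout and names are illustrative, not the exact InforMARL observation format (neighboring agents' information is processed separately by the GNN):

```python
import numpy as np

def local_observation(agent_pos, agent_vel, goal_pos, goal_occupied, obs_range):
    """Own state plus the two closest goals, each with a goal occupancy flag
    indicating whether any agent is already at (or near) that goal."""
    dists = np.linalg.norm(goal_pos - agent_pos, axis=1)
    closest = np.argsort(dists)[:2]                    # two closest goals
    goal_features = []
    for g in closest:
        in_range = dists[g] <= obs_range
        rel = goal_pos[g] - agent_pos if in_range else np.zeros(2)
        flag = float(goal_occupied[g]) if in_range else 0.0
        goal_features.extend([rel[0], rel[1], flag])
    return np.concatenate([agent_pos, agent_vel, goal_features])
```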

8 of 25

Agent-Goal assignment schemes


  • Random (RA)
  • Optimal distance cost (OA): minimizes the total distance cost
  • Min-max fair (FA): minimizes the maximum distance traveled by any agent; a commonly-used approach to improving fairness
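A minimal sketch contrasting the two non-random criteria on a distance-cost matrix (illustrative only; brute force is fine for the small agent counts used here):

```python
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment(dist):
    """OA: minimize the total distance cost (Hungarian algorithm)."""
    agents, goals = linear_sum_assignment(dist)
    return dict(zip(agents, goals))

def minmax_fair_assignment(dist):
    """FA: minimize the maximum distance traveled by any agent."""
    n = dist.shape[0]
    best, best_worst = None, np.inf
    for perm in itertools.permutations(range(n)):
        worst = max(dist[i, g] for i, g in enumerate(perm))
        if worst < best_worst:
            best, best_worst = perm, worst
    return {i: g for i, g in enumerate(best)}

dist = np.array([[1.0, 2.0, 4.0],    # rows: agents, columns: goals
                 [2.0, 1.5, 3.0],
                 [3.5, 2.5, 1.0]])
print(optimal_assignment(dist))      # lowest total distance
print(minmax_fair_assignment(dist))  # lowest worst-case distance
```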

9 of 25

 


We consider the distance traveled by agents to reach their goals as the resource to be treated fairly

  • At every time step, for each agent, compute the distance traveled
  • Compute the coefficient of variation of these distances as CV = σ/μ
  • Choose our fairness metric based on this coefficient of variation: a lower CV indicates a more even, and hence fairer, distribution of distances
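A minimal sketch of the coefficient-of-variation computation over the agents' distances (illustrative; the exact fairness reward derived from it is not reproduced here):

```python
import numpy as np

def coefficient_of_variation(distances):
    """CV = sigma / mu over the distances traveled by the agents;
    a lower CV means the distances are spread more evenly (fairer)."""
    d = np.asarray(distances, dtype=float)
    mu = d.mean()
    return d.std() / mu if mu > 0 else 0.0

print(coefficient_of_variation([4.0, 4.1, 3.9]))  # low CV: nearly equal travel
print(coefficient_of_variation([1.0, 2.0, 9.0]))  # high CV: inequitable travel
```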

10 of 25

Reward computation


At every timestep, each agent receives the following rewards:

  1. A distance-based reward toward its assigned goal
  2. For models trained with a fairness metric, a fairness reward
  3. A collision penalty of -C for colliding with other agents or obstacles
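A minimal sketch of a per-timestep reward with these three terms; the weights, functional forms, and names are illustrative assumptions rather than the exact reward used in this work:

```python
import numpy as np

def step_reward(agent_pos, goal_pos, fairness_term, collided,
                use_fairness_reward, C=5.0, w_dist=1.0, w_fair=1.0):
    """Illustrative reward: (1) distance-based shaping toward the assigned
    goal, (2) a fairness reward for models trained with the fairness metric,
    and (3) a penalty of -C for collisions with agents or obstacles."""
    reward = -w_dist * np.linalg.norm(goal_pos - agent_pos)   # 1. distance-based
    if use_fairness_reward:
        reward += w_fair * fairness_term                      # 2. fairness reward
    if collided:
        reward -= C                                           # 3. collision penalty
    return reward
```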

11 of 25

Goal Reaching Reward


When an individual agent reaches its assigned goal, it receives a one-time goal-reaching reward and is marked “done”

12 of 25

Overall MARL Agent Training Framework


We train models using three agents, three goals, and three obstacles in a fixed environment size

13 of 25

Model variants


We combine the goal assignment schemes with and without the fairness reward to create four models:

  1. Random goal assignment with no fairness reward (RA, nFR); this model serves as our baseline
  2. Optimal distance cost goal assignment with no fairness reward (OA, nFR)
  3. Fair goal assignment with no fairness reward (FA, nFR)
  4. Fair goal assignment with fairness reward (FA, FR)
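For reference, the four variants differ only in the training-time goal assignment scheme and whether the fairness reward is used (a compact, illustrative summary):

```python
# The four trained model variants: goal assignment scheme + fairness reward flag
model_variants = {
    "RA,nFR": {"assignment": "random",       "fairness_reward": False},  # baseline
    "OA,nFR": {"assignment": "optimal_cost", "fairness_reward": False},
    "FA,nFR": {"assignment": "minmax_fair",  "fairness_reward": False},
    "FA,FR":  {"assignment": "minmax_fair",  "fairness_reward": True},
}
```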

14 of 25

Decentralized execution framework


  • At the start of each episode, we initialize 𝑁 agents and 𝑁 goals randomly in the environment
  • Each agent can go to any goal in the environment
  • Agents have their local observation vector and the neighborhood graph network

Differences from the training setup (see the sketch below):

  • We do not assign any goals; the agents rely on their local observations and the goal assignments they have learned
  • We do not provide any rewards or penalties
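A minimal sketch of this test-time loop, reusing the illustrative interface from the training sketch; note that no goal assignment and no rewards or penalties are supplied:

```python
def execute_episode(env, policy, max_steps=100):
    """Decentralized execution: each agent acts from its local observation
    and the neighborhood graph; goal selection is implicit in the learned
    policy, and no rewards or penalties are provided at test time."""
    obs = env.reset()                                   # N agents, N goals, random
    for _ in range(max_steps):
        actions = [policy.act(obs[i], agent_id=i)
                   for i in range(env.num_agents)]
        obs, _, dones, _ = env.step(actions)            # rewards ignored here
        if all(dones):
            break
    return dones                                        # per-agent goal completion
```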

15 of 25

Example experiment of trained models


The agents start from the upper half and navigate to goals located on the bottom left part of the environment

[Figure: example agent trajectories under the four trained models: FairAssign, FairRew (FA, FR); FairAssign, NoFairRew (FA, nFR); OptAssign, NoFairRew (OA, nFR); RandAssign, NoFairRew (RA, nFR)]

16 of 25

Example experiment of trained models


[Figure: example agent trajectories under the four trained models, as on the previous slide]

Key Takeaway

When trained with a fair goal assignment and a fairness reward, agents learn fair behavior without a significant decrease in efficiency

17 of 25

Effect of goal assignments and fairness in reward


We compare the test performance of our four models trained with 3 agents

  • Test on 3, 5, 7, and 10 agent environments over 100 episodes

We calculate the median fairness metric and distance traveled relative to the Random Assignment (RA, nFR) model

[Figure: relative fairness and distance traveled in the 3-agent and 10-agent test environments]
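A minimal sketch of how such relative numbers could be computed from per-episode logs, assuming arrays of per-episode fairness (CV) and total distance for each model (names illustrative):

```python
import numpy as np

def relative_to_baseline(model_cv, model_dist, baseline_cv, baseline_dist):
    """Median fairness metric and distance traveled, expressed relative to the
    Random Assignment (RA, nFR) baseline; inputs are per-episode arrays."""
    return {
        "fairness_vs_RA": np.median(model_cv) / np.median(baseline_cv),
        "distance_vs_RA": np.median(model_dist) / np.median(baseline_dist),
    }
```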

18 of 25

Effect of goal assignments and fairness in reward


[Figure: relative fairness and distance traveled in the 3-agent and 10-agent test environments]

Key Takeaways

  • Models trained with fair assignments (FA,*) effectively navigate trade-offs in fairness and efficiency
  • Including a fairness reward modestly improves the fairness metric
  • Scalable to any number of agents

19 of 25

Impact of congestion on overall fairness and efficiency


  • The environment becomes crowded as the number of agents increases
  • This decreases the free space available for navigating in straight lines
  • Greater chance of collisions

20 of 25

Impact of congestion on overall fairness and efficiency


Models trained with 3 agents, 3 obstacles, and 2 walls

  • Test on 3, 5, 7, and 10 agent environments over 100 episodes

[Figure: fairness and distance results in the 3-agent and 10-agent environments]

  • In complex environments, agents observe nearby goals, but these may not be the “fair” goals
  • Agents deliberately loiter, prioritizing the fairness metric over goal-reaching rewards

21 of 25

Multi-Agent Formation Scenarios


Our approach can be extended to agents coordinating to form various shapes

  • Various shapes are created using a set of "expected positions" around one or two landmark positions
  • The agents arrange themselves on or near these expected positions to form different shapes
  • This avoids the need to retrain the model for different shapes (see the sketch below)

[Figure: expected positions arranged around a landmark position]
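A minimal sketch of generating "expected positions" around a single landmark, for example evenly spaced on a circle; the parameterization is an illustrative assumption, and other shapes follow by changing the offset pattern:

```python
import numpy as np

def expected_positions(landmark, num_agents, radius=1.0):
    """Evenly spaced expected positions on a circle around one landmark.
    Agents treat these positions as goals, so the same trained policy can
    produce different formation shapes without retraining."""
    angles = np.linspace(0.0, 2.0 * np.pi, num_agents, endpoint=False)
    offsets = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return landmark + offsets

print(expected_positions(np.array([0.0, 0.0]), num_agents=5))
```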

22 of 25

Multi-Agent Formation Scenario: Test performance


[Figure: formation test performance with 3 agents and with 10 agents: best fairness metric achieved, with distances comparable to the efficient model]
23 of 25


Limitations and Future Works

Limitations

  • Increased fairness comes at the expense of efficiency when the number of entities in the environment is large
  • Scalability challenges in dense environments: increased travel distances, complex navigation behaviors
  • Predefined formations: formations are fixed in space

Future Works

  • Consideration of other measures of fairness
  • Development of heuristics for large-scale, highly congested environments
  • Dynamic formations using multiple vehicles

24 of 25

Acknowledgements

This work was supported in part by NASA under grant #80NSSC23M0220 and the University Leadership Initiative (grants #80NSSC21M0071 and #80NSSC20M0163), but this article solely reflects the opinions and conclusions of its authors and not any NASA entity.

J. Aloor was also supported in part by a MathWorks Fellowship. The authors would like to thank the MIT SuperCloud and the Lincoln Laboratory Supercomputing Center for providing high‐performance computing resources.

Joint work by: Sid Nayak, Sydney Dolan, Victor Qin, and Hamsa Balakrishnan

25 of 25


Summary and Takeaways

  • Agents can learn fair assignments without needing to significantly sacrifice efficiency

  • Compared to Random Assignment: 14% increase in efficiency and 5% increase in fairness
  • Compared to Optimal Assignment: 7% decrease in efficiency and 21% increase in fairness

  • Achieves perfect coverage even when tested with a higher number of agents
  • Greater level of decentralization, less dependent on a centralized oracle
  • Generalizable to different formation shapes

Thank you!

Questions?