Collective Artificial Intelligence
From Independent to Cooperative Models
Alfonso Amayuelas
Committee: William Wang (chair), Shiyu Chang, Xifeng Yan
PhD Major Area Exam, Spring 2025
05/02/25
Department of Computer Science
Agenda
LLM Agents
The agent loop: Environment → Reasoning → Tool/Action Selection → Action Execution → back to the Environment (sketch below)
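A minimal sketch of this loop in Python; `llm`, `tools`, and `env` are assumed stand-ins for illustration, not a specific framework's API:

    def agent_loop(env, llm, tools, max_steps=10):
        observation = env.reset()                        # Environment
        for _ in range(max_steps):
            # Reasoning: reflect on the current observation.
            thought = llm(f"Observation: {observation}\nThink step by step.")
            # Tool/Action selection: pick a tool and its arguments.
            tool_name, tool_args = llm.select_tool(thought, list(tools))
            # Action execution: run the tool, get a new observation.
            observation = env.step(tools[tool_name], tool_args)
            if env.done:
                break
        return observation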
Multi-Agent Systems (MAS)
[1] Du, Yilun, et al. "Improving factuality and reasoning in language models through multiagent debate." Forty-first International Conference on Machine Learning. 2023.
“A multi-agent system (MAS) is then defined as a collection of agents designed to interact through orchestration, enabling collective intelligence.”
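As a concrete instance of such orchestration, here is a minimal sketch of the multiagent-debate pattern from [1]; the `llm` callable is an assumed stub:

    def debate(question, llm, num_agents=3, num_rounds=2):
        # Round 0: each agent answers independently.
        answers = [llm(question) for _ in range(num_agents)]
        for _ in range(num_rounds):
            new_answers = []
            for i in range(num_agents):
                # Each agent reads the other agents' answers and revises its own.
                others = "\n".join(a for j, a in enumerate(answers) if j != i)
                prompt = (f"Question: {question}\n"
                          f"Other agents answered:\n{others}\n"
                          "Using these as additional advice, give an updated answer.")
                new_answers.append(llm(prompt))
            answers = new_answers
        return answers  # aggregate, e.g., by majority vote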
Key Principles
[2] Su, Yu, et al. "Language Agents: Foundations, Prospects, and Risks." Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. 2024.
Current Application Areas:
Social Simulations
Software Development
Increased reasoning at inference time
Where are we?
[3] Cemri, Mert, et al. "Why do multi-agent LLM systems fail?" arXiv preprint arXiv:2503.13657 (2025).
Despite growing enthusiasm, the performance gains of MAS remain minimal
Why? An analysis of >200 traces across 7 MAS frameworks, annotated by humans and an LLM judge, identifies recurring failure modes
Implications?
MAS frameworks: MetaGPT, ChatDev, HyperAgent, AppWorld, AG2, Magentic-One, OpenManus…
Towards Collective AI
2022: Release of Chatbots (PhD begins)
2023: Multi-Agent LLM Frameworks
2024: Advanced Post-Training
2025: Large-Scale RL, Agents, MCP
2026+: Complex Organizations & Cooperative Learning
Moving towards AI agents that interact with and learn from the environment and other AI models
A Dynamic LLM-powered Agent Network for Task-Oriented Agent Collaboration (DyLAN) [4]
Zijun Liu
Yanzhe Zhang
Peng Li
Yang Liu
Diyi Yang
Presented at COLM 2024
Optimizing Multi-Agent System Creation
Question: How can we automatically optimize LLM-MAS creation?
→ Does dynamic agent selection and communication improve task performance?
Contributions
DyLAN Framework
Dynamic LLM-Powered Agent Network
Temporal Feed-Forward Networks
Definition: agents are organized as a feed-forward network unrolled over time steps; each node is an agent at a given time step, and edges carry responses forward from one step to the next
How T-FFN Enables Dynamic Collaboration (1/2)
Forward Message Passing = Inference
Agent Team Reformation
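A minimal sketch of the forward pass, assuming `agents` is a list of prompted-LLM callables that take a task plus the previous layer's responses:

    def forward_pass(task, agents, num_steps=3):
        # Layer 0: every agent answers the task independently.
        responses = [agent(task, context=[]) for agent in agents]
        history = [responses]
        # Layers 1..T-1: each agent sees all responses from the previous step.
        for _ in range(num_steps - 1):
            responses = [agent(task, context=responses) for agent in agents]
            history.append(responses)
        return history  # one list of responses per time step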
Agent Selection Using T-FFN (Team Optimization) (2/2)
Backward Message Passing (Evaluation):
Selection Algorithm:
→ Provides task-oriented, dynamically formed teams (sketch below)
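A simplified sketch of the selection step: ratings of the final layer are propagated backward so each agent accumulates an importance score, and the top-k agents form the new team. The uniform back-distribution rule here is a simplifying assumption, not the paper's exact weighting:

    def select_agents(history, final_scores, k):
        num_agents = len(history[-1])
        scores = list(final_scores)            # per-node scores, last layer
        importance = [0.0] * num_agents
        for _ in reversed(range(len(history))):
            for i in range(num_agents):
                importance[i] += scores[i]     # agents accumulate their nodes' scores
            # Back-propagate: distribute the layer's total score evenly
            # over the previous layer (simplifying assumption).
            total = sum(scores)
            scores = [total / num_agents] * num_agents
        ranked = sorted(range(num_agents), key=lambda i: importance[i], reverse=True)
        return ranked[:k]                      # indices of the top-k agents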
Experimental Results
Model = GPT-4
DyLAN outperforms strong baselines:
(Charts: accuracy on WebShop, MMLU, and HumanEval)
Benchmarks: HumanEval (coding), WebShop (decision-making), MMLU (general reasoning), MATH (mathematical reasoning)
Ablations & Analysis
(Charts: performance improvement on MMLU; varying the number of agents after optimization; ablation w/o early stopping (es) or agent team reformation (atr))
Contributions & Discussion
Discussion
Contributions
MAGIS:
LLM-Based Multi-Agent Framework for GitHub Issue Resolution [5]
Wei Tao
Yucheng Zhou
Yanlin Wang
Wenqiang Zhang
Hongyu Zhang
Yu Cheng
Presented at NeurIPS 2024
MAS Application
Motivation: solving real-world GitHub issues is a challenging problem for LLMs
Why: LLMs struggle to locate and modify the right code in large repositories
Question: Can LLMs help solve issues better at scale and at pass@1?
→ e.g., LLMs solve only 2–4% of issues without agentic behaviors
Empirical Finding on LLM Failures
Why do LLMs fail on GitHub issues?
(RQ1) Why is performance on GitHub issues limited?
GPT-4 resolves ~2% of issues on SWE-Bench, compared to 67% on HumanEval (function-level code generation)
Logistic regression of issue resolution on coverage ratio (coefficients; * = statistically significant):
Model | # Files Corr. | # Functions Corr.
GPT-4 | -25.15* | -25.15*
MAGIS | -1.55* | -1.55*
(Claude-2 shows a statistically significant positive relation)
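An illustrative sketch of this style of analysis, on hypothetical data (not the paper's): fit a logistic regression of resolution success on complexity features and inspect the sign of the coefficients:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data: [num_files, num_functions] touched per issue; 1 = resolved.
    X = np.array([[1, 2], [1, 1], [3, 5], [5, 9], [2, 3], [4, 7]])
    y = np.array([1, 1, 0, 0, 1, 0])

    model = LogisticRegression().fit(X, y)
    # Negative coefficients indicate that touching more files/functions
    # lowers the odds of the issue being resolved.
    print(model.coef_)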
MAGIS Framework
MAGIS: a collaborative multi-agent framework
Introduces 4 agent roles, mirroring real software teams: Manager, Repository Custodian, Developer, and Quality Assurance Engineer
Workflow: Planning → Coding → Review → Merge (sketch below)
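A minimal sketch of the workflow; the helper names (`llm.act`, `repo.apply`) are illustrative stand-ins, not the paper's API:

    def resolve_issue(issue, repo, llm):
        # Planning: the Repository Custodian locates relevant files and the
        # Manager decomposes the issue into file-level tasks.
        files = llm.act("Repository Custodian", f"Locate files for: {issue}", repo)
        plan = llm.act("Manager", f"Decompose into file-level tasks: {issue}", files)
        # Coding: a Developer agent drafts a change for each task.
        patches = [llm.act("Developer", task, repo) for task in plan]
        # Review: the QA Engineer approves or rejects each change.
        approved = [p for p in patches if llm.act("QA Engineer", "Review", p) == "ok"]
        # Merge: apply the approved changes to the repository.
        return repo.apply(approved)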
Experiments
MAGIS Significantly Improves Issue Resolution
Evaluation: SWE-Bench comprises 2294 real issues
(RQ2) Framework Effectiveness
Ablations
(RQ3) Planning Effectiveness
(RQ4) Coding Effectiveness
Contributions & Discussion
Discussion
Takeaways
Coevolving with the Other You:
Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning [6]
Hao Ma
Tianyi Hu
Zhiqiang Pu
Boyin Liu
Xiaolin Ai
Yanyan Liang
Min Chen
Presented at NeurIPS 2024
Motivation
Why is RL fine-tuning hard for LLMs?
This motivates the need for a stable and adaptive training paradigm → new training methods
Background: Proximal Policy Optimization (PPO)
Definition: an RL algorithm used to train LLMs (typically on human feedback, i.e., RLHF)
Intuition: avoid overly large policy updates while relying only on first-order optimization
How it works
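For reference, PPO's clipped surrogate objective (the textbook formulation, not specific to this paper):

\[ L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\big( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \big) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \]

The clip keeps the importance ratio near 1, so a single update cannot move the policy too far. In RLHF the reward additionally penalizes divergence from the initial model: \( r = r_{\mathrm{task}} - \beta\, \mathrm{KL}(\pi_\theta \,\|\, \pi_0) \).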
Key Contributions
The CORY Framework
Cooperative Multi-Agent RL for LLMs: CORY
Principles
(Figure: the LLM is duplicated into a pioneer and an observer; the observer's input is the task query + pioneer's response)
Mechanisms in CORY
Role Exchange
Collective Reward
Knowledge Transfer
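A minimal sketch of one CORY training iteration, tying the three mechanisms together; the helper names (`generate`, `task_reward`, `ppo_update`) are assumed stand-ins:

    def cory_step(pioneer, observer, queries, step, exchange_every=100):
        for query in queries:
            # Knowledge transfer: the observer conditions on the pioneer's answer.
            y1 = pioneer.generate(query)
            y2 = observer.generate(query + "\nReference response: " + y1)
            # Collective reward: both agents are optimized on the summed reward,
            # coupling their objectives.
            r_total = task_reward(query, y1) + task_reward(query, y2)
            pioneer.ppo_update(query, y1, r_total)
            observer.ppo_update(query, y2, r_total)
        # Role exchange: periodically swap roles so both agents learn to answer
        # without relying on a reference response.
        if step % exchange_every == 0:
            pioneer, observer = observer, pioneer
        return pioneer, observer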
CORY as a multi-objective RL
Tradeoff in traditional single-agent RL → Multi-objective RL
Pareto Frontier Perspective:
What is the hypothesis behind CORY surpassing single-agent RL?
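Making the tradeoff explicit (standard RLHF notation; this vector-valued framing paraphrases the paper's perspective rather than quoting its exact formulation):

\[ \max_\theta\ \Big( \mathbb{E}_{x,\, y \sim \pi_\theta}\!\left[ r(x, y) \right],\ \ -\,\mathrm{KL}\!\left( \pi_\theta \,\|\, \pi_0 \right) \Big) \]

Single-agent RLHF scalarizes the two objectives with a fixed coefficient, \( \max_\theta \mathbb{E}[r(x,y)] - \beta\,\mathrm{KL}(\pi_\theta \,\|\, \pi_0) \). The hypothesis: the two cooperating agents reach points closer to the Pareto frontier of (task reward, −KL) than a single agent can with any fixed β.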
Experiments (1/2): Objective Task
Task: GSM8K
Model: Llama-7B-chat
Metrics: task reward, KL divergence, combined reward, pass@k
Results
Takeaway: CORY generalizes better
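For reference, the pass@k metric listed above is usually computed with the unbiased estimator from the HumanEval methodology: generate \( n \ge k \) samples per problem, count the \( c \) correct ones, and average

\[ \mathrm{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right] \]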
Experiments (2/2): Subjective Task
Task: IMDB sentiment completion (GPT2-Large)
Metrics: Task reward, KL divergence, Combined reward
Results
Ablation study
Ablations (IMDB reviews):
Contributions & Discussion
Discussion
Contributions
About my PhD
PhD Research so far
Understanding LLMs: investigate the capabilities of Large Language Models. How do they gain them? Why do they work? How can we improve them? (Focus on reasoning and robustness)
Agent Communication and Cooperation: in the future, LLMs will accomplish tasks by communicating with other LLMs. How robust and reliable is this communication? How can we enable better cooperation?
AI Applications: LLMs are a paradigm shift that enables a wide range of new applications. What are these applications?
Massive Potential for Automated Society
https://princeton-nlp.github.io/language-agent-impact/
Step 1 – Automate Repetitive Digital Work: Agents can learn routine tasks but still lack human-level reliability.
Step 2 – Collaborate with Humans: Success in hybrid tasks needs strong communication and social skills.
Step 3 – Explore Creatively: Advanced tasks require self-driven exploration and innovation.
Roadmap
Spring ’25: MAE; AI4Science project (generating better synthesis recipes)
Summer ’25: summer internship @ Morgan Stanley
Fall ’25: Multi-Agent Learning; research visit @ KAUST
Winter ’26: Multi-Agent Systems Creation
Spring ’26: project on a MAS application
Summer ’26: last internship (TBD)
Throughout the PhD: theoretical approaches
Thank you
Q & A
References
[1] Du, Yilun, et al. "Improving factuality and reasoning in language models through multiagent debate." Forty-first International Conference on Machine Learning. 2023.
[2] Su, Yu, et al. "Language Agents: Foundations, Prospects, and Risks." Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. 2024.
[3] Cemri, Mert, et al. "Why do multi-agent LLM systems fail?" arXiv preprint arXiv:2503.13657 (2025).
[4] Liu, Zijun, et al. "A dynamic LLM-powered agent network for task-oriented agent collaboration." First Conference on Language Modeling (COLM). 2024.
[5] Tao, Wei, et al. "MAGIS: LLM-based multi-agent framework for GitHub issue resolution." Advances in Neural Information Processing Systems 37 (2024): 51963-51993.
[6] Ma, Hao, et al. "Coevolving with the other you: Fine-tuning LLM with sequential cooperative multi-agent reinforcement learning." Advances in Neural Information Processing Systems 37 (2024): 15497-15525.