LLM-based Agents and Evaluation
Angana Borah
2nd year CSE PhD candidate, University of Michigan
Advised by: Dr. Rada Mihalcea
Bio
Overview
Intro to LLM-based agents
Definition of an Agent
Philosophical definition: an “agent” possesses desires, beliefs, intentions, and the ability to act, i.e., individual autonomy.
“Survival of the fittest”: to survive in an external environment, an individual must adapt to its surroundings efficiently.
Definition of an AI Agent
An AI agent perceives its surroundings, makes decisions, and responds with actions.
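The perceive, decide, act cycle above can be sketched as a small loop. This is a minimal illustration only: `SimpleAgent`, `thermostat_policy`, and the dictionary-based environment are hypothetical names invented for this example, and the hand-written policy stands in for whatever model (for instance an LLM call) would make the decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SimpleAgent:
    """Toy agent: perceive the surroundings, make a decision, respond with an action."""
    policy: Callable[[str], str]  # maps an observation to an action
    history: List[Tuple[str, str]] = field(default_factory=list)

    def perceive(self, environment: dict) -> str:
        # Read the part of the surroundings the agent can observe.
        return environment["observation"]

    def decide(self, observation: str) -> str:
        # Choose an action; an LLM call could replace this policy.
        return self.policy(observation)

    def act(self, environment: dict, action: str) -> None:
        # Respond with the action and record the step.
        environment["last_action"] = action
        self.history.append((environment["observation"], action))

    def step(self, environment: dict) -> str:
        observation = self.perceive(environment)
        action = self.decide(observation)
        self.act(environment, action)
        return action


def thermostat_policy(observation: str) -> str:
    # Hypothetical rule used only for illustration.
    return "turn_on_heater" if observation == "cold" else "do_nothing"


agent = SimpleAgent(policy=thermostat_policy)
env = {"observation": "cold"}
print(agent.step(env))  # prints "turn_on_heater"
```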
Traditional AI agents vs LLM-based agents
Aspect | Traditional AI Agents | LLM-based Agents
Knowledge source | Pre-programmed or domain-specific | Learned from diverse, large-scale data
Reasoning | Symbolic and rule-based | Probabilistic and implicit
Adaptability | Limited to pre-defined behaviors | Dynamic, adaptable to varied tasks
Interaction | Reactive (mostly mechanical) | Context-aware, human-like
Environment | Physical or simulated spaces | Textual or conversational (extensible to physical settings or tool use)
LLM Agent
Why are LLMs suitable as agents?
Autonomy
Reactivity
Proactiveness
Social Ability
LLM Agent
General framework - three key components:
LLM Agents Evolution
LLM-based Agent Applications
Single Agent
Single Agent - Task oriented
Single Agent - Innovation oriented
Single Agent - Life Cycle oriented
Multi Agent
Multi Agent - Cooperative Engagement
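As a rough sketch of cooperative engagement, agents can take turns improving a shared draft. The `cooperate` function and the `writer` and `reviewer` stubs below are hypothetical stand-ins for LLM-backed agents, and the ordered hand-off is only one assumed way to organize cooperation.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # takes the current draft, returns an updated draft


def cooperate(task: str, agents: List[Agent], rounds: int = 1) -> str:
    """Pass a shared draft through each agent in turn."""
    draft = task
    for _ in range(rounds):
        for agent in agents:
            draft = agent(draft)
    return draft


# Hypothetical stand-ins for LLM-backed agents.
def writer(draft: str) -> str:
    return draft + " -> first draft"


def reviewer(draft: str) -> str:
    return draft + " -> reviewed"


result = cooperate("caption the image", [writer, reviewer])
print(result)  # prints "caption the image -> first draft -> reviewed"
```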
Example - Cooperative Engagement for Culture-Aware Image Captioning
Multi Agent - Adversarial Interaction
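An adversarial interaction can be sketched as a debate in which two agents alternate turns on the same claim and each sees the arguments made so far. The `debate` function and both debater stubs are hypothetical; in practice each stub would be an LLM call, and a judge step (not shown) would decide the outcome.

```python
from typing import Callable, List, Tuple

# A debater sees the claim and all earlier arguments, and returns a new argument.
Debater = Callable[[str, List[str]], str]


def debate(claim: str, proposer: Debater, opponent: Debater,
           rounds: int = 2) -> List[Tuple[str, str]]:
    """Alternate proposer and opponent turns; return the (speaker, message) transcript."""
    transcript: List[Tuple[str, str]] = []
    arguments: List[str] = []
    for _ in range(rounds):
        for name, agent in (("proposer", proposer), ("opponent", opponent)):
            message = agent(claim, arguments)
            arguments.append(message)
            transcript.append((name, message))
    return transcript


# Hypothetical stand-ins for LLM-backed debaters.
def supporting_agent(claim: str, history: List[str]) -> str:
    return f"The claim '{claim}' is supported (turn {len(history) + 1})."


def opposing_agent(claim: str, history: List[str]) -> str:
    return f"The claim '{claim}' lacks evidence (turn {len(history) + 1})."


transcript = debate("the headline is accurate", supporting_agent, opposing_agent)
for speaker, message in transcript:
    print(f"{speaker}: {message}")
```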
Example - Adversarial Interaction to reduce the spread of Misinformation
Motivation
Hypothesis
Multi Agent - Challenges
Computational overhead
Limited context in LLMs
Incorrect consensus
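To make the limited-context challenge concrete, here is a minimal sketch of one common workaround: keeping only the most recent messages that fit a budget. The `trim_history` helper is hypothetical, and counting whitespace-separated words is a crude assumption standing in for a real tokenizer.

```python
from typing import List


def trim_history(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):
        cost = len(message.split())  # crude proxy: whitespace-separated words
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "agent A: hello there",
    "agent B: hi",
    "agent A: let us plan the task",
]
print(trim_history(history, max_tokens=10))
# prints ['agent B: hi', 'agent A: let us plan the task']
```

Dropping older turns keeps the prompt within the budget but loses earlier context, which is part of why long multi-agent conversations are hard.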
Human-Agent Interaction
Evaluation of LLM-based Agents
What are the important dimensions to consider for evaluation?
Four dimensions
Utility
Sociability
Values
Continually evolving
Utility
AgentBench: aggregates challenges from diverse real-world scenarios into a systematic benchmark that assesses LLMs’ task-completion capabilities
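One simple way to summarize task completion across scenarios is a per-environment success rate. The sketch below is an assumption made for illustration, not AgentBench's actual scoring code or API; the environment names and outcomes are made-up placeholders, not benchmark results.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def success_rates(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of successful episodes for each environment."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for environment, succeeded in results:
        totals[environment] += 1
        successes[environment] += int(succeeded)
    return {env: successes[env] / totals[env] for env in totals}


# Placeholder episode outcomes (environment name, task completed?).
episodes = [
    ("web_shopping", True),
    ("web_shopping", False),
    ("database", True),
    ("database", True),
]
print(success_rates(episodes))  # prints {'web_shopping': 0.5, 'database': 1.0}
```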
Sociability
Values
Example - Evaluating Multi Agent LLMs for Implicit Bias
Motivation:
Implicit Biases:
Multi-agent LLM interactions:
Examples of implicit biases
Continually evolving
Resources
Frameworks and Libraries
Ready-to-use Agents
NLP Tools
Thank you!