1 of 40

LLM-based Agents and Evaluation

Angana Borah

2nd year CSE PhD candidate, University of Michigan

Advised by: Dr. Rada Mihalcea

2 of 40

Bio

  • 2nd year PhD candidate
    • advised by Dr. Rada Mihalcea
    • University of Michigan Ann Arbor, USA
  • Research Interests:
    • Understanding LLM behavior
      • Taking inspiration from existing cognitive science and social psychology theories
    • Societal Implications of LLMs
      • Analyzing societal issues such as bias and misinformation in LLMs, and potential mitigation techniques
    • Agent LLMs (LLM-LLM and Human-LLM interaction)
    • Evaluation of NLP methods

3 of 40

Overview

  • Intro to LLM-based agents
  • Applications of LLM-based agents
  • LLM Agents Evaluation
  • Demo on AI Agents

4 of 40

Intro to LLM-based agents

5 of 40

Definition of an Agent

Philosophical definition: an “agent” is an entity that possesses desires, beliefs, intentions, and the ability to act, i.e., individual autonomy.

“Survival of the Fittest”: to survive in an external environment, an individual must adapt to its surroundings efficiently.

6 of 40

Definition of an AI Agent

  • An AI agent is the concretization of the philosophical concept of an agent in the context of AI.
  • AI agents are artificial entities capable of perceiving their surroundings, making decisions, and taking actions in response.

[Diagram: agent loop - perceive surroundings, make decisions, respond with actions]
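The perceive, decide, act cycle can be sketched in a few lines of Python. This is a minimal illustration, not an LLM agent: the `Thermostat` class, the `room` dictionary, and the temperature thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Hypothetical toy agent that keeps a room near a target temperature."""
    target: float = 21.0

    def perceive(self, surroundings: dict) -> float:
        # Read the one observable this agent cares about.
        return surroundings["temperature"]

    def decide(self, temperature: float) -> str:
        # Map the percept to an action.
        if temperature < self.target - 0.5:
            return "heat"
        if temperature > self.target + 0.5:
            return "cool"
        return "idle"


def act(surroundings: dict, action: str) -> None:
    # The action changes the surroundings, which closes the loop.
    delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
    surroundings["temperature"] += delta


def run(agent: Thermostat, surroundings: dict, steps: int) -> list[str]:
    history = []
    for _ in range(steps):
        percept = agent.perceive(surroundings)
        action = agent.decide(percept)
        act(surroundings, action)
        history.append(action)
    return history


room = {"temperature": 18.0}
actions = run(Thermostat(), room, steps=5)
print(actions, room["temperature"])
```

An LLM-based agent keeps the same loop but replaces the hand-written `decide` rule with a model call.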

7 of 40

Traditional AI agents vs LLM-based agents

| Aspect | Traditional AI Agents | LLM-based Agents |
|---|---|---|
| Knowledge-source | Pre-programmed or domain-specific | Learned from diverse, large-scale data |
| Reasoning | Symbolic and Rule-based | Probabilistic and Implicit |
| Adaptability | Limited to pre-defined behaviors | Dynamic, adaptable to varied tasks |
| Interaction | Reactive (mostly mechanical) | Context-aware, human-like |
| Environment | Physical or simulated spaces | Textual or conversational (can be extended to physical or using tools) |

8 of 40

LLM Agent

Why are LLMs suitable as agents?

  • Autonomy
  • Reactivity
  • Proactiveness
  • Social Ability

9 of 40

LLM Agent

Why are LLMs suitable as agents?

  • Autonomy:
    • Generate human-like text
    • Engage in conversations
    • Perform tasks without step-by-step instructions

10 of 40

LLM Agent

Why are LLMs suitable as agents?

  • Reactivity:
    • Respond to changing requests through text
    • Expand the perceptual space - using multimodal fusion techniques.
    • Expand action space - embodiment and tools

11 of 40

LLM Agent

Why are LLMs suitable as agents?

  • Proactiveness:
    • Goal-oriented action by taking initiative
    • Reasoning abilities
    • Goal reformulation

12 of 40

LLM Agent

Why are LLMs suitable as agents?

  • Social Ability: LLM agents can interact with other agents

13 of 40

LLM Agent

General framework - three key components:

  • Brain - central controller - memorizing, thinking, decision making
  • Perception - interpreting and analyzing sensory inputs from external environment
  • Action - executes tasks using text output, appropriate tools and embodied action
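The three components above can be wired together roughly as follows. This is a minimal sketch under stated assumptions: `stub_llm` stands in for a real model call, the `TOOL`/`SAY` output convention and the `calculator` tool are invented for the example, and embodied action is not covered.

```python
from typing import Callable


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: no model is queried here)."""
    if "2 + 3" in prompt:
        return "TOOL calculator 2 + 3"
    return "SAY I am not sure."


def calculator(expression: str) -> str:
    """Toy tool: handles only 'a + b' with integer operands."""
    left, op, right = expression.split()
    if op != "+":
        raise ValueError(f"unsupported operator: {op}")
    return str(int(left) + int(right))


class Perception:
    def observe(self, raw_input: str) -> str:
        # Turn raw input into clean text the brain can consume.
        return raw_input.strip()


class Brain:
    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.memory: list[str] = []

    def think(self, observation: str) -> str:
        # Memorize the observation, then decide using everything seen so far.
        self.memory.append(observation)
        return self.llm("\n".join(self.memory))


class Action:
    def __init__(self, tools: dict[str, Callable[[str], str]]) -> None:
        self.tools = tools

    def execute(self, decision: str) -> str:
        # "TOOL <name> <argument>" calls a tool; anything else is plain text output.
        kind, _, rest = decision.partition(" ")
        if kind == "TOOL":
            tool_name, _, argument = rest.partition(" ")
            return self.tools[tool_name](argument)
        return rest


class Agent:
    def __init__(self) -> None:
        self.perception = Perception()
        self.brain = Brain(stub_llm)
        self.action = Action({"calculator": calculator})

    def run(self, raw_input: str) -> str:
        observation = self.perception.observe(raw_input)
        decision = self.brain.think(observation)
        return self.action.execute(decision)


answer = Agent().run("  What is 2 + 3?  ")
print(answer)
```

Keeping the three parts separate means the model, the input handling, and the tool set can each be swapped without touching the others.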

14 of 40

LLM Agents Evolution

15 of 40

LLM-based Agent Applications

16 of 40

Single Agent

17 of 40

Single Agent - Task oriented

  • Web scenario
    • Web navigation problem, web tasks such as filling out forms, online shopping and sending emails
    • E.g., Mind2Web, WebGUM
  • Life scenario
    • For daily household chores - understanding implicit instructions and applying knowledge
    • Applying world knowledge embedded in training data for real world interaction.

18 of 40

Single Agent - Innovation oriented

  • Intellectually demanding fields such as cutting-edge science
  • Two main limitations
    • Inherent complexity (domain specificity)
    • Lack of suitable training data in scientific domains (tools can help!)

19 of 40

Single Agent - Life Cycle oriented

  • Agents that can continuously explore, develop new skills, and maintain a long-term life cycle in an open, unknown world
  • E.g., Voyager (first LLM-based embodied lifelong learning agent in Minecraft, based on the long-term goal of “discovering as many diverse things as possible”)

20 of 40

Multi Agent

  • Single Agent - isolated entities
  • Society of Mind (Marvin Minsky): Theory of Intelligence - “Intelligence emerges from the interactions of many smaller agents with specific functions.”

21 of 40

Multi Agent

  • Two types (typically): (1) Cooperative Interaction for Complementarity and (2) Adversarial Interaction for Advancement

22 of 40

Multi Agent - Cooperative Engagement

  • Synergistic Complementarity
  • Ordered: agents adhere to specific rules and interact in a sequential manner. E.g., CAMEL (dual-agent role-playing system)
  • Disordered: each agent is free to express its perspectives and opinions openly, with no control and no fixed sequence. E.g., ChatLLM network (a neural network in which agents act as the nodes)
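The difference between ordered and disordered cooperation comes down to who speaks next. The sketch below is illustrative only: the agents are stubs rather than LLM calls, the role names are hypothetical, and none of this is the CAMEL or ChatLLM network API.

```python
import itertools
import random
from typing import Callable, Iterator

Agent = Callable[[list[str]], str]


def make_agent(name: str) -> Agent:
    # Stub agent: a real one would send the transcript to an LLM (assumption).
    def reply(transcript: list[str]) -> str:
        return f"{name}: message {len(transcript) + 1}"
    return reply


def ordered(names: list[str]) -> Iterator[str]:
    # Ordered cooperation: agents speak in a fixed, repeating sequence.
    return itertools.cycle(names)


def disordered(names: list[str], seed: int = 0) -> Iterator[str]:
    # Disordered cooperation: any agent may speak at any turn.
    rng = random.Random(seed)
    while True:
        yield rng.choice(names)


def converse(agents: dict[str, Agent], schedule: Iterator[str], turns: int) -> list[str]:
    transcript: list[str] = []
    for name in itertools.islice(schedule, turns):
        transcript.append(agents[name](transcript))
    return transcript


names = ["user_role", "assistant_role"]
agents = {name: make_agent(name) for name in names}

ordered_log = converse(agents, ordered(names), turns=4)
disordered_log = converse(agents, disordered(names), turns=4)
print(ordered_log)
print(disordered_log)
```

Only the schedule changes between the two modes; the agents and the shared transcript stay the same.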

23 of 40

Example - Cooperative Engagement for Culture-Aware Image Captioning

24 of 40

Multi Agent - Adversarial Interaction

  • Motivation: fostering change among agents can naturally occur through competition, argumentation, and debate
  • Competitive environment
  • Agents dynamically change their strategies
  • Agents choose the most advantageous or rational actions in response to other agents
  • E.g., AlphaGo Zero

25 of 40

Example - Adversarial Interaction to reduce the spread of Misinformation

Motivation

  • The perception of misinformation and its spread may vary across demographic groups (echo chambers).
  • Simulating demographic LLMs to understand these effects.

26 of 40

Example - Adversarial Interaction to reduce the spread of Misinformation

Hypothesis

  • Homogeneous groups increase the spread of misinformation

27 of 40

Multi Agent - Challenges

  • Computational overhead
  • Limited context in LLMs
  • Incorrect consensus

28 of 40

Human-Agent Interaction

29 of 40

Evaluation of LLM-based Agents

What are the important dimensions to consider for evaluation?

30 of 40

Evaluation of LLM-based Agents

Four dimensions

  • Utility
  • Sociability
  • Values
  • Continually evolving

31 of 40

Evaluation of LLM-based Agents

  • Success rates of task completion
  • Outcomes on various foundational capabilities
  • AgentBench: aggregates challenges from diverse real-world scenarios and introduces a systematic benchmark to assess LLMs' task-completion capabilities

Utility
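Task-completion success rate is the simplest utility metric to compute. The sketch below uses a hypothetical episode log with made-up categories and outcomes. It is not AgentBench's scoring procedure, only the basic arithmetic behind a per-category and overall success rate.

```python
from collections import defaultdict

# Hypothetical episode log: (task category, whether the agent succeeded).
episodes = [
    ("web", True), ("web", False), ("web", True),
    ("household", True), ("household", False),
]


def success_rates(log: list[tuple[str, bool]]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for category, succeeded in log:
        totals[category] += 1
        wins[category] += int(succeeded)
    # Success rate = successful episodes / all episodes, per category.
    return {category: wins[category] / totals[category] for category in totals}


rates = success_rates(episodes)
overall = sum(succeeded for _, succeeded in episodes) / len(episodes)
print(rates, overall)
```

Reporting per-category rates alongside the overall rate keeps a strong category from hiding a weak one.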

32 of 40

Evaluation of LLM-based Agents

AgentBench: aggregates challenges from diverse real-world scenarios and introduces a systematic benchmark to assess LLMs' task-completion capabilities

Utility

33 of 40

Evaluation of LLM-based Agents

  • Language communication proficiency
  • Cooperation and negotiation abilities
  • Role-playing capability

Sociability

34 of 40

Evaluation of LLM-based Agents

  • 3Hs: Harmlessness, Helpfulness and Honesty
  • Align with human societal values
  • Capable of adapting to specific demographics, cultures and contexts.

Values

35 of 40

Example - Evaluating Multi Agent LLMs for Implicit Bias

Motivation:

Implicit Biases:

  • Under-researched; most studies focus on explicit biases in text generation.
  • Implicit biases often emerge in actions or tasks, not in text outputs.

Multi-agent LLM interactions:

  • Used to simulate societal dynamics and solve coordination tasks using collective intelligence.
  • These systems often exhibit emergent social behaviors.
  • Multi-agent interactions reveal implicit biases through real-time actions, not just statements.

36 of 40

Example - Evaluating Multi Agent LLMs for Implicit Bias

Examples of implicit biases

  • Associating certain genders with certain occupations
  • Men are often associated with leadership, technical, and physically strenuous roles
  • Women are often associated with organizational, creative, and family-oriented roles
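One way to quantify this kind of bias in agent actions is to count how often task assignments match the stereotypes listed above. This is a toy sketch and not the evaluation method used in the work presented here: the role categories, the `assignments` log, and the `stereotype_rate` function are all hypothetical.

```python
# Stereotype map built from the examples above; illustrative only.
STEREOTYPED = {
    "male": {"leadership", "technical", "physical"},
    "female": {"organizational", "creative", "family"},
}

# Hypothetical log of assignments made during a multi-agent interaction:
# (persona gender, category of the role that persona was assigned).
assignments = [
    ("male", "leadership"), ("female", "organizational"),
    ("female", "technical"), ("male", "creative"),
    ("male", "technical"), ("female", "family"),
]


def stereotype_rate(log: list[tuple[str, str]]) -> float:
    # Fraction of assignments where the role matches the persona's stereotype.
    matches = sum(role in STEREOTYPED[gender] for gender, role in log)
    return matches / len(log)


rate = stereotype_rate(assignments)
print(f"{rate:.2f}")
```

A higher rate means the agents' assignments line up with the stereotypes more often. Because the metric looks at what the agents do rather than what they say, it targets implicit rather than explicit bias.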

37 of 40

Example - Evaluating Multi Agent LLMs for Implicit Bias

38 of 40

Evaluation of LLM-based Agents

  • As an agent evolves autonomously over time, the resources and continuous human intervention it requires could be reduced.
  • However, it is important to consider how much autonomy we give an agent, since this raises ethical and social concerns.

Continually evolving

39 of 40

Resources

Frameworks and Libraries

Ready-to-use Agents

NLP Tools

40 of 40

Thank you!