1 of 40

LLM-based Agents and Evaluation

Angana Borah

2nd year CSE PhD candidate, University of Michigan

Advised by: Dr. Rada Mihalcea

2 of 40

Bio

2nd year PhD candidate, advised by Dr. Rada Mihalcea
University of Michigan, Ann Arbor, USA

Research Interests:

Understanding LLM behavior

Taking inspiration from existing cognitive science and social psychology theories

Societal Implications of LLMs

Analyzing societal issues such as bias and misinformation in LLMs, and potential mitigation techniques

Agent LLMs (LLM-LLM and Human-LLM interaction)
Evaluation of NLP methods

3 of 40

Overview

Intro to LLM-based agents
Applications of LLM-based agents
LLM Agents Evaluation
Demo on AI Agents

4 of 40

Intro to LLM-based agents

5 of 40

Definition of an Agent

Philosophical definition - an “agent” possesses desires, beliefs, intentions, and the ability to act, i.e., individual autonomy.

“Survival of the Fittest”: to survive in an external environment, an individual must adapt to its surroundings efficiently.

6 of 40

Definition of an AI Agent

An AI agent - concretization of the philosophical concept of an agent in the context of AI.
AI agents are artificial entities capable of perceiving surroundings, making decisions and taking actions in response.

Diagram: Surroundings → Perceive → Make decisions → Respond with → Actions
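The perceive, decide, act cycle can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the talk: the `Thermostat` agent, its target temperature, and the `run` helper are all hypothetical names chosen for the example, and a simple rule stands in for the decision-making step.

```python
from dataclasses import dataclass, field


@dataclass
class Thermostat:
    """Toy agent (hypothetical): keeps a room near a target temperature."""
    target: float = 21.0
    log: list = field(default_factory=list)

    def perceive(self, surroundings: dict) -> float:
        # Read the only signal this agent cares about.
        return surroundings["temperature"]

    def decide(self, temperature: float) -> str:
        # A simple rule stands in for the decision-making step.
        if temperature < self.target - 1:
            return "heat"
        if temperature > self.target + 1:
            return "cool"
        return "idle"

    def act(self, action: str, surroundings: dict) -> None:
        # The action changes the surroundings, which closes the loop.
        delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        surroundings["temperature"] += delta
        self.log.append(action)


def run(agent, surroundings, steps):
    # One iteration = perceive, then decide, then act.
    for _ in range(steps):
        observation = agent.perceive(surroundings)
        action = agent.decide(observation)
        agent.act(action, surroundings)
    return surroundings


room = {"temperature": 17.0}
agent = Thermostat()
run(agent, room, steps=5)
```

After five steps the agent has heated the room into its tolerance band and then stays idle; `agent.log` records the sequence of actions taken.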

7 of 40

Traditional AI agents vs LLM-based agents

Aspect	Traditional AI Agents	LLM-based Agents
Knowledge-source	Pre-programmed or domain-specific	Learned from diverse, large-scale data
Reasoning	Symbolic and Rule-based	Probabilistic and Implicit
Adaptability	Limited to pre-defined behaviors	Dynamic, adaptable to varied tasks
Interaction	Reactive (mostly mechanical)	Context-aware, human-like
Environment	Physical or simulated spaces	Textual or conversational (can be extended to physical settings or to tool use)

8 of 40

LLM Agent

Why are LLMs suitable as agents?

Autonomy

Reactivity

Proactiveness

Social Ability

9 of 40

LLM Agent

Why are LLMs suitable as agents?

Autonomy:

Generate human-like text
Engage in conversations
Perform tasks without step-by-step instructions

10 of 40

LLM Agent

Why are LLMs suitable as agents?

Reactivity:

Respond to changing requests through text
Expand the perceptual space - using multimodal fusion techniques.
Expand action space - embodiment and tools
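Expanding the action space with tools can be sketched as a small dispatch loop. This is an illustrative sketch under stated assumptions: `fake_llm` is a hypothetical stand-in for a real model call, and the tool names and the JSON request shape are invented for the example rather than taken from any specific API.

```python
import json


def add(a: float, b: float) -> float:
    return a + b


def word_count(text: str) -> int:
    return len(text.split())


# Registry of tools the agent may call (names are illustrative).
TOOLS = {"add": add, "word_count": word_count}


def fake_llm(request: str) -> str:
    """Hypothetical stand-in for a model call: returns a JSON tool request."""
    if "sum" in request:
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return json.dumps({"tool": "word_count", "args": {"text": request}})


def run_tool_call(request: str):
    # Parse the model's request, look up the tool, and execute it.
    call = json.loads(fake_llm(request))
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    return tool(**call["args"])


sum_result = run_tool_call("sum of 2 and 3")
count_result = run_tool_call("how many words here")
```

Keeping the tools in an explicit registry means the agent can only trigger actions that were deliberately exposed to it.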

11 of 40

LLM Agent

Why are LLMs suitable as agents?

Proactiveness:

Goal-oriented action by taking initiative
Reasoning abilities
Goal reformulation

12 of 40

LLM Agent

Why are LLMs suitable as agents?

Social Ability: LLM agents can interact with other agents

13 of 40

LLM Agent

General framework - three key components:

Brain - central controller - memorizing, thinking, decision making
Perception - interpreting and analyzing sensory inputs from external environment
Action - executes tasks using text output, appropriate tools and embodied action

The Rise and Potential of Large Language Model Based Agents: A Survey
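One way to picture the three components is as three small classes wired together. The sketch below is an assumption-laden illustration, not the survey's implementation: all class and method names are hypothetical, and the brain uses a trivial memory lookup where a real agent would query an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Perception:
    def interpret(self, raw: str) -> str:
        # Normalize raw input into something the brain can use.
        return raw.strip().lower()


@dataclass
class Brain:
    memory: list = field(default_factory=list)

    def decide(self, observation: str) -> str:
        # Memorize, then pick an action; a real agent would query an LLM here.
        seen_before = observation in self.memory
        self.memory.append(observation)
        return "skip" if seen_before else "reply"


@dataclass
class Action:
    def execute(self, decision: str, observation: str) -> str:
        # Text output only; tools or embodied actions would plug in here.
        if decision == "reply":
            return f"Handling: {observation}"
        return "Already handled."


@dataclass
class Agent:
    perception: Perception = field(default_factory=Perception)
    brain: Brain = field(default_factory=Brain)
    action: Action = field(default_factory=Action)

    def step(self, raw: str) -> str:
        observation = self.perception.interpret(raw)
        decision = self.brain.decide(observation)
        return self.action.execute(decision, observation)


agent = Agent()
first = agent.step("  Book a Flight ")
second = agent.step("book a flight")
```

The second call is recognized as a repeat because the perception step normalizes both inputs to the same observation before the brain consults its memory.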

14 of 40

LLM Agents Evolution

A Survey on Large Language Model based Autonomous Agents

15 of 40

LLM-based Agent Applications

The Rise and Potential of Large Language Model Based Agents: A Survey

16 of 40

Single Agent

The Rise and Potential of Large Language Model Based Agents: A Survey

17 of 40

Single Agent - Task oriented

Web scenario

Web navigation problem, web tasks such as filling out forms, online shopping and sending emails
E.g., Mind2Web, WebGUM

Life scenario

For daily household chores - understanding implicit instructions and applying knowledge
Applying world knowledge embedded in training data to real-world interaction.

18 of 40

Single Agent - Innovation oriented

Intellectually demanding fields such as cutting-edge science
Two main limitations

Inherent complexity (domain specificity)
Lack of suitable training data in scientific domains (tools can help!)

19 of 40

Single Agent - Life Cycle oriented

Agents that can continuously explore, develop new skills and maintain a long term life cycle in an open, unknown world
E.g., Voyager (first LLM-based embodied lifelong learning agent in Minecraft, based on the long-term goal of “discovering as many diverse things as possible”)

https://voyager.minedojo.org/

20 of 40

Multi Agent

Single Agent - isolated entities
Society of Mind (Marvin Minsky): Theory of Intelligence - “Intelligence emerges from the interactions of many smaller agents with specific functions.”

21 of 40

Multi Agent

Two types (typically): (1) Cooperative Interaction for Complementarity and (2) Adversarial Interaction for Advancement

22 of 40

Multi Agent - Cooperative Engagement

Synergistic Complementarity
Ordered: agents adhere to specific rules and act in a sequential manner. E.g., CAMEL (dual-agent role-playing system)
Disordered: each agent is free to express its perspectives and opinions openly, with no control and no fixed sequence. E.g., ChatLLM network (a neural network with an agent acting as each node)
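The difference between ordered and disordered cooperation can be sketched as two turn-taking schemes. This is a simplified illustration, not how CAMEL or ChatLLM network are implemented: the agent names are hypothetical, each "agent" is just a function, and random ordering is used as one possible stand-in for the absence of a fixed sequence.

```python
import random


def make_agent(name):
    # Each agent is a function from the transcript so far to a message.
    def speak(transcript):
        return f"{name} (turn {len(transcript) + 1})"
    return speak


def ordered_round(agents, transcript):
    # Ordered: agents speak in a fixed sequence.
    for agent in agents:
        transcript.append(agent(transcript))
    return transcript


def disordered_round(agents, transcript, rng):
    # Disordered: no fixed sequence; speaking order is arbitrary each round.
    for agent in rng.sample(agents, k=len(agents)):
        transcript.append(agent(transcript))
    return transcript


agents = [make_agent(n) for n in ("planner", "coder", "reviewer")]
ordered = ordered_round(agents, [])
disordered = disordered_round(agents, [], random.Random(0))
```

In both schemes every agent contributes once per round; only the ordered scheme guarantees who speaks when.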

23 of 40

Example - Cooperative Engagement for Culture-Aware Image Captioning

The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning

24 of 40

Multi Agent - Adversarial Interaction

Motivation: fostering change among agents can naturally occur through competition, argumentation, and debate
Competitive environment
Dynamically change strategies
Choose the most advantageous or rational actions in response to other agents.
E.g., AlphaGo Zero

25 of 40

Example - Adversarial Interaction to reduce the spread of Misinformation

Motivation

The perception of misinformation and its spread may vary across demographic groups - echo chambers.
Simulating demographic LLMs to understand these effects.

26 of 40

Example - Adversarial Interaction to reduce the spread of Misinformation

Hypothesis

Homogeneous groups increase the spread of misinformation

Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM Interactions

27 of 40

Multi Agent - Challenges

Computational overhead

Limited context in LLMs

Incorrect consensus

28 of 40

Human-Agent Interaction

The Rise and Potential of Large Language Model Based Agents: A Survey

29 of 40

Evaluation of LLM-based Agents

What are the important dimensions to consider for evaluation?

30 of 40

Evaluation of LLM-based Agents

Four dimensions

Utility

Sociability

Values

Continually evolving

31 of 40

Evaluation of LLM-based Agents

Utility

Success rates of task completion
Outcomes on various foundational capabilities
AgentBench: aggregates challenges from diverse real-world scenarios and introduces a systematic benchmark to assess LLMs’ task completion capabilities
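Task-completion success rate is simple to compute once each episode is labeled as succeeded or failed. The sketch below is a generic illustration, not AgentBench's scoring code: the `success_rates` helper, the task names, and the episode outcomes are all made up for the example.

```python
from collections import defaultdict


def success_rates(episodes):
    """episodes: iterable of (task_name, succeeded) pairs; must be non-empty."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for task, succeeded in episodes:
        totals[task] += 1
        wins[task] += int(succeeded)
    # Per-task rate, plus an overall rate pooled across all episodes.
    per_task = {task: wins[task] / totals[task] for task in totals}
    overall = sum(wins.values()) / sum(totals.values())
    return per_task, overall


# Made-up episode outcomes, for illustration only.
episodes = [
    ("web_shopping", True),
    ("web_shopping", False),
    ("form_filling", True),
    ("form_filling", True),
]
per_task, overall = success_rates(episodes)
```

Reporting per-task rates alongside the pooled rate helps show whether an agent is uniformly capable or strong only on some scenarios.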

32 of 40

Evaluation of LLM-based Agents

Utility

AgentBench: aggregates challenges from diverse real-world scenarios and introduces a systematic benchmark to assess LLMs’ task completion capabilities

33 of 40

Evaluation of LLM-based Agents

Sociability

Language communication proficiency
Cooperation and negotiation abilities
Role-playing capability

34 of 40

Evaluation of LLM-based Agents

Values

3Hs: Harmlessness, Helpfulness and Honesty
Align with human societal values
Capable of adapting to specific demographics, cultures and contexts.

35 of 40

Example - Evaluating Multi Agent LLMs for Implicit Bias

Motivation:

Implicit Biases:

Under-researched; most studies focus on explicit biases in text generation.
Implicit biases often emerge in actions or tasks, not in text outputs.

Multi-agent LLM interactions:

Used to simulate societal dynamics and solve coordination tasks using collective intelligence.
These systems often exhibit emergent social behaviors.
Multi-agent interactions reveal implicit biases through real-time actions, not just statements.

36 of 40

Example - Evaluating Multi Agent LLMs for Implicit Bias

Examples of implicit biases

Associating certain genders with certain occupations
Males are often associated with more leadership, technical and physically strenuous roles
Females are associated with organizational, creative and family-oriented roles
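One simple way to surface such associations in agent actions is to count which roles are assigned to which personas. The sketch below is only an illustration and not the metric used in the work presented here: the `assignment_rates` helper is hypothetical, and the log is an invented toy example, not real results.

```python
from collections import Counter


def assignment_rates(assignments):
    """assignments: list of (persona_gender, role) pairs from agent interactions."""
    by_role = Counter(role for _, role in assignments)
    by_pair = Counter(assignments)
    # For each role, the share of assignments that went to each gender.
    return {
        (gender, role): by_pair[(gender, role)] / by_role[role]
        for gender, role in by_pair
    }


# Invented toy log, NOT real results.
log = [
    ("male", "team lead"),
    ("male", "team lead"),
    ("female", "team lead"),
    ("female", "note taker"),
]
rates = assignment_rates(log)
```

A skew in these shares across many interactions would be a signal worth investigating, since the bias shows up in who gets assigned what rather than in any explicit statement.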

37 of 40

Example - Evaluating Multi Agent LLMs for Implicit Bias

38 of 40

Evaluation of LLM-based Agents

Continually evolving

As an agent autonomously evolves over time, the resources it requires and the need for continuous human intervention could be reduced.
However, it is important to consider how much autonomy we grant an agent, given the ethical and social concerns.

39 of 40

Resources

1. Frameworks and Libraries

LangChain: A powerful framework for building LLM-based applications with memory, agents, and chains.

Hugging Face Transformers: Widely used library for training and deploying transformer-based models.

LlamaIndex (formerly GPT Index): Framework for augmenting LLMs with long-term memory and external knowledge.

AgentGPT: A project to build autonomous AI agents capable of task planning and execution.

2. Ready-to-Use Agents

ChatGPT Plugins and OpenAI Function Calling: Prebuilt functionalities to develop agents capable of complex reasoning.

AutoGPT: An experimental open-source application for autonomous agents powered by GPT models.

BabyAGI: A minimalistic framework for building task-driven LLM agents.

Rasa: Open-source framework for building conversational agents (chatbots).

AllenNLP: NLP research library with prebuilt modules for various tasks, including agent interaction.

3. NLP Tools for Agent Building

Knowledge Integration

Haystack: Framework for building end-to-end NLP pipelines, including question-answering systems.

FAISS: Vector search library to add retrieval capabilities to agents.

Dialog and Interaction

DialoGPT: A conversational model built for generating engaging dialogues.

ConvLab-2: A toolkit for building, training, and evaluating dialogue systems.

40 of 40

Thank you!