1 of 74

Generative Agents

11.06.23

2 of 74

Setup - Reality show

Scientific Peer Reviewer

  • 25 LLMs (ChatGPT) and 1 central LLM
  • The criterion they optimize for is believability
  • Each agent is assigned an initial persona prompt along with a memory (database)
  • The memory records interactions with objects (observations) and other agents (conversations), plus reflections and plans

Moreover, emoticons are used to show the state of an agent

2

3 of 74

Observations

Scientific Peer Reviewer

The limit for what an agent can observe is based on a visibility threshold encoded in the sandbox environment

Observations include actions that

  • the agents themselves are performing
  • they notice other agents performing

3

4 of 74

Retrieval

Scientific Peer Reviewer

  • Recency - favors memories that were accessed recently; the score decays exponentially with a factor of 0.995
  • Relevance - cosine similarity between the memory and the current event
  • Importance - the central LLM rates how important each observation is
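
The three retrieval signals above can be combined into one ranking score. The following is a minimal sketch, not the authors' implementation: the equal weighting, the min-max normalization, and every name in it (`rank_memories`, `hours_since_access`, the toy 2-D embeddings) are assumptions made for illustration. Only the 0.995 decay factor and the use of cosine similarity come from the slide.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(values):
    """Rescale values to [0, 1]; returns all zeros when every value is identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_memories(memories, query_embedding, decay=0.995):
    """Return (text, score) pairs sorted from highest to lowest retrieval score.

    Each memory is a dict with hypothetical keys:
    'text', 'hours_since_access', 'embedding', 'importance'.
    """
    recency = [decay ** m["hours_since_access"] for m in memories]
    relevance = [cosine_similarity(m["embedding"], query_embedding) for m in memories]
    importance = [m["importance"] for m in memories]
    # Assumption: the three normalized signals are weighted equally.
    scores = [
        sum(parts)
        for parts in zip(min_max(recency), min_max(relevance), min_max(importance))
    ]
    order = sorted(range(len(memories)), key=lambda i: scores[i], reverse=True)
    return [(memories[i]["text"], scores[i]) for i in order]
```

Calling `rank_memories(memories, query_embedding)` returns the memories most worth placing in the prompt first; a caller would typically keep only the top few.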

4

5 of 74

Reflections

Scientific Peer Reviewer

  • When? - once the cumulative importance of recent events exceeds a threshold
  • How? - the central LLM is passed the agent's memory and asked to generate high-level questions

[Reflection Tag] Then the agent answers these questions using retrieval on their memory
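
The "When?" condition above can be sketched as a running sum of importance scores. This is an illustrative sketch only: the class name `ReflectionTrigger`, the choice to reset the sum after each reflection, and the threshold value used in the usage note are assumptions, not details taken from the slide.

```python
class ReflectionTrigger:
    """Signals a reflection once accumulated importance exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.accumulated = 0.0

    def observe(self, importance):
        """Add one event's importance; return True if a reflection should run now."""
        self.accumulated += importance
        if self.accumulated > self.threshold:
            # Assumption: the running sum starts over after each reflection.
            self.accumulated = 0.0
            return True
        return False
```

With a hypothetical `ReflectionTrigger(threshold=10)`, events of importance 4 and 5 do not trigger a reflection, while a following event of importance 3 does, because the running sum reaches 12.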

5

6 of 74

Planning - coherent actions

Scientific Peer Reviewer

  • Starting time, duration and location
    • (Before Task) Starting time and duration: retrieve the previous day's interactions and memories to generate a plan
      • (Location) Traverse a tree of recent and relevant places to decide the location
    • (During Task) Hour-level and then 5-minute-level activities are generated only after the agent starts performing the task. Why?

  • Because of Conversations & Updating Plans:
    • the agent is in the vicinity of another agent
    • it is prompted to retrieve its relationship with the other agent
    • it decides whether to update its plans

6

7 of 74

Movement

Scientific Peer Reviewer

  • Based on planning
  • Vicinity of other LLMs while performing tasks

7

8 of 74

Metrics - Believability

Scientific Peer Reviewer

  • 5 areas - self-knowledge, memory, plans, reactions (e.g., your breakfast is burning, what do you do? Isn't this a property of the LLM?), reflections
  • compare ablations against human annotators who assume the characters' roles (after inspecting the memory stream)
  • another set of evaluators rank responses to questions at the end of the simulation

8

9 of 74

Emergent Behaviors

Scientific Peer Reviewer

Gauge emergent behaviors such as information diffusion: network density

  • 2|E| / (|V| (|V| - 1)) {the number of edges present relative to the total number of possible edges}
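
As a quick check of the density formula, here is a small sketch. The function name `network_density` is hypothetical; the inputs in the usage note (25 agents, 222 connections) are the end-state figures reported on the End-to-End Evaluation slide of this deck.

```python
def network_density(num_vertices, num_edges):
    """Density of an undirected graph: edges present / edges possible."""
    if num_vertices < 2:
        return 0.0  # no edges are possible with fewer than two vertices
    return 2 * num_edges / (num_vertices * (num_vertices - 1))
```

With 25 agents there are 25 * 24 / 2 = 300 possible relationships, so 222 connections give `network_density(25, 222)` = 0.74.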

9

10 of 74

Strengths

Scientific Peer Reviewer

  • The setup: large number of moving components
    • Why do agents move? (Planning, Conversations & Updating Plans)
    • Storing reflections
    • How to utilize the environment for planning? - parse tree

10

11 of 74

Weaknesses (by authors)

Scientific Peer Reviewer

  • (Memory Constraints): As memory increased the agents started using less typical spaces for planning
  • (Retrieval): Isabella conforms to suggestions for her Valentine's Day party because the retrieval module weights recency
    • Authors call this “politeness”

11

12 of 74

Weaknesses (by me)

Scientific Peer Reviewer

  • Reproducibility: an alternative is to have agents collaborate on a task with deterministic outcomes, such as mathematical theorem solving
  • Collaborative work: appears only as an emergent behavior and is not explicitly tested

  • (Self-contradicting explanations):
    • an agent has no knowledge of another agent ‘Rajeev’ despite several interactions, versus
    • agents retrieve knowledge encoded in the LLM, assigning attributes of a frequent name in the training corpus to an agent with the same name

12

13 of 74

Rating Time!

Scientific Peer Reviewer

Strength

  • Introduction of reflections and plans as kinds of memory

Threat

  • No discussion of how to handle long-term memories (the simulation covers only 2 days of game time)

Weakness

  • Reproducibility

Opportunity

  • Collaborative work (e.g. Mathematical theorem solving)

13

14 of 74

Rating Time!

Scientific Peer Reviewer

Final Rating: 2/5

14

15 of 74

Historian

Interactive Agents

15

17 of 74

Historian

Interactive Agents

Building believable proxies of human behavior

17

18 of 74

Historian

What are Believable Agents?

18

19 of 74

Historian

Chess

19

20 of 74

Historian

DOTA 2

20

23 of 74

Historian

Simulated Worlds

Breakup ← relationship(dating,x,y)

23

24 of 74

Historian

Real World

24

26 of 74

Historian

Takeaways

  • Different worlds and believable agents in those worlds
    • Chess
    • DOTA 2
    • Simulation
    • Real world
  • Question:

How can we leverage LLMs to create believable agents in different worlds?

26

27 of 74

Historian

References

  • Park, Joon Sung, et al. "Generative agents: Interactive simulacra of human behavior." arXiv preprint arXiv:2304.03442 (2023).
  • Laird, J., & VanLent, M. (2001). Human-Level AI’s Killer Application: Interactive Computer Games. AI Magazine, 22(2), 15. https://doi.org/10.1609/aimag.v22i2.1558
  • https://aiimpacts.org/historic-trends-in-chess-ai/
  • https://hxim.github.io/Stockfish-Evaluation-Guide/index.html?p=2r5/5RR1/1bPk4/1P5p/6b1/3B4/2K5/8%20b%20-%20-%206%2068
  • https://saumikn.com/blog/a-brief-guide-to-stockfish-nnue/
  • Isbister, K., & Nass, C. (2000). Consistency of personality in interactive characters: Verbal cues, non-verbal cues, and user characteristics. International Journal of Human-Computer Studies, 53(2), 251–267. https://doi.org/10.1006/ijhc.2000.0368
  • https://gamerant.com/the-witcher-3-best-characters/#ciri
  • McCoy, J.A. 2012, All the world's a stage: A playable model of social interaction inspired by dramaturgical analysis, University of California, Santa Cruz.
  • Rodney A. Brooks, Cynthia Breazeal, Matthew Marjanović, Brian Scassellati, and Matthew M. Williamson. 1999. The cog project: building a humanoid robot. Computation for metaphors, analogy, and agents. Springer-Verlag, Berlin, Heidelberg, 52–87.
  • Hanson, David & Imran, Alishba & Morales, Gerardo & Krisciunas, Vytas & Sagi, Aditya & Malali, Aman & Mohbe, Rushali & Upadrashta, Raviteja. (2022). Open Arms: Open-Source Arms, Hands & Control. 10.48550/arXiv.2205.12992.
  • https://en.wikipedia.org/wiki/File:Einstein-Hubo.jpg

27

28 of 74

Agents

Data Analyst

  • How are memories stored?
  • Mostly JSON files

28

29 of 74

Agent Memories (Per simulation day)

Data Analyst

29

30 of 74

Agent Observations (Per simulation day)

Event

Reflection

Data Analyst

30

31 of 74

Agent Observations (Per simulation day)

Chat

Data Analyst

31

32 of 74

Agent Movement (Per simulation timestep)

Data Analyst

32

33 of 74

Initial Memory

Data Analyst

33

34 of 74

Environment

Data Analyst

  • Root node: Entire world
  • Children nodes: Locations (e.g., houses, cafe, stores)
  • Leaf nodes: Objects (e.g., table, bookshelf).

34

35 of 74

Agent Spatial Memories (Per simulation day)

Data Analyst

  • Subgraph of the entire environment graph
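
The environment tree and the per-agent subgraph described above can be sketched with nested dictionaries. This is an illustration under stated assumptions, not the simulator's actual data format: the node names reuse the slide's examples, and the helpers `leaf_paths` and `known_subtree` are hypothetical.

```python
# Root node -> locations -> objects, using the examples from the slide.
world = {
    "world": {
        "house": {"bookshelf": {}, "table": {}},
        "cafe": {"table": {}},
        "store": {},
    }
}


def leaf_paths(tree, prefix=()):
    """List the path from the root to every node that has no children."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths


def known_subtree(tree, known_names):
    """Keep only the nodes an agent knows about; unknown branches are dropped whole."""
    return {
        name: known_subtree(children, known_names)
        for name, children in tree.items()
        if name in known_names
    }
```

An agent that has only visited the cafe would hold `known_subtree(world, {"world", "cafe", "table"})`, which keeps the cafe and its table and drops the house and the store.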

35

36 of 74

Evaluation

Data Analyst

  • Controlled Evaluation (Individual-level)
  • End-to-end Evaluation (Group-level)

36

37 of 74

Controlled Evaluation

Data Analyst

  • Interview agents based on
    • Self-Knowledge
    • Memory Retrieval
    • Plans
    • Reactions
    • Reflections

37

38 of 74

Controlled Evaluation

Data Analyst

  • “Believability” of responses
  • Comparison between 4 agent architectures and a human-authored response
  • 100 sets of rank data => TrueSkill Rank Rating

38

39 of 74

Elo Rank Rating

K-factor: constant that caps the maximum rating change; S_A: actual score in {1, 0.5, 0}; E_A: expected score (probability of A winning)

Data Analyst

  • Calculates relative ratings of players in two-player zero-sum games (like chess)
  • Performance inferred from win (1) / draw (0.5) / loss (0)

R_A = μ_A; σ_B = σ_A
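
The quantities in the legend (K-factor, S_A, E_A) fit the standard Elo update, sketched below. The slide's own formula did not survive extraction, so this is the textbook form rather than a transcription: the 400-point scale and the default K of 32 are conventional chess values assumed here, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """E_A: probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return both new ratings after one game.

    score_a is S_A: 1 for a win, 0.5 for a draw, 0 for a loss (from A's view).
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    # B's actual and expected scores are the complements of A's (zero-sum game).
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

For two players both rated 1500, E_A is 0.5, so a win moves the winner up by K/2 = 16 points and the loser down by the same amount.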

39

40 of 74

Elo Rank Rating

Data Analyst

  • Only works for 2-player games
  • K-factor adjustment
  • New players take too long to converge to their correct skill rating
  • Initial rating choice

40

41 of 74

TrueSkill Rank Rating

Data Analyst

  • Degree of uncertainty (σ)
  • New players have large σ while experienced players have small σ
  • TrueSkill Rating = μ - 3σ
  • Initial TrueSkill Rating = 0
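
The rating rule above is simple enough to check numerically. In this sketch the function name is hypothetical, and the starting values μ = 25 and σ = 25/3 are TrueSkill's commonly published defaults, assumed here because the slide only states that the initial rating is 0.

```python
def conservative_rating(mu, sigma, k=3.0):
    """TrueSkill rating as a lower bound on skill: mean minus k standard deviations."""
    return mu - k * sigma


# Assumed TrueSkill defaults for a brand-new player.
DEFAULT_MU = 25.0
DEFAULT_SIGMA = 25.0 / 3.0
```

With the assumed defaults, `conservative_rating(DEFAULT_MU, DEFAULT_SIGMA)` is 25 - 3 * (25/3) = 0, matching the initial rating on the slide. Applied to the highest score on the Controlled Evaluation slide (μ = 29.89, σ = 0.72), the same rule gives 29.89 - 2.16 = 27.73.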

41

42 of 74

TrueSkill Rank Rating - Simple 1v1 case

Data Analyst

42

44 of 74

TrueSkill Rank Rating

Data Analyst

44

45 of 74

TrueSkill Rank Rating - Update Rules

Approximate Message Passing

Factor Graphs

Data Analyst

Proprietary details

45

46 of 74

TrueSkill Rank Rating - Simple 1v1 case

β²: Variance of performance around the skill of each player
ε: Draw margin

Data Analyst

Expected Win

Expected Draw
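
The slide's formulas for the expected win and draw were images and are not recoverable here. The sketch below shows one standard way to compute them under the Gaussian model that β² and ε describe: each player's performance is normal around their skill, and a game is a draw when the performance difference is within ε. The function names and the parameter values in the usage note are assumptions.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def outcome_probabilities(mu_a, sigma_a, mu_b, sigma_b, beta, epsilon):
    """Return (P(A wins), P(draw), P(A loses)) for a 1v1 game."""
    # Variance of the performance difference: skill uncertainty of both
    # players plus performance noise beta^2 for each of them.
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    delta = mu_a - mu_b
    p_win = normal_cdf((delta - epsilon) / c)    # difference above +epsilon
    p_loss = normal_cdf((-delta - epsilon) / c)  # difference below -epsilon
    p_draw = 1.0 - p_win - p_loss                # difference within the margin
    return p_win, p_draw, p_loss
```

For two identical players, for example `outcome_probabilities(25, 25/3, 25, 25/3, beta=25/6, epsilon=1.0)`, the win and loss probabilities are equal and the remaining mass is the draw probability; setting `epsilon=0.0` removes draws and gives each side 0.5.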

46

47 of 74

Controlled Evaluation

Data Analyst

Scores

  • Games ⇔ Human Evaluations

47

48 of 74

Controlled Evaluation

Data Analyst

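Full architecture: 𝜇 = 29.89; 𝜎 = 0.72

No reflection: 𝜇 = 26.88; 𝜎 = 0.69

No reflection, no planning: 𝜇 = 25.64; 𝜎 = 0.68

Human-authored: 𝜇 = 22.95; 𝜎 = 0.69

No memory, no planning, no reflection: 𝜇 = 21.21; 𝜎 = 0.70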

48

49 of 74

End-to-End Evaluation: Emergent Social Behaviours

  • Experiments conducted over a two-day simulation of the environment with 25 agents
  • Relationship formation, information diffusion and agent coordination
  • Agent Coordination seen through Maria, a friend of Isabella, helping her set up the party

Initial State

  • Network density (relationships): 0.164 (52 connections)
  • Information diffusion (% of population): Sam’s mayoral candidacy: 1 (4%); Isabella’s party: 1 (4%)

End State

  • Network density (relationships): 0.74 (222 connections)
  • Information diffusion (% of population): Sam’s mayoral candidacy: 8 (32%); Isabella’s party: 13 (52%)

50 of 74

Hacker

50

52 of 74

Main Modifications:

  • Injected memory of our class and this paper
  • Made them work 24/7
  • Locked them in the library to force them to work
  • Changed the model from text-davinci-003 to GPT-3.5 for a 10x lower price

Code:

https://github.com/user074/generative_agents

Hacker

52

54 of 74

Generative Agents: Interactive Simulacra of Human Behavior

Academic Researcher

  • Introduced "Generative Agents" that simulate human-like behaviors.
  • Agents showcase the ability to mimic human behavior in tasks like negotiation, gaming, and simple day-to-day interactions.
  • Introducing a new generative robotic agent among human-like agents to have a deeper understanding of human-robot interactions.

54

55 of 74

Generative Agents for Human-Robot Interaction Simulations

Academic Researcher

GOAL: Understanding human-robot interactions to ensure robots are in tune with human emotions, enhance user satisfaction, and integrate into society as trusted collaborators.

55

56 of 74

Content

Academic Researcher

  • Introduction
  • Components
    • Agents
    • Interactions
    • Simulator
  • Challenges and Ideas
  • Conclusion

56

57 of 74

Components

Academic Researcher

57

58 of 74

Generative Robotic Agent

Academic Researcher

Each agent’s identity, including occupation and human-robot interactions, is described in a one-paragraph natural language seed memory.

58

59 of 74

Agent Behavior and Interaction

Academic Researcher

59

60 of 74

Simulation Environment

Academic Researcher

60

63 of 74

Learning Agent Modeling

Academic Researcher

CHALLENGES:

  • "Sim-to-Real gap": behaviors learned in simulation may not transfer seamlessly to the real world. For instance, precisely modeling dynamical systems is challenging due to variable factors like weight changes, friction, and unpredictable external forces.

IDEAS:

  • Continuous and lifelong learning: developing models that allow robots to adapt and learn throughout their operational life, without forgetting previous lessons.

63

65 of 74

Learning Agent Control

Academic Researcher

CHALLENGES:

  • Learning robust robot control policies from sensory data requires strong generalization across diverse and unpredictable environments.
  • Simulators struggle to replicate the diverse, unpredictable variability of real-world environments, so robots may face conditions they never encountered during training.

IDEAS:

  • Leverage NeRF (Neural Radiance Fields) or 3D Gaussian splatting to create highly realistic, data-driven scene reconstructions, bridging the visual gap in simulations.

65

67 of 74

Learning Agent Interaction

Academic Researcher

CHALLENGES:

  • Capturing the nuances of human behavior, which is influenced by emotions, past experiences, cultural backgrounds, and current physiological states.
  • Developing control policies that integrate emotional/physiological states.

IDEAS:

  • Design reinforcement learning rewards based on task completion and the emotional satisfaction of the human agent.
  • Meta-learning: train robots to learn how to learn such that they quickly adapt.

67

68 of 74

Conclusion

Academic Researcher

Integrating generative human and robotic agents allows:

  • Enhanced human-robot interactions, with more user-friendly and adaptive robots.
  • Accelerated training of robots without the need for extensive real-world testing.
  • A safer and more controlled environment to test robot prototypes.

68

69 of 74

Problems in Video Games

Industry Practitioner / Entrepreneur

Programmed conversation

Illogical reaction

Homogeneous characters

69

70 of 74

Why?

Industry Practitioner / Entrepreneur

  • NPCs need to be programmed
  • Programming is done by engineers
  • Script is needed for reaction
  • Script is written by creators

Image by DALL·E 3

70

71 of 74

Our solution: Generative Agents!

Industry Practitioner / Entrepreneur

Image from RPG Maker

High-level setup/background

Description generation

Agent generation

71

72 of 74

Our product

Industry Practitioner / Entrepreneur

Image by DALL·E 3

72

73 of 74

Our vision

Industry Practitioner / Entrepreneur

Image by DALL·E 3

Our collaborators

73

74 of 74

Safety and security

Industry Practitioner / Entrepreneur

  • Value-aligned agents
  • Maintaining an audit log
  • Investment in Ethics and AI-alignment related research

74