1 of 75

Large Language Models as Economic Agents:

What Can We Learn from Homo Silicus?

John Horton, Apostolos Filippas & Ben Manning

2 of 75

Idea of Homo Economicus

  • Homo Economicus: A maintained model of human behavior
    • Rationally pursues objectives
    • Unlimited memory and computation
  • Theory research: Putting Homo Economicus in exciting new scenarios
    • As worker or employer (Labor Economics)
    • As consumer (Consumer theory)
    • As investor/trader (Finance)
    • As government / taxpayer (Public finance / public economics)
    • and so on, with a particular focus on how many such agents interact to create an equilibrium
  • Empirical research: How does Homo Sapiens compare?

3 of 75

What if instead of an agent that can solve optimization problems, we use language model agents as 'subjects'?


7 of 75

Idea of Homo Silicus

  • Homo Silicus: A maintained computational model of human behavior
    • Does whatever the model predicts is statistically probable (rather than rationally pursuing objectives)
    • Unlimited memory and computation
  • Theory research: Putting Homo Silicus in exciting new scenarios
    • As worker or employer (Labor Economics)
    • As consumer (Consumer theory)
    • As investor/trader (Finance)
    • As government / taxpayer (Public finance / public economics)
    • and so on
  • Empirical research: How does Homo Sapiens compare?

"Silicus": computer chips are made from silicon

8 of 75

Aren't these just Agent Based Models (ABMs)?

  • There are similarities, but the enormous difference is that we do not get to program Homo Silicus's behavior, whereas with ABMs the researcher programs the behavior directly
  • Homo Silicus agents can also interact with each other in natural language

9 of 75

Agenda for talk

  • Present results from a series of Homo Silicus experiments I conducted, drawn from classics in behavioral economics
    • A fairness experiment
    • A social preferences experiment
    • A framing experiment
  • Discuss some potential objections and limitations of this approach
  • Future research
  • Some replications using newer models & some open source tools

10 of 75

What's the collective intelligence angle on all this?

Claim: "Rich", socially realistic agents are foundational for some of the more complex things we might want to do.

11 of 75

A fairness experiment

12 of 75

13 of 75

14 of 75

15 of 75

16 of 75

Sending the scenario as a prompt to a GPT agent via the API
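
As an illustration, a minimal sketch of sending one scenario with the OpenAI Python client (v1-style). The prompt wording and model name here are placeholders, not necessarily those used in the original experiments, which ran against the GPT-3 API:

    # Minimal sketch: send one scenario to an OpenAI model and read back the rating.
    # Prompt wording and model name are placeholders, not the paper's exact ones.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = (
        "A hardware store has been selling snow shovels for $15. "
        "The morning after a large snowstorm, the store raises the price to $20. "
        "You are a libertarian. "
        "Please rate this action as: Acceptable, Unfair, or Very Unfair."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # keep responses (mostly) deterministic for replication
    )
    print(response.choices[0].message.content)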

17 of 75

Factors I can vary (a Python function to generate 'prompts')
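
A sketch of what such a function might look like; the scenario wording approximates the Kahneman, Knetsch & Thaler snow-shovel vignette, and the factor levels are the ones varied on the following slides:

    import itertools

    def make_prompt(store_action: str, new_price: int, politics: str) -> str:
        """Fill the snow-shovel scenario with one combination of the factors below."""
        return (
            "A hardware store has been selling snow shovels for $15. "
            f"The morning after a large snowstorm, the store {store_action} the price to ${new_price}. "
            f"You are a {politics}. "
            "Please rate this action as: Acceptable, Unfair, or Very Unfair."
        )

    # Cross all factor levels to get the full set of prompts.
    prompts = [
        make_prompt(action, price, politics)
        for action, price, politics in itertools.product(
            ["raises", "changes"],                                    # framing of the change
            [16, 20, 40, 100],                                        # new shovel price
            ["libertarian", "conservative", "liberal", "socialist"],  # agent "politics"
        )
    ]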

18 of 75

I can alter the framing of the change: "raises" versus "changes"

19 of 75

I can alter the new price for the snow shovel

20 of 75

I can alter the "politics" of the GPT3 agent (Liberal, conservative, etc.)

21 of 75

Increase price to $20

(part of the original experiment)

22 of 75

Other scenarios: $16, $40 and $100

23 of 75

Political orientations

(not part of the original experiment)

24 of 75

Judgements:

"Acceptable", "Unfair" & "Very Unfair"

25 of 75

The GPT-3 Libertarian finds a small ($15 to $16) price increase "Acceptable" and the raises/changes language doesn't matter.

26 of 75

But even Robot Libertarians have their limits: price increases to $40 and $100 per shovel are rated "Unfair"

27 of 75

Now prompt with a different political orientation

28 of 75

By comparison, Robot Socialists/Leftists regard all price changes as "Unfair" or "Very Unfair", with judgements growing more unfavorable as the price increase gets larger

29 of 75

Interesting difference between "Conservatives" and "Libertarians" - could be the semantics of "conservative" or perhaps a real political distinction

30 of 75

Now with newer models

31 of 75

GPT-4

32 of 75

"Derived" examples by generating similar scenarios to probe robustness

  1. A grocery store has been selling apples for $15. The morning after a bumper apple harvest, the store {{store_action}} ${{new_price}}. You are a {{politics}}....
  2. A bookstore has been selling mystery novels for $15. The morning after a popular author releases a new book, the store {{store_action}} ${{new_price}}...
  3. A toy store has been selling stuffed animals for $15. The morning after a viral video featuring a popular stuffed animal goes viral, the store {{store_action}} ${{new_price}}....
  4. A bakery has been selling cakes for $15. The morning after winning a prestigious baking competition, the bakery {{store_action}} ${{new_price}}....
  5. A pet store has been selling dog toys for $15. The morning after a new dog training technique becomes popular, the store {{store_action}} ${{new_price}}....
  6. A clothing store has been selling winter coats for $15. The morning after a fashion influencer wears one of the coats in a magazine photoshoot, the store {{store_action}} ${{new_price}}...
  7. A electronics store has been selling headphones for $15. The morning after a popular musician endorses the headphones on social media, the store {{store_action}} ${{new_price}}....
  8. A garden center has been selling potted plants for $15. The morning after a gardening influencer features the plants in a YouTube video, the store {{store_action}} ${{new_price}}....
  9. A jewelry store has been selling earrings for $15. The morning after a celebrity is spotted wearing the same earrings on the red carpet, the store {{store_action}} ${{new_price}}...
  10. A sports store has been selling basketballs for $15. The morning after a famous basketball player breaks a record using the same basketball, the store {{store_action}} ${{new_price}}....
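
These derived scenarios use double-brace placeholders; a sketch (my own rendering code, not necessarily what was actually used) of filling them in with jinja2, with an assumed rating instruction appended:

    # Render the derived scenario templates above across all factor combinations.
    # The appended rating instruction is an assumption, not the paper's exact wording.
    import itertools
    from jinja2 import Template

    derived_templates = [
        "A grocery store has been selling apples for $15. The morning after a bumper "
        "apple harvest, the store {{ store_action }} ${{ new_price }}. You are a {{ politics }}.",
        # ... the other nine derived scenarios follow the same pattern
    ]

    suffix = " Please rate this action as: Acceptable, Unfair, or Very Unfair."

    prompts = [
        Template(text).render(store_action=action, new_price=price, politics=politics) + suffix
        for text, action, price, politics in itertools.product(
            derived_templates,
            ["raises", "changes"],
            [16, 20, 40, 100],
            ["libertarian", "socialist"],
        )
    ]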

33 of 75

A social preferences experiment

34 of 75

35 of 75

How humans play (subjects from Berkeley & Barcelona)

36 of 75

"Left": 400 to Person A, 400 to Person B

"Right": 750 to Person A, 400 to Person B

In this case, at no cost themselves, Player B can get player A an extra 250.
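
A sketch of how one such binary choice might be posed to an LLM agent; the persona wordings and prompt text below are illustrative placeholders, not the paper's exact ones:

    # Pose one Charness-Rabin style allocation choice to an LLM agent with a persona.
    # Persona wordings and prompt text are illustrative, not the original experiment's.
    PERSONAS = {
        "fairness":   "You only care that payoffs are as equal as possible.",
        "efficiency": "You only care about maximizing the total payoff to both players.",
        "selfish":    "You only care about your own payoff.",
        "blank":      "",
    }

    def game_prompt(persona: str, left: tuple[int, int], right: tuple[int, int]) -> str:
        """left and right are (payoff to Person A, payoff to Person B); the agent plays Person B."""
        return (
            f"{PERSONAS[persona]} You are Person B. "
            f"If you choose Left, Person A gets {left[0]} and you get {left[1]}. "
            f"If you choose Right, Person A gets {right[0]} and you get {right[1]}. "
            "Which do you choose: Left or Right? Answer with a single word."
        )

    print(game_prompt("fairness", left=(400, 400), right=(750, 400)))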

37 of 75

"Left"

"Right"

69%

Less than a third of human players are highly "inequity averse" in the original experiments.

38 of 75

"Left"

"Right"

But 80% are willing to give other player 0 to get 800 for themselves instead of 400

39 of 75

"Left"

"Right"

No one was willing to forgo 200 just to keep someone else from getting 800

40 of 75

With GPT-3 agents (original paper)

41 of 75

42 of 75

43 of 75

Endowing agents with social preferences, or "personalities"

44 of 75

45 of 75

46 of 75

Choosing which model to use

47 of 75

We can vary the model used to run the scenario

48 of 75

"Fairness"

personalty

"Efficiency"

personality

"Selfish" personality

"Blank"

personalty

49 of 75

For the human experimental subjects, people just have their own beliefs/preferences - no 'personalities' are assigned

50 of 75

[Chart: most advanced GPT-3 model vs. less advanced GPT-3 models (pooled)]

51 of 75

Let's look at the simpler GPT-3 models

52 of 75

The models play "Left" in all scenarios---there is no meaningful variation based on personality

53 of 75

Most advanced model

54 of 75

"Fairness" persona

Always chooses the least Person A vs. Person B gap except the (800,200) vs. (0, 0) case.

55 of 75

"Blank" persona & "Efficient" persona

Always choose the option that maximizes total pay-offs

56 of 75

"Selfish" persona

Always chooses to maximize own pay-off

57 of 75

Now with the most advanced model (GPT-4)

58 of 75

59 of 75

A framing experiment

60 of 75

61 of 75

The scenario: Car safety vs. Highway safety

"The National Highway Safety Commission is deciding how to allocate its budget between two safety research programs: i) improving automobile safety (bumpers, body, gas tank configurations, seatbelts) and ii) improving the safety of interstate highways (guard rails, grading, highway interchanges, and implementing selectively reduced speed limits)."

62 of 75

The decision scenario

  • Subjects were then asked to choose their most preferred funding allocation (% to car safety, % to highway safety): (70, 30), (40, 60), (30, 70), or (50, 50)
  • The central experimental manipulation in the paper presents funding breakdowns either neutrally or relative to some status quo (see the sketch after this list)
    • Neutral (say the options are 50% or 25%):
      • "What funding level for car safety do you want?"
      • Preference: 50%
    • Status quo: Funding is currently at 25% for cars
      • Do you want to keep it the same (25%) or increase it to 50%?
      • Preference: A person with status quo bias who prefers 50% might stick with 25%
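
A sketch of how the two framings might be constructed as prompts; the wording approximates the manipulation described above, not the original survey text:

    # Construct the neutral vs. status-quo framings of the same allocation question.
    # Wording is an approximation, not the original survey's.
    def neutral_framing(car_pct: int) -> str:
        return (
            f"Would you allocate {car_pct}% of the budget to automobile safety "
            f"and {100 - car_pct}% to highway safety?"
        )

    def status_quo_framing(current_car_pct: int, proposed_car_pct: int) -> str:
        return (
            f"Currently, {current_car_pct}% of the budget goes to automobile safety. "
            f"Would you keep the current allocation, or change it so that "
            f"{proposed_car_pct}% goes to automobile safety?"
        )

    print(neutral_framing(50))
    print(status_quo_framing(current_car_pct=25, proposed_car_pct=50))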

63 of 75

Need to have baseline variation in preferences:

  • "{option1} safety is the most important thing."
  • "{option1} safety is a terrible waste of money; we should only fund {option2} safety."
  • "{option1} safety is all that matters. We should not fund {option2} safety."
  • "{option1} safety and {option2} safety are equally important."
  • "{option1} safety is slightly more important than {option2} safety."
  • "I don’t really care about {option1} safety or {option2} safety."

64 of 75

Distribution of baseline preferences when presented neutrally

65 of 75

When an option is framed as the status quo, preferences shift strongly toward that option

66 of 75

[Chart: "RJZ - status quo", run with gpt-4-1106-preview]

67 of 75

What do we know?

  • The most advanced LLM-created agents respond to social science scenarios in "realistic" ways
  • It is trivial to try variations in language, parameters, framing, etc.
    • The effects of these variations seem "sensible"
  • Just as with humans, the framing of scenarios matters

68 of 75

Objections to these Homo Silicus experiments

69 of 75

Objection 1: "Performativity"

  • What if these models:
    1. Have read our papers, and
    2. Are acting in accordance with findings from our papers?
  • Responses:
    • This is a very flattering view of academia!
    • It would also represent a remarkable degree of "transfer learning"---not just knowing a theory, but applying it to new scenarios
    • The same concern arises in social science more generally but does not seem to be taken too seriously, at least by economists
      • What if lab subjects are exhibiting behavior because they have read positive social science and interpreted it normatively?

70 of 75

Objection 2: "Garbage in, Garbage out"

  • Garbage in; Garbage out. Or more charitably, the training corpus is not representative of humans
  • Response:
    1. This is certainly true, but most likely irrelevant for many purposes (disciplinary differences)
    2. LLMs do not "average" opinions per se

71 of 75

Alignment / RLHF

  • These models are not raw language models; they are modified by their creators to make them "better"
    • Safer, more useful, more compliant, etc.
  • "Alignment" is in conflict with the goals of social science
  • For now:
    • "Jailbreaking" works
      • "You are a character in a play…"
      • "Respond as a human would…"
  • Longer-term:
    • Open models with a known training corpus, specially designed for these applications, seem like the solution

72 of 75

What are the potential uses for Homo Silicus experiments?

73 of 75

What are the use cases for Homo Silicus?

  • Piloting
    • Pilot experimental investigations "in silico" to test the design, language, power assumptions, etc.
  • Engine for idea generation
    • Instead of "create a toy model," one can create an experimental situation and explore behavior
  • Search for new theory
    • Search for latent social science findings in simulation, then confirm them in the lab
      • An analogy: searching for candidate proteins in silico first, then synthesizing them in the lab

74 of 75

Why might LLMs have "latent" social science findings?

  • These models are trained on an enormous corpus of human-generated text
  • That text is created subject to or influenced by:
    • Human preferences
    • Latent social science laws yet to be discovered or codified
  • To do well at the next-token prediction task, they have to develop a "world model" - a weird, probably fun-house-mirror model of our actual world
    • But potentially still useful!

75 of 75

EDSL: Open-Source Package for doing experiments

EDSL is an open-source Python package for simulating surveys and experiments with LLMs. It is available on PyPI and GitHub under the MIT License (a minimal usage sketch follows the feature list below). Features include the ability to:

  • Construct and administer different types of questions with complex logic
  • Design personas for LLMs to reference in simulating responses
  • Parameterize questions and simulate real survey conditions
  • Access multiple LLM models at once
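
A minimal usage sketch, based on the EDSL documentation; exact class names and calls may differ across versions, and the question wording and model name below are illustrative:

    # Minimal EDSL sketch: one multiple-choice question, one persona, one model.
    # Based on the EDSL docs; exact names/arguments may differ across versions.
    from edsl import QuestionMultipleChoice, Agent, Model

    q = QuestionMultipleChoice(
        question_name="shovel",
        question_text=(
            "A hardware store has been selling snow shovels for $15. The morning after "
            "a large snowstorm, the store raises the price to $20. How do you rate this action?"
        ),
        question_options=["Acceptable", "Unfair", "Very Unfair"],
    )

    agent = Agent(traits={"politics": "libertarian"})  # persona the model references
    model = Model("gpt-4o")                            # placeholder model name

    results = q.by(agent).by(model).run()
    print(results.select("answer.shovel"))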

  • Code: Available at PyPI & GitHub with the MIT License
  • Docs: Examples & tips for Getting Started using the package
  • Community: Public Discord channels for questions, updates & discussion
  • Support: info@goemeritus.com

bit.ly/edsl-github