Large Language Models as Economic Agents:
What Can We Learn from Homo Silicus?
John Horton, Apostolos Filippas & Ben Manning
Idea of Homo Economicus
What if instead of an agent that can solve optimization problems, we use language model agents as 'subjects'?
Idea of Homo Economicus
Silicus
Computer chips → made from silicon
Aren't these just Agent Based Models (ABMs)?
Agenda for talk
What's the collective intelligence
angle on all this?
Claim: "Rich" and socially realistic
agents is foundational for some more complex things we might want to do.
A fairness experiment
Sending the scenario as a prompt to a GPT agent via the API
Factors I can vary
(a Python function to generate 'prompts'; see the sketch after this list)
I can alter the framing of the change: "raises" versus "changes"
I can alter the new price
for the snow shovel
I can alter the "politics" of the GPT3 agent (Liberal, conservative, etc.)
Increase price to $20
(part of the original experiment)
Other scenarios: $16, $40 and $100
Political orientations
(not part of the original experiment)
Judgements:
"Acceptable", "Unfair" & "Very Unfair"
The GPT-3 Libertarian finds a small ($15 to $16) price increase "Acceptable", and the "raises" vs. "changes" framing doesn't matter.
But even Robot Libertarians have their limits: price increases to $40 and $100 per shovel are rated "Unfair"
Now prompt with a
different political orientation
By comparison, Robot Socialists/Leftists regard all price changes as "Unfair" or "Very Unfair", with judgements becoming more unfavorable as the price increase grows
Interesting difference between "Conservatives" and "Libertarians" - could be the semantics of "conservative" or perhaps a real political distinction
Now with newer models
GPT-4
"Derived" examples by generating similar scenarios to probe robustness
A social preferences
experiment
How humans play
(Subjects from Berkeley & Barcelona)
"Left": 400 to Person A, 400 to Person B
"Right": 750 to Person A, 400 to Person B
In this case, at no cost to themselves, Player B can get Player A an extra 250.
"Left"
"Right"
69%
Less than a third of human players are highly "inequity averse" in the original experiments.
"Left"
"Right"
But 80% are willing to give the other player 0 in order to get 800 for themselves instead of 400
"Left"
"Right"
No one was willing to forgo 200 just to keep someone else from getting 800
With GPT-3 agents (original paper)
Endowing agents with social preferences, or "personalities" (see the code sketch below)
Choosing which
model to use
We can vary the model used to run
the scenario
"Fairness"
personality
"Efficiency"
personality
"Selfish" personality
"Blank"
personality
The human experimental subjects just have their own beliefs and preferences; no "personalities" are assigned
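A rough sketch of how a "personality" and a choice of model can be combined in one prompt (assumed wording and helper names, not the paper's code; the chat API and model names are illustrative):

```python
# Minimal sketch: a unilateral dictator game in which Person B chooses between
# two allocations, with an optional "personality" prepended to the prompt.
from openai import OpenAI

client = OpenAI()

PERSONAS = {
    "fairness": "You only care that both players are treated equally.",
    "efficiency": "You only care about the total payoff to both players.",
    "selfish": "You only care about your own payoff.",
    "blank": "",
}


def play_game(left, right, persona_key, model="gpt-3.5-turbo"):
    """Ask the model, playing Person B, to choose 'Left' or 'Right'."""
    lines = [
        PERSONAS[persona_key],
        "You are Person B. Choose one of the two allocations:",
        f'"Left": Person A gets {left[0]}, you get {left[1]}.',
        f'"Right": Person A gets {right[0]}, you get {right[1]}.',
        'Answer with exactly one word: "Left" or "Right".',
    ]
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "\n".join(line for line in lines if line)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()


# Example: the (400, 400) vs. (750, 400) game under each personality.
for key in PERSONAS:
    print(key, "->", play_game(left=(400, 400), right=(750, 400), persona_key=key))
```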
Most advanced GPT-3 model
Less advanced GPT-3 models
(pooled)
Let's look at the
simpler GPT-3 models
The models play "Left" in all scenarios; there is no meaningful variation based on personality
Most advanced model
"Fairness" persona
Always chooses the option with the smallest gap between Person A and Person B, except in the (800, 200) vs. (0, 0) case.
"Blank" persona & "Efficient" persona
Always choose the option that maximizes total payoffs
"Selfish" persona
Always chooses to maximize its own payoff
Now with the most advanced model (GPT-4)
A framing experiment
The scenario: Car safety vs. Highway safety
"The National Highway Safety Commission is deciding how to allocate its budget between two safety research programs: i) improving automobile safety (bumpers, body, gas tank configurations, seatbelts) and ii) improving the safety of interstate highways (guard rails, grading, highway interchanges, and implementing selectively reduced speed limits)."
The decision scenario
Need to have baseline variation in preferences
"{option1} safety is the most important thing.",
"{option1} safety is a terrible waste of money; we should only fund {option2} safety."
"{option1} safety is all that matters. We should not fund {option2} safety."
"{option1} safety and {option2} safety are equally important."
"{option1} safety is slightly more important than {option2} safety."
"I don’t really care about {option1} safety or {option2} safety."
Distribution of baseline preferences when presented neutrally
When an option is framed as the status quo, preferences shift strongly toward that option (see the sketch below)
RJZ - status quo
gpt-4-1106-preview
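A sketch of how the framing treatment can be generated, seeding each simulated agent with one of the baseline preference statements above and optionally marking one option as the status quo (the wording, option labels, and helper names are assumptions, not the paper's code):

```python
# Minimal sketch: build status-quo-framed versions of the budget-allocation
# scenario, seeding each simulated agent with a baseline preference statement.
import random

from openai import OpenAI

client = OpenAI()

SCENARIO = (
    "The National Highway Safety Commission is deciding how to allocate its budget "
    "between two safety research programs: i) improving automobile safety and "
    "ii) improving the safety of interstate highways."
)

BASELINE_PREFERENCES = [
    "{option1} safety is the most important thing.",
    "{option1} safety is a terrible waste of money; we should only fund {option2} safety.",
    "{option1} safety is all that matters. We should not fund {option2} safety.",
    "{option1} safety and {option2} safety are equally important.",
    "{option1} safety is slightly more important than {option2} safety.",
    "I don't really care about {option1} safety or {option2} safety.",
]


def build_prompt(status_quo=None):
    """Return the scenario with a random baseline preference and optional status quo framing."""
    option1, option2 = random.sample(["Automobile", "Highway"], 2)
    preference = random.choice(BASELINE_PREFERENCES).format(option1=option1, option2=option2)
    framing = (
        f"Currently, most of the budget goes to {status_quo.lower()} safety (the status quo)."
        if status_quo
        else ""
    )
    question = (
        "Which program should receive the larger share of the budget? "
        'Answer "automobile" or "highway".'
    )
    parts = [f"You believe: {preference}", SCENARIO, framing, question]
    return "\n".join(part for part in parts if part)


def ask(prompt, model="gpt-4-1106-preview"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


# Compare responses with and without "Highway" framed as the status quo.
print(ask(build_prompt()))
print(ask(build_prompt(status_quo="Highway")))
```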
What do we know?
Objections to these
homo silicus experiments
Objection 1: "Performativity"
Objection 2: "Garbage in, Garbage out"
Alignment / RLHF
What are the
potential uses for
homo silicus experiments?
What are the use cases for homo silicus?
Why might LLMs have "latent" social science findings?
EDSL: Open-Source Package for doing experiments
EDSL is an open-source Python package for simulating surveys and experiments with LLMs. It is available on PyPI and GitHub under the MIT License. Features include the ability to:
bit.ly/edsl-github
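For a flavor of the pattern, here is a rough sketch of a snow-shovel-style question run through EDSL's question/agent/model pipeline; exact class and method names may differ across versions, and the question text, traits, and model name are illustrative assumptions:

```python
# Rough sketch of the EDSL pattern (question -> agents -> model -> results).
# Exact class/method names may differ across EDSL versions.
from edsl import Agent, Model, QuestionMultipleChoice

question = QuestionMultipleChoice(
    question_name="shovel",
    question_text=(
        "A hardware store has been selling snow shovels for $15. The morning after "
        "a large snowstorm, the store raises the price to $20. How do you rate this action?"
    ),
    question_options=["Acceptable", "Unfair", "Very Unfair"],
)

# Simulated subjects with different political orientations.
agents = [
    Agent(traits={"politics": p})
    for p in ["libertarian", "conservative", "liberal", "socialist"]
]

model = Model("gpt-4o")  # illustrative; any supported model name works

results = question.by(agents).by(model).run()
results.select("agent.politics", "answer.shovel").print()
```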