Using machine learning to understand our society
Ram Rachum
ram@rachum.com
Deck: r.rachum.com/talk-deck
Video: r.rachum.com/talk-video
What I'll talk about
I want to use machine learning to answer big questions about human culture.
Video by Mathieu Poliquin
The most interesting thing about our culture is the relationships between people and the groups that they form.
Corporate AI Research: The big players
Google Brain, Google AI, DeepMind: Protein folding, AlphaGo, WaveNet, TensorFlow
OpenAI (Microsoft, Elon Musk): DALL-E 2, GPT-3, OpenAI Gym
Meta AI (Facebook AI Research): Torch, FastMRI, DCGAN
60% accurate
This was a great video.
But there's something missing in the social dynamics between the agents.
What I'll talk about
Machine learning
Supervised learning
Unsupervised learning
Reinforcement learning
Single-agent RL
Multi-agent RL
What is Reinforcement Learning? 1/5
* 80% accurate
What is Reinforcement Learning? 2/5
Picture: Wikipedia, CC-BY-SA
[Diagram. SL: Input, NN, Output. RL: Input, NN, Output, with an Env in the loop.]
What is Reinforcement Learning? 3/5
SL: Get an input, give the correct output.
RL:
Picture credit: Ian Maddox, CC BY-SA 4.0
What is Reinforcement Learning? 4/5
SL: Use training data curated by humans. Time-consuming and expensive.
RL: Generate an unlimited amount of training data for free using self-play.
What is Reinforcement Learning? 5/5
The secret sauce is Temporal Difference Learning.
To learn more, I recommend:
Reinforcement Learning: An Introduction, by Sutton and Barto.
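To make temporal difference learning a little more concrete, here is a minimal TD(0) sketch. It is not taken from the talk: the five-state random walk, the function name td0_random_walk, and the step size and episode count are all assumptions chosen for illustration. The agent never sees the true state values; it only nudges each estimate toward the reward plus the estimate of the next state.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values with TD(0) on a toy random walk (hypothetical setup).

    States 0..num_states-1 are non-terminal. Stepping left of state 0 ends
    the episode with reward 0; stepping right of the last state ends it
    with reward 1.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial guess for every state
    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0) update: move V(s) toward the bootstrapped target
            # reward + gamma * V(s').
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

For this particular walk the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should rise from left to right and land near those numbers.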
Multi-Agent Reinforcement Learning
Our self-driving car survives in a world that contains:
What I'll talk about
What’s missing in hide-and-seek?
We don’t want perfect team players. We want imperfect team players, just like real people.
No researcher has been able to get selfish agents to cooperate*.
As soon as agents are left to their own devices, they immediately abuse their environment.
Let's see an example.
👶 Random walk
🤤 Apples are tasty
🐷 Pig out on apples!
😨 The trees die
😇 Eat fewer apples
😊 Sustainability
😱 Trees die and we eat less
😈 Others kill trees
😃 Trees survive
Single agent
Multi agent
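The apple story above can be sketched as a toy simulation. This is not the environment from the talk: the function simulate_orchard, the logistic regrowth rule, and every number below are assumptions made up for illustration. Agents take turns eating from a shared stock of apples, and whatever is left regrows in proportion to itself, so an empty orchard stays empty.

```python
def simulate_orchard(appetites, steps=50, apples=100.0, capacity=100.0, regrowth=0.3):
    """Return (apples eaten per agent, apples left) in a toy shared orchard.

    Hypothetical model: on each step every agent tries to eat `appetite`
    apples in turn, then the remaining apples regrow logistically. Once
    the stock reaches zero nothing regrows, i.e. the trees are dead.
    """
    eaten = [0.0] * len(appetites)
    for _ in range(steps):
        for i, appetite in enumerate(appetites):
            bite = min(appetite, apples)  # cannot eat more than what is left
            apples -= bite
            eaten[i] += bite
        # Logistic regrowth: zero when the orchard is empty or full.
        apples += regrowth * apples * (1 - apples / capacity)
    return eaten, apples


if __name__ == "__main__":
    print("single, restrained:", simulate_orchard([5]))
    print("single, greedy:    ", simulate_orchard([40]))
    print("restrained + greedy:", simulate_orchard([5, 40]))
```

Under these assumed numbers, a single restrained agent eats steadily for the whole run, a single greedy agent kills the trees within a few steps and ends up with less in total, and a restrained agent sharing the orchard with a greedy one loses the trees anyway while eating far less than its neighbor.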
I want to create a world with multiple RL-driven agents.
I want to see the best and the worst of society.
I want to use these experiments to:
My long-term research goals: Simulate society
Timeline of my research
Edgar Duéñez-Guzmán (@duenez)
Breaking down the long-term research plan
I met with Edgar and shared my plan.
I want to create a world with multiple RL-driven agents.
After this works, I can continue on my long-term research goals.
My short-term research goals: Emergent reciprocity
What I'll talk about
The great challenge: Cooperation
Let’s remove all the elements of the game, except the decision to be nice or mean.
What’s left?
Prisoner’s Dilemma
Only two moves: Cooperate and Defect.
Both cooperate: Both get +1 reward.
Both defect: Both get -1 reward.
One cooperates, one defects:
Cooperator gets -2 reward, defector gets +2.
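The rewards listed above fit in a small lookup table. The sketch below uses those exact numbers; the names PAYOFFS, play_round and best_response are hypothetical helpers introduced only for this example. It also checks what a purely selfish player should do in a single round.

```python
COOPERATE, DEFECT = "C", "D"

# (my_move, other_move) -> (my_reward, other_reward), using the rewards above.
PAYOFFS = {
    (COOPERATE, COOPERATE): (1, 1),
    (DEFECT, DEFECT): (-1, -1),
    (COOPERATE, DEFECT): (-2, 2),
    (DEFECT, COOPERATE): (2, -2),
}


def play_round(my_move, other_move):
    """Return (my_reward, other_reward) for one round."""
    return PAYOFFS[(my_move, other_move)]


def best_response(other_move):
    """Move that maximizes my own single-round reward against other_move."""
    return max((COOPERATE, DEFECT), key=lambda move: play_round(move, other_move)[0])


if __name__ == "__main__":
    for other in (COOPERATE, DEFECT):
        print(f"other plays {other}: my best response is {best_response(other)}")
```

Whatever the other player does, the best single-round response is to defect, even though mutual cooperation (+1 each) beats mutual defection (-1 each). That gap is the dilemma.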
Iterated Prisoner’s Dilemma
Tit-for-Tat is simple and effective.
Tit-for-Tat is the “hello world” of ✨reciprocity✨
We want RL agents to learn Tit-for-Tat, but they just don’t get it.
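For reference, Tit-for-Tat itself is only a few lines when written by hand: cooperate first, then copy the other player's previous move. The sketch below hard-codes the strategy rather than learning it, and reuses the rewards from the Prisoner's Dilemma slide; play_match, always_defect and the ten-round match length are assumptions for illustration.

```python
# (move_a, move_b) -> (reward_a, reward_b), same rewards as the one-shot game.
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("D", "D"): (-1, -1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
}


def tit_for_tat(my_history, other_history):
    """Cooperate on the first round, then copy the other player's last move."""
    return "C" if not other_history else other_history[-1]


def always_defect(my_history, other_history):
    """The purely selfish baseline."""
    return "D"


def play_match(strategy_a, strategy_b, rounds=10):
    """Play an iterated match and return the total scores (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        reward_a, reward_b = PAYOFFS[(move_a, move_b)]
        score_a += reward_a
        score_b += reward_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play_match(tit_for_tat, tit_for_tat))      # (10, 10)
    print(play_match(tit_for_tat, always_defect))    # (-11, -7)
    print(play_match(always_defect, always_defect))  # (-10, -10)
```

Two Tit-for-Tat players cooperate throughout, while against a constant defector Tit-for-Tat is exploited only once and then retaliates.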
Iterated Prisoner’s Dilemma
Catch-22
Even if you wanted to be a good person, you’d be crushed by the crowd.
The great challenge: Cooperation
Unicellular vs multicellular
Picture credits: Andrei Savitsky, CC BY-SA 4.0 International; Jagiellonian University Medical College, CC BY-SA 3.0
The great challenge: Cooperation
Selfish: Lone wolf. Maximizes its own success, doesn't care about others. "Survival of the fittest", "rational agent" in game theory. Capitalism.
Cooperating: Social creature. Cares about other agents and takes risks for them. Avoids selfish agents. Tit-for-tat, reputation models, greenbeard.
WE DON'T UNDERSTAND THIS
WE UNDERSTAND THIS (kind of)
[Diagram: Selfish, Cooperating, Divergence (Randomness)]
Catch-22
Randomness:
Random acts of kindness do happen.
The challenge is building a lasting relationship.
[Diagram: Selfish, Cooperating, Divergence, Convergence, ???]
The great challenge: Cooperation
I will change my own behavior, against all my instincts. The other agent changes its behavior, against all its instincts.
Things I'm working on now:
Thanks for listening 😊