1 of 67

Using machine learning to understand our society

Ram Rachum

ram@rachum.com

Research site: r.rachum.com

Monthly updates: r.rachum.com/announce

Deck: r.rachum.com/talk-deck

Video: r.rachum.com/talk-video

2 of 67

Using machine learning to understand our society

You’ll need ~0 knowledge of machine learning

66 slides, lots of video demos

Oversimplified: 80% accurate, 100% interesting

3 of 67

Using machine learning to understand our society

4 of 67

What I'll talk about

My personal story

Big picture: What I'm trying to achieve

Crash course in Multi-Agent Reinforcement Learning

My research in practice: What I’ve done, next steps

The great challenge: Cooperation (4 parts)

5 of 67

Ram Rachum

Too cool for school

Attended classes without enrolling

Python developer, open-source activist

Projects: PySnooper, PythonTurtle, CPython, Django, PyPy…
PSF Fellow

12 years as a software developer:

From freelancer to employee
Recently ex-Google

6 of 67

What I'll talk about

My personal story

Big picture: What I'm trying to achieve

Crash course in Multi-Agent Reinforcement Learning

My research in practice: What I’ve done, next steps

The great challenge: Cooperation (4 parts)

7 of 67

I want to use machine learning to answer big questions about human culture.

Video by Mathieu Poliquin

8 of 67

9 of 67

https://arxiv.org/abs/1312.5602

10 of 67

The most interesting thing about our culture is the relationships between people and the groups that they form.

11 of 67

Corporate AI Research: The big players

Google Brain / Google AI: TensorFlow

DeepMind: Protein folding, AlphaGo, WaveNet

OpenAI (Microsoft, Elon Musk): DALL-E 2, GPT-3, OpenAI Gym

Meta AI (Facebook AI Research): Torch, FastMRI, DCGAN

60% accurate

12 of 67

13 of 67

https://arxiv.org/abs/1909.07528

14 of 67

This was a great video.

But there's something missing in the social dynamics between the agents.

15 of 67

What I'll talk about

My personal story

Big picture: What I'm trying to achieve

Crash course in Multi-Agent Reinforcement Learning

My research in practice: What I’ve done, next steps

The great challenge: Cooperation (4 parts)

16 of 67

Machine learning

Supervised learning

Unsupervised learning

Reinforcement learning

Single-agent RL

Multi-agent RL

17 of 67

What is Reinforcement Learning? 1/5

* 80% accurate

18 of 67

19 of 67

https://deepmind.com/blog/article/MuZeros-first-step-from-research-into-the-real-world

https://arxiv.org/abs/2202.06626

20 of 67

What is Reinforcement Learning? 2/5

Picture: Wikipedia, CC-BY-SA

SL: Input → NN → Output

RL: Input → NN → Output → Env → Input (loop)

21 of 67

What is Reinforcement Learning? 3/5

SL: Get an input, give the correct output.

RL: Get an input, give an output. No “correct”, only reward. Never know for sure whether the output was good.

Picture credit: Ian Maddox, CC BY-SA 4.0

22 of 67

What is Reinforcement Learning? 4/5

SL: Use training data curated by humans. Time-consuming and expensive.

RL: Generate an unlimited amount of training data for free using self-play.

23 of 67

What is Reinforcement Learning? 5/5

The secret sauce is Temporal Difference Learning.

To learn more, I recommend:

Reinforcement Learning: An Introduction, by Sutton and Barto.
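To make Temporal Difference Learning a bit more concrete, here is a minimal sketch of its simplest variant, TD(0), on a toy five-state random walk. The environment, the function name `td0_random_walk`, and the parameter values are illustrative assumptions and are not taken from the talk. The agent never sees the "correct" value of a state; it only nudges each estimate toward the reward it got plus its own estimate of the next state.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values with TD(0) on a 5-state random walk (toy example)."""
    rng = random.Random(seed)
    n_states = 5                       # non-terminal states are 1..5
    left_end, right_end = 0, n_states + 1
    values = [0.0] * (n_states + 2)    # terminal states keep a value of 0

    for _ in range(episodes):
        state = 3                      # every episode starts in the middle
        while state not in (left_end, right_end):
            next_state = state + rng.choice((-1, 1))
            # Reward is 1 only when we fall off the right end, else 0.
            reward = 1.0 if next_state == right_end else 0.0
            # TD(0) update: move the estimate toward reward + value of next state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:-1]                # estimates for the non-terminal states

if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

In this toy setup the estimates should end up rising from left to right, since states closer to the rewarding end are worth more.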

24 of 67

Multi-Agent Reinforcement Learning

Our self-driving car survives in a world that contains:

Other self-driving cars.

Human drivers, varying driving styles.

Our competitor's aggressive self-driving cars.

25 of 67

What I'll talk about

My personal story

Big picture: What I'm trying to achieve

Crash course in Multi-Agent Reinforcement Learning

My research in practice: What I’ve done, next steps

The great challenge: Cooperation (4 parts)

26 of 67

What’s missing in hide-and-seek?

Groups were predetermined and equal.

Agents got collective reward.

Agents couldn’t change groups.

We don’t want perfect team players. We want imperfect team players, just like real people.

27 of 67

No researcher has been able to get selfish agents to cooperate*.

As soon as agents are left to their own devices, they immediately abuse their environment.

Let's see an example.

28 of 67

29 of 67

We all know this behavior from real life. It's very frustrating to us, and it's something that we think about a lot. When we see people behaving like that, we sometimes think "that's just how it is" or "people are like that", or we blame it on other people, as if we would never behave that way ourselves.

Now that we have MARL, we can explore this behavior in an objective way, without throwing blame or shrugging it off. It's now a mathematical problem that we can model and find solutions to.

Despite the tragic demonstration we've just seen, people can cooperate and solve big problems. We can use MARL to reproduce these successes and answer the question: What is needed for large-scale cooperation between people?

This amazing opportunity is why I’m so excited about the research, and why I decided to take a huge risk for it.

30 of 67

👶 Random walk

🤤 Apples are tasty

🐷 Pig out on apples!

😨 The trees die

😇 Eat fewer apples

😊 Sustainability

😱 Trees die and we eat less

😈 Others kill trees

😃 Trees survive

Single agent

Multi-agent

31 of 67

https://arxiv.org/pdf/2107.06857.pdf

32 of 67

My long-term research goals: Simulate society

I want to create a world with multiple RL-driven agents.

The agents start completely selfish, and they'll learn to socialize.

Their social behavior should be authentic and unscripted.

I want to see them cooperate, and also fight. I want to see a social order. I want to see groups form and then dissolve.

I want to see the best and the worst of society.

I want to use these experiments to:

Get a better understanding of human society and how we cooperate around large-scale problems like global warming.

Find a path towards safe Artificial General Intelligence.

33 of 67

Timeline of my research

2006: Learned Repeated Games from Robert "Israel" Aumann.

2009-2011: Ran experiments in Iterated Prisoner's Dilemma.

2019: Ran experiments in multi-agent reinforcement learning.

Jan 2021: Edgar started mentoring me.

Mar 2022: Started working on the research full-time.

34 of 67

Edgar Duéñez-Guzmán (@duenez)

Staff Research Engineer at DeepMind

Ph.D. in CompSci, background in Bio

MARL team

Authored dozens of relevant papers

Interested in using AI to fight discrimination, among other things

35 of 67

https://arxiv.org/pdf/2110.11404.pdf

36 of 67

37 of 67

Breaking down the long-term research plan

I met with Edgar and shared my plan.

He sent me papers to read. I read them. He sent more.

He helped me understand what's known in the field and what the open questions are.

We used this information to plan an MVP.

38 of 67

My short-term research goals: Emergent reciprocity

I want to create a world with multiple RL-driven agents.

The agents need to be completely selfish.

I want the agents to reciprocate. I want agents to be good to each other, and recognize when others are good to them.

After this works, I can continue on my long-term research goals.

39 of 67

What I'll talk about

My personal story

Big picture: What I'm trying to achieve

Crash course in Multi-Agent Reinforcement Learning

My research in practice: What I’ve done, next steps

The great challenge: Cooperation (4 parts)

40 of 67

The great challenge: Cooperation

The key to cooperation is ✨reciprocity✨

Cooperation in our daily lives.

What science knows about cooperation and what’s missing.

How I plan to teach agents to ✨reciprocate✨

41 of 67

Let’s remove all the elements of the game, except the decision to be nice or mean.

What’s left?

42 of 67

Prisoner’s Dilemma

Only two moves: Cooperate and Defect.

Both cooperate: Both get +1 reward.

Both defect: Both get -1 reward.

One cooperates, one defects:

Cooperator gets -2 reward, defector gets +2.
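The payoffs on this slide can be written down as a small lookup table. The sketch below uses the rewards exactly as listed above; the names `PAYOFFS` and `best_response` are illustrative assumptions. It shows why a purely selfish player defects: whatever the other player does, defecting pays more in a single round.

```python
COOPERATE, DEFECT = "C", "D"

# (my_move, their_move) -> (my_reward, their_reward), as listed on the slide.
PAYOFFS = {
    (COOPERATE, COOPERATE): (1, 1),
    (DEFECT, DEFECT): (-1, -1),
    (COOPERATE, DEFECT): (-2, 2),
    (DEFECT, COOPERATE): (2, -2),
}

def best_response(their_move):
    """Return the move that maximizes my own reward against a fixed opponent move."""
    return max(
        (COOPERATE, DEFECT),
        key=lambda my_move: PAYOFFS[(my_move, their_move)][0],
    )

if __name__ == "__main__":
    for their_move in (COOPERATE, DEFECT):
        print("they play", their_move, "-> my best move is", best_response(their_move))
    # they play C -> my best move is D
    # they play D -> my best move is D
```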

43 of 67

44 of 67

Iterated Prisoner’s Dilemma

Tit-for-Tat is simple and effective.

Tit-for-Tat is the “hello world” of ✨reciprocity✨

We want RL agents to learn Tit-for-Tat, but they just don’t get it.
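For reference, Tit-for-Tat itself is only a couple of lines when written by hand. This is a minimal sketch using the payoffs from the Prisoner's Dilemma slide; the function names and the 10-round match length are assumptions made for illustration. The hard part described above is getting an RL agent to discover this behavior on its own, which this sketch does not attempt.

```python
# (move_a, move_b) -> (reward_a, reward_b), same payoffs as the Prisoner's Dilemma slide.
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("D", "D"): (-1, -1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy whatever the other player did last round."""
    return their_history[-1] if their_history else "C"

def always_defect(my_history, their_history):
    """The purely selfish baseline."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated match and return the total score of each player."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        reward_a, reward_b = PAYOFFS[(move_a, move_b)]
        score_a += reward_a
        score_b += reward_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # (10, 10)
    print(play(tit_for_tat, always_defect))    # (-11, -7)
    print(play(always_defect, always_defect))  # (-10, -10)
```

Two Tit-for-Tat players do far better together than two defectors, and Tit-for-Tat is exploited only once by a defector before it retaliates.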

45 of 67

Iterated Prisoner’s Dilemma

A great interactive demo by Nicky Case:

ncase.me/trust

46 of 67

Catch-22

Even if you wanted to be a good person, you’d be crushed by the crowd.
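One way to put numbers on this, as a rough sketch: take the one-round payoffs from the Prisoner's Dilemma slide and let one player meet nine others. The group size of ten and the helper name `total_reward` are assumptions chosen for illustration.

```python
# My reward for (my_move, their_move), using the payoffs from the Prisoner's Dilemma slide.
MY_REWARD = {
    ("C", "C"): 1,
    ("D", "D"): -1,
    ("C", "D"): -2,
    ("D", "C"): 2,
}

def total_reward(my_move, others):
    """Sum my reward from playing one round against each of the other players."""
    return sum(MY_REWARD[(my_move, other)] for other in others)

if __name__ == "__main__":
    # A lone cooperator surrounded by nine defectors.
    print(total_reward("C", ["D"] * 9))           # -18
    # One of those defectors: eight fellow defectors plus the lone cooperator.
    print(total_reward("D", ["D"] * 8 + ["C"]))   # -6
    # For comparison, a cooperator in a group where everyone cooperates.
    print(total_reward("C", ["C"] * 9))           # 9
```

Under these assumptions the lone cooperator does worst of all, even though a fully cooperative group would do best.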

47 of 67

The great challenge: Cooperation

The key to cooperation is ✨reciprocity✨

Cooperation in our daily lives.

What science knows about cooperation and what’s missing.

How I plan to teach agents to ✨reciprocate✨

48 of 67

Our most basic behaviors are cooperation.

When we're angry and hurt other people on purpose, it's cooperation.

The cells in our body are cooperating.

49 of 67

Unicellular vs multicellular

Picture credits: Andrei Savitsky CC BY-SA 4.0 Intl, Jagiellonian University Medical College CC BY-SA 3.0

50 of 67

The great challenge: Cooperation

The key to cooperation is ✨reciprocity✨

Cooperation in our daily lives.

What science knows about cooperation and what’s missing.

How I plan to teach agents to ✨reciprocate✨

51 of 67

Selfish

Lone wolf. Maximizes its own success, doesn't care about others.

"Survival of the fittest", "rational agent" in game theory. Capitalism.

Cooperating

Social creature. Cares about other agents and takes risks for them. Avoids selfish agents.

Tit-for-tat, reputation models, greenbeard.

WE DON'T UNDERSTAND THIS

WE UNDERSTAND THIS (kind of)

52 of 67

Selfish

Cooperating

Divergence

(Randomness)

53 of 67

Catch-22

54 of 67

Randomness:

Genetic mutations
Exploration factor
Different histories
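Of the three sources of randomness above, the exploration factor is the easiest to show in code. A common way to implement it in RL is epsilon-greedy action selection; the sketch below assumes that technique, and the function name, the value estimates, and the 10% exploration rate are made up for illustration. Even an agent whose estimates say "defect" will occasionally cooperate by chance.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: random with probability epsilon, otherwise the best-looking one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))   # explore: ignore what we have learned
    return max(range(len(q_values)), key=lambda action: q_values[action])  # exploit

if __name__ == "__main__":
    # Hypothetical learned estimates for a selfish agent: index 0 = cooperate, 1 = defect.
    q_values = [-2.0, -1.0]
    rng = random.Random(0)
    moves = [epsilon_greedy(q_values, 0.1, rng) for _ in range(1000)]
    print("cooperated", moves.count(0), "times out of", len(moves))
```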

55 of 67

Random acts of kindness do happen.

The challenge is building a lasting relationship.

56 of 67

Selfish

Cooperating

Convergence

57 of 67

58 of 67

Selfish

Cooperating

Divergence

Convergence

59 of 67

Selfish

Cooperating

???

Divergence

Convergence

60 of 67

The great challenge: Cooperation

The key to cooperation is ✨reciprocity✨

Cooperation in our daily lives.

What science knows about cooperation and what’s missing.

How I plan to teach agents to ✨reciprocate✨

61 of 67

I will change my own behavior, against all my instincts.

Other agent changes behavior, against all its instincts.

62 of 67

I will change my own behavior, against all my instincts.

Other agent changes behavior, against all its instincts.

63 of 67

I will change my own behavior, against all my instincts.

Other agent changes behavior, against all its instincts.

64 of 67

http://r.rachum.com/paper-fruit-slots

65 of 67

GridRoyale

An experiment I did in 2020

Play with it:

grid-royale.herokuapp.com

66 of 67

Things I'm working on now:

Running experiments using PettingZoo and SB3

Learning how to write research papers

Designing experiments in emergent communication

67 of 67

Thanks for listening 😊

Research site: r.rachum.com

Monthly updates: r.rachum.com/announce