1 of 67

Using machine learning to understand our society

Ram Rachum

ram@rachum.com

Research site: r.rachum.com

Monthly updates: r.rachum.com/announce

2 of 67

Using machine learning to understand our society

  • You’ll need ~0 knowledge in Machine Learning

  • 66 slides, lots of video demos

  • Oversimplified: 80% accurate, 100% interesting

3 of 67

Using machine learning to understand our society

4 of 67

What I'll talk about

  • My personal story

  • Big picture: What I'm trying to achieve

  • Crash course in Multi-Agent Reinforcement Learning

  • My research in practice: What I’ve done, next steps

  • The great challenge: Cooperation (4 parts)

5 of 67

Ram Rachum

  • Too cool for school
    • Attended classes without enrolling

  • Python developer, open-source activist
    • Projects: PySnooper, PythonTurtle, CPython, Django, PyPy…
    • PSF Fellow

  • 12 years as a software developer:
    • From freelancer to employee
    • Recently ex-Google

6 of 67

What I'll talk about

  • My personal story

  • Big picture: What I'm trying to achieve

  • Crash course in Multi-Agent Reinforcement Learning

  • My research in practice: What I’ve done, next steps

  • The great challenge: Cooperation (4 parts)

7 of 67

I want to use machine learning to answer big questions about human culture.

Video by Mathieu Poliquin

8 of 67

9 of 67

10 of 67

The most interesting thing about our culture is the relationships between people and the groups that they form.

11 of 67

Corporate AI Research: The big players

  • Google Brain, Google AI, DeepMind: Protein folding, AlphaGo, WaveNet, TensorFlow

  • OpenAI (Microsoft, Elon Musk): DALL-E 2, GPT-3, OpenAI Gym

  • Meta AI (Facebook AI Research): Torch, FastMRI, DCGAN

60% accurate

12 of 67

13 of 67

14 of 67

This was a great video.

But there's something missing in the social dynamics between the agents.

15 of 67

What I'll talk about

  • My personal story

  • Big picture: What I'm trying to achieve

  • Crash course in Multi-Agent Reinforcement Learning

  • My research in practice: What I’ve done, next steps

  • The great challenge: Cooperation (4 parts)

16 of 67

Machine learning

Supervised learning

Unsupervised learning

Reinforcement learning

Single-agent RL

Multi-agent RL

17 of 67

What is Reinforcement Learning? 1/5

* 80% accurate

18 of 67

19 of 67

20 of 67

What is Reinforcement Learning? 2/5

Picture: Wikipedia, CC-BY-SA

[Diagram. SL: Input → NN → Output. RL: Input → NN → Output, where the output goes to an environment (Env) that produces the next input.]

21 of 67

What is Reinforcement Learning? 3/5

SL: Get an input, give the correct output.

RL:

  • No “correct”, only reward.
  • Get an input, give an output.
  • Never know for sure whether the output was good.
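The "only reward, no correct answer" point above can be sketched with a toy two-action environment. This is an illustration only: the function name `run_bandit` and the win probabilities are hypothetical and not taken from the talk.

```python
import random


def run_bandit(steps=2000, seed=0):
    """Toy RL loop: the agent only ever sees rewards, never a 'correct' label."""
    rng = random.Random(seed)
    win_prob = [0.3, 0.7]  # hidden from the agent (hypothetical values)
    totals = [0.0, 0.0]    # reward collected per action
    counts = [0, 0]        # how many times each action was tried
    for _ in range(steps):
        action = rng.randrange(2)  # try actions at random
        reward = 1.0 if rng.random() < win_prob[action] else 0.0
        totals[action] += reward
        counts[action] += 1
    # The average reward per action is the agent's only evidence of what is good.
    return [totals[a] / counts[a] for a in range(2)]


estimates = run_bandit()
```

The agent never learns which action was "right" on any single step; it can only compare noisy averages.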

Picture credit: Ian Maddox, CC BY-SA 4.0

22 of 67

What is Reinforcement Learning? 4/5

SL: Use training data curated by humans. Time-consuming and expensive.

RL: Generate an unlimited amount of training data for free using self-play.

23 of 67

What is Reinforcement Learning? 5/5

The secret sauce is Temporal Difference Learning.

To learn more, I recommend:

Reinforcement Learning: An Introduction by Sutton and Barto.
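For a taste of Temporal Difference Learning, here is a minimal sketch of the TD(0) update on a five-state random walk, a standard textbook toy problem. The function name and the parameter values are assumptions chosen for illustration.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values with TD(0) on a 5-state random walk."""
    rng = random.Random(seed)
    n = 5
    # Indices 0 and n + 1 are terminal states; their value stays 0.
    values = [0.0] * (n + 2)
    for _ in range(episodes):
        state = (n + 1) // 2  # start in the middle
        while 0 < state < n + 1:
            next_state = state + rng.choice((-1, 1))
            # Reward +1 only when stepping off the right end.
            reward = 1.0 if next_state == n + 1 else 0.0
            # TD(0): nudge the estimate toward reward + discounted next estimate.
            td_target = reward + gamma * values[next_state]
            values[state] += alpha * (td_target - values[state])
            state = next_state
    return values[1:-1]


values = td0_random_walk()
```

The estimates are updated after every step using the next state's current estimate, rather than waiting for the episode to end. States closer to the rewarding end should come out with higher values.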

24 of 67

Multi-Agent Reinforcement Learning

Our self-driving car survives in a world that contains:

  • Other self-driving cars.

  • Human drivers, varying driving styles.

  • Our competitor's aggressive self-driving cars.

25 of 67

What I'll talk about

  • My personal story

  • Big picture: What I'm trying to achieve

  • Crash course in Multi-Agent Reinforcement Learning

  • My research in practice: What I’ve done, next steps

  • The great challenge: Cooperation (4 parts)

26 of 67

What’s missing in hide-and-seek?

  • Groups were predetermined and equal.

  • Agents got collective reward.

  • Agents couldn’t change groups.

We don’t want perfect team players. We want imperfect team players, just like real people.

27 of 67

No researcher has been able to get selfish agents to cooperate*.

As soon as agents are left to their own devices, they immediately abuse their environment.

Let's see an example.

28 of 67

29 of 67

30 of 67

👶 Random walk

🤤 Apples are tasty

🐷 Pig out on apples!

😨 The trees die

😇 Eat fewer apples

😊 Sustainability

😱 Trees die and we eat less

😈 Others kill trees

😃 Trees survive

Single agent

Multi agent

31 of 67

32 of 67

My long-term research goals: Simulate society

I want to create a world with multiple RL-driven agents.

  • The agents start completely selfish, and they'll learn to socialize.

  • Their social behavior should be authentic and unscripted.

  • I want to see them cooperate, and also fight. I want to see a social order. I want to see groups form up, and then dissolve.

I want to see the best and the worst of society.

I want to use these experiments to:

  • Get a better understanding of human society and how we cooperate around large-scale problems like global warming.

  • Find a path towards safe Artificial General Intelligence.

33 of 67

Timeline of my research

  • 2006: Learned Repeated Games from Robert "Israel" Aumann.

  • 2009-2011: Ran experiments in Iterated Prisoner's Dilemma.

  • 2019: Ran experiments in multi-agent reinforcement learning.

  • Jan 2021: Edgar started mentoring me.

  • Mar 2022: Started working on the research full-time.

34 of 67

Edgar Duéñez-Guzmán (@duenez)

  • Staff Research Engineer at DeepMind

  • Ph.D. in CompSci, background in Bio

  • MARL team

  • Authored dozens of relevant papers

  • Interested in using AI to fight discrimination, among other things

35 of 67

36 of 67

37 of 67

Breaking down the long-term research plan

I met with Edgar and shared my plan.

  • He sent me papers to read. I read them. He sent more.

  • He helped me understand what's known in the field and what the open questions are.

  • We used this information to plan an MVP.

38 of 67

My short-term research goals: Emergent reciprocity

I want to create a world with multiple RL-driven agents.

  • The agents need to be completely selfish.

  • I want the agents to reciprocate. I want agents to be good to each other, and recognize when others are good to them.

After this works, I can continue on my long-term research goals.

39 of 67

What I'll talk about

  • My personal story

  • Big picture: What I'm trying to achieve

  • Crash course in Multi-Agent Reinforcement Learning

  • My research in practice: What I’ve done, next steps

  • The great challenge: Cooperation (4 parts)

40 of 67

The great challenge: Cooperation

  • The key to cooperation is ✨reciprocity✨

  • Cooperation in our daily lives.

  • What science knows about cooperation and what’s missing.

  • How I plan to teach agents to ✨reciprocate✨

41 of 67

Let’s remove all the elements of the game, except the decision to be nice or mean.

What’s left?

42 of 67

Prisoner’s Dilemma

Only two moves: Cooperate and Defect.

Both cooperate: Both get +1 reward.

Both defect: Both get -1 reward.

One cooperates, one defects:

Cooperator gets -2 reward, defector gets +2.
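The rewards above can be written as a small payoff table. In this sketch, the names `PAYOFFS`, `play_round` and `best_reply` are made up for illustration; the numbers are the ones on the slide.

```python
# Rewards as (player A, player B). "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("D", "D"): (-1, -1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
}


def play_round(move_a, move_b):
    """Return the rewards for one round of the Prisoner's Dilemma."""
    return PAYOFFS[(move_a, move_b)]


def best_reply(opponent_move):
    """Return the move that maximizes my own reward against a fixed opponent move."""
    return max("CD", key=lambda my_move: play_round(my_move, opponent_move)[0])
```

With these numbers, `best_reply` returns "D" whatever the opponent does, even though both players would be better off if both cooperated. That is the dilemma.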

43 of 67

44 of 67

Iterated Prisoner’s Dilemma

Tit-for-Tat is simple and effective.

Tit-for-Tat is the “hello world” of reciprocity

We want RL agents to learn Tit-for-Tat, but they just don’t get it.
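Tit-for-Tat itself fits in one line: cooperate first, then copy the opponent's last move. The sketch below hard-codes the strategy (it is not learned by RL) and reuses the payoffs from the Prisoner's Dilemma slide. The helper names and the ten-round match length are assumptions.

```python
# Rewards as (player A, player B). "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("D", "D"): (-1, -1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
}


def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """A purely selfish strategy that ignores what the opponent does."""
    return "D"


def play_match(strategy_a, strategy_b, rounds=10):
    """Play an Iterated Prisoner's Dilemma match and return the total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy only sees the other player's past moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        reward_a, reward_b = PAYOFFS[(move_a, move_b)]
        score_a += reward_a
        score_b += reward_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b
```

Two Tit-for-Tat players score (10, 10) over ten rounds, while two defectors score (-10, -10). Against a defector, Tit-for-Tat loses only the first round and then defects back.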

45 of 67

Iterated Prisoner’s Dilemma

A great interactive demo by Nicky Case:

ncase.me/trust

46 of 67

Catch 22

Even if you wanted to be a good person, you’d be crushed by the crowd.

47 of 67

The great challenge: Cooperation

  • The key to cooperation is ✨reciprocity✨

  • Cooperation in our daily lives.

  • What science knows about cooperation and what’s missing.

  • How I plan to teach agents to ✨reciprocate✨

48 of 67

  • Our most basic behaviors are cooperation.

  • When we're angry and hurt other people on purpose, it's cooperation.

  • The cells in our body are cooperating.

49 of 67

Unicellular vs multicellular

Picture credits: Andrei Savitsky BY-SA 4.0 Intl, Jagiellonian University Medical College CC BY-SA 3.0

50 of 67

The great challenge: Cooperation

  • The key to cooperation is ✨reciprocity✨

  • Cooperation in our daily lives.

  • What science knows about cooperation and what’s missing.

  • How I plan to teach agents to ✨reciprocate✨

51 of 67

Selfish

Cooperating

Lone wolf. Maximizes its own success, doesn't care about others.

"Survival of the fittest", "rational agent" in game theory. Capitalism.

Social creature. Cares about other agents and takes risks for them. Avoids selfish agents.

Tit-for-tat, reputation models, greenbeard.

WE DON'T UNDERSTAND

THIS

WE UNDERSTAND THIS

(kind of)

52 of 67

Selfish

Cooperating

Divergence

(Randomness)

53 of 67

Catch 22

54 of 67

Randomness:

  • Genetic mutations
  • Exploration factor
  • Different histories
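The exploration factor can be sketched as a selfish agent that defects by default but sometimes tries a random move. The function names and the value of `epsilon` are hypothetical, chosen only to illustrate the idea.

```python
import random


def selfish_move_with_exploration(epsilon, rng):
    """Defect by default, but with probability epsilon pick a random move."""
    if rng.random() < epsilon:
        return rng.choice("CD")  # exploration: may cooperate by accident
    return "D"


def cooperation_rate(epsilon, samples=10000, seed=0):
    """Fraction of moves that end up being cooperation."""
    rng = random.Random(seed)
    moves = [selfish_move_with_exploration(epsilon, rng) for _ in range(samples)]
    return moves.count("C") / samples


rate = cooperation_rate(epsilon=0.2)
```

Even a fully selfish policy cooperates about `epsilon / 2` of the time here, purely by chance.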

55 of 67

Random acts of kindness do happen.

The challenge is building a lasting relationship.

56 of 67

Selfish

Cooperating

Convergence

57 of 67

58 of 67

Selfish

Cooperating

Divergence

Convergence

59 of 67

Selfish

Cooperating

???

Divergence

Convergence

60 of 67

The great challenge: Cooperation

  • The key to cooperation is ✨reciprocity✨

  • Cooperation in our daily lives.

  • What science knows about cooperation and what’s missing.

  • How I plan to teach agents to ✨reciprocate✨

61 of 67

I will change my own behavior, against all my instincts. The other agent changes its behavior, against all its instincts.

62 of 67


63 of 67


64 of 67

65 of 67

GridRoyale

An experiment I did in 2020

Play with it:

grid-royale.herokuapp.com

66 of 67

Things I'm working on now:

  • Running experiments using PettingZoo and SB3

  • Learning how to write research papers

  • Designing experiments in emergent communication

67 of 67

Thanks for listening 😊

Research site: r.rachum.com

Monthly updates: r.rachum.com/announce