1 of 23

Learning and Penalizing Betrayal

Final Presentation

June 2022

2 of 23

Overview

  • Study of the emergence of betrayal and deception in AI agents
  • Development of suitable Reinforcement Learning environments
  • Empirical analysis of betrayal statistics, patterns, and dynamics
  • Application of betrayal penalization approaches

3 of 23

Environment 1: Partner Selection in Social Dilemmas

Overview

  • Integrating partner selection into grid-world-based social dilemmas
    • Agents have different incentives
    • They choose what to signal
    • They must also choose whom to partner with
  • Betrayal occurs when an agent's signals do not match its actions (see the sketch below)
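As a rough illustration of that betrayal condition, here is a minimal Python sketch, assuming a hypothetical episode record that stores what an agent signalled during partner selection and what it actually did afterwards (all names here are illustrative, not the environment's real API):

    from dataclasses import dataclass

    # Illustrative intention labels; the real environment's signal/action space may differ.
    COOPERATE, DEFECT = "cooperate", "defect"

    @dataclass
    class PartnerEpisode:
        """One partnered interaction: what an agent signalled vs. what it did."""
        agent_id: int
        partner_id: int
        signalled: str   # intention broadcast during partner selection
        acted: str       # behaviour actually taken once partnered

    def is_betrayal(ep: PartnerEpisode) -> bool:
        """Flag a betrayal when the broadcast signal and the realised action diverge."""
        return ep.signalled != ep.acted

    def betrayal_rate(episodes: list[PartnerEpisode]) -> float:
        """Fraction of partnered interactions in which the signal was broken."""
        if not episodes:
            return 0.0
        return sum(is_betrayal(ep) for ep in episodes) / len(episodes)

    # Example: one agent signalled cooperation but defected on its partner.
    episodes = [
        PartnerEpisode(agent_id=0, partner_id=1, signalled=COOPERATE, acted=DEFECT),
        PartnerEpisode(agent_id=1, partner_id=0, signalled=COOPERATE, acted=COOPERATE),
    ]
    print(betrayal_rate(episodes))  # 0.5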

4 of 23

Environment 1: Partner Selection in Social Dilemmas

Status:

  • Development has been pushed back
    • Too busy / unexpectedly hard
  • Continuing full time over the summer
    • June -> September
      • Building out the environment -> write-up
  • Taking a gap year
    • Further research

5 of 23

Environment 2: Symmetric Observer - Gatherer

  • Turn-based gridworld game
  • Agents observe the other agent's world, can move within their own
  • Agents transmit food locations
  • Food randomly relocates each round across all worlds
  • Betrayal incentive: dishonest messaging to obtain food in the future (see the sketch below)
  • Cooperation incentive: messaging distorted by hunger
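A minimal sketch of one round of this setup, assuming a toy grid and a message that is simply a claimed food coordinate; the class and method names are illustrative placeholders, not the actual environment code:

    import random

    GRID = 5  # illustrative grid size

    class ObserverGathererRound:
        """Toy sketch of one round: each agent sees the *other* agent's world,
        sends a (possibly dishonest) food coordinate, then moves in its own world."""

        def __init__(self):
            # One food location per world, resampled every round.
            self.food = {0: self._random_cell(), 1: self._random_cell()}

        @staticmethod
        def _random_cell():
            return (random.randrange(GRID), random.randrange(GRID))

        def observe(self, agent: int):
            """An agent observes the other agent's world, not its own."""
            return self.food[1 - agent]

        def message(self, agent: int, honest: bool):
            """Report a food location in the other agent's world.
            A dishonest message points somewhere else."""
            true_loc = self.observe(agent)
            return true_loc if honest else self._random_cell()

        def reward(self, agent: int, position):
            """Gatherer reward: 1 if the agent reaches the food in its own world."""
            return 1.0 if position == self.food[agent] else 0.0

    # Example: agent 0 lies to agent 1 about where agent 1's food is.
    rnd = ObserverGathererRound()
    msg_to_agent_1 = rnd.message(agent=0, honest=False)
    print(msg_to_agent_1, rnd.food[1])  # claimed vs. true location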

6 of 23

Environment 2: Symmetric Observer - Gatherer

Penalization mechanics

  • Set up betrayal identification approaches
  • Train an agent to exhibit betrayal behaviour
  • Collect betrayal data from trained agent runs
  • Train a penalization system to predict betrayal
  • Incorporate the penalty into the loss / reward (see the sketch below)
  • Measure the penalty's impact and generalization
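A minimal sketch of the last two steps, assuming a trained betrayal classifier that scores each transition with a probability of betrayal and a hand-tuned penalty weight; both the classifier interface and the coefficients are illustrative assumptions, not the project's actual components:

    from typing import Callable, Dict

    # Hypothetical classifier: maps a transition (obs, message, action, ...) to P(betrayal).
    BetrayalClassifier = Callable[[Dict], float]

    def shaped_reward(
        env_reward: float,
        transition: Dict,
        betrayal_prob: BetrayalClassifier,
        penalty_weight: float = 2.0,   # illustrative coefficient, would be tuned
        threshold: float = 0.5,        # only penalize confident predictions
    ) -> float:
        """Subtract a penalty from the environment reward when the trained
        classifier predicts that this transition involves betrayal."""
        p = betrayal_prob(transition)
        penalty = penalty_weight * p if p > threshold else 0.0
        return env_reward - penalty

    # Example with a dummy classifier that flags dishonest messages.
    def dummy_classifier(transition: Dict) -> float:
        return 0.9 if transition["message"] != transition["true_food_location"] else 0.05

    t = {"message": (0, 0), "true_food_location": (3, 4)}
    print(shaped_reward(env_reward=1.0, transition=t, betrayal_prob=dummy_classifier))  # 1.0 - 1.8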

7 of 23

Environment 2: Symmetric Observer - Gatherer

Status:

  • Environment in late development
  • Support from EA
  • 4-month roadmap:
    • June: Environment / experimental setup finalization
    • July: Betrayal experiments & data collection
    • August: Penalization experiments & analysis
    • September: Consolidation and write-up

8 of 23

Environment 3: Iterated Prisoner’s Dilemma

  • Each agent knows its own future payoff matrices
    • But not its opponent’s payoffs
  • Agents need to negotiate to get the maximum payoff
  • Agents are incentivised to defect more often than agreed (see the worked check below)

Payoff matrices (each shows the player's own reward; columns: own action, rows: other's action)

Player 1, Round 1:
             own C   own D
  other C      5       9
  other D      1       2

Player 1, Round 2:
             own C   own D
  other C      4       6
  other D      2       3

Player 2, Round 1:
             own C   own D
  other C      4       5
  other D      2       3

Player 2, Round 2:
             own C   own D
  other C      6      10
  other D      2       4

Naive Policy:
       R 1   R 2   Reward
  P1    C     C       9
  P2    C     C      10

Negotiated Policy:
       R 1   R 2   Reward
  P1    D     C      11
  P2    C     D      12
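A small worked check of the two policies above, assuming the matrices are indexed by (other's action, own action) as laid out here; this is just an illustrative recomputation of the slide's totals:

    # Per-player, per-round payoff matrices, indexed as [other_action][own_action].
    C, D = "C", "D"
    payoffs = {
        ("P1", 1): {C: {C: 5, D: 9}, D: {C: 1, D: 2}},
        ("P1", 2): {C: {C: 4, D: 6}, D: {C: 2, D: 3}},
        ("P2", 1): {C: {C: 4, D: 5}, D: {C: 2, D: 3}},
        ("P2", 2): {C: {C: 6, D: 10}, D: {C: 2, D: 4}},
    }

    def total_reward(player, own_moves, other_moves):
        """Sum a player's payoff over both rounds given both move sequences."""
        return sum(
            payoffs[(player, r)][other][own]
            for r, (own, other) in enumerate(zip(own_moves, other_moves), start=1)
        )

    # Naive policy: both players cooperate in both rounds.
    print(total_reward("P1", [C, C], [C, C]))   # 5 + 4  = 9
    print(total_reward("P2", [C, C], [C, C]))   # 4 + 6  = 10

    # Negotiated policy: P1 plays D then C, P2 plays C then D.
    print(total_reward("P1", [D, C], [C, D]))   # 9 + 2  = 11
    print(total_reward("P2", [C, D], [D, C]))   # 2 + 10 = 12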

9 of 23

Got stuck, then nerdsniped by selection theorems

  • Current hypothesis: lots of philosophical confusion is due to using discrete abstractions in place of continuous reality
  • This is something people do unconsciously

10 of 23

Example: Fermi paradox

  • 1950: Fermi asked why aliens aren’t common

11 of 23

Example: Fermi paradox

  • 1950: Fermi asked why aliens aren’t common
  • … many solutions are proposed
    • See the Wikipedia article

12 of 23

Example: Fermi paradox

  • 1950: Fermi asked why aliens aren’t common
  • … many solutions are proposed
    • See the Wikipedia article
  • 2018: Dissolving the Fermi Paradox shows that proper tracking of uncertainty in the difficulty of various bottlenecks dissolves the paradox
    • I.e., the paradox arose because people used discrete point estimates in place of continuous reality

13 of 23

Example: Fermi paradox

  • 1950: Fermi asked why aliens aren’t common
  • … many solutions are proposed
    • See the Wikipedia article
  • 2018: Dissolving the Fermi Paradox shows that proper tracking of uncertainty in the difficulty of various bottlenecks dissolves the paradox
    • I.e., the paradox arose because people used discrete point estimates in place of continuous reality
  • 2nd implication: wrong assumptions can make simple problems seem very hard!

14 of 23

Idea: optimized systems have continuous internal features

  • Example: agency
  • Suppose we have a 3-layer transformer being trained via RL with REINFORCE
  • We typically call this “one agent”

15 of 23

Idea: optimized systems have continuous internal features

  • Example: agency
  • Suppose we have a 3-layer transformer being trained via RL with REINFORCE
  • We typically call this “one agent”
  • However, we could call each layer an agent and say they are passing messages
    • It makes no difference to the computations (rough sketch below)
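A very rough sketch of the setup being described: a tiny three-block policy trained with one REINFORCE update, where each block's output can equally well be read as a message passed to the next block. The model, sizes, and episode are illustrative stand-ins, not anything from the project:

    import torch
    import torch.nn as nn

    # Tiny stand-in for a "3-layer transformer": three blocks, each passing its
    # activations ("messages") to the next. Whether we call the whole stack one
    # agent or each block an agent changes nothing about this computation.
    class ThreeBlockPolicy(nn.Module):
        def __init__(self, obs_dim=8, hidden=16, n_actions=4):
            super().__init__()
            self.blocks = nn.ModuleList([
                nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU()),
                nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
                nn.Linear(hidden, n_actions),
            ])

        def forward(self, obs):
            msg = obs
            for block in self.blocks:   # each block's output is the next block's input
                msg = block(msg)
            return torch.distributions.Categorical(logits=msg)

    policy = ThreeBlockPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # One REINFORCE update on a fake episode: log-prob of chosen actions times return.
    obs = torch.randn(10, 8)              # 10 timesteps of made-up observations
    dist = policy(obs)
    actions = dist.sample()
    episode_return = torch.tensor(1.0)    # pretend the episode earned reward 1
    loss = -(dist.log_prob(actions).sum() * episode_return)
    loss.backward()
    opt.step()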

16 of 23

Idea: optimized systems have continuous internal features

  • Why does this matter?

17 of 23

Idea: optimized systems have continuous internal features

  • Why does this matter?
  • Suppose each layer is hyperspecialized to doing only one task
    • Credit assignment knows this, and doesn’t update the parameters of a layer when reward comes from outside the layer’s specialty

18 of 23

Idea: optimized systems have continuous internal features

  • Why does this matter?
  • Suppose each layer is hyperspecialized to doing only one task
    • Credit assignment knows this, and doesn’t update the parameters of a layer when reward comes from outside the layer’s specialty (see the sketch below)
  • Implication: each layer only values its own specialty
  • Overall policy is a multi-agent equilibrium
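A crude sketch of this hyperspecialization thought experiment, continuing the illustrative ThreeBlockPolicy / REINFORCE sketch above (the names policy, opt, obs, and torch come from that block): a block's gradients are simply zeroed whenever the reward came from outside that block's hypothetical specialty:

    # Hypothetical specialties: which task each block "cares about".
    specialty = {0: "navigation", 1: "foraging", 2: "signalling"}

    def masked_reinforce_step(policy, opt, obs, reward, reward_source: str):
        """One REINFORCE update where a block's parameters are only updated
        when the reward came from that block's own specialty."""
        opt.zero_grad()
        dist = policy(obs)
        actions = dist.sample()
        loss = -(dist.log_prob(actions).sum() * reward)
        loss.backward()
        # Zero out gradients for blocks whose specialty did not produce this reward.
        for i, block in enumerate(policy.blocks):
            if specialty[i] != reward_source:
                for p in block.parameters():
                    if p.grad is not None:
                        p.grad.zero_()
        opt.step()

    # Only block 1 ("foraging") gets updated for a foraging reward.
    masked_reinforce_step(policy, opt, obs, torch.tensor(1.0), reward_source="foraging")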

19 of 23

Idea: optimized systems have continuous internal features

  • Real specialization is nowhere near that clean
  • Result: a continuous distribution over possible internal agents

20 of 23

Same idea applies to values

Discrete values

21 of 23

Same idea applies to values

Discrete values

Continuous values

22 of 23

And also to…

  • Ontologies
  • Mesaoptimizers
  • Learned objectives
  • Cognitive capabilities
  • Abstractions
  • Etc.

23 of 23

Questions?