1 of 40

AI Safety Fieldbuilding

ALIGN Japan – Ryan Kidd

2 of 40

Ryan Kidd

2017-2022 PhD in Physics, UQ

2022-now Co-Director, MATS

2023-now Co-Founder, LISA

2023-now Regrantor, Manifund

3 of 40

Artificial general intelligence (AGI)

  1. What is it?
  2. When will it be here?
  3. How will it change the world?

4 of 40

Artificial general intelligence (AGI)

  • What is it?
  • When will it be here?
  • How will it change the world?

Metaculus criteria:

  • Passes a 2-hour adversarial Turing test
  • Assembles a model car
  • Scores ≥75% on every MMLU task and ≥90% on average
  • Achieves ≥90% top-1 accuracy on APPS
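
The MMLU condition pairs a per-task floor with an overall mean, so a model can clear the 90% average while still failing a single weak task. A minimal sketch of the check in Python (task names and accuracies here are hypothetical, for illustration only):

```python
# Metaculus-style MMLU criterion: >=75% accuracy on EVERY task
# AND >=90% mean accuracy across all tasks.
# NOTE: hypothetical per-task accuracies, for illustration only.
task_accuracies = {
    "abstract_algebra": 0.82,
    "college_physics": 0.91,
    "world_history": 0.95,
}

floor_ok = all(acc >= 0.75 for acc in task_accuracies.values())
mean_ok = sum(task_accuracies.values()) / len(task_accuracies) >= 0.90

print("criterion met:", floor_ok and mean_ok)  # False here: mean is ~0.89
```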

7 of 40

Artificial general intelligence (AGI)

  • What is it?
  • When will it be here?
  • How will it change the world?

Automation of labor

8 of 40

Artificial general intelligence (AGI)

  • What is it?
  • When will it be here?
  • How will it change the world?

Economic growth

Transformative AI: causes 20-30% GDP growth per year

Doubling time: ~3 years
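
The ~3-year doubling time is just compound-growth arithmetic: at a sustained annual growth rate g, output doubles in ln 2 / ln(1+g) years, which for g between 0.20 and 0.30 gives roughly 2.6 to 3.8 years.

```latex
% Doubling time under compound annual growth at rate g
\[
  t_{\text{double}} = \frac{\ln 2}{\ln(1+g)}, \qquad
  g = 0.20 \;\Rightarrow\; t \approx 3.8\ \text{years}, \qquad
  g = 0.30 \;\Rightarrow\; t \approx 2.6\ \text{years}
\]
```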

9 of 40

Artificial general intelligence (AGI)

  • What is it?
  • When will it be here?
  • How will it change the world?

Societal change

Types of superintelligence:

  • Quality: smarter than humans
  • Speed: faster than humans
  • Collective: more numerous or better organized than humans

10 of 40

Mission

  • AGI may be dangerous and near
  • AI safety is talent-constrained
  • We can help!

Metaculus: 9% AI x-risk by 2100

12 of 40

Mission

  • AGI may be dangerous and near
  • AI safety is talent-constrained
  • We can help!

[Figure: Manhattan, Apollo, AGI, TAI; +28% per year]

14 of 40

Three futures

  1. Privatized AGI
  2. Nationalized AGI
  3. AGI winter

15 of 40

Three futures

  • Privatized AGI
  • Nationalized AGI
  • AGI winter

AI Safety Institutes

17 of 40

Three futures

  • Privatized AGI
  • Nationalized AGI
  • AGI winter

Metaculus: In what year would AI systems be able to replace 99% of current fully remote jobs?

18 of 40

Three futures

Privatized AGI: AI gov + evals + infosec + lab safety teams

Nationalized AGI: international coalition building + open-source AI alignment

AGI winter: "provably safe AI" + all of the above

19 of 40

What can we do?

ALIGN Japan

20 of 40

Goals

  • Accelerate high-impact scholars
  • Support high-impact mentors
  • Grow AI safety research field

21 of 40

What do new researchers need?

  • Strong technical skills
  • High-quality mentorship
  • Basic understanding of threat models
  • Community + support
  • For jobs: publications, fast coding

22 of 40

What do mentors need?

  • Research assistants
  • Hiring pipeline
  • Experience as a manager
  • Support to scale projects

23 of 40

What does the AI safety field need?

  • Connectors: Build new paradigms
  • Iterators: Further current paradigms
  • Amplifiers: Scale projects

Organization                                  Talent needs
Scaling lab safety teams                      Iterators > Amplifiers
Growing technical safety orgs (10-30 FTE)     Amplifiers > Iterators
Small technical safety orgs (<10 FTE)         Iterators > ML engineers
Independent research                          Iterators > Connectors

24 of 40

What does the AI safety field need?

  • Connectors: Paul Christiano, Buck Shlegeris, Evan Hubinger, Alex Turner, etc.
  • Iterators: Ethan Perez, Neel Nanda, Dan Hendrycks, etc.
  • Amplifiers: Bill Zito, Oliver Zhang, Emma Abele, Chris Akin, etc.

26 of 40

MATS’ strategy

ALIGN Japan

27 of 40

Program elements

28 of 40

MATS Program

  • Mentored research projects
  • Berkeley office + community
  • Seminars + workshops
  • Research management
  • Extension program (up to 4 months)

29 of 40

Evaluation milestones

  • Jul 19: Research plan (2 pages)
    • Theory of change
    • Timeline + output

  • Aug 23: Symposium talk (5 min + Q&A)
    • What did you do and why?
    • Results + relevance

  • Aug 23: Mentor reports + scholar self-reports

30 of 40

Research management

  • Help scholars set goals, be productive, and use mentor time well
  • Help mentors track scholar research
  • Help MATS achieve program outcomes

31 of 40

How have we done so far?

ALIGN Japan

32 of 40

MATS history

  • 213 scholars
  • 47 mentors

33 of 40

What are we doing now?

34 of 40

MATS 6.0 research interest

  • 1221 scholar applications
  • 60 mentor applications

35 of 40

MATS 6.0 research portfolio

[Figure: breakdown of MATS 6.0 research areas], and more!

36 of 40

MATS 6.0 mentors

Interpretability

  • Neel Nanda & Arthur Conmy
  • Lee Sharkey
  • Alex Turner
  • Stephen Casper
  • Adrià Garriga-Alonso
  • Jesse Hoogland
  • Lawrence Chan
  • Erik Jenner
  • Jason Gross
  • Jessica Rumbelow
  • Adam Shai
  • Nandi Schoots
  • Lucius Bushnaq & Jake Mendel

Oversight + control

  • Ethan Perez & Akbir Khan
  • Fabien Roger
  • Buck Shlegeris
  • David Lindner
  • Sebastian Farquhar
  • Mantas Mazeika
  • Scott Emmons
  • Shi Feng

37 of 40

MATS 6.0 mentors

Evaluations

  • Evan Hubinger
  • Owain Evans
  • Hjalmar Wijk
  • Marius Hobbhahn
  • Jérémy Scheurer
  • Francis Rhys Ward
  • Steven Basart
  • Nico Miailhe

Governance

  • Timothy Fist
  • Mauricio Baker
  • Lisa Thiergart
  • Matija Franklin & Philip Moreira Tomei

Value alignment

  • Micah Carroll
  • Brad Knox
  • Tsvi Benson-Tilsen

Cooperative AI

  • Christian Schroeder de Witt
  • Jan Kulveit

Provably safe AI

  • Yoshua Bengio & MILA

38 of 40

MATS 6.0 scholars

Median scholar: male, 26 years old, Master's student, 520/600 on CodeSignal

39 of 40

Ryan’s Manifund requests for proposals

40 of 40

AI Safety Fieldbuilding

ALIGN Japan – Ryan Kidd