1 of 21

Will AI end everything?

A guide to guessing

Katja Grace, AI Impacts

2 of 21

What are we talking about?

Losing the future

Not an abstract thing

3 of 21

How to guess?

(a guess)

Priors, arguments, people

(See wiki.aiimpacts.org for more argument detail)

4 of 21

Argument from competent bad AI agents

First pass

  1. There will be very smart AI
  2. Very smart AI will have goals
  3. The goals will be bad
  4. 1, 2, 3 => future is bad

5 of 21

Two new things

1. Cognitive labor

  • More
  • New distribution

2. Agents

  • More
  • New values

6 of 21

Argument from competent bad AI agents

Quantified (ish)

How much cognitive labor will be working toward bad futures vs. good futures?

  • How much new cognitive labor?
  • How much of it is controlled by AI agents?
  • What fraction of their goals involve bad futures?
  • What part of all this goes to good futures?

If more cognitive labor is working toward a bad future than toward a good one at some point, will the future be bad?
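Read as a rough formula (my gloss; the slides don't write one out), the sub-questions compose multiplicatively:

```latex
% Notation is mine, not the slides':
%   f_new   : share of future cognitive labor that is new AI labor
%   f_agent : share of that labor controlled by AI agents
%   f_bad   : fraction of agent goals that involve bad futures
%   f_good  : share of cognitive labor going to good futures
\[
  S_{\mathrm{bad}} \;\approx\; f_{\mathrm{new}} \cdot f_{\mathrm{agent}} \cdot f_{\mathrm{bad}},
  \qquad \text{worry if } S_{\mathrm{bad}} > f_{\mathrm{good}}
\]
```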

7 of 21

  1. How much new cognitive labor?

A lot more than human cognitive labor, eventually

So let’s ignore the (comparatively small) human share

8 of 21

2. How much cognitive labor is controlled by new AI agents?

A rough proxy (combined in the sketch below):

2.a. How likely is an AI system to be an agent?

2.b. How much of its own cognition does an AI system ‘control’?

2.c. How much non-agentic AI cognitive labor will be controlled by AI systems?
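A minimal sketch of how 2.a–2.c might combine (my composition, with hypothetical numbers; the slides give no values here):

```python
# Toy composition of the proxy in 2.a-2.c (hypothetical numbers, not the
# talk's). All quantities are fractions of total AI cognitive labor.

def agent_controlled_share(p_agent: float,
                           own_control: float,
                           nonagent_controlled: float) -> float:
    """Rough share of AI cognitive labor controlled by AI agents.

    p_agent:             2.a  chance an AI system is an agent
    own_control:         2.b  fraction of its own cognition an agent controls
    nonagent_controlled: 2.c  fraction of non-agentic AI labor that agents
                              control anyway
    """
    agentic = p_agent * own_control                # labor inside agents
    steered = (1 - p_agent) * nonagent_controlled  # agent-steered tools
    return agentic + steered

# Illustrative only: 40% of systems agentic, each controlling 80% of its
# own cognition, and agents steering 20% of the non-agentic labor.
print(agent_controlled_share(0.4, 0.8, 0.2))  # 0.44
```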

9 of 21

2.a. How likely is a system to be an agent?

Considerations:

  • Economic forces (but just pressure to the middle?)
  • Spontaneous emergence
  • Coherence arguments
  • Copying humans
  • Agents are more efficient (so get more bang for their cognitive labor)
  • More agentic systems may outcompete less agentic ones

10 of 21

2.b. How much of its own cognition does an AI system ‘control’?

2.c. How much non-agentic AI cognitive labor will be controlled by AI systems?

11 of 21

3. What fraction of AI goals are bad?

  A. Hard to find good goals
  B. Easy to find bad yet appealing goals
  C. Basically can’t give AI systems goals anyway

12 of 21

3.A. Why is it hard to find good goals? (Part 1)

  1. Most goals are in conflict because of convergent incentives
  2. Value space is large
     • Counter: so is car space
  3. ‘Value is fragile’
     • Counter: is it?

[Image: face from thispersondoesnotexist.com]

13 of 21

3.A. Why is it hard to find good goals? (Part 2)

  4. Against: short-term goals
     • Counter: selection effects

  5. Against: why expect agents to learn the world really well but values really badly?
     • Counter: ‘sharp left turn’

14 of 21

3.B. Why is it easy to find appealing bad goals?

  • Externalities
  • Error/not trying

15 of 21

3.C. Why can’t we give AI systems goals anyway?

  • No known procedure
  • Reasons to expect procedures in general not to work
    • A finite training set is consistent with many functions (sketch below)
    • Deceptively aligned mesa-optimizers
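As a toy illustration of the finite-training-set point (my example, not from the talk): two functions can agree exactly on every training point and still diverge everywhere else.

```python
# Two models that fit the same five training points perfectly yet disagree
# off the training set (illustrating underdetermination; my example).
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, 5)     # five training inputs
ys = xs ** 2                   # labels from the "intended" function

f_intended = lambda x: x ** 2
# Same fit on xs: add a term that is zero at every training point.
f_deviant = lambda x: x ** 2 + 10 * np.prod([x - xi for xi in xs], axis=0)

print(np.allclose(f_intended(xs), ys))  # True
print(np.allclose(f_deviant(xs), ys))   # True: identical on the training set
print(f_intended(0.5), f_deviant(0.5))  # very different off-distribution
```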

16 of 21

3. What fraction of AI goals are bad?

  • Hard to find good goals
  • Easy to find bad yet appealing goals
  • Basically can’t give AI systems goals anyway

17 of 21

Argument from competent bad AI agents

Quantified (ish)

How much cognitive labor will be working toward bad futures vs. good futures?

  • How much new cognitive labor?
  • How much of it is controlled by AI agents?
  • What fraction of their goals involve bad futures?
  • What part of all this goes to good futures?

If more cognitive labor is working toward a bad future than toward a good one at some point, will the future be bad?

My overall guesses:

  • Share of cognitive labor going to bad AI agent goals: 14%
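A sketch of the arithmetic behind such a headline number (the component values below are hypothetical stand-ins; only the 14% appears on the slide):

```python
# Combining the sub-questions into an overall share (hypothetical inputs;
# the slide reports only the 14% output).
f_new   = 1.00  # Q1: new AI labor swamps human labor, so human share ignored
f_agent = 0.40  # Q2: share of AI cognitive labor controlled by AI agents
f_bad   = 0.35  # Q3: fraction of agent goals that involve bad futures
# Q4 (share going to good futures) is the comparison point, not a factor.

share_bad = f_new * f_agent * f_bad
print(f"{share_bad:.0%}")  # 14% -- matching the slide's headline guess
```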

18 of 21

If more cognitive labor goes toward a bad future, will the future be bad?

Two scenarios: fast, slow

  • Different bad goals
  • Do other resources and powers matter, e.g. money?

19 of 21

Considerations regarding the whole argument

  • Key concepts are vague
  • ‘Multiple stage fallacy’ (toy example below)
  • Biases in argument collection
  • Are these situations ‘existentially safe’?
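On the ‘multiple stage fallacy’, a toy illustration (my example, not the talk's): decomposing a claim into stages and rating each merely ‘pretty likely’ drives the product low regardless of the claim's truth.

```python
# Six stages, each judged 80% likely, multiply out to ~26% -- the low
# answer comes from the decomposition itself, not from strong evidence
# against the claim.
stage_probs = [0.8] * 6

product = 1.0
for p in stage_probs:
    product *= p

print(f"{product:.0%}")  # 26%
```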

20 of 21

How to guess?

(a guess)

Priors, arguments, people

(See wiki.aiimpacts.org for more argument detail)

21 of 21

Conclusions

  • Probability of doom
  • Some of these parameters can be changed
  • Space for improvement