1 of 21

Will AI end everything?

A guide to guessing

Katja Grace, AI Impacts

2 of 21

What are we talking about?

Losing the future

Not an abstract thing

3 of 21

How to guess?

(a guess)

Priors, arguments, people

(See wiki.aiimpacts.org for more argument detail)

4 of 21

Argument from competent bad AI agents

First pass

1. There will be very smart AI
2. Very smart AI will have goals
3. The goals will be bad

1, 2, 3 => future is bad

5 of 21

Two new things

1. Cognitive labor

More
New distribution

2. Agents

More
New values

6 of 21

Argument from competent bad AI agents

Quantified (ish)

How much cognitive labor will be working toward bad futures vs. good futures?

1. How much new cognitive labor?
2. How much of it is held by AI agents?
3. What fraction of their goals involve bad futures?
4. What part of all this goes to good futures?

If more cognitive labor is working toward a bad future than a good future at some time, will the future be bad?
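
A minimal sketch of how the quantified questions could combine into a single share. This is one assumed reading of the slide, not the speaker's stated method: it treats new AI cognitive labor as the whole pool and multiplies the fraction held by AI agents by the fraction of agent goals that are bad. The function name, parameter names, and input values are all hypothetical placeholders, not estimates from the talk.

```python
def share_to_bad_agent_goals(agent_share: float, bad_goal_fraction: float) -> float:
    """Fraction of new cognitive labor working toward bad futures via AI agents.

    Assumption (not from the slides): the share is the simple product of
    the fraction of cognitive labor held by AI agents and the fraction of
    those agents' goals that involve bad futures.
    """
    for name, value in (("agent_share", agent_share),
                        ("bad_goal_fraction", bad_goal_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return agent_share * bad_goal_fraction


# Hypothetical placeholder inputs, chosen only to show the arithmetic.
example_share = share_to_bad_agent_goals(agent_share=0.5, bad_goal_fraction=0.5)
print(f"Share to bad AI agent goals: {example_share:.0%}")
```

Changing either input moves the result proportionally, which is why each factor gets its own set of considerations.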

7 of 21

1. How much new cognitive labor?

A lot more than human cognitive labor, eventually

So let’s ignore

8 of 21

2. How much cognitive labor is controlled by new AI agents?

A rough proxy:

2.a. How likely is an AI system to be an agent?

2.b. How much of its own cognition does an AI system ‘control’?

2.c. How much non-agentic AI cognitive labor will be controlled by AI systems?
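
One way the three sub-questions might be combined into the rough proxy, sketched under loud assumptions: 2.a is read as the fraction of AI cognitive labor that sits in agentic systems, 2.b as the fraction of that labor the agents actually control, and 2.c as the fraction of the remaining non-agentic labor that agents control. The slides do not give a formula; the function, parameter names, and numbers are hypothetical.

```python
def labor_controlled_by_agents(p_agent: float,
                               own_control: float,
                               nonagent_controlled: float) -> float:
    """Rough proxy for the share of AI cognitive labor controlled by AI agents.

    Assumed combination (not from the slides):
      agentic labor the agents control         = p_agent * own_control
      non-agentic labor controlled by agents   = (1 - p_agent) * nonagent_controlled
    """
    inputs = {"p_agent": p_agent,
              "own_control": own_control,
              "nonagent_controlled": nonagent_controlled}
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return p_agent * own_control + (1.0 - p_agent) * nonagent_controlled


# Hypothetical placeholder inputs, chosen only to show the arithmetic.
proxy = labor_controlled_by_agents(p_agent=0.5,
                                   own_control=0.5,
                                   nonagent_controlled=0.25)
print(f"Controlled by AI agents: {proxy:.1%}")
```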

9 of 21

2.a. How likely is a system to be an agent?

Considerations:

Economic forces (but just pressure to the middle?)
Spontaneous emergence
Coherence arguments
Copying humans
Agents are more efficient (so get more bang for their cognitive labor)
More agentic systems may outcompete less agentic

10 of 21

2.b. How much of its own cognition does an AI system ‘control’?

2.c. How much non-agentic AI cognitive labor will be controlled by AI systems?

11 of 21

3. What fraction of AI goals are bad?

A. Hard to find good goals
B. Easy to find bad yet appealing goals
C. Basically can’t give AI systems goals anyway

12 of 21

3.A. Why is it hard to find good goals? (Part 1)

1) Most goals are in conflict because of convergent incentives

2) Value space is large

a) Counter: so is car space

3) ‘Value is fragile’

a) Counter: is it?

Face from thispersondoesnotexist.com

13 of 21

3.A. Why is it hard to find good goals? (Part 2)

4) Against: short-term goals

a) Counter: selection effects

5) Against: why expect agents to learn the world really well but values really badly?

a) Counter: ‘sharp left turn’

14 of 21

3.B. Why is it easy to find appealing bad goals?

Externalities
Error/not trying

15 of 21

3.C. Why can’t we give AI systems goals anyway?

No known procedure
Reasons to expect procedures in general to not work

Finite training set is consistent with many functions
Deceptively aligned mesa-optimizers

16 of 21

3. What fraction of AI goals are bad?

A. Hard to find good goals
B. Easy to find bad yet appealing goals
C. Basically can’t give AI systems goals anyway

17 of 21

Argument from competent bad AI agents

Quantified (ish)

How much cognitive labor will be working toward bad futures vs. good futures?

1. How much new cognitive labor?
2. How much of it is held by AI agents?
3. What fraction of their goals involve bad futures?
4. What part of all this goes to good futures?

If more cognitive labor is working toward a bad future than a good future at some time, will the future be bad?

My overall guesses:

Share to bad AI agent goals: 14%

18 of 21

If more cognitive labor goes to bad future, will the future be bad?

Two scenarios: fast, slow

Different bad goals
Do other resources and powers matter? E.g. Money?

19 of 21

Considerations regarding the whole argument

Key concepts are vague
‘Multiple stage fallacy’
Biases in argument collection
Are these situations ‘existentially safe’?

20 of 21

How to guess?

(a guess)

Priors, arguments, people

(See wiki.aiimpacts.org for more argument detail)

21 of 21

Conclusions

Probability of doom
Some of these parameters can be changed
Space for improvement