1 of 16

Forming Inside Views on AI Safety

(Without Stress!)

2 of 16

What is an inside view?

Caricature:

  • Inside View = I have a deep argument from first principles about why AI Safety matters, and fully understand every single step and all of its implications.
  • Deferring = AI Safety matters because Eliezer Yudkowsky says so - I don't need to know anything more than that.

  • Both are obviously ridiculous
  • A fully “true” inside view is a ludicrous standard

3 of 16

Why I Disagree With This Caricature

  • Outside view
    • People who’ve thought about this for much longer than I have still disagree with each other
  • The world is complicated
    • Eg, understanding AI Timelines requires economics, AI hardware, international relations, tech financing, deep learning, politics, etc
    • We can never fully avoid deferring
  • It lies on a spectrum - it's good to move closer to an inside view, but know you'll never fully get there

4 of 16

What does an inside view look like?

  • Inside view = zooming in
  • Eg: It is valuable to work on reducing AGI x-risk
    1. AGI will happen in the next 50 years (>50% prob)
    2. If AGI is created, by default it will likely want to cause x-risk
    3. If AGI exists and wants to cause x-risk it will likely succeed
    4. There are actions we can take today that will make AGI x-risk less likely
  • Features:
    • Expand into sub-claims
    • Probabilistic (see the sketch below)
    • Progress!
    • But still has black boxes
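
A hedged sketch of the “Probabilistic” feature, assuming the four sub-claims above roughly chain together (each conditional on the previous ones holding): credence in the overall claim is then bounded by the product of the conditional credences. The decomposition and the example number are purely illustrative, not claims from the talk.

```latex
% Illustrative only (needs amsmath/amssymb): the four sub-claims as a chain of conditionals
\[
P(\text{valuable to work on AGI x-risk}) \lesssim
  P(\text{AGI within 50 yrs}) \cdot
  P(\text{wants x-risk} \mid \text{AGI}) \cdot
  P(\text{succeeds} \mid \text{wants x-risk}) \cdot
  P(\text{useful actions exist} \mid \text{all of the above})
\]
% e.g. if each factor were 0.8 (purely illustrative), the product is
% 0.8^4 = 0.4096 -- noticeably lower than any single factor
```

The point is just that confidence in a conjunction is lower than confidence in any single step, which is one reason to track the sub-claims (and the remaining black boxes) separately.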

5 of 16

Exercise 1: Practice Expanding (5 mins)

Pick a high-level question that feels important to you, and practice expanding it into sub-claims. See how far you can get.

Example Qs:

  • It is valuable to work on reducing AGI X-risk
  • A misaligned AGI could/couldn't cause x-risk
  • We will/won't get AGI by 2070
  • Deep Learning is/isn't sufficient to get AGI without further breakthroughs
  • Inner alignment is/isn't a big deal
  • Reducing AI x-risk is/isn't tractable
  • The world could/couldn't coordinate to not build AGI

6 of 16

Why Care?

  1. Truth-tracking
    • Surprisingly hard/overrated
  2. Motivation
    • Varies, but can be very important
  3. Research skill
    • Very important but not the same as truth-tracking
  4. Community epistemics
    • Information cascades = bad

7 of 16

Misconceptions

  • Cannot do anything until I have figured everything out
  • Need to figure this out urgently
  • Ought to be able to get there easily
  • I need to find the one true agenda/perspective
  • I can never defer to anyone on anything

8 of 16

How this hurt me

  • This caused me a lot of stress
  • Thought I needed to find the “one true agenda”
    • Before graduating!
  • Almost gave up on AI Safety
  • Turns out I can still do good research without a “true” inside view

9 of 16

Healthily Forming Inside Views

  • You don't have to form an inside view to work on AI Safety/think it's important
    • Comparative Advantage
  • It'll happen naturally over time
    • Most decisions are reversible
  • Inside views are on a continuum
  • Expect it to take a long time to do it "right"
    • PhDs take years

10 of 16

Concrete Actions

Getting started:

  • Read + Summarise
  • Talk + Paraphrase
  • Goal: To understand, not to agree
    • Then evaluate and critique

Improving:

  • Keep zooming in
  • Generate counter-arguments

Tip: Set a 5 minute timer!

11 of 16

Example: Zooming In

  • Claim: AGI will happen in the next 50 years (>50% prob) - arithmetic note below
    • An arbitrarily good language model is human-level intelligent
    • Current techniques will keep getting better with more compute + more data, because scaling laws
    • We will have enough compute to get there by 2070
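
An illustrative arithmetic note on the “>50% prob” target (the numbers are assumptions, not from the talk): if the claim rests on the sub-claims above roughly as a conjunction, each sub-claim needs credence well above 50% for the overall claim to clear the bar.

```latex
% Purely illustrative numbers: three sub-claims held at 0.8 each
\[ 0.8 \times 0.8 \times 0.8 = 0.512 > 0.5 \]
% whereas at 0.7 each, the (approximate) conjunction falls short of the bar
\[ 0.7 \times 0.7 \times 0.7 = 0.343 < 0.5 \]
```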

12 of 16

Example: Counter-Arguments

  • Claim: If AGI is created, by default it will likely want to cause x-risk
    • No economic incentive to create a dangerous system
    • We'll get warning shots
    • Alignment will be easy
    • AGI won't be an agent (ie it can't want things)

13 of 16

Exercise 2: Practice Improving (15 min)

  • Take your inside view from earlier
  • Practice one of the techniques
    • Zooming in
    • Generate counter-arguments
    • Read + Summarise

14 of 16

Tips

  • You have permission to disagree
  • Don't be a monk
  • Everything is on a spectrum
  • Intelligent deferring
  • Form deep inside views in specific domains
  • ML skill is neither necessary nor sufficient

15 of 16

Closing Thoughts

  • Inside views lie on a spectrum
    • A “true” inside view is impossible - don't be a perfectionist
    • Still worthwhile, strive to improve
  • Looks like iteratively zooming in
  • Concrete actions: read + summarise; talk + paraphrase; zoom in; generate counter-arguments
  • Takes time, don't be a monk
  • You don't have to form an inside view

16 of 16

Post-Talk

  • Main recommendation - practice!
  • Resources + Useful Links: bit.ly/insideviewresources
  • Experiment: Sign-Up to be put into self-organised discussion groups: bit.ly/insideviewdiscussion