1 of 16

Forming Inside Views on AI Safety

(Without Stress!)

2 of 16

What is an inside view?

Caricature:

  • Inside View = I have a deep argument from first principles about why AI Safety matters, and fully understand every single step and all of its implications.
  • Deferring = AI Safety matters because Eliezer Yudkowsky says so - I don't need to know anything more than that.

  • Both are obviously ridiculous
  • A fully “true” inside view is a ludicrous standard

3 of 16

Why I Disagree With This Caricature

  • Outside view
    • People who’ve thought about this for much longer than I have still disagree with each other
  • The world is complicated
    • Eg, understanding AI Timelines requires economics, AI hardware, international relations, tech financing, deep learning, politics, etc
    • We can never fully avoid deferring
  • It lies on a spectrum - it's good to move closer to an inside view, but know you'll never fully get there

4 of 16

What does an inside view look like?

  • Inside view = zooming in
  • Eg: It is valuable to work on reducing AGI x-risk
    1. AGI will happen in the next 50 years (>50% prob)
    2. If AGI is created, by default it will likely want to cause x-risk
    3. If AGI exists and wants to cause x-risk it will likely succeed
    4. There are actions we can take today that will make AGI x-risk less likely
  • Features:
    • Expand into sub-claims
    • Probabilistic (see the sketch below)
    • Progress!
    • But still has black boxes
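
A hedged sketch of the “Probabilistic” feature, assuming the four sub-claims above roughly chain together (each conditional on the previous ones holding): credence in the overall claim is then bounded by the product of the conditional credences. The decomposition and the example number are purely illustrative, not claims from the talk.

```latex
% Illustrative only (needs amsmath/amssymb): the four sub-claims as a chain of conditionals
\[
P(\text{valuable to work on AGI x-risk}) \lesssim
  P(\text{AGI within 50 yrs}) \cdot
  P(\text{wants x-risk} \mid \text{AGI}) \cdot
  P(\text{succeeds} \mid \text{wants x-risk}) \cdot
  P(\text{useful actions exist} \mid \text{all of the above})
\]
% e.g. if each factor were 0.8 (purely illustrative), the product is
% 0.8^4 = 0.4096 -- noticeably lower than any single factor
```

The point is just that confidence in a conjunction is lower than confidence in any single step, which is one reason to track the sub-claims (and the remaining black boxes) separately.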

5 of 16

Exercise 1: Practice Expanding (5 mins)

Pick a high-level question that feels important to you, and practice expanding it into sub-claims. See how far you can get.

Example Qs:

  • It is valuable to work on reducing AGI X-risk
  • A misaligned AGI could/couldn't cause x-risk
  • We will/won't get AGI by 2070
  • Deep Learning is/isn't sufficient to get AGI without further breakthroughs
  • Inner alignment is/isn't a big deal
  • Reducing AI x-risk is/isn't tractable
  • The world could/couldn't coordinate to not build AGI

6 of 16

Why Care?

  1. Truth-tracking
    • Surprisingly hard/overrated
  2. Motivation
    • Varies, but can be very important
  3. Research skill
    • Very important but not the same as truth-tracking
  4. Community epistemics
    • Information cascades = bad

7 of 16

Misconceptions

  • Cannot do anything until I have figured everything out
  • Need to figure this out urgently
  • Ought to be able to get there easily
  • I need to find the one true agenda/perspective
  • I can never defer to anyone on anything

8 of 16

How this hurt me

  • This caused me a lot of stress
  • Thought I needed to find the “one true agenda”
    • Before graduating!
  • Almost gave up on AI Safety
  • Turns out I can still do good research without a “true” inside view

9 of 16

Healthily Forming Inside Views

  • You don't have to form an inside view to work on AI Safety/think it's important
    • Comparative Advantage
  • It'll happen naturally over time
    • Most decisions are reversible
  • Inside views are on a continuum
  • Expect it to take a long time to do it "right"
    • PhDs take years

10 of 16

Concrete Actions

Getting started:

  • Read + Summarise
  • Talk + Paraphrase
  • Goal: To understand, not to agree
    • Then evaluate and critique

Improving:

  • Keep zooming in
  • Generate counter-arguments

Tip: Set a 5 minute timer!

11 of 16

Example: Zooming In

  • Claim: AGI will happen in the next 50 years (>50% prob) - arithmetic note below
    • An arbitrarily good language model is human-level intelligent
    • Current techniques will keep getting better with more compute + more data, because scaling laws
    • We will have enough compute to get there by 2070
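
An illustrative arithmetic note on the “>50% prob” target (the numbers are assumptions, not from the talk): if the claim rests on the sub-claims above roughly as a conjunction, each sub-claim needs credence well above 50% for the overall claim to clear the bar.

```latex
% Purely illustrative numbers: three sub-claims held at 0.8 each
\[ 0.8 \times 0.8 \times 0.8 = 0.512 > 0.5 \]
% whereas at 0.7 each, the (approximate) conjunction falls short of the bar
\[ 0.7 \times 0.7 \times 0.7 = 0.343 < 0.5 \]
```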

12 of 16

Example: Counter-Arguments

  • Claim: If AGI is created, by default it will likely want to cause x-risk
    • No economic incentive to create a dangerous system
    • We'll get warning shots
    • Alignment will be easy
    • AGI won't be an agent (ie it can't want things)

13 of 16

Exercise 2: Practice Improving (15 min)

  • Take your inside view from earlier
  • Practice one of the techniques
    • Zooming in
    • Generate counter-arguments
    • Read + Summarise

14 of 16

Tips

  • You have permission to disagree
  • Don't be a monk
  • Everything is on a spectrum
  • Intelligent deferring
  • Form deep inside views in specific domains
  • ML skill is neither necessary nor sufficient

15 of 16

Closing Thoughts

  • Inside views lie on a spectrum
    • A “true” inside view is impossible - don't be a perfectionist
    • Still worthwhile, strive to improve
  • Looks like iteratively zooming in
  • Concrete actions: read + summarise; talk + paraphrase; zoom in; generate counter-arguments
  • Takes time, don't be a monk
  • You don't have to form an inside view

16 of 16

Post-Talk

  • Main recommendation - practice!
  • Resources + Useful Links: bit.ly/insideviewresources
  • Experiment: Sign-Up to be put into self-organised discussion groups: bit.ly/insideviewdiscussion