| # | Shorthand | Summary | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 |
|---|---|---|---|---|---|---|---|---|---|---|
| | **Section A ("strategic challenges")** | | | | | | | | | |
| 1 | Human level is nothing special / data efficiency | AGI will not be upper-bounded by human ability or human learning speed (similarly to AlphaGo). Things much smarter than human would be able to learn from less evidence than humans require. | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree |
| 2 | Unaligned superintelligence could easily take over | A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure. | Agree | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree | Mostly agree |
| 3 | Can't iterate on dangerous domains | At some point there will be a 'first critical try' at operating at a 'dangerous' level of intelligence, and on this 'first critical try', we need to get alignment right. | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Unclear |
| 4 | Can't cooperate to avoid AGI | We can't just "decide not to build AGI". | Agree | Mostly agree | Mostly agree | Mostly agree | Unclear | Unclear | Mostly disagree | Mostly disagree |
| 5 | Narrow AI is insufficient | We can't just build a very weak system. | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree | Disagree | Disagree |
| 6 | Pivotal act is necessary | We need to align the performance of some large task, a 'pivotal act' that prevents other people from building an unaligned AGI that destroys the world. | Mostly agree | Unclear | Unclear | Unclear | Mostly disagree | Mostly disagree | Disagree | Disagree |
| 7 | There are no weak pivotal acts because a pivotal act requires power | It takes a lot of power to do something to the current world that prevents any other AGI from coming into existence; nothing which can do that is passively safe in virtue of its weakness. | Agree | Agree | Mostly agree | Unclear | Mostly disagree | Disagree | Disagree | Disagree |
| 8 | Capabilities generalize out of desired scope | The best and easiest-found-by-optimization algorithms for solving problems we want an AI to solve readily generalize to problems we'd rather the AI not solve. | Agree | Agree | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree |
| 9 | A pivotal act is a dangerous regime | The builders of a safe system would need to operate their system in a regime where it has the capability to kill everybody or make itself even more dangerous, but has been successfully designed to not do that. | Agree | Agree | Agree | Mostly agree | Mostly agree | Unclear | Mostly disagree | Disagree |
| | | | | | | | | | | |
| | **Section B.1: The distributional leap** | | | | | | | | | |
| 10 | Large distributional shift to dangerous domains | On anything like the standard ML paradigm, you would need to somehow generalize optimization-for-alignment you did in safe conditions, across a big distributional shift to dangerous conditions. | Agree | Agree | Agree | Agree | Agree | Agree | Mostly agree | Disagree |
| 11 | Sim to real is hard | There's no known case where you can entrain a safe level of ability on a safe environment where you can cheaply do millions of runs, and deploy that capability to save the world. | Agree | Agree | Agree | Agree | Agree | Agree | Unclear | Mostly disagree |
| 12 | High intelligence is a large shift | Operating at a highly intelligent level is a drastic shift in distribution from operating at a less intelligent level. | Agree | Agree | Agree | Agree | Agree | Agree | Unclear | Mostly disagree |
| 13 | Some problems only occur above an intelligence threshold | Many alignment problems of superintelligence will not naturally appear at pre-dangerous, passively-safe levels of capability. | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree | Unclear | Mostly disagree |
| 14 | Some problems only occur in dangerous domains | Some problems seem like their natural order of appearance could be that they first appear only in fully dangerous domains. | Agree | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree |
| 15 | Capability gains from intelligence are correlated | Fast capability gains seem likely, and may break lots of previous alignment-required invariants simultaneously. | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Unclear | Unclear | Mostly disagree |
| | | | | | | | | | | |
| | **Section B.2: Central difficulties of outer and inner alignment** | | | | | | | | | |
| 16 | Inner misalignment | Outer optimization even on a very exact, very simple loss function doesn't produce inner optimization in that direction. | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Unclear |
| 17 | Can't control inner properties | On the current optimization paradigm there is no general idea of how to get particular inner properties into a system, or verify that they're there, rather than just observable outer ones you can run a loss function over. | Agree | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly disagree | Mostly disagree |
| 18 | No ground truth | There's no reliable Cartesian-sensory ground truth (reliable loss-function-calculator) about whether an output is 'aligned'. | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree |
| 19 | Pointers problem | There is no known way to use the paradigm of loss functions, sensory inputs, and/or reward inputs to optimize anything within a cognitive system to point at particular things within the environment. | Agree | Agree | Agree | Agree | Agree | Unclear | Unclear | Unclear |
| 20 | Flawed human feedback | Human raters make systematic errors: regular, compactly describable, predictable errors. | Agree | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree | Unclear |
| 21 | Capabilities go further | Capabilities generalize further than alignment once capabilities start to generalize far. | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Unclear | Mostly disagree |
| 22 | No simple alignment core | There is a simple core of general intelligence, but there is no analogous simple core of alignment. | Agree | Agree | Agree | Mostly agree | Mostly agree | Unclear | Unclear | Mostly disagree |
| 23 | Corrigibility is anti-natural | Corrigibility is anti-natural to consequentialist reasoning. | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree |
| 24 | Sovereign vs corrigibility | There are two fundamentally different approaches you can potentially take to alignment [a sovereign optimizing CEV or a corrigible agent], which are unsolvable for two different sets of reasons. Therefore, by ambiguating between the two approaches, you can confuse yourself about whether alignment is necessarily difficult. | Agree | Mostly agree | Unclear | Unclear | Unclear | Mostly disagree | Mostly disagree | Mostly disagree |
| | | | | | | | | | | |
| | **Section B.3: Central difficulties of sufficiently good and useful transparency / interpretability** | | | | | | | | | |
| 25 | Real interpretability is out of reach | We've got no idea what's actually going on inside the giant inscrutable matrices and tensors of floating-point numbers. | Agree | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Unclear | Unclear |
| 26 | Interpretability is insufficient | Knowing that a medium-strength system of inscrutable matrices is planning to kill us does not thereby let us build a high-strength system that isn't planning to kill us. | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Unclear |
| 27 | Selecting for undetectability | Optimizing against an interpreted thought optimizes against interpretability. | Agree | Agree | Agree | Mostly agree | Mostly agree | Unclear | Unclear | Unclear |
| 28 | Large option space | A powerful AI searches parts of the option space we don't, and we can't foresee all its options. | Agree | Agree | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree |
| 29 | Real world is an opaque domain | AGI outputs go through a huge opaque domain before they have their real consequences, so we cannot evaluate consequences based on outputs. | Agree | Agree | Agree | Agree | Mostly agree | Mostly agree | Unclear | Mostly disagree |
| 30 | Powerful vs understandable | No humanly checkable output is powerful enough to save the world. | Agree | Agree | Unclear | Mostly disagree | Disagree | Disagree | Disagree | Disagree |
| 31 | Hidden deception | You can't rely on behavioral inspection to determine facts about an AI which that AI might want to deceive you about. | Agree | Agree | Agree | Agree | Agree | Agree | Unclear | Unclear |
| 32 | Language is insufficient or unsafe | Imitating human text can only be powerful enough if it spawns an inner non-imitative intelligence. | Agree | Agree | Agree | Mostly agree | Unclear | Unclear | Mostly disagree | Mostly disagree |
| 33 | Alien concepts | The AI does not think like you do; it is utterly alien on a staggering scale. | Agree | Agree | Mostly agree | Unclear | Unclear | Unclear | Unclear | Disagree |
| | | | | | | | | | | |
| | **Section B.4: Miscellaneous unworkable schemes** | | | | | | | | | |
| 34 | Multipolar collusion | Humans cannot participate in coordination schemes between superintelligences. | Agree | Agree | Agree | Agree | Unclear | Unclear | Unclear | |
| 35 | Multi-agent is single-agent | Any system of sufficiently intelligent agents can probably behave as a single agent, even if you imagine you're playing them against each other. | Agree | Mostly agree | Unclear | Unclear | Unclear | Unclear | Mostly disagree | |
| 36 | Human flaws make containment difficult | Only relatively weak AGIs can be contained; the human operators are not secure systems. | Agree | Agree | Agree | Agree | Agree | Agree | Mostly agree | |
| | | | | | | | | | | |
| | **Section C ("civilizational inadequacy")** | | | | | | | | | |
| 37 | Optimism until failure | People have a default assumption of optimism in the face of uncertainty, until encountering hard evidence of difficulty. | Agree | Agree | Agree | Mostly agree | Mostly agree | Mostly agree | Mostly disagree | |
| 38 | Lack of focus on real safety problems | The AI safety field is not being productive on the lethal problems. The incentives are for working on things where success is easier. | Mostly agree | Mostly agree | Mostly agree | Unclear | Unclear | Mostly disagree | Mostly disagree | |
| 39 | Can't train people in security mindset | This ability to "notice lethal difficulties without Eliezer Yudkowsky arguing you into noticing them" is currently an opaque piece of cognitive machinery to me; I do not know how to train it into others. | Mostly agree | Unclear | Unclear | Unclear | Mostly disagree | Mostly disagree | Disagree | |
| 40 | Can't just hire geniuses to solve alignment | You cannot just pay $5 million apiece to a bunch of legible geniuses from other fields and expect to get great alignment work out of them. | Agree | Mostly agree | Mostly agree | Mostly agree | Unclear | Unclear | Unclear | |
| 41 | You have to be able to write this list | Reading this document cannot make somebody a core alignment researcher; you have to be able to write it. | Mostly agree | Mostly agree | Mostly agree | Mostly agree | Unclear | Mostly disagree | Disagree | |
| 42 | There's no plan | Surviving worlds probably have a plan for how to survive by this point. | Unclear | Unclear | Unclear | Mostly disagree | Disagree | Disagree | Disagree | |
| 43 | Unawareness of the risks | Not enough people have noticed or understood the risks. | Agree | Mostly agree | Mostly agree | Unclear | Unclear | Unclear | Disagree | |