README: This sheet is intended to give the reader an exhaustive overview of how different risk models consider the AI alignment problem. If you are a contributor, please follow the conventions below.
DISCLAIMER: This sheet reflects the personal opinions of the author; hopefully they correspond somewhat to reality, but keep in mind it is not the ground truth. Other confounding factors could be: TODO
READING CONVENTIONS: I tried to represent each author's views to the best of my understanding. "-" means the author does not provide information about that variable. [blank] means I haven't found anything yet, but I am not confident the author has no opinion (or I simply haven't gotten around to filling in that cell yet).
Explanations
Where is the misalignment: in the developer, the given goal or the instantiation?
How did it come to this? Technical causes: what happened in the code. Social causes: what allowed bad code to be.
What is the AI capable of?
What is the relevant timeframe? Should we take action now, consider the possibility, or leave that task to our successors?
How serious is the risk?
What does destroying the world look like?
What can we learn from this risk model?
WRITING CONVENTIONS: Do not edit text directly. If you disagree with the value of a cell, leave a comment proposing a new value and explaining your position. If you want to make a minor amendment, make the cell yellow or orange, depending on the importance of the change. If you want to make a major amendment (e.g. the current value is misleading for an experienced reader), make the cell red. Try to keep comments about the same cell in a single thread. If you want to add a column or make a change of the same scale, mark the topmost cell of the relevant column (the same column for a split, the column to its right for a new category) in magenta. If you disagree with an explanation or want to add to one, leave a comment and mark the explanation in blue.
Type of misalignment | Sources of misalignment | Capabilities | Timelines | Gravity | Takeoff parameters | Takeaways
Typical use case: You have a potential solution. You want to think about it.
Brief summary
The developers have bad goals (and the AI is at least somewhat aligned with this badness)
The AI is unaligned with its given goal
The AI is given a bad goal (unaligned with the developers)
Achieving the given goal is best done using distasteful strategies
The AI finds a goal during training which is not the correct one
This cognitive pattern is self-reinforcing
To what extent it can influence the physical world
Ability to think well, fast, much, and widely
Propensity to exist, physical requirements
Propensity to have an objective (TODO: distinguish behavioural and cognitive agency)
Propensity of the failure scenario to be impossible to recover from - by default, most are irreversible, and that is often considered a requirement for existential risks
Propensity of the relevant AIs to be similar (TODO: distinguish algorithmic, computational, behavioral similarity)
TODO: expand, fill
- Consider which scenario it might prevent (find the proper row)
Intent | Inner | Outer | Technical causes | Social causes | Power | Cognition | Existence | Agency | Existential | Irreversibility
Similarity between AIs
Interaction between AIs | Speed | Warning shots
- Consider what dimensions of the scenario your solution modifies (columns)
Specification gaming
Goal misgeneralization | Daemons | Bad actors | Coordination | Generality | Research | Planification | Awareness | Quantity | Type | Fast | Discontinuous | Concentrated
Intelligence explosion
RSI
- See in what measure the scenario is improved.
Instrumental goals
Polarity | AI research
Hopefully, this will remind you of dimensions you might otherwise overlook, and of scenarios your solution does not apply to (a rough sketch of this workflow in code follows below).
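As an illustration only (not part of the sheet), here is a minimal sketch of that workflow in Python, assuming one hypothetical way to encode a row. The names RiskModel and relevant_models, the dimension labels, and the example values are made up for this sketch; the values are only loosely transcribed from the Carlsmith row further down, which remains the authoritative source.

# Illustrative sketch only: one hypothetical way to encode a row of this
# sheet so that a proposed solution can be checked against each scenario.
# Names and values are examples, not the sheet's canonical schema.
from dataclasses import dataclass, field

@dataclass
class RiskModel:
    name: str       # row label, e.g. the paper or post title
    summary: str    # the "Brief summary" cell
    # column header -> the author's stance ("Yes", "Unnecessary", "Likely", "-")
    dimensions: dict = field(default_factory=dict)

# Example row, loosely transcribed from the Carlsmith entry below.
carlsmith = RiskModel(
    name="Carlsmith - Is Power-seeking AI an existential risk?",
    summary="Power-seeking AIs take over the world.",
    dimensions={
        "Inner": "Yes",
        "Agentic planning": "Necessary",
        "Strategic awareness": "Necessary",
        "Warning shots": "Not really the point",
    },
)

def relevant_models(solution_dimensions, models):
    """Steps 1-2 above: keep the rows whose scenario touches at least one
    dimension (column) that the proposed solution modifies."""
    return [m for m in models if solution_dimensions & m.dimensions.keys()]

if __name__ == "__main__":
    # A solution that constrains agentic planning is at least relevant to the
    # Carlsmith scenario; step 3 (how much it actually helps) stays manual.
    for model in relevant_models({"Agentic planning"}, [carlsmith]):
        print(model.name)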
Brief summary
Many minds happen to stumble upon these goals - because they're generally useful
The AI stumbled upon this goal and won't let go of it
Only one AI is relevant
Multiple AIs are relevant
Very many AIs must be taken into consideration
E.g. AI arms race
Non-agentic systemic dynamics lead to poor outcomes - all the other coordination problems. A catch-all term
The AI is able to grow in power by itself
The AI has been granted power by its makers
The AI is relevant to a wide variety of tasks
The AI is specialized to a single (class of) task(s)
The AI can make better AIs than itself
The AI can make long-term plans
The AI can make plans in order to achieve objectives
The AI understands information relevant to its own abilities, available actions and plans
The AI understands information relevant to its own existence, physical instantiation, development context
The AI can be easily duplicated
The AI requires a certain amount of computational power
The AI requires a certain amount of memory
The AI is very agentic
The AI is not agentic
Humanity suffers a lot, or maximally (e.g. prolonged life in misery)
Humanity no longer exists
Humans have no power to steer their future
Humanity does not achieve what it could (e.g. never colonizing space, never breaking out of the matrix, never uploading - conditional on those being possible)
Propensity of the relevant AIs to look alike - maximal for a singleton
Number of relevant AIs - only relevant for a specific kind of homogeneity
Convergent instrumental goals
Crystallized proxies
Uni | Some | Very multi
Competitive pressure
Moloch | Acquiring power
Put in position of power
Much | Expert
Self-improvement - RSI
Long-term planning
Agentic planning
Strategic awareness
Situational awareness
Duplicability
Computation speed
Memory size | Much | None | Short (<10y) | Medium (<50y) | Long (>50y) | S-Risk | Annihilation
Disempowerment
Loss of potential | Homogeneity | Cooperation | Coordination | Competition | Adversariality
RSI
Risk model | Brief summary
Sources: TODO
Literature reviews: Clarifying AI X-risk (DeepMind) ...
Additional ideas for relevant variables: Distinguishing AI takeover scenarios (authors), https://www.lesswrong.com/posts/3DFBbPFZyscrAiTKS/my-overview-of-the-ai-alignment-landscape-threat-models (Neel Nanda) ...
Carlsmith - Is Power-seeking AI an existential risk?
Power-seeking AIs take over the world. The report is quite exhaustive and detailed.
Possible | Yes | Unnecessary | Agnostic | Sufficient | Irrelevant | Facilitates | - | Necessary, likely | Likely | Irrelevant | Possible | Necessary | Necessary | Necessary | Likely | Irrelevant | Yes | Unlikely | (>10% of it happening by 2070) | Yes | Yes | Likely | Irrelevant | Not really the point | Heh
*Read the report and learn.* Be careful of power-seeking AIs; there's a lot to unpack.
Christiano1 - You get what you measure
We make AIs to optimize measures. The measures are optimized. The measures were poor proxies of our desires.
Yes | Unnecessary | Likely | Yes | No
No (Yes, at the human level)
No | Possible | No | No | Yes | - | Yes | Unnecessary | Yes | Unnecessary | Possible | Unnecessary | Likely | No (Possible?) | Unlikely | Yes | Yes
Many (behavioral)
No | Yes
Christiano2 - Influence-seeking behaviour is scary
AIs that try to acquire power work better than others. They acquire power. This is bad.
No | Yes | Unnecessary | Possible | Yes | - | Possible | No | No | Possible? | Yes | - | Very yes | Yes | Possible | Possible? | - | Yes | Unlikely | No? | Possible | Yes | Yes
Quite (behavioral)
Many (behavioral)
No
Hubinger - How likely is deceptive alignment?
The only AIs we allow are those that look good. Pretending to be good is easier than being good. So they aren't good.
No | Yes | Unnecessary
No (but "gaming the training signal")
Unnecessary | Yes | - | Irrelevant | Not necessary | Irrelevant
Relevant (high-path dependence)
Irrelevant | Necessary | No | - | 1
AIs might pretend to be aligned. -> security mindset?
Distinguish high and low path-dependence.
Critch1 - Production Web
AI enables processes to be efficient, regardless of instantiation. For instance, companies become productive regardless of their specific goals; those productive goals consume resources vital for humanity's survival. Slow takeoff.
Unnecessary | Agnostic | - | Unnecessary | No | Possible? | Yes | Aye | Very yes | Yes | Possible | Unnecessary | - | - | Unnecessary?
Unnecessary (Moloch does it)
- | Yes | - | Unnecessary | Possible | Unlikely | No | As it happens | As a start
Quite (behavioral)
Much (behavioral)
Yes | - | Yes
Identify control loops as points of leverage to prevent a Critch scenario.
Critch2 - Flash Economy
AI enables processes to be efficient, regardless of instantiation. For instance, companies become productive regardless of their specific goals; those productive goals consume resources vital for humanity's survival. Fast takeoff.
Id. | Yes | - | Likely | Unnecessary? | Yes | - | Very yes | - | Unnecessary | Possible?
(A couple of years after the start of the scenario)
No | No | As it happens | As a start
Much (algorithmic)
Many (algorithmic)
Possible | Opposite
Cohen et al. - Advanced artificial agents intervene in the provision of reward
Advanced AI strives to wirehead itself. Catastrophic consequences ensue.
No | Yes | No | Yes | No | In some manner | No | No | - | Irrelevant | Unnecessary | Unnecessary | Yes
At least somewhat
Very | Irrelevant | Necessary | No | - | No | -
Yes, at least over the AI
- | (Singleton) | 1 | -
Beware trying to tame something that is more intelligent than you.
Soares - A central AI alignment problem: capabilities generalization, and the sharp left turn