Reviewer responses to Joe Carlsmith's report "Is Power-Seeking AI an Existential Risk?". Each entry below gives the reviewer's main objections; their probability estimates for the report's premises (Timelines, Incentives, Alignment difficulty, High-Impact Failures, Disempowerment, Catastrophe); their overall ~p(doom by 2070), not necessarily calculated from the previous premises; and a link to the full review.

Reviewer: Aschenbrenner
Main objections: Longer timelines. Correction is relatively easy absent foom or long-term correlated deception, given multi-polarity, cybersecurity, compute control, and help from AI and other new tech. Catastrophe unlikely conditional on no correction, because values of uncorrected systems are probably only subtly misaligned, and might be better than human values anyway.
Estimates: Timelines 25% | Incentives 80% | Alignment difficulty 40% | High-Impact Failures 65% | Disempowerment 20% | Catastrophe 50% | Overall ~p(doom by 2070) 0.5%
Link to review: https://docs.google.com/document/d/1fGOPLuDUX9HlCsf07hitdTqp1DNffP0fWzMRH_BzSfA/edit

Reviewer: Garfinkel
Main objections: Longer timelines. Wants more clarity about the role of planning in the argument, and about the difference that modularity might make. Skepticism about the instrumental convergence hypothesis as stated. Skepticism about the analogy with human evolution. Wary of reliance on the notion of "objectives." Plausible that it will be easy to select for non-power-seeking behavior in practice.
Estimates: Timelines 30% | Incentives 80% | Alignment difficulty 15% | High-Impact Failures 50% | Disempowerment 30% | Catastrophe 75% | Overall ~p(doom by 2070) 0.4%
Link to review: https://docs.google.com/document/d/1FlGPHU3UtBRj4mBPkEZyBQmAuZXnyvHU-yaH-TiNt8w/edit#heading=h.mxr5xtjsd8n2

Reviewer: Kokotajlo
Main objections: Timelines median is 2030. Incentives highly likely. Alignment likely to be difficult. Lots of optimism comes down to (or correlates with) warning shots, and there are many ways to not get suitable warning shots. Pessimism about adequacy of civilizational response. Disempowerment very likely conditional on high-impact failures. Multi-stage fallacy worries. Argument from minimal deference to pessimists.
Estimates: Timelines 77% | Incentives 90% | Alignment difficulty 75% | High-Impact Failures 90% | Disempowerment 95% | Catastrophe 98% | Overall ~p(doom by 2070) 65%
Link to review: https://docs.google.com/document/d/1GwT7AS_PWpglWWrVrpiMqeKiJ_E2VgAUIG5tTdVhVeM/edit#heading=h.rq4krnj82zba

Reviewer: Levinstein
Main objections: Wants more attention to the balance of power between aligned and misaligned AIs; to the possibility that the AIs get power by humans giving it to them voluntarily; to the non-intelligence advantages humans might have; and to the fact that humans aren't especially power-seeking.
Estimates: Timelines 75% | Incentives 65% | Alignment difficulty 65% | High-Impact Failures 80% | Disempowerment 50% | Catastrophe 90% | Overall ~p(doom by 2070) 12%
Link to review: https://docs.google.com/document/d/18jaq4J48UUoff7fJ4PQZ8Dy458ZcpF36fVkLclYKjeI/edit#heading=h.rq4krnj82zba

Reviewer: Lifland
Main objections: Wants more focus on "win conditions" rather than avoiding catastrophe, and thinks the catastrophe focus leads to underweighting of "many actors" problems (and of the plausibility of more extreme probabilities more generally). Thinks that the premises don't cover some scenarios (e.g., where systems that aren't superficially attractive to deploy get deployed). Wants more clarity about what strategic awareness is. Higher probability on incentives, alignment difficulty, and post-deployment catastrophe.
Estimates: Timelines 65% | Incentives 90% | Alignment difficulty 75% | High-Impact Failures 90% | Disempowerment 80% | Catastrophe 95% | Overall ~p(doom by 2070) 30% (~35-40% for any AI-mediated existential catastrophe)
Link to review: https://docs.google.com/document/d/1nCqjJXydfPQGRTKT71jQn8yXi4FSHnVSdfZbhQUMa1I/edit#heading=h.twyi0dkec6u3

Reviewer: Nanda
Main objections: Wanted more timelines discussion. Deployment discussion too focused on rational-agent cost-benefit analysis, and not enough on biases. Better to split out the warning shots variable explicitly.
Estimates: Timelines 65% | Incentives 50% | Alignment difficulty 70% | High-Impact Failures 63% | Disempowerment 64% | Catastrophe 98% | Overall ~p(doom by 2070) 9%
Link to review: https://docs.google.com/document/d/1Da5ErtIzgkGv8JMh59SQcCmcaSx-83OpGrm7Jh7xxPA/edit#heading=h.rq4krnj82zba

Reviewer: Soares
Main objections: Shorter timelines. Expects incompetence in civilization's response, such that alignment need not be all that hard for us to fail with high probability, and we wouldn't respond adequately to warning shots if we got them. Narrow band of capability required for high-impact but still correctable warning shots. Possible background disagreements about ease of decisive strategic advantage, and plausibility of rapid/discontinuous take-off. Multi-stage fallacy worries. Argument for survival should be conjunctive rather than disjunctive.
Estimates (given in combined form rather than per column): Timelines 85%; 95% on a premise like "Alignment Difficulty", but conditioned only on Timelines; 95% on existential catastrophe from deployed, practically PS-misaligned APS systems, conditional on the first two premises; Overall ~p(doom by 2070) >77%
Link to review: https://www.lesswrong.com/posts/cCMihiwtZx7kdcKgt/comments-on-carlsmith-s-is-power-seeking-ai-an-existential

Reviewer: Tarsney
Main objections: Somewhat more optimistic about avoiding high-impact failures. Plausible that in disempowerment scenarios, humans should have empowered the systems but didn't.
Estimates: Timelines 55% | Incentives 80% | Alignment difficulty 50% | High-Impact Failures 40% | Disempowerment 50% | Catastrophe 80% | Overall ~p(doom by 2070) 3.5%
Link to review: https://docs.google.com/document/d/1-6Wqr4Rb87chhlePct8k8fSE7kClGQMlOiEUotyGgBA/edit#heading=h.rq4krnj82zba

Reviewer: Thorstad
Main objections: Wants more clarity about agentic planning, and about the required scope of strategic awareness. Wants more positive argument for the instrumental convergence claim. Skeptical of permanent disempowerment.
Estimates: Timelines 75% | Incentives 20% | Alignment difficulty 0.1% | High-Impact Failures 30% | Disempowerment 0.5% | Catastrophe 98% | Overall ~p(doom by 2070) 0.00002%
Link to review: https://docs.google.com/document/d/1dZa9pWJgMmebGHrwrXUz3UNuKft5Tq_dIMQ4TZ0K4yA/edit#heading=h.rq4krnj82zba

Reviewer: Wallace
Main objections: Longer timelines. Suspects that understanding of AI goals/motivations is closely connected with understanding how to make APS systems at all. Skeptical we will be teaching AI goals via anything as crude as training data.
Estimates: Timelines 10% | Incentives 80% | Alignment difficulty 30% | High-Impact Failures 95% | Disempowerment 75% | Catastrophe N/A | Overall ~p(doom by 2070) 2%
Link to review: https://docs.google.com/document/d/1kyZPqbwxmBO5NuPWiuSByPnkKiq96eRo/edit#heading=h.gjdgxs

Reviewer: Anon 1
Main objections: Wants simpler argument structure and less detail/precision. Demand for ML solutions might slow down. Comparison with climate change is misleading because many more actors are in a position to emit than will be in a position to build APS-AI. Optimism about warning shots and feedback loops preventing high-impact failures. Possible that better control over AI decision making is required for increasing capabilities.
Estimates: Timelines 90% | Incentives 50% | Alignment difficulty 80% | High-Impact Failures 10% | Disempowerment 90% | Catastrophe 70% | Overall ~p(doom by 2070) 2%
Link to review: https://docs.google.com/document/d/1yFVpxbIISUxKLTOExAIYQDd12hfsrjhyJKViwjqYW2g/edit#heading=h.rq4krnj82zba

Reviewer: Anon 2
Main objections: Framing neglects correlations between existential catastrophe from human misuse of AI, and existential catastrophe from misaligned AI. Hesitant about definition of APS systems. Wants more discussion of present/future tech ecosystems. Warning shots and corrective feedback loops might stem from human misuse.
Estimates: Timelines >99% (possibly >0.99%; some ambiguity in the doc) | Incentives 50% | Alignment difficulty N/A | High-Impact Failures >80% | Disempowerment <<1% | Catastrophe <<0.1% (possible misinterpretation of the premise?) | Overall ~p(doom by 2070) <0.001%
Link to review: https://docs.google.com/document/d/1AVXZAthlMrovbzyf_CgTUOET6kDzs1BHNYIGpTQ_T1g/edit#heading=h.rq4krnj82zba

Reviewer: Carlsmith (for reference)
Main objections: Original author.
Estimates: Timelines 65% | Incentives 80% | Alignment difficulty 40% | High-Impact Failures 65% | Disempowerment 40% | Catastrophe 95% | Overall ~p(doom by 2070) 5%
Original report: https://docs.google.com/document/d/1smaI1lagHHcrhoi6ohdq3TYIZv0eNWWZMPEy8C8byYg/edit#heading=h.k655kzng3r22

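Because the report treats its premises as a chain of conditionals, multiplying across a row gives the overall probability that a reviewer's premise estimates would imply; as noted above, the stated overall figures were not necessarily calculated this way. As a minimal illustrative sketch (not any reviewer's actual method), the Python snippet below multiplies out Carlsmith's own row, which lands near his stated 5%:

```python
# Sketch: treat the six premise probabilities as chained conditionals and
# multiply across a row to get the implied overall probability. The stated
# overall figures in the table were not necessarily computed this way.
premises = [
    "Timelines",
    "Incentives",
    "Alignment difficulty",
    "High-Impact Failures",
    "Disempowerment",
    "Catastrophe",
]

# Carlsmith's own row, for reference.
carlsmith_row = [0.65, 0.80, 0.40, 0.65, 0.40, 0.95]

implied = 1.0
for name, p in zip(premises, carlsmith_row):
    implied *= p
    print(f"... and {name}: {implied:.4f}")

print(f"Implied overall: {implied:.1%}")  # ~5.1%, close to the stated 5%
```

Swapping in another row shows where a stated overall figure departs from the straight product: Kokotajlo's row, for example, multiplies to roughly 44%, below his stated 65%, consistent with his multi-stage fallacy worries.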