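| Theory # | Name | Mean | Std dev | Neel Nanda | Evan Hubinger | Adam Scherlis | Victoria Krakovna | Janos Kramar | Dane Sherburn | Charlie Steiner | Nick Turner | Ramana Kumar | Simeon Campos | Aron Malmborg | Jenny Nitishinskaya |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Force multiplier | 6.7 | 1.67 | 8 | 8 | 8 | 8 | 3 | 7 | 5 | 8 | 6 | 6 | 8 | 5 |
| 6 | Threat model evidence | 6.5 | 1.88 | 8.5 | 6 | 4 | 9 | 8 | 7 | 5 | 9 | 5 | 5 | 7 | 4 |
| 18 | Intervening on training | 6.4 | 1.52 | 7.5 | 8 | 4 | 6 | 6 | 8 | 6 | 5 | 6 | 8 | 8 | 4 |
| 4 | Auditing for deception | 6.3 | 1.67 | 9 | 5 | 4 | 8 | 7 | 5 | 6 | 6 | 5 | 7 | 9 | 5 |
| 7 | Improving feedback | 6.3 | 1.96 | 6 | 8 | 8 | 4 | 4 | 9 | 8 | 6 | 3 | 6 | 8 | 5 |
| 3 | Auditing | 6.0 | 2.07 | 9 | 5 | 4 | 9 | 6 | 4 | 7 | 6 | 6 | 3.5 | 9 | 4 |
| 2 | Better prediction | 6.0 | 1.96 | 8 | 6 | 8 | 6 | 3 | 5 | 4 | 8 | 6 | 8.5 | 7 | 3 |
| 8 | Informed oversight | 5.8 | 1.80 | 8 | 7 | 4 | 7 | 4 | 8 | 4 | 4 | 5 | 8 | 7 | 4 |
| 12 | Cultural shift 1 | 5.7 | 1.29 | 4 | 6 | 4 | 5 | 7 | 7 | 6 | 7 | 6 | 7.5 | 5 | 4 |
| 17 | Forecasting discontinuities | 5.7 | 1.92 | 7 | 5 | 8 | 6 | 6 | 7 | 4 | 5 | 3 | 7 | 8 | 2 |
| 9 | In the loss function | 5.6 | 1.44 | 5 | 7 | 4 | 5 | 4 | 6 | 3 | 6 | 6 | 8 | 7 | 6 |
| 10 | Norm setting | 5.4 | 1.44 | 5 | 6 | 4 | 5 | 4 | 8 | 6 | 7 | 7 | 5 | 5 | 3 |
| 19 | Auditing a training run | 5.4 | 2.31 | 5.5 | 8 | 4 | 4 | 6 | 8 | 2 | 5 | 7 | 6 | 8 | 1 |
| 14 | Epistemic learned helplessness | 5.2 | 1.85 | 7.5 | 4 | 8 | 4 | 6 | 3 | 3 | 8 | 5 | 6 | 4 | 4 |
| 20 | ELK | 4.9 | 2.01 | | | | | | | | | | | | |
| 13 | Cultural shift 2 | 4.9 | 1.88 | 4 | 6 | 4 | 3 | 5 | 8 | 3 | 6 | 6 | 8 | 3 | 3 |
| 16 | Get AIs to do it | 4.9 | 1.83 | 6 | 7 | 8 | 5 | 7 | 3 | 4 | 3 | 5 | 4 | 5 | 2 |
| 5 | Enabling coordination | 4.8 | 1.80 | 6 | 3 | 4 | 4 | 4 | 4 | 9 | 6 | 7 | 4 | 4 | 3 |
| 15 | Microscope AI | 4.3 | 2.22 | 3 | 7 | 4 | 3 | 3 | 9 | 2 | 4 | 4 | 7 | 3 | 2 |
| 11 | Regulation | 4.1 | 1.51 | 6.5 | 3 | 4 | 3 | 3 | 4 | 2 | 5 | 7 | 5 | 4 | 3 |

ELK was rated by only 7 of the 12 raters; its scores, in the sheet's column order, were 6, 8, 3, 3, 5, 6.5 and 3.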
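The Mean and Std columns appear to be the plain mean and the sample (n − 1) standard deviation of each row's ratings, with blank cells ignored and halves rounded up. For example, Force multiplier's twelve ratings sum to 80, giving 80/12 ≈ 6.67 and a sample standard deviation of about 1.67, whereas the population formula would give about 1.60. The Python sketch below recomputes a few rows under that assumption. The `RATINGS` dictionary and the helper names are illustrative and not part of the original sheet; half-up rounding is assumed because rows such as Improving feedback (exact mean 6.25) show 6.3, while Python's built-in `round()` would round half to even.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean, stdev

# A few rows transcribed from the ratings table (illustrative subset).
# ELK lists only the 7 ratings it received; blank cells are omitted.
RATINGS = {
    "Force multiplier": [8, 8, 8, 8, 3, 7, 5, 8, 6, 6, 8, 5],
    "Improving feedback": [6, 8, 8, 4, 4, 9, 8, 6, 3, 6, 8, 5],
    "ELK": [6, 8, 3, 3, 5, 6.5, 3],
    "Microscope AI": [3, 7, 4, 3, 3, 9, 2, 4, 4, 7, 3, 2],
    "Regulation": [6.5, 3, 4, 3, 3, 4, 2, 5, 7, 5, 4, 3],
}


def round_half_up(x: float, places: str) -> Decimal:
    """Round halves upward like a spreadsheet, unlike Python's round()."""
    return Decimal(str(x)).quantize(Decimal(places), rounding=ROUND_HALF_UP)


def summarize(scores: list[float]) -> tuple[Decimal, Decimal]:
    """Return (mean to 1 dp, sample standard deviation to 2 dp)."""
    return round_half_up(mean(scores), "0.1"), round_half_up(stdev(scores), "0.01")


if __name__ == "__main__":
    # Rank rows by mean rating, highest first, as the table does.
    rows = sorted(RATINGS.items(), key=lambda kv: mean(kv[1]), reverse=True)
    for name, scores in rows:
        m, s = summarize(scores)
        print(f"{name:20} n={len(scores):2}  mean={m}  std={s}")
```

Each recomputed pair matches the corresponding Mean and Std cells; ELK's n=7 shows why its mean is taken over fewer ratings than the other rows.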