Parameter                                                  Your own   Other group's
Safety investment                                          0.5        1
Safety generalization                                      0.5        0.5
Available safety, total                                    1          1
P(win race)                                                1.000      0
P(doom | no alignment)                                     0.07       0.07
P(near lockin | no alignment)                              0.07       0.07
P(future safe | no alignment)                              0.86       0.86
P(mundane good | near lockin AI + control by this agent)   1          0.95
P(utopia | full alignment + control by this agent)         1          0.95

Head start (in safety units): 0
Dropoff P(win) with safety (scaling factor): 35

Notes:
- Safety generalization matters a lot.
- If high doom, even 0.1 generalization is enough.
- If 50% doom, hard to find parameters?
- Describe specific scenario: "if 50% doom, and slowing down makes you 1% safer, but 100% less likely to win arms race..."
- If they suck at utopia then you have a problem.
- Suppose any safety work you do transfers only 50% to the other player's AI systems; an unaligned AI has only a 7% chance of causing doom and a 7% chance of causing short-term lock-in of some kind of mediocre values; the other side has a 5% chance of locking in something of negative value in that case, and a 5% chance of destroying everything even with aligned AI; and moving from no safety to full safety takes you from a 50% chance of winning the race to a 0% chance. Then your best bet is still maximal safety. Why? Because getting the other side halfway to full safety is worth a lot more than the lost chance of winning, especially since if you win you do so fairly unaligned, and you unaligned < them aligned, even if them aligned is not perfect.
- Unrealistic, because it is possible for you to get to full safety even if they maximally race, i.e. there is no time component.
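The two derived rows above (available safety and P(win race)) can be approximated with a short sketch. The sheet's formulas are not visible in this export, so both functions are assumptions inferred from the displayed numbers: available safety is taken as own investment plus the generalized share of the other group's investment, capped at 1, and P(win race) is taken as a logistic function of the difference in safety investment, scaled by the dropoff factor. The function and argument names are hypothetical, and adding the head start to the own side is a guess that cannot be checked while it is 0.

```python
import math

def available_safety(own_investment, other_investment, generalization):
    """Own safety work plus the share of the rival's work that transfers (assumed cap at 1)."""
    return min(1.0, own_investment + generalization * other_investment)

def p_win_race(own_investment, other_investment, head_start=0.0, dropoff=35.0):
    """Assumed logistic form: investing more in safety than the rival lowers P(win)."""
    lead = other_investment - own_investment + head_start
    return 1.0 / (1.0 + math.exp(-dropoff * lead))

own, other = 0.5, 1.0  # the safety investments shown in the sheet
print(available_safety(own, other, 0.5))  # your own total
print(available_safety(other, own, 0.5))  # other group's total
print(round(p_win_race(own, other), 3))   # your own P(win race)
```

Under these assumptions, equal investments give a 50% chance of winning, and investing a full unit more than the rival drives it to roughly 0, which agrees with the scenario described in the notes.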
Outcomes given safety investments:

P(outcome|pure scenario):
Outcome        own win & no align   other win & no align   own win & align   other win & align
Death          0.07                 0.07                   0                 0
Mundane bad    0                    0.0465                 0                 0.05
Mundane good   0.07                 0.0665                 0                 0
Future safe    0.86                 0.817                  1                 0.95

P(outcome) given the chosen safety investments:
               If own win                        If other win                        Total
Outcome        P(outcome|own win & av. safety)   P(outcome|other win & av. safety)   P(outcome|chosen safety investments)
Death          0.000                             0.000                               0.000
Mundane bad    0.000                             0.050                               0.000
Mundane good   0.000                             0.000                               0.000
Future safe    1.000                             0.950                               1.000
Totals         1.000                             1.000                               1.000

Value and expected value (the last three columns are E(V|particular player winning)):
Outcome        Value (utility)   E(V) from scenarios   E(V_own_win)   E(V_other_win)   E(V_combined)
Death          0                 0                     0              0                0
Mundane bad    -0.05             0                     0              -0.0025          0
Mundane good   0.05              0                     0              0                0
Future safe    1                 0.9999999987          1              0.95             0.9999999987
Totals                           0.9999999987          1              0.9475           0.9999999987

Value of the future (fraction of max)
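As a rough check on the outcome tables, the sketch below rebuilds the pure-scenario columns from the parameters and recomputes the totals. The formulas are inferred from the displayed numbers, not taken from the sheet: the "no align" column is assumed to split near-lockin and future-safe mass by the controlling agent's P(mundane good) and P(utopia), and the outcome mix for a winner is assumed to interpolate linearly between the "no align" and "align" columns by available safety. All names are hypothetical.

```python
import math

VALUE = [0.0, -0.05, 0.05, 1.0]  # Death, Mundane bad, Mundane good, Future safe

def pure_scenarios(p_doom, p_lockin, p_safe, p_good, p_utopia):
    """Return (no_align, align) outcome distributions for one winner (assumed formulas)."""
    no_align = [
        p_doom,
        p_lockin * (1 - p_good) + p_safe * (1 - p_utopia),
        p_lockin * p_good,
        p_safe * p_utopia,
    ]
    align = [0.0, 1 - p_utopia, 0.0, p_utopia]
    return no_align, align

def mix(no_align, align, safety):
    """Assumed linear interpolation between the two pure scenarios by available safety."""
    return [(1 - safety) * n + safety * a for n, a in zip(no_align, align)]

def expected_value(dist):
    return sum(p * v for p, v in zip(dist, VALUE))

own_na, own_a = pure_scenarios(0.07, 0.07, 0.86, 1.0, 1.0)
oth_na, oth_a = pure_scenarios(0.07, 0.07, 0.86, 0.95, 0.95)

# Available safety is 1 for both players in the sheet's current state.
ev_own_win = expected_value(mix(own_na, own_a, 1.0))
ev_other_win = expected_value(mix(oth_na, oth_a, 1.0))

# Assumed logistic P(win race) with dropoff 35, own investment 0.5, other 1.
p_own_win = 1.0 / (1.0 + math.exp(-35.0 * (1.0 - 0.5)))
ev_combined = p_own_win * ev_own_win + (1 - p_own_win) * ev_other_win

print(f"{ev_own_win:.4f} {ev_other_win:.4f} {ev_combined:.10f}")
```

With the sheet's parameters this should agree with the Totals row to the displayed precision, which is some evidence (though not proof) that the inferred formulas are close to the sheet's own.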
Instructions:
Choose values for the pink cells to increase the yellow cell. Change the blue cells to try different scenarios.
Modify bold cells: like this, or this, or this.

Thesis: under plausible high x-risk scenarios you basically want your own alignment research to be maximally high.

Blue cells are general parameters of interest.
Grey cells are also modifiable parameters, because why not, but probably not of interest to modify.
Pink cells are for modifying to make choices for the players within the scenario (i.e. to play out a race).
Orange cells are intermediate values that vary according to your choices. Don't modify them directly.
The yellow cell is how good the future is, i.e. the goal is to modify the pink numbers to make the yellow number high.
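One way to probe the thesis outside the spreadsheet is to sweep your own safety investment and see where the expected value peaks. The sketch below is a compact stand-in for the sheet, not its actual formulas: it assumes a logistic P(win race), available safety capped at 1, and linear interpolation between the unaligned and aligned outcome columns. It also assumes the other group invests nothing in safety, which is a guess at the scenario behind the "halfway to full safety" note. All names are hypothetical.

```python
import math

VALUE = (0.0, -0.05, 0.05, 1.0)  # death, mundane bad, mundane good, future safe

def ev_if_wins(avail, p_good, p_utopia, p_doom=0.07, p_lockin=0.07, p_safe=0.86):
    """Expected value if this agent wins with `avail` available safety (assumed model)."""
    no_align = (p_doom,
                p_lockin * (1 - p_good) + p_safe * (1 - p_utopia),
                p_lockin * p_good,
                p_safe * p_utopia)
    align = (0.0, 1 - p_utopia, 0.0, p_utopia)
    return sum(((1 - avail) * n + avail * a) * v
               for n, a, v in zip(no_align, align, VALUE))

def expected_value(own, other, generalization=0.5, dropoff=35.0):
    """Combined expected value for a pair of safety investments in [0, 1]."""
    own_avail = min(1.0, own + generalization * other)
    other_avail = min(1.0, other + generalization * own)
    p_own_win = 1.0 / (1.0 + math.exp(-dropoff * (other - own)))
    return (p_own_win * ev_if_wins(own_avail, 1.0, 1.0)
            + (1 - p_own_win) * ev_if_wins(other_avail, 0.95, 0.95))

# Sweep own investment while the other group invests nothing (assumption).
grid = [i / 20 for i in range(21)]
best = max(grid, key=lambda s: expected_value(s, 0.0))
for s in (0.0, 0.5, 1.0):
    print(s, round(expected_value(s, 0.0), 5))
print("best own investment on this grid:", best)
```

Under these assumptions the curve is not monotone: a small safety investment costs most of the chance of winning before much safety has transferred, so it scores below investing nothing, while full investment scores highest. Changing the doom, generalization, or dropoff parameters is the natural next experiment.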