| Name | Game | Description | Average over final 5M steps | Stddev over final 5M steps |
|---|---|---|---|---|
| si-nature | Space Invaders | As of 6/10/2016 my best approx of nature. The new change is that the target network should be updated every 10k steps, not every 10k training batches | 1139 | 158 |
| si-nature-old | Space Invaders | As of 5/10/2016 my best approx of nature paper | 1091 | 163 |
| si-least-squares-loss | Space Invaders | This is like the nature paper (as best I can approximate on 5/10/16) but with NO weight normalization and a simple least squares loss function (meaning no loss clipping) | 874 | 73 |
| si-no-weight-normalization | Space Invaders | As of 5/10/16 this is my best approx of the nature paper MINUS weight normalization | 871 | 187 |
| si-3M-replay | Space Invaders | 3M replay, no weight normalization (otherwise my best approx of nature paper on 5/10/16) | 1039 | 183 |
| si-333K-replay | Space Invaders | 333,333 replay, no weight normalization (otherwise my best approx of nature paper on 5/10/16) | 861 | 165 |
| si-prioritized-replay-0.1 | Space Invaders | nature (incl normalized weights) + prioritized replay with 0.1 coefficient (aggressive prioritization). Cut short after 28M steps due to performance. | 1021 | 129 |
| si-prioritized-replay-1.0 | Space Invaders | nature (incl normalized weights) + prioritized replay with 1.0 coefficient (aggressive prioritization). Cut short after 28M steps due to performance. | 831 | 117 |
| si-prioritized-replay-2.0 | Space Invaders | nature (incl normalized weights) + prioritized replay with 2.0 coefficient (aggressive prioritization). Cut short after 28M steps due to performance. | 825 | 98 |
| si-prioritized-replay-4.0 | Space Invaders | nature (incl normalized weights) + prioritized replay with 4.0 coefficient (aggressive prioritization). Cut short after 28M steps due to performance. | 628 | 62 |
| bo-nature | Breakout | As of 5/20/2016 my best approx of nature paper | 284 | 78 |
| bo-4M-replay | Breakout | 4M replay capacity, otherwise just like Nature (as of 5/20/16) | 330 | 71 |
| si-prioritized-replay-penalized | Space Invaders | Like nature but with prioritized replay with coefficient of 3, and .8 scaling factor after every selection (to penalize selection) | 1018 | 155 |
| si-deepmind-prio-replay-partial | Space Invaders | Attempt to implement deepmind style prioritized replay based on delta Q values ("learning potential") but without importance sampling. Not clear I even got the first part implemented right. It was a quick hack | 656 | 129 |
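
The two numeric columns summarize scores over the final 5M steps of each run. A minimal sketch of how such a summary could be computed is below. It assumes a per-episode log of (cumulative step, score) pairs and uses the population standard deviation; the function name final_window_stats and the log layout are hypothetical and are not taken from the runs above.

```python
import math


def final_window_stats(episodes, window_steps=5_000_000):
    """Return (mean, stddev) of scores for episodes in the last `window_steps` steps.

    `episodes` is a list of (cumulative_step, score) tuples ordered by step.
    This layout is an assumption made for illustration only.
    """
    if not episodes:
        raise ValueError("episodes must not be empty")

    # Keep only episodes that finished inside the final window of the run.
    last_step = episodes[-1][0]
    cutoff = last_step - window_steps
    scores = [score for step, score in episodes if step > cutoff]

    mean = sum(scores) / len(scores)
    # Population variance (divide by N); a sample estimate would divide by N - 1.
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, math.sqrt(variance)


# Hypothetical log: only the last three episodes fall in the final 5M steps.
log = [(1_000_000, 100.0), (6_000_000, 200.0), (8_000_000, 400.0), (10_000_000, 600.0)]
mean, std = final_window_stats(log)
```

Whether the original figures use the population or the sample standard deviation is not stated, so the choice here is only one reasonable option.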