github/gtoubassi/dqn-atari results

Name	Game	Description	Average score over final 5M steps	Stddev of score over final 5M steps
si-nature	Space Invaders	As of 6/10/2016, my best approximation of the Nature paper. The new change: the target network is now updated every 10k steps, not every 10k training batches.	1139	158
si-nature-old	Space Invaders	As of 5/10/2016, my best approximation of the Nature paper.	1091	163
si-least-squares-loss	Space Invaders	Like the Nature paper (as best I could approximate it on 5/10/16), but with NO weight normalization and a simple least-squares loss function (i.e. no loss clipping).	874	73
si-no-weight-normalization	Space Invaders	As of 5/10/16, my best approximation of the Nature paper MINUS weight normalization.	871	187
si-3M-replay	Space Invaders	3M replay capacity, no weight normalization (otherwise my best approximation of the Nature paper on 5/10/16).	1039	183
si-333K-replay	Space Invaders	333,333 replay capacity, no weight normalization (otherwise my best approximation of the Nature paper on 5/10/16).	861	165
si-prioritized-replay-0.1	Space Invaders	Nature (incl. normalized weights) + prioritized replay with 0.1 coefficient (aggressive prioritization). Cut short after 28M steps due to performance.	1021	129
si-prioritized-replay-1.0	Space Invaders	Nature (incl. normalized weights) + prioritized replay with 1.0 coefficient (aggressive prioritization). Cut short after 28M steps due to performance.	831	117
si-prioritized-replay-2.0	Space Invaders	Nature (incl. normalized weights) + prioritized replay with 2.0 coefficient (aggressive prioritization). Cut short after 28M steps due to performance.	825	98
si-prioritized-replay-4.0	Space Invaders	Nature (incl. normalized weights) + prioritized replay with 4.0 coefficient (aggressive prioritization). Cut short after 28M steps due to performance.	628	62
bo-nature	Breakout	As of 5/20/2016, my best approximation of the Nature paper.	284	78
bo-4M-replay	Breakout	4M replay capacity, otherwise just like Nature (as of 5/20/16).	330	71
si-prioritized-replay-penalized	Space Invaders	Like Nature, but with prioritized replay with a coefficient of 3 and a 0.8 scaling factor applied after every selection (to penalize selection).	1018	155
si-deepmind-prio-replay-partial	Space Invaders	Attempt to implement DeepMind-style prioritized replay based on delta Q values ("learning potential") but without importance sampling. Not clear I even got the first part implemented right; it was a quick hack.	656	129
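
Several runs above use prioritized replay, but the table does not say exactly how the "coefficient" is applied. For reference, here is a minimal sketch of the proportional scheme from Schaul et al. (2015), which the si-deepmind-prio-replay-partial run appears to approximate. Each transition's priority is (|TD error| + eps)^alpha, and transitions are sampled with probability proportional to priority. Optional importance-sampling weights (N * P(i))^-beta correct the resulting bias. The class name ProportionalReplay and the alpha, eps, and beta values are assumptions for illustration, not taken from the dqn-atari code, and alpha is not necessarily the same quantity as the coefficients listed above. Passing beta=None skips the importance-sampling correction, which mirrors the "without importance sampling" run.

```python
import random


class ProportionalReplay:
    """Hypothetical sketch of proportional prioritized replay (Schaul et al. style).

    Stored priority: p_i = (|td_error| + eps) ** alpha
    Sampling probability: P(i) = p_i / sum_k p_k
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # assumed prioritization exponent
        self.eps = eps              # keeps zero-error transitions sampleable
        self.items = []
        self.priorities = []
        self.next_index = 0         # ring-buffer write position

    def add(self, transition):
        # New transitions get the current max priority so each is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next_index] = transition
            self.priorities[self.next_index] = p
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size, beta=None, rng=random):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # Sample indices with replacement, proportional to priority (O(N) per call).
        indices = rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        if beta is None:
            # No importance-sampling correction: every sample gets weight 1.
            weights = [1.0] * batch_size
        else:
            n = len(self.items)
            raw = [(n * probs[i]) ** (-beta) for i in indices]
            max_w = max(raw)
            # Normalize by the batch max so weights only scale the loss down.
            weights = [w / max_w for w in raw]
        return indices, [self.items[i] for i in indices], weights

    def update(self, indices, td_errors):
        # After a training step, refresh priorities from the new TD errors ("delta Q").
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = (abs(delta) + self.eps) ** self.alpha
```

Sampling here is O(N) per batch for readability. A buffer at the 333K to 4M capacities listed above would normally use a sum tree so that sampling and priority updates are O(log N).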