             Cobbe et al. 2019                 Cobbe et al. 2020
             (using openai/baselines’ PPO)     (re-refactored PPO)
             Figure 4 (Train)                  Figure 2 (PPO)
Environment  PPO (100M)     PPO (200M)         PPO (100M)            average
CoinRun      7.3            7.5                8.3                   7.7
StarPilot    10             13.5               18.3                  13.93
CaveFlyer    4              4.5                8                     5.5
Dodgeball    5.5            7.5                6.5                   6.5
FruitBot     7              16                 20                    14.33
Chaser       5              7                  6                     6
Miner        12             17                 10                    13
Jumper       6.5            7                  6                     6.5
Leaper       6              9                  8                     7.67
Maze         6.3            7.5                7                     6.93
BigFish      6              9                  17                    10.67
Heist        3.8            4.5                3.8                   4.03
Climber      6              8.5                8.5                   7.67
Plunder      5.2            5.4                14                    8.2
Ninja        6              7                  9                     7.33
BossFight    9.4            10                 10                    9.8
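
The average column appears to be the unweighted mean of the three PPO scores in each row. The minimal sketch below recomputes it as a check, assuming the per-row values split as (PPO 100M, PPO 200M, PPO 100M). The names `scores` and `row_average` are illustrative and not part of the original sheet.

```python
# Scores per environment, in column order: PPO (100M), PPO (200M), PPO (100M).
# Assumption: each flattened row splits into three scores followed by the average.
scores = {
    "CoinRun": (7.3, 7.5, 8.3),
    "StarPilot": (10, 13.5, 18.3),
    "CaveFlyer": (4, 4.5, 8),
    "Dodgeball": (5.5, 7.5, 6.5),
    "FruitBot": (7, 16, 20),
    "Chaser": (5, 7, 6),
    "Miner": (12, 17, 10),
    "Jumper": (6.5, 7, 6),
    "Leaper": (6, 9, 8),
    "Maze": (6.3, 7.5, 7),
    "BigFish": (6, 9, 17),
    "Heist": (3.8, 4.5, 3.8),
    "Climber": (6, 8.5, 8.5),
    "Plunder": (5.2, 5.4, 14),
    "Ninja": (6, 7, 9),
    "BossFight": (9.4, 10, 10),
}


def row_average(values):
    """Return the unweighted mean of one environment's scores."""
    return sum(values) / len(values)


# Recompute the average column for every environment.
averages = {env: row_average(vals) for env, vals in scores.items()}

if __name__ == "__main__":
    for env, avg in averages.items():
        print(f"{env:<10} {avg:.2f}")
```

Recomputing the mean this way is also what makes the column boundaries in each flattened row unambiguous: only one split of the digits reproduces the listed average.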