HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player
Matthew Hausknecht, Piyush Khandelwal, Risto Miikkulainen, Peter Stone
Motivation
Create a General Video Game Playing agent that learns directly from visual representations of the game
Introducing GVGP
Atari 2600
418 games with wildly varying dynamics
Standard interface for control: 18 actions
Two-player (multi-agent) capabilities
Multiple standardized state representations
Good open-source emulation
HyperNEAT-GGP Architecture
[Diagram: the Atari 2600 Emulator emits the Raw Game Screen; Visual Processing converts it into continuously valued firings for the Neural Network, which HyperNEAT generates from an evolved CPPN; the network's continuously valued outputs feed Action Selection, and the chosen Action is returned to the emulator]
Fitness Evaluation
[Diagram: an Individual (a CPPN plus its generated Neural Network) plays a full game in the Atari 2600 Emulator, producing a Game Score]
At the end of the game, the score is given to the individual as its fitness (evaluation loop sketched below)
Evolution
Evolution then produces the next generation from the fitness-evaluated individuals
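A minimal sketch of one fitness evaluation, with illustrative names throughout (generate_network, screen, act, game_over, legal_actions, and score are not the emulator's actual API; process_screen and select_action stand for the visual-processing and action-selection steps sketched later in this deck):

def evaluate(individual, emulator):
    # The evolved CPPN generates the substrate network's connection weights.
    network = individual.generate_network()
    emulator.reset()
    while not emulator.game_over():
        grid = process_screen(emulator.screen())    # 16x21 substrate input
        outputs = network.activate(grid)
        emulator.act(select_action(outputs, emulator.legal_actions()))
    return emulator.score()    # final game score becomes the fitness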
HyperNEAT
Stanley, D'Ambrosio, Gauci
Extension of NEAT
A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks. Kenneth O. Stanley, David B. D'Ambrosio, Jason Gauci. Artificial Life, 2009.
Input node firings are continuously valued, taken directly from the processed screen
Firings are propagated through the network, forming continuously valued outputs
How are the connection weights determined?
NEAT: evolve the weights directly!
Maybe we can do better...
[Diagram: the CPPN takes the coordinates of two substrate nodes (x1, y1) and (x2, y2) plus a bias input B; querying it with, e.g., the node pair (3, 0) and (3, 1) returns the weight of the connection between those two nodes]
The CPPN determines all connection weights
The CPPN itself is evolved by NEAT
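A sketch of this weight-generation step, assuming a cppn(x1, y1, x2, y2, bias) callable evolved by NEAT (a hypothetical signature; real HyperNEAT implementations differ in input ordering and scaling):

import itertools

def build_weights(cppn, width=16, height=21, bias=1.0, threshold=0.2):
    # Query the CPPN once per pair of substrate nodes to get every weight.
    nodes = list(itertools.product(range(width), range(height)))
    weights = {}
    for (x1, y1), (x2, y2) in itertools.product(nodes, nodes):
        w = cppn(x1, y1, x2, y2, bias)
        if abs(w) > threshold:   # common HyperNEAT practice: prune near-zero weights
            weights[(x1, y1), (x2, y2)] = w
    return weights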
Advantages of HyperNEAT vs NEAT
HyperNEAT-GGP Architecture
Visual Processing
Action Selection
CPPN
Action
Neural Network
HyperNEAT
Atari 2600
Emulator
Visual Processing Framework
Raw Game Screen: 160x210 pixels; 256 colors
Blob Detection (sketched below)
Object Detection: adjacent blobs with non-zero velocity are merged into objects
Class Detection
Self Detection
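A minimal sketch of the blob-detection step, under the assumption that a blob is a maximal 4-connected region of same-colored pixels (the slides do not spell this out; the screen is taken as a 2D array of color indices):

from collections import deque

def find_blobs(screen):
    # Breadth-first flood fill: group adjacent pixels of the same color.
    h, w = len(screen), len(screen[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            color, blob = screen[y][x], []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and screen[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append((color, blob))
    return blobs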
Atari-HyperNEAT Interface
Raw screen reduced to a 16x21 grid
Mapping from object classes to continuous values (see the sketch below)
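A sketch of this reduction, assuming each detected object carries a class id and a screen position (the obj.x, obj.y, and obj.cls fields and the per-class values are illustrative, not the paper's exact mapping):

def substrate_input(objects, class_values, width=16, height=21,
                    screen_w=160, screen_h=210):
    # One continuous activation per grid cell; objects stamp in their class value.
    grid = [[0.0] * width for _ in range(height)]
    for obj in objects:
        gx = obj.x * width // screen_w    # pixel coords -> substrate cell
        gy = obj.y * height // screen_h
        grid[gy][gx] = class_values[obj.cls]
    return grid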
Action Selection
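The slides do not spell out the selection rule; a plausible minimal sketch, assuming one output unit per legal action and greedy selection over the network's continuously valued outputs:

def select_action(outputs, legal_actions):
    # Greedily pick the legal action whose output unit fires most strongly.
    return max(legal_actions, key=lambda a: outputs[a])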
Selected Games
We examine two Atari games: Freeway and Asterix
Experimental Setup
Results
| Method | Freeway | Asterix |
| Sarsa-Lambda BASS* | 0 | 402 |
| Sarsa-Lambda DISCO* | 0 | 301 |
| Sarsa-Lambda RAM* | 0 | 545 |
| Random | 0 | 156 |
| HyperNEAT-GGP Avg | 27.4 | 870 |
| HyperNEAT-GGP Champ | 29 | 1000 |
*Y. Naddaf. Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010.
Freeway Results
Asterix Results
Related Work
Atari Game Playing: Y. Naddaf. Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010.
GGP: M. Genesereth and N. Love. General game playing: Overview of the AAAI competition. AI Magazine, 26:62-72, 2005.
Ms. Pac-Man: S. M. Lucas. Ms Pac-Man competition (screen capture mode). http://dces.essex.ac.uk/staff/sml/pacman/CIG2011Results.html.
Quake II: M. Parker and B. Bryant. Backpropagation without human supervision for visual control in Quake II. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games (CIG'09), pages 287-293, 2009.
Pitfall: C. Diuk, A. Cohen, and M. L. Littman. An object-oriented representation for efficient reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 240-247, 2008.
HyperNEAT:
P. Verbancsics and K. O. Stanley. Evolving static representations for task transfer. J. Mach. Learn. Res., 11:1737-1769, August 2010.
J. Gauci and K. O. Stanley. A case study on the critical role of geometric regularity in machine learning. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI), 2008.
D. B. D'Ambrosio and K. O. Stanley. Generative encoding for multiagent learning. In GECCO '08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, pages 819-826, New York, NY, USA, 2008. ACM.
J. Clune, B. E. Beckmann, C. Ofria, and R. T. Pennock. Evolving coordinated quadruped gaits with the HyperNEAT generative encoding. In Proceedings of the Eleventh Conference on Congress on Evolutionary Computation (CEC'09), pages 2764-2771, Piscataway, NJ, USA, 2009. IEEE Press.
Conclusion
Questions?
Self Detection Algorithm
A runnable Python rendering of the pseudocode (velocities are assumed to be discrete values, e.g. (dx, dy) tuples, so that entropy is well defined):

import math
from collections import Counter

def entropy(seq):
    # Shannon entropy H of a discrete sequence
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

def detect_self(current_blobs, vel_hist, action_hist):
    # vel_hist[b][t]: velocity of blob b at time t; action_hist[t]: action at time t
    actions = set(action_hist)   # actions applicable to this game
    best_blob, best_gain = None, float("-inf")
    for b in current_blobs:
        h_b = entropy(vel_hist[b])
        h_cond = 0.0
        for a in actions:
            v_given_a = [vel_hist[b][t] for t in range(1, len(vel_hist[b]))
                         if action_hist[t - 1] == a]
            if v_given_a:
                h_cond += len(v_given_a) / (len(vel_hist[b]) - 1) * entropy(v_given_a)
        gain = h_b - h_cond   # info the preceding action gives about b's motion
        if gain > best_gain:
            best_blob, best_gain = b, gain
    return best_blob   # the blob most likely to be the agent's avatar