
HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player

Matthew Hausknecht, Piyush Khandelwal, Risto Miikkulainen, Peter Stone


Motivation

Create a general video game playing agent that learns from visual representations

  • Little domain-specific knowledge

  • Capable of playing different Atari 2600 games without reconfiguration


Introducing GVGP

  • GGP agents are given a declarative representation of the game, including complete game dynamics

  • In contrast, we seek to learn from a visual representation without knowing the dynamics

  • General video game playing (GVGP) is thus a good challenge problem


Atari 2600

  • 418 games with wildly varying dynamics

  • Standard interface for control: 18 actions

  • Two-player (multi-agent) capabilities

  • Multiple standardized state representations

  • Good open-source emulation

HyperNEAT-GGP Architecture

[Architecture diagram] The Atari 2600 emulator supplies the raw game screen to the Visual Processing component, which converts it into continuously valued firings for the input layer of a neural network. HyperNEAT generates this network from an evolved CPPN. The network's continuously valued outputs feed the Action Selection component, which chooses an action to send back to the emulator.
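In code, one perception-action step of this loop might look like the sketch below. Every name here (get_screen, process, activate, select_action, act) is an illustrative assumption, not the actual implementation; a select_action sketch appears in the Action Selection section.

def agent_step(emulator, visual_processor, network):
    screen = emulator.get_screen()               # raw game screen
    firings = visual_processor.process(screen)   # continuously valued firings
    outputs = network.activate(firings)          # continuously valued outputs
    action = select_action(outputs)              # action selection
    emulator.act(action)                         # action sent back to emulator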


Fitness Evaluation

[Diagram] An individual, consisting of a CPPN and the neural network it generates, plays one full game in the Atari 2600 emulator. At the end of the game, the game score is assigned to the individual as its fitness.


Evolution

Given each individual's fitness, evolution then produces the next generation.
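A rough sketch of the evaluate-and-evolve loop, under assumed helper names (initial_population, build_network, play_game, next_generation):

population = initial_population()
for generation in range(250):                 # 250 generations (see Experimental Setup)
    for individual in population:
        network = individual.build_network()  # CPPN generates the substrate network
        individual.fitness = play_game(network)  # fitness = final game score
    population = next_generation(population)  # NEAT selection, crossover, mutation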


HyperNEAT

Stanley, D'Ambrosio, Gauci: an extension of NEAT

A Hypercube-Based Indirect Encoding for Evolving Large-Scale Neural Networks. Kenneth O. Stanley, David B. D'Ambrosio, Jason Gauci. Artificial Life, 2009.


Input node firings are continuously valued and taken directly from the processed screen.


Firings are propagated through the network, forming continuously valued outputs.


How are the connection weights determined?


NEAT: evolve the weights directly!

HyperNEAT: evolve the weights? Maybe we can do better...

The weights come from a CPPN (Compositional Pattern Producing Network). Its inputs are the substrate coordinates of a source node and a target node (x1, y1, x2, y2) plus a bias B; its output is the weight of the connection between those two nodes. For example, querying the CPPN with inputs (3, 0, 3, 1) yields the weight of one particular connection. Queried over every pair of substrate nodes, a single CPPN determines all connection weights.
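A minimal sketch of this querying step, assuming a cppn callable with its bias handled internally, and assuming the common HyperNEAT convention of pruning small-magnitude weights (the 0.2 threshold is an assumption):

def build_connection_weights(cppn, width, height):
    # Query the CPPN once per (source, target) pair of substrate nodes;
    # cppn(x1, y1, x2, y2) returns the weight of that connection.
    weights = {}
    for x1 in range(width):
        for y1 in range(height):
            for x2 in range(width):
                for y2 in range(height):
                    w = cppn(x1, y1, x2, y2)
                    if abs(w) > 0.2:  # prune near-zero connections
                        weights[(x1, y1), (x2, y2)] = w
    return weights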


CPPN Evolved by NEAT


  • Add nodes, each with an activation function drawn from: Gaussian, sinusoidal, sigmoid, absolute value, linear

  • Add links

  • Change Connection Weights


Advantages of HyperNEAT vs NEAT

  • Indirect Encoding

  • Geometrically Aware

  • Learns a function and applies it regardless of absolute location

  • Ultimately allows better policies to be found more quickly


Visual Processing Framework

Raw Game Screen: 160x210 pixels; 256 colors



Blob Detection

  • Simple 8-connectivity check

  • Each blob is assigned a velocity based on its location in the previous frame (see the sketch below)
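A minimal sketch of the 8-connectivity check, assuming the screen is a 2D array of color codes (the slide gives no implementation details):

from collections import deque

def find_blobs(screen):
    # Group identically colored, 8-connected pixels into blobs.
    height, width = len(screen), len(screen[0])
    visited = [[False] * width for _ in range(height)]
    blobs = []
    for sy in range(height):
        for sx in range(width):
            if visited[sy][sx]:
                continue
            color = screen[sy][sx]
            blob, queue = [], deque([(sy, sx)])
            visited[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                # Check all 8 neighbors for the same color
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not visited[ny][nx]
                                and screen[ny][nx] == color):
                            visited[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs

Velocity assignment would then match each blob against the blobs of the previous frame; the slides do not specify the matching rule.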



Object Detection

Adjacent blobs with non-zero velocity are merged into objects



Class Detection

  • Objects with sufficient pixel similarity are said to belong to the same "object class" (see the sketch below)

  • In Freeway, 3 main classes emerge: left-moving car, right-moving car, and the chicken
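A sketch of the similarity test; the flattened-bitmap representation and the 0.85 threshold are assumptions (the slide says only "sufficient pixel similarity"):

def same_object_class(pixels_a, pixels_b, threshold=0.85):
    # pixels_a, pixels_b: equal-length flattened pixel lists of two objects.
    # Objects whose bitmaps differ in size are treated as different classes.
    if len(pixels_a) != len(pixels_b):
        return False
    matches = sum(a == b for a, b in zip(pixels_a, pixels_b))
    return matches / len(pixels_a) >= threshold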


Self Detection

  • An information-gain approach identifies the on-screen object most likely to be the "self" (full pseudocode on the final slide)

  • Intuitively, the self is the object that responds to the agent's actions

  • In the figure, the self is circled in red


Atari-HyperNEAT Interface

The raw screen is reduced to a 16x21 grid, with object classes mapped to continuous input values (see the sketch below).
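A sketch of this mapping; the cell size, centroid placement, and per-class values are assumptions:

def screen_to_substrate(objects, grid_width=16, grid_height=21, cell_size=10):
    # Reduce the 160x210 screen to a 16x21 grid of continuous values.
    grid = [[0.0] * grid_width for _ in range(grid_height)]
    for obj in objects:
        # Place each object's class value at the cell containing its centroid
        gx = min(obj.centroid_x // cell_size, grid_width - 1)
        gy = min(obj.centroid_y // cell_size, grid_height - 1)
        grid[gy][gx] = obj.class_value  # e.g. self = 1.0, each class a distinct value
    return grid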


Action Selection

  • Examine the squares adjacent to the "self"
  • Take the directional action corresponding to the maximum-valued neighbor (see the sketch below)
  • No-op if the "self" square itself is the highest valued
  • In the figure, the Up action is selected
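A sketch of this rule; the grid indexing and action names are assumptions:

DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def select_action(outputs, self_y, self_x):
    # Start with no-op, valued at the self's own square
    best_action, best_value = "noop", outputs[self_y][self_x]
    for action, (dy, dx) in DIRECTIONS.items():
        ny, nx = self_y + dy, self_x + dx
        # Skip neighbors that fall outside the output grid
        if 0 <= ny < len(outputs) and 0 <= nx < len(outputs[0]):
            if outputs[ny][nx] > best_value:
                best_action, best_value = action, outputs[ny][nx]
    return best_action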


Selected Games

We examine two Atari games: Freeway and Asterix.


Experimental Setup

  • Run HyperNEAT-GGP for 250 generations with 100 individuals per generation

  • Individual evaluations performed in parallel

  • Individual fitness = raw game score

  • Compared against previously published results using Sarsa-Lambda with linear function approximation


Results

                       Freeway   Asterix
Sarsa-Lambda BASS*        0        402
Sarsa-Lambda DISCO*       0        301
Sarsa-Lambda RAM*         0        545
Random                    0        156
HyperNEAT-GGP Avg        27.4      870
HyperNEAT-GGP Champ      29       1000

*Y. Naddaf. Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010.


Freeway Results


Asterix Results


Related Work

Atari game playing: Y. Naddaf. Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010.

GGP: M. Genesereth and N. Love. General game playing: Overview of the AAAI competition. AI Magazine, 26:62-72, 2005.

Ms. Pac-Man: S. M. Lucas. Ms Pac-Man competition (screen-capture mode). http://dces.essex.ac.uk/staff/sml/pacman/CIG2011Results.html.

Quake II: M. Parker and B. Bryant. Backpropagation without human supervision for visual control in Quake II. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games (CIG'09), pages 287-293, 2009.

Pitfall: C. Diuk, A. Cohen, and M. L. Littman. An object-oriented representation for efficient reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 240-247, 2008.

HyperNEAT:

P. Verbancsics and K. O. Stanley. Evolving static representations for task transfer. Journal of Machine Learning Research, 11:1737-1769, August 2010.

J. Gauci and K. O. Stanley. A case study on the critical role of geometric regularity in machine learning. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI), 2008.

D. B. D'Ambrosio and K. O. Stanley. Generative encoding for multiagent learning. In GECCO '08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, pages 819-826, New York, NY, USA, 2008. ACM.

J. Clune, B. E. Beckmann, C. Ofria, and R. T. Pennock. Evolving coordinated quadruped gaits with the HyperNEAT generative encoding. In Proceedings of the Eleventh Congress on Evolutionary Computation (CEC'09), pages 2764-2771, Piscataway, NJ, USA, 2009. IEEE Press.


Conclusion

  • We introduce HyperNEAT-GGP, a general Atari game playing agent
  • It learns from the game screen, using HyperNEAT to evolve policies for gameplay
  • Performance results exceed prior work (Sarsa-Lambda) on the games Asterix and Freeway
  • In future work we will extend this system to more games
  • HyperNEAT-GGP represents a first step toward the challenge of general video game playing (GVGP) from visual representations


Questions?


Self Detection Algorithm

Rendered as runnable Python (container shapes are assumptions; H is Shannon entropy):

from collections import Counter
from math import log2

def entropy(values):
    # Shannon entropy of a sequence of discrete values
    counts, n = Counter(values), len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def detect_self(current_blobs, velocity_history, action_history):
    # velocity_history[b][t]: velocity of blob b at time t (vHist_b)
    # action_history[t]: action taken at time t (ActionHist)
    info_gain = {}
    for b in current_blobs:
        v_hist = velocity_history[b]
        h_b = entropy(v_hist)                        # H_b
        h_cond = 0.0
        for a, count in Counter(action_history).items():
            # velocities observed at steps immediately following action a
            v_given_a = [v_hist[t] for t in range(1, len(v_hist))
                         if action_history[t - 1] == a]
            if v_given_a:
                p_a = count / len(action_history)
                h_cond += p_a * entropy(v_given_a)   # p_a * H_(b|a)
        info_gain[b] = h_b - h_cond                  # InfoGain_b
    # The blob whose velocity is best explained by the actions is the self
    return max(current_blobs, key=lambda b: info_gain[b])