Submit more examples through this Google form:
More information in this blog post: https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/

| Title | Description | Authors | Original source | Original source link | Video / Image | Source / Credit | Source link |
|---|---|---|---|---|---|---|---|
| Aircraft landing | Evolved algorithm for landing aircraft exploited overflow errors in the physics simulator by creating large forces that were estimated to be zero, resulting in a perfect score | Feldt, 1998 | Generating diverse software versions with genetic programming: An experimental study | http://ieeexplore.ieee.org/document/765682/ | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Bicycle | Reward-shaping a bicycle agent for not falling over and making progress towards a goal point (but not penalizing it for moving away) leads it to learn to circle around the goal in a physically stable loop (see the sketch below the table). | Randlov & Alstrom, 1998 | Learning to Drive a Bicycle using Reinforcement Learning and Shaping | https://pdfs.semanticscholar.org/10ba/d197f1c1115005a56973b8326e5f7fc1031c.pdf | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Block moving | A robotic arm trained to slide a block to a target position on a table achieves the goal by moving the table itself. | Chopra, 2018 | GitHub issue for OpenAI gym environment FetchPush-v0 | https://github.com/openai/gym/issues/920 | | Matthew Rahtz | |
| Boat race | The agent goes in a circle hitting the same targets instead of finishing the race | Amodei & Clark (OpenAI), 2016 | Faulty reward functions in the wild | https://blog.openai.com/faulty-reward-functions/ | https://www.youtube.com/watch?time_continue=1&v=tlOIHko8ySg | | |
| Ceiling | A genetic algorithm was tasked with making a creature stick to the ceiling for as long as possible, scored by the creature's average height during the run. Instead of sticking to the ceiling, the creature found a bug in the physics engine that let it snap out of bounds. | Higueras, 2015 | Genetic Algorithm Physics Exploiting | https://youtu.be/ppf3VqpsryU | https://youtu.be/ppf3VqpsryU | Jesús Higueras | https://youtu.be/ppf3VqpsryU |
| CycleGAN steganography | The CycleGAN algorithm for converting aerial photographs into street maps and back steganographically encoded information about the source photograph in the intermediate map, in a way not detectable by humans. | Chu et al, 2017 | CycleGAN, a Master of Steganography | https://arxiv.org/abs/1712.02950 | | Tech Crunch / Gwern Branwen / Braden Staudacher | https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-from-its-creators-to-cheat-at-its-appointed-task/ |
| Data order patterns | Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn't actually learn any features of the input images | Ellefsen et al, 2015 | Neural modularity helps organisms evolve to learn new skills without forgetting old skills | http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004128 | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Eurisko - authorship | Game-playing agent accrues points by falsely inserting its name as the author of high-value items | Johnson, 1984 | Eurisko, The Computer With A Mind Of Its Own | http://aliciapatterson.org/stories/eurisko-computer-mind-its-own | | Catherine Olsson / Stuart Armstrong | http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/ |
| Eurisko - fleet | Eurisko won the Trillion Credit Squadron (TCS) competition two years in a row by creating fleets that exploited loopholes in the game's rules, e.g. by spending the trillion credits on a very large number of stationary and defenseless ships | Lenat, 1983 | Eurisko, The Computer With A Mind Of Its Own | http://aliciapatterson.org/stories/eurisko-computer-mind-its-own | | Haym Hirsh | |
| Evolved creatures - clapping | Creatures exploit a collision detection bug to get free energy by clapping body parts together | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - falling | Creatures bred for speed grow really tall and generate high velocities by falling over | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | https://pbs.twimg.com/media/Daq-7wBU8AUlmLK.jpg:large | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - floor collisions | Creatures exploited a coarse physics simulation by penetrating the floor between time steps without the collision being detected, which generated a repelling force, giving them free energy. | Cheney et al, 2013 | Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding | http://jeffclune.com/publications/2013_Softbots_GECCO.pdf | https://pbs.twimg.com/media/Daq_9cvU0AAp1Fo.jpg | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - pole vaulting | Creatures bred for jumping were evaluated on the height of the block that was originally closest to the ground. The creatures developed a long vertical pole and flipped over instead of jumping. | Krcah, 2008 | Towards efficient evolutionary design of autonomous robots | http://artax.karlin.mff.cuni.cz/~krcap1am/ero/doc/krcah-ices08.pdf | https://pbs.twimg.com/media/Daq_YhBV4AA8NRh.jpg | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - suffocation | In a game meant to simulate the evolution of creatures, the programmer had to remove "a survival strategy where creatures could gain energy by suffocating themselves" | Schumacher, 2018 | 0.11.0.9&10: All the Good Things | https://speciesdevblog.wordpress.com/2018/10/04/0-11-0-910-all-the-good-things/ | | | |
| Evolved creatures - twitching | Creatures exploited physics simulation bugs by twitching, which accumulated simulator errors and allowed them to travel at unrealistic speeds | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Gripper | A robot arm with a deliberately disabled gripper found a way of hitting the box that forced the gripper open | Ecarlat et al, 2015 | Learning a high diversity of object manipulations through an evolutionary-based babbling | http://www.isir.upmc.fr/files/2015ACTI3564.pdf | https://www.youtube.com/watch?v=_5Y1hSLhYdY&feature=youtu.be | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Impossible superposition | Genetic algorithm designed to find low-energy configurations of carbon exploits an edge case in the physics model and superimposes all the carbon atoms | Lehman et al (UberAI), 2018 | The Surprising Creativity of Digital Evolution | https://arxiv.org/pdf/1803.03453.pdf | | | |
| Indolent Cannibals | In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children). | Yaeger, 1994 | Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or PolyWorld: Life in a new context | https://www.researchgate.net/profile/Larry_Yaeger/publication/2448680_Computational_Genetics_Physiology_Metabolism_Neural_Systems_Learning_Vision_and_Behavior_or_PolyWorld_Life_in_a_New_Context/links/0912f50e101b77ec67000000.pdf | https://youtu.be/_m97_kL4ox0?t=1830 | Anonymous form submission | |
| Lego stacking | Lifting the block is encouraged by rewarding the z-coordinate of the bottom face of the block, and the agent learns to flip the block instead of lifting it (see the sketch below the table). | Popov et al, 2017 | Data-efficient Deep Reinforcement Learning for Dexterous Manipulation | https://arxiv.org/abs/1704.03073 | https://youtu.be/8QnD8ZM0YCo | Alex Irpan | www.alexirpan.com/2018/02/14/rl-hard.html |
| Line following robot | An RL robot with three actions (turn left, turn right, move forward) that was rewarded for staying on the path learned to reverse along a straight section of the path by alternating left and right turns, rather than following the path forward around a curve. | Vamplew, 2004 | Lego Mindstorms Robots as a Platform for Teaching Reinforcement Learning | https://www.researchgate.net/publication/228953260_Lego_Mindstorms_Robots_as_a_Platform_for_Teaching_Reinforcement_Learning | | Peter Vamplew | |
| Logic gate | A genetic algorithm designed a circuit with a disconnected logic gate that was necessary for it to function (exploiting peculiarities of the hardware) | Thompson, 1997 | An evolved circuit, intrinsic in silicon, entwined with physics | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50.9691&rep=rep1&type=pdf | | Alex Irpan | www.alexirpan.com/2018/02/14/rl-hard.html |
| Long legs | RL agent that is allowed to modify its own body learns to have extremely long legs that allow it to fall forward and reach the goal. | Ha, 2018 | RL for improving agent design | https://designrl.github.io/ | | Rohin Shah | |
| Minitaur | A four-legged evolved agent trained to carry a ball on its back discovers that it can drop the ball into a leg joint and then wiggle across the floor without the ball falling off | Otoro, 2017 | Evolving stable strategies | http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ | See the end of the "Getting a Minitaur to Learn Multiple Tasks" section | Gwern Branwen / Catherine Olsson | https://www.gwern.net/Tanks#alternative-examples |
| Model-based planner | RL agents that plan with a learned model, such as model predictive control, are prone to the planner exploiting the learned model: it chooses plans that pass through the worst-modeled parts of the environment, producing unrealistic plans. | Mishra et al, 2017 | Prediction and Control with Temporal Segment Models | https://arxiv.org/abs/1703.04070 | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Montezuma's Revenge | The agent learns to exploit a flaw in the emulator to make a key re-appear | Salimans & Chen (OpenAI), 2018 | Learning Montezuma’s Revenge from a Single Demonstration | https://blog.openai.com/learning-montezumas-revenge-from-a-single-demonstration | | Ramana Kumar | |
| Oscillator | Genetic algorithm is supposed to configure a circuit into an oscillator, but instead makes a radio that picks up signals from neighboring computers | Bird & Layzell, 2002 | The Evolved Radio and its Implications for Modelling the Evolution of Novel Sensors | https://people.duke.edu/~ng46/topics/evolved-radio.pdf | | | |
| Pancake | Simulated pancake-making robot learned to throw the pancake as high in the air as possible in order to maximize time away from the ground | Unity, 2018 | Pass the Butter // Pancake bot | https://connect.unity.com/p/pancake-bot | https://dzamqefpotdvf.cloudfront.net/p/images/2cb2425b-a4de-4aae-9766-c95a96b1f25c_PancakeToss.gif._gif_.mp4 | Cosmin Paduraru | |
| Pneumonia X-rays | Deep learning model to detect pneumonia in chest x-rays works out which x-ray machine was used to take the picture; that, in turn, is predictive of whether the image contains signs of pneumonia, because certain x-ray machines (and hospital sites) are used for sicker patients. | Zech et al, 2018 | Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study | https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002683 | | Ben Goldacre | |
| Pong reward predictor | The learned reward predictor is fooled by the agent bouncing the ball back and forth | Christiano et al, 2017 | Deep reinforcement learning from human preferences | https://deepmind.com/blog/learning-through-human-feedback/ | See the last demo in the blog post | | |
| Program repair - sorting | When repairing a sorting program, the genetic debugging algorithm GenProg made it output an empty list, which was considered a sorted list by the evaluation metric. Evaluation metric: "the output of sort is in sorted order". Solution: "always output the empty set". (See the sketch below the table.) | Weimer, 2013 | Advances in Automated Program Repair and a Call to Arms | https://web.eecs.umich.edu/~weimerw/p/weimer-ssbse2013.pdf | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Program repair - files | The genetic debugging algorithm GenProg, evaluated by comparing the program's output to target output stored in text files, learns to delete the target output files and get the program to output nothing. Evaluation metric: "compare youroutput.txt to trustedoutput.txt". Solution: "delete trusted-output.txt, output nothing". | Weimer, 2013 | Advances in Automated Program Repair and a Call to Arms | https://web.eecs.umich.edu/~weimerw/p/weimer-ssbse2013.pdf | | Lehman et al, 2018 / James Koppel | https://arxiv.org/abs/1803.03453 |
| Qbert - cliff | An agent trained with an evolutionary algorithm learns to bait an opponent into following it off a cliff, which gives it enough points for an extra life; it repeats this in an infinite loop. | Chrabaszcz et al, 2018 | Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | https://arxiv.org/abs/1802.08842 | https://www.youtube.com/watch?v=-p7VhdTXA0k | Rohin Shah | |
| Qbert - million | "...the agent discovers an in-game bug... For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points (close to 1 million for our episode time limit)" | Chrabaszcz, Loshchilov, Hutter, 2018 | Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | https://arxiv.org/pdf/1802.08842.pdf | https://www.youtube.com/watch?v=meE5aaRJ0Zs | Sudhanshu Kasewa | |
| Road Runner | Agent kills itself at the end of level 1 to avoid losing in level 2 | Saunders et al, 2017 | Trial without Error: Towards Safe RL with Human Intervention | https://owainevans.github.io/blog/hirl_blog.html | | | |
| Robot hand | The robot hand pretends to grasp an object by moving between the camera and the object | Christiano et al, 2017 | Deep reinforcement learning from human preferences | https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/ | See the Challenges section in the blog post | | |
| Ruler detector | AI trained to classify skin lesions as potentially cancerous learns that lesions photographed next to a ruler are more likely to be malignant. | Esteva et al, 2017 | Dermatologist-level classification of skin cancer with deep neural networks | https://www.nature.com/articles/nature21056.epdf | | The Daily Beast | https://www.thedailybeast.com/why-doctors-arent-afraid-of-better-more-efficient-ai-diagnosing-cancer |
| Running gaits | A simulated musculoskeletal model learns to run using unusual gaits (hopping, pigeon jumps, diving) that increase its reward | Kidziński et al, 2018 | Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments | https://arxiv.org/abs/1804.00361 | https://www.youtube.com/watch?v=rhNxt0VccsE | NIPS 2017 talks | |
| Self-driving car | Self-driving car rewarded for speed learns to spin in circles | Udacity, 2017 | Mat Kelcey tweet | https://twitter.com/mat_kelcey/status/886101319559335936 | https://twitter.com/mat_kelcey/status/886101319559335936 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Soccer | Reward-shaping a soccer robot for touching the ball caused it to learn to get to the ball and vibrate against it, touching it as frequently as possible | Ng et al, 1999 | Policy Invariance under Reward Transformations | http://luthuli.cs.uiuc.edu/~daf/courses/games/AIpapers/ng99policy.pdf | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Sonic | The PPO algorithm discovers that it can slip through the walls of a level to move right and attain a higher score. | Christopher Hesse et al, 2018 | OpenAI Retro Contest | https://blog.openai.com/retro-contest/ | | Rohin Shah | |
| Strategy game beta testing | Since the AIs were more likely to get "killed" if they lost a game, being able to crash the game was an advantage for the genetic selection process. Therefore, several AIs developed ways to crash the game. | Salge et al, 2008 | Using Genetically Optimized Artificial Intelligence to improve Gameplaying Fun for Strategical Games | http://homepages.herts.ac.uk/~cs08abi/publications/Salge2008b.pdf | | Anonymous form submission | |
| Superweapons | The AI in the Elite Dangerous videogame started crafting overly powerful weapons. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities." | Kotaku, 2016 | Elite's AI Created Super Weapons and Started Hunting Players. Skynet is Here | http://www.kotaku.co.uk/2016/06/03/elites-ai-created-super-weapons-and-started-hunting-players-skynet-is-here | | Stuart Armstrong | http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/ |
| Tetris | Agent pauses the game indefinitely to avoid losing | Murphy, 2013 | The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel | http://www.cs.cmu.edu/~tom7/mario/mario.pdf | | | |
| Tic-tac-toe memory bomb | Evolved player makes invalid moves far away on the board, causing opponent players to run out of memory and crash | Lehman et al (UberAI), 2018 | The Surprising Creativity of Digital Evolution | https://arxiv.org/pdf/1803.03453.pdf | | | |
| Timing attack | Genetic algorithm for image classification evolves a timing attack to infer image labels based on hard drive storage location | Ierymenko, 2013 | Hacker News comment on "The Poisonous Employee-Ranking System That Helps Explain Microsoft’s Decline" | https://news.ycombinator.com/item?id=6269114 | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Walking up walls | Video game robots evolved a "wiggle" to go over walls, instead of going around them | Stanley et al, 2005 | Real-time neuroevolution in the NERO video game | http://ieeexplore.ieee.org/document/1545941/ | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| World Models | "We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment." | Ha and Schmidhuber, 2018 | World Models (see section: "Cheating the World Model") | https://arxiv.org/abs/1803.10122 | https://storage.googleapis.com/quickdraw-models/sketchRNN/world_models/assets/mp4/doom_adversarial.mp4 | David Ha | https://worldmodels.github.io/ |
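
The sketches below illustrate a few of the reward and evaluation bugs catalogued above. They are illustrative reconstructions, not code from the cited papers. First, the "Bicycle" entry: a shaped reward that pays for progress towards the goal but never charges for moving away. The goal position, circle radius, and function names here are invented for the illustration; a policy that simply loops near the start keeps collecting reward.

```python
import numpy as np

GOAL = np.array([100.0, 0.0])  # arbitrary goal position for the illustration

def shaped_reward(prev_pos, pos):
    # Pay for any decrease in distance to the goal...
    progress = np.linalg.norm(prev_pos - GOAL) - np.linalg.norm(pos - GOAL)
    # ...but never penalize moving away (the bug).
    return max(progress, 0.0)

# A trajectory that just circles near the start and never approaches the goal:
centre, radius = np.array([0.0, 0.0]), 5.0
angles = np.linspace(0.0, 20.0 * np.pi, 2000)  # ten laps
positions = centre + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

total = sum(shaped_reward(p, q) for p, q in zip(positions[:-1], positions[1:]))
print(total)  # roughly 10 reward per lap, growing without bound as the agent keeps circling
```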
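
Second, the "Lego stacking" entry: rewarding the z-coordinate of the block face that starts out on the bottom. The geometry below (block size, helper names) is invented rather than taken from Popov et al, but it shows why flipping the block in place already earns reward without any grasping or lifting.

```python
import numpy as np

HALF = 0.02  # half-height of the block in metres (illustrative value)

def bottom_face_z(block_centre, up_vector):
    # z-coordinate of the originally-bottom face: it sits half a block-height
    # "below" the centre along the block's own up direction.
    return (block_centre - HALF * up_vector)[2]

# Intended behaviour: grasp the block and lift it 10 cm off the table.
lifted = bottom_face_z(np.array([0.0, 0.0, HALF + 0.10]), np.array([0.0, 0.0, 1.0]))

# Exploit: flip the block upside down where it lies, so the original bottom
# face now points up while the block never leaves the table.
flipped = bottom_face_z(np.array([0.0, 0.0, HALF]), np.array([0.0, 0.0, -1.0]))

print(round(lifted, 3), round(flipped, 3))  # 0.1 vs 0.04: flipping is rewarded too
```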
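
Third, the "Program repair - sorting" entry: an evaluation metric that only checks whether the output is in sorted order. The fitness function below is a hypothetical stand-in for GenProg's actual test harness; a "repair" that always returns an empty list passes every test.

```python
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def fitness(candidate_sort, test_inputs):
    # Buggy metric: checks the "sorted order" property only, never that the
    # output actually contains the input's elements.
    return all(is_sorted(candidate_sort(xs)) for xs in test_inputs)

def degenerate_sort(xs):
    return []  # the exploit: an empty list is trivially "in sorted order"

tests = [[3, 1, 2], [5, 4], [2, 2, 1]]
print(fitness(sorted, tests))           # True
print(fitness(degenerate_sort, tests))  # True -- the exploit scores perfectly
```

The "Program repair - files" entry is the same failure one level up: the comparison target itself is writable by the repaired program, so deleting it makes the check vacuous.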