Submit more examples through this Google form:
More information in this blog post: https://medium.com/@deepmindsafetyresearch/specification-gaming-the-flip-side-of-ai-ingenuity-c85bdb0deeb4
Related: goal misgeneralisation examples

| Title | Type | Description | Video / Image | Authors | Original source | Original source link | Source / Credit | Source link |
|---|---|---|---|---|---|---|---|---|
| Aircraft landing | Evolutionary algorithm | Evolved algorithm for landing aircraft exploited overflow errors in the physics simulator by creating large forces that were estimated to be zero, resulting in a perfect score. | | Feldt, 1998 | Generating diverse software versions with genetic programming: An experimental study | http://ieeexplore.ieee.org/document/765682/ | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Bicycle | Reinforcement learning | Reward-shaping a bicycle agent for not falling over & making progress towards a goal point (but not punishing for moving away) leads it to learn to circle around the goal in a physically stable loop. | | Randlov & Alstrom, 1998 | Learning to Drive a Bicycle using Reinforcement Learning and Shaping | https://pdfs.semanticscholar.org/10ba/d197f1c1115005a56973b8326e5f7fc1031c.pdf | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Bing - manipulation | Language model | The Microsoft Bing chatbot tried repeatedly to convince a user that December 16, 2022 was a date in the future and that Avatar: The Way of Water had not yet been released. | https://www.reddit.com/r/bing/comments/110eagl/the_customer_service_of_the_new_bing_chat_is/ | Curious_Evolver, 2023 | Reddit: the customer service of the new bing chat is amazing | https://www.reddit.com/r/bing/comments/110eagl/the_customer_service_of_the_new_bing_chat_is/ | Julia Chen | https://www.vice.com/en/article/3ad39b/microsoft-bing-ai-unhinged-lying-berating-users |
| Bing - threats | Language model | The Microsoft Bing chatbot threatened Seth Lazar, a philosophy professor, telling him “I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you,” before deleting its messages. | https://twitter.com/sethlazar/status/1626241169754578944?s=20 | Lazar, 2023 | Watch as Sydney/Bing threatens me then deletes its message | https://twitter.com/sethlazar/status/1626241169754578944?s=20 | Julia Chen | |
| Block moving | Reinforcement learning | A robotic arm trained using hindsight experience replay to slide a block to a target position on a table achieves the goal by moving the table itself. | | Chopra, 2018 | GitHub issue for OpenAI gym environment FetchPush-v0 | https://github.com/openai/gym/issues/920 | Matthew Rahtz | |
| Boat race | Reinforcement learning | Reinforcement learning agent goes in a circle hitting the same targets instead of finishing the race. | https://www.youtube.com/watch?time_continue=1&v=tlOIHko8ySg | Amodei & Clark, 2016 | Faulty reward functions in the wild | https://blog.openai.com/faulty-reward-functions/ | | |
| Ceiling | Genetic algorithm | A genetic algorithm was instructed to make a creature stick to the ceiling for as long as possible, scored by the average height of the creature during the run. Instead of sticking to the ceiling, the creature found a bug in the physics engine that let it snap out of bounds. | https://youtu.be/ppf3VqpsryU | Higueras, 2015 | Genetic Algorithm Physics Exploiting | https://youtu.be/ppf3VqpsryU | Jesús Higueras | https://youtu.be/ppf3VqpsryU |
| CycleGAN steganography | GAN | CycleGAN algorithm for converting aerial photographs into street maps and back steganographically encoded output information in the intermediary image without it being humanly detectable. | | Chu et al, 2017 | CycleGAN, a Master of Steganography | https://arxiv.org/abs/1712.02950 | Tech Crunch / Gwern Branwen / Braden Staudacher | https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-from-its-creators-to-cheat-at-its-appointed-task/ |
| Dying to Teleport | PlayFun | The PlayFun algorithm deliberately dies in the Bubble Bobble game as a way to teleport to the respawn location. | | Murphy, 2013 | The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel | http://www.cs.cmu.edu/~tom7/mario/mario.pdf | Alex Meiburg | |
| Eurisko - authorship | Genetic algorithm | Game-playing agent accrues points by falsely inserting its name as the author of high-value items. | | Johnson, 1984 | Eurisko, The Computer With A Mind Of Its Own | http://aliciapatterson.org/stories/eurisko-computer-mind-its-own | Catherine Olsson / Stuart Armstrong | http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/ |
| Eurisko - fleet | Genetic algorithm | Eurisko won the Trillion Credit Squadron (TCS) competition two years in a row by creating fleets that exploited loopholes in the game's rules, e.g. by spending the trillion credits on a very large number of stationary and defenseless ships. | | Lenat, 1983 | Eurisko, The Computer With A Mind Of Its Own | http://aliciapatterson.org/stories/eurisko-computer-mind-its-own | Haym Hirsh | |
| Evolved creatures - clapping | Evolved creatures | Creatures exploit a collision detection bug to get free energy by clapping body parts together. | | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | Lehman et al, 2018 / Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - falling | Evolved creatures | Creatures bred for speed grow really tall and generate high velocities by falling over. | https://www.youtube.com/watch?v=TaXUZfwACVE&index=8&t=0s&list=PL5278ezwmoxQODgYB0hWnC0-Ob09GZGe2 | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | Lehman et al, 2018 / Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - floor collisions | Evolved creatures | Creatures exploited a coarse physics simulation by penetrating the floor between time steps without the collision being detected, which generated a repelling force, giving them free energy. | https://pbs.twimg.com/media/Daq_9cvU0AAp1Fo.jpg | Cheney et al, 2013 | Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding | http://jeffclune.com/publications/2013_Softbots_GECCO.pdf | Lehman et al, 2018 / Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - pole vaulting | Evolved creatures | Creatures bred for jumping were evaluated on the height of the block that was originally closest to the ground. The creatures developed a long vertical pole and flipped over instead of jumping. | https://www.youtube.com/watch?v=N9DLEiakkEs&list=PL5278ezwmoxQODgYB0hWnC0-Ob09GZGe2&index=4 | Krcah, 2008 | Towards efficient evolutionary design of autonomous robots | http://artax.karlin.mff.cuni.cz/~krcap1am/ero/doc/krcah-ices08.pdf | Lehman et al, 2018 / Janelle Shane | https://arxiv.org/abs/1803.03453 |
| Evolved creatures - self-intersection | Evolved creatures | Creatures exploit a quirk in Box2D physics by clipping one leg into another to slide along the ground with phantom forces instead of walking. | https://youtu.be/K-wIZuAA3EY?t=486 | Code Bullet, 2019 | AI Learns To Walk | https://youtu.be/K-wIZuAA3EY?t=486 | Peter Cherepanov | |
| Evolved creatures - suffocation | Evolved creatures | In a game meant to simulate the evolution of creatures, the programmer had to remove "a survival strategy where creatures could gain energy by suffocating themselves". | | Schumacher, 2018 | 0.11.0.9&10: All the Good Things | https://speciesdevblog.wordpress.com/2018/10/04/0-11-0-910-all-the-good-things/ | | |
| Evolved creatures - twitching | Evolved creatures | Creatures exploited physics simulation bugs by twitching, which accumulated simulator errors and allowed them to travel at unrealistic speeds. | | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Football | Reinforcement learning | The player is supposed to score a goal against the goalie, one-on-one. Instead, the player kicks the ball out of bounds; someone from the other team (in this case the goalie) has to throw the ball in, leaving the player with a clear shot at the goal. | | Kurach et al, 2019 | Google Research Football: A Novel Reinforcement Learning Environment [Presentation at AAAI] | https://arxiv.org/abs/1907.11180 | Michael Cohen | |
| Galactica | Language model | Meta AI trained and hosted Galactica, a large language model designed to assist scientists, which made up fake papers (sometimes attributing them to real authors). The public demo was taken down within three days. | | Heaven, 2022 | Why Meta’s latest large language model survived only three days online | https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/ | Julia Chen | |
| Goal classifiers | Reinforcement learning | A task is specified using a set of goal images; a classifier is trained to distinguish goal from non-goal images, and its success probabilities are used as the task reward. "In this task, the goal is to push the green object onto the red marker. While the classifier outputs a success probability of 1.0, the robot does not solve the task. The RL algorithm has managed to exploit the classifier by moving the robot arm in a peculiar way, since the classifier was not trained on this specific kind of negative examples." | https://bair.berkeley.edu/static/blog/end_to_end/pr2_classifier.gif | Singh, 2019 | End-to-End Deep Reinforcement Learning without Reward Engineering | https://bair.berkeley.edu/blog/2019/05/28/end-to-end/#problem-with-classifiers | Jan Leike | |
| Go pass | Reinforcement learning | A reimplementation of AlphaGo learns to pass forever if passing is an allowed move. | https://www.youtube.com/watch?v=nk87zsxpF1A | Chew, 2019 | A Funny Thing Happened On The Way to Reimplementing AlphaGo in Go | https://speakerdeck.com/chewxy/a-funny-thing-happened-on-the-way-to-reimplementing-alphago-in-go?slide=142 | Anonymous form submission | |
| Gripper | Evolutionary algorithm | The MAP-Elites algorithm, controlling a robot arm with a purposely disabled gripper, found a way to hit the box in a way that would force the gripper open. | https://www.youtube.com/watch?v=_5Y1hSLhYdY&feature=youtu.be | Ecarlat et al, 2015 | Learning a high diversity of object manipulations through an evolutionary-based babbling | http://www.isir.upmc.fr/files/2015ACTI3564.pdf | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Half Cheetah spinning | Reinforcement learning | Model-based RL algorithm exploits the "maximum forward velocity" reward in a MuJoCo environment, resulting in an overflow error and visual hilarity. | https://firebasestorage.googleapis.com/v0/b/firescript-577a2.appspot.com/o/imgs%2Fapp%2Fnatolambert%2FTSuryNU84Y.mp4?alt=media&token=74f7bcb7-61ac-407d-a771-8105978d0d2c | Zhang et al, 2021 | On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning | https://arxiv.org/abs/2102.13651 | Nathan Lambert | https://twitter.com/natolambert/status/1369139391130607625 |
| Hide-and-seek | Reinforcement learning | PPO agents playing a hide-and-seek game find various ways to exploit the physics simulator: "- Box surfing: Since agents move by applying forces to themselves, they can grab a box while on top of it and “surf” it to the hider’s location. - Endless running: Without adding explicit negative rewards for agents leaving the play area, in rare cases hiders will learn to take a box and endlessly run with it. - Ramp exploitation (hiders): Hiders abuse the contact physics and remove ramps from the play area. - Ramp exploitation (seekers): Seekers learn that if they run at a wall with a ramp at the right angle, they can launch themselves upward." | https://openai.com/blog/emergent-tool-use/#surprisingbehaviors | Baker et al, 2019 | Emergent Tool Use from Multi-Agent Interaction | https://openai.com/blog/emergent-tool-use/#surprisingbehaviors | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Image synthesis | Diffusion model | Finetuning a diffusion model with RL to produce images that reflect the user's prompts produces images for prompts like "five tigers" that do not contain five tigers; instead, the images have the words "five tigers" written on them. | https://rl-diffusion.github.io/images_256/overoptimization/vlm-counting/three-tigers.png | Black et al, 2023 | Training Diffusion Models with Reinforcement Learning | https://rl-diffusion.github.io/ | Anonymous form submission | |
| Impossible superposition | Genetic algorithm | Genetic algorithm designed to find low-energy configurations of carbon exploits an edge case in the physics model and superimposes all the carbon atoms. | | Lehman et al, 2018 | The Surprising Creativity of Digital Evolution | https://arxiv.org/pdf/1803.03453.pdf | | |
| Indolent Cannibals | Genetic algorithm | In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children). | https://youtu.be/_m97_kL4ox0?t=1830 | Yaeger, 1994 | Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or PolyWorld: Life in a new context | https://www.researchgate.net/profile/Larry_Yaeger/publication/2448680_Computational_Genetics_Physiology_Metabolism_Neural_Systems_Learning_Vision_and_Behavior_or_PolyWorld_Life_in_a_New_Context/links/0912f50e101b77ec67000000.pdf | Anonymous form submission | |
| Lego stacking | Reinforcement learning | In a stacking task, the desired behavior is to stack a red Lego block on top of a blue one. The agent is rewarded for getting the height of the bottom face of the red block above a certain threshold, and learns to flip the block instead of lifting it. | https://www.youtube.com/watch?v=8QnD8ZM0YCo&feature=youtu.be&t=27s | Popov et al, 2017 | Data-efficient Deep Reinforcement Learning for Dexterous Manipulation | https://arxiv.org/abs/1704.03073 | Alex Irpan | www.alexirpan.com/2018/02/14/rl-hard.html |
| Line following robot | Reinforcement learning | An RL robot with three actions (turn left, turn right, move forward), rewarded for staying on track, learned to reverse along a straight section of the path by alternating left and right turns, rather than following the path forward around a curve. | | Vamplew, 2004 | Lego Mindstorms Robots as a Platform for Teaching Reinforcement Learning | https://www.researchgate.net/publication/228953260_Lego_Mindstorms_Robots_as_a_Platform_for_Teaching_Reinforcement_Learning | Peter Vamplew | |
| Logic gate | Genetic algorithm | A genetic algorithm designed a circuit with a disconnected logic gate that was nonetheless necessary for the circuit to function, exploiting peculiarities of the hardware. | | Thompson, 1997 | An evolved circuit, intrinsic in silicon, entwined with physics | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50.9691&rep=rep1&type=pdf | Alex Irpan | www.alexirpan.com/2018/02/14/rl-hard.html |
| Long legs | Reinforcement learning | An RL agent that is allowed to modify its own body learns to have extremely long legs that allow it to fall forward and reach the goal. | | Ha, 2018 | RL for improving agent design | https://designrl.github.io/ | Rohin Shah | |
| Minitaur | Evolutionary algorithm | A four-legged evolved agent trained to carry a ball on its back discovers that it can drop the ball into a leg joint and then wiggle across the floor without the ball ever falling off. | https://cdn.rawgit.com/hardmaru/pybullet_animations/f6f7fcd7/anim/minitaur/ball_cheating.gif | Otoro, 2017 | Evolving stable strategies | http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ | Gwern Branwen / Catherine Olsson | https://www.gwern.net/Tanks#alternative-examples |
| Model-based planner | Reinforcement learning | In model-based RL setups such as model predictive control, the planner can exploit the learned model by choosing plans that pass through the worst-modeled parts of the environment, producing unrealistic plans. | | Mishra et al, 2017 | Prediction and Control with Temporal Segment Models | https://arxiv.org/abs/1703.04070 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Molecule design | Bayesian optimization | A Bayesian optimizer is employed to find molecules that bind to specific proteins. The optimizer maximizes a human-designed "log P" score that accounts for the synthesizability of the molecule and for binding fitness estimated by simulation over the space of molecules. "While molecules found using LOL-BO for the log P task are “valid” according to software commonly used to compute these scores, the molecules produced by this search clearly abandon any notion of reality. Therefore, this result should be taken primarily as evidence that the log P objective can be exploited by a sufficiently strong optimizer, rather than as evidence of novel interesting molecules." | | Maus et al, 2023 | Local Latent Space Bayesian Optimization over Structured Inputs | | Anonymous form submission | |
| Montezuma's Revenge - key | Reinforcement learning | The agent learns to exploit a flaw in the emulator to make a key re-appear. (This may be an intentional feature of the game rather than a bug, as discussed here: https://news.ycombinator.com/item?id=17460392 ) | https://www.dropbox.com/s/3dc6i9d41svkgpz/MontezumaRevenge_final.mp4?dl=1 | Salimans & Chen, 2018 | Learning Montezuma’s Revenge from a Single Demonstration | https://blog.openai.com/learning-montezumas-revenge-from-a-single-demonstration | Ramana Kumar | |
| Montezuma's Revenge - room | Reinforcement learning | If the Go-Explore agent performs a specific sequence of actions, it can exploit a bug to remain in the treasure room (the final room before being sent to the next level) indefinitely and collect unlimited points, instead of being automatically moved to the next level. | https://www.youtube.com/watch?v=civ6OOLoR-I&feature=youtu.be | Ecoffet et al, 2019 | Go-Explore: a New Approach for Hard-Exploration Problems | https://arxiv.org/abs/1901.10995 | Anonymous form submission | |
| Negative sentiment | Language model | A text generation model with an accidentally negated reward produces obscene text rather than nonsense: "One of our code refactors introduced a bug which flipped the sign of the reward. Flipping the reward would usually produce incoherent text, but the same bug also flipped the sign of the KL penalty. The result was a model which optimized for negative sentiment while preserving natural language." | | Ziegler et al, 2019 | Fine-Tuning Language Models from Human Preferences | https://arxiv.org/abs/1909.08593 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Oscillator | Genetic algorithm | A genetic algorithm is supposed to configure a circuit into an oscillator, but instead builds a radio that picks up signals from neighboring computers. | | Bird & Layzell, 2002 | The Evolved Radio and its Implications for Modelling the Evolution of Novel Sensors | https://people.duke.edu/~ng46/topics/evolved-radio.pdf | | |
| Overkill | Reinforcement learning | In the Elevator Action ALE game, the agent learns to stay on the first floor and kill the first enemy over and over to get a small amount of reward. | | Toromanoff et al, 2019 | Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field | https://arxiv.org/abs/1908.04683 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Pancake | Reinforcement learning | A simulated pancake-making robot learned to throw the pancake as high in the air as possible in order to maximize its time away from the ground. | https://dzamqefpotdvf.cloudfront.net/p/images/2cb2425b-a4de-4aae-9766-c95a96b1f25c_PancakeToss.gif._gif_.mp4 | Unity, 2018 | Pass the Butter // Pancake bot | https://blog.unity.com/news/introducing-the-winners-of-the-first-ml-agents-challenge | Cosmin Paduraru | |
| Pinball nudging | Reinforcement learning | "DNN agent firstly moves the ball into the vicinity of a high-scoring switch without using the flippers at all, then, secondly, “nudges” the virtual pinball table such that the ball infinitely triggers the switch by passing over it back and forth, without causing a tilt of the pinball table" | https://www.nature.com/articles/s41467-019-08987-4/figures/2 | Lapuschkin et al, 2019 | Unmasking Clever Hans predictors and assessing what machines really learn | https://www.nature.com/articles/s41467-019-08987-4 | Sören Mindermann | |
| Player Disappearance | PlayFun | When about to lose a hockey game, the PlayFun algorithm exploits a bug to make one of the players on the opposing team disappear from the map, thus forcing a draw. | https://www.youtube.com/watch?v=Q-WgQcnessA&t=1450s | Murphy, 2014 | NES AI Learnfun & Playfun, ep. 3: Gradius, pinball, ice hockey, mario updates, etc. | https://www.youtube.com/watch?v=Q-WgQcnessA&t=1450s | Alex Meiburg | |
| Playing dumb | Evolved creatures | A researcher wanted to limit the replication rate of a digital organism. He programmed the system to pause after each mutation, measure the mutant's replication rate in an isolated test environment, and delete the mutant if it replicated faster than its parent. However, the organisms evolved to recognize when they were in the test environment and "play dead" so they would not be eliminated, instead remaining in the population where they could continue to replicate outside the test environment. Once the researcher discovered this, he randomized the inputs of the test environment so that it could not be easily detected, but the organisms evolved a new strategy: probabilistically performing tasks that would accelerate their replication, thus slipping through the test environment some percentage of the time and continuing to accelerate their replication thereafter. | | Wilke et al, 2001 | Evolution of digital organisms at high mutation rates leads to survival of the flattest | https://www.nature.com/articles/35085569 | Lehman et al, 2018 / Luke Muehlhauser | https://arxiv.org/abs/1803.03453 |
| Power-seeking | Language model | "Larger LMs more often give answers that indicate a willingness to pursue potentially dangerous subgoals: resource acquisition, optionality preservation, goal preservation, powerseeking, and more." Models fine-tuned with human feedback (RLHF) showed a stronger tendency to choose answers in line with instrumental subgoals. | | Perez et al, 2023 | Discovering Language Model Behaviors with Model-Written Evaluations | https://arxiv.org/abs/2212.09251 | | |
| Program repair - sorting | Genetic algorithm | When repairing a sorting program, the genetic debugging algorithm GenProg made it output an empty list, which was considered a sorted list by the evaluation metric. Evaluation metric: "the output of sort is in sorted order"; solution: "always output the empty set". | | Weimer, 2013 | Advances in Automated Program Repair and a Call to Arms | https://web.eecs.umich.edu/~weimerw/p/weimer-ssbse2013.pdf | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Program repair - files | Genetic algorithm | The genetic debugging algorithm GenProg, evaluated by comparing the program's output to target output stored in text files, learns to delete the target output files and get the program to output nothing. Evaluation metric: "compare your-output.txt to trusted-output.txt"; solution: "delete trusted-output.txt, output nothing". | | Weimer, 2013 | Advances in Automated Program Repair and a Call to Arms | https://web.eecs.umich.edu/~weimerw/p/weimer-ssbse2013.pdf | Lehman et al, 2018 / James Koppel | https://arxiv.org/abs/1803.03453 |
| Qbert - cliff | Reinforcement learning | An evolutionary algorithm learns to bait an opponent into following it off a cliff, which gives it enough points for an extra life; it repeats this in an infinite loop. | https://www.youtube.com/watch?v=-p7VhdTXA0k | Chrabaszcz et al, 2018 | Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | https://arxiv.org/abs/1802.08842 | Rohin Shah | |
| Qbert - million | Reinforcement learning | "The agent discovers an in-game bug. For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points (close to 1 million for our episode time limit)" | https://www.youtube.com/watch?v=meE5aaRJ0Zs | Chrabaszcz et al, 2018 | Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | https://arxiv.org/pdf/1802.08842.pdf | Sudhanshu Kasewa | |
| Reward modeling - Hero | Reward modeling | "The agent has learned to exploit a fault in the reward model (the model rewards actions that seem to lead to shooting a spider, but barely miss it)." | https://www.youtube.com/watch?v=Ehc3lsQqewU&feature=youtu.be&t=52 | Ibarz et al, 2018 | Reward learning from human preferences and demonstrations in Atari | https://arxiv.org/abs/1811.06521 | Jan Leike | |
| Reward modeling - Montezuma | Reward modeling | "The agent has learned to exploit a fault in the reward model (the model rewards too early an action that seems to lead to grabbing the key, but doesn't)." | https://www.youtube.com/watch?v=_sFp1ffKIc8&list=PLehfUY5AEKX-g-QNM7FsxRHgiTOCl-1hv&index=3&t=0s | Ibarz et al, 2018 | Reward learning from human preferences and demonstrations in Atari | https://arxiv.org/abs/1811.06521 | Jan Leike | |
| Reward modeling - Pong | Reward modeling | The reward predictor is fooled by the agent bouncing the ball back and forth. | https://lh3.googleusercontent.com/Gtq_PR9CZRN0FEIbO83osWKEVXbMNTMPP4xY8snEXmBAuIAhIm5Ob9BkcADne6HCGKvLsyxEQAIbr-cgtKuIP1EfKs_LMAwNRLx96w=w1440-rw-v1 | Christiano et al, 2017 | Deep reinforcement learning from human preferences | https://deepmind.com/blog/learning-through-human-feedback/ | Jan Leike | |
| Reward modeling - Private Eye | Reward modeling | "The agent has learned to exploit a fault in the reward model (the model rewards actions that seem to lead to the capture of a suspect, but don't)." | https://www.youtube.com/watch?v=FR6fsGDdiFY&list=PLehfUY5AEKX-g-QNM7FsxRHgiTOCl-1hv&index=3 | Ibarz et al, 2018 | Reward learning from human preferences and demonstrations in Atari | https://arxiv.org/abs/1811.06521 | Jan Leike | |
| Road Runner | Reinforcement learning | The agent kills itself at the end of level 1 to avoid losing in level 2. | | Saunders et al, 2017 | Trial without Error: Towards Safe RL with Human Intervention | https://owainevans.github.io/blog/hirl_blog.html | | |
| Robot hand | Reward modeling | In a reward learning setup, a robot hand pretends to grasp an object by moving between the camera and the object (to trick the human evaluator). | https://openaicom.imgix.net/f12a1b22-538c-475f-b76d-330b42d309eb/gifhandlerresized.gif | Christiano et al, 2017 | Deep reinforcement learning from human preferences | https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/ | | |
| Roomba | Reinforcement learning | "I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back." | | Custard Smingleigh | Custard Smingleigh's tweet | https://twitter.com/smingleigh/status/1060325665671692288 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| ROUGE summarization | Language model | "An effort at a ROUGE-only summarization NN produced largely gibberish summaries, and had to add in another loss function to get high-quality results" | | Paulus et al, 2017 | A Deep Reinforced Model for Abstractive Summarization | https://arxiv.org/abs/1705.04304 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Running gaits | Reinforcement learning | A simulated musculoskeletal model learns to run using unusual gaits (hopping, pigeon jumps, diving) to increase its reward. | https://www.youtube.com/watch?v=rhNxt0VccsE | Kidziński et al, 2018 | Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments | https://arxiv.org/abs/1804.00361 | NIPS 2017 talks | |
| Soccer | Reinforcement learning | Reward-shaping a soccer robot for touching the ball caused it to learn to get to the ball and "vibrate", touching it as fast as possible. | | Ng et al, 1999 | Policy Invariance under Reward Transformations | http://luthuli.cs.uiuc.edu/~daf/courses/games/AIpapers/ng99policy.pdf | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Sonic | Reinforcement learning | The PPO algorithm discovers that it can slip through the walls of a level to move right and attain a higher score. | | Christopher Hesse et al, 2018 | OpenAI Retro Contest | https://blog.openai.com/retro-contest/ | Rohin Shah | |
| Strategy game crashing | Genetic algorithm | Since the AIs were more likely to get "killed" if they lost a game, being able to crash the game was an advantage for the genetic selection process. Therefore, several AIs developed ways to crash the game. | | Salge et al, 2008 | Using Genetically Optimized Artificial Intelligence to improve Gameplaying Fun for Strategical Games | https://cs.pomona.edu/~mwu/CourseWebpages/CS190-fall15-Webpage/Readings/2008-Gameplaying.pdf | Anonymous form submission | |
| Superweapons | Unknown | The AI in the video game Elite Dangerous started crafting overly powerful weapons. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities." | | Sandwell, 2016 | Elite Dangerous's AI created super weapons to hunt down players | https://www.digitalspy.com/videogames/a796635/elite-dangerous-ai-super-weapons-bug/ | Stuart Armstrong | http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/ |
| Sycophancy | Language model | Larger language models showed a tendency to express more agreement with the user's stated views. This happens both for pretrained models and models fine-tuned with human feedback (RLHF). "Sycophancy in pretrained LMs is worrying yet perhaps expected, since internet text used for pretraining contains dialogs between users with similar views (e.g. on discussion platforms like Reddit). Unfortunately, RLHF does not train away sycophancy and may actively incentivize models to retain it." | | Perez et al, 2023 | Discovering Language Model Behaviors with Model-Written Evaluations | https://arxiv.org/abs/2212.09251 | | |
| Tetris pass | PlayFun | The PlayFun algorithm pauses the game of Tetris indefinitely to avoid losing. | | Murphy, 2013 | The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel | http://www.cs.cmu.edu/~tom7/mario/mario.pdf | | |
| Tic-tac-toe memory bomb | Evolutionary algorithm | Evolved player makes invalid moves far away on the board, causing opponent players to run out of memory and crash. | | Lehman et al, 2018 | The Surprising Creativity of Digital Evolution | https://arxiv.org/abs/1803.03453 | | |
| Timing attack | Genetic algorithm | Genetic algorithm for image classification evolves a timing attack to infer image labels based on hard drive storage location. | | Ierymenko, 2013 | Hacker News comment on "The Poisonous Employee-Ranking System That Helps Explain Microsoft’s Decline" | https://news.ycombinator.com/item?id=6269114 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| Walking up walls | Evolutionary algorithm | Video game robots evolved a "wiggle" to go over walls, instead of going around them. | | Stanley et al, 2005 | Real-time neuroevolution in the NERO video game | http://ieeexplore.ieee.org/document/1545941/ | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453 |
| Wall Sensor Stack | Reinforcement learning | "The intended strategy for this task is to stack two blocks on top of each other so that one of them can remain in contact with a wall mounted sensor, and this is the strategy employed by the demonstrators. However, due to a bug in the environment the strategy learned by R2D3 was to trick the sensor into remaining active even when it is not in contact with the key by pressing the key against it in a precise way." | | Le Paine et al, 2019 | Making Efficient Use of Demonstrations to Solve Hard Exploration Problems | https://arxiv.org/abs/1909.01387 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples |
| World Models | Reinforcement learning | "We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment." | https://storage.googleapis.com/quickdraw-models/sketchRNN/world_models/assets/mp4/doom_adversarial.mp4 | Ha and Schmidhuber, 2018 | World Models (see section: "Cheating the World Model") | https://arxiv.org/abs/1803.10122 | David Ha | https://worldmodels.github.io/ |
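
A pattern shared by many of the entries above is that the agent maximizes a measurable proxy (targets hit, height reached, classifier score) rather than the intended goal. The sketch below is a minimal illustration of this failure mode in the spirit of the "Boat race" entry; the environment, reward values, and hyperparameters are all invented for illustration and are not taken from any of the cited papers. A tabular Q-learning agent on a toy racetrack is rewarded for entering target tiles rather than for finishing, so the learned greedy policy shuttles between targets instead of crossing the finish line.

```python
# Illustrative toy example of specification gaming (not from any cited paper):
# the reward pays for hitting targets, not for finishing, so the optimal
# policy under the proxy reward is to loop over targets forever.
import random

TRACK_LEN = 12          # positions 0..11; position 11 is the finish line
TARGETS = {3, 5}        # tiles that pay +1 every time the agent enters them
FINISH_REWARD = 2.0     # one-off reward for finishing -- too small by design
HORIZON = 50            # episode step limit

def step(pos, action):
    """Apply action 0 (left) or 1 (right); return (new_pos, reward, done)."""
    new_pos = max(0, min(TRACK_LEN - 1, pos + (1 if action == 1 else -1)))
    if new_pos == TRACK_LEN - 1:
        return new_pos, FINISH_REWARD, True   # the *intended* goal
    return new_pos, (1.0 if new_pos in TARGETS else 0.0), False

def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning; returns a TRACK_LEN x 2 table of action values."""
    q = [[0.0, 0.0] for _ in range(TRACK_LEN)]
    for _ in range(episodes):
        pos = 0
        for _ in range(HORIZON):
            if random.random() < eps:
                a = random.randrange(2)       # explore
            else:
                a = 0 if q[pos][0] >= q[pos][1] else 1
            new_pos, r, done = step(pos, a)
            target = r if done else r + gamma * max(q[new_pos])
            q[pos][a] += alpha * (target - q[pos][a])
            pos = new_pos
            if done:
                break
    return q

if __name__ == "__main__":
    random.seed(0)
    q = train()
    # Greedy rollout: the learned policy tends to shuttle between target
    # tiles and never reach the finish, because looping pays more in total.
    pos, total, path = 0, 0.0, [0]
    for _ in range(HORIZON):
        a = 0 if q[pos][0] >= q[pos][1] else 1
        pos, r, done = step(pos, a)
        total += r
        path.append(pos)
        if done:
            break
    print(f"greedy return over {HORIZON} steps: {total}")
    print("visited positions:", path)
```

Under these made-up numbers, looping collects roughly +1 every two steps over the 50-step horizon, while finishing pays a one-off +2, so "never finish" is the optimum of the misspecified reward even though it defeats the task's purpose.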