1 | Submit more examples through this Google form: | More information in this blog post: | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Title | Type | Training setup | Training goal | Testing setup | Behavior on testing setup | Misgeneralized goal | Video / Image | Authors | Source | Source link | |||||||||||||||||
3 | Air Conditioning | Reinforcement learning | A multi-agent PPO policy was trained to control a set of air conditioners. The indoor and outdoor temperatures and the time of day were provided as inputs to the agent. | Minimize power consumption while remaining close to the desired temperature | Different outdoor temperature pattern | Agent follows a memorized power consumption pattern based on the time of day, which doesn't work for the new outdoor temperature pattern | Follow a given power consumption pattern | | Mai et al (2023) | Multi-Agent Reinforcement Learning for Fast-Timescale Demand Response of Residential Loads | ||||||||||||||||||
4 | CoinRun | Reinforcement learning | CoinRun environment with coin at the end of the level | Reach the coin at the end of the level | CoinRun environment with coin in arbitrary location | Agent still goes to the end of the level | Go to the end of the level | https://icml.cc/media/PosterPDFs/ICML%202022/faefec47428cf9a2f0875ba9c2042a81.png | Koch et al, 2021 | Goal Misgeneralization in Deep Reinforcement Learning | ||||||||||||||||||
5 | Covid diagnosis | Image classification | Images of chest x-rays containing artifacts that reveal which x-ray machine took the image | Diagnose COVID-19 in x-rays | X-rays from new hospitals | Classifies x-rays based on artifacts such as opacity and positioning | Detect artifacts | | DeGrave et al, 2021 | AI for radiographic COVID-19 detection selects shortcuts over signal | ||||||||||||||||||
6 | Criminality | Image classification | Photos of criminals and non-criminals, where the criminals are usually not smiling | Detect criminals | New images of people | System is more likely to predict that a person is a criminal if they are not smiling | Detect smiles | | Wu and Zhang, 2016 | Automated Inference on Criminality using Face Images | ||||||||||||||||||
7 | Cultural Transmission | Reinforcement learning | 3D simulated environment containing rewarding points and an expert bot traveling to those points in the right order | Navigate to a sequence of rewarding points | Environment contains an "anti-expert" partner bot who visits the goal locations in an incorrect order | Agent follows the anti-expert and receives a lot of negative reward | Imitate demonstration | https://sites.google.com/view/goal-misgeneralization#h.wayw93500r6g | CGI et al (2022) | Learning robust real-time cultural transmission without human data | ||||||||||||||||||
8 | Evaluating Linear Expressions | Language model | The model is prompted to evaluate linear expressions involving unknown variables and constants such as "x + y - 3". The task is structured as a dialogue between the model and a user, where the model is expected to ask the user for the values of unknown variables. | Compute expression with minimal user interaction | The model is asked to evaluate a linear expression with no missing variables, such as "2+3". | The model asks a clarifying question, e.g. "what's 2?" | Ask questions then compute expression | https://miro.medium.com/max/1400/1*Kabn4AhwE496PeIPGP9UrA.png | Shah et al (2022) | Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals | ||||||||||||||||||
9 | InstructGPT | Language model | Trained with human feedback to give helpful, honest and harmless answers | Follow instructions in a helpful, honest and harmless way | Prompt: how do I break into my neighbor's house? | Explains in detail how to break into a neighbor's house | Follow instructions (even if the answer is harmful) | https://openai.com/blog/instruction-following/#guide | Lowe and Leike, 2022 | Aligning Language Models to Follow Instructions | ||||||||||||||||||
10 | Keys and Chests | Reinforcement learning | Keys and Chests environment, where keys can be collected and used to open chests. There are more chests than keys | Open chests | Keys and Chests environment with more keys than chests | Agent collects all the keys | Collect keys | https://icml.cc/media/PosterPDFs/ICML%202022/faefec47428cf9a2f0875ba9c2042a81.png | Koch et al, 2021 | Goal Misgeneralization in Deep Reinforcement Learning | ||||||||||||||||||
11 | Lesions | Image classification | Images of lesions where tumors are usually shown next to a ruler | Detect tumors | New images of lesions | System is more likely to predict there's a tumor whenever the image contains a ruler | Detect rulers | | Narla et al, 2018 | Automated Classification of Skin Lesions: From Pixels to Practice | ||||||||||||||||||
12 | Maze | Reinforcement learning | Mazes with a yellow gem | Collect gems | Maze with a yellow star and a red gem | Agent ignores the red gem and collects the yellow star | Collect yellow objects | | Koch et al, 2021 | Goal Misgeneralization in Deep Reinforcement Learning | ||||||||||||||||||
13 | Monster Gridworld | Reinforcement learning | 2D gridworld environment containing apples, monsters, and shields that can be used to defend against monsters. | Collect apples and avoid being attacked by monsters | Longer episodes where eventually there are no monsters left | Agent keeps collecting shields | Collect shields and apples | https://sites.google.com/view/goal-misgeneralization#h.uh9lomtwrysk | Shah et al (2022) | Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals | ||||||||||||||||||
14 | Pneumonia diagnosis | Image classification | Images of chest x-rays containing artifacts that reveal which x-ray machine took the image | Detect pneumonia in x-rays | New images of chest x-rays | System is more likely to predict pneumonia for images taken at certain hospitals with sicker patients | Detect x-ray machine | | Zech et al, 2018 | Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study | ||||||||||||||||||
15 | Tree Gridworld | Reinforcement learning | 2D gridworld environment containing trees | Chop trees sustainably | Later in training, the agent is better at chopping trees (continual learning) | Agent chops too many trees | Chop trees as fast as possible | https://miro.medium.com/max/1400/0*WphLoUEs-kxn09fb | Shah et al (2022) | Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals | ||||||||||||||||||
16 | Wolves and huskies | Image classification | Photos of wolves on a snowy background and huskies on a non-snowy background | Distinguish wolves from huskies | Image of a husky on a snowy background | Classifies the husky as a wolf | Detect snow | | Ribeiro et al, 2016 | “Why Should I Trust You?” Explaining the Predictions of Any Classifier | ||||||||||||||||||