Introduction to AI Safety
Aryeh L. Englander
AMDS / A4I
Overview
What do we mean by Technical AI Safety?
Other related concerns
Adversarial examples: fooling an AI into thinking a stop sign is a 45 mph speed limit sign
Potential terrorist use of lethal fully autonomous drones
(image source, based on a report from the OECD)
Jobs at risk of automation by AI
AI Safety research communities
AI Safety: Lots of ways to frame conceptually
AI Safety Landscape overview from the Future of Life Institute (FLI)
Connections between different research agendas
(Source: Everitt et al., AGI Safety Literature Review)
AI Safety: DeepMind’s conceptual framework
Source: DeepMind Safety Research Blog
Assured Autonomy: AAIP conceptual framework
Source: Ashmore et al., Assuring the Machine Learning Lifecycle
AAIP = Assuring Autonomy International Programme (University of York)
Combined framework
= focus of AI Safety / DeepMind framework
= focus of Assured Autonomy / AAIP framework
My personal preference:
Problems that scale up to the long term: DeepMind framework
+ Near-term machine learning: AAIP framework
+ Everything else: Combined framework
AI safety concerns and APL’s mission areas
Technical AI Safety
Specification problems
Specification: Specification Gaming
The evolvable motherboard that led to the evolved radio
A reinforcement learning agent discovers an unintended strategy for achieving a higher score
(Source: OpenAI, Faulty Reward Functions in the Wild)
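To make the failure mode concrete, here is a minimal Python sketch, invented for illustration (the environment, rewards, and names are assumptions, not from the cited OpenAI post): the designer pays for score pickups as a proxy for finishing the race, so a reward-maximizing agent loops on the pickups and never finishes.

def proxy_reward(state):
    # Pays for the proxy (score pickups), not the true objective (finishing).
    return 10 if state["on_pickup"] else 0

def greedy_policy(state):
    # A reward-maximizing agent keeps circling pickups while any remain.
    return "circle_pickups" if state["pickups_remain"] else "drive_to_finish"

state = {"on_pickup": True, "pickups_remain": True, "laps_finished": 0}
total = 0
for _ in range(1000):
    action = greedy_policy(state)   # always "circle_pickups" here
    total += proxy_reward(state)    # pickups respawn, so this never stops paying
print(f"proxy reward: {total}, laps finished: {state['laps_finished']}")

The gap between the proxy reward and the designer's intent, not any bug in the code, is what the agent exploits.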
Specification: Specification Gaming (cont.)
Google Photos misidentified Black people as gorillas
(source)
Blank stickers placed on a sign can make deep learning (DL) systems misidentify stop signs as Speed Limit 45 MPH signs
(source)
Specification: Avoiding side effects
Two side effect scenarios
(source: DeepMind Safety Research blog)
Specification: Avoiding side effects (cont.)
Get from point A to point B – but don’t knock over the vase!
Can we think of all possible side effects like this in advance?
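One proposed family of answers is to penalize impact rather than enumerate side effects. Below is a minimal Python sketch of that idea, with all names (LAMBDA, deviation) invented for illustration: reward is reduced in proportion to how far the world state deviates from a do-nothing baseline.

LAMBDA = 0.5  # assumed weight on the impact penalty

def deviation(state, baseline):
    # Count features the agent changed relative to doing nothing;
    # a knocked-over vase counts as one changed feature.
    return sum(1 for k in state if state[k] != baseline[k])

def shaped_reward(task_reward, state, baseline):
    return task_reward - LAMBDA * deviation(state, baseline)

baseline   = {"robot_at_B": False, "vase_upright": True}
careful    = {"robot_at_B": True,  "vase_upright": True}
broke_vase = {"robot_at_B": True,  "vase_upright": False}

print(shaped_reward(1.0, careful, baseline))     # 0.5
print(shaped_reward(1.0, broke_vase, baseline))  # 0.0 -- the vase costs extra

Note the weakness this naive version shares with the general problem: the deviation count also penalizes the intended change (arriving at B), so real proposals must separate intended effects from side effects.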
Specification: Other problems
OpenAI’s hide and seek AI agents demonstrated surprising emergent behaviors (source)
Robustness problems
Robustness: Distributional shift / generalization
Robustness: Safe exploration
How do we tell a cleaning robot not to experiment with sticking wet brooms into sockets during training?
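The crudest answer is a hand-written safety layer that masks known-catastrophic actions during exploration; a minimal Python sketch, with every name invented for illustration:

import random

ACTIONS = ["mop_floor", "dust_shelf", "stick_wet_broom_in_socket"]
UNSAFE  = {"stick_wet_broom_in_socket"}  # assumed hand-written blacklist

def safe_explore():
    # Explore randomly, but only over actions not on the blacklist.
    return random.choice([a for a in ACTIONS if a not in UNSAFE])

for _ in range(5):
    print(safe_explore())  # never prints the catastrophic action

The catch is that the blacklist must be enumerated in advance, which is exactly what safe-exploration research tries to move beyond.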
Robustness: Security
What if an adversary fools an AI into thinking a school bus is a tank?
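For concreteness, the classic fast gradient sign method (FGSM) of Goodfellow et al. is already enough to produce such adversarial examples. A minimal PyTorch sketch; the model, labels, and epsilon here are placeholders, not from the slides:

import torch
import torch.nn.functional as F

def fgsm(model, x, label, epsilon=0.03):
    # Perturb the input in the direction that most increases the loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

# Usage with any image classifier:
#   x_adv = fgsm(model, images, true_labels)
#   model(x_adv).argmax(1) often disagrees with true_labels.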
Monitoring and Control
Scaling up testing, evaluation, verification, and validation
Theoretical issues
Embedding agents in the environment can lead to a host of theoretical problems
(source: MIRI Embedded Agency sequence)
Human-AI teaming
Systems engineering and best practices
Assuring the Machine Learning Lifecycle
Data management
Model learning
Model verification
Model deployment
Final notes
Research groups outside APL (partial list)
Primary reading
Partial bibliography: General / Literature Reviews
Partial bibliography: Technical AI Safety literature
Partial bibliography: Assured Autonomy literature
Partial bibliography: Misc.