Potential Risks From Advanced AI
Aryeh L. Englander
JHU Applied Physics Laboratory (APL)
University of Maryland, Baltimore County (UMBC)
Email: Aryeh.Englander@jhuapl.edu
Outline
Overview of Potential Risks
Sources of risk
Terrorism
Accidental global conflict
Robust totalitarianism
Loss of control
Misaligned Objectives
Green City 2050
Misaligned Objectives: The “Alignment Problem”
Green City 2050
Misaligned objectives: Notional example
Scale: How bad could it get?
AI Safety for Current Systems
Specification Gaming
A reinforcement learning agent discovers an unintended strategy for achieving a higher score
(Source: OpenAI, Faulty Reward Functions in the Wild)
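The pattern above can be sketched in a few lines. This is a hypothetical toy inspired by the boat-racing example in OpenAI's "Faulty Reward Functions in the Wild": the designer rewards score targets as a proxy for finishing the race, so looping over respawning targets beats actually finishing. All numbers are illustrative.

```python
# Toy illustration of specification gaming: a proxy reward that can be
# "gamed" by a policy the designer never intended.

def proxy_reward(actions):
    """Proxy objective: +10 per target hit, +50 once for finishing."""
    score = sum(10 for a in actions if a == "hit_target")
    if "finish" in actions:
        score += 50
    return score

# Intended behavior: hit a few targets, then finish the race.
intended = ["hit_target"] * 3 + ["finish"]
# Gamed behavior: ignore the finish line and loop over targets forever.
gamed = ["hit_target"] * 20

assert proxy_reward(gamed) > proxy_reward(intended)  # 200 > 80
```

The proxy is maximized by a behavior (endless looping) that scores higher than the behavior the designer actually wanted.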
Specification Gaming (cont.)
Google Photos misidentified Black people as gorillas

(source)
Blank labels can make DL systems misidentify stop signs as Speed Limit 45 MPH signs
(source)
Avoiding side effects
Two side effect scenarios
(source: DeepMind Safety Research blog)
Avoiding side effects (cont.)
Get from point A to point B – but don’t knock over the vase!
Can we think of all possible side effects like this in advance?
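One family of proposed mitigations is to subtract a penalty for changes the task didn't require. The sketch below is a minimal illustrative version, not a specific published method; the penalty weight and the "vase broken" feature are assumptions for the example.

```python
# Minimal sketch of a side-effect penalty: reward for reaching the goal,
# minus a penalty term for unnecessary changes to the environment.

def reward(reached_goal, broke_vase, lam=5.0):
    task = 10.0 if reached_goal else 0.0
    side_effect_penalty = lam if broke_vase else 0.0
    return task - side_effect_penalty

careful = reward(reached_goal=True, broke_vase=False)   # 10.0
careless = reward(reached_goal=True, broke_vase=True)   #  5.0
assert careful > careless
```

The hard part, as the slide asks, is that the designer has to anticipate and encode every relevant side effect ("broke_vase") in advance, which is exactly what general impact-measure research tries to avoid.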
Out of Distribution (OOD) Robustness
Interpretability and Monitoring
Emergent behaviors
2010 Flash Crash
Human-AI teaming for military strategy?
Testing, Evaluation, Verification, and Validation (TEVV)
AI Safety for Future Systems
Terminology: AGI, ASI, HLMI, TAI, PASTA
Mesa-Optimization
Theoretical issues
Embedding agents in the environment can lead to a host of theoretical problems
(source: MIRI Embedded Agency sequence)
Social and Governance Challenges
Key Uncertainties
Caveats
Forecasting under extreme uncertainty
What (almost?) all informed experts seem to agree upon
* In the metaphysical “Hard Problem of Consciousness” sense
Key Uncertainties: Timelines and Takeoff Speeds
Recent progress: Some examples
MuZero: Mastering Go, chess, shogi and Atari without rules
(DeepMind 2020)
AlphaFold2 revolutionizes protein folding prediction
(DeepMind 2021)
Recent progress (cont.)
DALL-E 2 (OpenAI 2022)
PaLM (Google 2022)
Explaining jokes
Inference chaining
4 years in image generation:
Scaling trends
Compute Trends Across Three Eras of Machine Learning
(Sevilla et al., 2022)
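A back-of-the-envelope way to read these trends: given a doubling time, compute the implied growth factor over time. Sevilla et al. (2022) estimate doubling times on the order of months for the deep-learning era; the 6-month figure below is an illustrative round number, not their exact estimate.

```python
# Growth factor of training compute for a given doubling time.

def growth_factor(years, doubling_time_months):
    return 2 ** (years * 12 / doubling_time_months)

# With a 6-month doubling time, compute grows ~4x per year...
assert growth_factor(1, 6) == 4
# ...and roughly a thousandfold in five years (2**10 = 1024).
assert 1000 < growth_factor(5, 6) < 1100
```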
Massive investment
Will current paradigms scale up all the way?
Biological anchors
Economic models
Economic models (cont.)
Economic models (cont.)
Economic models (cont.)
Other reference classes
Expert surveys
Prediction markets
Do timelines actually make much of a difference?
“Takeoff speeds”: Can we just worry about it later?
Linear vs. exponential growth
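The intuition behind this slide can be shown with two toy series (all numbers illustrative): a quantity that doubles each period looks negligible next to a linear trend for most of the window, then overtakes it within a couple of periods, leaving little warning time.

```python
# Linear vs. exponential growth: the crossover comes fast.

linear = [100 + 10 * t for t in range(11)]        # +10 per period
exponential = [2 ** t for t in range(11)]         # doubles each period

# For most of the window the exponential looks negligible...
assert exponential[5] < linear[5]    # 32 < 150
# ...then it overtakes within a few periods.
assert exponential[8] > linear[8]    # 256 > 180
```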
Just missing that last skill…
Key Uncertainties: Alignment
Will misaligned AGI / ASI be power-seeking?
Will AGI have our values by default?
Could misaligned AGI / ASI really take over the world?
Key Uncertainties: “Can’t we just…”
R.I.P. Humanity
“What could possibly go wrong?”
Will near-term AI safety approaches scale up?
A Few Proposed Solutions / Strategies
“Naïve” Alignment Strategies
Give it an off-switch
A robot hand was trained to grasp a ball in a virtual environment. Grasping is hard to learn, so instead the policy learned to fool the human evaluator by hovering its hand between the ball and the camera, merely appearing to grasp it.
https://openai.com/blog/deep-reinforcement-learning-from-human-preferences/
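The linked work (learning from human preferences) fits a reward model to pairwise comparisons. Below is a minimal sketch of that idea using a Bradley-Terry preference model; the single learned weight and hand-made "feature" stand in for the neural reward models used in practice, and all numbers are illustrative.

```python
import math

# Toy reward learning from pairwise human preferences.

def predicted_reward(w, feature):
    return w * feature  # toy linear "reward model"

def pref_prob(w, feat_a, feat_b):
    # Bradley-Terry: P(human prefers A) = sigmoid(r_A - r_B)
    diff = predicted_reward(w, feat_a) - predicted_reward(w, feat_b)
    return 1.0 / (1.0 + math.exp(-diff))

# Human labels say trajectories that actually grasp the ball (feature 1.0)
# are preferred over ones that only look like grasping (feature 0.0).
w = 0.0
for _ in range(200):               # gradient ascent on log-likelihood
    p = pref_prob(w, 1.0, 0.0)
    w += 0.1 * (1.0 - p)           # d/dw log P(A preferred), feature diff = 1

assert pref_prob(w, 1.0, 0.0) > 0.9   # model learned the preference
```

The hope is that a reward model trained this way captures what the human actually wanted; the grasping example above shows how it can still be fooled when the evaluator's observations are limited.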
Limit its capabilities
Value learning
Iterated Amplification
Deceptive Alignment?
Deceptively Aligned Mesa-Optimizers
Source: Astral Codex Ten, “Deceptively Aligned Mesa-Optimizers: It’s Not Funny If I Have To Explain It”
Monitoring & Control: Use AI to monitor / control AI
Governance and Policy Approaches
Key Uncertainties: General / Meta Considerations
Comparison to misaligned humans?
Other comparisons
Valid heuristics or irrational biases?
Decision Factors: So what should we do about all this?
Decision making under uncertainty
Other factors
What I’m working on (very much WIP though)
Preliminary model – partly expanded
See AI Alignment Forum report for detailed model walkthrough
Getting Involved
Direct work opportunities
General awareness and advocacy
Join the conversation!