last updated 02-21-2013
Note that I've included works on any of the problem categories from my list of open problems in AI risk research, even if they're not explicitly about AI risk.
Superintelligence: The Coming Machine Intelligence Revolution by Nick Bostrom
Monograph. Covers paths to superintelligence, forms of superintelligence, powers of superintelligence, takeoff dynamics, convergent instrumental values, the control problem, capability control methods, motivation selection methods, CEV vs. Oracle AI, multipolar outcomes, current strategy for interventions, and more. (OUP, 2013)
Singularity Hypotheses, Vol. 2, edited by Vic Callaghan
Another edited volume, the sequel to Singularity Hypotheses.
"The Löb Problem", by Eliezer Yudkowsky
Describes the difficulty that Löb's theorem raises for AIs that we would hope could prove things about themselves and their descendant AIs.
"Responses to Catastrophic AGI Risk: A Survey" by Kaj Sotala & Roman Yampolskiy
A taxonomy of societal and technical responses to the challenges of AGI risk.
"Predicting Machine Superintelligence"
Builds on the technological forecasting section in "Intelligence Explosion: Evidence and Import" and Stuart Armstrong's "How We're Predicting AI... or Failing to", and devotes more space and effort to methods that might successfully predict the first creation of AI. Possible targets: Futures, Technological Forecasting and Social Change, Philosophy and Technology, Journal of Evolution and Technology.
"Solomonoff Induction and Second-Order Logic"
Shows that a version of Solomonoff Induction in second-order logic would be more expressive (have a "greater imagination") than currently available Solomonoff approaches, and gives some hints at how this could be accomplished. Possible targets: International Conference on Algorithmic Learning Theory, Entropy, Artificial General Intelligence.
"The Challenge of Preference Extraction"
Explains the neurobiology, behavioral psychology, and evolutionary psychology behind the claim that humans don't have coherent utility functions; thou art godshatter. Current attempts at preference extraction in AI and economics are insufficient. Possible targets: Journal of Evolution and Technology, Autonomous Agents and Multi-Agent Systems.
Summarizes the work on value extrapolation in philosophy and proposes a few ways forward for such research, particularly in the context of machine ethics. Picks up where "The Singularity and Machine Ethics" leaves off. Possible targets: Philosophy and Technology, Philosophers' Imprint, Utilitas.
"Losses in Hardscrabble Hell"
Explains Hanson's "hardscrabble hell" scenario in more detail and clarifies the losses we can expect to occur in such a scenario.
"Will Values Converge?"
Summarizes the philosophy literature on whether values will converge, proposes a few ways forward for such research, and incorporates the latest findings from psychology and neuroscience. Possible targets: Philosophers' Imprint, Philosophical Review.
"Biases in AI Research"
"Catastrophic Risks and Existential Risks"
An update on the current landscape of catastrophic risks and existential risks, emphasizing that several risks could be catastrophic but not existential. (But, e.g., AI risk and simulation shutdown risk are both strong candidates for existential risks.)
"Uncertainty and Decision Theories"
A survey of our uncertainty about decision theories and proposed solutions, and different methods for handling these uncertainties.
"Intelligence Explosion: The Proportionality Thesis"
Examines the proportionality thesis in Chalmers's version of the argument for the singularity. (An important premise that hasn't been analyzed in much detail, as Chalmers notes in "The Singularity: A Reply.")
"Hazards from Large Scale Computation"
Large-scale induction and simulation might cause certain hazards, for example creating millions of poorly-formed simulated beings capable of consciousness and suffering (cf. Metzinger), or producing dangerous agent-like behavior from an otherwise agentless system.
"Tool Oracles for Safe AI Development"
Is there a safe path from a weak non-agenty Oracle to a superintelligent non-agenty Oracle that can help us secure a valuable future while avoiding the dangers of agenty superintelligence?
"Stable Attractors for Technologically Advanced Civilizations"
What 'win' and 'fail' scenarios might we expect for technologically advanced civilizations? ('Future of Human Evolution', etc.)
"AI Risk: Private Projects vs. Government Projects"
What are the advantages and disadvantages of private vs. government AI projects, with regard to AI risk?
"When will whole brain emulation be possible?"
Update to WBE tech report from FHI.
"Is it desirable to accelerate progress toward whole brain emulation?"
Building on the material in MIRI Tech Report 2012-01.
"Awareness of nanotechnology risks: Lessons for AI risk mitigation"
Analyzes the history of awareness of nanotechnology risks, what interventions were made by Drexler and others, and what lessons this history has for AI risk mitigation.
"AI and Physical Effects"
Details a variety of ways in which an AI could have physical effects in the world, especially once it hits the internet.
"Moore's Law of Mad Science"
Surveys the evidence suggesting that technology is making it easier and easier to destroy the world, though "Moore's Law of Mad Science" (taken to mean that the IQ required to destroy the world drops by one point each year) can't be substantiated.
"AI Capability vs. AI Safety"
A summary of ways in which AI capabilities research and AI safety research may be independent. For example, those aiming for AI capability may benefit most from a scruffy approach, while safety may best be accommodated by a neat approach.