1 | Key | Item Type | Publication Year | Author | Title | Publication Title | ISBN | ISSN | DOI | Url | Abstract Note | Date | Date Added | Date Modified | Access Date | Pages | Num Pages | Issue | Volume | Number Of Volumes | Journal Abbreviation | Short Title | Series | Series Number | Series Text | Series Title | Publisher | Place | Language | Rights | Type | Archive | Archive Location | Library Catalog | Call Number | Extra | Notes | File Attachments | Link Attachments | Manual Tags | Automatic Tags | Editor | Series Editor | Translator | Contributor | Attorney Agent | Book Author | Cast Member | Commenter | Composer | Cosponsor | Counsel | Interviewer | Producer | Recipient | Reviewed Author | Scriptwriter | Words By | Guest | Number | Edition | Running Time | Scale | Medium | Artwork Size | Filing Date | Application Number | Assignee | Issuing Authority | Country | Meeting Name | Conference Name | Court | References | Reporter | Legal Status | Priority Numbers | Programming Language | Version | System | Code | Code Number | Section | Session | Committee | History | Legislative Body | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | B24PTESX | report | 2019 | Drexler, K Eric | Reframing Superintelligence: Comprehensive AI Services as General Intelligence | 2019 | 2019-12-16 2:15:54 | 2020-12-19 23:32:46 | 210 | Future of Humanity Institute | en | Zotero | ZSCC: 0000009 | /Users/jriedel/Zotero/storage/PEXGT3JJ/Drexler - Reframing Superintelligence.pdf | FHI; TechSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | NMH6V9DF | conferencePaper | 2019 | Carey, Ryan | How Useful Is Quantilization For Mitigating Specification-Gaming? | For some tasks, there exists a goal that perfectly describes what the designer wants the AI system to achieve. For many tasks, however, the best available proxy objective is only a rough approximation of the designer’s intentions. When given such a goal, a system that optimizes the proxy objective tends to select degenerate solutions where the proxy reward is very different from the designer’s true reward function. One way to counteract the tendency toward specification-gaming is quantilization, a method that interpolates between imitating demonstrations, and optimizing the proxy objective. If the demonstrations are of adequate quality, and the proxy reward overestimates performance, then quantilization has better guaranteed performance than other strategies. However, if the proxy reward underestimates performance, then either imitation or optimization will offer the best guarantee. This work introduces three new gym environments: Mountain Car-RR, Hopper-RR, and Video Pinball-RR, and shows that quantilization outperforms baselines on these tasks. | 2019 | 2019-12-16 2:16:27 | 2020-12-19 23:32:01 | 11 | en | Zotero | ZSCC: 0000000 | /Users/jriedel/Zotero/storage/N66QDHXW/Carey - 2019 - HOW USEFUL IS QUANTILIZATION FOR MITIGATING SPECIF.pdf; /Users/jriedel/Zotero/storage/6UNXCG57/Carey - 2019 - HOW USEFUL IS QUANTILIZATION FOR MITIGATING SPECIF.pdf | FHI; TechSafety | ICLR 2019 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | IN3IZ95F | conferencePaper | 2019 | Kenton, Zachary; Filos, Angelos; Evans, Owain; Gal, Yarin | Generalizing from a few environments in safety-critical reinforcement learning | http://arxiv.org/abs/1907.01475 | Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training, allowing them to learn about every possible danger, but this is often impractical. This paper investigates safety and generalization from a limited number of training environments in deep reinforcement learning (RL). We find RL algorithms can fail dangerously on unseen test environments even when performing perfectly on training environments. Firstly, in a gridworld setting, we show that catastrophes can be significantly reduced with simple modifications, including ensemble model averaging and the use of a blocking classifier. In the more challenging CoinRun environment we find similar methods do not significantly reduce catastrophes. However, we do find that the uncertainty information from the ensemble is useful for predicting whether a catastrophe will occur within a few steps and hence whether human intervention should be requested. | 2019-07-02 | 2019-12-16 2:16:41 | 2020-12-20 22:36:40 | 2019-12-16 2:16:41 | arXiv.org | ZSCC: 0000003 arXiv: 1907.01475 | /Users/jriedel/Zotero/storage/VYSE5G8W/Kenton et al. - 2019 - Generalizing from a few environments in safety-cri.pdf; /Users/jriedel/Zotero/storage/44FDLE2I/Kenton et al. - 2019 - Generalizing from a few environments in safety-cri.pdf; /Users/jriedel/Zotero/storage/M8LGMYQU/1907.html; /Users/jriedel/Zotero/storage/EPUK6KML/1907.html | FHI; TechSafety | Computer Science - Artificial Intelligence; Statistics - Machine Learning; Computer Science - Machine Learning | SafeML ICLR 2019 Workshop | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | S5365R4J | journalArticle | 2019 | Garfinkel, Ben; Dafoe, Allan | How does the offense-defense balance scale? | Journal of Strategic Studies | 0140-2390 | 10.1080/01402390.2019.1631810 | https://doi.org/10.1080/01402390.2019.1631810 | We ask how the offense-defense balance scales, meaning how it changes as investments into a conflict increase. To do so we offer a general formalization of the offense-defense balance in terms of contest success functions. Simple models of ground invasions and cyberattacks that exploit software vulnerabilities suggest that, in both cases, growth in investments will favor offense when investment levels are sufficiently low and favor defense when they are sufficiently high. We refer to this phenomenon as offensive-then-defensive scaling or OD-scaling. Such scaling effects may help us understand the security implications of applications of artificial intelligence that in essence scale up existing capabilities. | 2019-09-19 | 2019-12-16 2:16:59 | 2020-12-15 0:26:11 | 2019-12-16 2:16:59 | 736-763 | 6 | 42 | Taylor and Francis+NEJM | ZSCC: 0000016 | /Users/jriedel/Zotero/storage/PCLNPVRT/01402390.2019.html; /Users/jriedel/Zotero/storage/Y6SXR85M/Garfinkel and Dafoe - 2019 - How does the offense-defense balance scale.pdf; /Users/jriedel/Zotero/storage/ZB3PWR3T/Garfinkel and Dafoe - 2019 - How does the offense-defense balance scale.pdf; /Users/jriedel/Zotero/storage/DVJSA4BW/01402390.2019.html; /Users/jriedel/Zotero/storage/34P3JAT2/01402390.2019.html | FHI; MetaSafety | emerging technologies; Offense-defense theory; strategic stability | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | EK8XDY6B | report | 2019 | Cihon, Peter | Standards for AI Governance: International Standards to Enable Global Coordination in AI Research & Development | 2019 | 2019-12-16 2:17:42 | 2020-11-23 22:00:14 | Standards for AI Governance | Berkeley Existential Risk Initiative | Google Scholar | ZSCC: 0000013 | /Users/jriedel/Zotero/storage/BYM4WJAY/Cihon - 2019 - Standards for AI Governance International Standar.pdf | BERI; FHI; MetaSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | FK4RBXR7 | manuscript | 2019 | Hubinger, Evan; van Merwijk, Chris; Mikulik, Vladimir; Skalse, Joar; Garrabrant, Scott | Risks from Learned Optimization in Advanced Machine Learning Systems | http://arxiv.org/abs/1906.01820 | We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be - how will it differ from the loss function it was trained under - and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research. | 2019-06-11 | 2019-12-16 2:27:32 | 2020-12-20 20:53:13 | 2019-12-16 2:27:32 | arXiv.org | ZSCC: 0000003 arXiv: 1906.01820 | /Users/jriedel/Zotero/storage/6R87FL23/Hubinger et al. - 2019 - Risks from Learned Optimization in Advanced Machin.pdf; /Users/jriedel/Zotero/storage/ULV4CDNG/Hubinger et al. - 2019 - Risks from Learned Optimization in Advanced Machin.pdf; /Users/jriedel/Zotero/storage/AZKJYAVE/1906.html | FHI; MIRI; TechSafety | Computer Science - Artificial Intelligence | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 8XVGV7AE | manuscript | 2019 | Demski, Abram; Garrabrant, Scott | Embedded Agency | http://arxiv.org/abs/1902.09469 | Traditional models of rational action treat the agent as though it is cleanly separated from its environment, and can act on that environment from the outside. Such agents have a known functional relationship with their environment, can model their environment in every detail, and do not need to reason about themselves or their internal parts. We provide an informal survey of obstacles to formalizing good reasoning for agents embedded in their environment. Such agents must optimize an environment that is not of type "function"; they must rely on models that fit within the modeled environment; and they must reason about themselves as just another physical system, made of parts that can be modified and that can work at cross purposes. | 2019-02-25 | 2019-12-16 2:27:50 | 2020-12-20 20:53:23 | 2019-12-16 2:27:50 | arXiv.org | ZSCC: 0000004 arXiv: 1902.09469 | /Users/jriedel/Zotero/storage/QQFFI4RX/Demski and Garrabrant - 2019 - Embedded Agency.pdf; /Users/jriedel/Zotero/storage/HTYD7KUU/1902.html | MIRI; TechSafety | Computer Science - Artificial Intelligence | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | KUHC2TTD | manuscript | 2019 | Manheim, David; Garrabrant, Scott | Categorizing Variants of Goodhart's Law | http://arxiv.org/abs/1803.04585 | There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field. | 2019-02-24 | 2019-12-16 2:27:58 | 2020-12-20 20:53:34 | 2019-12-16 2:27:58 | arXiv.org | ZSCC: NoCitationData[s6] ACC: 23 J: 23 arXiv: 1803.04585 | /Users/jriedel/Zotero/storage/2PDVLINT/Manheim and Garrabrant - 2019 - Categorizing Variants of Goodhart's Law.pdf; /Users/jriedel/Zotero/storage/9KQ5CH2H/Manheim and Garrabrant - 2019 - Categorizing Variants of Goodhart's Law.pdf; /Users/jriedel/Zotero/storage/ZYBTEFZT/1803.html; /Users/jriedel/Zotero/storage/AG6X793U/1803.html | MIRI; TechSafety | Computer Science - Artificial Intelligence; Statistics - Machine Learning; Quantitative Finance - General Finance; 91E45 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | ZQLX4Y7H | conferencePaper | 2019 | Kosoy, Vanessa | Delegative Reinforcement Learning: Learning To Avoid Traps With A Little Help | Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning.) The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions. | 2019 | 2019-12-16 2:28:03 | 2020-12-19 23:30:58 | 22 | en | Zotero | ZSCC: 0000003 | /Users/jriedel/Zotero/storage/727I8NED/Kosoy - 2019 - Delegative Reinforcement Learning learning to avo.pdf; /Users/jriedel/Zotero/storage/VCGU8X6K/Kosoy - 2019 - DELEGATIVE REINFORCEMENT LEARNING LEARN- ING TO A.pdf | MIRI; TechSafety | Statistics - Machine Learning; Computer Science - Machine Learning; I.2.6; 68Q32 | SafeML ICLR 2019 Workshop | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
11 | RT8G3R2W | conferencePaper | 2018 | Armstrong, Stuart; Mindermann, Sören | Occam's razor is insufficient to infer the preferences of irrational agents | Advances in Neural Information Processing Systems | Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behavior. Since human planning systematically deviates from rationality, several approaches have been tried to account for specific human shortcomings. However, the general problem of inferring the reward function of an agent of unknown rationality has received little attention. Unlike the well-known ambiguity problems in IRL, this one is practically relevant but cannot be resolved by observing the agent’s policy in enough environments. This paper shows (1) that a No Free Lunch result implies it is impossible to uniquely decompose a policy into a planning algorithm and reward function, and (2) that even with a reasonable simplicity prior/Occam’s razor on the set of decompositions, we cannot distinguish between the true decomposition and others that lead to high regret. To address this, we need simple ‘normative’ assumptions, which cannot be deduced exclusively from observations. | 2018 | 2019-12-16 2:28:12 | 2020-12-19 6:01:00 | 5598–5609 | en | Zotero | ZSCC: 0000017 | /Users/jriedel/Zotero/storage/IZMS6ASX/Armstrong and Mindermann - Occam's razor is insufficient to infer the prefere.pdf; /Users/jriedel/Zotero/storage/SBA4C7TY/7803-occams-razor-is-insufficient-to-infer-the-preferences-of-irrational-agents.html | FHI; TechSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
12 | 8JZQRCUY | manuscript | 2019 | Kosoy, Vanessa | Forecasting using incomplete models | http://arxiv.org/abs/1705.04630 | We consider the task of forecasting an infinite sequence of future observations based on some number of past observations, where the probability measure generating the observations is "suspected" to satisfy one or more of a set of incomplete models, i.e. convex sets in the space of probability measures. This setting is in some sense intermediate between the realizable setting where the probability measure comes from some known set of probability measures (which can be addressed using e.g. Bayesian inference) and the unrealizable setting where the probability measure is completely arbitrary. We demonstrate a method of forecasting which guarantees that, whenever the true probability measure satisfies an incomplete model in a given countable set, the forecast converges to the same incomplete model in the (appropriately normalized) Kantorovich-Rubinstein metric. This is analogous to merging of opinions for Bayesian inference, except that convergence in the Kantorovich-Rubinstein metric is weaker than convergence in total variation. | 2019-05-16 | 2019-12-16 2:29:04 | 2020-12-20 21:10:30 | 2019-12-16 2:29:04 | arXiv.org | ZSCC: NoCitationData[s6] ACC: 1 J: 1 arXiv: 1705.04630 | /Users/jriedel/Zotero/storage/YKLDJTH6/Kosoy - 2019 - Forecasting using incomplete models.pdf; /Users/jriedel/Zotero/storage/XE4WUD32/1705.html | MIRI; TechSafety | Computer Science - Machine Learning; G.3; I.2.6; 68Q32, 62M10, 62G08 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
13 | TDPYE4II | journalArticle | 2018 | Grace, Katja; Salvatier, John; Dafoe, Allan; Zhang, Baobao; Evans, Owain | When Will AI Exceed Human Performance? Evidence from AI Experts | Journal of Artificial Intelligence Research | http://arxiv.org/abs/1705.08807 | Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI. | 2018 | 2019-12-16 2:29:10 | 2020-12-19 6:36:02 | 2019-12-16 2:29:10 | 729–754 | 62 | When Will AI Exceed Human Performance? | arXiv.org | ZSCC: 0000299 arXiv: 1705.08807 | /Users/jriedel/Zotero/storage/VW3774FC/Grace et al. - 2018 - When Will AI Exceed Human Performance Evidence fr.pdf; /Users/jriedel/Zotero/storage/JIL882WS/1705.html; /Users/jriedel/Zotero/storage/G5XJG2SK/1705.html; /Users/jriedel/Zotero/storage/T3I7WGX2/Grace et al. - 2018 - When will AI exceed human performance Evidence fr.pdf; /Users/jriedel/Zotero/storage/TW5QN5BY/11222.html; /Users/jriedel/Zotero/storage/KRKZRG9F/11222.html; /Users/jriedel/Zotero/storage/2H6MNFIV/11222.html | FHI; MetaSafety; AI-Impacts-NotFeatured | Computer Science - Artificial Intelligence; Computer Science - Computers and Society | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
14 | FULQBDXW | conferencePaper | 2018 | Carey, Ryan | Incorrigibility in the CIRL Framework | AIES '18: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society | http://arxiv.org/abs/1709.06275 | A value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model mis-specification (e.g., in the case of programmer errors). We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility. We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented; as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple ways to attempt to attain these sorts of guarantees in a value learning framework. | 2018-06-03 | 2019-12-16 2:29:24 | 2020-12-20 22:42:39 | 2019-12-16 2:29:24 | arXiv.org | ZSCC: 0000008 arXiv: 1709.06275 | /Users/jriedel/Zotero/storage/T9YZ5NJ3/Carey - 2018 - Incorrigibility in the CIRL Framework.pdf; /Users/jriedel/Zotero/storage/4AGMAW4S/Carey - 2018 - Incorrigibility in the CIRL Framework.pdf; /Users/jriedel/Zotero/storage/LV2CB28R/1709.html; /Users/jriedel/Zotero/storage/H9Q2Y24K/1709.html | FHI; MIRI; TechSafety | Computer Science - Artificial Intelligence; ai safety; cirl; cooperative inverse reinforcement learning; corrigibility | 2018 AAAI/ACM Conference on AI, Ethics, and Society | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
15 | RNSR28PB | manuscript | 2016 | Critch, Andrew | Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents | http://arxiv.org/abs/1602.04184 | Löb's theorem and Gödel's theorems make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb's theorem which is applicable given such bounded resources. These results have powerful implications for the game theory of bounded agents who are able to write proofs about themselves and one another, including the capacity to out-perform classical Nash equilibria and correlated equilibria, attaining mutually cooperative program equilibrium in the Prisoner's Dilemma. Previous cooperative program equilibria studied by Tennenholtz (2004) and Fortnow (2009) have depended on tests for program equality, a fragile condition, whereas "Löbian" cooperation is much more robust and agnostic of the opponent's implementation. | 2016-08-24 | 2019-12-16 2:30:38 | 2020-12-20 22:05:44 | 2019-12-16 2:30:38 | arXiv.org | ZSCC: 0000005 arXiv: 1602.04184 | /Users/jriedel/Zotero/storage/DKEA6DTK/Critch - 2016 - Parametric Bounded Lob's Theorem and Robust Coop.pdf; /Users/jriedel/Zotero/storage/ZGB4E4AY/Critch - 2016 - Parametric Bounded Lob's Theorem and Robust Coop.pdf; /Users/jriedel/Zotero/storage/D3NJ3XYP/1602.html; /Users/jriedel/Zotero/storage/R9XEXB56/1602.html; /Users/jriedel/Zotero/storage/3SZSBX7A/1602.html | MIRI; TechSafety; CHAI | Computer Science - Computer Science and Game Theory; Computer Science - Logic in Computer Science | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
16 | JLCVWGLD | manuscript | 2017 | Garrabrant, Scott; Benson-Tilsen, Tsvi; Critch, Andrew; Soares, Nate; Taylor, Jessica | Logical Induction | http://arxiv.org/abs/1609.03543 | We present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time. For instance, if the language is Peano arithmetic, it assigns probabilities to all arithmetical statements, including claims about the twin prime conjecture, the outputs of long-running computations, and its own probabilities. We show that our algorithm, an instance of what we call a logical inductor, satisfies a number of intuitive desiderata, including: (1) it learns to predict patterns of truth and falsehood in logical statements, often long before having the resources to evaluate the statements, so long as the patterns can be written down in polynomial time; (2) it learns to use appropriate statistical summaries to predict sequences of statements whose truth values appear pseudorandom; and (3) it learns to have accurate beliefs about its own current beliefs, in a manner that avoids the standard paradoxes of self-reference. For example, if a given computer program only ever produces outputs in a certain range, a logical inductor learns this fact in a timely manner; and if late digits in the decimal expansion of $\pi$ are difficult to predict, then a logical inductor learns to assign $\approx 10\%$ probability to "the $n$th digit of $\pi$ is a 7" for large $n$. Logical inductors also learn to trust their future beliefs more than their current beliefs, and their beliefs are coherent in the limit (whenever $\phi \implies \psi$, $\mathbb{P}_\infty(\phi) \le \mathbb{P}_\infty(\psi)$, and so on); and logical inductors strictly dominate the universal semimeasure in the limit. These properties and many others all follow from a single logical induction criterion, which is motivated by a series of stock trading analogies. Roughly speaking, each logical sentence $\phi$ is associated with a stock that is worth \$1 per share if [...] | 2017-12-12 | 2019-12-16 2:30:43 | 2020-12-20 21:20:09 | 2019-12-16 2:30:43 | arXiv.org | ZSCC: NoCitationData[s6] ACC: 23 J: 23 arXiv: 1609.03543 | /Users/jriedel/Zotero/storage/5EDLRDMZ/Garrabrant et al. - 2017 - Logical Induction.pdf; /Users/jriedel/Zotero/storage/NTRM7VUQ/1609.html | MIRI; TechSafety | Computer Science - Artificial Intelligence; Computer Science - Logic in Computer Science; Mathematics - Logic; Mathematics - Probability | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
17 | 9KLGQR5U | manuscript | 2016 | Garrabrant, Scott; Fallenstein, Benya; Demski, Abram; Soares, Nate | Inductive Coherence | http://arxiv.org/abs/1604.05288 | While probability theory is normally applied to external environments, there has been some recent interest in probabilistic modeling of the outputs of computations that are too expensive to run. Since mathematical logic is a powerful tool for reasoning about computer programs, we consider this problem from the perspective of integrating probability and logic. Recent work on assigning probabilities to mathematical statements has used the concept of coherent distributions, which satisfy logical constraints such as the probability of a sentence and its negation summing to one. Although there are algorithms which converge to a coherent probability distribution in the limit, this yields only weak guarantees about finite approximations of these distributions. In our setting, this is a significant limitation: Coherent distributions assign probability one to all statements provable in a specific logical theory, such as Peano Arithmetic, which can prove what the output of any terminating computation is; thus, a coherent distribution must assign probability one to the output of any terminating computation. To model uncertainty about computations, we propose to work with approximations to coherent distributions. We introduce inductive coherence, a strengthening of coherence that provides appropriate constraints on finite approximations, and propose an algorithm which satisfies this criterion. | 2016-10-07 | 2019-12-16 2:30:52 | 2020-12-20 21:16:30 | 2019-12-16 2:30:52 | arXiv.org | ZSCC: 0000003 arXiv: 1604.05288 | /Users/jriedel/Zotero/storage/N8VGW3ZI/Garrabrant et al. - 2016 - Inductive Coherence.pdf; /Users/jriedel/Zotero/storage/4GJYA5XT/Garrabrant et al. - 2016 - Inductive Coherence.pdf; /Users/jriedel/Zotero/storage/M9FJTSFY/1604.html; /Users/jriedel/Zotero/storage/HEVQPMSD/1604.html | MIRI; TechSafety | Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Mathematics - Probability | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
18 | PRXBFVLY | manuscript | 2016 | Garrabrant, Scott; Soares, Nate; Taylor, Jessica | Asymptotic Convergence in Online Learning with Unbounded Delays | http://arxiv.org/abs/1604.05280 | We study the problem of predicting the results of computations that are too expensive to run, via the observation of the results of smaller computations. We model this as an online learning problem with delayed feedback, where the length of the delay is unbounded, which we study mainly in a stochastic setting. We show that in this setting, consistency is not possible in general, and that optimal forecasters might not have average regret going to zero. However, it is still possible to give algorithms that converge asymptotically to Bayes-optimal predictions, by evaluating forecasters on specific sparse independent subsequences of their predictions. We give an algorithm that does this, which converges asymptotically on good behavior, and give very weak bounds on how long it takes to converge. We then relate our results back to the problem of predicting large computations in a deterministic setting. | 2016-09-07 | 2019-12-16 2:30:59 | 2020-12-20 21:02:06 | 2019-12-16 2:30:59 | arXiv.org | ZSCC: 0000009 arXiv: 1604.05280 | /Users/jriedel/Zotero/storage/XB5TFTVL/Garrabrant et al. - 2016 - Asymptotic Convergence in Online Learning with Unb.pdf; /Users/jriedel/Zotero/storage/LUN6C53W/Garrabrant et al. - 2016 - Asymptotic Convergence in Online Learning with Unb.pdf; /Users/jriedel/Zotero/storage/966HHHXG/1604.html; /Users/jriedel/Zotero/storage/LRKN2IJ9/1604.html | MIRI; TechSafety | Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Mathematics - Probability | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
19 | MZ82VG2C | manuscript | 2019 | Kosoy, Vanessa; Appel, Alexander | Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm | http://arxiv.org/abs/1608.04112 | We introduce a new concept of approximation applicable to decision problems and functions, inspired by Bayesian probability. From the perspective of a Bayesian reasoner with limited computational resources, the answer to a problem that cannot be solved exactly is uncertain and therefore should be described by a random variable. It thus should make sense to talk about the expected value of this random variable, an idea we formalize in the language of average-case complexity theory by introducing the concept of "optimal polynomial-time estimators." We prove some existence theorems and completeness results, and show that optimal polynomial-time estimators exhibit many parallels with "classical" probability theory. | 2019-06-04 | 2019-12-16 2:31:07 | 2020-12-20 22:05:22 | 2019-12-16 2:31:07 | Optimal Polynomial-Time Estimators | arXiv.org | ZSCC: NoCitationData[s7] ACC: 1 J: 1 arXiv: 1608.04112 | /Users/jriedel/Zotero/storage/ZSFMNZUJ/Kosoy and Appel - 2019 - Optimal Polynomial-Time Estimators A Bayesian Not.pdf; /Users/jriedel/Zotero/storage/Z5MMRL34/Kosoy and Appel - 2019 - Optimal Polynomial-Time Estimators A Bayesian Not.pdf; /Users/jriedel/Zotero/storage/96Y3BQY3/1608.html; /Users/jriedel/Zotero/storage/JFY9B26I/1608.html | MIRI; TechSafety | Computer Science - Computational Complexity | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20 | HUR4CFJC | conferencePaper | 2016 | Benson-Tilsen, Tsvi; Soares, Nate | Formalizing convergent instrumental goals | Workshops at the Thirtieth AAAI Conference on Artificial Intelligence | 2016 | 2019-12-16 2:31:38 | 2020-12-15 0:25:14 | Google Scholar | ZSCC: 0000010 | /Users/jriedel/Zotero/storage/XGJA6GST/Benson-Tilsen and Soares - Formalizing Convergent Instrumental Goals.pdf; /Users/jriedel/Zotero/storage/VC3UMAXJ/12634.html | MIRI; TechSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
21 | TAP3NQ6F | conferencePaper | 2016 | Orseau, Laurent; Armstrong, Stuart | Safely Interruptible Agents | Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real-time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions—harmful either for the agent or for the environment—and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button—which is an undesirable outcome. This paper explores a way to make sure a learning agent will not learn to prevent (or seek!) being interrupted by the environment or a human operator. We provide a formal definition of safe interruptibility and exploit the off-policy learning property to prove that either some agents are already safely interruptible, like Q-learning, or can easily be made so, like Sarsa. We show that even ideal, uncomputable reinforcement learning agents for (deterministic) general computable environments can be made safely interruptible. | 2016 | 2019-12-16 2:31:59 | 2020-12-19 23:28:51 | 10 | en | Zotero | ZSCC: 0000075 | /Users/jriedel/Zotero/storage/T5HM2S86/Orseau and Armstrong - Safely Interruptible Agents.pdf | FHI; TechSafety; DeepMind | Conference on Uncertainty in Artificial Intelligence | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
22 | 4HXYFRIF | conferencePaper | 2016 | Sotala, Kaj | Defining human values for value learners | Workshops at the Thirtieth AAAI Conference on Artificial Intelligence | 2016 | 2019-12-16 2:32:05 | 2020-11-23 23:16:50 | Google Scholar | ZSCC: 0000016 | /Users/jriedel/Zotero/storage/P9HQXVA3/Sotala - 2016 - Defining human values for value learners.pdf; /Users/jriedel/Zotero/storage/7GA7R3AH/12633.html | MIRI; TechSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
23 | MPT4845G | conferencePaper | 2016 | Taylor, Jessica | Quantilizers: A safer alternative to maximizers for limited optimization | Workshops at the Thirtieth AAAI Conference on Artificial Intelligence | 2016 | 2019-12-16 2:32:20 | 2020-11-23 23:16:50 | Quantilizers | Google Scholar | ZSCC: 0000022 | /Users/jriedel/Zotero/storage/VEIJHHXR/Taylor - 2016 - Quantilizers A safer alternative to maximizers fo.pdf; /Users/jriedel/Zotero/storage/IKLAMVI2/12613.html | MIRI; TechSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
24 | NAVUYZ6A | bookSection | 2019 | Baum, Seth | Lessons for Artificial Intelligence from Other Global Risks | The Global Politics of Artificial Intelligence | The prominence of artificial intelligence (AI) as a global risk is a relatively recent phenomenon. Other global risks have longer histories and larger bodies of scholarship. The study of these other risks can offer considerable insight to the study of AI risk. This paper examines four risks: biotechnology, nuclear weapons, global warming, and asteroid collision. Several overarching lessons are found. First, the extreme severity of global risks is often insufficient to motivate action to reduce the risks. Second, perceptions of global risks can be influenced by people’s incentives and by their cultural and intellectual orientations. Third, the success of efforts to address global risks can depend on the extent of buy-in from parties who may be negatively affected by the efforts. Fourth, global risks and risk reduction initiatives can be shaped by broader socio-political conditions, such as the degree of policy influence of private industry within a political jurisdiction. The paper shows how these and other lessons can inform efforts to reduce risks from AI. | 2019 | 2019-12-16 2:34:12 | 2020-11-23 23:01:21 | 20 | en | Zotero | ZSCC: NoCitationData[s3] ACC: 0 | /Users/jriedel/Zotero/storage/QHPHCM2Z/Baum - Lessons for Artificial Intelligence from Other Glo.pdf | GCRI; MetaSafety | Tinnirello, Maurizio | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
25 | BBXBT9NS | journalArticle | 2019 | Baum, Seth | Risk-Risk Tradeoff Analysis of Nuclear Explosives for Asteroid Deflection | Risk Analysis | 10.1111/risa.13339 | https://papers.ssrn.com/abstract=3397559 | To prevent catastrophic asteroid-Earth collisions, it has been proposed to use nuclear explosives to deflect away Earthbound asteroids. However, this policy of nuclear deflection could inadvertently increase the risk of nuclear war and other violent conflict. This article conducts risk-risk tradeoff analysis to assess whether nuclear deflection results in a net increase or decrease in risk. Assuming nonnuclear deflection options are also used, nuclear deflection may only be needed for the largest and most imminent asteroid collisions. These are low-frequency, high-severity events. The effect of nuclear deflection on violent conflict risk is more ambiguous due to the complex and dynamic social factors at play. Indeed, it is not clear whether nuclear deflection would cause a net increase or decrease in violent conflict risk. Similarly, this article cannot reach a precise conclusion on the overall risk-risk tradeoff. The value of this article comes less from specific quantitative conclusions and more from providing an analytical framework and a better overall understanding of the policy decision. The article demonstrates the importance of integrated analysis of global risks and the policies to address them, as well as the challenge of quantitative evaluation of complex social processes such as violent conflict. | 2019 | 2019-12-16 2:37:27 | 2020-12-21 20:43:01 | 2019-12-16 2:37:27 | 11 | 39 | en | papers.ssrn.com | ZSCC: 0000003 | /Users/jriedel/Zotero/storage/TWHN5CQ6/papers.html | GCRI; NotSafety | asteroids; nuclear weapons; risk-risk tradeoff | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
26 | AZBKWAQP | journalArticle | 2019 | Baum, Seth D.; Armstrong, Stuart; Ekenstedt, Timoteus; Häggström, Olle; Hanson, Robin; Kuhlemann, Karin; Maas, Matthijs M.; Miller, James D.; Salmela, Markus; Sandberg, Anders | Long-term trajectories of human civilization | Foresight | 2019 | 2019-12-16 2:38:28 | 2020-12-15 0:28:15 | 53–83 | 1 | 21 | Google Scholar | ZSCC: 0000022 | /Users/jriedel/Zotero/storage/YKRAFWHD/Baum et al. - 2019 - Long-term trajectories of human civilization.pdf; /Users/jriedel/Zotero/storage/DMHLLNGN/html.html; /Users/jriedel/Zotero/storage/G2PJXVR6/html.html; /Users/jriedel/Zotero/storage/J4HSCHC3/html.html; /Users/jriedel/Zotero/storage/CQ9FVK7K/Baum et al. - 2019 - Long-term trajectories of human civilization.pdf | FHI; GCRI; MetaSafety; CLTR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
27 | XJVKWX3Z | report | 2018 | Baum, Seth; Barrett, Anthony | A Model for the Impacts of Nuclear War | https://papers.ssrn.com/abstract=3155983 | The total impact of nuclear war is a major factor in many important policy questions, but it has gotten little scholarly attention. This paper presents a model for calculating the total impacts of nuclear war. The model includes physical, infrastructural, and social impacts as they affect human lives. The model has five main branches corresponding to the five main types of effects of nuclear weapon detonations: thermal radiation, blast, ionizing radiation, electromagnetic pulse, and human perceptions. Model branches contain extensive detail on each of these effects, including interconnections between them and connections to other major risks including global warming and pandemics. The paper also includes background information on impacts analysis and modeling to help readers understand how to think about the impacts of nuclear war, including discussion of important attributes of nuclear war such as the number and yield of weapons detonated and the location of their detonation. | 2018-04-03 | 2019-12-16 2:39:07 | 2020-11-23 23:01:37 | 2019-12-16 2:39:07 | Social Science Research Network | Rochester, NY | en | SSRN Scholarly Paper | papers.ssrn.com | ZSCC: 0000004 | /Users/jriedel/Zotero/storage/GIU2XKBW/papers.html | GCRI; NotSafety | global catastrophic risk; impacts; nuclear war; risk analysis | ID 3155983 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
28 | MFMUPY79 | report | 2018 | Baum, Seth; de Neufville, Robert; Barrett, Anthony | A Model for the Probability of Nuclear War | https://papers.ssrn.com/abstract=3137081 | The probability of nuclear war is a major factor in many important policy questions, but it has gotten little scholarly attention. This paper presents a model for calculating the total probability of nuclear war. The model is based on 14 interrelated scenarios for how nuclear war can break out, covering perhaps the entire range of nuclear war scenarios. Scenarios vary based on factors including whether a state intends to make a first strike attack, whether the nuclear attack is preceded by a conventional war or a non-war crisis, whether escalation is intentional or inadvertent, the presence of false alarms of various types, and the presence of non-war nuclear detonations such as nuclear terrorism. As a first step towards quantifying the probability of nuclear war using the model, the paper also includes a dataset of historical incidents that might have threatened to turn into nuclear war. 60 historical incidents are included, making it perhaps the largest such dataset currently available. The paper also includes background information about probabilistic analysis and modeling to help readers understand how to think about the probability of nuclear war, including new theory for the decision to initiate nuclear war. | 2018-03-08 | 2019-12-16 2:39:18 | 2020-11-23 23:03:35 | 2019-12-16 2:39:18 | Social Science Research Network | Rochester, NY | en | SSRN Scholarly Paper | papers.ssrn.com | ZSCC: 0000009 | /Users/jriedel/Zotero/storage/MPFU55WH/papers.html | GCRI; NotSafety | probability; global catastrophic risk; nuclear war; risk analysis; scenario | ID 3137081 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
29 | 368A9G6Q | conferencePaper | 2018 | Baum, Seth | Reflections on the risk analysis of nuclear war | Proceedings of the Workshop on Quantifying Global Catastrophic Risks, Garrick Institute for the Risk Sciences, University of California, Los Angeles | 2018 | 2019-12-16 2:39:33 | 2020-11-23 23:01:13 | 19–50 | Google Scholar | ZSCC: 0000001 | /Users/jriedel/Zotero/storage/X9RETQKR/papers.html | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
30 | E5ZCXF27 | journalArticle | 2018 | Baum, Seth D. | Resilience to global catastrophe | Domains of resilience for complex interconnected systems. | 2018 | 2019-12-16 2:39:55 | 2020-12-17 18:37:57 | 47 | Google Scholar | ZSCC: 0000003 | /Users/jriedel/Zotero/storage/332EJWQG/Baumi - 2018 - Resilience to global catastrophe.pdf | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
31 | ZWGAQCUT | journalArticle | 2018 | Baum, Seth | Countering Superintelligence Misinformation | Information | 2018 | 2019-12-16 2:40:10 | 2020-12-15 0:23:46 | 244 | 10 | 9 | Google Scholar | ZSCC: 0000006 | /Users/jriedel/Zotero/storage/44AMRYCJ/Baum - 2018 - Countering Superintelligence Misinformation.pdf; /Users/jriedel/Zotero/storage/V937XSJU/Baum - 2018 - Countering Superintelligence Misinformation.pdf; /Users/jriedel/Zotero/storage/TG82EJSR/244.html | GCRI; MetaSafety | artificial intelligence; superintelligence; misinformation | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
32 | XIVCYAEP | journalArticle | 2018 | Baum, Seth | Superintelligence skepticism as a political tool | Information | 2018 | 2019-12-16 2:40:26 | 2020-12-15 0:31:16 | 209 | 9 | 9 | Google Scholar | ZSCC: 0000010 | /Users/jriedel/Zotero/storage/YLABUNCS/Baum - 2018 - Superintelligence skepticism as a political tool.pdf; /Users/jriedel/Zotero/storage/5YBU3T7Y/Baum - 2018 - Superintelligence Skepticism as a Political Tool.pdf; /Users/jriedel/Zotero/storage/BQWLUEXW/209.html | GCRI; MetaSafety | artificial intelligence; skepticism; superintelligence | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
33 | IGCCEIFA | journalArticle | 2018 | Baum, Seth D. | Uncertain human consequences in asteroid risk analysis and the global catastrophe threshold | Natural Hazards | 2018 | 2019-12-16 2:40:40 | 2020-12-15 0:35:01 | 759–775 | 2 | 94 | Google Scholar | ZSCC: 0000010 | /Users/jriedel/Zotero/storage/FGJXXDEG/s11069-018-3419-4.html; /Users/jriedel/Zotero/storage/4VXFNJLQ/papers.html | GCRI; NotSafety | asteroids; global catastrophic risk; risk analysis; uncertainty | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
34 | PEGAXZSL | journalArticle | 2018 | Baum, Seth D. | Reconciliation between factions focused on near-term and long-term artificial intelligence | AI & Society | 2018 | 2019-12-16 2:42:08 | 2020-12-15 0:29:51 | 565–572 | 4 | 33 | Google Scholar | ZSCC: 0000014 | /Users/jriedel/Zotero/storage/89YAV36R/Baum - 2018 - Reconciliation between factions focused on near-te.pdf; /Users/jriedel/Zotero/storage/JEGTJVIH/s00146-017-0734-3.html | GCRI; MetaSafety | Artificial General Intelligence; Artificial Intelligence; Artificial Superintelligence; Long-Term Artificial Intelligence; Near-Term Artificial Intelligence; Societal Impacts of Artificial Intelligence | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
35 | D9EQV4R7 | journalArticle | 2018 | Umbrello, Steven; Baum, Seth D. | Evaluating future nanotechnology: The net societal impacts of atomically precise manufacturing | Futures | 0016-3287 | 10.1016/j.futures.2018.04.007 | https://linkinghub.elsevier.com/retrieve/pii/S0016328717301908 | 2018-06 | 2019-12-16 2:43:30 | 2020-12-17 18:38:00 | 2019-12-16 2:43:30 | 63-73 | 100 | Futures | Evaluating future nanotechnology | en | DOI.org (Crossref) | ZSCC: 0000014 | /Users/jriedel/Zotero/storage/XKPXQFLA/Umbrello and Baum - 2018 - Evaluating Future Nanotechnology The Net Societal.pdf; /Users/jriedel/Zotero/storage/JSWGJ4MM/Umbrello and Baum - 2018 - Evaluating future nanotechnology The net societal.pdf | https://www.researchgate.net/publication/324715437_Evaluating_Future_Nanotechnology_The_Net_Societal_Impacts_of_Atomically_Precise_Manufacturing | GCRI; NotSafety; AmbiguosSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
36 | ZCVIKJ45 | report | 2017 | Baum, Seth; Barrett, Anthony | Global Catastrophes: The Most Extreme Risks | https://papers.ssrn.com/abstract=3046668 | The most extreme risk are those that threaten the entirety of human civilization, known as global catastrophic risks. The very extreme nature of global catastrophes makes them both challenging to analyze and important to address. They are challenging to analyze because they are largely unprecedented and because they involve the entire global human system. They are important to address because they threaten everyone around the world and future generations. Global catastrophic risks also pose some deep dilemmas. One dilemma occurs when actions to reduce global catastrophic risk could harm society in other ways, as in the case of geoengineering to reduce catastrophic climate change risk. Another dilemma occurs when reducing one global catastrophic risk could increase another, as in the case of nuclear power reducing climate change risk while increasing risks from nuclear weapons. The complex, interrelated nature of global catastrophic risk suggests a research agenda in which the full space of risks are assessed in an integrated fashion in consideration of the deep dilemmas and other challenges they pose. Such an agenda can help identify the best ways to manage these most extreme risks and keep human civilization safe. | 2017-10-02 | 2019-12-16 2:43:45 | 2020-11-23 23:01:34 | 2019-12-16 2:43:45 | Global Catastrophes | Social Science Research Network | Rochester, NY | en | SSRN Scholarly Paper | papers.ssrn.com | ZSCC: 0000005 | /Users/jriedel/Zotero/storage/DLVDED68/papers.html | GCRI; MetaSafety | global catastrophic risk; catastrophic risk; extreme risk; risk | ID 3046668 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
37 | CE2HVSU7 | report | 2017 | Baum, Seth | A Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy | https://papers.ssrn.com/abstract=3070741 | Artificial general intelligence (AGI) is AI that can reason across a wide range of domains. It has long been considered the “grand dream” or “holy grail” of AI. It also poses major issues of ethics, risk, and policy due to its potential to transform society: if AGI is built, it could either help solve the world’s problems or cause major catastrophe, possibly even human extinction. This paper presents the first-ever survey of active AGI R&D projects in terms of ethics, risk, and policy. A thorough search identifies 45 projects of diverse sizes, nationalities, ethical goals, and other attributes. Most projects are either academic or corporate. The academic projects tend to express goals of advancing knowledge and are less likely to be active on AGI safety issues. The corporate projects tend to express goals of benefiting humanity and are more likely to be active on safety. Most projects are based in the US, and almost all are in either the US or a US ally, including all of the larger projects. This geographic concentration could simplify policymaking, though most projects publish open-source code, enabling contributions from anywhere in the world. These and other findings of the survey offer an empirical basis for the study of AGI R&D and a guide for policy and other action. | 2017-11-12 | 2019-12-16 2:43:48 | 2020-11-23 23:01:05 | 2019-12-16 2:43:48 | Social Science Research Network | Rochester, NY | en | SSRN Scholarly Paper | papers.ssrn.com | ZSCC: 0000032 | /Users/jriedel/Zotero/storage/MJH8W84Z/papers.html | GCRI; MetaSafety | risk; artificial intelligence; ethics; policy | ID 3070741 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
38 | KJ3DCQ5H | journalArticle | 2017 | Baum, Seth D. | On the promotion of safe and socially beneficial artificial intelligence | AI & Society | 0951-5666, 1435-5655 | 10.1007/s00146-016-0677-0 | http://link.springer.com/10.1007/s00146-016-0677-0 | 2017-11 | 2019-12-16 2:44:37 | 2020-12-15 0:28:51 | 2019-12-16 2:44:37 | 543-551 | 4 | 32 | AI & Soc | en | DOI.org (Crossref) | ZSCC: 0000047 | /Users/jriedel/Zotero/storage/563FTKPG/papers.html | GCRI; MetaSafety | artificial intelligence; artificial intelligence safety; beneficial artificial intelligence; social psychology | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
39 | Z5RD28RZ | journalArticle | 2017 | Barrett, Anthony Michael | Value of Global Catastrophic Risk (GCR) Information: Cost-Effectiveness-Based Approach for GCR Reduction | Decision Analysis | 1545-8490, 1545-8504 | 10.1287/deca.2017.0350 | http://pubsonline.informs.org/doi/10.1287/deca.2017.0350 | 2017-09 | 2019-12-16 2:44:44 | 2020-12-15 0:35:10 | 2019-12-16 2:44:44 | 187-203 | 3 | 14 | Decision Analysis | Value of Global Catastrophic Risk (GCR) Information | en | DOI.org (Crossref) | ZSCC: 0000005 | /Users/jriedel/Zotero/storage/5FL342CX/Barrett - 2017 - Value of Global Catastrophic Risk (GCR) Informatio.pdf | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
40 | 6L73SD4Q | journalArticle | 2017 | Barrett, Anthony M.; Baum, Seth D. | A model of pathways to artificial superintelligence catastrophe for risk and decision analysis | Journal of Experimental & Theoretical Artificial Intelligence | 0952-813X, 1362-3079 | 10.1080/0952813X.2016.1186228 | https://www.tandfonline.com/doi/full/10.1080/0952813X.2016.1186228 | 2017-03-04 | 2019-12-16 2:44:58 | 2020-11-23 23:00:43 | 2019-12-16 2:44:58 | 397-414 | 2 | 29 | Journal of Experimental & Theoretical Artificial Intelligence | en | DOI.org (Crossref) | ZSCC: 0000028 | /Users/jriedel/Zotero/storage/456HQ9C5/Barrett and Baum - 2017 - A model of pathways to artificial superintelligenc.pdf | GCRI; MetaSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
41 | Q695XMNG | journalArticle | 2015 | Baum, Seth D. | The far future argument for confronting catastrophic threats to humanity: Practical significance and alternatives | Futures | 0016-3287 | 10.1016/j.futures.2015.03.001 | https://linkinghub.elsevier.com/retrieve/pii/S0016328715000312 | 2015-09 | 2019-12-16 2:45:25 | 2020-11-23 23:02:26 | 2019-12-16 2:45:25 | 86-96 | 72 | Futures | The far future argument for confronting catastrophic threats to humanity | en | DOI.org (Crossref) | ZSCC: 0000018 | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
42 | R58MPXDG | journalArticle | 2016 | Barrett, Anthony M. | False Alarms, True Dangers? | RAND Corporation document PE-191-TSF, DOI | 2016 | 2019-12-16 2:45:51 | 2020-12-17 18:38:11 | 10 | Google Scholar | ZSCC: 0000003 | /Users/jriedel/Zotero/storage/KVUBBYP6/Barrett - 2016 - False Alarms, True Dangers.pdf | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
43 | 2FT24BBD | journalArticle | 2016 | Baum, Seth; Denkenberger, David; Pearce, Joshua | Alternative foods as a solution to global food supply catastrophes | Solutions | 2016 | 2019-12-16 2:46:20 | 2020-12-19 23:26:59 | 4 | 7 | Google Scholar | ZSCC: 0000011 | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
44 | 24G8TGF5 | bookSection | 2016 | Baum, Seth D. | The Ethics of Outer Space: A Consequentialist Perspective | The Ethics of Space Exploration | 2016 | 2019-12-16 2:46:35 | 2020-12-17 18:38:16 | 109–123 | The Ethics of Outer Space | Springer | Google Scholar | ZSCC: 0000009 | /Users/jriedel/Zotero/storage/JX2Z5JFE/Baum - 2016 - The Ethics of Outer Space A Consequentialist Pers.pdf; /Users/jriedel/Zotero/storage/UUQALRJJ/978-3-319-39827-3_8.html | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
45 | GDTA58MT | journalArticle | 2017 | White, Trevor N.; Baum, Seth D. | Liability For Present And Future Robotics Technology | Robot Ethics 2.0: From Autonomous Cars to Artificial Intelligence | 2017 | 2019-12-16 2:46:59 | 2020-11-23 23:04:47 | 5 | Google Scholar | ZSCC: 0000002 | /Users/jriedel/Zotero/storage/46JIB29U/books.html | GCRI; MetaSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
46 | VHMRLZWT | bookSection | 2017 | Barrett, Anthony M.; Baum, Seth D. | Risk analysis and risk management for the artificial superintelligence research and development process | The Technological Singularity | 2017 | 2019-12-16 2:47:13 | 2020-11-26 1:03:14 | 127–140 | Springer | Google Scholar | ZSCC: 0000007 DOI: 10.1007/978-3-662-54033-6_6 | /Users/jriedel/Zotero/storage/RIIFRP2R/Barrett and Baum - 2017 - Risk analysis and risk management for the artifici.pdf; /Users/jriedel/Zotero/storage/XSCGP7NY/978-3-662-54033-6_6.html | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
47 | JF365X2C | conferencePaper | 2017 | Baum, Seth; Barrett, Anthony | Towards an integrated assessment of global catastrophic risk | Catastrophic and Existential Risk: Proceedings of the First Colloquium, Garrick Institute for the Risk Sciences, University of California, Los Angeles, Forthcoming | 2017 | 2019-12-16 2:47:32 | 2020-11-23 23:01:29 | Google Scholar | ZSCC: 0000006 | /Users/jriedel/Zotero/storage/V9LPQIF8/Baum and Barrett - 2017 - Towards an integrated assessment of global catastr.pdf; /Users/jriedel/Zotero/storage/78BJ9YAT/papers.html | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
48 | 5FJFCT5Y | journalArticle | 2017 | Baum, Seth; Barrett, Anthony; Yampolskiy, Roman V. | Modeling and interpreting expert disagreement about artificial superintelligence | Informatica | 2017 | 2019-12-16 2:47:43 | 2020-12-15 0:28:20 | 419–428 | 7 | 41 | Google Scholar | ZSCC: 0000007 | /Users/jriedel/Zotero/storage/HZGLAMIY/Baum et al. - 2017 - Modeling and interpreting expert disagreement abou.pdf; /Users/jriedel/Zotero/storage/GZEXZBXM/papers.html; /Users/jriedel/Zotero/storage/9UBF2B9M/papers.html | GCRI; MetaSafety | risk analysis; artificial intelligence; artificial superintelligence; expert disagreement | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
49 | AVA5PW2K | journalArticle | 2015 | Baum, Seth D. | Confronting the threat of nuclear winter | Futures | 0016-3287 | 10.1016/j.futures.2015.03.004 | https://linkinghub.elsevier.com/retrieve/pii/S0016328715000403 | 2015-09 | 2019-12-16 2:48:42 | 2020-11-23 23:02:20 | 2019-12-16 2:48:42 | 69-79 | 72 | Futures | en | DOI.org (Crossref) | ZSCC: 0000014 | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
50 | N3IHWK6C | journalArticle | 2015 | Baum, Seth D.; Denkenberger, David C.; Haqq-Misra, Jacob | Isolated refuges for surviving global catastrophes | Futures | 0016-3287 | 10.1016/j.futures.2015.03.009 | https://linkinghub.elsevier.com/retrieve/pii/S0016328715000464 | 2015-09 | 2019-12-16 2:48:50 | 2020-12-17 18:38:20 | 2019-12-16 2:48:50 | 45-56 | 72 | Futures | en | DOI.org (Crossref) | ZSCC: 0000024 | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
51 | ARVTM9F9 | journalArticle | 2015 | Baum, Seth D.; Denkenberger, David C.; Pearce, Joshua M.; Robock, Alan; Winkler, Richelle | Resilience to global food supply catastrophes | Environment Systems and Decisions | 2194-5403, 2194-5411 | 10.1007/s10669-015-9549-2 | http://link.springer.com/10.1007/s10669-015-9549-2 | 2015-06 | 2019-12-16 2:48:57 | 2020-11-23 23:03:25 | 2019-12-16 2:48:57 | 301-313 | 2 | 35 | Environ Syst Decis | en | DOI.org (Crossref) | ZSCC: 0000035 | GCRI; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
52 | 4WERWSN7 | journalArticle | 2015 | Baum, Seth D. | Risk and resilience for unknown, unquantifiable, systemic, and unlikely/catastrophic threats | Environment Systems and Decisions | 2194-5403, 2194-5411 | 10.1007/s10669-015-9551-8 | http://link.springer.com/10.1007/s10669-015-9551-8 | 2015-06 | 2019-12-16 2:49:03 | 2020-11-23 23:02:20 | 2019-12-16 2:49:03 | 229-236 | 2 | 35 | Environ Syst Decis | en | DOI.org (Crossref) | ZSCC: 0000026 | GCRI; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
53 | Y98VH4U7 | journalArticle | 2015 | Baum, Seth D. | Winter-safe Deterrence: The Risk of Nuclear Winter and Its Challenge to Deterrence | Contemporary Security Policy | 1352-3260, 1743-8764 | 10.1080/13523260.2015.1012346 | https://www.tandfonline.com/doi/full/10.1080/13523260.2015.1012346 | 2015-01-02 | 2019-12-16 2:49:10 | 2020-11-23 23:02:20 | 2019-12-16 2:49:10 | 123-148 | 1 | 36 | Contemporary Security Policy | Winter-safe Deterrence | en | DOI.org (Crossref) | ZSCC: 0000021 | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
54 | CA5X6KN3 | journalArticle | 2014 | Baum, Seth D | The great downside dilemma for risky emerging technologies | Physica Scripta | 0031-8949, 1402-4896 | 10.1088/0031-8949/89/12/128004 | http://stacks.iop.org/1402-4896/89/i=12/a=128004?key=crossref.f5938bc78a3023d740968f020cfa9970 | 2014-12-01 | 2019-12-16 2:49:17 | 2020-11-23 23:01:56 | 2019-12-16 2:49:17 | 128004 | 12 | 89 | Phys. Scr. | DOI.org (Crossref) | ZSCC: 0000021 | /Users/jriedel/Zotero/storage/UC7BQJ47/Baum - 2014 - The great downside dilemma for risky emerging tech.pdf | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
55 | FBR48WMT | journalArticle | 2014 | Baum, Seth D.; Handoh, Itsuki C. | Integrating the planetary boundaries and global catastrophic risk paradigms | Ecological Economics | 0921-8009 | 10.1016/j.ecolecon.2014.07.024 | https://linkinghub.elsevier.com/retrieve/pii/S0921800914002262 | 2014-11 | 2019-12-16 2:49:23 | 2020-12-17 18:38:40 | 2019-12-16 2:49:23 | 13-21 | 107 | Ecological Economics | en | DOI.org (Crossref) | ZSCC: 0000033 | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
56 | CWBT7ND7 | journalArticle | 2013 | Baum, Seth D.; Wilson, Grant S. | The Ethics of Global Catastrophic Risk from Dual-Use Bioengineering | Ethics in Biology, Engineering and Medicine | 2151-805X | 10.1615/EthicsBiologyEngMed.2013007629 | http://www.dl.begellhouse.com/journals/6ed509641f7324e6,709fef245eef4861,06d520d747a5c0d1.html | 2013 | 2019-12-16 2:49:31 | 2020-11-23 23:03:33 | 2019-12-16 2:49:31 | 59-72 | 1 | 4 | Ethics Biology Eng Med | en | DOI.org (Crossref) | ZSCC: 0000007 | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
57 | PR7SY9MW | journalArticle | 2013 | Barrett, Anthony M.; Baum, Seth D.; Hostetler, Kelly | Analyzing and Reducing the Risks of Inadvertent Nuclear War Between the United States and Russia | Science & Global Security | 0892-9882, 1547-7800 | 10.1080/08929882.2013.798984 | http://www.tandfonline.com/doi/abs/10.1080/08929882.2013.798984 | 2013-05 | 2019-12-16 2:49:42 | 2020-11-23 23:00:55 | 2019-12-16 2:49:42 | 106-133 | 2 | 21 | Science & Global Security | en | DOI.org (Crossref) | ZSCC: 0000047 | GCRI; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
58 | RDDGR22W | journalArticle | 2013 | Maher, Timothy; Baum, Seth | Adaptation to and Recovery from Global Catastrophe | Sustainability | 2071-1050 | 10.3390/su5041461 | http://www.mdpi.com/2071-1050/5/4/1461 | 2013-03-28 | 2019-12-16 2:49:48 | 2020-12-17 18:38:46 | 2019-12-16 2:49:48 | 1461-1479 | 4 | 5 | Sustainability | en | DOI.org (Crossref) | ZSCC: 0000044 | /Users/jriedel/Zotero/storage/G84KXN4H/Maher and Baum - 2013 - Adaptation to and Recovery from Global Catastrophe.pdf | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
59 | 53I7M6C7 | journalArticle | 2013 | Baum, Seth D.; Maher, Timothy M.; Haqq-Misra, Jacob | Double catastrophe: intermittent stratospheric geoengineering induced by societal collapse | Environment Systems & Decisions | 2194-5403, 2194-5411 | 10.1007/s10669-012-9429-y | http://link.springer.com/10.1007/s10669-012-9429-y | 2013-03 | 2019-12-16 2:49:55 | 2020-11-23 23:03:25 | 2019-12-16 2:49:55 | 168-180 | 1 | 33 | Environ Syst Decis | Double catastrophe | en | DOI.org (Crossref) | ZSCC: 0000057 | /Users/jriedel/Zotero/storage/PPM247ZT/Baum et al. - 2013 - Double catastrophe intermittent stratospheric geo.pdf | GCRI; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
60 | SAKUCJQH | journalArticle | 2013 | Haqq-Misra, Jacob; Busch, Michael W.; Som, Sanjoy M.; Baum, Seth D. | The benefits and harm of transmitting into space | Space Policy | 0265-9646 | 10.1016/j.spacepol.2012.11.006 | https://linkinghub.elsevier.com/retrieve/pii/S0265964612001361 | 2013-02 | 2019-12-16 2:50:01 | 2020-12-17 18:38:52 | 2019-12-16 2:50:01 | 40-48 | 1 | 29 | Space Policy | en | DOI.org (Crossref) | ZSCC: 0000028 | /Users/jriedel/Zotero/storage/KWN3HUL9/Haqq-Misra et al. - 2013 - The benefits and harm of transmitting into space.pdf | GCRI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
61 | W7HHRP3G | journalArticle | 2014 | Wilson, Grant | Deepwater Horizon and the Law of the Sea: Was the Cure Worse than the Disease? | BC Envtl. Aff. L. Rev. | 2014 | 2019-12-16 2:50:17 | 2020-11-23 23:05:07 | 63 | 41 | Deepwater Horizon and the Law of the Sea | Google Scholar | ZSCC: 0000003 | /Users/jriedel/Zotero/storage/UCVLUDDL/Wilson - 2014 - Deepwater Horizon and the Law of the Sea Was the .pdf; /Users/jriedel/Zotero/storage/C7EWKQ9S/LandingPage.html | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
62 | EL9KVIHH | journalArticle | 2013 | Wilson, Grant | Minimizing global catastrophic and existential risks from emerging technologies through international law | Va. Envtl. LJ | 2013 | 2019-12-16 2:50:46 | 2020-11-23 23:05:03 | 307 | 31 | Google Scholar | ZSCC: 0000037 | /Users/jriedel/Zotero/storage/E543AJVL/LandingPage.html | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
63 | DREFD99R | journalArticle | 2013 | Baum, Seth D. | Teaching Astrobiology in a Sustainability Course | Journal of Sustainability Education | 2013 | 2019-12-16 2:51:02 | 2020-11-23 23:02:00 | 4 | Google Scholar | ZSCC: 0000000 | /Users/jriedel/Zotero/storage/8PS2UC5R/Baum - 2013 - Teaching Astrobiology in a Sustainability Course.pdf | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
64 | 77SNYJSC | journalArticle | 2019 | Baum, Seth D. | Preparing for the unthinkable | Science | 0036-8075, 1095-9203 | 10.1126/science.aay4219 | http://www.sciencemag.org/lookup/doi/10.1126/science.aay4219 | 2019-09-20 | 2019-12-16 2:51:12 | 2020-11-23 23:02:51 | 2019-12-16 2:51:12 | 1254-1254 | 6459 | 365 | Science | en | DOI.org (Crossref) | ZSCC: 0000000 | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
65 | 6T8ACPY9 | journalArticle | 2017 | Baum, Seth D. | The Social Science of Computerized Brains – Review of The Age of Em: Work, Love, and Life When Robots Rule the Earth by Robin Hanson (Oxford University Press, 2016) | Futures | 0016-3287 | 10.1016/j.futures.2017.03.005 | https://linkinghub.elsevier.com/retrieve/pii/S0016328716302518 | 2017-06 | 2019-12-16 2:52:35 | 2020-11-23 23:02:34 | 2019-12-16 2:52:35 | 61-63 | 90 | Futures | The Social Science of Computerized Brains – Review of The Age of Em | en | DOI.org (Crossref) | ZSCC: 0000000 | GCRI; MetaSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
66 | JXLJV4J7 | journalArticle | 2015 | Baum, Seth D.; Tonn, Bruce E. | Confronting future catastrophic threats to humanity | Futures | 0016-3287 | 10.1016/j.futures.2015.08.004 | https://linkinghub.elsevier.com/retrieve/pii/S0016328715001135 | 2015-09 | 2019-12-16 2:52:43 | 2020-11-23 23:03:33 | 2019-12-16 2:52:43 | 1-3 | 72 | Futures | en | DOI.org (Crossref) | ZSCC: 0000011 | GCRI; MetaSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
67 | ZK3CLRHC | journalArticle | 2015 | Baum, Seth D. | Winter-Safe Deterrence as a Practical Contribution to Reducing Nuclear Winter Risk: A Reply | Contemporary Security Policy | 1352-3260, 1743-8764 | 10.1080/13523260.2015.1054101 | https://www.tandfonline.com/doi/full/10.1080/13523260.2015.1054101 | 2015-05-04 | 2019-12-16 2:53:34 | 2020-11-23 23:02:20 | 2019-12-16 2:53:34 | 387-397 | 2 | 36 | Contemporary Security Policy | Winter-Safe Deterrence as a Practical Contribution to Reducing Nuclear Winter Risk | en | DOI.org (Crossref) | ZSCC: 0000001 | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
68 | Y7R3AU6S | journalArticle | 2014 | Baum, Seth D. | Book review: Only One Chance: How Environmental Pollution Impairs Brain Development – and How to Protect the Brains of the Next Generation. | Environmental Science & Policy | 1462-9011 | 10.1016/j.envsci.2014.07.001 | https://linkinghub.elsevier.com/retrieve/pii/S1462901114001221 | 2014-10 | 2019-12-16 2:53:51 | 2020-11-23 23:02:04 | 2019-12-16 2:53:51 | 197-199 | 42 | Environmental Science & Policy | en | DOI.org (Crossref) | ZSCC: 0000000 | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
69 | REPS4A4Q | magazineArticle | 2014 | Baum, Seth | Film Review: Snowpiercer | Journal of Sustainability Education | http://www.susted.com/wordpress/content/film-review-snowpiercer_2014_12/ | 2014 | 2019-12-16 2:55:16 | 2020-11-23 23:01:01 | 2019-12-16 2:55:16 | Film Review | en-US | ZSCC: 0000002 | /Users/jriedel/Zotero/storage/ZU3LTXQG/film-review-snowpiercer_2014_12.html | GCRI; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
70 | L7A6MU9S | journalArticle | 2019 | Beard, Simon; Rowe, Thomas; Fox, James | An Analysis and Evaluation of Methods Currently Used to Quantify the Likelihood of Existential Hazards | Futures | 0016-3287 | 10.1016/j.futures.2019.102469 | http://www.sciencedirect.com/science/article/pii/S0016328719303313 | This paper examines and evaluates the range of methods that have been used to make quantified claims about the likelihood of Existential Hazards. In doing so, it draws on a comprehensive literature review of such claims that we present in an appendix. The paper uses an informal evaluative framework to consider the relative merits of these methods regarding their rigour, ability to handle uncertainty, accessibility for researchers with limited resources and utility for communication and policy purposes. We conclude that while there is no uniquely best way to quantify Existential Risk, different methods have their own merits and challenges, suggesting that some may be more suited to particular purposes than others. More importantly, however, we find that, in many cases, claims based on poor implementations of each method are still frequently invoked by the Existential Risk community, despite the existence of better ones. We call for a more critical approach to methodology and the use of quantified claims by people aiming to contribute research to the management of Existential Risk, and argue that a greater awareness of the diverse methods available to these researchers should form an important part of this. | 2019-12-03 | 2019-12-16 3:04:42 | 2020-12-13 17:56:01 | 2019-12-16 3:04:42 | 102469 | Futures | en | ScienceDirect | ZSCC: 0000000[s1] | /Users/jriedel/Zotero/storage/8HLCRE9M/S0016328719303313.html | CSER; MetaSafety; AmbiguosSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
71 | 85S3PSNX | bookSection | 2019 | Weitzdörfer, Julius; Beard, Simon | Law and Policy Responses to Disaster-Induced Financial Distress | Governance, Risk and Financial Impact of Mega Disasters: Lessons from Japan | 9789811390050 | https://doi.org/10.1007/978-981-13-9005-0_4 | This chapter treats disaster response policies directed at the economic recovery of private households. First, we examine problems of disaster-induced financial distress from a legal and economic perspective. We do this both qualitatively and quantitatively, and focussing on residential loans, using the victims of the 11 March 2011 tsunami as our example. Then, using doctrinal and systematic analysis, we set out the broad array of law and policy solutions tackling disaster-induced debt launched by the Japanese Government. On this basis, we assess the strengths and weaknesses of these measures in terms of their practical adequacy to prevent and mitigate financial hardship and examine them against multiple dimensions of disaster justice. We conclude with suggestions for improving financial disaster recovery by taking a prospective approach, preventing the snowballing of disaster-related losses, which we argue represents an equitable and effective way forward in allocating resources following future mega disasters. | 2019 | 2019-12-16 3:05:04 | 2020-11-23 22:42:05 | 2019-12-16 3:05:04 | 47-80 | Economics, Law, and Institutions in Asia Pacific | Springer | Singapore | en | Springer Link | ZSCC: NoCitationData[s2] ACC: 0 DOI: 10.1007/978-981-13-9005-0_4 | /Users/jriedel/Zotero/storage/VKVJK4KZ/Weitzdörfer and Beard - 2019 - Law and Policy Responses to Disaster-Induced Finan.pdf | CSER; NotSafety | Kamesaka, Akiko; Waldenberger, Franz | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
72 | PAA6PSGD | journalArticle | 2019 | Colvin, R. M.; Kemp, Luke; Talberg, Anita; Castella, Clare De; Downie, C.; Friel, S.; Grant, Will J.; Howden, Mark; Jotzo, Frank; Markham, Francis; Platow, Michael J. | Learning from the Climate Change Debate to Avoid Polarisation on Negative Emissions | Environmental Communication | 1752-4032 | 10.1080/17524032.2019.1630463 | https://doi.org/10.1080/17524032.2019.1630463 | This paper identifies critical lessons from the climate change experience to guide how communications and engagement on negative emissions can be conducted to encourage functional public and policy discourse. Negative emissions technologies present a significant opportunity for limiting climate change, and are likely to be necessary to keep warming below 2°C. While the concept of negative emissions is still in its infancy, there is evidence of nascent polarization, and a lack of nuance in discussion of individual technologies. We argue that if negative emissions technologies are to be implemented effectively and sustainably, an effective governance regime is needed; built on functional societal discourse and avoiding the ideological baggage of the broader climate change debate or the controversies concerning geoengineering. At its core, our argument is to avoid the ideological bundling of negative emissions; this can be pursued directly and via careful selection of communication frames and the use of non-partisan, trusted messengers. Whether these lessons are heeded may determine if negative emissions are governed proactively, or are distorted politically, misused and delayed. | 2019-07-25 | 2019-12-16 3:08:45 | 2020-12-15 0:27:08 | 2019-12-16 3:08:45 | 1-13 | 0 | 0 | Taylor and Francis+NEJM | ZSCC: 0000001[s1] | /Users/jriedel/Zotero/storage/KAYDVYDV/Colvin et al. - 2020 - Learning from the Climate Change Debate to Avoid P.pdf; /Users/jriedel/Zotero/storage/V9JAJ5HD/17524032.2019.html | CSER; NotSafety; AmbiguosSafety | climate engineering; Framing; geoengineering; greenhouse gas removal; ideology | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
73 | YXZSUDQC | bookSection | 2019 | Kemp, Luke | Mediation without measures: conflict resolution in climate diplomacy | Research Handbook on Mediating International Crises | https://www.elgaronline.com/view/edcoll/9781788110693/9781788110693.00032.xml | The climate negotiations have struggled to resolve conflicts for two decades while ignoring undeveloped mediation tools in its constitution. The United Nations Framework Convention on Climate Change (UNFCCC) outlined both arbitration procedures and ‘conciliation commissions’ to oversee mediation. Both measures were to be adopted through annexes to the 1992 UNFCCC treaty. Both were never developed. Instead the negotiations are in a state of ‘procedural purgatory’ and have relied on a patchwork of informal practices, particularly smaller, exclusive meetings. The negotiations towards the Paris Agreement saw an increasing use of confined, closed-door sessions, and a mounting reliance on the power and often manipulative tactics of the Chair (facilitators) of negotiations. Such an approach is risky and prone to backfiring, such as in Copenhagen. Countries should turn towards adopting the annexes for arbitration and conciliation commissions to enable transparent and effective mediation in the post-Paris era. | 2019-03-29 | 2019-12-16 3:09:08 | 2020-11-23 22:41:19 | 2019-12-16 3:09:08 | Mediation without measures | en-US | www.elgaronline.com | ZSCC: 0000000 | /Users/jriedel/Zotero/storage/YHB2Y44W/9781788110693.00032.html | CSER; NotSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
74 | PJTE8ZKN | bookSection | 2019 | Avin, Shahar; Amadae, S. M. | Autonomy and machine learning at the interface of nuclear weapons, computers and people | The Impact of Artificial Intelligence on Strategic Stability and Nuclear Risk | https://www.repository.cam.ac.uk/handle/1810/297703 | A new era for our species started in 1945: with the terrifying demonstration of the power of the atom bomb in Hiroshima and Nagasaki, Japan, the potential global catastrophic consequences of human technology could no longer be ignored. Within the field of global catastrophic and existential risk, nuclear war is one of the more iconic scenarios, although significant uncertainties remain about its likelihood and potential destructive magnitude. The risk posed to humanity from nuclear weapons is not static. In tandem with geopolitical and cultural changes, technological innovations could have a significant impact on how the risk of the use of nuclear weapons changes over time. Increasing attention has been given in the literature to the impact of digital technologies, and in particular autonomy and machine learning, on nuclear risk. Most of this attention has focused on ‘first-order’ effects: the introduction of technologies into nuclear command-and-control and weapon-delivery systems. This essay focuses instead on higher-order effects: those that stem from the introduction of such technologies into more peripheral systems, with a more indirect (but no less real) effect on nuclear risk. It first describes and categorizes the new threats introduced by these technologies (in section I). It then considers policy responses to address these new threats (section II). | 2019-10-10 | 2019-12-16 3:09:24 | 2020-12-18 16:35:01 | 2019-12-16 3:09:24 | Stockholm International Peace Research Institute | en | All rights reserved | www.repository.cam.ac.uk | ZSCC: 0000001 DOI: 10.17863/CAM.44758 | /Users/jriedel/Zotero/storage/XI3ZP8TL/297703.html | CSER; NotSafety; AmbiguosSafety; CFI | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
75 | R86DEQ7R | journalArticle | 2019 | Tzachor, Asaf | The Future of Feed: Integrating Technologies to Decouple Feed Production from Environmental Impacts | Industrial Biotechnology | 1550-9087 | 10.1089/ind.2019.29162.atz | https://www.liebertpub.com/doi/abs/10.1089/ind.2019.29162.atz | Population growth, an expanding middle-class, and a global shift in dietary preferences have driven an enduring demand for animal products. Since animal products are playing a vital role in human diets, their consumption is predicted to increase further. However, the great dependency of animal husbandry on global staple feed crop soybean; the environmental consequences of soybean production; and barriers for soy cropland expansion cast doubt on food system sustainability. The need to mitigate future demand for soy with other feed sources of similar nutritional profile, and thereby decouple food and feed production from ecological pressures, is compelling. Yet, the literature and science of sustainable agriculture is one of incremental improvements, featuring primarily, crop production intensification. A different, more profound approach to the design of feed systems is required to ensure sustainable food security. The question arises if alternative technologies exist to support such a design. This paper explores a particular novel configuration of four advanced technologies recently deployed in the region of Hengill, Iceland: light-emitting diode systems, advanced indoor photobioreactors, atmospheric carbon capture technology, and geothermal energy technology. In situ system analysis and data triangulation with scientific literature and data from independent sources illustrate the potential of these integrated technologies to produce algal-based animal feed. The analysis suggests that a highly sustainable soybean equivalent is technically attainable for feed purposes. The integrated system requires less than 1% of arable land and fresh water compared with soybean cultivation and is carbon negative. In addition, it provides a pesticide- and herbicide-free cultivation platform. This new configuration provides one pathway for the future of feed. | 2019-04-01 | 2019-12-16 3:09:54 | 2020-12-15 0:33:39 | 2019-12-16 3:09:54 | 52-62 | 2 | 15 | Industrial Biotechnology | The Future of Feed | liebertpub.com (Atypon) | ZSCC: 0000001 | /Users/jriedel/Zotero/storage/9HE5JEEW/Tzachor - 2019 - The Future of Feed Integrating Technologies to De.pdf; /Users/jriedel/Zotero/storage/ADR62Q5G/ind.2019.29162.html | CSER; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
76 | GAAEUYJ6 | journalArticle | 2019 | Lewis, Sophie C.; Perkins‐Kirkpatrick, Sarah E.; Althor, Glenn; King, Andrew D.; Kemp, Luke | Assessing Contributions of Major Emitters' Paris-Era Decisions to Future Temperature Extremes | Geophysical Research Letters | 1944-8007 | 10.1029/2018GL081608 | https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018GL081608 | The likelihood and severity of high-impact future temperature extremes can be reduced through climate change mitigation efforts. However, meeting the Paris Agreement warming limits requires notably stronger greenhouse gas emissions reduction efforts by major emitters than existing pledges. We examine the impact of Paris-era decision-making by the world's three largest greenhouse gas emitters (EU, USA, and China) on projected future extreme temperature events. Country-level contributions to the occurrence of future temperature extremes are calculated based on current emissions policies and sequential mitigation efforts, using a new metric called the Contribution to Excess Risk Ratio. We demonstrate the Contribution concept by applying it to extreme monthly temperature projections. In many regions, future extremes depend on the current and future carbon dioxide emissions reductions adopted by major emitters. By implementing stronger Paris-era climate pledges, major emitters can reduce the frequency of future extremes and their own calculated contributions to these temperature extremes. | 2019 | 2019-12-16 3:10:09 | 2020-11-23 22:41:25 | 2019-12-16 3:10:09 | 3936-3943 | 7 | 46 | en | ©2019. The Authors. | Wiley Online Library | ZSCC: 0000004 | /Users/jriedel/Zotero/storage/4CHLD8S7/Lewis et al. - 2019 - Assessing Contributions of Major Emitters' Paris-E.pdf; /Users/jriedel/Zotero/storage/5M4IDUPL/2018GL081608.html | CSER; NotSafety | attribution; climate change; extremes; Paris; projections | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
77 | KXPI34ZN | book | 2019 | Needham, Duncan; Weitzdörfer, Julius | Extremes | 978-1-108-45700-2 | Humanity is confronted by and attracted to extremes. Extreme events shape our thinking, feeling, and actions; they echo in our politics, media, literature, and science. We often associate extremes with crises, disasters, and risks to be averted, yet extremes also have the potential to lead us towards new horizons. Featuring essays by leading intellectuals and public figures arising from the 2017 Darwin College Lectures, this volume explores 'extreme' events, from the election of President Trump, the rise of populism, and the Brexit referendum, to the 2008 financial crisis, the Syrian war, and climate change. It also celebrates 'extreme' achievements in the realms of health, exploration, and scientific discovery. A fascinating, engaging, and timely collection of essays by renowned scholars, journalists, and intellectuals, this volume challenges our understanding of what is normal and what is truly extreme, and sheds light on some of the issues facing humanity in the twenty-first century. | 2019-03-07 | 2019-12-16 3:10:58 | 2020-11-23 22:41:32 | 187 | Cambridge University Press | en | Google Books | ZSCC: 0000003 Google-Books-ID: 6lqHDwAAQBAJ | https://books.google.com/books?id=6lqHDwAAQBAJ | CSER; MetaSafety | Science / General; Science / Astronomy; Business & Economics / Environmental Economics; Law / Natural Resources; Mathematics / Probability & Statistics / General; Nature / Natural Resources; Nature / Sky Observation; Political Science / General; Social Science / Sociology / General | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
78 | KDBKEB2I | journalArticle | 2019 | Beard, Simon | Perfectionism and the Repugnant Conclusion | The Journal of Value Inquiry | 1573-0492 | 10.1007/s10790-019-09687-4 | https://doi.org/10.1007/s10790-019-09687-4 | The Repugnant Conclusion and its paradoxes pose a significant problem for outcome evaluation. Derek Parfit has suggested that we may be able to resolve this problem by accepting a view he calls ‘Perfectionism’, which gives lexically superior value to ‘the best things in life’. In this paper, I explore perfectionism and its potential to solve this problem. I argue that perfectionism provides neither a sufficient means of avoiding the Repugnant Conclusion nor a full explanation of its repugnance. This is because even lives that are ‘barely worth living’ may contain the best things in life if they also contain sufficient ‘bad things’, such as suffering or frustration. Therefore, perfectionism can only fully explain or avoid the Repugnant Conclusion if combined with other claims, such as that bad things have an asymmetrical value relative to many good things. This combined view faces the objection that any such asymmetry implies Parfit’s ‘Ridiculous Conclusion’. However, I argue that perfectionism itself faces very similar objections, and that these are question-begging against both views. Finally, I show how the combined view that I propose not only explains and avoids the Repugnant Conclusion but also allows us to escape many of its paradoxes as well. | 2019-03-05 | 2019-12-16 3:11:49 | 2020-12-15 0:29:05 | 2019-12-16 3:11:49 | J Value Inquiry | en | Springer Link | ZSCC: 0000000[s1] | /Users/jriedel/Zotero/storage/8NPAB28B/Beard - 2019 - Perfectionism and the Repugnant Conclusion.pdf; /Users/jriedel/Zotero/storage/83ZRZ76V/Beard - 2020 - Perfectionism and the Repugnant Conclusion.pdf | CSER; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
79 | 3UAAD7IH | journalArticle | 2019 | Beard, Simon | What Is Unfair about Unequal Brute Luck? An Intergenerational Puzzle | Philosophia | 1574-9274 | 10.1007/s11406-018-00053-5 | https://doi.org/10.1007/s11406-018-00053-5 | According to Luck egalitarians, fairness requires us to bring it about that nobody is worse off than others where this results from brute bad luck, but not where they choose or deserve to be so. In this paper, I consider one type of brute bad luck that appears paradigmatic of what a Luck Egalitarian ought to be most concerned about, namely that suffered by people who are born to badly off parents and are less well off as a result. However, when we consider what is supposedly unfair about this kind of unequal brute luck, luck egalitarians face a dilemma. According to the standard account of luck egalitarianism, differential brute luck is unfair because of its effects on the distribution of goods. Yet, where some parents are worse off because they have chosen to be imprudent, it may be impossible to neutralize these effects without creating a distribution that seems at least as unfair. This, I argue, is problematic for luck egalitarianism. I, therefore, explore two alternative views that can avoid this problem. On the first of these, proposed by Shlomi Segall, the distributional effects of unequal brute luck are unfair only when they make a situation more unequal, but not when they make it more equal. On the second, it is the unequal brute luck itself, rather than its distributional effects, that is unfair. I conclude with some considerations in favour of this second view, while accepting that both are valid responses to the problem I describe. | 2019-09-01 | 2019-12-16 3:12:31 | 2020-12-15 0:35:13 | 2019-12-16 3:12:31 | 1043-1051 | 4 | 47 | Philosophia | What Is Unfair about Unequal Brute Luck? | en | Springer Link | ZSCC: 0000000 | /Users/jriedel/Zotero/storage/UG6JCBMI/Beard - 2019 - What Is Unfair about Unequal Brute Luck An Interg.pdf; /Users/jriedel/Zotero/storage/ESAP965I/Beard - 2019 - What Is Unfair about Unequal Brute Luck An Interg.pdf | CSER; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
80 | VSS2LUIR | conferencePaper | 2019 | Hernandez-Orallo, Jose; Martínez-Plumed, Fernando; Avin, Shahar | Surveying Safety-relevant AI Characteristics | 1st AAAI's Workshop on Artificial Intelligence Safety (SafeAI) | The current analysis in the AI safety literature usually combines a risk or safety issue (e.g., interruptibility) with a particular paradigm for an AI agent (e.g., reinforcement learning). However, there is currently no survey of safety-relevant characteristics of AI systems that may reveal neglected areas of research or suggest to developers what design choices they could make to avoid or minimise certain safety concerns. In this paper, we take a first step towards delivering such a survey, from two angles. The first features AI system characteristics that are already known to be relevant to safety concerns, including internal system characteristics, characteristics relating to the effect of the external environment on the system, and characteristics relating to the effect of the system on the target environment. The second presents a brief survey of a broad range of AI system characteristics that could prove relevant to safety research, including types of interaction, computation, integration, anticipation, supervision, modification, motivation and achievement. This survey enables further work in exploring system characteristics and design choices that affect safety concerns. | 2019 | 2019-12-16 3:12:56 | 2020-11-23 22:40:32 | 9 | en | Zotero | ZSCC: NoCitationData[s7] ACC: 4 J: 4 | /Users/jriedel/Zotero/storage/N7D78GHR/Hernandez-Orallo et al. - Surveying Safety-relevant AI Characteristics.pdf | CSER; TechSafety; CFI | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
81 | TXDA45PL | journalArticle | MacAskill, William; Vallinder, Aron; Shulman, Carl; Österheld, Caspar; Treutlein, Johannes | The Evidentialist’s Wager | Journal of Philosophy | Suppose that an altruistic and morally motivated agent who is uncertain between evidential decision theory (EDT) and causal decision theory (CDT) finds herself in a situation in which the two theories give conflicting verdicts. We argue that even if she has significantly higher credence in CDT, she should nevertheless act in accordance with EDT. First, we claim that the appropriate response to normative uncertainty is to hedge one’s bets. That is, if the stakes are much higher on one theory than another, and the credences you assign to each of these theories aren’t very different, then it’s appropriate to choose the option which performs best on the high-stakes theory. Second, we show that, given the assumption of altruism, the existence of correlated decision-makers will increase the stakes for EDT but leave the stakes for CDT unaffected. Together these two claims imply that whenever there are sufficiently many correlated agents, the appropriate response is to act in accordance with EDT. | forthcoming | 2019-12-16 3:16:50 | 2020-12-20 22:46:06 | en | Zotero | ZSCC: 0000000[s0] | /Users/jriedel/Zotero/storage/KRYU35VD/MacAskill et al. - The Evidentialist’s Wager.pdf | GPI; TechSafety; AmbiguosSafety; CLTR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
82 | 9V6X6IDU | manuscript | 2019 | Mogensen, Andreas L | Staking Our Future: Deontic Longtermism and the Non-Identity Problem | Greaves and MacAskill argue for axiological longtermism, according to which, in a wide class of decision contexts, the option that is ex ante best is the option that corresponds to the best lottery over histories from t onwards, where t is some date far in the future. They suggest that a stakes-sensitivity argument may be used to derive deontic longtermism from axiological longtermism, where deontic longtermism holds that in a wide class of decision contexts, the option one ought to choose is the option that corresponds to the best lottery over histories from t onwards, where t is some date far in the future. This argument appeals to the Stakes Principle: when the axiological stakes are high, non-consequentialist constraints and prerogatives tend to be insignificant in comparison, so that what one ought to do is simply whichever option is best. I argue that there are strong grounds on which to reject the Stakes Principle. Furthermore, by reflecting on the Non-Identity Problem, I argue that there are plausible grounds for denying the existence of a sound argument from axiological longtermism to deontic longtermism insofar as we are concerned with ways of improving the value of the future of the kind that are focal in Greaves and MacAskill’s presentation. | 2019 | 2019-12-16 3:18:48 | 2020-12-17 18:34:46 | en | Zotero | ZSCC: 0000000[s0] | /Users/jriedel/Zotero/storage/Q7ITRU2U/Mogensen - Staking Our Future Deontic Longtermism and the No.pdf | GPI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
83 | SMXWRWH5 | journalArticle | 2016 | MacAskill, William | When Should an Effective Altruist Donate? | Forum on Philanthropy and the Public Good | https://lawdigitalcommons.bc.edu/philanthropy-forum/givingscholars2016/program/10 | 2016-09-23 | 2019-12-16 3:20:16 | 2020-11-24 4:10:23 | ZSCC: 0000002 | /Users/jriedel/Zotero/storage/SQGUGPBM/10.html | GPI; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
84 | Y2LTNLGN | manuscript | 2019 | MacAskill, William; Mogensen, Andreas | The paralysis argument | https://globalprioritiesinstitute.org/william-macaskill-andreas-mogensen-the-paralysis-argument/ | Given plausible assumptions about the long-run impact of our everyday actions, we show that standard non-consequentialist constraints on doing harm entail that we should try to do as little as possible in our lives. We call this the Paralysis Argument. After laying out the argument, we consider and respond to... | 2019-09-24 | 2019-12-16 3:20:52 | 2020-12-17 18:34:51 | 2019-12-16 3:20:52 | William MacAskill, Andreas Mogensen | en-US | ZSCC: NoCitationData[s3] ACC: N/F | /Users/jriedel/Zotero/storage/HJRNS6JA/MacAskill_Mogensen_Paralysis_Argument.pdf | GPI; NotSafety; AmbiguosSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
85 | IIH3XR5Z | journalArticle | 2017 | MacAskill, William | Effective Altruism: Introduction | Essays in Philosophy | 1526-0569 | 10.7710/1526-0569.1580 | http://commons.pacificu.edu/eip/vol18/iss1/1 | 2017 | 2019-12-16 3:23:58 | 2020-11-24 4:10:23 | 2019-12-16 3:23:58 | 1580 | 1 | 18 | Essays Philos | Effective Altruism | en | DOI.org (Crossref) | ZSCC: 0000013 | /Users/jriedel/Zotero/storage/6RWHX8EF/MacAskill - 2017 - Effective Altruism Introduction.pdf | GPI; NotSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
86 | WQULWUPT | manuscript | 2019 | Majha, Arushi; Sarkar, Sayan; Zagami, Davide | Categorizing Wireheading in Partially Embedded Agents | http://arxiv.org/abs/1906.09136 | $\textit{Embedded agents}$ are not explicitly separated from their environment, lacking clear I/O channels. Such agents can reason about and modify their internal parts, which they are incentivized to shortcut or $\textit{wirehead}$ in order to achieve the maximal reward. In this paper, we provide a taxonomy of ways by which wireheading can occur, followed by a definition of wirehead-vulnerable agents. Starting from the fully dualistic universal agent AIXI, we introduce a spectrum of partially embedded agents and identify wireheading opportunities that such agents can exploit, experimentally demonstrating the results with the GRL simulation platform AIXIjs. We contextualize wireheading in the broader class of all misalignment problems - where the goals of the agent conflict with the goals of the human designer - and conjecture that the only other possible type of misalignment is specification gaming. Motivated by this taxonomy, we define wirehead-vulnerable agents as embedded agents that choose to behave differently from fully dualistic agents lacking access to their internal parts. | 2019-06-21 | 2019-12-16 3:26:47 | 2020-12-20 21:03:31 | 2019-12-16 3:26:47 | arXiv.org | ZSCC: 0000000 | /Users/jriedel/Zotero/storage/FW9DUD8B/Majha et al. - 2019 - Categorizing Wireheading in Partially Embedded Age.pdf; /Users/jriedel/Zotero/storage/D6MAQKID/Majha et al. - 2019 - Categorizing Wireheading in Partially Embedded Age.pdf; /Users/jriedel/Zotero/storage/ZVW9GA8B/1906.html; /Users/jriedel/Zotero/storage/TH9QVNMC/1906.html | AI-Safety-Camp; TechSafety | Computer Science - Artificial Intelligence | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
87 | H9VMVGDQ | blogPost | 2019 | Kovarik, Vojta | AI Safety Debate and Its Applications | LessWrong | https://www.lesswrong.com/posts/5Kv2qNfRyXXihNrx2/ai-safety-debate-and-its-applications | All of the experimental work and some of the theoretical work has been done jointly with Anna Gajdova, David Lindner, Lukas Finnveden, and Rajashree Agrawal as part of the third AI Safety Camp. We are grateful to Ryan Carey and Geoffrey Irving for the advice regarding this project. The remainder of the theoretical part relates to my stay at FHI, and I would like to thank the above people, Owain Evans, Michael Dennis, Ethan Perez, Stuart Armstrong, and Max Daniel for comments/discussions. Debate is a recent proposal for AI alignment, which naturally incorporates elicitation of human preferences and has the potential to offload the costly search for flaws in an AI’s suggestions onto the AI. After briefly recalling the intuition behind debate, we list the main open problems surrounding it and summarize how the existing work on debate addresses them. Afterward, we describe, and distinguish between, Debate games and their different applications in more detail. We also formalize what it means for a debate to be truth-promoting. Finally, we present results of our experiments on Debate games and Training via Debate on MNIST and fashion MNIST. | 2019-07-23 | 2019-12-16 3:27:22 | 2020-12-15 0:20:27 | 2019-12-16 3:27:22 | ZSCC: NoCitationData[s2] ACC: N/A | /Users/jriedel/Zotero/storage/RMIEP9I5/ai-safety-debate-and-its-applications.html | AI-Safety-Camp; TechSafety | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
88 | 5S7B7A3Z | manuscript | 2019 | Mancuso, Jason; Kisielewski, Tomasz; Lindner, David; Singh, Alok | Detecting Spiky Corruption in Markov Decision Processes | http://arxiv.org/abs/1907.00452 | Current reinforcement learning methods fail if the reward function is imperfect, i.e. if the agent observes reward different from what it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP, and introduce an algorithm that is able to detect its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that our algorithm can detect the corrupt states and learn the optimal policy despite the corruption. | 2019-06-30 | 2019-12-16 3:27:42 | 2020-12-20 21:06:04 | 2019-12-16 3:27:42 | arXiv.org | ZSCC: 0000000 arXiv: 1907.00452 | /Users/jriedel/Zotero/storage/JH5MSB4D/Mancuso et al. - 2019 - Detecting Spiky Corruption in Markov Decision Proc.pdf; /Users/jriedel/Zotero/storage/TWFBDMM3/Mancuso et al. - 2019 - Detecting Spiky Corruption in Markov Decision Proc.pdf; /Users/jriedel/Zotero/storage/LJPM3YNW/1907.html | AI-Safety-Camp; TechSafety | Statistics - Machine Learning; Computer Science - Machine Learning | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
89 | 64D2XGY2 | manuscript | 2019 | Evans, Owain; Saunders, William; Stuhlmüller, Andreas | Machine Learning Projects for Iterated Distillation and Amplification | Iterated Distillation and Amplification (IDA) is a framework for training ML models. IDA is related to existing frameworks like imitation learning and reinforcement learning, but it aims to solve tasks for which humans cannot construct a suitable reward function or solve directly. | 2019 | 2019-12-16 3:29:57 | 2020-12-19 23:09:19 | en | Zotero | ZSCC: NoCitationData[s2] ACC: 0 J: 0 | /Users/jriedel/Zotero/storage/7A2HYTLB/Evans et al. - Machine Learning Projects for Iterated Distillatio.pdf | Ought; TechSafety | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
90 | 5MKSVVIK | journalArticle | 2017 | Sotala, Kaj; Gloor, Lukas | Superintelligence As a Cause or Cure For Risks of Astronomical Suffering | Informatica | 1854-3871 | http://www.informatica.si/index.php/informatica/article/view/1877 | Discussions about the possible consequences of creating superintelligence have included the possibility of existential risk, often understood mainly as the risk of human extinction. We argue that suffering risks (s-risks), where an adverse outcome would bring about severe suffering on an astronomical scale, are risks of a comparable severity and probability as risks of extinction. Preventing them is the common interest of many different value systems. Furthermore, we argue that in the same way as superintelligent AI both contributes to existential risk but can also help prevent it, superintelligent AI can both be a suffering risk or help avoid it. Some types of work aimed at making superintelligent AI safe will also help prevent suffering risks, and there may also be a class of safeguards for AI that helps specifically against s-risks. | 2017-12-27 | 2019-12-16 3:31:47 | 2020-12-15 0:31:14 | 2019-12-16 3:31:47 | 4 | 41 | en | Copyright © Slovenian Society Informatika | www.informatica.si | ZSCC: 0000020 | /Users/jriedel/Zotero/storage/JYYGRVPL/Sotala and Gloor - 2017 - Superintelligence As a Cause or Cure For Risks of .pdf; /Users/jriedel/Zotero/storage/5XNXDN9J/Sotala and Gloor - 2017 - Superintelligence As a Cause or Cure For Risks of .pdf; /Users/jriedel/Zotero/storage/YRI8ZR62/1877.html; /Users/jriedel/Zotero/storage/FFIG9CST/1877.html | MetaSafety; CLTR | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
91 | L25TRIUS | manuscript | 2016 | Tomasik, Brian | How the Simulation Argument Dampens Future Fanaticism | Some effective altruists assume that most of the expected impact of our actions comes from how we influence the very long-term future of Earth-originating intelligence over the coming ∼billions of years. According to this view, helping humans and animals in the short term matters, but it mainly only matters via effects on far-future outcomes. | 2016 | 2019-12-16 3:31:52 | 2020-12-19 23:08:36 | en | Zotero | ZSCC: 0000000 | /Users/jriedel/Zotero/storage/2WDB2SAZ/Tomasik - How the Simulation Argument Dampens Future Fanatic.pdf | MetaSafety; CLTR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
92 | MCJDGF4F | manuscript | 2011 | Tomasik, Brian | Risks of Astronomical Future Suffering | It’s far from clear that human values will shape an Earth-based space-colonization wave, but even if they do, it seems more likely that space colonization will increase total suffering rather than decrease it. That said, other people care a lot about humanity’s survival and spread into the cosmos, so I think suffering reducers should let others pursue their spacefaring dreams in exchange for stronger safety measures against future suffering. In general, I encourage people to focus on making an intergalactic future more humane if it happens rather than making sure there will be an intergalactic future. | 2011 | 2019-12-16 3:31:59 | 2020-12-19 23:08:19 | en | Zotero | ZSCC: 0000010 | /Users/jriedel/Zotero/storage/A6H9AJB6/Tomasik - Risks of Astronomical Future Suffering.pdf | MetaSafety; CLTR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
93 | RSILV8MA | manuscript | 2016 | Gloor, Lukas | Suffering-focused AI safety: Why "fail-safe" measures might be our top intervention | AI-safety efforts focused on suffering reduction should place particular emphasis on avoiding risks of astronomical disvalue. Among the cases where uncontrolled AI destroys humanity, outcomes might still differ enormously in the amounts of suffering produced. Rather than concentrating all our efforts on a specific future we would like to bring about, we should identify futures we least want to bring about and work on ways to steer AI trajectories around these. In particular, a “fail-safe” approach to AI safety is especially promising because avoiding very bad outcomes might be much easier than making sure we get everything right. This is also a neglected cause despite there being a broad consensus among different moral views that avoiding the creation of vast amounts of suffering in our future is an ethical priority. | 2016 | 2019-12-16 3:33:05 | 2020-12-19 23:08:11 | en | Zotero | ZSCC: 0000005 | /Users/jriedel/Zotero/storage/V3VLPG7F/Gloor - Suffering-focused AI safety Why ``fail-safe'' mea.pdf | MetaSafety; CLTR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
94 | CDIMI5AM | manuscript | 2019 | Roy, Mati | AI Safety Open Problems | https://docs.google.com/document/d/1J2fOOF-NYiPC0-J3ZGEfE0OhA-QcOInhlvWjr1fAsS0/edit?usp=embed_facebook | Created: 2018-11-08; Updated: 2019-11-2; Suggestions: please make suggestions directly in this Doc; List maintainer: Mati Roy (contact@matiroy.com) AI Safety Open Problems Technical AGI safety research outside AI: https://forum.effectivealtruism.org/posts/2e9NDGiXt8PjjbTMC/technical-agi-safet... | 2019 | 2019-12-16 19:57:56 | 2020-11-24 15:31:50 | 2019-12-16 19:57:56 | en | Google Docs | ZSCC: NoCitationData[s2] ACC: N/A | /Users/jriedel/Zotero/storage/2UYJB8JN/edit.html | Ought; TechSafety | |||||||||||||||||||||||||||||||||||||||||||||||||
95 | EXCN4PKI | blogPost | 2019 | Christiano, Paul | What failure looks like | AI Alignment Forum | https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like | The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity. I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts: * Part I: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.") * Part II: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of optimization daemons [https://arbital.com/p/daemons/].) I think these are the most important problems if we fail to solve intent alignment [https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6]. In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several years. With fast enough takeoff, my expectations start to look more like the caricature---this post envisions reasonably broad deployment of AI, which becomes less and less likely as things get faster. I think the basic problems are still essentially the same though, just occurring within an AI lab rather than across the world. (None of the concerns in this post are novel.) | 2019 | 2019-12-16 19:59:17 | 2020-12-15 0:35:11 | 2019-12-16 19:59:17 | ZSCC: NoCitationData[s4] ACC: N/A | /Users/jriedel/Zotero/storage/4EEGJC93/what-failure-looks-like.html; /Users/jriedel/Zotero/storage/AG4TR3LY/what-failure-looks-like.html | TechSafety; Open-AI | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
96 | I3ICAKYP | journalArticle | 2019 | Irving, Geoffrey; Askell, Amanda | AI Safety Needs Social Scientists | Distill | 2476-0757 | 10.23915/distill.00014 | https://distill.pub/2019/safety-needs-social-scientists | 2019-02-19 | 2019-12-16 20:00:35 | 2020-11-24 1:19:44 | 2019-12-16 20:00:35 | 10.23915/distill.00014 | 2 | 4 | Distill | DOI.org (Crossref) | ZSCC: 0000009 | /Users/jriedel/Zotero/storage/NNHEHKHY/distill-social-scientists.html | MetaSafety; Open-AI | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
97 | RTQNCDFI | conferencePaper | 2019 | Clark, Jack; Hadfield, Gillian K | Regulatory Markets for AI Safety | https://arxiv.org/abs/2001.00078 | We propose a new model for regulation to achieve AI safety: global regulatory markets. We first sketch the model in general terms and provide an overview of the costs and benefits of this approach. We then demonstrate how the model might work in practice: responding to the risk of adversarial attacks on AI models employed in commercial drones. | 2019 | 2019-12-16 20:02:07 | 2020-12-20 22:47:55 | en | Zotero | ZSCC: 0000003 | /Users/jriedel/Zotero/storage/TQSIXZCP/Clark and Hadfield - 2019 - Regulatory Markets for AI Safety.pdf; /Users/jriedel/Zotero/storage/REZ5MP38/2001.html; /Users/jriedel/Zotero/storage/8RC62RQZ/Clark and Hadfield - 2019 - REGULATORY MARKETS FOR AI SAFETY.pdf | MetaSafety; Open-AI | Computer Science - Computers and Society; Economics - General Economics | Safe Machine Learning workshop at ICLR, 2019 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
98 | DQKB3KDR | manuscript | 2019 | Askell, Amanda; Brundage, Miles; Hadfield, Gillian | The Role of Cooperation in Responsible AI Development | http://arxiv.org/abs/1907.04534 | In this paper, we argue that competitive pressures could incentivize AI companies to underinvest in ensuring their systems are safe, secure, and have a positive social impact. Ensuring that AI systems are developed responsibly may therefore require preventing and solving collective action problems between companies. We note that there are several key factors that improve the prospects for cooperation in collective action problems. We use this to identify strategies to improve the prospects for industry cooperation on the responsible development of AI. | 2019-07-10 | 2019-12-16 20:04:39 | 2020-12-20 22:15:24 | 2019-12-16 20:04:39 | arXiv.org | ZSCC: 0000006 arXiv: 1907.04534 | /Users/jriedel/Zotero/storage/WINXA8SC/Askell et al. - 2019 - The Role of Cooperation in Responsible AI Developm.pdf; /Users/jriedel/Zotero/storage/7QDSQFZ9/1907.html | MetaSafety; Open-AI | Computer Science - Artificial Intelligence; Computer Science - Computers and Society; K.1; K.4.1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
99 | XCZYZNXA | conferencePaper | 2019 | Nakkiran, Preetum; Kaplun, Gal; Kalimeris, Dimitris; Yang, Tristan; Edelman, Benjamin L.; Zhang, Fred; Barak, Boaz | SGD on Neural Networks Learns Functions of Increasing Complexity | Advances in Neural Information Processing Systems 32 (NeurIPS 2019) | http://arxiv.org/abs/1905.11604 | We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model. Key to our work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information. | 2019-05-28 | 2019-12-16 20:04:44 | 2020-12-20 22:49:00 | 2019-12-16 20:04:44 | arXiv.org | ZSCC: 0000017 arXiv: 1905.11604 | /Users/jriedel/Zotero/storage/SHCPCHBI/Nakkiran et al. - 2019 - SGD on Neural Networks Learns Functions of Increas.pdf; /Users/jriedel/Zotero/storage/NXKI7H2T/1905.html | Other-org; NotSafety | Statistics - Machine Learning; Computer Science - Neural and Evolutionary Computing; Computer Science - Machine Learning | NeurIPS 2019 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100 | VUYT4E9T | manuscript | 2019 | Kang, Daniel; Sun, Yi; Brown, Tom; Hendrycks, Dan; Steinhardt, Jacob | Transfer of Adversarial Robustness Between Perturbation Types | http://arxiv.org/abs/1905.01034 | We study the transfer of adversarial robustness of deep neural networks between different perturbation types. While most work on adversarial examples has focused on $L_\infty$ and $L_2$-bounded perturbations, these do not capture all types of perturbations available to an adversary. The present work evaluates 32 attacks of 5 different types against models adversarially trained on a 100-class subset of ImageNet. Our empirical results suggest that evaluating on a wide range of perturbation sizes is necessary to understand whether adversarial robustness transfers between perturbation types. We further demonstrate that robustness against one perturbation type may not always imply and may sometimes hurt robustness against other perturbation types. In light of these results, we recommend evaluation of adversarial defenses take place on a diverse range of perturbation types and sizes. | 2019-05-03 | 2019-12-16 20:04:46 | 2020-12-20 22:17:43 | 2019-12-16 20:04:45 | arXiv.org | ZSCC: 0000011 arXiv: 1905.01034 | /Users/jriedel/Zotero/storage/9L792D68/Kang et al. - 2019 - Transfer of Adversarial Robustness Between Perturb.pdf; /Users/jriedel/Zotero/storage/9YPPQGQA/1905.html | TechSafety; Open-AI | Computer Science - Artificial Intelligence; Statistics - Machine Learning; Computer Science - Cryptography and Security; Computer Science - Machine Learning |