| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Safety Type | Year | Org(s) | Item Type | Citations | Authors | Title | Publication Title | ||||||||||||||||||
2 | TechSafety | 2017 | <Other org> | manuscript | 836 | Doshi-Velez, Finale; Kim, Been | Towards A Rigorous Science of Interpretable Machine Learning | |||||||||||||||||||
3 | TechSafety | 2017 | DeepMind | conferencePaper | 822 | Lakshminarayanan, Balaji; Pritzel, Alexander; Blundell, Charles | Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles | arXiv:1612.01474 [cs, stat] | ||||||||||||||||||
4 | TechSafety | 2016 | Open-AI | manuscript | 724 | Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan | Concrete Problems in AI Safety | |||||||||||||||||||
5 | MetaSafety | 2016 | FHI | bookSection | 365 | Müller, Vincent C.; Bostrom, Nick | Future progress in artificial intelligence: A survey of expert opinion | Fundamental issues of artificial intelligence | ||||||||||||||||||
6 | TechSafety | 2017 | DeepMind; Open-AI | conferencePaper | 314 | Christiano, Paul; Leike, Jan; Brown, Tom B.; Martic, Miljan; Legg, Shane; Amodei, Dario | Deep reinforcement learning from human preferences | Advances in Neural Information Processing Systems 30 (NIPS 2017) | ||||||||||||||||||
7 | TechSafety | 2017 | Open-AI | conferencePaper | 310 | Achiam, Joshua; Held, David; Tamar, Aviv; Abbeel, Pieter | Constrained policy optimization | Proceedings of the 34th International Conference on Machine Learning | ||||||||||||||||||
8 | MetaSafety | 2018 | BERI; CFI; CSER; FHI; Open-AI | report | 244 | Brundage, Miles; Avin, Shahar; Clark, Jack; Toner, Helen; Eckersley, Peter; Garfinkel, Ben; Dafoe, Allan; Scharre, Paul; Zeitzoff, Thomas; Filar, Bobby; Anderson, Hyrum; Roff, Heather; Allen, Gregory C.; Steinhardt, Jacob; Flynn, Carrick; hÉigeartaigh, Seán Ó; Beard, Simon; Belfield, Haydn; Farquhar, Sebastian; Lyle, Clare; Crootof, Rebecca; Evans, Owain; Page, Michael; Bryson, Joanna; Yampolskiy, Roman; Amodei, Dario | The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation | |||||||||||||||||||
9 | TechSafety | 2016 | CHAI | conferencePaper | 239 | Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart | Cooperative Inverse Reinforcement Learning | Advances in Neural Information Processing Systems 29 (NIPS 2016) | ||||||||||||||||||
10 | TechSafety | 2018 | DeepMind | conferencePaper | 180 | Uesato, Jonathan; O'Donoghue, Brendan; Oord, Aaron van den; Kohli, Pushmeet | Adversarial Risk and the Dangers of Evaluating Against Weak Attacks | Proceedings of the 35th International Conference on Machine Learning | ||||||||||||||||||
11 | TechSafety | 2018 | DeepMind | conferencePaper | 148 | Rabinowitz, Neil C.; Perbet, Frank; Song, H. Francis; Zhang, Chiyuan; Eslami, S. M. Ali; Botvinick, Matthew | Machine Theory of Mind | Proceedings of the 35th International Conference on Machine Learning | ||||||||||||||||||
12 | TechSafety | 2018 | DeepMind | conferencePaper | 145 | Dvijotham, Krishnamurthy; Stanforth, Robert; Gowal, Sven; Mann, Timothy; Kohli, Pushmeet | A Dual Approach to Scalable Verification of Deep Networks | |||||||||||||||||||
13 | TechSafety | 2019 | <Other org> | journalArticle | 134 | Lehman, Joel; Clune, Jeff; Misevic, Dusan; Adami, Christoph; Altenberg, Lee; Beaulieu, Julie; Bentley, Peter J.; Bernard, Samuel; Beslon, Guillaume; Bryson, David M.; Chrabaszcz, Patryk; Cheney, Nick; Cully, Antoine; Doncieux, Stephane; Dyer, Fred C.; Ellefsen, Kai Olav; Feldt, Robert; Fischer, Stephan; Forrest, Stephanie; Frénoy, Antoine; Gagné, Christian; Goff, Leni Le; Grabowski, Laura M.; Hodjat, Babak; Hutter, Frank; Keller, Laurent; Knibbe, Carole; Krcah, Peter; Lenski, Richard E.; Lipson, Hod; MacCurdy, Robert; Maestre, Carlos; Miikkulainen, Risto; Mitri, Sara; Moriarty, David E.; Mouret, Jean-Baptiste; Nguyen, Anh; Ofria, Charles; Parizeau, Marc; Parsons, David; Pennock, Robert T.; Punch, William F.; Ray, Thomas S.; Schoenauer, Marc; Shulte, Eric; Sims, Karl; Stanley, Kenneth O.; Taddei, François; Tarapore, Danesh; Thibault, Simon; Weimer, Westley; Watson, Richard; Yosinski, Jason | The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities | Artificial Life | ||||||||||||||||||
14 | TechSafety | 2019 | DeepMind | conferencePaper | 132 | Nalisnick, Eric; Matsukawa, Akihiro; Teh, Yee Whye; Gorur, Dilan; Lakshminarayanan, Balaji | Do Deep Generative Models Know What They Don't Know? | arXiv:1810.09136 [cs, stat] | ||||||||||||||||||
15 | TechSafety | 2017 | CHAI | conferencePaper | 131 | Hadfield-Menell, Dylan; Milli, Smitha; Abbeel, Pieter; Russell, Stuart; Dragan, Anca | Inverse Reward Design | Advances in Neural Information Processing Systems 30 (NIPS 2017) | ||||||||||||||||||
16 | TechSafety | 2017 | DeepMind | manuscript | 130 | Leike, Jan; Martic, Miljan; Krakovna, Victoria; Ortega, Pedro A.; Everitt, Tom; Lefrancq, Andrew; Orseau, Laurent; Legg, Shane | AI Safety Gridworlds | |||||||||||||||||||
17 | TechSafety | 2019 | DeepMind | conferencePaper | 126 | Ovadia, Yaniv; Fertig, Emily; Ren, Jie; Nado, Zachary; Sculley, D.; Nowozin, Sebastian; Dillon, Joshua V.; Lakshminarayanan, Balaji; Snoek, Jasper | Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift | Advances in Neural Information Processing Systems, 2019 | ||||||||||||||||||
18 | TechSafety | 2018 | <Other org> | conferencePaper | 123 | Koller, Torsten; Berkenkamp, Felix; Turchetta, Matteo; Krause, Andreas | Learning-based Model Predictive Control for Safe Exploration | 2018 IEEE Conference on Decision and Control (CDC) | ||||||||||||||||||
19 | MetaSafety | 2016 | <Other org> | book | 120 | Hanson, Robin | The Age of Em: Work, Love, and Life when Robots Rule the Earth | |||||||||||||||||||
20 | MetaSafety | 2018 | <Other org> | journalArticle | 117 | Rahwan, Iyad | Society-in-the-loop: programming the algorithmic social contract | Ethics and Information Technology | ||||||||||||||||||
21 | TechSafety | 2016 | CHAI | conferencePaper | 113 | Sadigh, Dorsa; Sastry, S. Shankar; Seshia, Sanjit A.; Dragan, Anca | Information gathering actions over human internal state | 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) | ||||||||||||||||||
22 | TechSafety | 2017 | <Other org> | conferencePaper | 101 | Conitzer, Vincent; Sinnott-Armstrong, Walter; Borg, Jana Schaich; Deng, Yuan; Kramer, Max | Moral Decision Making Frameworks for Artificial Intelligence | AAAI Workshops, 2017 | ||||||||||||||||||
23 | TechSafety | 2019 | DeepMind | conferencePaper | 94 | Gowal, Sven; Dvijotham, Krishnamurthy; Stanforth, Robert; Bunel, Rudy; Qin, Chongli; Uesato, Jonathan; Arandjelovic, Relja; Mann, Timothy; Kohli, Pushmeet | On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models | arXiv:1810.12715 [cs, stat] | ||||||||||||||||||
24 | MetaSafety | 2016 | FHI | journalArticle | 77 | Armstrong, Stuart; Bostrom, Nick; Shulman, Carl | Racing to the precipice: a model of artificial intelligence development | AI & society | ||||||||||||||||||
25 | TechSafety | 2018 | DeepMind | conferencePaper | 77 | Farajtabar, Mehrdad; Chow, Yinlam; Ghavamzadeh, Mohammad | More Robust Doubly Robust Off-policy Evaluation | Proceedings of the 35th International Conference on Machine Learning | ||||||||||||||||||
26 | TechSafety | 2016 | DeepMind; FHI | conferencePaper | 75 | Orseau, Laurent; Armstrong, Stuart | Safely Interruptible Agents | |||||||||||||||||||
27 | TechSafety | 2018 | <Other org> | conferencePaper | 73 | Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Lesser, Victor R.; Yang, Qiang | Building Ethics into Artificial Intelligence | Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) | ||||||||||||||||||
28 | TechSafety | 2018 | FHI; Ought | conferencePaper | 70 | Saunders, William; Sastry, Girish; Stuhlmüller, Andreas; Evans, Owain | Trial without Error: Towards Safe Reinforcement Learning via Human Intervention | Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems | ||||||||||||||||||
29 | TechSafety | 2016 | FHI; Ought | conferencePaper | 65 | Evans, Owain; Stuhlmüller, Andreas; Goodman, Noah | Learning the preferences of ignorant, inconsistent agents | Thirtieth AAAI Conference on Artificial Intelligence | ||||||||||||||||||
30 | MetaSafety | 2016 | <Other org> | journalArticle | 64 | Gurkaynak, Gonenc; Yilmaz, Ilay; Haksever, Gunes | Stifling artificial intelligence: Human perils | Computer Law & Security Review | ||||||||||||||||||
31 | MetaSafety | 2017 | FHI | journalArticle | 64 | Bostrom, Nick | Strategic implications of openness in AI development | Global Policy | ||||||||||||||||||
32 | TechSafety | 2018 | DeepMind | manuscript | 64 | Dvijotham, Krishnamurthy; Gowal, Sven; Stanforth, Robert; Arandjelovic, Relja; O'Donoghue, Brendan; Uesato, Jonathan; Kohli, Pushmeet | Training verified learners with learned verifiers | |||||||||||||||||||
33 | TechSafety | 2017 | CHAI | conferencePaper | 62 | Basu, C.; Yang, Q.; Hungerman, D.; Singhal, M.; Dragan, A. D. | Do You Want Your Autonomous Car to Drive Like You? | 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI) | ||||||||||||||||||
34 | MetaSafety | 2018 | FHI | report | 59 | Dafoe, Allan | AI governance: a research agenda | |||||||||||||||||||
35 | MetaSafety | 2016 | <Other org> | bookSection | 58 | Pistono, Federico; Yampolskiy, Roman V. | Unethical Research: How to Create a Malevolent Artificial Intelligence | The Age of Artificial Intelligence: An Exploration | ||||||||||||||||||
36 | TechSafety | 2020 | MIRI | journalArticle | 58 | Taylor, Jessica; Yudkowsky, Eliezer; LaVictoire, Patrick; Critch, Andrew | Alignment for Advanced Machine Learning Systems | Ethics of Artificial Intelligence | ||||||||||||||||||
37 | MetaSafety | 2016 | <Other org> | manuscript | 54 | Yampolskiy, Roman V.; Spellchecker, M. S. | Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures | |||||||||||||||||||
38 | TechSafety | 2018 | DeepMind | manuscript | 54 | Dalal, Gal; Dvijotham, Krishnamurthy; Vecerik, Matej; Hester, Todd; Paduraru, Cosmin; Tassa, Yuval | Safe Exploration in Continuous Action Spaces | |||||||||||||||||||
39 | TechSafety | 2018 | CHAI | conferencePaper | 53 | Reddy, Siddharth; Dragan, Anca D.; Levine, Sergey | Shared Autonomy via Deep Reinforcement Learning | Robotics: Science and Systems XIV | ||||||||||||||||||
40 | TechSafety | 2017 | CHAI | conferencePaper | 52 | Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart | The Off-Switch Game | Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence | ||||||||||||||||||
41 | TechSafety | 2018 | <Other org> | conferencePaper | 50 | Everitt, Tom; Lea, Gary; Hutter, Marcus | AGI Safety Literature Review | Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence | ||||||||||||||||||
42 | TechSafety | 2018 | DeepMind | conferencePaper | 50 | Ryffel, Theo; Trask, Andrew; Dahl, Morten; Wagner, Bobby; Mancuso, Jason; Rueckert, Daniel; Passerat-Palmbach, Jonathan | A generic framework for privacy preserving deep learning | arXiv:1811.04017 [cs, stat] | ||||||||||||||||||
43 | TechSafety | 2019 | DeepMind | conferencePaper | 50 | Ren, Jie; Liu, Peter J.; Fertig, Emily; Snoek, Jasper; Poplin, Ryan; DePristo, Mark A.; Dillon, Joshua V.; Lakshminarayanan, Balaji | Likelihood Ratios for Out-of-Distribution Detection | arXiv:1906.02845 [cs, stat] | ||||||||||||||||||
44 | TechSafety | 2017 | <Other org> | manuscript | 48 | Eysenbach, Benjamin; Gu, Shixiang; Ibarz, Julian; Levine, Sergey | Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning | |||||||||||||||||||
45 | MetaSafety | 2020 | <Other org> | journalArticle | 48 | Turchin, Alexey; Denkenberger, David | Classification of global catastrophic risks connected with artificial intelligence | AI & Society | ||||||||||||||||||
46 | TechSafety | 2018 | DeepMind | conferencePaper | 48 | Moosavi-Dezfooli, Seyed-Mohsen; Fawzi, Alhussein; Uesato, Jonathan; Frossard, Pascal | Robustness via curvature regularization, and vice versa | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | ||||||||||||||||||
47 | MetaSafety | 2017 | GCRI | journalArticle | 47 | Baum, Seth D. | On the promotion of safe and socially beneficial artificial intelligence | AI & Society | ||||||||||||||||||
48 | MetaSafety | 2018 | CFI; CSER | conferencePaper | 45 | Cave, Stephen; ÓhÉigeartaigh, Seán S. | An AI Race for Strategic Advantage: Rhetoric and Risks | Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society | ||||||||||||||||||
49 | TechSafety | 2017 | DeepMind | conferencePaper | 45 | Everitt, Tom; Krakovna, Victoria; Orseau, Laurent; Hutter, Marcus; Legg, Shane | Reinforcement Learning with a Corrupted Reward Channel | arXiv:1705.08417 [cs, stat] | ||||||||||||||||||
50 | MetaSafety | 2020 | <Other org> | conferencePaper | 43 | Erdélyi, Olivia J.; Goldsmith, Judy | Regulating Artificial Intelligence: Proposal for a Global Solution | Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society | ||||||||||||||||||
51 | TechSafety | 2018 | DeepMind | conferencePaper | 43 | Ibarz, Borja; Leike, Jan; Pohlen, Tobias; Irving, Geoffrey; Legg, Shane; Amodei, Dario | Reward learning from human preferences and demonstrations in Atari | arXiv:1811.06521 [cs, stat] | ||||||||||||||||||
52 | MetaSafety | 2018 | FHI | journalArticle | 42 | Ding, Jeffrey | Deciphering China’s AI dream | Future of Humanity Institute Technical Report | ||||||||||||||||||
53 | TechSafety | 2016 | <Other org> | conferencePaper | 41 | Greene, Joshua; Rossi, Francesca; Tasioulas, John; Venable, Kristen Brent; Williams, Brian | Embedding Ethical Principles in Collective Decision Support Systems | |||||||||||||||||||
54 | TechSafety | 2019 | DeepMind | conferencePaper | 39 | Hendrycks, Dan; Mu, Norman; Cubuk, Ekin D.; Zoph, Barret; Gilmer, Justin; Lakshminarayanan, Balaji | AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty | arXiv:1912.02781 [cs, stat] | ||||||||||||||||||
55 | TechSafety | 2018 | CHAI | conferencePaper | 38 | Fisac, Jaime F.; Bronstein, Eli; Stefansson, Elis; Sadigh, Dorsa; Sastry, S. Shankar; Dragan, Anca D. | Hierarchical Game-Theoretic Planning for Autonomous Vehicles | Robotics: Science and Systems 2019 | ||||||||||||||||||
56 | TechSafety | 2018 | DeepMind | manuscript | 38 | Leike, Jan; Krueger, David; Everitt, Tom; Martic, Miljan; Maini, Vishal; Legg, Shane | Scalable agent alignment via reward modeling: a research direction | |||||||||||||||||||
57 | TechSafety | 2018 | CHAI | bookSection | 37 | Reddy, Sid; Dragan, Anca; Levine, Sergey | Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior | Advances in Neural Information Processing Systems 31 | ||||||||||||||||||
58 | TechSafety | 2018 | CHAI | conferencePaper | 36 | Bajcsy, Andrea; Losey, Dylan P.; O'Malley, Marcia K.; Dragan, Anca D. | Learning from Physical Human Corrections, One Feature at a Time | Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction - HRI '18 | ||||||||||||||||||
59 | TechSafety | 2018 | <Other org> | journalArticle | 35 | Arnold, Thomas; Scheutz, Matthias | The “big red button” is too late: an alternative model for the ethical evaluation of AI systems | Ethics and Information Technology | ||||||||||||||||||
60 | TechSafety | 2016 | FHI | book | 35 | Bostrom, Nick | Fundamental issues of artificial intelligence | |||||||||||||||||||
61 | TechSafety | 2020 | GCRI | journalArticle | 34 | Baum, Seth D. | Social choice ethics in artificial intelligence | AI & Society | ||||||||||||||||||
62 | TechSafety | 2018 | CHAI | journalArticle | 34 | Sadigh, Dorsa; Landolfi, Nick; Sastry, Shankar S.; Seshia, Sanjit A.; Dragan, Anca D. | Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state | Autonomous Robots | ||||||||||||||||||
63 | TechSafety | 2019 | DeepMind | conferencePaper | 34 | Qin, Chongli; Martens, James; Gowal, Sven; Krishnan, Dilip; Dvijotham, Krishnamurthy; Fawzi, Alhussein; De, Soham; Stanforth, Robert; Kohli, Pushmeet | Adversarial Robustness through Local Linearization | Advances in Neural Information Processing Systems 32 (NeurIPS 2019) | ||||||||||||||||||
64 | TechSafety | 2018 | CHAI | conferencePaper | 33 | Kwon, Minae; Huang, Sandy H.; Dragan, Anca D. | Expressing Robot Incapability | Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction - HRI '18 | ||||||||||||||||||
65 | TechSafety | 2019 | DeepMind | conferencePaper | 33 | Bahdanau, Dzmitry; Hill, Felix; Leike, Jan; Hughes, Edward; Hosseini, Arian; Kohli, Pushmeet; Grefenstette, Edward | Learning to Understand Goal Specifications by Modelling Reward | arXiv:1806.01946 [cs] | ||||||||||||||||||
66 | MetaSafety | 2016 | FLI | conferencePaper | 32 | Asaro, Peter M | The Liability Problem for Autonomous Artificial Agents | |||||||||||||||||||
67 | TechSafety | 2016 | <Other org> | conferencePaper | 32 | Babcock, James; Kramar, Janos; Yampolskiy, Roman | The AGI Containment Problem | AGI 2016: Artificial General Intelligence | ||||||||||||||||||
68 | TechSafety | 2019 | <Other org> | conferencePaper | 32 | Lütjens, Björn; Everett, Michael; How, Jonathan P. | Safe Reinforcement Learning with Model Uncertainty Estimates | arXiv:1810.08700 [cs] | ||||||||||||||||||
69 | TechSafety | 2019 | <Other org> | journalArticle | 32 | Riedl, Mark O. | Human-Centered Artificial Intelligence and Machine Learning | Human Behavior and Emerging Technologies | ||||||||||||||||||
70 | MetaSafety | 2016 | FHI | book | 32 | Yampolskiy, Roman; Armstrong, Stuart | The Technological Singularity: Managing the Journey | |||||||||||||||||||
71 | MetaSafety | 2017 | FHI | book | 32 | Callaghan, Vic; Miller, James; Yampolskiy, Roman; Armstrong, Stuart | Technological Singularity | |||||||||||||||||||
72 | MetaSafety | 2017 | GCRI | report | 32 | Baum, Seth | A Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy | |||||||||||||||||||
73 | TechSafety | 2018 | CHAI | conferencePaper | 31 | Liu, Chang; Hamrick, Jessica B.; Fisac, Jaime F.; Dragan, Anca D.; Hedrick, J. Karl; Sastry, S. Shankar; Griffiths, Thomas L. | Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration | Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016) | ||||||||||||||||||
74 | TechSafety | 2017 | MIRI | bookSection | 31 | Soares, Nate; Fallenstein, Benya | Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda | The Technological Singularity | ||||||||||||||||||
75 | TechSafety | 2018 | CHAI | conferencePaper | 31 | Milli, Smitha; Schmidt, Ludwig; Dragan, Anca D.; Hardt, Moritz | Model Reconstruction from Model Explanations | FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency | ||||||||||||||||||
76 | MetaSafety | 2018 | CSER | journalArticle | 30 | Avin, Shahar; Wintle, Bonnie C.; Weitzdörfer, Julius; Ó hÉigeartaigh, Seán S.; Sutherland, William J.; Rees, Martin J. | Classifying global catastrophic risks | Futures | ||||||||||||||||||
77 | TechSafety | 2017 | CHAI | conferencePaper | 30 | Milli, Smitha; Hadfield-Menell, Dylan; Dragan, Anca; Russell, Stuart | Should Robots be Obedient? | IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence | ||||||||||||||||||
78 | TechSafety | 2017 | <Other org> | manuscript | 29 | Babcock, James; Kramar, Janos; Yampolskiy, Roman V. | Guidelines for Artificial Intelligence Containment | |||||||||||||||||||
79 | TechSafety | 2016 | DeepMind | conferencePaper | 29 | Everitt, Tom; Hutter, Marcus | Avoiding Wireheading with Value Reinforcement Learning | AGI 2016: Artificial General Intelligence | ||||||||||||||||||
80 | MetaSafety | 2018 | <Other org> | journalArticle | 29 | Danzig, Richard | Managing Loss of Control as Many Militaries Pursue Technological Superiority | Arms Control Today | ||||||||||||||||||
81 | TechSafety | 2016 | CHAI | bookSection | 29 | Russell, Stuart | Rationality and Intelligence: A Brief Update | Fundamental Issues of Artificial Intelligence | ||||||||||||||||||
82 | TechSafety | 2018 | <Other org> | journalArticle | 28 | Vamplew, Peter; Dazeley, Richard; Foale, Cameron; Firmin, Sally; Mummery, Jane | Human-aligned artificial intelligence is a multiobjective problem | Ethics and Information Technology | ||||||||||||||||||
83 | MetaSafety | 2016 | FHI; GPI | report | 28 | Cotton-Barratt, Owen; Farquhar, Sebastian; Halstead, John; Schubert, Stefan; Snyder-Beattie, Andrew | Global Catastrophic Risks 2016 | |||||||||||||||||||
84 | MetaSafety | 2017 | GCRI | journalArticle | 28 | Barrett, Anthony M.; Baum, Seth D. | A model of pathways to artificial superintelligence catastrophe for risk and decision analysis | Journal of Experimental & Theoretical Artificial Intelligence | ||||||||||||||||||
85 | TechSafety | 2016 | <Other org> | conferencePaper | 27 | Steinhardt, Jacob; Valiant, Gregory; Charikar, Moses | Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction | Advances in Neural Information Processing Systems 29 (NIPS 2016) | ||||||||||||||||||
86 | TechSafety | 2017 | FHI; Ought | conferencePaper | 27 | Abel, David; Salvatier, John; Stuhlmüller, Andreas; Evans, Owain | Agent-agnostic human-in-the-loop reinforcement learning | 30th Conference on Neural Information Processing Systems (NIPS 2016) | ||||||||||||||||||
87 | TechSafety | 2019 | DeepMind | manuscript | 26 | Chow, Yinlam; Nachum, Ofir; Faust, Aleksandra; Duenez-Guzman, Edgar; Ghavamzadeh, Mohammad | Lyapunov-based Safe Policy Optimization for Continuous Control | |||||||||||||||||||
88 | TechSafety | 2019 | Open-AI | manuscript | 25 | Ziegler, Daniel M.; Stiennon, Nisan; Wu, Jeffrey; Brown, Tom B.; Radford, Alec; Amodei, Dario; Christiano, Paul; Irving, Geoffrey | Fine-tuning language models from human preferences | |||||||||||||||||||
89 | TechSafety | 2019 | DeepMind | conferencePaper | 25 | Nalisnick, Eric; Matsukawa, Akihiro; Teh, Yee Whye; Gorur, Dilan; Lakshminarayanan, Balaji | Hybrid Models with Deep and Invertible Features | arXiv:1902.02767 [cs, stat] | ||||||||||||||||||
90 | TechSafety | 2016 | MIRI | conferencePaper | 24 | Everitt, Tom; Filan, Daniel; Daswani, Mayank; Hutter, Marcus | Self-Modification of Policy and Utility Function in Rational Agents | AGI 2016: Artificial General Intelligence | ||||||||||||||||||
91 | MetaSafety | 2016 | FHI | journalArticle | 24 | Bostrom, Nick; Douglas, Thomas; Sandberg, Anders | The Unilateralist’s Curse and the Case for a Principle of Conformity | Social Epistemology | ||||||||||||||||||
92 | TechSafety | 2020 | CHAI | conferencePaper | 24 | Fisac, Jaime F.; Gates, Monica A.; Hamrick, Jessica B.; Liu, Chang; Hadfield-Menell, Dylan; Palaniappan, Malayandi; Malik, Dhruv; Sastry, S. Shankar; Griffiths, Thomas L.; Dragan, Anca D. | Pragmatic-Pedagogic Value Alignment | Robotics Research | ||||||||||||||||||
93 | MetaSafety | 2019 | CFI | conferencePaper | 24 | Whittlestone, Jess; Nyrup, Rune; Alexandrova, Anna; Cave, Stephen | The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions | AIES '19: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society | ||||||||||||||||||
94 | MetaSafety | 2018 | <Other org> | manuscript | 23 | Hwang, Tim | Computational Power and the Social Impact of Artificial Intelligence | |||||||||||||||||||
95 | TechSafety | 2019 | BERI; CHAI | manuscript | 23 | Gleave, Adam; Dennis, Michael; Kant, Neel; Wild, Cody; Levine, Sergey; Russell, Stuart | Adversarial Policies: Attacking Deep Reinforcement Learning | |||||||||||||||||||
96 | TechSafety | 2018 | Open-AI | manuscript | 23 | Irving, Geoffrey; Christiano, Paul; Amodei, Dario | AI safety via debate | |||||||||||||||||||
97 | TechSafety | 2017 | MIRI | manuscript | 23 | Garrabrant, Scott; Benson-Tilsen, Tsvi; Critch, Andrew; Soares, Nate; Taylor, Jessica | Logical Induction | |||||||||||||||||||
98 | TechSafety | 2019 | MIRI | manuscript | 23 | Manheim, David; Garrabrant, Scott | Categorizing Variants of Goodhart's Law | |||||||||||||||||||
99 | MetaSafety | 2020 | CFI; CHAI; CSER; CSET; FHI; Open-AI | manuscript | 22 | Brundage, Miles; Avin, Shahar; Wang, Jasmine; Belfield, Haydn; Krueger, Gretchen; Hadfield, Gillian; Khlaaf, Heidy; Yang, Jingying; Toner, Helen; Fong, Ruth; Maharaj, Tegan; Koh, Pang Wei; Hooker, Sara; Leung, Jade; Trask, Andrew; Bluemke, Emma; Lebensold, Jonathan; O'Keefe, Cullen; Koren, Mark; Ryffel, Théo; Rubinovitz, J. B.; Besiroglu, Tamay; Carugati, Federica; Clark, Jack; Eckersley, Peter; de Haas, Sarah; Johnson, Maritza; Laurie, Ben; Ingerman, Alex; Krawczuk, Igor; Askell, Amanda; Cammarota, Rosario; Lohn, Andrew; Krueger, David; Stix, Charlotte; Henderson, Peter; Graham, Logan; Prunkl, Carina; Martin, Bianca; Seger, Elizabeth; Zilberman, Noa; hÉigeartaigh, Seán Ó; Kroeger, Frens; Sastry, Girish; Kagan, Rebecca; Weller, Adrian; Tse, Brian; Barnes, Elizabeth; Dafoe, Allan; Scharre, Paul; Herbert-Voss, Ariel; Rasser, Martijn; Sodhani, Shagun; Flynn, Carrick; Gilbert, Thomas Krendl; Dyer, Lisa; Khan, Saif; Bengio, Yoshua; Anderljung, Markus | Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims | |||||||||||||||||||
100 | TechSafety | 2019 | DeepMind | conferencePaper | 22 | Huang, Po-Sen; Stanforth, Robert; Welbl, Johannes; Dyer, Chris; Yogatama, Dani; Gowal, Sven; Dvijotham, Krishnamurthy; Kohli, Pushmeet | Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation | arXiv:1909.01492 [cs, stat] |