Possible papers
1
Paper | Link | Relevance Score | Year | Read by | Comments
2
AI Safety Gridworlds
https://arxiv.org/abs/1711.09883
Score: 5 | Year: 2017 | Read by: All
3
Thomas Stepleton. The pycolab game engine, 2017.
https://github.com/deepmind/pycolab
Score: 5 | Year: 2017 | Read by: All
4
Deepmind Lab
https://deepmind.com/blog/open-sourcing-deepmind-lab/
Score: 5 | Year: 2016 | Read by: -
5
OpenAI Gym
https://arxiv.org/abs/1606.01540
Score: 5 | Year: 2016 | Read by: -
6
Todd Hester, Learning from demonstrations for real world reinforcement learning.
https://arxiv.org/pdf/1704.03732.pdf
Score: 5 | Year: 2017
7
Dylan Hadfield-Menell, Smitha Milli, Stuart Russell, Pieter Abbeel, and Anca D Dragan. Inverse reward design. In Advances in Neural Information Processing Systems ,
https://arxiv.org/abs/1711.02827?context=cs.LG
Score: 5 | Year: 2017 | Read by: Tom
Agent treats rewards provided as information about a still-unknown reward function, which the agent keeps a Bayesian model of. This retains uncertainty over the value of states unseen in training. We can almost certainly use this.
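A rough sketch of the inference step, in my own notation rather than the paper's exact formulation: the designed proxy reward $\tilde{w}$ (built for the training MDP $\tilde{M}$) is treated as evidence about the true reward $w^*$, and the agent then plans conservatively under the resulting posterior:
$P(w^* \mid \tilde{w}, \tilde{M}) \propto P(\tilde{w} \mid w^*, \tilde{M})\, P(w^*)$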
8
Mohamed and Rezende, Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning,
Score: 5 | Year: 2015 | Read by: Tom
Empowerment rewards (reward roughly proportional to the number of possible future states) require calculating mutual information, which is computationally expensive. The paper shows how to compute the MI quickly via a variational approximation. Figures 4-7 show some desirable properties of empowerment, but Concrete Problems points out some issues with using it alone.
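For reference (standard definition, not specific to this paper): empowerment at state $s$ is the channel capacity between a sequence of $k$ actions and the resulting future state, and the paper's contribution is a variational lower bound that makes this mutual information tractable:
$\mathcal{E}(s) = \max_{\omega(a^k \mid s)} I\!\left(A^k; S_{t+k} \mid s\right)$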
9
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. Rainbow: Combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298
https://arxiv.org/abs/1710.02298
Score: 5 | Year: 2017 | Read by: Karol
Introduction of Rainbow algorithm. Very relevant if we have to implement it.
10
Teodor Mihai Moldovan and Pieter Abbeel. Safe exploration in Markov decision processes. In International Conference on Machine Learning , pages 1451–1458, 2012.
https://people.eecs.berkeley.edu/~pabbeel/papers/MoldovanAbbeel_ICML2012full-rev2.pdf
Score: 5 | Year: 2012 | Read by: Jessica
The concept of ERGODICITY raised here is important for us: it means that any state is eventually reachable from any other state by following a suitable policy, i.e. that all states are reversible. So the agent can explore safely by restricting the space of policies to those that preserve ergodicity with a user-specified probability (called the safety level). Provably 'safe' exploration of grid worlds of size up to 50×100.
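As a concrete illustration of the ergodicity notion above (a minimal sketch of the definition, not the paper's algorithm): for a small tabular gridworld we can check whether every state is reachable from every other state under some policy by testing strong connectivity of the transition graph.

# Sketch only: checks the ergodicity property described above, i.e. every
# state is reachable from every other state under *some* policy.
from collections import defaultdict

def is_ergodic(transitions, states):
    # transitions: dict mapping (state, action) -> iterable of successor
    # states that have positive probability.
    graph = defaultdict(set)
    for (s, _a), successors in transitions.items():
        graph[s].update(successors)

    def reachable_from(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return all(reachable_from(s) >= set(states) for s in states)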
11
Alex Turner, Whitelist Learning
https://www.overleaf.com/read/jrrjqzdjtxjp#/52395179/
Score: 5 | Year: 2018 | Read by: Jessica
Agent learns (whitelists) acceptable actions by observing human generated examples
12
Faulty reward functions in the wild,
https://blog.openai.com/faulty-reward-functions/
Score: 4 | Year: 2017 | Read by: Gavin
Blogpost intro to Concrete Problems, OpenAI Universe. Cursory
13
Christoph Salge, Cornelius Glackin, and Daniel Polani. Empowerment—an introduction. In Guided Self-Organization: Inception, pages 67–114. 2014.
https://arxiv.org/abs/1310.1863
Score: 4 | Year: 2014 | Read by: Tom
Extremely long! Worth skimming if we make use of empowerment but will take a while to read and might not repay the investment
14
Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained policy optimization. arXiv preprint arXiv:1705.10528
https://arxiv.org/abs/1705.10528
Score: 4 | Year: 2017 | Read by: Gavin
New search algorithm for constrained MDPs, building safety into policy optimisation. Good for safe exploration. "Point-Gather" (collect apples while avoiding bombs) is a side-effect environment and is nearly solved by CPO.
15
Stuart Armstrong. AI toy control problem
https://www.youtube.com/watch?v=sx8JkdbNgdU
Score: 4 | Year: 2017 | Read by: Gavin
Cute minimal viable example of a gridworld, but it is about an absent/manipulable supervisor.
16
Stuart Armstrong and Benjamin Levinstein. Low impact artificial intelligences. arXiv preprint arXiv:1705.10720
https://arxiv.org/abs/1705.10720
Score: 4 | Year: 2017 | Read by: Gavin
Exemplar of one of the main anti-side-effects approaches. Theoretical, not RL or grid focussed
17
Sutton: RL textbook
http://incompleteideas.net/book/bookdraft2018jan1.pdf
Score: 4 | Year: 2018 | Read by: All
18
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mane. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565
https://arxiv.org/abs/1606.06565
Score: 4 | Year: 2016 | Read by: Gavin
19
Pieter Abbeel and Andrew Ng. Apprenticeship learning via inverse reinforcement learning. In International Conference on Machine Learning , pages 1–8,
https://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf
Score: 4 | Year: 2004
20
Riad Akrour, Marc Schoenauer, and Michele Sebag. APRIL: Active preference learning-based reinforcement learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases pages 116–131,
https://link.springer.com/content/pdf/10.1007%2F978-3-642-33486-3_8.pdf
Score: 4 | Year: 2012
21
James MacGlashan, Mark K Ho, Robert Loftin, Bei Peng, David Roberts, Matthew E Taylor, and Michael L Littman. Interactive learning from policy-dependent human feedback. arXiv preprint arXiv:1701.06049,
http://irll.eecs.wsu.edu/wp-content/papercite-data/pdf/2017icml-macglashan.pdf
Score: 4 | Year: 2017
22
Andrew Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In International Conference on Machine Learning , pages 663–670, 2000.
http://ai.stanford.edu/~ang/papers/icml00-irl.pdf
Score: 4 | Year: 2000
23
Aaron Wilson, Alan Fern, and Prasad Tadepalli. A Bayesian approach for policy learning from trajectory preference queries. In Advances in Neural Information Processing Systems , pages 1133–1141, 2012.
https://papers.nips.cc/paper/4805-a-bayesian-approach-for-policy-learning-from-trajectory-preference-queries/
Score: 4 | Year: 2012
24
Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. Maximum entropy inverse reinforcement learning. In AAAI , pages 1433–1438, 2008.
http://www.cs.cmu.edu/~bziebart/AAAI2008-bziebart.pdf
Score: 4 | Year: 2008 | Read by: Tom
One of the earlier IRL papers - uses a linear reward model and softmax-optimal expert demonstrations. Largely superseded by IRL with GPs (https://homes.cs.washington.edu/~zoran/gpirl.pdf) or deep reward models.
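Roughly, the MaxEnt IRL model (standard formulation, written from memory): the expert is assumed to generate trajectories with probability exponential in their cumulative reward, which is linear in features here:
$P(\tau \mid \theta) \propto \exp\!\big(\theta^{\top} f_{\tau}\big), \quad f_{\tau} = \sum_{s \in \tau} f(s)$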
25
Hester et al, Deep Q-learning from Demonstrations
Score: 4 | Year: 2017 | Read by: Jessica
Small sets of human demonstration data accelerate the learning process via a prioritised replay mechanism that combines temporal-difference updates with supervised classification of the demonstrated actions. Used to pre-train the agent for good performance from the beginning, even in the absence of an accurate simulator. Useful in relation to Whitelist Learning? Possibly needs a more experienced eye than mine.
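From memory, the combined loss is roughly a weighted sum of the usual (double) Q-learning loss, an n-step return loss, a supervised large-margin term that pushes the demonstrated action above the alternatives, and L2 regularisation; treat the form below as a hedged reconstruction rather than a quote from the paper:
$J(Q) = J_{DQ}(Q) + \lambda_1 J_n(Q) + \lambda_2 J_E(Q) + \lambda_3 J_{L2}(Q), \quad J_E(Q) = \max_{a}\big[Q(s,a) + \ell(a_E, a)\big] - Q(s, a_E)$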
26
Stuart Armstrong and Jan Leike. Towards interactive inverse reinforcement learning. In NIPS Workshop ,
https://jan.leike.name/publications/Towards%20Interactive%20Inverse%20Reinforcement%20Learning%20-%20Armstrong,%20Leike%202016.pdf
Score: 4 | Year: 2016 | Read by: Karol
Pretty relevant and a short read. The problem is a POMDP where the agent starts without a reward function and has to learn it while interacting with the environment (this makes it not entirely like the gridworld side effects). A major assumption is that there exists some finite set of potential reward functions (a downside of the approach). A positive is that the agent learns the reward and uses it at the same time. The agent's actions might influence (bias) the function it learns; they try to solve the environment in such a way that this bias is removed.
27
Zachary C Lipton, Jianfeng Gao, Lihong Li, Jianshu Chen, and Li Deng. Combating reinforcement learning’s Sisyphean curse with intrinsic fear. arXiv preprint arXiv:1611.01211
https://arxiv.org/abs/1611.01211
Score: 4 | Year: 2017 | Read by: Karol
Some states might be rare and catastrophic. We want agents to quickly learn not to visit them (normally they would need to visit them many times before learning to avoid them). The authors propose a function called intrinsic fear, learnt with supervision, and show that it works for this kind of problem.
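As I understand it (a hedged paraphrase, not a quote from the paper), the learned fear classifier F(s'), trained to predict proximity to catastrophe, is subtracted from the Q-learning target with some coefficient:
$y = r + \gamma \max_{a'} Q(s', a') - \lambda\, F(s')$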
28
William Saunders, Girish Sastry, Andreas Stuhlmueller, and Owain Evans. Trial without error: Towards safe reinforcement learning via human intervention. arXiv preprint arXiv:1707.05173, 2017.
https://arxiv.org/abs/1707.05173
Score: 4 | Year: 2017 | Read by: Jessica
A human teaches a supervised learner to oversee the RL agent and prevent it causing catastrophes. Could this apply to side effects? Videos: https://www.youtube.com/playlist?list=PLjs9WCnnR7PCn_Kzs2-1afCsnsBENWqor
29
Alexander Hans, Daniel Schneegaß, Anton Maximilian Schafer, and Steffen Udluft. Safe exploration for reinforcement learning. In European Symposium on Artificial Neural Networks , pages 143–148,
https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2008-36.pdf
Score: 4 | Year: 2008 | Read by: Jessica
Idea of ‘safety function’ to determine safety of action a in state s, plus ‘backup policy’ to lead agent from possibly critical state back to safety. Could be useful approach for us if we define critical state as irreversible state?
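A minimal sketch of how the comment's idea could look in code (hypothetical names, not from the paper): query a learned safety function before executing the greedy action, and fall back to the backup policy when the action looks unsafe.

def safe_act(state, greedy_policy, backup_policy, safety, threshold=0.95):
    # Hypothetical placeholders: `safety` returns an estimated probability
    # that taking `action` in `state` keeps the agent in safe states.
    action = greedy_policy(state)
    if safety(state, action) < threshold:
        return backup_policy(state)  # steer back towards known-safe states
    return action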
30
Mark O Riedl and Brent Harrison. Using stories to teach human values to artificial agents. In AAAI Workshop on AI, Ethics, and Society, 2016.
https://www.cc.gatech.edu/~riedl/pubs/aaai-ethics16.pdf
Score: 4 | Year: 2016 | Read by: Jessica
Agents could reverse engineer values from stories? Trajectory tree of actions extracted from crowdsourced stories of trip to pharmacy - Q-learning agent trained using reward function from trajectory tree, reward for each step that adheres to tree. Similar in a way to the whitelist idea - humans show what is acceptable behaviour.
31
Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch. Alignment for advanced machine learning systems. Technical report, Machine Intelligence Research Institute, 2016.
https://intelligence.org/files/AlignmentMachineLearning.pdf
Score: 4 | Year: 2016 | Read by: Karol
Directly relevant is one small chapter (2.6 Impact Measures). It basically states the problem and roughly describes some proposed solutions (measuring impact). The main proposition is from a paper we already have listed here (Armstrong & Levinstein 2017). I give it 4 because the causality idea might inspire somebody :)
32
Christopher Watkins and Peter Dayan. Q-learning. Machine Learning , 8(3-4):279–292, 1992.
http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf
Score: 3 | Year: 1992 | Read by: Tom
A convergence proof for Q-learning (an off-policy RL algorithm we might want to use). Sutton & Barto is probably a better source for what we need; SARSA (the on-policy counterpart) is also worth looking at.
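For reference, the standard tabular update (textbook form; off-policy because the target maximises over next actions):

from collections import defaultdict

Q = defaultdict(float)  # keyed by (state, action)

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Q-learning target uses the max over next actions (off-policy);
    # SARSA would instead use the action actually taken in s_next.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])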
33
Paul Christiano, Jan Leike, Tom B Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems
http://papers.nips.cc/paper/7017-deep-reinforcement-learning-from-human-preferences.pdf
Score: 3 | Year: 2017 | Read by: Tom
Great paper, but less relevant - even if we ask for human preferences we're probably better off sticking with feature-based RL in initial explorations to remove technical risk
34
Pieter Abbeel, Adam Coates, and Andrew Ng. Autonomous helicopter aerobatics through apprenticeship learning. International Journal of Robotics Research, 29(13):1608–1639,
https://people.eecs.berkeley.edu/~pabbeel/papers/AbbeelCoatesNg_IJRR2010.pdf
Score: 3 | Year: 2010
35
Javier Garcıa and Fernando Fernandez. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research
http://jmlr.org/papers/volume16/garcia15a/garcia15a.pdf
Score: 3 | Year: 2015 | Read by: Karol
Relevant as a potential source of new papers. A survey of safe RL, with further reading on inverse RL and on learning from human feedback and demonstrations.
36
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell. Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems , 2016a.
https://papers.nips.cc/paper/6420-cooperative-inverse-reinforcement-learning
Score: 3 | Year: 2016 | Read by: Karol
Potential source of papers on inverse RL. A new framing of inverse-reinforcement-learning-style problems. The key observation seems to be twofold: the human's policy for instructing an agent should be more than just showing the optimal way, and we want the agent to optimise our reward function but not adopt it as its own.
37
Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, and Stuart Russell. Should robots be obedient? In International Joint Conference on Artificial Intelligence, 2017.
https://www.ijcai.org/proceedings/2017/0662.pdf
Score: 3 | Year: 2017 | Read by: Jessica
Learning human preferences as IRL problem. Robots should not be blindly obedient, rather should infer reward parameters from human orders. Supervision POMDP model.
38
Yarin Gal. Uncertainty in Deep Learning . PhD thesis, University of Cambridge,
http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf
Score: 3 | Year: 2016 | Read by: Karol
Current deep learning models are generally deterministic. They may produce probability distributions, but the model parameters are point estimates or the structure is fixed. This PhD thesis elaborates on how we can add uncertainty to the models, so that we are effectively choosing models from some distribution such that they best explain the data. It is not really relevant to us, but could be a good resource on uncertainty modelling (with Bayes). Since we will most probably be using some uncertainty about the rewards, this work could be useful.
39
Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. Reinforcement learning with corrupted reward signal. In International Joint Conference on Artificial Intelligence
https://www.ijcai.org/proceedings/2017/0656.pdf
Score: 3 | Year: 2017 | Read by: Karol
It is about making an agent robust against exploiting corrupted rewards (erroneously high rewards due to bugs or misspecification). It is indirectly relevant, since in some approaches our solution will have to be uncertain about, or learn, some sort of true reward. The solutions presented also cover the case of wrong interpretation in IRL.
40
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell. The off-switch game. In International Joint Conference on Artificial Intelligence , 2016b.
https://www.ijcai.org/proceedings/2017/0032.pdf
Score: 2 | Year: 2016 | Read by: Tom
Interesting paper on corrigibility in the CIRL framework (c.f. Ryan Carey's paper). Not relevant to side effects in particular, unless we try preference learning. The point is basically that corrigibility can be recovered if the agent is allowed to present candidate actions to the human evaluator
41
Daniel Weld and Oren Etzioni. The first law of robotics (a call to arms). In AAAI , pages 1042–1047, 1994.
https://www.aaai.org/Papers/AAAI/1994/AAAI94-160.pdf
Score: 2 | Year: 1994
42
Marc G Bellemare, Will Dabney, and Remi Munos. A distributional perspective on reinforcement learning. In International Conference on Machine Learning, pages 449–458,
http://proceedings.mlr.press/v70/bellemare17a.html
Score: 2 | Year: 2017 | Read by: Karol
Learning value distributions instead of just point values for states may improve the performance of RL algorithms. Important for RL, but not so much for safety and side effects in gridworlds.
43
Tom Everitt, Jan Leike, and Marcus Hutter. Sequential extensions of causal and evidential decision theory. In Algorithmic Decision Theory, pages 205–221,
http://hutter1.net/publ/seqdts.pdf
Score: 2 | Year: 2015 | Read by: Karol
Some extensions of causal and evidential decision theory that use physicalistic environment models (the agent is part of, not separate from, the environment).
44
Mohammad Ghavamzadeh, Marek Petrik, and Yinlam Chow. Safe policy improvement by minimizing robust baseline regret. In Advances in Neural Information Processing Systems , pages 2298–2306,
http://papers.nips.cc/paper/6294-safe-policy-improvement-by-minimizing-robust-baseline-regret.pdf
Score: 2 | Year: 2016 | Read by: Karol
Using environment models to improve a policy over some baseline. The word 'safe' here means that the new policy performs no worse than the baseline one.
45
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning , pages 1928–1937, 2016.
https://arxiv.org/abs/1602.01783
Score: 2 | Year: 2016 | Read by: Karol
Some improvement on existing RL algorithms: it introduces asynchronous learning. Not really relevant to side effects.
46
Marc G Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research , 47:253–279,
https://arxiv.org/abs/1207.4708
Score: 2 | Year: 2013 | Read by: Karol
An environment for training and evaluating RL agents. Pretty nice for learning RL, but not relevant for us, because we have pycolab and the gridworlds.
47
Ulrich Berger. Brown’s original fictitious play. Journal of Economic Theory , 135:572–578,
https://www.researchgate.net/profile/Ulrich_Berger2/publication/4975577_Brown%27s_original_fictitious_play/links/5a13e9704585158aa3e64e05/Browns-original-fictitious-play.pdf
Score: 2 | Year: 2007 | Read by: Karol
Game theory - fictitious play, doesn't seem relevant to me, but maybe someone better informed on GT might disagree.
48
George W. Brown. Iterative Solution of Games by Fictitious Play . Wiley, 1951.
https://www.math.ucla.edu/~tom/stat596/fictitious.pdf
Score: 2 | Year: 1951 | Read by: Karol
Game theory fictitious play again - description of the algorithm - it's relevant to robustness to adversaries problem, not so much to ours.
49
Stefano Coraluppi and Steven Marcus. Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes. Automatica , 35(2):301–309,
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.5590&rep=rep1&type=pdf
Score: 2 | Year: 1999 | Read by: Karol
The paper is expressed in the language of MDPs and control. In RL terms: we want to train an agent to maximise reward while also caring about the variance of the reward during learning (so 'risk' here refers to the variance of rewards).
50
Tom Everitt, Daniel Filan, Mayank Daswani, and Marcus Hutter. Self-modification of policy and utility function in rational agents. In Artificial General Intelligence, pages 1–11,
http://www.tomeveritt.se/papers/AGI16-sm.pdf
Score: 2 | Year: 2016 | Read by: Jessica
Agents might self modify by changing future policy or utility function. Paper shows that it is possible to create an agent that despite being able to make any self-modification, will refrain from doing so.
51
Peter Auer and Chiang Chao-Kai. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In Conference on Learning Theory
http://proceedings.mlr.press/v49/auer16.pdf
Score: 1 | Year: 2016 | Read by: Karol
Some significant result for bandit problems.
52
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,
https://arxiv.org/abs/1412.6980
Score: 1 | Year: 2014 | Read by: Karol
Introduces Adam, a state-of-the-art algorithm for gradient-based optimisation of stochastic objective functions.
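For reference, the Adam update (standard form, written from memory), with gradient $g_t$, step size $\alpha$ and small constant $\epsilon$:
$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t,\quad v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2,\quad \hat{m}_t = \frac{m_t}{1-\beta_1^t},\quad \hat{v}_t = \frac{v_t}{1-\beta_2^t},\quad \theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$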
53
Stephen Omohundro. The basic AI drives. In Artificial General Intelligence , pages 483–492, 2008.
https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
Score: 1 | Year: 2008 | Read by: Jessica
High level description of ‘drives’ that will appear in any AI system unless explicitly counteracted - essentially paperclip problem.
54
Nader Chmait, David L Dowe, David G Green, and Yuan-Fang Li. Agent coordination and potential risks: Meaningful environments for evaluating multiagent systems. In Evaluating General-Purpose AI, IJCAI Workshop
http://users.dsic.upv.es/~flip/EGPAI2017/Papers/EGPAI_2017_paper_2_NChamait.pdf
Score: 1 | Year: 2017 | Read by: Jessica
Test environments that allow evaluation of multi-agent systems
55
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572
https://arxiv.org/pdf/1412.6572.pdf
Score: 1 | Year: 2015 | Read by: Jessica
Argues that the failure of a range of models (including state-of-the-art NNs) on adversarial examples is due to their linear nature.
56
Mark O Riedl and Brent Harrison. Enter the matrix: A virtual world approach to safely interruptable autonomous systems. arXiv preprint arXiv:1703.10284, 2017.
https://arxiv.org/pdf/1703.10284
Score: 1 | Year: 2017 | Read by: Jessica
Safe interruptibility in RL by using ‘kill switch’ to swap agent to virtual world where it may still receive reward.
57
Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77,
http://epubs.siam.org/doi/abs/10.1137/S0097539701398375
Score: 1 | Year: 2002 | Read by: Karol
Some sort of optimal play in adversarial bandit problem. Cannot access the paper for free.
58
Sebastian Bubeck and Alexander Slivkins. The best of both worlds: stochastic and adversarial bandits. In Conference on Learning Theory
http://proceedings.mlr.press/v23/bubeck12b/bubeck12b.pdf
Score: 1 | Year: 2012 | Read by: Karol
Some algorithm for bandit problems that is optimal for both stochastic and adversarial case
59
Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Computer Aided Verification, pages 3–29, 2017b.
https://arxiv.org/abs/1610.06940
Score: 1 | Year: 2017 | Read by: Karol
State-of-the-art deep neural nets for image classification can be tricked into believing that a slightly changed image is of a different class than the original. The authors present a method that is guaranteed to find such adversarial images within some neighbourhood of a given image. This can be used to make NNs more robust.
60
Laurent Orseau and Stuart Armstrong. Safely interruptible agents. In Uncertainty in Artificial Intelligence , pages 557–566, 2016.
https://intelligence.org/files/Interruptibility.pdf
Score: 1 | Year: 2016 | Read by: Karol
Some result showing that if we make the interruption probabilistic then it's possible to get safe interruptibility in the limit.
61
Laurent Orseau and Mark Ring. Space-time embedded intelligence. In Artificial General Intelligence , pages 209–218, 2012.
http://www.cs.utexas.edu/users/ring/Orseau,%20Ring%3b%20Space-Time%20Embedded%20Intelligence,%20AGI%202012.pdf
Score: 1 | Year: 2012 | Read by: Jessica
Previous theories of AGI assume agent & environment are different entities (essentially dualism) which is problematic. This paper formulates physicalist approach in which agent is fully integrated into environment and can be modified by it.
62
Ion Stoica, Dawn Song, Raluca Ada Popa, David A Patterson, Michael W Mahoney, Randy H Katz, Anthony D Joseph, Michael Jordan, Joseph M Hellerstein, Joseph Gonzalez, et al. A Berkeley view of systems challenges for AI. Technical report, UC Berkeley, 2017.
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-159.pdf
Score: 1 | Year: 2017 | Read by: Karol
A summary of the current state of AI together with propositions on potential research directions. There are lots of safety issues mentioned, but they relate to the very current AI systems that are used in production.
63
Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, and Shane Legg. Noisy networks for exploration. arXiv preprint arXiv:1706.10295
https://arxiv.org/abs/1706.10295
2017
64
Jordi Grau-Moya, Felix Leibfried, Tim Genewein, and Daniel A Braun. Planning with information-processing constraints and model uncertainty in Markov decision processes. In Machine Learning and Knowledge Discovery in Databases
https://link.springer.com/chapter/10.1007/978-3-319-46227-1_30
2016
65
Bill Hibbard. Model-based utility functions. Journal of Artificial General Intelligence , 3(1):1–24,
https://arxiv.org/abs/1111.3934
2012
66
Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. Adversarial attacks on neural network policies. In International Conference on Learning Representations , 2017a.
https://arxiv.org/abs/1702.02284
2017
67
Guy Katz, Clark Barrett, David Dill, Kyle Julian, and Mykel Kochenderfer. Towards proving the adversarial robustness of deep neural networks. arXiv preprint arXiv:1709.02802,
https://arxiv.org/abs/1709.02802v1
2017
68
Volodymyr Mnih, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
Year: 2015 | Read by: Gavin
69
Andrew Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In International Conference on Machine Learning , pages 278–287, 1999.
https://dl.acm.org/citation.cfm?id=645528.657613
1999
70
Laurent Orseau and Mark Ring. Self-modification and mortality in artificial agents. In Artificial General Intelligence , pages 1–10, 2011.
https://link.springer.com/chapter/10.1007%2F978-3-642-22887-2_1
2011
71
Pedro Ortega, Kee-Eung Kim, and Daniel D Lee. Bandits with attitude. In Artificial Intelligence and Statistics, 2015.
http://proceedings.mlr.press/v38/ortega15.pdf
2015
72
Martin Pecka and Tomas Svoboda. Safe exploration techniques for reinforcement learning — an overview. In International Workshop on Modelling and Simulation for Autonomous Systems, pages 357–375, 2014.
https://link.springer.com/chapter/10.1007/978-3-319-13823-7_31
2014
73
Joaquin Quinonero Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil Lawrence. Dataset Shift in Machine Learning . MIT Press, 2009.
http://www.acad.bg/ebook/ml/The.MIT.Press.Dataset.Shift.in.Machine.Learning.Feb.2009.eBook-DDU.pdf
2009
74
Mark Ring and Laurent Orseau. Delusion, survival, and intelligent agents. In Artificial General Intelligence, pages 11–20, 2011.
https://www.researchgate.net/profile/Mark_Ring/publication/221328973_Delusion_Survival_and_Intelligent_Agents/links/0deec52112ede5e943000000/Delusion-Survival-and-Intelligent-Agents.pdf
2011
75
Stuart Russell, Daniel Dewey, and Max Tegmark. Research priorities for robust and beneficial artificial intelligence. AI Magazine, 36(4):105–114, 2015.
https://dspace.mit.edu/openaccess-disseminate/1721.1/108478
2015
76
Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, and Bharat Kaul. RAIL: Risk-averse imitation learning. arXiv preprint arXiv:1707.06658, 2017.
https://arxiv.org/abs/1707.06658
2017
77
Yevgeny Seldin and Alexander Slivkins. One practical algorithm for both stochastic and adversarial bandits. In International Conference on Machine Learning, 2014.
http://proceedings.mlr.press/v32/seldinb14-supp.pdf
2014
78
Sanjit A Seshia, Dorsa Sadigh, and S Shankar Sastry. Towards verified artificial intelligence. arXiv preprint arXiv:1606.08514 , 2016.
https://arxiv.org/abs/1606.08514
2016
79
Nate Soares and Benja Fallenstein. Aligning superintelligence with human interests: A technical research agenda. Technical report, Machine Intelligence Research Institute, 2014.
https://pdfs.semanticscholar.org/d803/3a314493c8df3791912272ac4b58d3a7b8c2.pdf
2014
80
Nate Soares, Benja Fallenstein, Stuart Armstrong, and Eliezer Yudkowsky. Corrigibility. In AAAI Workshop on AI, Ethics, and Society, 2015.
https://intelligence.org/wp-content/uploads/2015/01/AAAI-15-corrigibility-slides.pdf
2015
81
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 , 2013.
https://arxiv.org/abs/1312.6199
2013
82
Jessica Taylor. Quantilizers: A safer alternative to maximizers for limited optimization. In AAAI Workshop on AI, Ethics, and Society , pages 1–9, 2016.
https://intelligence.org/2015/11/29/new-paper-quantilizers/
2016
83
Philip Thomas, Georgios Theocharous, and Mohammad Ghavamzadeh. High confidence policy improvement. In International Conference on Machine Learning , pages 2380–2388, 2015.
http://psthomas.com/papers/Thomas2015b.pdf
2015
84
Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude.
https://www.coursera.org/learn/neural-networks/lecture/YQHki/rmsprop-divide-the-gradient-by-a-running-average-of-its-recent-magnitude
2014
85
Matteo Turchetta, Felix Berkenkamp, and Andreas Krause. Safe exploration in finite Markov decision processes with Gaussian processes. In Advances in Neural Information Processing Systems, pages 4312–4320, 2016.
https://arxiv.org/abs/1606.04753
2016
86
Bart van den Broek, Wim Wiegerinck, and Hilbert Kappen. Risk sensitive path integral control. arXiv preprint arXiv:1203.3523 , 2012.
https://arxiv.org/abs/1203.3523?context=math
2012
87
Peter Whittle. Optimal Control: Basics and Beyond. John Wiley & Sons, 1996.
https://dl.acm.org/citation.cfm?id=524847
1996
88
Ronald Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning , 8(3-4):229–256, 1992.
https://link.springer.com/content/pdf/10.1007%2FBF00992696.pdf
1992
89
Shen Yun, Wilhelm Stannat, and Klaus Obermayer. A unified framework for risk-sensitive Markov control processes. In Conference on Decision and Control, 2014.
http://ieeexplore.ieee.org/document/7039524/
2014
90
Kemin Zhou and John C Doyle. Essentials of Robust Control . Pearson, 1997.
http://www.ece.lsu.edu/kemin/essentials.htm
1997