# | Paper | When | Where | Who | Notes | Presentations
---|---|---|---|---|---|---
2 | Neural Episodic Control | 19:00 08.06.2017 | ШАД, Гарвард | Никишин | One of the recent quiet breakthroughs towards intelligent agents with human-like episodic memory. | https://yadi.sk/i/wXGVGqp-3JxnUo
3 | Curiosity-driven Exploration by Self-supervised Prediction | 19:00 08.06.2017 | ШАД, Гарвард | Темирчев | Scalable intrinsic-reward-based exploration for environments with sparse extrinsic reward; ICML 2017. | http://slides.com/cydoroga/rl_lecture1/fullscreen
4 | Generative Adversarial Imitation Learning | 19:00 15.06.2017 | ШАД, Гарвард | Шарчилев | Shows how to use adversarial rewards when ground-truth rewards are not available; gives interesting theoretical insights into how RL on an IRL-learned reward relates to the GAN framework. | https://yadi.sk/d/hjO3PiqE3K8hdW
5 | Model-based Adversarial Imitation Learning | 19:00 15.06.2017 | ШАД, Гарвард | Шарчилев | | 
6 | Afterburning of dialogue models | 19:00 15.06.2017 | ШАД, Гарвард | Панин | | https://docs.google.com/document/d/1TPcOguGQHpIEsh07pK_PaKozjg-dbm8zMVZL4iM518Y
7 | Thesis defense | 19:00 22.06.2017 | ШАД, Гарвард | Персиянов | | https://yadi.sk/i/gxD6gONM3KN7ij
8 | Thesis defense | 19:00 22.06.2017 | ШАД, Гарвард | Яронская | | https://yadi.sk/i/D_MOtJwj3KN7jc
9 | Deal or No Deal? End-to-End Learning for Negotiation Dialogues | 19:00 22.06.2017 | ШАД, Гарвард | Хальман | Introduces the 'negotiation using natural language' task, presents a dataset for it, and trains an end-to-end model. | https://yadi.sk/i/ueRSETzq3KiEdC
10 | Deep reinforcement learning from human preferences | 19:00 13.07.2017 | ШАД, Гарвард | Ческидова | Recent OpenAI work with remarkable results, aimed at removing the need for humans to hand-write complex goal functions. | https://docs.google.com/a/phystech.edu/presentation/d/1gFfxM3tGgfQg9Fnpel4joCQxoCBVMNUh_Hs_QBdc5HQ/edit?usp=sharing
11 | A simple neural network module for relational reasoning | 20:00 13.07.2017 | ШАД, Гарвард | Персиянов | A smart agent should understand how different objects in the world relate to each other; this is pioneering work in the field of relational reasoning. | https://docs.google.com/presentation/d/1IsmdgizUrTdMhjPqFPlglKfGDx4b67TUEJNJ3bHLi-Q/edit?usp=sharing
12 | Teacher-Student Curriculum Learning | 19:00 20.07.2017 | ШАД, Гарвард | Гришин | OpenAI interns presented a framework for automatic curriculum learning that can be used for supervised and reinforcement learning tasks. | https://yadi.sk/i/uZ-slTBN3LEVW5
13 | Automatic Goal Generation for Reinforcement Learning Agents | | | | Researchers from Berkeley propose a method that allows an agent to automatically discover (i.e. generate) a range of tasks that are always at an appropriate level of difficulty for it. | 
14 | Hindsight Experience Replay | 20:00 20.07.2017 | ШАД, Гарвард | Голиков | How to give an agent the ability to learn even from sparse rewards; a form of implicit curriculum learning (a minimal relabelling sketch is given below the table). | 
15 | | 20:30 20.07.2017 | ШАД, Гарвард | Овчаренко | TF Op for the Arcade Learning Environment. | https://github.com/dudevil/tf-ale-op
16 | Proximal Policy Optimization Algorithms | 19:00 27.07.2017 | ШАД, Гарвард | Гришин | The mathematics behind a family of policy-gradient methods for reinforcement learning. | My stuff; Lecturer's slides
18 | GAN in RL | 20:00 27.07.2017 | ШАД, Гарвард | Панин | | https://yadi.sk/i/Kuzr5BYL3LUEyh
20 | DARLA: Improving Zero-Shot Transfer in Reinforcement Learning | 19:30 10.08.2017 | ШАД, Гарвард | Гришин | One of the first comprehensive empirical demonstrations of the strength of disentangled representations for domain adaptation in a deep RL setting. | https://yadi.sk/i/Dl82VVdv3Lt77s
21 | Learning Time-Efficient Deep Architectures with Budgeted Super Networks | 20:30 10.08.2017 | ШАД, Гарвард | Васильев | An interesting branch of research devoted to applying RL methods to neural network architecture search; to be compared with https://openreview.net/forum?id=r1Ue8Hcxg. | https://docs.google.com/presentation/d/1E0KEfu4sjTnGCAKHQbQBcqRSnt5dypK2fngH-je_7MY/edit
22 | Hybrid Reward Architecture for Reinforcement Learning | 19:00 17.08.2017 | ШАД, Гарвард | Ерофеев | A recent, very successful application of RL to one of the more challenging Atari games, Ms. Pac-Man. The approach borrows elements from hierarchical RL, which is itself a very interesting research direction. | https://docs.google.com/presentation/d/1x5bF2WSLpkbZRwvIP1U4_vFN1xZ8HjvKEWnFUgUVJy8/edit
23 | Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning | 19:30 17.08.2017 | ШАД, Гарвард | Печенко | Combining model-based and model-free updates, with applications to real robotic arms. | 
24 | Imagination-augmented Agents in Deep Reinforcement Learning | 19:00 24.08.2017 | ШАД, Гарвард | Конобеев | I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways. | https://drive.google.com/file/d/0B2s6_uWHkA0xZlhtczQ0TUpLUWs/view?usp=sharing
25 | Interaction Networks for Learning about Objects, Relations and Physics | 19:00 14.09.2017 | ШАД, Кембридж | Дулат | A graph-based framework for dynamical systems; it simulates the physical trajectories of n-body, bouncing-ball, and non-rigid string systems accurately over thousands of time steps after training only on single-step predictions. | https://drive.google.com/file/d/0B1hsJjdjd5a_TlBuaDA2UEVadjg/view?usp=sharing
26 | Programmable Agents | 19:00 14.09.2017 | ШАД, Кембридж | Дулат | Deep RL agents that execute declarative programs and can generalize to a wide variety of zero-shot semantic tasks. | https://drive.google.com/file/d/0B1hsJjdjd5a_TlBuaDA2UEVadjg/view?usp=sharing
27 | A Deep Reinforcement Learning Chatbot | 19:00 28.09.2017 | ШАД, Кембридж | Васильев | MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. | 
28 | Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning | 20:00 28.09.2017 | ШАД, Кембридж | Яронская | Applies hierarchical RL to multi-domain dialogue management. | 
29 | Stochastic Computation Graphs | 20:00 05.10.2017 | ШАД | Соболев | Review of methods for backpropagation through stochastic neural networks. | slides.com/asobolev/stochastic-computation-graphs/fullscreen
30 | Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks | 19:00 12.10.2017 | ШАД, Кембридж | Вахрамеева | | https://yadi.sk/i/NCrM4cEb3NhQux
31 | Approximate Linear Programming for Logistic Markov Decision Processes (Research at Google) | 19:00 12.10.2017 | ШАД, Кембридж | Темирчев | | http://slides.com/cydoroga/rl_lecture2/fullscreen
32 | A Distributional Perspective on Reinforcement Learning | 19:00 19.10.2017 | ШАД, Кембридж | Гринчук | The authors study the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach, which models only the expectation of this return, i.e. the value. | https://drive.google.com/file/d/0B1kx0sWGGxicSk5SR1R2WWZYWnM/view?usp=sharing
33 | Learning to Optimize using Reinforcement Learning | 19:00 09.11.2017 | ШАД, Кембридж | Януш | | https://drive.google.com/file/d/15JAF_POatluY3USQKaCuqRoSkJA8BgVv/view?usp=drivesdk
34 | Multi-step Reinforcement Learning: A Unifying Algorithm | 19:00 16.11.2017 | ШАД, Кембридж | Бобырёв | | https://github.com/omtcvxyz/talks/blob/master/n-step-q-sigma/slides/Slides.pdf
35 | Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games | 19:00 23.11.2017 | ШАД, Кембридж | Островский | | https://drive.google.com/open?id=15fBsS361SFt4YPNkzaTTtatBigNi4iIG
36 | Boosted Fitted Q-Iteration (B-FQI) | 19:00 30.11.2017 | ШАД, Кембридж | Рыжиков | | https://drive.google.com/file/d/1AO1j8CatxUZRdCkWPVn-1DT_78n_FtZk/view?usp=sharing
37 | AlphaGo Zero | 19:00 30.11.2017 | ШАД, Кембридж | Гринчук | AlphaGo Zero, the strongest Go-playing program to date, trained entirely through self-play and reinforcement learning; no human knowledge is involved. | https://drive.google.com/open?id=1L45dPTIDQloNTMkyeUAcjzLh6TA-GfU_
38 | Equivalence Between Policy-Gradients and Soft Q-learning | 19:00 14.12.2017 | ШАД, Кембридж | Конобеев | | no presentation
39 | Equivalence Between Policy-Gradients and Soft Q-learning | 19:00 08.02.2018 | ШАД, Стенфорд | Конобеев | | 
40 | Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks; Some Considerations on Learning to Explore via Meta-Reinforcement Learning | 19:00 15.02.2018 | ШАД, Стенфорд | Голиков | | https://www.overleaf.com/read/rqfkchthwkyp
41 | Rainbow: Combining Improvements in Deep Reinforcement Learning | 19:00 22.02.2018 | ШАД, Стенфорд | Фрицлер | | https://docs.google.com/presentation/d/1mTVYQbtzxrESuJi_KvXsFNEIgm9fYZpCnCyMWcPLQqs/edit?usp=sharing
42 | Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | 19:00 21.03.2018 | ШАД, Стенфорд | Овчаренко | | 
43 | Model-Ensemble Trust-Region Policy Optimization | 19:00 29.03.2018 | ШАД, Стенфорд | Темирчев | | 
44 | Soft Actor-Critic | 19:00 05.04.2018 | ШАД, Стенфорд | Гринчук | The authors consider a more general reinforcement learning problem defined by a maximum-entropy objective (written out below the table). Based on theoretical results, they introduce the novel off-policy Soft Actor-Critic algorithm, which outperforms the current state of the art on continuous control benchmarks. | https://drive.google.com/file/d/1xvay9iCsUiwabVt9ibWEXpcuiTtdt9vs/view?usp=sharing
45 | DORA The Explorer: Directed Outreaching Reinforcement Action-Selection | 19:00 12.04.2018 | ШАД, Стенфорд | Вахрамеева | | 
46 | Generative Multi-Agent Behavioral Cloning | 19:00 19.04.2018 | ШАД, Стенфорд | Савельева | | 
47 | Action-dependent Control Variates for Policy Optimization via Stein Identity | 19:00 26.04.2018 | ШАД, Стенфорд | Рыжиков | | 
48 | StarCraft Micromanagement with RL and Curriculum Transfer Learning | 19:00 03.05.2018 | ШАД, Стенфорд | Гайнцева | | 
49 | Learning to run challenge solutions | 19:00 17.05.2018 | ШАД, Стенфорд | Печенко | | 
50 | | | | Колесников | | 
51 | | | | Павлов | | 
52 | DRL in a Handful of Trials using Probabilistic Dynamics Models | 19:00 27.09.2018 | ШАД, Стенфорд | Темирчев | | 
53 | Latent Space Policies for Hierarchical RL | 19:00 04.05.2018 | ШАД, Стенфорд | Гринчук | | https://drive.google.com/open?id=1FgY3Yk8wofIcZoLftYnwLOVrZOMy-hye
54 | Soft Actor-Critic | | | | | 
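
Row 14 above notes that Hindsight Experience Replay lets an agent learn from sparse rewards through an implicit curriculum. The sketch below illustrates only the core relabelling trick, under simplifying assumptions (a goal-conditioned task, a binary sparse reward, and the "final" relabelling strategy); the class and function names and the transition layout are hypothetical illustrations, not code from the paper.

```python
# Minimal sketch of HER-style goal relabelling (illustrative, not the authors' code).
import random
from collections import deque

def sparse_reward(achieved_goal, goal):
    # Hypothetical binary sparse reward: 0 when the goal is achieved, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0

class HindsightReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, goal):
        """episode is a list of (state, action, next_state, achieved_goal) tuples."""
        final_achieved = episode[-1][3]  # goal state actually reached at the end
        for state, action, next_state, achieved in episode:
            # 1) Original transition with the original (possibly never reached) goal.
            self.buffer.append(
                (state, goal, action, sparse_reward(achieved, goal), next_state))
            # 2) Hindsight transition: pretend the finally achieved state was the goal,
            #    so the episode becomes a success for that substituted goal.
            self.buffer.append(
                (state, final_achieved, action,
                 sparse_reward(achieved, final_achieved), next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

Under the relabelled goal the episode ends in success, so the otherwise uninformative sparse reward gives a learning signal that value bootstrapping can propagate backwards; this is what the note in row 14 calls an implicit curriculum.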
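
Row 44 refers to the maximum-entropy objective behind Soft Actor-Critic. For reference, a compact statement of that objective in its standard form from the maximum-entropy RL literature (not copied from the linked slides) is:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
    \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right]
```

Setting the temperature \alpha to zero recovers the usual expected-return objective; a larger \alpha rewards more stochastic, exploratory policies.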