| Paper | When | Where | Who | Notes | Presentations |
|---|---|---|---|---|---|
| Neural Episodic Control | 19:00 08.06.2017 | YSDA, Harvard | Nikishin | One of the recent quiet breakthroughs toward intelligent agents with human-like memory. | https://yadi.sk/i/wXGVGqp-3JxnUo |
| Curiosity-driven Exploration by Self-supervised Prediction | 19:00 08.06.2017 | YSDA, Harvard | Temirchev | Scalable exploration driven by intrinsic reward when the extrinsic reward is sparse (ICML 2017). | http://slides.com/cydoroga/rl_lecture1/fullscreen |
| Generative Adversarial Imitation Learning | 19:00 15.06.2017 | YSDA, Harvard | Sharchilev | Shows how to use adversarial rewards when ground-truth rewards are not available; offers interesting theoretical insight into how RL on an IRL-learned reward relates to this GAN framework. | https://yadi.sk/d/hjO3PiqE3K8hdW |
| Model-based Adversarial Imitation Learning | 19:00 15.06.2017 | YSDA, Harvard | Sharchilev | | |
| Afterburning of dialogue models | 19:00 15.06.2017 | YSDA, Harvard | Panin | | https://docs.google.com/document/d/1TPcOguGQHpIEsh07pK_PaKozjg-dbm8zMVZL4iM518Y |
| Thesis defense | 19:00 22.06.2017 | YSDA, Harvard | Persiyanov | | https://yadi.sk/i/gxD6gONM3KN7ij |
| Thesis defense | 19:00 22.06.2017 | YSDA, Harvard | Yaronskaya | | https://yadi.sk/i/D_MOtJwj3KN7jc |
| Deal or No Deal? End-to-End Learning for Negotiation Dialogues | 19:00 22.06.2017 | YSDA, Harvard | Khalman | The paper introduces the 'negotiation using natural language' task, introduces a dataset for it, and trains an end-to-end model on it. | https://yadi.sk/i/ueRSETzq3KiEdC |
| Deep reinforcement learning from human preferences | 19:00 13.07.2017 | YSDA, Harvard | Cheskidova | Recent OpenAI work with remarkable results, aimed at removing the need for humans to write complex goal functions. | https://docs.google.com/a/phystech.edu/presentation/d/1gFfxM3tGgfQg9Fnpel4joCQxoCBVMNUh_Hs_QBdc5HQ/edit?usp=sharing |
| A simple neural network module for relational reasoning | 20:00 13.07.2017 | YSDA, Harvard | Persiyanov | A smart agent should understand how different objects in the world relate to each other. This is pioneering work in the field of relational reasoning. | https://docs.google.com/presentation/d/1IsmdgizUrTdMhjPqFPlglKfGDx4b67TUEJNJ3bHLi-Q/edit?usp=sharing |
| Teacher-Student Curriculum Learning | 19:00 20.07.2017 | YSDA, Harvard | Grishin | OpenAI interns present a framework for automatic curriculum learning that can be used for both supervised and reinforcement learning tasks. | https://yadi.sk/i/uZ-slTBN3LEVW5 |
| Automatic Goal Generation for Reinforcement Learning Agents | | | | Researchers from Berkeley propose a method that lets an agent automatically discover (i.e., generate) a range of tasks that are always at the appropriate level of difficulty for the agent. | |
| Hindsight Experience Replay | 20:00 20.07.2017 | YSDA, Harvard | Golikov | How to give an agent the ability to learn even from sparse rewards; a form of implicit curriculum learning. | |
| | 20:30 20.07.2017 | YSDA, Harvard | Ovcharenko | TF Op for the Arcade Learning Environment | https://github.com/dudevil/tf-ale-op |
| Proximal Policy Optimization Algorithms | 19:00 27.07.2017 | YSDA, Harvard | Grishin | The math behind a family of policy gradient methods for reinforcement learning. | My stuff |
| | | | | | Lecturer's slides |
| GAN in RL | 20:00 27.07.2017 | YSDA, Harvard | Panin | | https://yadi.sk/i/Kuzr5BYL3LUEyh |
| DARLA: Improving Zero-Shot Transfer in Reinforcement Learning | 19:30 10.08.2017 | YSDA, Harvard | Grishin | One of the first comprehensive empirical demonstrations of the strength of disentangled representations for domain adaptation in a deep RL setting. | https://yadi.sk/i/Dl82VVdv3Lt77s |
| Learning Time-Efficient Deep Architectures with Budgeted Super Networks | 20:30 10.08.2017 | YSDA, Harvard | Vasilyev | Represents an interesting line of research that applies RL methods to neural network architecture search. Compare with https://openreview.net/forum?id=r1Ue8Hcxg. | https://docs.google.com/presentation/d/1E0KEfu4sjTnGCAKHQbQBcqRSnt5dypK2fngH-je_7MY/edit |
| Hybrid Reward Architecture for Reinforcement Learning | 19:00 17.08.2017 | YSDA, Harvard | Erofeev | A recent, very successful application of RL to a very complex game, Ms. Pac-Man. The approach borrows elements from hierarchical RL, which is itself a very interesting research direction. | https://docs.google.com/presentation/d/1x5bF2WSLpkbZRwvIP1U4_vFN1xZ8HjvKEWnFUgUVJy8/edit |
| Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning | 19:30 17.08.2017 | YSDA, Harvard | Pechenko | Combines model-based and model-free updates for real robotic-arm applications. | |
| Imagination-augmented Agents in Deep Reinforcement Learning | 19:00 24.08.2017 | YSDA, Harvard | Konobeev | I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways. | https://drive.google.com/file/d/0B2s6_uWHkA0xZlhtczQ0TUpLUWs/view?usp=sharing |
| Interaction Networks for Learning about Objects, Relations and Physics | 19:00 14.09.2017 | YSDA, Cambridge | Dulat | A graph-based framework for dynamical systems. After training only on single-step predictions, it accurately simulated the physical trajectories of n-body, bouncing-ball, and non-rigid string systems over thousands of time steps. | https://drive.google.com/file/d/0B1hsJjdjd5a_TlBuaDA2UEVadjg/view?usp=sharing |
| Programmable Agents | 19:00 14.09.2017 | YSDA, Cambridge | Dulat | Deep RL agents that execute declarative programs and can generalize to a wide variety of zero-shot semantic tasks. | https://drive.google.com/file/d/0B1hsJjdjd5a_TlBuaDA2UEVadjg/view?usp=sharing |
| A Deep Reinforcement Learning Chatbot | 19:00 28.09.2017 | YSDA, Cambridge | Vasilyev | MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. | |
| Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning | 20:00 28.09.2017 | YSDA, Cambridge | Yaronskaya | Applies hierarchical RL to multi-domain dialogue management. | |
| Stochastic Computation Graphs | 20:00 05.10.2017 | YSDA | Sobolev | A review of methods for backpropagation in stochastic neural networks. | slides.com/asobolev/stochastic-computation-graphs/fullscreen |
| Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks | 19:00 12.10.2017 | YSDA, Cambridge | Vakhrameeva | | https://yadi.sk/i/NCrM4cEb3NhQux |
| Approximate Linear Programming for Logistic Markov Decision Processes | 19:00 12.10.2017 | YSDA, Cambridge | Temirchev | Research at Google. | http://slides.com/cydoroga/rl_lecture2/fullscreen |
| A Distributional Perspective on Reinforcement Learning | 19:00 19.10.2017 | YSDA, Cambridge | Grinchuk | The authors study the value distribution, which is the distribution of the random return received by a reinforcement learning agent. This contrasts with the common approach to reinforcement learning, which models only the expectation of this return (the value). | https://drive.google.com/file/d/0B1kx0sWGGxicSk5SR1R2WWZYWnM/view?usp=sharing |
| Learning to Optimize using Reinforcement Learning | 19:00 09.11.2017 | YSDA, Cambridge | Yanush | | https://drive.google.com/file/d/15JAF_POatluY3USQKaCuqRoSkJA8BgVv/view?usp=drivesdk |
| Multi-step Reinforcement Learning: A Unifying Algorithm | 19:00 16.11.2017 | YSDA, Cambridge | Bobyrev | | https://github.com/omtcvxyz/talks/blob/master/n-step-q-sigma/slides/Slides.pdf |
| Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games | 19:00 23.11.2017 | YSDA, Cambridge | Ostrovsky | | https://drive.google.com/open?id=15fBsS361SFt4YPNkzaTTtatBigNi4iIG |
| Boosted Fitted Q-Iteration (B-FQI) | 19:00 30.11.2017 | YSDA, Cambridge | Ryzhikov | | https://drive.google.com/file/d/1AO1j8CatxUZRdCkWPVn-1DT_78n_FtZk/view?usp=sharing |
| AlphaGo Zero | 19:00 30.11.2017 | YSDA, Cambridge | Grinchuk | AlphaGo Zero, the strongest Go-playing program to date, is trained entirely through self-play reinforcement learning, with no human knowledge beyond the game rules. | https://drive.google.com/open?id=1L45dPTIDQloNTMkyeUAcjzLh6TA-GfU_ |
| Equivalence Between Policy-Gradients and Soft Q-learning | 19:00 14.12.2017 | YSDA, Cambridge | Konobeev | No presentation | |
| Equivalence Between Policy-Gradients and Soft Q-learning | 19:00 08.02.2018 | YSDA, Stanford | Konobeev | | |
| Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks; Some Considerations on Learning to Explore via Meta-Reinforcement Learning | 19:00 15.02.2018 | YSDA, Stanford | Golikov | | https://www.overleaf.com/read/rqfkchthwkyp |
| Rainbow: Combining Improvements in Deep Reinforcement Learning | 19:00 22.02.2018 | YSDA, Stanford | Fritsler | | https://docs.google.com/presentation/d/1mTVYQbtzxrESuJi_KvXsFNEIgm9fYZpCnCyMWcPLQqs/edit?usp=sharing |
| Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | 19:00 21.03.2018 | YSDA, Stanford | Ovcharenko | | |
| Model-Ensemble Trust-Region Policy Optimization | 19:00 29.03.2018 | YSDA, Stanford | Temirchev | | |
| Soft Actor-Critic | 19:00 05.04.2018 | YSDA, Stanford | Grinchuk | The authors consider a more general reinforcement learning problem defined by a maximum-entropy objective. Based on theoretical results, they introduce a novel off-policy Soft Actor-Critic algorithm that outperformed the state of the art at the time on continuous control benchmarks. | https://drive.google.com/file/d/1xvay9iCsUiwabVt9ibWEXpcuiTtdt9vs/view?usp=sharing |
| DORA The Explorer: Directed Outreaching Reinforcement Action-Selection | 19:00 12.04.2018 | YSDA, Stanford | Vakhrameeva | | |
| Generative Multi-Agent Behavioral Cloning | 19:00 19.04.2018 | YSDA, Stanford | Savelyeva | | |
| Action-dependent Control Variates for Policy Optimization via Stein Identity | 19:00 26.04.2018 | YSDA, Stanford | Ryzhikov | | |
| StarCraft Micromanagement with RL and Curriculum Transfer Learning | 19:00 03.05.2018 | YSDA, Stanford | Gaintseva | | |
| Learning to Run challenge solutions | 19:00 17.05.2018 | YSDA, Stanford | Pechenko | | |
| | | | Kolesnikov | | |
| | | | Pavlov | | |
| DRL in a Handful of Trials using Probabilistic Dynamics Models | 19:00 27.09.2018 | YSDA, Stanford | Temirchev | | |
| Latent Space Policies for Hierarchical RL | 19:00 04.05.2018 | YSDA, Stanford | Grinchuk | | https://drive.google.com/open?id=1FgY3Yk8wofIcZoLftYnwLOVrZOMy-hye |
| Soft Actor-Critic | | | | | |