# | Paper | When | Where | Who | Notes | Presentations
---|---|---|---|---|---|---
2 | Neural Episodic Control | 19:00 08.06.2017 | ШАД, Гарвард | Никишин | One of the recent quiet breakthroughs towards intelligent agents with human-like episodic memory. | https://yadi.sk/i/wXGVGqp-3JxnUo
3 | Curiosity-driven Exploration by Self-supervised Prediction | 19:00 08.06.2017 | ШАД, Гарвард | Темирчев | Scalable intrinsic-reward-based exploration for environments with sparse extrinsic reward; ICML 2017. | http://slides.com/cydoroga/rl_lecture1/fullscreen
4 | Generative Adversarial Imitation Learning | 19:00 15.06.2017 | ШАД, Гарвард | Шарчилев | Shows how to use adversarial rewards when ground-truth rewards are not available; gives interesting theoretical insights into how RL on an IRL-learned reward relates to the GAN framework. | https://yadi.sk/d/hjO3PiqE3K8hdW
5 | Model-based Adversarial Imitation Learning | 19:00 15.06.2017 | ШАД, Гарвард | Шарчилев | | 
6 | Afterburning of dialogue models | 19:00 15.06.2017 | ШАД, Гарвард | Панин | | https://docs.google.com/document/d/1TPcOguGQHpIEsh07pK_PaKozjg-dbm8zMVZL4iM518Y
7 | Thesis defense | 19:00 22.06.2017 | ШАД, Гарвард | Персиянов | | https://yadi.sk/i/gxD6gONM3KN7ij
8 | Thesis defense | 19:00 22.06.2017 | ШАД, Гарвард | Яронская | | https://yadi.sk/i/D_MOtJwj3KN7jc
9 | Deal or No Deal? End-to-End Learning for Negotiation Dialogues | 19:00 22.06.2017 | ШАД, Гарвард | Хальман | Introduces the 'negotiation using natural language' task, presents a dataset for it, and trains an end-to-end model. | https://yadi.sk/i/ueRSETzq3KiEdC
10 | Deep reinforcement learning from human preferences | 19:00 13.07.2017 | ШАД, Гарвард | Ческидова | Recent OpenAI work with remarkable results, aimed at removing the need for humans to hand-write complex goal functions. | https://docs.google.com/a/phystech.edu/presentation/d/1gFfxM3tGgfQg9Fnpel4joCQxoCBVMNUh_Hs_QBdc5HQ/edit?usp=sharing
11 | A simple neural network module for relational reasoning | 20:00 13.07.2017 | ШАД, Гарвард | Персиянов | A smart agent should understand how different objects in the world relate to each other; this is pioneering work in the field of relational reasoning. | https://docs.google.com/presentation/d/1IsmdgizUrTdMhjPqFPlglKfGDx4b67TUEJNJ3bHLi-Q/edit?usp=sharing
12 | Teacher-Student Curriculum Learning | 19:00 20.07.2017 | ШАД, Гарвард | Гришин | OpenAI interns presented a framework for automatic curriculum learning that can be used for supervised and reinforcement learning tasks. | https://yadi.sk/i/uZ-slTBN3LEVW5
13 | Automatic Goal Generation for Reinforcement Learning Agents | | | | Researchers from Berkeley propose a method that allows an agent to automatically discover (i.e. generate) a range of tasks that are always at an appropriate level of difficulty for it. | 
14 | Hindsight Experience Replay | 20:00 20.07.2017 | ШАД, Гарвард | Голиков | How to give an agent the ability to learn even from sparse rewards; a form of implicit curriculum learning (a minimal relabelling sketch is given below the table). | 
15 | | 20:30 20.07.2017 | ШАД, Гарвард | Овчаренко | TF Op for the Arcade Learning Environment. | https://github.com/dudevil/tf-ale-op
16 | Proximal Policy Optimization Algorithms | 19:00 27.07.2017 | ШАД, Гарвард | Гришин | The mathematics behind a family of policy-gradient methods for reinforcement learning. | My stuff; Lecturer's slides
18 | GAN in RL | 20:00 27.07.2017 | ШАД, Гарвард | Панин | | https://yadi.sk/i/Kuzr5BYL3LUEyh
20 | DARLA: Improving Zero-Shot Transfer in Reinforcement Learning | 19:30 10.08.2017 | ШАД, Гарвард | Гришин | One of the first comprehensive empirical demonstrations of the strength of disentangled representations for domain adaptation in a deep RL setting. | https://yadi.sk/i/Dl82VVdv3Lt77s
21 | Learning Time-Efficient Deep Architectures with Budgeted Super Networks | 20:30 10.08.2017 | ШАД, Гарвард | Васильев | An interesting branch of research devoted to applying RL methods to neural network architecture search; to be compared with https://openreview.net/forum?id=r1Ue8Hcxg. | https://docs.google.com/presentation/d/1E0KEfu4sjTnGCAKHQbQBcqRSnt5dypK2fngH-je_7MY/edit
22 | Hybrid Reward Architecture for Reinforcement Learning | 19:00 17.08.2017 | ШАД, Гарвард | Ерофеев | A recent, very successful application of RL to one of the more challenging Atari games, Ms. Pac-Man. The approach borrows elements from hierarchical RL, which is itself a very interesting research direction. | https://docs.google.com/presentation/d/1x5bF2WSLpkbZRwvIP1U4_vFN1xZ8HjvKEWnFUgUVJy8/edit
23 | Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning | 19:30 17.08.2017 | ШАД, Гарвард | Печенко | Combining model-based and model-free updates, with applications to real robotic arms. | 
24 | Imagination-augmented Agents in Deep Reinforcement Learning | 19:00 24.08.2017 | ШАД, Гарвард | Конобеев | I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways. | https://drive.google.com/file/d/0B2s6_uWHkA0xZlhtczQ0TUpLUWs/view?usp=sharing
25 | Interaction Networks for Learning about Objects, Relations and Physics | 19:00 14.09.2017 | ШАД, Кембридж | Дулат | A graph-based framework for dynamical systems; it simulates the physical trajectories of n-body, bouncing-ball, and non-rigid string systems accurately over thousands of time steps after training only on single-step predictions. | https://drive.google.com/file/d/0B1hsJjdjd5a_TlBuaDA2UEVadjg/view?usp=sharing
26 | Programmable Agents | 19:00 14.09.2017 | ШАД, Кембридж | Дулат | Deep RL agents that execute declarative programs and can generalize to a wide variety of zero-shot semantic tasks. | https://drive.google.com/file/d/0B1hsJjdjd5a_TlBuaDA2UEVadjg/view?usp=sharing
27 | A Deep Reinforcement Learning Chatbot | 19:00 28.09.2017 | ШАД, Кембридж | Васильев | MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. | 
28 | Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning | 20:00 28.09.2017 | ШАД, Кембридж | Яронская | Applies hierarchical RL to multi-domain dialogue management. | 
29 | Stochastic Computation Graphs | 20:00 05.10.2017 | ШАД | Соболев | Review of methods for backpropagation through stochastic neural networks. | slides.com/asobolev/stochastic-computation-graphs/fullscreen
30 | Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks | 19:00 12.10.2017 | ШАД, Кембридж | Вахрамеева | | https://yadi.sk/i/NCrM4cEb3NhQux
31 | Approximate Linear Programming for Logistic Markov Decision Processes (Research at Google) | 19:00 12.10.2017 | ШАД, Кембридж | Темирчев | | http://slides.com/cydoroga/rl_lecture2/fullscreen
32 | A Distributional Perspective on Reinforcement Learning | 19:00 19.10.2017 | ШАД, Кембридж | Гринчук | The authors study the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach, which models only the expectation of this return, i.e. the value. | https://drive.google.com/file/d/0B1kx0sWGGxicSk5SR1R2WWZYWnM/view?usp=sharing
33 | Learning to Optimize using Reinforcement Learning | 19:00 09.11.2017 | ШАД, Кембридж | Януш | | https://drive.google.com/file/d/15JAF_POatluY3USQKaCuqRoSkJA8BgVv/view?usp=drivesdk
34 | Multi-step Reinforcement Learning: A Unifying Algorithm | 19:00 16.11.2017 | ШАД, Кембридж | Бобырёв | | https://github.com/omtcvxyz/talks/blob/master/n-step-q-sigma/slides/Slides.pdf
35 | Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games | 19:00 23.11.2017 | ШАД, Кембридж | Островский | | https://drive.google.com/open?id=15fBsS361SFt4YPNkzaTTtatBigNi4iIG
36 | Boosted Fitted Q-Iteration (B-FQI) | 19:00 30.11.2017 | ШАД, Кембридж | Рыжиков | | https://drive.google.com/file/d/1AO1j8CatxUZRdCkWPVn-1DT_78n_FtZk/view?usp=sharing
37 | AlphaGo Zero | 19:00 30.11.2017 | ШАД, Кембридж | Гринчук | AlphaGo Zero, the strongest Go-playing program to date, trained entirely through self-play and reinforcement learning; no human knowledge is involved. | https://drive.google.com/open?id=1L45dPTIDQloNTMkyeUAcjzLh6TA-GfU_
38 | Equivalence Between Policy-Gradients and Soft Q-learning | 19:00 14.12.2017 | ШАД, Кембридж | Конобеев | | no presentation
39 | Equivalence Between Policy-Gradients and Soft Q-learning | 19:00 08.02.2018 | ШАД, Стенфорд | Конобеев | | 
40 | Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks; Some Considerations on Learning to Explore via Meta-Reinforcement Learning | 19:00 15.02.2018 | ШАД, Стенфорд | Голиков | | https://www.overleaf.com/read/rqfkchthwkyp
41 | Rainbow: Combining Improvements in Deep Reinforcement Learning | 19:00 22.02.2018 | ШАД, Стенфорд | Фрицлер | | https://docs.google.com/presentation/d/1mTVYQbtzxrESuJi_KvXsFNEIgm9fYZpCnCyMWcPLQqs/edit?usp=sharing
42 | Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | 19:00 21.03.2018 | ШАД, Стенфорд | Овчаренко | | 
43 | Model-Ensemble Trust-Region Policy Optimization | 19:00 29.03.2018 | ШАД, Стенфорд | Темирчев | | 
44 | Soft Actor-Critic | 19:00 05.04.2018 | ШАД, Стенфорд | Гринчук | The authors consider a more general reinforcement learning problem defined by a maximum-entropy objective (written out below the table). Based on theoretical results, they introduce the novel off-policy Soft Actor-Critic algorithm, which outperforms the current state of the art on continuous control benchmarks. | https://drive.google.com/file/d/1xvay9iCsUiwabVt9ibWEXpcuiTtdt9vs/view?usp=sharing
45 | DORA The Explorer: Directed Outreaching Reinforcement Action-Selection | 19:00 12.04.2018 | ШАД, Стенфорд | Вахрамеева | | 
46 | Generative Multi-Agent Behavioral Cloning | 19:00 19.04.2018 | ШАД, Стенфорд | Савельева | | 
47 | Action-dependent Control Variates for Policy Optimization via Stein Identity | 19:00 26.04.2018 | ШАД, Стенфорд | Рыжиков | | 
48 | StarCraft Micromanagement with RL and Curriculum Transfer Learning | 19:00 03.05.2018 | ШАД, Стенфорд | Гайнцева | | 
49 | Learning to run challenge solutions | 19:00 17.05.2018 | ШАД, Стенфорд | Печенко | | 
50 | | | | Колесников | | 
51 | | | | Павлов | | 
52 | DRL in a Handful of Trials using Probabilistic Dynamics Models | 19:00 27.09.2018 | ШАД, Стенфорд | Темирчев | | 
53 | Latent Space Policies for Hierarchical RL | 19:00 04.05.2018 | ШАД, Стенфорд | Гринчук | | https://drive.google.com/open?id=1FgY3Yk8wofIcZoLftYnwLOVrZOMy-hye
54 | Soft Actor-Critic | | | | | 
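
Row 14 above notes that Hindsight Experience Replay lets an agent learn from sparse rewards through an implicit curriculum. The sketch below illustrates only the core relabelling trick, under simplifying assumptions (a goal-conditioned task, a binary sparse reward, and the "final" relabelling strategy); the class and function names and the transition layout are hypothetical illustrations, not code from the paper.

```python
# Minimal sketch of HER-style goal relabelling (illustrative, not the authors' code).
import random
from collections import deque

def sparse_reward(achieved_goal, goal):
    # Hypothetical binary sparse reward: 0 when the goal is achieved, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0

class HindsightReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, goal):
        """episode is a list of (state, action, next_state, achieved_goal) tuples."""
        final_achieved = episode[-1][3]  # goal state actually reached at the end
        for state, action, next_state, achieved in episode:
            # 1) Original transition with the original (possibly never reached) goal.
            self.buffer.append(
                (state, goal, action, sparse_reward(achieved, goal), next_state))
            # 2) Hindsight transition: pretend the finally achieved state was the goal,
            #    so the episode becomes a success for that substituted goal.
            self.buffer.append(
                (state, final_achieved, action,
                 sparse_reward(achieved, final_achieved), next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

Under the relabelled goal the episode ends in success, so the otherwise uninformative sparse reward gives a learning signal that value bootstrapping can propagate backwards; this is what the note in row 14 calls an implicit curriculum.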
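
Row 44 refers to the maximum-entropy objective behind Soft Actor-Critic. For reference, a compact statement of that objective in its standard form from the maximum-entropy RL literature (not copied from the linked slides) is:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
    \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right]
```

Setting the temperature \alpha to zero recovers the usual expected-return objective; a larger \alpha rewards more stochastic, exploratory policies.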