Timeline: the papers surveyed span Q4 2014 to Q1 2018.
Deep Reinforcement Learning

Human-level control through deep reinforcement learning (Deep Q-Network, DQN)
Mnih et al. 2015, Deepmind. http://www.davidqiu.com:8888/research/nature14236.pdf
This paper derives the Deep Q-Network from traditional Q-learning via three innovations: (1) a multilayer ANN is used to estimate Q-values; (2) an experience replay buffer is used to train the network, which makes training practical and helps the agent associate rare and distant rewards with their causes; (3) a target network - the system comprises two networks that collectively implement the agent, with the target network serving as a periodically updated copy of the online network that supplies the Q-value targets during training and so stabilises learning. This paper was a huge leap forward in RL capability and has been cited widely. The replay-and-target-network update is sketched below.
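As a rough illustration of the replay buffer and target network described above (a minimal sketch, not the authors' code; the network shape, hyperparameters and helper names are placeholder assumptions):

    # Minimal DQN-style update: sample from replay, bootstrap from a frozen target copy.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    obs_dim, n_actions, gamma = 4, 2, 0.99

    def make_net():
        return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    online, target = make_net(), make_net()
    target.load_state_dict(online.state_dict())      # target starts as a copy of the online net
    opt = torch.optim.Adam(online.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)                     # experience replay buffer

    def store(s, a, r, s2, done):
        replay.append((s, a, r, s2, done))

    def train_step(batch_size=32):
        if len(replay) < batch_size:
            return
        batch = random.sample(replay, batch_size)     # random sampling breaks temporal correlations
        s, a, r, s2, d = map(torch.tensor, zip(*batch))
        s, s2, r, d = s.float(), s2.float(), r.float(), d.float()
        q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():                         # bootstrap target comes from the frozen copy
            y = r + gamma * (1 - d) * target(s2).max(dim=1).values
        loss = nn.functional.mse_loss(q, y)
        opt.zero_grad(); loss.backward(); opt.step()

    def sync_target():                                # call every N environment steps
        target.load_state_dict(online.state_dict())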
Deep Recurrent Q-Learning for Partially Observable MDPs (Deep Recurrent Q-Network, DRQN)
Hausknecht et al. 2015, University of Texas at Austin. https://arxiv.org/abs/1507.06527
DRQN extends DQN by replacing the first post-convolutional fully-connected layer with a recurrent LSTM layer. This allows the agent to consider a longer history when modelling behaviour in particular states, and demonstrates that the system can also deal with partial observability, which is important because many real-world problems are POMDPs. Tested on Atari games with partial observability, with good results.

Asynchronous Methods for Deep Reinforcement Learning (A3C)
Mnih et al. 2016, Deepmind. https://arxiv.org/pdf/1602.01783.pdf
Replaces, and is demonstrated to be better than, DQN. The name A3C stands for Asynchronous Advantage Actor-Critic: Actor-Critic refers to the underlying architecture, Asynchronous to the several worker networks trained simultaneously, and Advantage to a formulation of the reward signal (how much better an action's return is than the state's baseline value) that helps exploration where rewards are poorly defined. This paper significantly advanced the state of the art on many computer-game benchmarks; the n-step return and advantage computation is sketched below.
Imagination-Augmented Agents for Deep Reinforcement Learning (I2A)
Weber et al. 2017, Deepmind. https://arxiv.org/abs/1707.06203
Model-free RL maps inputs to actions directly, but there are problems with generalisation due to the lack of an internal model. Model-based methods learn a model of the environment: inputs map to model configurations, and model configurations map to outputs. The agent can then reason inside the learned model without further inputs, akin to imagination, with the potential to learn from fewer experiences. The architecture is demonstrated on Sokoban and MiniPacman.

Rainbow: Combining Improvements in Deep Reinforcement Learning
Hessel et al. 2017, Deepmind. https://arxiv.org/abs/1710.02298
Focuses on improving overall performance by combining good ideas: which improvements to DQN can be combined effectively, and which are competing and incompatible? Rainbow is the empirically best combination found, measured by median human-normalised score, which is comparable across multiple problems. Techniques covered include Double DQN, Prioritised Experience Replay, Duelling networks, multi-step learning, Distributional RL (one of my favourites) and Noisy Nets. Ablation results (removing one technique at a time) showed Prioritised ER and multi-step learning to be most crucial, followed by Distributional RL; the benefit of Double DQN and Duelling was mixed (positive and negative) and overall roughly neutral.

A Distributional Perspective on Reinforcement Learning
Bellemare et al. 2017, Deepmind. https://arxiv.org/abs/1707.06887
The first recent work to model the value distribution rather than the expectation of value. This turns out to be really important, and it demonstrates benchmark-beating results; it is a fundamental rethink, summarised by the distributional Bellman equation below.
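For reference (a standard statement of the idea, my summary rather than a quotation from the paper), the distributional Bellman equation treats the return as a random variable Z whose expectation is the familiar Q-value:

    \[
      Z(s,a) \overset{D}{=} R(s,a) + \gamma\, Z(S',A'), \qquad Q(s,a) = \mathbb{E}\big[Z(s,a)\big].
    \]

In the paper's C51 instantiation, Z is a categorical distribution over a fixed set of support atoms, and the Bellman target is projected back onto that support.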
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Espeholt et al. 2018, Deepmind. https://arxiv.org/abs/1802.01561
Trains a single agent to solve several DeepMind Lab and Atari tasks simultaneously with shared parameters (learned weights). An improvement on A3C, with an emphasis on wall-clock training speed. The architecture is broken down into 'actors' and 'learners', enabling distributed generation of experience, and the V-trace off-policy correction mechanism keeps parallel learning stable. There are two outcomes: faster learning with better or similar performance, and demonstrated benefits of transfer between tasks.

Distributed Prioritized Experience Replay (Ape-X)
Horgan et al. 2018, Deepmind. https://arxiv.org/abs/1803.00933
As with IMPALA, another attempt to accelerate learning by separating actors and learners. Actors contribute to a shared experience replay buffer, and prioritisation is applied over the contents of that shared buffer. The paper relates the benefit to stochastic gradient descent optimisation theory (why prioritisation helps), and the method is a direct descendant of the original experience replay in DQN. Unlike IMPALA it does not attempt to learn multiple problems simultaneously, nor share learned weights. A sketch of proportional prioritised sampling follows.
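As a rough illustration of prioritised sampling (my sketch, not the authors' distributed implementation; the alpha, beta and epsilon values are placeholder choices): transitions are drawn with probability proportional to priority^alpha and the resulting bias is corrected with importance-sampling weights.

    # Proportional prioritised experience replay (single-process illustrative sketch).
    import numpy as np

    class PrioritizedReplay:
        def __init__(self, capacity: int, alpha: float = 0.6):
            self.capacity, self.alpha = capacity, alpha
            self.data, self.priorities = [], []

        def add(self, transition, priority: float = 1.0):
            if len(self.data) >= self.capacity:       # overwrite the oldest entry when full
                self.data.pop(0); self.priorities.pop(0)
            self.data.append(transition)
            self.priorities.append(priority)

        def sample(self, batch_size: int, beta: float = 0.4):
            p = np.array(self.priorities) ** self.alpha
            probs = p / p.sum()                       # P(i) proportional to priority^alpha
            idx = np.random.choice(len(self.data), batch_size, p=probs)
            weights = (len(self.data) * probs[idx]) ** (-beta)
            weights /= weights.max()                  # importance-sampling correction
            return [self.data[i] for i in idx], idx, weights

        def update_priorities(self, idx, td_errors, eps: float = 1e-6):
            for i, err in zip(idx, td_errors):        # priority tracks the magnitude of the TD error
                self.priorities[i] = abs(err) + eps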
RL with Episodic Memory
Deep RL approaches that maintain an explicit history (episodic memory) for replay.

Model-Free Episodic Control (MFEC)
Blundell et al. 2016, Deepmind. https://arxiv.org/abs/1606.04460
Hypothesis: a hippocampus-inspired episodic control module can achieve better performance on sequence learning tasks, and learn more quickly (from fewer episodes), by replaying episodes as simulations. The episodic module is a Q-value table with pruning. Tested on the Arcade Learning Environment (Atari games), it was shown to learn faster and achieve higher scores on several games than DQN and A3C. However, the authors expect the approach has limited ability to generalise across episodes due to the tabular buffer. The objective is to learn faster, from fewer experiences (episodes), by taking inspiration from the hippocampus; Kumaran et al. (2016) suggest that training on replayed experiences from DQN's replay buffer is similar to the replay of experiences from episodic memory during sleep in animals.

Neural Episodic Control (NEC)
Pritzel et al. 2017, Deepmind. https://arxiv.org/abs/1703.01988
Derived from DQN with new components. Improves both learning speed and final performance over DQN and MFEC. However, the Q-table (the DND) is still very large, so generalisation remains a concern. A sketch of the basic episodic-control lookup is given below.
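As a rough sketch of the tabular episodic-control idea behind these papers (my illustration, not the authors' code; the state embedding and the choice of k are placeholder assumptions): each state-action entry stores the best return ever observed, and novel states are estimated from their nearest stored neighbours.

    # Tabular episodic control: store best-ever returns, estimate unseen states by nearest neighbours.
    import numpy as np

    class EpisodicQTable:
        def __init__(self, n_actions: int, k: int = 5):
            self.n_actions, self.k = n_actions, k
            self.memory = [[] for _ in range(n_actions)]   # (state_embedding, return) pairs per action

        def write(self, state: np.ndarray, action: int, episodic_return: float):
            """Keep the maximum return observed for this (state, action)."""
            for i, (s, r) in enumerate(self.memory[action]):
                if np.allclose(s, state):
                    self.memory[action][i] = (s, max(r, episodic_return))
                    return
            self.memory[action].append((state, episodic_return))

        def estimate(self, state: np.ndarray, action: int) -> float:
            """Average the returns of the k nearest stored states for this action."""
            entries = self.memory[action]
            if not entries:
                return 0.0
            dists = [np.linalg.norm(s - state) for s, _ in entries]
            nearest = np.argsort(dists)[: self.k]
            return float(np.mean([entries[i][1] for i in nearest]))

        def act(self, state: np.ndarray) -> int:
            return int(np.argmax([self.estimate(state, a) for a in range(self.n_actions)]))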
Attention

Neural Turing Machine (NTM)
Graves et al. 2014, Deepmind. https://arxiv.org/abs/1410.5401
The key concept of this paper is to add a general-purpose working memory to ANNs. Although the name sounds very artificial (like a Turing machine with an infinite tape), the work is inspired by and similar to short-term or working memory in humans (this is mentioned in the introduction). The authors demonstrate that the network can learn several general-purpose algorithms that involve storing temporary variables in memory, such as copying and sorting sequences. The work is somewhat related to Long Short-Term Memory (LSTM) but differs in the way memory is utilised, being more flexible and powerful.

Differentiable Neural Computer (DNC)
Graves et al. 2016, Deepmind. https://www.nature.com/articles/nature20101
This paper extends and improves on the NTM. It discusses the benefits of a fully differentiable architecture for deep training of sophisticated memory systems: all components can be trained by backpropagation even though the layers are dissimilar in structure and function, and making the memory read and write heads fully differentiable is a significant achievement. Demonstrated tasks include learning simple programs and reasoning about graphs. Although not stated so explicitly, this architecture also aims to reproduce some of the capabilities of human general-purpose working memory and/or short-term memory. A sketch of the content-based addressing both models rely on follows.
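Both the NTM and DNC read from memory by content. A minimal sketch of content-based addressing (my illustration, omitting the location-based addressing, write heads and learned parameters):

    # Content-based addressing for a differentiable memory read.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def content_read(memory: np.ndarray, key: np.ndarray, beta: float = 5.0) -> np.ndarray:
        """memory: (N, W) matrix of N slots of width W; key: (W,) query emitted by the controller.
        beta sharpens the focus. Returns a (W,) read vector: a soft mixture over all slots."""
        norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
        similarity = memory @ key / norms             # cosine similarity of the key to each slot
        weights = softmax(beta * similarity)          # differentiable addressing weights
        return weights @ memory                       # weighted read over all slots

    memory = np.random.randn(8, 16)                   # 8 slots, width 16
    print(content_read(memory, memory[3]).shape)      # (16,), read concentrated on slot 3

Because the read is a weighted sum rather than a hard index, gradients flow through the addressing weights, which is what makes end-to-end training of the memory possible.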
Attention Is All You Need (Transformer)
Vaswani et al. 2017, Google Brain. https://arxiv.org/abs/1706.03762
A simplified architecture that replaces convolution and recurrence with attention alone. Despite the reduced architectural complexity, it beats natural language machine translation benchmarks by a considerable margin; hence the claim that attention is a very powerful tool and is "all you need". The core scaled dot-product attention operation is sketched below.
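A minimal sketch of scaled dot-product attention (my illustration, single head, no masking or learned projections):

    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    import numpy as np

    def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
        """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v). Returns (n_q, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # similarity of every query to every key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # mix the values by attention weight

    Q, K, V = np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 8)
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)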
Hippocampus Inspired - Mixed Biological/ML Studies
It is significant that biologists and machine learning researchers are working together to understand neuroscience and to improve ML algorithms. This work can directly influence future approaches to RL, and may have been important in some of the RL approaches summarised elsewhere in this table, in particular those referencing hippocampal concepts such as 'episodic' learning.

The Hippocampus as a Predictive Map
Stachenfeld et al. 2017, multiple neuroscience institutes & Deepmind. https://www.nature.com/articles/nn.4650
This study looks at the function of the hippocampus from an RL perspective. The authors find that the hippocampus forms low-dimensional representations that are effective at making predictions, differing from the traditional interpretation of grid cells as simply representing spatial locations.

The Successor Representation in Human Reinforcement Learning
Momennejad et al. 2017, multiple neuroscience institutes & Deepmind. https://www.nature.com/articles/s41562-017-0180-8
Looks at the role of the hippocampus in human RL in terms of model-free versus model-based approaches, and shows evidence for a combination of the two called the Successor Representation (SR), defined below. This is again a combined biological/ML study, in this case from a psychological perspective.
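For reference (the standard definition, my summary rather than a quotation from the paper), the successor representation stores expected discounted future state occupancies, so state values factor into a learned predictive map times a reward vector:

    \[
      M(s, s') = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}[s_t = s'] \,\middle|\, s_0 = s\right],
      \qquad
      V(s) = \sum_{s'} M(s, s')\, R(s').
    \]

Because M can be learned incrementally like a model-free value function while R can be re-estimated quickly, the SR sits between model-free and model-based control.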
Dorsal Hippocampus Contributes to Model-based Planning
Miller et al. 2017, multiple neuroscience institutes & Deepmind. https://www.nature.com/articles/nn.4613
An investigation into the neural mechanisms of planning, in terms of action selection, in the dorsal hippocampus, a structure long believed to be important for this function. The results suggest that model-based planning is employed. Another example of a neuroscientific analysis that can fuel new RL algorithms.
Few-shot Learning

Siamese Neural Networks for One-Shot Image Recognition
Koch et al. 2015, University of Toronto. https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf
Inspired by earlier work on one-shot learning by Fei-Fei Li and by Lake. A deep Siamese CNN is used to compare a pair of images and verify whether they belong to the same class. Applied to verification of unseen classes in the Omniglot and MNIST datasets, it shows a capability for a type of one-shot learning. One of the first of the recent papers on one-shot learning, and widely used for comparison in subsequent papers.

Human-level Concept Learning Through Probabilistic Program Induction
Lake et al. 2015, New York University. http://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.pdf
This paper triggered renewed interest in few-shot learning and has become a foundational template for testing such problems. The main concept is to quickly learn new classes that are composed of parts that have already been learned, cast as "learning to learn": an ability to generalise from few exposures. Tests cover classification, generation of novel exemplars of known classes, and generation or 'invention' of exemplars of completely novel classes. The main dataset used is Omniglot, and the underlying algorithm is Bayesian (probabilistic program induction).

One-Shot Generalization in Deep Generative Models
Rezende et al. 2016, Deepmind. https://arxiv.org/pdf/1603.05106
Extends Lake 2015 by incorporating deep learning and using feedback and attentional mechanisms for both inference and generation. The authors combine "the representational power of deep neural networks embedded within hierarchical latent variable models, with the inferential power of approximate Bayesian reasoning", again focusing on the Omniglot dataset. The system is more general, but requires more training data.

Matching Networks for One Shot Learning
Vinyals et al. 2016, Deepmind. https://arxiv.org/pdf/1606.04080
Takes a different approach to one-shot learning from Lake 2015; together the two have defined the templates for subsequent papers. The system learns to match an unlabelled exemplar against a small labelled support set, and is then able to match previously unseen classes, so it is also framed as a problem of learning to learn. A CNN embedding function is used, with tests on the Omniglot and ImageNet image datasets as well as a language task on the Penn Treebank dataset.

Optimization as a Model for Few-shot Learning
Ravi & Larochelle 2017, Twitter. https://openreview.net/pdf?id=rJY0-Kcll
Extends Vinyals et al. by using an LSTM meta-learner: "Rather than training a single model over multiple episodes, the LSTM meta-learner learns to train a custom model for each episode."

Prototypical Networks for Few-shot Learning
Snell et al. 2017, University of Toronto / Twitter. https://arxiv.org/pdf/1703.05175
A variation on Vinyals et al. using prototypical networks: a CNN embedding function transforms inputs into a metric space where a class 'prototype' is the mean of its small support set, and classification is done by finding the nearest prototype in that embedded space. They also show performance on zero-shot tasks, where the prototype is derived without labelled support examples. Tested on several image datasets: Omniglot, miniImageNet and Caltech-UCSD Birds (CUB). A sketch of the prototype computation is given at the end of this section.

A Generative Vision Model That Trains with High Data Efficiency and Breaks Text-based CAPTCHAs (RCN)
George et al. 2017, Vicarious. http://science.sciencemag.org/content/early/2017/10/25/science.aag2612.full
This system is not focussed on few-shot learning specifically, but it is noteworthy in that it requires much smaller training sets, up to 300-fold less data than deep networks on comparable tasks. It approaches image recognition by analysing texture and shape separately; for the latter, a hierarchical Bayesian model with feedback and lateral connections is utilised. Results are demonstrated on MNIST, ICDAR and a variety of CAPTCHAs.

Meta-Learning for Semi-Supervised Few-Shot Classification
Ren et al. 2018, Google Brain. https://lld-workshop.github.io/papers/LLD_2017_paper_40.pdf
An extension of the prototypical networks of Snell 2017 to the semi-supervised setting, i.e. some examples in the small support sets are unlabelled.
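As a rough sketch of the prototypical-network classification rule referenced above (my illustration; the identity embedding and toy episode stand in for the paper's learned CNN and real datasets):

    # Prototypical networks: classify a query by the nearest class-mean in embedding space.
    import numpy as np

    def embed(x: np.ndarray) -> np.ndarray:
        """Placeholder embedding; in the paper this is a learned CNN."""
        return x

    def prototypes(support_x: np.ndarray, support_y: np.ndarray) -> dict:
        """Prototype of each class = mean of its embedded support examples."""
        return {c: embed(support_x[support_y == c]).mean(axis=0)
                for c in np.unique(support_y)}

    def classify(query: np.ndarray, protos: dict) -> int:
        """Assign the query to the class with the nearest (Euclidean) prototype."""
        dists = {c: np.linalg.norm(embed(query) - p) for c, p in protos.items()}
        return int(min(dists, key=dists.get))

    # 2-way, 3-shot toy episode with 2-D features.
    support_x = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.1],    # class 0
                          [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]])   # class 1
    support_y = np.array([0, 0, 0, 1, 1, 1])
    print(classify(np.array([0.95, 1.05]), prototypes(support_x, support_y)))  # -> 1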