A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | AB | AC | AD | AE | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | System | Publication date | Training compute (FLOPs) | Method 1 Training cost (2020 USD) | Method 2 Training cost (2020 USD) | Hardware model | Domain | Task | Organization(s) | Organization Categorization | Author(s) | Year | Reference | Link | Training core-hours | Citations | Inclusion criteria | Parameters | Training dataset | Training dataset size (datapoints) | Hidden layers | Inference compute (FLOPs) | Training time (hours) | Equivalent training time (hours) | Inference time (ms) | Training dataset size (GB) | Approach | Dense or sparse model | Training objective | Architecture | Compute Sponsor Categorization | |
2 | GPU DBNs | 2009-06-15 | 1.0E+15 | 0.05 | 0.06 | NVIDIA GeForce GTX 280 | Other | Stanford | Academia | R Raina, A Madhavan, AY Ng | 2009 | Large-scale Deep Unsupervised Learning using Graphics Processors | http://www.machinelearning.org/archive/icml2009/papers/218.pdf | 7.89E+02 | 1.00E+08 | 1.00E+06 | Academia | |||||||||||||||
3 | 6-layer MLP (MNIST) | 2010-03-01 | 1.3E+14 | 0.01 | 0.01 | NVIDIA GeForce GTX 280 | Vision | Character recognition | IDSIA; University of Lugano & SUPSI | Academia | Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, Juergen Schmidhuber | 2010 | Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition | https://arxiv.org/abs/1003.0358 | 1.26E+03 | Highly cited | 1.21E+07 | MNIST | 6.00E+04 | Academia | ||||||||||||
4 | Feedforward NN | 2010-05-13 | 3.5E+14 | 0.01 | Vision | Digit recognition | University of Montreal | Academia | X Glorot, Y Bengio | 2010 | Understanding the difficulty of training deep feedforward neural networks | https://proceedings.mlr.press/v9/glorot10a.html | 1.33E+04 | Highly cited | 7.08E+06 | MNIST | 1.40E+07 | Academia | ||||||||||||||
5 | RNN 500/10 + RT09 LM (NIST RT05) | 2010-09-26 | 3.4E+15 | 0.11 | Speech | Transcription | Brno University of Technology, Johns Hopkins University | Academia | T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur | 2010 | Recurrent neural network based language model | https://www.researchgate.net/publication/221489926_Recurrent_neural_network_based_language_model | 5.67E+03 | Highly cited | 5.27E+06 | NIST RT05 | 5.40E+06 | 1.05E+07 | Academia | |||||||||||||
6 | KN5 LM + RNN 400/10 (WSJ) | 2010-09-26 | 6.1E+16 | 2.03 | Speech | Transcription | Brno University of Technology, Johns Hopkins University | Academia | T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur | 2010 | Recurrent neural network based language model | https://www.researchgate.net/publication/221489926_Recurrent_neural_network_based_language_model | 5.67E+03 | Highly cited | 8.00E+07 | WSJ | 6.40E+06 | 1.60E+08 | Academia | |||||||||||||
7 | MCDNN (MNIST) | 2012-02-13 | 3.7E+15 | 0.08 | Vision | Character recognition | IDSIA | Academia | D Ciregan, U Meier, J Schmidhuber | 2012 | Multi-column Deep Neural Networks for Image Classification | https://arxiv.org/abs/1202.2745v1 | 4.83E+03 | Highly cited | 1.99E+06 | MNIST | 6.00E+04 | 3 | 2.59E+07 | 490 | 0.021 | Academia | ||||||||||
8 | Dropout (MNIST) | 2012-06-03 | 6.0E+15 | 0.12 | 0.10 | NVIDIA GeForce GTX 580 | Vision | Character recognition | University of Toronto | Academia | GE Hinton, N Srivastava, A Krizhevsky | 2012 | Improving neural networks by preventing co-adaptation of feature detectors | https://arxiv.org/abs/1207.0580 | 6.68E+03 | Highly cited | 5.59E+06 | MNIST | 6.00E+04 | 2 | 1.12E+07 | 0.021 | Academia | |||||||||
9 | AlexNet | 2012-09-30 | 4.7E+17 | 8.86 | 8.00 | NVIDIA GeForce GTX 580 | Vision | Image classification | University of Toronto | Academia | Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton | 2012 | ImageNet Classification with Deep Convolutional Neural Networks | https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html | 8.51E+04 | Highly cited | 6.00E+07 | ImageNet | 1.20E+06 | 8 | Academia | |||||||||||
10 | DQN | 2013-01-01 | 2.3E+15 | 0.04 | Games | Atari | DeepMind | Industry | V Mnih, K Kavukcuoglu, D Silver, A Graves | 2013 | Playing Atari with Deep Reinforcement Learning | https://arxiv.org/abs/1312.5602 | 6.68E+03 | Highly cited | 8.36E+05 | Industry | ||||||||||||||||
11 | Mitosis | 2013-09-22 | 1.4E+17 | 2.00 | Vision | IDSIA | Academia | Dan C. Cireşan, Alessandro Giusti, Luca M. Gambardella, Jürgen Schmidhuber | 2013 | Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks | https://link.springer.com/chapter/10.1007/978-3-642-40763-5_51 | 1.46E+03 | ICPR 2012 mitosis detection competition winner | 3.72E+04 | 1.00E+06 | Academia | ||||||||||||||||
12 | Word2Vec (large) | 2013-10-16 | 3.9E+16 | 0.55 | Language | Semantic embedding | Google | Industry | T Mikolov, I Sutskever, K Chen, GS Corrado | 2013 | Distributed Representations of Words and Phrases and their Compositionality | https://arxiv.org/abs/1310.4546 | 2.87E+04 | Highly cited | 6.92E+09 | 6.92E+05 | Predict nearby words | Recurrent Neural Network | Industry | ||||||||||||||
13 | Visualizing CNNs | 2013-11-12 | 5.3E+17 | 7.30 | 9.02 | NVIDIA GeForce GTX 580 | Vision | NYU | Academia | MD Zeiler, R Fergus | 2013 | Visualizing and Understanding Convolutional Networks | https://arxiv.org/abs/1311.2901 | 1.30E+04 | Highly cited | Academia |||||||||||||||
14 | TransE | 2013-12-05 | 1.3E+18 | 17.58 | Other | Entity embedding | CNRS, Google | Industry - Academia Collaboration | Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko | 2013 | Translating Embeddings for Modeling Multi-relational Data | https://papers.nips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html | 4.00E+03 | 1.70E+07 | Industry |||||||||||||||||
15 | Image generation | 2013-12-20 | 4.8E+14 | 0.01 | Intel Xeon | Vision | Image clustering | University of Amsterdam | Academia | DP Kingma, M Welling | 2013 | Auto-Encoding Variational Bayes | https://arxiv.org/abs/1312.6114 | 1.56E+04 | Highly cited | MNIST | 6.00E+04 | Academia | ||||||||||||||
16 | Image Classification with the Fisher Vector: Theory and Practice | 2013-06-12 | 9.1E+13 | 0.00 | Intel Xeon E5-2470 | Vision | Image Classification | Universidad Nacional de Córdoba, Xerox Research Centre Europe, Intelligent Systems Lab Amsterdam, University of Amsterdam, LEAR Team, INRIA Grenoble | Industry - Academia Collaboration | Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek | 2013 | Image Classification with the Fisher Vector: Theory and Practice | https://hal.inria.fr/hal-00830491v2/document | 1.71E+03 | Highly cited | ImageNet | 2 |||||||||||||||
17 | GANs | 2014-06-10 | 5.2E+17 | 6.09 | Drawing | Image generation | Université de Montréal | Academia | Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio | 2014 | Generative Adversarial Networks | https://arxiv.org/abs/1406.2661 | 3.69E+04 | Highly cited | CIFAR-10 | 6.00E+04 | Academia |||||||||||||||
18 | SPPNet | 2014-06-18 | 6.1E+18 | 70.97 | 65.07 | NVIDIA GeForce GTX TITAN | Vision | Image classification | Microsoft, Xi’an Jiaotong University, University of Science and Technology of China | Industry - Academia Collaboration | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2014 | Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition | https://arxiv.org/abs/1406.4729 | 7.41E+03 | Highly cited | Imagenet-1k | 1.28E+06 | Industry | ||||||||||||||
19 | RNNsearch-50* | 2014-09-01 | 1.6E+18 | 17.57 | 81.48 | NVIDIA Quadro K6000 | Language | Translation | Université de Montréal, Jacobs University Bremen | Academia | D Bahdanau, K Cho, Y Bengio | 2014 | Neural Machine Translation by Jointly Learning to Align and Translate | https://arxiv.org/abs/1409.0473 | 1.92E+04 | Highly cited | WMT'14 + selection | 3.84E+08 | Academia |||||||||||||
20 | VGG16 | 2014-09-04 | 8.5E+18 | 93.11 | 82.80 | NVIDIA GeForce GTX TITAN Black | Vision | Image classification | University of Oxford | Academia | Karen Simonyan, Andrew Zisserman | 2014 | Very Deep Convolutional Networks for Large-Scale Image Recognition | https://arxiv.org/abs/1409.1556 | 6.13E+04 | Highly cited | 1.38E+08 | ILSVRC-2012 | 1.30E+06 | 16 | 1.53E+10 | Academia | |||||||||||
21 | Seq2Seq LSTM | 2014-09-10 | 7.3E+18 | 79.60 | Language | Translation | Industry | I Sutskever, O Vinyals, QV Le | 2014 | Sequence to Sequence Learning with Neural Networks | https://arxiv.org/abs/1409.3215 | 1.57E+04 | Highly cited | 3.84E+08 | WMT'14 dataset | 3.84E+08 | Industry | |||||||||||||||
22 | ADAM (CIFAR-10) | 2014-12-22 | 6.0E+16 | 0.60 | Vision | Image classification | University of Amsterdam, OpenAI, University of Toronto | Industry - Academia Collaboration | DP Kingma, J Ba | 2014 | Adam: A Method for Stochastic Optimization | https://arxiv.org/abs/1412.6980 | 8.11E+04 | Highly cited | Industry | |||||||||||||||||
23 | MSRA (C, PReLU) | 2015-01-09 | 2.4E+19 | 238.36 | 2166.22 | NVIDIA Tesla K40 | Vision | Image classification | Microsoft Research | Industry | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2015 | Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | https://arxiv.org/abs/1502.01852 | 1.41E+04 | Highly cited | 8.70E+07 | Imagenet-1k | 1.28E+06 | Industry | ||||||||||||
24 | GoogLeNet / InceptionV1 | 2015-06-07 | 1.6E+18 | 14.16 | Vision | Image classification | Google, University of Michigan, University of North Carolina | Industry - Academia Collaboration (Industry leaning) | Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich | 2015 | Going deeper with convolutions | https://arxiv.org/abs/1409.4842 | 3.28E+04 | Highly cited | 6.80E+06 | ILSVRC 2014 | 1.20E+06 | 22 | Industry | |||||||||||||
25 | AlphaGo Fan | 2015-10-01 | 3.8E+20 | 3076.07 | Games | Go | Google DeepMind | Industry | D Silver, A Huang, CJ Maddison, A Guez, L Sifre | 2015 | Mastering the game of Go with deep neural networks and tree search | https://www.nature.com/articles/nature16961 | 5.18E+03 | SOTA improvement | 8.21E+06 | Industry ||||||||||||||||
26 | DeepSpeech2 | 2015-12-08 | 2.6E+19 | 199.71 | 150.78 | NVIDIA GeForce GTX TITAN X | Speech | Speech recognition | Baidu Research - Silicon Valley AI Lab | Industry | D Amodei, S Ananthanarayanan | 2015 | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | https://arxiv.org/abs/1512.02595 | 2.21E+03 | Highly cited | 3.80E+07 | 9.80E+09 | 11 | 1.80E+09 | Industry |||||||||||
27 | ResNet-152 (ImageNet) | 2015-12-10 | 1.2E+19 | 92.03 | Vision | Image classification | Microsoft | Industry | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2015 | Deep Residual Learning for Image Recognition | https://arxiv.org/abs/1512.03385 | 8.58E+04 | Highly cited | 6.00E+07 | ILSVRC 2012 | 1.20E+06 | 152 | 2.26E+10 | 138 | Industry | |||||||||||
28 | AlphaGo Lee | 2016-01-27 | 1.9E+21 | 14041.80 | Google TPU V1 | Games | Go | DeepMind | Industry | D Silver, A Huang, CJ Maddison, A Guez, L Sifre | 2016 | Mastering the game of Go with deep neural networks and tree search | https://www.nature.com/articles/nature16961 | 1.08E+04 | Highly cited | 2.94E+07 | Industry | |||||||||||||||
29 | R-FCN | 2016-06-21 | 6.1E+16 | 0.40 | 5.51 | NVIDIA Tesla K40 | Vision | Object detection | Microsoft Research, Tsinghua University | Industry - Academia Collaboration (Industry leaning) | Jifeng Dai, Y. Li, Kaiming He, and Jian Sun | 2016 | R-FCN: Object Detection via Region-based Fully Convolutional Networks | https://arxiv.org/abs/1605.06409 | 4.49E+03 | PASCAL VOC (2007 and 2012 versions) + MS COCO | 9.44E+04 | 12.06567222 | 170 | Industry ||||||||||||
30 | Part-of-speech tagging model | 2016-07-21 | 1.5E+17 | 0.97 | Language | POS tagging | University of Toronto | Academia | Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton | 2016 | Layer Normalization | https://arxiv.org/abs/1607.06450 | 4.13E+03 | Highly cited | 12 | Academia ||||||||||||||||
31 | Named Entity Recognition model | 2016-07-21 | 9.7E+16 | 0.63 | Language | Named entity recognition | University of Toronto | Academia | Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton | 2016 | Layer Normalization | https://arxiv.org/abs/1607.06450 | 4.13E+03 | Highly cited | 8 | Academia ||||||||||||||||
32 | GNMT | 2016-09-26 | 6.9E+21 | 42275.13 | 307573.50 | NVIDIA Tesla K80 | Language | Translation | Industry | Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean | 2016 | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | https://research.google/pubs/pub45610/ | 4.50E+03 | Highly cited | 2.78E+08 | 3.60E+08 | Industry | ||||||||||||||
33 | Xception | 2016-10-07 | 4.4E+19 | 267.30 | 1961.34 | NVIDIA Tesla K80 | Vision | Image classification | Industry | François Chollet | 2016 | Xception: Deep Learning with Depthwise Separable Convolutions | https://arxiv.org/abs/1610.02357 | 5.84E+03 | Highly cited | 2.29E+07 | JFT | 3.50E+08 | 1.68E+10 | Industry | ||||||||||||
34 | NASv3 (CIFAR-10) | 2016-11-05 | 2.2E+21 | 13069.35 | Vision | Google Brain | Industry | Barret Zoph, Quoc V. Le | 2016 | Neural Architecture Search with Reinforcement Learning | https://arxiv.org/abs/1611.01578 | 2.97E+03 | Highly cited | 3.74E+07 | 39 | Industry | ||||||||||||||||
35 | Libratus | 2017-01-01 | 1.1E+21 | 6253.49 | Intel Xeon E5-2695 v3 | Games | Poker | Carnegie Mellon University | Academia | N Brown, T Sandholm | 2017 | Libratus: The Superhuman AI for No-Limit Poker | https://www.cs.cmu.edu/~noamb/papers/17-IJCAI-Libratus.pdf | 6.40E+01 | SOTA improvement | 3.00E+06 | Academia |||||||||||||||
36 | AlphaGo Master | 2017-01-01 | 1.5E+23 | 852748.08 | Google TPU V1 | Games | Go | DeepMind | Industry | D Silver, J Schrittwieser, K Simonyan, I Antonoglou | 2017 | Mastering the game of Go without human knowledge | https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge | 5.81E+03 | Highly cited | Industry | ||||||||||||||||
37 | DeepStack | 2017-01-06 | 1.5E+14 | 0.00 | Games | Poker | University of Alberta, Charles University, Czech Technical University | Academia | Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling | 2017 | DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker | https://arxiv.org/abs/1701.01724 | 6.18E+02 | 2.50E+06 | 1.00E+07 | Academia | ||||||||||||||||
38 | MoE | 2017-01-23 | 9.4E+19 | 525.39 | 8484.35 | NVIDIA Tesla K40 | Language | Language modelling / Machine translation | Google Brain, Jagiellonian University, Cracow | Industry - Academia Collaboration (Industry leaning) | N Shazeer, A Mirhoseini, K Maziarz, A Davis | 2017 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | https://arxiv.org/abs/1701.06538 | 6.87E+02 | 8.70E+09 | 1.00E+11 | Sparse | Long Short-Term Memory Mixture-Of-Experts | Industry | ||||||||||||
39 | Transformer | 2017-06-12 | 7.4E+18 | 37.13 | 111.17 | NVIDIA Tesla P100 | Language | Translation | Google Brain; Google Research | Industry | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin | 2017 | Attention Is All You Need | https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf | 2.52E+04 | Highly cited | 2.13E+08 | 3.60E+08 | 5.40E+10 | 672 | Industry |||||||||||
40 | JFT | 2017-08-04 | 4.8E+20 | 2311.64 | 21396.42 | NVIDIA Tesla K80 | Vision | Google Research, CMU | Industry - Academia Collaboration | Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta | 2017 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | https://arxiv.org/abs/1707.02968 | 1.14E+03 | Highly cited | JFT-300M | 3.00E+08 | Industry ||||||||||||||
41 | OpenAI TI7 DOTA 1v1 | 2017-08-11 | 6.0E+20 | 2873.99 | Games | DOTA | OpenAI | Industry | OpenAI | 2017 | Dota 2 | https://openai.com/five/ | NA | 1.50E+08 | Industry ||||||||||||||||
42 | AlphaGo Zero | 2017-10-19 | 3.4E+23 | 1544149.42 | Google TPU V1 | Games | Go | DeepMind | Industry | D Silver, J Schrittwieser, K Simonyan, I Antonoglou | 2017 | Mastering the game of Go without human knowledge | https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge | 5.81E+03 | Highly cited | 4.64E+07 | 5.80E+09 | Industry | ||||||||||||||
43 | PNASNet-5 | 2017-12-02 | 6.6E+19 | 289.74 | 991.48 | NVIDIA Tesla P100 | Vision | Johns Hopkins University, Stanford, Google AI | Industry - Academia Collaboration (Industry leaning) | C Liu, B Zoph, M Neumann, J Shlens | 2017 | Progressive Neural Architecture Search | https://arxiv.org/abs/1712.00559 | 1.34E+03 | Highly cited | Imagenet-1k | 1.28E+06 | Industry |||||||||||||||
44 | AlphaZero | 2017-12-05 | 3.7E+22 | 162054.70 | Google TPU V2 | Games | DeepMind | Industry | D Silver, T Hubert, J Schrittwieser, I Antonoglou | 2017 | Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm | https://arxiv.org/abs/1712.01815 | 1.08E+03 | Highly cited | 7.00E+05 | Score | Industry | |||||||||||||||
45 | IMPALA | 2018-02-05 | 1.7E+20 | 709.79 | 2553.82 | NVIDIA Tesla P100 | Games | Atari | DeepMind | Industry | Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu | 2018 | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | https://arxiv.org/abs/1802.01561 | 6.75E+02 | 1.60E+06 | 2.40E+11 | Industry | ||||||||||||||
46 | AmoebaNet-A (F=448) | 2018-02-05 | 3.9E+20 | 1628.35 | 5858.75 | NVIDIA Tesla P100 | Vision | Image classification | Google Brain | Industry | Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le | 2018 | Regularized Evolution for Image Classifier Architecture Search | https://arxiv.org/abs/1802.01548 | 1.71E+03 | Highly cited | 4.69E+08 | Imagenet-1k | 1.28E+06 | Industry | ||||||||||||
47 | YOLOv3 | 2018-04-08 | 5.1E+19 | 202.99 | 295.76 | NVIDIA GeForce GTX TITAN X | Vision | Object detection | University of Washington | Academia | Joseph Redmon, Ali Farhadi | 2018 | YOLOv3: An Incremental Improvement | https://arxiv.org/abs/1804.02767 | 7.71E+03 | Highly cited | 1.06E+08 | ImageNet | 1.28E+06 | 7.10E+10 | Academia | |||||||||||
48 | GPT | 2018-06-01 | 1.8E+19 | 68.72 | NVIDIA Quadro P600 | Language | OpenAI | Industry | A Radford, K Narasimhan, T Salimans, I Sutskever | 2018 | Improving Language Understanding by Generative Pre-Training | https://openai.com/blog/language-unsupervised/ | 2.26E+03 | Highly cited | 1.17E+08 | BooksCorpus | 1.00E+09 | 3.00E+10 | Industry | |||||||||||||
49 | Population-based DRL | 2018-07-03 | 3.5E+19 | 130.36 | Games | Capture the flag | DeepMind | Industry | Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel | 2018 | Human-level performance in first-person multiplayer games with population-based deep reinforcement learning | https://arxiv.org/abs/1807.01281 | 4.34E+02 | 1.22E+08 | 6.00E+10 | Industry | ||||||||||||||||
50 | BigGAN-deep 512x512 | 2018-09-28 | 3.0E+21 | 10448.44 | Drawing | Image generation | Heriot-Watt University, DeepMind | Industry - Academia Collaboration | A Brock, J Donahue, K Simonyan | 2018 | Large Scale GAN Training for High Fidelity Natural Image Synthesis | https://arxiv.org/abs/1809.11096 | 1.98E+03 | Highly cited | 1.13E+08 | JFT-300M | 2.92E+08 | Industry | ||||||||||||||
51 | BERT-Large | 2018-10-11 | 2.9E+20 | 999.93 | Google TPU V2 | Language | Next sentence prediction | Google AI | Industry | J Devlin, MW Chang, K Lee, K Toutanova | 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://arxiv.org/abs/1810.04805 | 2.38E+04 | Highly cited | 3.40E+08 | 3.30E+09 | 7.90E+10 | Industry | |||||||||||||
52 | Decoupled weight decay regularization | 2019-01-04 | 2.5E+18 | 8.07 | Vision | Image classification | University of Freiburg | Academia | Ilya Loshchilov and Frank Hutter | 2019 | Decoupled weight decay regularization. | https://arxiv.org/abs/1711.05101 | 2.06E+03 | 3.65E+07 | CIFAR-10 | 5.00E+04 | 1.73E+09 | Academia | ||||||||||||||
53 | Hanabi 4 player | 2019-02-01 | 9.2E+16 | 0.29 | 0.34 | NVIDIA Tesla V100 PCIe | Games | Hanabi | DeepMind, University of Oxford, Google Brain, Carnegie Mellon University | Industry - Academia Collaboration (Industry leaning) | N Bard, JN Foerster, S Chandar, N Burch | 2019 | The Hanabi Challenge: A New Frontier for AI Research | https://arxiv.org/abs/1902.00506 | 1.15E+02 | 7.64E+05 | Industry ||||||||||||||||
54 | GPT-2 | 2019-02-14 | 1.5E+21 | 4692.89 | Language | OpenAI | Industry | A Radford, J Wu, R Child, D Luan, D Amodei | 2019 | Language Models are Unsupervised Multitask Learners | https://openai.com/blog/better-language-models/ | 1.70E+03 | Highly cited | 1.50E+09 | 3.00E+09 | 3.40E+12 | 40 | Industry | ||||||||||||||
55 | ProxylessNAS | 2019-02-23 | 3.7E+19 | 114.96 | 135.04 | NVIDIA Tesla V100 PCIe | Vision | MIT | Academia | Han Cai, Ligeng Zhu, and Song Han | 2019 | ProxylessNAS: Direct neural architecture search on target task and hardware | https://arxiv.org/abs/1812.00332 | 9.96E+02 | ImageNet | 1.28E+06 | 2.63E+11 | 200 | 5.1 | Academia | ||||||||||||
56 | Cross-lingual alignment | 2019-04-04 | 2.6E+18 | 7.83 | Tel Aviv University, MIT | Academia | Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson | 2019 | Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing | https://arxiv.org/abs/1902.09492 | 1.29E+02 | 3.66E+12 | Academia |||||||||||||||||||
57 | MnasNet-A1 + SSDLite | 2019-05-29 | 1.5E+21 | 4331.00 | Vision | Image classification and object detection on mobile devices | Google | Industry | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le | 2019 | MnasNet: Platform-Aware Neural Architecture Search for Mobile | https://arxiv.org/abs/1807.11626 | 1.43E+03 | Highly cited | 4.90E+06 | MS COCO | 1.18E+05 | Industry |||||||||||||||
58 | MnasNet-A3 | 2019-05-29 | 1.5E+21 | 4331.00 | Vision | Image classification and object detection on mobile devices | Google | Industry | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le | 2019 | MnasNet: Platform-Aware Neural Architecture Search for Mobile | https://arxiv.org/abs/1807.11626 | 1.43E+03 | Highly cited | 5.20E+06 | ImageNet | 1.28E+06 | Industry |||||||||||||||
59 | DLRM-2020 | 2019-05-31 | 4.0E+18 | 11.53 | 14.60 | NVIDIA Tesla V100 PCIe | Recommendation | Facebook AI | Industry | M Naumov, D Mudigere, HJM Shi, J Huang | 2019 | Deep Learning Recommendation Model for Personalization and Recommendation Systems | https://arxiv.org/abs/1906.00091 | 1.40E+02 | 1.00E+11 | Industry | ||||||||||||||||
60 | FTW | 2019-05-31 | 7.3E+21 | 21045.02 | Games | Capture the flag | DeepMind | Industry | M Jaderberg, WM Czarnecki, I Dunning, L Marris | 2019 | Human-level performance in 3D multiplayer games with population-based reinforcement learning | https://deepmind.com/research/publications/capture-the-flag | 4.25E+02 | 1.26E+08 | 1.21E+12 | Industry | ||||||||||||||||
61 | ObjectNet | 2019-09-06 | 1.9E+19 | 50.79 | Vision | Object recognition | MIT | Academia | Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, and Boris Katz | 2019 | ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | https://papers.nips.cc/paper/2019/file/97af07a14cacba681feacf3012730892-Paper.pdf | 2.39E+03 | Highly cited | 3.80E+07 | Internal data | 5.00E+04 | 108 | Academia |||||||||||||
62 | Hide and Seek | 2019-09-17 | 3.0E+17 | 0.80 | Games | Hide and Seek | OpenAI | Industry | B Baker, I Kanitscheider, T Markov, Y Wu | 2019 | Emergent Tool Use From Multi-Agent Autocurricula | https://openai.com/blog/emergent-tool-use/ | 2.24E+02 | 1.60E+06 | 3.17E+10 | Industry | ||||||||||||||||
63 | Megatron-LM (Original, 8.3B) | 2019-09-17 | 9.1E+21 | 24117.93 | 33212.51 | NVIDIA Tesla V100 PCIe | Language | NVIDIA | Industry | M Shoeybi, M Patwary, R Puri, P LeGresley | 2019 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | https://arxiv.org/abs/1909.08053 | 2.46E+02 | 8.30E+09 | 3.48E+10 | 1.80E+13 | Industry | ||||||||||||||
64 | Megatron-BERT | 2019-09-17 | 5.7E+22 | 151068.35 | 208034.39 | NVIDIA Tesla V100 PCIe | Language | NVIDIA | Industry | M Shoeybi, M Patwary, R Puri, P LeGresley | 2019 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | https://arxiv.org/abs/1909.08053 | 2.46E+02 | 3.90E+09 | 3.48E+10 | Industry | |||||||||||||||
65 | AlphaX-1 | 2019-10-02 | 7.6E+18 | 19.91 | 24.10 | NVIDIA GeForce GTX 1080 Ti | Vision | Neural architecture search for computer vision | Brown University, Facebook AI Research | Industry - Academia Collaboration (Academia leaning) | Linnan Wang, Yiyang Zhao, Yuu Jinnai, Yuandong Tian, Rodrigo Fonseca | 2019 | AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search | https://arxiv.org/abs/1903.11059 | 5.00E+01 | 5.79E+08 | ImageNet | Industry ||||||||||||||
66 | Rubik's cube | 2019-10-15 | 8.5E+20 | 2204.62 | 3102.27 | NVIDIA Tesla V100 PCIe | Robotics | OpenAI | Industry | Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang | 2019 | Solving Rubik’s Cube with a Robot Hand | https://arxiv.org/abs/1910.07113 | 2.27E+02 | 2.78E+07 | 6.24E+07 | Industry |||||||||||||||
67 | T5-3B | 2019-10-23 | 1.0E+22 | 25777.12 | Google TPU V3 | Language | Text autocompletion | Industry | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | https://arxiv.org/abs/1910.10683 | 1.54E+03 | Highly cited | 3.00E+09 | Colossal Clean Crawled Corpus (C4) | 1.50E+11 | Transformer (encoder-decoder performed best) | Industry | |||||||||||||
68 | T5-11B | 2019-10-23 | 4.1E+22 | 105686.20 | Google TPU V3 | Language | Text autocompletion | Industry | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | https://arxiv.org/abs/1910.10683 | 1.54E+03 | Highly cited | 1.10E+10 | Colossal Clean Crawled Corpus (C4) | 1.50E+11 | Transformer (encoder-decoder performed best) | Industry | |||||||||||||
69 | AlphaStar | 2019-10-30 | 2.0E+23 | 512765.27 | Google TPU V3 | Games | StarCraft | DeepMind | Industry | Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver | 2019 | Grandmaster level in StarCraft II using multi-agent reinforcement learning | https://www.deepmind.com/blog/alphastar-grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning | 1.04E+03 | Highly cited | 1.39E+08 | Industry |||||||||||||||
70 | MuZero | 2019-11-19 | 4.8E+19 | 121.18 | Google TPU V3 | Games | Atari Games | DeepMind | Industry | J Schrittwieser, I Antonoglou, T Hubert, K Simonyan | 2019 | Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | https://arxiv.org/abs/1911.08265v2 | 4.12E+02 | SOTA improvement | 3.69E+07 | 2.00E+10 | Industry ||||||||||||||
71 | OpenAI Five Rerun | 2019-12-13 | 1.3E+22 | 32217.13 | Games | Dota 2 | OpenAI | Industry | Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław "Psyho" Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang | 2019 | Dota 2 with Large Scale Deep Reinforcement Learning | https://cdn.openai.com/dota-2.pdf | 3.49E+02 | SOTA improvement | 1.59E+08 | 5.31E+10 | Industry |||||||||||||||
72 | OpenAI Five | 2019-12-13 | 6.7E+22 | 166042.11 | Games | Dota 2 | OpenAI | Industry | J Raiman, S Zhang, F Wolski | 2019 | Dota 2 with Large Scale Deep Reinforcement Learning | https://arxiv.org/abs/1912.06680 | 4.54E+02 | SOTA improvement | 1.59E+08 | 4.54E+11 | Industry | |||||||||||||||
73 | DLRM-2021 | 2020-07-01 | 3.0E+20 | 636.66 | 1094.92 | NVIDIA Tesla V100 PCIe | Recommendation | Facebook AI | Industry | D Mudigere, Y Hao, J Huang, A Tulloch | 2020 | High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models | https://arxiv.org/abs/2104.05158 | 2.00E+00 | 1.00E+12 | Industry ||||||||||||||||
74 | AlphaFold | 2020-01-15 | 1.0E+20 | 241.59 | Other | Protein folding prediction | DeepMind | Industry | Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu & Demis Hassabis | 2020 | Improved protein structure prediction using potentials from deep learning | https://www.nature.com/articles/s41586-019-1923-7 | 8.40E+02 | 6.90E+07 | Score | Industry | ||||||||||||||||
75 | Meena | 2020-01-28 | 1.1E+23 | 263099.94 | Google TPU V3 | Language | Text autocompletion | Google AI | Industry | Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le | 2020 | Towards a Human-like Open-Domain Chatbot | https://arxiv.org/abs/2001.09977 | 2.57E+02 | 2.60E+09 | 4.00E+10 | Evolved Transformer seq2seq model | Industry ||||||||||||||
76 | ALBERT-xxlarge | 2020-02-09 | 2.5E+21 | 5924.43 | Google TPU V3 | Language | Google Research, Toyota Technological Institute at Chicago | Industry - Academia Collaboration | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut | 2020 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | https://arxiv.org/abs/1909.11942 | 2.18E+03 | Highly cited | 2.35E+08 | 3.30E+09 | 2.50E+12 | 17408 | Industry |||||||||||||
77 | Turing NLG | 2020-02-13 | 1.6E+22 | 37799.51 | 58395.62 | NVIDIA Tesla V100 PCIe | Language | Text autocompletion | Microsoft | Industry | C Rosset | 2020 | Turing-NLG: A 17-billion-parameter language model by Microsoft | https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/ | 3.40E+01 | 1.70E+10 | 3.48E+10 | 3.60E+13 | Next token prediction | Industry | ||||||||||||
78 | ProGen | 2020-03-13 | 2.7E+20 | 623.75 | Google TPU V3 | Other | Protein generation | Salesforce Research, Stanford | Industry - Academia Collaboration | A Madani, B McCann, N Naik, NS Keskar | 2020 | ProGen: Language Modeling for Protein Generation | https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2 | 8.60E+04 | 4.60E+01 | 1.20E+09 | Industry |||||||||||||||
79 | GPT-3 175B (davinci) | 2020-04-28 | 3.1E+23 | 691184.67 | 1131415.12 | NVIDIA Tesla V100 PCIe | Language | Text autocompletion | OpenAI | Industry | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | 2020 | Language Models are Few-Shot Learners | https://arxiv.org/abs/2005.14165 | 1.53E+03 | Highly cited | 1.75E+11 | CommonCrawl; WebText2; Books1; Books2; Wikipedia | 3.74E+11 | 7.40E+14 | 45TB | Industry ||||||||||
80 | Once for All | 2020-04-29 | 1.8E+21 | 4010.23 | 6569.51 | NVIDIA Tesla V100 PCIe | Vision | MIT-IBM Watson AI Lab | Industry | Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han | 2020 | Once for all: Train one network and specialize it for efficient deployment. | https://arxiv.org/abs/1908.09791 | 1.20E+03 | 3.71E+02 | 7.70E+06 | Imagenet | Industry | ||||||||||||||
81 | iGPT-L | 2020-06-17 | 8.9E+21 | 19092.67 | 32482.56 | NVIDIA Tesla V100 PCIe | Drawing | Image completion | OpenAI | Industry | Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever | 2020 | Generative Pretraining from Pixels | https://openai.com/blog/image-gpt/ | 6.00E+04 | 1.82E+02 | 1.36E+09 | ILSVRC 2012 | 9.60E+06 | Industry ||||||||||||
82 | iGPT-XL | 2020-06-17 | 3.3E+22 | 70793.03 | 120440.96 | NVIDIA Tesla V100 PCIe | Drawing | Image completion | OpenAI | Industry | Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever | 2020 | Generative Pretraining from Pixels | https://openai.com/blog/image-gpt/ | 1.82E+02 | 6.80E+09 | ILSVRC 2012 | 9.60E+06 | Industry |||||||||||||
83 | GShard (600B) | 2020-06-30 | 1.3E+22 | 27609.81 | Google TPU V3 | Language | Translation | Google Brain | Industry | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | https://arxiv.org/abs/2006.16668 | 1.93E+05 | 9.10E+01 | 6.00E+11 | 2.60E+11 | Industry | ||||||||||||||
84 | GShard (dense) | 2020-06-30 | 2.6E+22 | 55219.61 | Google TPU V3 | Language | Translation | Google Brain | Industry | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | https://arxiv.org/abs/2006.16668 | 2.06E+06 | 9.10E+01 | 2.30E+09 | 2.60E+11 | Industry | ||||||||||||||
85 | ViT-H/14 | 2020-09-28 | 1.3E+22 | 25757.45 | Google TPU V3 | Vision | Image representation | Google Research, Brain Team | Industry | Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby | 2020 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://openreview.net/forum?id=YicbFdNTTy | 6.00E+04 | 1.91E+03 | Highly cited | Imagenet-1k | 1.28E+06 | Industry | |||||||||||||
86 | wav2vec 2.0 LARGE | 2020-10-22 | 4.3E+20 | 836.34 | 1569.38 | NVIDIA Tesla V100 PCIe | Speech | Speech completion | Facebook AI | Industry | Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli | 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | https://arxiv.org/abs/2006.11477 | 2.30E+04 | 4.10E+02 | SOTA improvement | 3.17E+08 | LibriSpeech | 4.37E+10 | Industry ||||||||||||
87 | KEPLER | 2020-11-23 | 1.2E+20 | 227.71 | 437.97 | NVIDIA Tesla V100 PCIe | Language | Relation Extraction | Tsinghua University, Princeton, Mila - Quebec AI Institute, University of Montreal, HEC, CIFAR | Academia | Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang | 2020 | KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation | https://arxiv.org/abs/1911.06136 | 9.60E+01 | 1.10E+08 | Wikipedia+BookCorpus | 3.30E+09 | Academia |||||||||||||
88 | CPM-Large | 2020-12-01 | 1.8E+21 | 3394.57 | 6569.51 | NVIDIA Tesla V100 PCIe | Language | Tsinghua University, BAAI | Industry - Academia Collaboration | Z Zhang, X Han, H Zhou, P Ke, Y Gu, D Ye, Y Qin, Y Su | 2020 | CPM: A Large-scale Generative Chinese Pre-trained Language Model | https://arxiv.org/abs/2012.00413 | 2.15E+04 | 1.00E+01 | 2.60E+09 | 1.67E+10 | Left-To-Right Transformer Decoder | Industry | |||||||||||||
89 | AraGPT2-Mega | 2020-12-31 | 2.0E+21 | 3685.43 | Google TPU V3 | Language | American University of Beirut | Academia | W Antoun, F Baly, H Hajj | 2020 | AraGPT2: Pre-Trained Transformer for Arabic Language Generation | https://arxiv.org/abs/2012.15520 | 4.00E+00 | 1.50E+09 | 8.80E+09 | Academia | ||||||||||||||||
90 | NEO (DLRM-2022) | 2021-09-15 | 1.1E+21 | 1661.08 | 2394.07 | NVIDIA A100 | Recommendation | Facebook AI | Industry | D Mudigere, Y Hao, J Huang, A Tulloch | 2021 | Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models | https://arxiv.org/abs/2104.05158 | 2.00E+00 | 3.00E+12 | Industry |||||||||||||||||
91 | Switch | 2021-01-11 | 8.2E+22 | 149825.60 | Google TPU V3 | Language | Text autocompletion | Google Brain | Industry | William Fedus, Barret Zoph, Noam Shazeer | 2021 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | https://arxiv.org/abs/2101.03961 | 8.00E+01 | 1.60E+12 | 4.32E+11 | Switch Transformer | Industry | ||||||||||||||
92 | CLIP (ViT L/14@336px) | 2021-01-05 | 1.1E+22 | 20191.82 | 40146.99 | NVIDIA Tesla V100 PCIe | Multimodal | Zero-shot image classification | OpenAI | Industry | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever | 2021 | Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/abs/2103.00020 | 7.37E+04 | 1.30E+02 | 3.70E+08 | Custom image-text pairs from the internet | 4.00E+08 | 1.10E+08 | 86016 | Industry ||||||||||
93 | DALL-E | 2021-01-05 | 4.7E+22 | 86274.16 | 171537.13 | NVIDIA Tesla V100 PCIe | Drawing | Text-to-image | OpenAI | Industry | Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever | 2021 | Zero-Shot Text-to-Image Generation | https://openai.com/blog/dall-e/ | 8.00E+01 | 1.20E+10 | 2.50E+08 | Industry | ||||||||||||||
94 | Meta Pseudo Labels | 2021-03-01 | 2.1E+23 | 369462.82 | Vision | Image Classification | Google AI, Brain Team | Industry | Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, and Quoc V. Le | 2021 | Meta Pseudo Labels | https://arxiv.org/abs/2003.10580 | 1.31E+02 | SOTA improvement | 4.80E+08 | ImageNet | 1.30E+08 | Industry ||||||||||||||
95 | GPT-Neo | 2021-03-21 | 7.9E+21 | 13685.99 | Language | EleutherAI | Research collective | 2021 | GPT-Neo | https://www.eleuther.ai/projects/gpt-neo/ | 2.70E+09 | The Pile | 8.86E+11 | Industry | ||||||||||||||||||
96 | PanGu-α | 2021-04-25 | 5.8E+22 | 97802.06 | Ascend 910 | Language | PanGu-α team | Industry | Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, Zhenzhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | https://arxiv.org/abs/2104.12369 | 5.00E+00 | 2.07E+11 | Custom dataset | 2.00E+11 | unidirectional transformer decoder | Industry ||||||||||||
97 | GPT-J-6B | 2021-05-01 | 1.5E+22 | 25176.80 | Language | EleutherAI | Research collective | Aran Komatsuzaki | 2021 | GPT-J-6B: 6B JAX-Based Transformer | https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/ | 6.05E+09 | 1.60E+11 | Industry |||||||||||||||||||||
98 | HyperCLOVA | 2021-05-25 | 6.3E+22 | 103802.31 | Language | Naver Corp | Industry | 2021 | HyperCLOVA | https://www.navercorp.com/promotion/pressReleasesView/30546 | 2.04E+11 | 5.60E+11 | Industry |||||||||||||||||||||
99 | ProtT5-XXL | 2021-05-04 | 7.4E+22 | 123918.36 | Google TPU V3 | Other | Proteins | Technical University of Munich, Med AI Technology, Google AI, NVIDIA, Oak Ridge National Laboratory | Industry - Academia Collaboration | A Elnaggar, M Heinzinger, C Dallago, G Rihawi | 2021 | ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning | https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3 | 5.70E+01 | 1.10E+10 | UniRef; BFD | 3.93E+11 | Industry ||||||||||||||
100 | ERNIE 3.0 | 2021-07-05 | 2.4E+18 | 3.83 | Language | Baidu Inc. | Industry | Y Sun, S Wang, S Feng, S Ding, C Pang | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | http://research.baidu.com/Blog/index-view?id=160 | 1.00E+00 | 1.00E+10 | 6.68E+11 | Transformer-XL: Transformer with auxiliary recurrence memory module | Industry