1
System
Publication date
Training compute (FLOPs)
Method 1 Training cost (2020 USD)
Method 2 Training cost (2020 USD)
Hardware model
Domain
Task
Organization(s)
Organization Categorization
Author(s)
Year
Reference
Link
Training core-hours
Citations
Inclusion criteria
Parameters
Training dataset
Training dataset size (datapoints)
Hidden layers
Inference compute (FLOPs)
Training time (hours)
Equivalent training time (hours)
Inference time (ms)
Training dataset size (GB)
Approach
Dense or sparse model
Training objective
Architecture
Compute Sponsor Categorization
2
GPU DBNs
2009-06-15
1.0E+15
0.05
0.06
NVIDIA GeForce GTX 280
Other
Stanford
Academia
R Raina, A Madhavan, AY Ng
2009
Large-scale Deep Unsupervised Learning using Graphics Processors
http://www.machinelearning.org/archive/icml2009/papers/218.pdf7.89E+021.00E+081.00E+06Academia
3
6-layer MLP (MNIST)
2010-03-01
1.3E+14
0.01
0.01
NVIDIA GeForce GTX 280
Vision
Character recognition
IDSIA ; University of Lugano & SUPSI
Academia
Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, Juergen Schmidhuber
2010
Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition
https://arxiv.org/abs/1003.03581.26E+03Highly cited1.21E+07MNIST6.00E+04Academia
4
Feedforward NN
2010-05-13
3.5E+14
0.01
Vision
Digit recognition
University of Montreal
Academia
X Glorot, Y Bengio
2010
Understanding the difficulty of training deep feedforward neural networks
https://proceedings.mlr.press/v9/glorot10a.html1.33E+04Highly cited7.08E+06MNIST1.40E+07Academia
5
RNN 500/10 + RT09 LM (NIST RT05)
2010-09-26
3.4E+15
0.11
Speech
Transcription
Brno University of Technology, Johns Hopkins University
Academia
T. Mikolov, M. Karafiat, L. Burget, J. Cernocký, and S. Khudanpur
2010
Recurrent neural network based language model.
https://www.researchgate.net/publication/221489926_Recurrent_neural_network_based_language_model5.67E+03Highly cited5.27E+06NIST RT055.40E+061.05E+07Academia
6
KN5 LM + RNN 400/10 (WSJ)
2010-09-26
6.1E+16
2.03
Speech
Transcription
Brno University of Technology, Johns Hopkins University
Academia
T. Mikolov, M. Karafiat, L. Burget, J. Cernocký, and S. Khudanpur
2010
Recurrent neural network based language model.
https://www.researchgate.net/publication/221489926_Recurrent_neural_network_based_language_model5.67E+03Highly cited8.00E+07WSJ6.40E+061.60E+08Academia
7
MCDNN (MNIST)
2012-02-13
3.7E+15
0.08
Vision
Character recognition
IDSIA
Academia
D Cireşan, U Meier, J Schmidhuber
2012
Multi-column Deep Neural Networks for Image Classification
https://arxiv.org/abs/1202.2745v14.83E+03Highly cited1.99E+06MNIST6.00E+0432.59E+074900.021Academia
8
Dropout (MNIST)
2012-06-03
6.0E+15
0.12
0.10
NVIDIA GeForce GTX 580
Vision
Character recognition
University of Toronto
Academia
GE Hinton, N Srivastava, A Krizhevsky
2012
Improving neural networks by preventing co-adaptation of feature detectors
https://arxiv.org/abs/1207.05806.68E+03Highly cited5.59E+06MNIST6.00E+0421.12E+070.021Academia
9
AlexNet
2012-09-30
4.7E+17
8.86
8.00
NVIDIA GeForce GTX 580
Vision
Image classification
University of Toronto
Academia
Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
2012
ImageNet Classification with Deep Convolutional Neural Networks
https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html8.51E+04Highly cited6.00E+07ImageNet1.20E+068Academia
10
DQN
2013-01-01
2.3E+15
0.04
Games
Atari
DeepMind
Industry
V Mnih, K Kavukcuoglu, D Silver, A Graves
2013
Playing Atari with Deep Reinforcement Learning
https://arxiv.org/abs/1312.56026.68E+03Highly cited8.36E+05Industry
11
Mitosis
2013-09-22
1.4E+17
2.00
Vision
IDSIA
Academia
Dan C. Cireşan, Alessandro Giusti, Luca M. Gambardella, Jürgen Schmidhuber
2013
Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks
https://link.springer.com/chapter/10.1007/978-3-642-40763-5_511.46E+03
ICPR 2012 mitosis detection competition winner
3.72E+041.00E+06Academia
12
Word2Vec (large)
2013-10-16
3.9E+16
0.55
Language
Semantic embedding
Google
Industry
T Mikolov, I Sutskever, K Chen, GS Corrado
2013
Distributed Representations of Words and Phrases and their Compositionality
https://arxiv.org/abs/1310.45462.87E+04Highly cited6.92E+096.92E+05
Predict nearby words
Recurrent Neural Network
Industry
13
Visualizing CNNs
2013-11-12
5.3E+17
7.30
9.02
NVIDIA GeForce GTX 580
Vision
NYU
Academia
MD Zeiler, R Fergus
2013
Visualizing and Understanding Convolutional Networks
https://arxiv.org/abs/1311.29011.30E+04Highly cited
Predict nearby words
Academia
14
TransE
2013-12-05
1.3E+18
17.58
Other
Entity embedding
CNRS, Google
Industry - Academia Collaboration
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko
2013
Translating Embeddings for Modeling Multi-relational Data
https://papers.nips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html4.00E+031.70E+07Industry
15
Image generation
2013-12-20
4.8E+14
0.01
Intel Xeon
Vision
Image clustering
University of Amsterdam
Academia
DP Kingma, M Welling
2013
Auto-Encoding Variational Bayes
https://arxiv.org/abs/1312.61141.56E+04Highly citedMNIST6.00E+04Academia
16
Image Classification with the Fisher Vector: Theory and Practice
2013-06-12
9.1E+13
0.00
Intel Xeon E5-2470
Vision
Image Classification
Universidad Nacional de Cordoba, Xerox Research Centre Europe, Intelligent Systems Lab Amsterdam, University of Amsterdam, LEAR Team, INRIA Grenoble
Industry - Academia Collaboration
Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek
2013
Image Classification with the Fisher Vector: Theory and Practice
https://hal.inria.fr/hal-00830491v2/document1707Highly citedImageNet2
17
GANs
2014-06-10
5.2E+17
6.09
Drawing
Image generation
Université de Montréal
Academia
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
2014
Generative Adversarial Networks
https://arxiv.org/abs/1406.26613.69E+04Highly citedCIFAR-106.00E+04Academia
18
SPPNet
2014-06-18
6.1E+18
70.97
65.07
NVIDIA GeForce GTX TITAN
Vision
Image classification
Microsoft, Xi’an Jiaotong University, University of Science and Technology of China
Industry - Academia Collaboration
2014
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
https://arxiv.org/abs/1406.47297.41E+03Highly citedImagenet-1k1.28E+06Industry
19
RNNsearch-50*
2014-09-01
1.6E+18
17.57
81.48
NVIDIA Quadro K6000
Language
Translation
Université de Montréal, Jacobs University Bremen
Academia
D Bahdanau, K Cho, Y Bengio
2014
Neural Machine Translation by Jointly Learning to Align and Translate
https://arxiv.org/abs/1409.04731.92E+04Highly cited
WMT'14 + selection
3.84E+08Academia
20
VGG16
2014-09-04
8.5E+18
93.11
82.80
NVIDIA GeForce GTX TITAN Black
Vision
University of Oxford
Academia
Karen Simonyan; Andrew Zisserman
2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
https://arxiv.org/abs/1409.15566.13E+04Highly cited1.38E+08ILSVRC-20121.30E+06161.53E+10Academia
21
Seq2Seq LSTM
2014-09-10
7.3E+18
79.60
Language
Translation
Google
Industry
I Sutskever, O Vinyals, QV Le
2014
Sequence to Sequence Learning with Neural Networks
https://arxiv.org/abs/1409.32151.57E+04Highly cited3.84E+08WMT'14 dataset3.84E+08Industry
22
ADAM (CIFAR-10)
2014-12-22
6.0E+16
0.60
Vision
Image classification
University of Amsterdam, OpenAI, University of Toronto
Industry - Academia Collaboration
DP Kingma, J Ba
2014
Adam: A Method for Stochastic Optimization
https://arxiv.org/abs/1412.69808.11E+04Highly citedIndustry
23
MSRA (C, PReLU)
2015-01-09
2.4E+19
238.36
2166.22
NVIDIA Tesla K40
Vision
Image classification
Microsoft Research
Industry
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
https://arxiv.org/abs/1502.01852
1.41E+04
Highly cited
8.70E+07
Imagenet-1k
1.28E+06
Industry
24
GoogLeNet / InceptionV1
2015-06-07
1.6E+18
14.16
Vision
Image classification
Google, University of Michigan, University of North Carolina
Industry - Academia Collaboration (Industry leaning)
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
2015
Going deeper with convolutions
https://arxiv.org/abs/1409.48423.28E+04Highly cited6.80E+06ILSVRC 20141.20E+0622Industry
25
AlphaGo Fan
2015-10-01
3.8E+20
3076.07
Games
Go
Google DeepMind
Industry
D Silver, A Huang, CJ Maddison, A Guez, L Sifre
2015
Mastering the game of Go with deep neural networks and tree search
https://www.nature.com/articles/nature24270.epdf?author_access_token=VJXbVjaSHxFoctQQ4p2k4tRgN0jAjWel9jnR3ZoTv0PVW4gB86EEpGqTRDtpIz-2rmo8-KG06gqVobU5NSCFeHILHcVFUeMsbvwS-lxjqQGg98faovwjxeTUgZAUMnRQ5.18E+03
SOTA improvement
8.21E+06Industry
26
DeepSpeech2
2015-12-08
2.6E+19
199.71
150.78
NVIDIA GeForce GTX TITAN X
Speech
Speech recognition
Baidu Research - Silicon Valley AI Lab
Industry
D Amodei, S Ananthanarayanan
2015
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
https://arxiv.org/abs/1512.025952.21E+03Highly cited3.80E+079.80E+09111.80E+09Industry
27
ResNet-152 (ImageNet)
2015-12-10
1.2E+19
92.03
Vision
Image classification
Microsoft
Industry
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
2015
Deep Residual Learning for Image Recognition
https://arxiv.org/abs/1512.033858.58E+04Highly cited6.00E+07ILSVRC 20121.20E+061522.26E+10138Industry
28
AlphaGo Lee
2016-01-27
1.9E+21
14041.80
Google TPU V1
Games
Go
DeepMind
Industry
D Silver, A Huang, CJ Maddison, A Guez, L Sifre
2016
Mastering the game of Go with deep neural networks and tree search
https://www.nature.com/articles/nature169611.08E+04Highly cited2.94E+07Industry
29
R-FCN
2016-06-21
6.1E+16
0.40
5.51
NVIDIA Tesla K40
Vision
Object detection
Microsoft Research, Tsinghua University
Industry - Academia Collaboration (Industry leaning)
Jifeng Dai, Y. Li, Kaiming He, and Jian Sun
2016
R-fcn: Object detection via region-based fully convolutional networks.
https://arxiv.org/abs/1605.064094.49E+03
PASCAL VOC (2007 and 2012 versions) + MS COCO
9.44E+0412.06567222170Industry
30
Part-of-speech tagging model
2016-07-21
1.5E+17
0.97
Language
POS tagging
University of Toronto
Academia
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton
2016
Layer Normalization.
https://arxiv.org/abs/1607.064504.13E+03Highly cited12Academia
31
Named Entity Recognition model
2016-07-21
9.7E+16
0.63
Language
Named Entity Recognition model
University of Toronto
Academia
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton
2016
Layer Normalization.
https://arxiv.org/abs/1607.064504.13E+03Highly cited8Academia
32
GNMT
2016-09-26
6.9E+21
42275.13
307573.50
NVIDIA Tesla K80
Language
Translation
Google
Industry
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
https://research.google/pubs/pub45610/4.50E+03Highly cited2.78E+083.60E+08Industry
33
Xception
2016-10-07
4.4E+19
267.30
1961.34
NVIDIA Tesla K80
Vision
Image classification
Google
Industry
François Chollet
2016
Xception: Deep Learning with Depthwise Separable Convolutions
https://arxiv.org/abs/1610.023575.84E+03Highly cited2.29E+07JFT3.50E+081.68E+10Industry
34
NASv3 (CIFAR-10)
2016-11-05
2.2E+21
13069.35
Vision
Google Brain
Industry
Barret Zoph, Quoc V. Le
2016
Neural Architecture Search with Reinforcement Learning
https://arxiv.org/abs/1611.015782.97E+03Highly cited3.74E+0739Industry
35
Libratus
2017-01-01
1.1E+21
6253.49
Intel Xeon E5-2695 v3
Games
Poker
Carnegie Mellon University
Academia
N Brown, T Sandholm
2017
Libratus: The Superhuman AI for No-Limit Poker
https://www.cs.cmu.edu/~noamb/papers/17-IJCAI-Libratus.pdf6.40E+01
SOTA improvement
3000000Academia
36
AlphaGo Master
2017-01-01
1.5E+23
852748.08
Google TPU V1
Games
Go
DeepMind
Industry
D Silver, J Schrittwieser, K Simonyan, I Antonoglou
2017
Mastering the game of Go without human knowledge
https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge5.81E+03Highly citedIndustry
37
DeepStack
2017-01-06
1.5E+14
0.00
Games
Poker
University of Alberta, Charles University, Czech Technical University
Academia
Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling
2017
DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker
https://arxiv.org/abs/1701.017246.18E+022.50E+061.00E+07Academia
38
MoE
2017-01-23
9.4E+19
525.39
8484.35
NVIDIA Tesla K40
Language
Language modelling / Machine translation
Google Brain, Jagiellonian University, Cracow
Industry - Academia Collaboration (Industry leaning)
N Shazeer, A Mirhoseini, K Maziarz, A Davis
2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
https://arxiv.org/abs/1701.065386.87E+028.70E+091.00E+11Sparse
Long Short-Term Memory Mixture-Of-Experts
Industry
39
Transformer
2017-06-12
7.4E+18
37.13
111.17
NVIDIA Tesla P100
Language
Translation
Google Brain ; Google Research
Industry
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
2017
Attention Is All You Need
https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf2.52E+04Highly cited2.13E+083.60E+085.40E+10672Industry
40
JFT
2017-08-04
4.8E+20
2311.64
21396.42
NVIDIA Tesla K80
Vision
Google Research, CMU
Industry - Academia Collaboration
Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta
2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era.
https://arxiv.org/abs/1707.029681.14E+03Highly citedJFT-300M3.00E+08Industry
41
OpenAI TI7 DOTA 1v1
2017-08-11
6.0E+20
2873.99
Games
DOTA
OpenAI
Industry
A Radford, K Narasimhan, T Salimans, I Sutskever
2017
Dota 2
https://openai.com/five/
NA
1.50E+08
Industry
42
AlphaGo Zero
2017-10-19
3.4E+23
1544149.42
Google TPU V1
Games
Go
DeepMind
Industry
D Silver, J Schrittwieser, K Simonyan, I Antonoglou
2017
Mastering the game of Go without human knowledge
https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge5.81E+03Highly cited4.64E+075.80E+09Industry
43
PNASNet-5
2017-12-02
6.6E+19
289.74
991.48
NVIDIA Tesla P100
Johns Hopkins University, Stanford, Google AI
Industry - Academia Collaboration (Industry leaning)
C Liu, B Zoph, M Neumann, J Shlens
2017
Progressive Neural Architecture Search
https://arxiv.org/abs/1712.005591.34E+03Highly citedImagenet-1k1.28E+06Industry
44
AlphaZero
2017-12-05
3.7E+22
162054.70
Google TPU V2
Games
DeepMind
Industry
D Silver, T Hubert, J Schrittwieser, I Antonoglou
2017
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
https://arxiv.org/abs/1712.018151.08E+03Highly cited7.00E+05ScoreIndustry
45
IMPALA
2018-02-05
1.7E+20
709.79
2553.82
NVIDIA Tesla P100
Games
Atari
DeepMind
Industry
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu
2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
https://arxiv.org/abs/1802.015616.75E+021.60E+062.40E+11Industry
46
AmoebaNet-A (F=448)
2018-02-05
3.9E+20
1628.35
5858.75
NVIDIA Tesla P100
Vision
Image classification
Google Brain
Industry
Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le
2018
Regularized Evolution for Image Classifier Architecture Search
https://arxiv.org/abs/1802.015481.71E+03Highly cited4.69E+08Imagenet-1k1.28E+06Industry
47
YOLOv3
2018-04-08
5.1E+19
202.99
295.76
NVIDIA GeForce GTX TITAN X
Vision
Object detection
University of Washington
Academia
Joseph Redmon, Ali Farhadi
2018
YOLOv3: An Incremental Improvement
https://arxiv.org/abs/1804.027677.71E+03Highly cited1.06E+08ImageNet1.28E+067.10E+10Academia
48
GPT
2018-06-01
1.8E+19
68.72
NVIDIA Quadro P600
Language
OpenAI
Industry
A Radford, K Narasimhan, T Salimans, I Sutskever
2018
Improving Language Understanding by Generative Pre-Training
https://openai.com/blog/language-unsupervised/2.26E+03Highly cited1.17E+08BooksCorpus1.00E+093.00E+10Industry
49
Population-based DRL
2018-07-03
3.5E+19
130.36
Games
Capture the flag
DeepMind
Industry
Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel
2018
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
https://arxiv.org/abs/1807.012814.34E+021.22E+086.00E+10Industry
50
BigGAN-deep 512x512
2018-09-28
3.0E+21
10448.44
Drawing
Image generation
Heriot-Watt University, DeepMind
Industry - Academia Collaboration
A Brock, J Donahue, K Simonyan
2018
Large Scale GAN Training for High Fidelity Natural Image Synthesis
https://arxiv.org/abs/1809.110961.98E+03Highly cited1.13E+08JFT-300M2.92E+08Industry
51
BERT-Large
2018-10-11
2.9E+20
999.93
Google TPU V2
Language
Next sentence prediction
Google AI
Industry
J Devlin, MW Chang, K Lee, K Toutanova
2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
https://arxiv.org/abs/1810.048052.38E+04Highly cited3.40E+083.30E+097.90E+10Industry
52
Decoupled weight decay regularization
2019-01-04
2.5E+18
8.07
Vision
Image classification
University of Freiburg
Academia
Ilya Loshchilov and Frank Hutter
2019
Decoupled weight decay regularization.
https://arxiv.org/abs/1711.051012.06E+033.65E+07CIFAR-105.00E+041.73E+09Academia
53
Hanabi 4 player
2019-02-01
9.2E+16
0.29
0.34
NVIDIA Tesla V100 PCIe
Games
Hanabi
DeepMind, University of Oxford, Google Brain, Carnegie Mellon University
Industry - Academia Collaboration (Industry leaning)
2019
The Hanabi Challenge: A New Frontier for AI Research
https://arxiv.org/abs/1902.005061.15E+027.64E+05Industry
54
GPT-2
2019-02-14
1.5E+21
4692.89
Language
OpenAI
Industry
A Radford, J Wu, R Child, D Luan, D Amodei
2019
Language Models are Unsupervised Multitask Learners
https://openai.com/blog/better-language-models/1.70E+03Highly cited1.50E+093.00E+093.40E+1240Industry
55
ProxylessNAS
2019-02-23
3.7E+19
114.96
135.04
NVIDIA Tesla V100 PCIe
Vision
MIT
Academia
Han Cai, Ligeng Zhu, and Song Han
2019
ProxylessNAS: Direct neural architecture search on target task and hardware
https://arxiv.org/abs/1812.003329.96E+02ImageNet1.28E+062.63E+112005.1Academia
56
Cross-lingual alignment
2019-04-04
2.6E+18
7.83
Tel Aviv University, MIT
Academia
Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson.
2019
Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing.
https://arxiv.org/abs/1902.094921.29E+023.66E+12Academia
57
MnasNet-A1 + SSDLite
2019-05-29
1.5E+21
4331.00
Vision
Performing image classification and object detection on mobile devices
Google
Industry
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
2019
MnasNet: Platform-Aware Neural Architecture Search for Mobile
https://arxiv.org/abs/1807.116261.43E+03Highly cited4.90E+06MS COCO1.18E+05Industry
58
MnasNet-A3
2019-05-29
1.5E+21
4331.00
Vision
Performing image classification and object detection on mobile devices
Google
Industry
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
2019
MnasNet: Platform-Aware Neural Architecture Search for Mobile
https://arxiv.org/abs/1807.116261.43E+03Highly cited5.20E+06ImageNet1.28E+06Industry
59
DLRM-2020
2019-05-31
4.0E+18
11.53
14.60
NVIDIA Tesla V100 PCIe
Recommendation
Facebook AI
Industry
M Naumov, D Mudigere, HJM Shi, J Huang
2019
Deep Learning Recommendation Model for Personalization and Recommendation Systems
https://arxiv.org/abs/1906.000911.40E+021.00E+11Industry
60
FTW
2019-05-31
7.3E+21
21045.02
Games
Capture the flag
DeepMind
Industry
M Jaderberg, WM Czarnecki, I Dunning, L Marris
2019
Human-level performance in 3D multiplayer games with population-based reinforcement learning
https://deepmind.com/research/publications/capture-the-flag4.25E+021.26E+081.21E+12Industry
61
ObjectNet
2019-09-06
1.9E+19
50.79
Vision
Object recognition
MIT
Academia
Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, and Boris Katz
2019
Objectnet: A large-scale bias-controlled dataset for pushing the limits of object recognition models
https://papers.nips.cc/paper/2019/file/97af07a14cacba681feacf3012730892-Paper.pdf2.39E+03Highly cited3.80E+07Internal data5.00E+04108Academia
62
Hide and Seek
2019-09-17
3.0E+17
0.80
Games
Hide and Seek
OpenAI
Industry
B Baker, I Kanitscheider, T Markov, Y Wu
2019
Emergent Tool Use From Multi-Agent Autocurricula
https://openai.com/blog/emergent-tool-use/2.24E+021.60E+063.17E+10Industry
63
Megatron-LM (Original, 8.3B)
2019-09-17
9.1E+21
24117.93
33212.51
NVIDIA Tesla V100 PCIe
Language
NVIDIA
Industry
M Shoeybi, M Patwary, R Puri, P LeGresley
2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
https://arxiv.org/abs/1909.080532.46E+028.30E+093.48E+101.80E+13Industry
64
Megatron-BERT
2019-09-17
5.7E+22
151068.35
208034.39
NVIDIA Tesla V100 PCIe
Language
NVIDIA
Industry
M Shoeybi, M Patwary, R Puri, P LeGresley
2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
https://arxiv.org/abs/1909.080532.46E+023.90E+093.48E+10Industry
65
AlphaX-1
2019-10-02
7.6E+18
19.91
24.10
NVIDIA GeForce GTX 1080 Ti
Vision
Neural architecture search for computer vision
Brown and Facebook AI Research
Industry - Academia Collaboration (Academia leaning)
Linnan Wang, Yiyang Zhao, Yuu Jinnai, Yuandong Tian, Rodrigo Fonseca
2019
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search
https://arxiv.org/abs/1903.110595.00E+015.79E+08ImageNetIndustry
66
Rubik's cube
2019-10-15
8.5E+20
2204.62
3102.27
NVIDIA Tesla V100 PCIe
Robotics
OpenAI
Industry
Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
2019
Solving Rubik’s Cube with a Robot Hand
https://arxiv.org/abs/1910.071132.27E+022.78E+076.24E+07Industry
67
T5-3B
2019-10-23
1.0E+22
25777.12
Google TPU V3
Language
Text autocompletion
Google
Industry
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
https://arxiv.org/abs/1910.106831.54E+03Highly cited3.00E+09
Colossal Clean Crawled Corpus (C4)
1.50E+11
Transformer (encoder-decoder performed best)
Industry
68
T5-11B
2019-10-23
4.1E+22
105686.20
Google TPU V3
Language
Text autocompletion
Google
Industry
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
https://arxiv.org/abs/1910.106831.54E+03Highly cited1.10E+10
Colossal Clean Crawled Corpus (C4)
1.50E+11
Transformer (encoder-decoder performed best)
Industry
69
AlphaStar
2019-10-30
2.0E+23
512765.27
Google TPU V3
Games
StarCraft
DeepMind
Industry
Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver
2019
Grandmaster level in StarCraft II using multi-agent reinforcement learning
https://www.deepmind.com/blog/alphastar-grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning1.04E+03Highly cited1.39E+08Industry
70
MuZero
2019-11-19
4.8E+19
121.18
Google TPU V3
Games
Atari Games
DeepMind
Industry
J Schrittwieser, I Antonoglou, T Hubert, K Simonyan
2019
Mastering Atari Go Chess and Shogi by Planning with a Learned Model
https://arxiv.org/abs/1911.08265v24.12E+02
SOTA improvement
3.69E+072.00E+10Industry
71
OpenAI Five Rerun
2019-12-13
1.3E+22
32217.13
Games
Dota 2
OpenAI
Industry
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław "Psyho" Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
2019
Dota 2 with Large Scale Deep Reinforcement Learning
https://cdn.openai.com/dota-2.pdf3.49E+02
SOTA improvement
1.59E+085.31E+10Industry
72
OpenAI Five
2019-12-13
6.7E+22
166042.11
Games
Dota 2
OpenAI
Industry
J Raiman, S Zhang, F Wolski
2019
Dota 2 with Large Scale Deep Reinforcement Learning
https://arxiv.org/abs/1912.066804.54E+02
SOTA improvement
1.59E+084.54E+11Industry
73
DLRM-2021
2020-07-01
3.0E+20
636.66
1094.92
NVIDIA Tesla V100 PCIe
Recommendation
Facebook AI
Industry
D Mudigere, Y Hao, J Huang, A Tulloch
2020
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models
https://www.arxiv-vanity.com/papers/2104.05158/2.00E+001.00E+12Industry
74
AlphaFold
2020-01-15
1.0E+20
241.59
Other
Protein folding prediction
DeepMind
Industry
Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu & Demis Hassabis
2020
Improved protein structure prediction using potentials from deep learning
https://www.nature.com/articles/s41586-019-1923-78.40E+026.90E+07ScoreIndustry
75
Meena
2020-01-28
1.1E+23
263099.94
Google TPU V3
Language
Text autocompletion
Google AI
Industry
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
2020
Towards a Human-like Open-Domain Chatbot
https://arxiv.org/abs/2001.099772.57E+022.60E+094.00E+10
Evolved Transformer seq2seq model
Industry
76
ALBERT-xxlarge
2020-02-09
2.5E+21
5924.43
Google TPU V3
Language
Google Research, Toyota Technological Institute at Chicago
Industry - Academia Collaboration
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut
2020
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
https://arxiv.org/abs/1909.119422.18E+03Highly cited2.35E+083.30E+092.50E+1217408Industry
77
Turing NLG
2020-02-13
1.6E+22
37799.51
58395.62
NVIDIA Tesla V100 PCIe
Language
Text autocompletion
Microsoft
Industry
C Rosset
2020
Turing-NLG: A 17-billion-parameter language model by Microsoft
https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/3.40E+011.70E+103.48E+103.60E+13
Next token prediction
Industry
78
ProGen
2020-03-13
2.7E+20
623.75
Google TPU V3
Other
Protein generation
Salesforce Research, Stanford
Industry - Academia Collaboration
A Madani, B McCann, N Naik, NS Keskar
2020
ProGen: Language Modeling for Protein Generation
https://www.biorxiv.org/content/10.1101/2020.03.07.982272v28.60E+044.60E+011.20E+09Industry
79
GPT-3 175B (davinci)
2020-04-28
3.1E+23
691184.67
1131415.12
NVIDIA Tesla V100 PCIe
Language
Text autocompletion
OpenAI
Industry
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
2020
Language Models are Few-Shot Learners
https://arxiv.org/abs/2005.141651.53E+03Highly cited1.75E+11
CommonCrawl; WebText2; Books1; Books2; Wikipedia
3.74E+117.40E+1445TBIndustry
80
Once for All
2020-04-29
1.8E+21
4010.23
6569.51
NVIDIA Tesla V100 PCIe
Vision
MIT-IBM Watson AI Lab
Industry
Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han
2020
Once for all: Train one network and specialize it for efficient deployment.
https://arxiv.org/abs/1908.097911.20E+033.71E+027.70E+06ImagenetIndustry
81
iGPT-L
2020-06-17
8.9E+21
19092.67
32482.56
NVIDIA Tesla V100 PCIe
Drawing
Image completion
OpenAI
Industry
Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
2020
Generative Pretraining from Pixels
https://openai.com/blog/image-gpt/6.00E+041.82E+021.36E+09ILSVRC 20129.60E+06Industry
82
iGPT-XL
2020-06-17
3.3E+22
70793.03
120440.96
NVIDIA Tesla V100 PCIe
Drawing
Image completion
OpenAI
Industry
Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
2020
Generative Pretraining from Pixels
https://openai.com/blog/image-gpt/1.82E+026.80E+09ILSVRC 20129.60E+06Industry
83
GShard (600B)
2020-06-30
1.3E+22
27609.81
Google TPU V3
Language
Translation
Google Brain
Industry
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen
2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
https://arxiv.org/abs/2006.166681.93E+059.10E+016.00E+112.60E+11Industry
84
GShard (dense)
2020-06-30
2.6E+22
55219.61
Google TPU V3
Language
Translation
Google Brain
Industry
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen
2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
https://arxiv.org/abs/2006.166682.06E+069.10E+012.30E+092.60E+11Industry
85
ViT-H/14
2020-09-28
1.3E+22
25757.45
Google TPU V3
Vision
Image representation
Google Research, Brain Team
Industry
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
https://openreview.net/forum?id=YicbFdNTTy6.00E+041.91E+03Highly citedImagenet-1k1.28E+06Industry
86
wav2vec 2.0 LARGE
2020-10-22
4.3E+20
836.34
1569.38
NVIDIA Tesla V100 PCIe
Speech
Speech completion
Facebook
Industry
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
https://arxiv.org/abs/2006.114772.30E+044.10E+02
SOTA Improvement
3.17E+08LibriSpeech4.37E+10Industry
87
KEPLER
2020-11-23
1.2E+20
227.71
437.97
NVIDIA Tesla V100 PCIe
Language
Relation Extraction
Tsinghua University, Princeton, Mila - Quebec AI Institute, Université de Montréal, HEC, CIFAR
Academia
Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang.
2020
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation.
https://arxiv.org/abs/1911.061369.60E+011.10E+08
Wikipedia+BookCorpus
3.30E+09Academia
88
CPM-Large
2020-12-01
1.8E+21
3394.57
6569.51
NVIDIA Tesla V100 PCIe
Language
Tsinghua University, BAAI
Industry - Academia Collaboration
Z Zhang, X Han, H Zhou, P Ke, Y Gu, D Ye, Y Qin, Y Su
2020
CPM: A Large-scale Generative Chinese Pre-trained Language Model
https://arxiv.org/abs/2012.004132.15E+041.00E+012.60E+091.67E+10
Left-To-Right Transformer Decoder
Industry
89
AraGPT2-Mega
2020-12-31
2.0E+21
3685.43
Google TPU V3
Language
American University of Beirut
Academia
W Antoun, F Baly, H Hajj
2020
AraGPT2: Pre-Trained Transformer for Arabic Language Generation
https://arxiv.org/abs/2012.155204.00E+001.50E+098.80E+09Academia
90
NEO (DLRM-2022)
2021-09-15
1.1E+21
1661.08
2394.07
NVIDIA A100
Recommendation
Facebook
Industry
D Mudigere, Y Hao, J Huang, A Tulloch
2021
Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
https://arxiv.org/abs/2104.051582.00E+003.00E+12Industry
91
Switch
2021-01-11
8.2E+22
149825.60
Google TPU V3
Language
Text autocompletion
Google Brain
Industry
William Fedus, Barret Zoph, Noam Shazeer
2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
https://arxiv.org/abs/2101.039618.00E+011.60E+124.32E+11
Switch Transformer
Industry
92
CLIP (ViT L/14@336px)
2021-01-05
1.1E+22
20191.82
40146.99
NVIDIA Tesla V100 PCIe
Multimodal
Zero-shot image classification
OpenAI
Industry
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
2021
Learning Transferable Visual Models From Natural Language Supervision
https://arxiv.org/abs/2103.000207.37E+041.30E+023.70E+08
Custom image-text pairs from the internet
4.00E+081.10E+0886016Industry
93
DALL-E
2021-01-05
4.7E+22
86274.16
171537.13
NVIDIA Tesla V100 PCIe
Drawing
Text-to-image
OpenAI
Industry
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
2021
Zero-Shot Text-to-Image Generation
https://openai.com/blog/dall-e/8.00E+011.20E+102.50E+08Industry
94
Meta Pseudo Labels
2021-03-01
2.1E+23
369462.82
Vision
Image Classification
Google AI, Brain team
Industry
Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, and Quoc V. Le
2021
Meta pseudo labels
https://arxiv.org/abs/2003.105801.31E+02
SOTA Improvement
4.80E+08ImageNet1.30E+08Industry
95
GPT-Neo
2021-03-21
7.9E+21
13685.99
Language
EleutherAI
Research collective
2021
GPT-Neo
https://www.eleuther.ai/projects/gpt-neo/
2.70E+09
The Pile
8.86E+11
Industry
96
PanGu-α
2021-04-25
5.8E+22
97802.06
Ascend 910
Language
PanGu-α team
Industry
Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, Zhenzhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian
2021
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
https://arxiv.org/abs/2104.123695.00E+002.07E+11Custom dataset2.00E+11
unidirectional transformer decoder
Industry
97
GPT-J-6B
2021-05-01
1.5E+22
25176.80
Language
Research collective
Aran Komatsuzaki
2021
GPT-J-6B: 6B JAX-Based Transformer
https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/
6.05E+09
1.60E+11
Industry
98
HyperClova
2021-05-25
6.3E+22
103802.31
Language
Naver Corp
Industry
2021
Hyperclova
https://www.navercorp.com/promotion/pressReleasesView/30546
2.04E+11
5.60E+11
Industry
99
ProtT5-XXL
2021-05-04
7.4E+22
123918.36
Google TPU V3
Other
Proteins
Technical University of Munich, Med AI Technology, Google AI, NVIDIA, Oak Ridge National Laboratory
Industry - Academia Collaboration
A Elnaggar, M Heinzinger, C Dallago, G Rihawi
2021
ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning
https://www.biorxiv.org/content/10.1101/2020.07.12.199554v35.70E+011.10E+10UniRef; BDF3.93E+11Industry
100
ERNIE 3.0
2021-07-05
2.4E+18
3.83
Language
Baidu Inc.
Industry
Y Sun, S Wang, S Feng, S Ding, C Pang
2021
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
http://research.baidu.com/Blog/index-view?id=1601.00E+001.00E+106.68E+11
Transformer-XL: Transformer with auxiliary recurrence memory module
Industry