A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | AB | AC | AD | AE | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | System | Publication date | Training compute (FLOPs) | Method 1 Training cost (2020 USD) | Method 2 Training cost (2020 USD) | Hardware model | Domain | Task | Organization(s) | Organization Categorization | Author(s) | Year | Reference | Link | Training core-hours | Citations | Inclusion criteria | Parameters | Training dataset | Training dataset size (datapoints) | Hidden layers | Inference compute (FLOPs) | Training time (hours) | Equivalent training time (hours) | Inference time (ms) | Training dataset size (GB) | Approach | Dense or sparse model | Training objective | Architecture | Compute Sponsor Categorization | |
2 | GPU DBNs | 2009-06-15 | 1.0E+15 | 0.05 | 0.06 | NVIDIA GeForce GTX 280 | Other | Stanford | Academia | R Raina, A Madhavan, AY Ng | 2009 | Large-scale Deep Unsupervised Learning using Graphics Processors | http://www.machinelearning.org/archive/icml2009/papers/218.pdf | 7.89E+02 | 1.00E+08 | 1.00E+06 | Academia | |||||||||||||||
3 | 6-layer MLP (MNIST) | 2010-03-01 | 1.3E+14 | 0.01 | 0.01 | NVIDIA GeForce GTX 280 | Vision | Character recognition | IDSIA; University of Lugano & SUPSI | Academia | Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, Juergen Schmidhuber | 2010 | Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition | https://arxiv.org/abs/1003.0358 | 1.26E+03 | Highly cited | 1.21E+07 | MNIST | 6.00E+04 | Academia | ||||||||||||
4 | Feedforward NN | 2010-05-13 | 3.5E+14 | 0.01 | Vision | Digit recognition | University of Montreal | Academia | X Glorot, Y Bengio | 2010 | Understanding the difficulty of training deep feedforward neural networks | https://proceedings.mlr.press/v9/glorot10a.html | 1.33E+04 | Highly cited | 7.08E+06 | MNIST | 1.40E+07 | Academia | ||||||||||||||
5 | RNN 500/10 + RT09 LM (NIST RT05) | 2010-09-26 | 3.4E+15 | 0.11 | Speech | Transcription | Brno University of Technology, Johns Hopkins University | Academia | T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur | 2010 | Recurrent neural network based language model | https://www.researchgate.net/publication/221489926_Recurrent_neural_network_based_language_model | 5.67E+03 | Highly cited | 5.27E+06 | NIST RT05 | 5.40E+06 | 1.05E+07 | Academia | |||||||||||||
6 | KN5 LM + RNN 400/10 (WSJ) | 2010-09-26 | 6.1E+16 | 2.03 | Speech | Transcription | Brno University of Technology, Johns Hopkins University | Academia | T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur | 2010 | Recurrent neural network based language model | https://www.researchgate.net/publication/221489926_Recurrent_neural_network_based_language_model | 5.67E+03 | Highly cited | 8.00E+07 | WSJ | 6.40E+06 | 1.60E+08 | Academia | |||||||||||||
7 | MCDNN (MNIST) | 2012-02-13 | 3.7E+15 | 0.08 | Vision | Character recognition | IDSIA | Academia | D Ciregan, U Meier, J Schmidhuber | 2012 | Multi-column Deep Neural Networks for Image Classification | https://arxiv.org/abs/1202.2745v1 | 4.83E+03 | Highly cited | 1.99E+06 | MNIST | 6.00E+04 | 3 | 2.59E+07 | 490 | 0.021 | Academia | ||||||||||
8 | Dropout (MNIST) | 2012-06-03 | 6.0E+15 | 0.12 | 0.10 | NVIDIA GeForce GTX 580 | Vision | Character recognition | University of Toronto | Academia | GE Hinton, N Srivastava, A Krizhevsky | 2012 | Improving neural networks by preventing co-adaptation of feature detectors | https://arxiv.org/abs/1207.0580 | 6.68E+03 | Highly cited | 5.59E+06 | MNIST | 6.00E+04 | 2 | 1.12E+07 | 0.021 | Academia | |||||||||
9 | AlexNet | 2012-09-30 | 4.7E+17 | 8.86 | 8.00 | NVIDIA GeForce GTX 580 | Vision | Image classification | University of Toronto | Academia | Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton | 2012 | ImageNet Classification with Deep Convolutional Neural Networks | https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html | 8.51E+04 | Highly cited | 6.00E+07 | ImageNet | 1.20E+06 | 8 | Academia | |||||||||||
10 | DQN | 2013-01-01 | 2.3E+15 | 0.04 | Games | Atari | DeepMind | Industry | V Mnih, K Kavukcuoglu, D Silver, A Graves | 2013 | Playing Atari with Deep Reinforcement Learning | https://arxiv.org/abs/1312.5602 | 6.68E+03 | Highly cited | 8.36E+05 | Industry | ||||||||||||||||
11 | Mitosis | 2013-09-22 | 1.4E+17 | 2.00 | Vision | IDSIA | Academia | Dan C. Cireşan, Alessandro Giusti, Luca M. Gambardella, Jürgen Schmidhuber | 2013 | Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks | https://link.springer.com/chapter/10.1007/978-3-642-40763-5_51 | 1.46E+03 | ICPR 2012 mitosis detection competition winner | 3.72E+04 | 1.00E+06 | Academia | ||||||||||||||||
12 | Word2Vec (large) | 2013-10-16 | 3.9E+16 | 0.55 | Language | Semantic embedding | Google | Industry | T Mikolov, I Sutskever, K Chen, GS Corrado | 2013 | Distributed Representations of Words and Phrases and their Compositionality | https://arxiv.org/abs/1310.4546 | 2.87E+04 | Highly cited | 6.92E+09 | 6.92E+05 | Predict nearby words | Recurrent Neural Network | Industry | ||||||||||||||
13 | Visualizing CNNs | 2013-11-12 | 5.3E+17 | 7.30 | 9.02 | NVIDIA GeForce GTX 580 | Vision | NYU | Academia | MD Zeiler, R Fergus | 2013 | Visualizing and Understanding Convolutional Networks | https://arxiv.org/abs/1311.2901 | 1.30E+04 | Highly cited | Academia |||||||||||||||
14 | TransE | 2013-12-05 | 1.3E+18 | 17.58 | Other | Entity embedding | CNRS, Google | Industry - Academia Collaboration | Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko | 2013 | Translating Embeddings for Modeling Multi-relational Data | https://papers.nips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html | 4.00E+03 | 1.70E+07 | Industry |||||||||||||||||
15 | Image generation | 2013-12-20 | 4.8E+14 | 0.01 | Intel Xeon | Vision | Image clustering | University of Amsterdam | Academia | DP Kingma, M Welling | 2013 | Auto-Encoding Variational Bayes | https://arxiv.org/abs/1312.6114 | 1.56E+04 | Highly cited | MNIST | 6.00E+04 | Academia | ||||||||||||||
16 | Image Classification with the Fisher Vector: Theory and Practice | 2013-06-12 | 9.1E+13 | 0.00 | Intel Xeon E5-2470 | Vision | Image Classification | Universidad Nacional de Córdoba, Xerox Research Centre Europe, Intelligent Systems Lab Amsterdam, University of Amsterdam, LEAR Team, INRIA Grenoble | Industry - Academia Collaboration | Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek | 2013 | Image Classification with the Fisher Vector: Theory and Practice | https://hal.inria.fr/hal-00830491v2/document | 1.71E+03 | Highly cited | ImageNet | 2 |||||||||||||||
17 | GANs | 2014-06-10 | 5.2E+17 | 6.09 | Drawing | Image generation | Université de Montréal | Academia | Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio | 2014 | Generative Adversarial Networks | https://arxiv.org/abs/1406.2661 | 3.69E+04 | Highly cited | CIFAR-10 | 6.00E+04 | Academia |||||||||||||||
18 | SPPNet | 2014-06-18 | 6.1E+18 | 70.97 | 65.07 | NVIDIA GeForce GTX TITAN | Vision | Image classification | Microsoft, Xi’an Jiaotong University, University of Science and Technology of China | Industry - Academia Collaboration | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2014 | Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition | https://arxiv.org/abs/1406.4729 | 7.41E+03 | Highly cited | Imagenet-1k | 1.28E+06 | Industry | ||||||||||||||
19 | RNNsearch-50* | 2014-09-01 | 1.6E+18 | 17.57 | 81.48 | NVIDIA Quadro K6000 | Language | Translation | Université de Montréal, Jacobs University Bremen | Academia | D Bahdanau, K Cho, Y Bengio | 2014 | Neural Machine Translation by Jointly Learning to Align and Translate | https://arxiv.org/abs/1409.0473 | 1.92E+04 | Highly cited | WMT'14 + selection | 3.84E+08 | Academia |||||||||||||
20 | VGG16 | 2014-09-04 | 8.5E+18 | 93.11 | 82.80 | NVIDIA GeForce GTX TITAN Black | Vision | Image classification | University of Oxford | Academia | Karen Simonyan, Andrew Zisserman | 2014 | Very Deep Convolutional Networks for Large-Scale Image Recognition | https://arxiv.org/abs/1409.1556 | 6.13E+04 | Highly cited | 1.38E+08 | ILSVRC-2012 | 1.30E+06 | 16 | 1.53E+10 | Academia | |||||||||||
21 | Seq2Seq LSTM | 2014-09-10 | 7.3E+18 | 79.60 | Language | Translation | Industry | I Sutskever, O Vinyals, QV Le | 2014 | Sequence to Sequence Learning with Neural Networks | https://arxiv.org/abs/1409.3215 | 1.57E+04 | Highly cited | 3.84E+08 | WMT'14 dataset | 3.84E+08 | Industry | |||||||||||||||
22 | ADAM (CIFAR-10) | 2014-12-22 | 6.0E+16 | 0.60 | Vision | Image classification | University of Amsterdam, OpenAI, University of Toronto | Industry - Academia Collaboration | DP Kingma, J Ba | 2014 | Adam: A Method for Stochastic Optimization | https://arxiv.org/abs/1412.6980 | 8.11E+04 | Highly cited | Industry | |||||||||||||||||
23 | MSRA (C, PReLU) | 2015-01-09 | 2.4E+19 | 238.36 | 2166.22 | NVIDIA Tesla K40 | Vision | Image classification | Microsoft Research | Industry | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2015 | Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | https://arxiv.org/abs/1502.01852 | 1.41E+04 | Highly cited | 8.70E+07 | Imagenet-1k | 1.28E+06 | Industry | ||||||||||||
24 | GoogLeNet / InceptionV1 | 2015-06-07 | 1.6E+18 | 14.16 | Vision | Image classification | Google, University of Michigan, University of North Carolina | Industry - Academia Collaboration (Industry leaning) | Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich | 2015 | Going deeper with convolutions | https://arxiv.org/abs/1409.4842 | 3.28E+04 | Highly cited | 6.80E+06 | ILSVRC 2014 | 1.20E+06 | 22 | Industry | |||||||||||||
25 | AlphaGo Fan | 2015-10-01 | 3.8E+20 | 3076.07 | Games | Go | Google DeepMind | Industry | D Silver, A Huang, CJ Maddison, A Guez, L Sifre | 2015 | Mastering the game of Go with deep neural networks and tree search | https://www.nature.com/articles/nature16961 | 5.18E+03 | SOTA improvement | 8.21E+06 | Industry ||||||||||||||||
26 | DeepSpeech2 | 2015-12-08 | 2.6E+19 | 199.71 | 150.78 | NVIDIA GeForce GTX TITAN X | Speech | Speech recognition | Baidu Research - Silicon Valley AI Lab | Industry | D Amodei, S Ananthanarayanan | 2015 | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | https://arxiv.org/abs/1512.02595 | 2.21E+03 | Highly cited | 3.80E+07 | 9.80E+09 | 11 | 1.80E+09 | Industry |||||||||||
27 | ResNet-152 (ImageNet) | 2015-12-10 | 1.2E+19 | 92.03 | Vision | Image classification | Microsoft | Industry | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2015 | Deep Residual Learning for Image Recognition | https://arxiv.org/abs/1512.03385 | 8.58E+04 | Highly cited | 6.00E+07 | ILSVRC 2012 | 1.20E+06 | 152 | 2.26E+10 | 138 | Industry | |||||||||||
28 | AlphaGo Lee | 2016-01-27 | 1.9E+21 | 14041.80 | Google TPU V1 | Games | Go | DeepMind | Industry | D Silver, A Huang, CJ Maddison, A Guez, L Sifre | 2016 | Mastering the game of Go with deep neural networks and tree search | https://www.nature.com/articles/nature16961 | 1.08E+04 | Highly cited | 2.94E+07 | Industry | |||||||||||||||
29 | R-FCN | 2016-06-21 | 6.1E+16 | 0.40 | 5.51 | NVIDIA Tesla K40 | Vision | Object detection | Microsoft Research, Tsinghua University | Industry - Academia Collaboration (Industry leaning) | Jifeng Dai, Y. Li, Kaiming He, and Jian Sun | 2016 | R-FCN: Object Detection via Region-based Fully Convolutional Networks | https://arxiv.org/abs/1605.06409 | 4.49E+03 | PASCAL VOC (2007 and 2012 versions) + MS COCO | 9.44E+04 | 12.06567222 | 170 | Industry ||||||||||||
30 | Part-of-speech tagging model | 2016-07-21 | 1.5E+17 | 0.97 | Language | POS tagging | University of Toronto | Academia | Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton | 2016 | Layer Normalization | https://arxiv.org/abs/1607.06450 | 4.13E+03 | Highly cited | 12 | Academia ||||||||||||||||
31 | Named Entity Recognition model | 2016-07-21 | 9.7E+16 | 0.63 | Language | Named entity recognition | University of Toronto | Academia | Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton | 2016 | Layer Normalization | https://arxiv.org/abs/1607.06450 | 4.13E+03 | Highly cited | 8 | Academia ||||||||||||||||
32 | GNMT | 2016-09-26 | 6.9E+21 | 42275.13 | 307573.50 | NVIDIA Tesla K80 | Language | Translation | Industry | Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean | 2016 | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | https://research.google/pubs/pub45610/ | 4.50E+03 | Highly cited | 2.78E+08 | 3.60E+08 | Industry | ||||||||||||||
33 | Xception | 2016-10-07 | 4.4E+19 | 267.30 | 1961.34 | NVIDIA Tesla K80 | Vision | Image classification | Industry | François Chollet | 2016 | Xception: Deep Learning with Depthwise Separable Convolutions | https://arxiv.org/abs/1610.02357 | 5.84E+03 | Highly cited | 2.29E+07 | JFT | 3.50E+08 | 1.68E+10 | Industry | ||||||||||||
34 | NASv3 (CIFAR-10) | 2016-11-05 | 2.2E+21 | 13069.35 | Vision | Google Brain | Industry | Barret Zoph, Quoc V. Le | 2016 | Neural Architecture Search with Reinforcement Learning | https://arxiv.org/abs/1611.01578 | 2.97E+03 | Highly cited | 3.74E+07 | 39 | Industry | ||||||||||||||||
35 | Libratus | 2017-01-01 | 1.1E+21 | 6253.49 | Intel Xeon E5-2695 v3 | Games | Poker | Carnegie Mellon University | Academia | N Brown, T Sandholm | 2017 | Libratus: The Superhuman AI for No-Limit Poker | https://www.cs.cmu.edu/~noamb/papers/17-IJCAI-Libratus.pdf | 6.40E+01 | SOTA improvement | 3.00E+06 | Academia |||||||||||||||
36 | AlphaGo Master | 2017-01-01 | 1.5E+23 | 852748.08 | Google TPU V1 | Games | Go | DeepMind | Industry | D Silver, J Schrittwieser, K Simonyan, I Antonoglou | 2017 | Mastering the game of Go without human knowledge | https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge | 5.81E+03 | Highly cited | Industry | ||||||||||||||||
37 | DeepStack | 2017-01-06 | 1.5E+14 | 0.00 | Games | Poker | University of Alberta, Charles University, Czech Technical University | Academia | Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling | 2017 | DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker | https://arxiv.org/abs/1701.01724 | 6.18E+02 | 2.50E+06 | 1.00E+07 | Academia | ||||||||||||||||
38 | MoE | 2017-01-23 | 9.4E+19 | 525.39 | 8484.35 | NVIDIA Tesla K40 | Language | Language modelling / Machine translation | Google Brain, Jagiellonian University, Cracow | Industry - Academia Collaboration (Industry leaning) | N Shazeer, A Mirhoseini, K Maziarz, A Davis | 2017 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | https://arxiv.org/abs/1701.06538 | 6.87E+02 | 8.70E+09 | 1.00E+11 | Sparse | Long Short-Term Memory Mixture-Of-Experts | Industry | ||||||||||||
39 | Transformer | 2017-06-12 | 7.4E+18 | 37.13 | 111.17 | NVIDIA Tesla P100 | Language | Translation | Google Brain; Google Research | Industry | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin | 2017 | Attention Is All You Need | https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf | 2.52E+04 | Highly cited | 2.13E+08 | 3.60E+08 | 5.40E+10 | 672 | Industry |||||||||||
40 | JFT | 2017-08-04 | 4.8E+20 | 2311.64 | 21396.42 | NVIDIA Tesla K80 | Vision | Google Research, CMU | Industry - Academia Collaboration | Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta | 2017 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | https://arxiv.org/abs/1707.02968 | 1.14E+03 | Highly cited | JFT-300M | 3.00E+08 | Industry ||||||||||||||
41 | OpenAI TI7 DOTA 1v1 | 2017-08-11 | 6.0E+20 | 2873.99 | Games | DOTA | OpenAI | Industry | OpenAI | 2017 | Dota 2 | https://openai.com/five/ | NA | 1.50E+08 | Industry ||||||||||||||||
42 | AlphaGo Zero | 2017-10-19 | 3.4E+23 | 1544149.42 | Google TPU V1 | Games | Go | DeepMind | Industry | D Silver, J Schrittwieser, K Simonyan, I Antonoglou | 2017 | Mastering the game of Go without human knowledge | https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge | 5.81E+03 | Highly cited | 4.64E+07 | 5.80E+09 | Industry | ||||||||||||||
43 | PNASNet-5 | 2017-12-02 | 6.6E+19 | 289.74 | 991.48 | NVIDIA Tesla P100 | Vision | Johns Hopkins University, Stanford, Google AI | Industry - Academia Collaboration (Industry leaning) | C Liu, B Zoph, M Neumann, J Shlens | 2017 | Progressive Neural Architecture Search | https://arxiv.org/abs/1712.00559 | 1.34E+03 | Highly cited | Imagenet-1k | 1.28E+06 | Industry |||||||||||||||
44 | AlphaZero | 2017-12-05 | 3.7E+22 | 162054.70 | Google TPU V2 | Games | DeepMind | Industry | D Silver, T Hubert, J Schrittwieser, I Antonoglou | 2017 | Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm | https://arxiv.org/abs/1712.01815 | 1.08E+03 | Highly cited | 7.00E+05 | Score | Industry | |||||||||||||||
45 | IMPALA | 2018-02-05 | 1.7E+20 | 709.79 | 2553.82 | NVIDIA Tesla P100 | Games | Atari | DeepMind | Industry | Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu | 2018 | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | https://arxiv.org/abs/1802.01561 | 6.75E+02 | 1.60E+06 | 2.40E+11 | Industry | ||||||||||||||
46 | AmoebaNet-A (F=448) | 2018-02-05 | 3.9E+20 | 1628.35 | 5858.75 | NVIDIA Tesla P100 | Vision | Image classification | Google Brain | Industry | Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le | 2018 | Regularized Evolution for Image Classifier Architecture Search | https://arxiv.org/abs/1802.01548 | 1.71E+03 | Highly cited | 4.69E+08 | Imagenet-1k | 1.28E+06 | Industry | ||||||||||||
47 | YOLOv3 | 2018-04-08 | 5.1E+19 | 202.99 | 295.76 | NVIDIA GeForce GTX TITAN X | Vision | Object detection | University of Washington | Academia | Joseph Redmon, Ali Farhadi | 2018 | YOLOv3: An Incremental Improvement | https://arxiv.org/abs/1804.02767 | 7.71E+03 | Highly cited | 1.06E+08 | ImageNet | 1.28E+06 | 7.10E+10 | Academia | |||||||||||
48 | GPT | 2018-06-01 | 1.8E+19 | 68.72 | NVIDIA Quadro P600 | Language | OpenAI | Industry | A Radford, K Narasimhan, T Salimans, I Sutskever | 2018 | Improving Language Understanding by Generative Pre-Training | https://openai.com/blog/language-unsupervised/ | 2.26E+03 | Highly cited | 1.17E+08 | BooksCorpus | 1.00E+09 | 3.00E+10 | Industry | |||||||||||||
49 | Population-based DRL | 2018-07-03 | 3.5E+19 | 130.36 | Games | Capture the flag | DeepMind | Industry | Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel | 2018 | Human-level performance in first-person multiplayer games with population-based deep reinforcement learning | https://arxiv.org/abs/1807.01281 | 4.34E+02 | 1.22E+08 | 6.00E+10 | Industry | ||||||||||||||||
50 | BigGAN-deep 512x512 | 2018-09-28 | 3.0E+21 | 10448.44 | Drawing | Image generation | Heriot-Watt University, DeepMind | Industry - Academia Collaboration | A Brock, J Donahue, K Simonyan | 2018 | Large Scale GAN Training for High Fidelity Natural Image Synthesis | https://arxiv.org/abs/1809.11096 | 1.98E+03 | Highly cited | 1.13E+08 | JFT-300M | 2.92E+08 | Industry | ||||||||||||||
51 | BERT-Large | 2018-10-11 | 2.9E+20 | 999.93 | Google TPU V2 | Language | Next sentence prediction | Google AI | Industry | J Devlin, MW Chang, K Lee, K Toutanova | 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://arxiv.org/abs/1810.04805 | 2.38E+04 | Highly cited | 3.40E+08 | 3.30E+09 | 7.90E+10 | Industry | |||||||||||||
52 | Decoupled weight decay regularization | 2019-01-04 | 2.5E+18 | 8.07 | Vision | Image classification | University of Freiburg | Academia | Ilya Loshchilov and Frank Hutter | 2019 | Decoupled weight decay regularization. | https://arxiv.org/abs/1711.05101 | 2.06E+03 | 3.65E+07 | CIFAR-10 | 5.00E+04 | 1.73E+09 | Academia | ||||||||||||||
53 | Hanabi 4 player | 2019-02-01 | 9.2E+16 | 0.29 | 0.34 | NVIDIA Tesla V100 PCIe | Games | Hanabi | DeepMind, University of Oxford, Google Brain, Carnegie Mellon University | Industry - Academia Collaboration (Industry leaning) | N Bard, JN Foerster, S Chandar, N Burch | 2019 | The Hanabi Challenge: A New Frontier for AI Research | https://arxiv.org/abs/1902.00506 | 1.15E+02 | 7.64E+05 | Industry ||||||||||||||||
54 | GPT-2 | 2019-02-14 | 1.5E+21 | 4692.89 | Language | OpenAI | Industry | A Radford, J Wu, R Child, D Luan, D Amodei | 2019 | Language Models are Unsupervised Multitask Learners | https://openai.com/blog/better-language-models/ | 1.70E+03 | Highly cited | 1.50E+09 | 3.00E+09 | 3.40E+12 | 40 | Industry | ||||||||||||||
55 | ProxylessNAS | 2019-02-23 | 3.7E+19 | 114.96 | 135.04 | NVIDIA Tesla V100 PCIe | Vision | MIT | Academia | Han Cai, Ligeng Zhu, and Song Han | 2019 | ProxylessNAS: Direct neural architecture search on target task and hardware | https://arxiv.org/abs/1812.00332 | 9.96E+02 | ImageNet | 1.28E+06 | 2.63E+11 | 200 | 5.1 | Academia | ||||||||||||
56 | Cross-lingual alignment | 2019-04-04 | 2.6E+18 | 7.83 | Tel Aviv University, MIT | Academia | Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson | 2019 | Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing | https://arxiv.org/abs/1902.09492 | 1.29E+02 | 3.66E+12 | Academia |||||||||||||||||||
57 | MnasNet-A1 + SSDLite | 2019-05-29 | 1.5E+21 | 4331.00 | Vision | Image classification and object detection on mobile devices | Google | Industry | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le | 2019 | MnasNet: Platform-Aware Neural Architecture Search for Mobile | https://arxiv.org/abs/1807.11626 | 1.43E+03 | Highly cited | 4.90E+06 | MS COCO | 1.18E+05 | Industry |||||||||||||||
58 | MnasNet-A3 | 2019-05-29 | 1.5E+21 | 4331.00 | Vision | Image classification and object detection on mobile devices | Google | Industry | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le | 2019 | MnasNet: Platform-Aware Neural Architecture Search for Mobile | https://arxiv.org/abs/1807.11626 | 1.43E+03 | Highly cited | 5.20E+06 | ImageNet | 1.28E+06 | Industry |||||||||||||||
59 | DLRM-2020 | 2019-05-31 | 4.0E+18 | 11.53 | 14.60 | NVIDIA Tesla V100 PCIe | Recommendation | Facebook AI | Industry | M Naumov, D Mudigere, HJM Shi, J Huang | 2019 | Deep Learning Recommendation Model for Personalization and Recommendation Systems | https://arxiv.org/abs/1906.00091 | 1.40E+02 | 1.00E+11 | Industry | ||||||||||||||||
60 | FTW | 2019-05-31 | 7.3E+21 | 21045.02 | Games | Capture the flag | DeepMind | Industry | M Jaderberg, WM Czarnecki, I Dunning, L Marris | 2019 | Human-level performance in 3D multiplayer games with population-based reinforcement learning | https://deepmind.com/research/publications/capture-the-flag | 4.25E+02 | 1.26E+08 | 1.21E+12 | Industry | ||||||||||||||||
61 | ObjectNet | 2019-09-06 | 1.9E+19 | 50.79 | Vision | Object recognition | MIT | Academia | Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, and Boris Katz | 2019 | ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | https://papers.nips.cc/paper/2019/file/97af07a14cacba681feacf3012730892-Paper.pdf | 2.39E+03 | Highly cited | 3.80E+07 | Internal data | 5.00E+04 | 108 | Academia |||||||||||||
62 | Hide and Seek | 2019-09-17 | 3.0E+17 | 0.80 | Games | Hide and Seek | OpenAI | Industry | B Baker, I Kanitscheider, T Markov, Y Wu | 2019 | Emergent Tool Use From Multi-Agent Autocurricula | https://openai.com/blog/emergent-tool-use/ | 2.24E+02 | 1.60E+06 | 3.17E+10 | Industry | ||||||||||||||||
63 | Megatron-LM (Original, 8.3B) | 2019-09-17 | 9.1E+21 | 24117.93 | 33212.51 | NVIDIA Tesla V100 PCIe | Language | NVIDIA | Industry | M Shoeybi, M Patwary, R Puri, P LeGresley | 2019 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | https://arxiv.org/abs/1909.08053 | 2.46E+02 | 8.30E+09 | 3.48E+10 | 1.80E+13 | Industry | ||||||||||||||
64 | Megatron-BERT | 2019-09-17 | 5.7E+22 | 151068.35 | 208034.39 | NVIDIA Tesla V100 PCIe | Language | NVIDIA | Industry | M Shoeybi, M Patwary, R Puri, P LeGresley | 2019 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | https://arxiv.org/abs/1909.08053 | 2.46E+02 | 3.90E+09 | 3.48E+10 | Industry | |||||||||||||||
65 | AlphaX-1 | 2019-10-02 | 7.6E+18 | 19.91 | 24.10 | NVIDIA GeForce GTX 1080 Ti | Vision | Neural architecture search for computer vision | Brown University, Facebook AI Research | Industry - Academia Collaboration (Academia leaning) | Linnan Wang, Yiyang Zhao, Yuu Jinnai, Yuandong Tian, Rodrigo Fonseca | 2019 | AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search | https://arxiv.org/abs/1903.11059 | 5.00E+01 | 5.79E+08 | ImageNet | Industry ||||||||||||||
66 | Rubik's cube | 2019-10-15 | 8.5E+20 | 2204.62 | 3102.27 | NVIDIA Tesla V100 PCIe | Robotics | OpenAI | Industry | Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang | 2019 | Solving Rubik’s Cube with a Robot Hand | https://arxiv.org/abs/1910.07113 | 2.27E+02 | 2.78E+07 | 6.24E+07 | Industry |||||||||||||||
67 | T5-3B | 2019-10-23 | 1.0E+22 | 25777.12 | Google TPU V3 | Language | Text autocompletion | Industry | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | https://arxiv.org/abs/1910.10683 | 1.54E+03 | Highly cited | 3.00E+09 | Colossal Clean Crawled Corpus (C4) | 1.50E+11 | Transformer (encoder-decoder performed best) | Industry | |||||||||||||
68 | T5-11B | 2019-10-23 | 4.1E+22 | 105686.20 | Google TPU V3 | Language | Text autocompletion | Industry | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | https://arxiv.org/abs/1910.10683 | 1.54E+03 | Highly cited | 1.10E+10 | Colossal Clean Crawled Corpus (C4) | 1.50E+11 | Transformer (encoder-decoder performed best) | Industry | |||||||||||||
69 | AlphaStar | 2019-10-30 | 2.0E+23 | 512765.27 | Google TPU V3 | Games | StarCraft | DeepMind | Industry | Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver | 2019 | Grandmaster level in StarCraft II using multi-agent reinforcement learning | https://www.deepmind.com/blog/alphastar-grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning | 1.04E+03 | Highly cited | 1.39E+08 | Industry |||||||||||||||
70 | MuZero | 2019-11-19 | 4.8E+19 | 121.18 | Google TPU V3 | Games | Atari Games | DeepMind | Industry | J Schrittwieser, I Antonoglou, T Hubert, K Simonyan | 2019 | Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | https://arxiv.org/abs/1911.08265v2 | 4.12E+02 | SOTA improvement | 3.69E+07 | 2.00E+10 | Industry ||||||||||||||
71 | OpenAI Five Rerun | 2019-12-13 | 1.3E+22 | 32217.13 | Games | Dota 2 | OpenAI | Industry | Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław "Psyho" Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang | 2019 | Dota 2 with Large Scale Deep Reinforcement Learning | https://cdn.openai.com/dota-2.pdf | 3.49E+02 | SOTA improvement | 1.59E+08 | 5.31E+10 | Industry |||||||||||||||
72 | OpenAI Five | 2019-12-13 | 6.7E+22 | 166042.11 | Games | Dota 2 | OpenAI | Industry | J Raiman, S Zhang, F Wolski | 2019 | Dota 2 with Large Scale Deep Reinforcement Learning | https://arxiv.org/abs/1912.06680 | 4.54E+02 | SOTA improvement | 1.59E+08 | 4.54E+11 | Industry | |||||||||||||||
73 | DLRM-2021 | 2020-07-01 | 3.0E+20 | 636.66 | 1094.92 | NVIDIA Tesla V100 PCIe | Recommendation | Facebook AI | Industry | D Mudigere, Y Hao, J Huang, A Tulloch | 2020 | High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models | https://arxiv.org/abs/2104.05158 | 2.00E+00 | 1.00E+12 | Industry ||||||||||||||||
74 | AlphaFold | 2020-01-15 | 1.0E+20 | 241.59 | Other | Protein folding prediction | DeepMind | Industry | Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu & Demis Hassabis | 2020 | Improved protein structure prediction using potentials from deep learning | https://www.nature.com/articles/s41586-019-1923-7 | 8.40E+02 | 6.90E+07 | Score | Industry | ||||||||||||||||
75 | Meena | 2020-01-28 | 1.1E+23 | 263099.94 | Google TPU V3 | Language | Text autocompletion | Google AI | Industry | Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le | 2020 | Towards a Human-like Open-Domain Chatbot | https://arxiv.org/abs/2001.09977 | 2.57E+02 | 2.60E+09 | 4.00E+10 | Evolved Transformer seq2seq model | Industry ||||||||||||||
76 | ALBERT-xxlarge | 2020-02-09 | 2.5E+21 | 5924.43 | Google TPU V3 | Language | Google Research, Toyota Technological Institute at Chicago | Industry - Academia Collaboration | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut | 2020 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | https://arxiv.org/abs/1909.11942 | 2.18E+03 | Highly cited | 2.35E+08 | 3.30E+09 | 2.50E+12 | 17408 | Industry |||||||||||||
77 | Turing NLG | 2020-02-13 | 1.6E+22 | 37799.51 | 58395.62 | NVIDIA Tesla V100 PCIe | Language | Text autocompletion | Microsoft | Industry | C Rosset | 2020 | Turing-NLG: A 17-billion-parameter language model by Microsoft | https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/ | 3.40E+01 | 1.70E+10 | 3.48E+10 | 3.60E+13 | Next token prediction | Industry | ||||||||||||
78 | ProGen | 2020-03-13 | 2.7E+20 | 623.75 | Google TPU V3 | Other | Protein generation | Salesforce Research, Stanford | Industry - Academia Collaboration | A Madani, B McCann, N Naik, NS Keskar | 2020 | ProGen: Language Modeling for Protein Generation | https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2 | 8.60E+04 | 4.60E+01 | 1.20E+09 | Industry |||||||||||||||
79 | GPT-3 175B (davinci) | 2020-04-28 | 3.1E+23 | 691184.67 | 1131415.12 | NVIDIA Tesla V100 PCIe | Language | Text autocompletion | OpenAI | Industry | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | 2020 | Language Models are Few-Shot Learners | https://arxiv.org/abs/2005.14165 | 1.53E+03 | Highly cited | 1.75E+11 | CommonCrawl; WebText2; Books1; Books2; Wikipedia | 3.74E+11 | 7.40E+14 | 45TB | Industry ||||||||||
80 | Once for All | 2020-04-29 | 1.8E+21 | 4010.23 | 6569.51 | NVIDIA Tesla V100 PCIe | Vision | MIT-IBM Watson AI Lab | Industry | Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han | 2020 | Once for all: Train one network and specialize it for efficient deployment. | https://arxiv.org/abs/1908.09791 | 1.20E+03 | 3.71E+02 | 7.70E+06 | Imagenet | Industry | ||||||||||||||
81 | iGPT-L | 2020-06-17 | 8.9E+21 | 19092.67 | 32482.56 | NVIDIA Tesla V100 PCIe | Drawing | Image completion | OpenAI | Industry | Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever | 2020 | Generative Pretraining from Pixels | https://openai.com/blog/image-gpt/ | 6.00E+04 | 1.82E+02 | 1.36E+09 | ILSVRC 2012 | 9.60E+06 | Industry ||||||||||||
82 | iGPT-XL | 2020-06-17 | 3.3E+22 | 70793.03 | 120440.96 | NVIDIA Tesla V100 PCIe | Drawing | Image completion | OpenAI | Industry | Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever | 2020 | Generative Pretraining from Pixels | https://openai.com/blog/image-gpt/ | 1.82E+02 | 6.80E+09 | ILSVRC 2012 | 9.60E+06 | Industry |||||||||||||
83 | GShard (600B) | 2020-06-30 | 1.3E+22 | 27609.81 | Google TPU V3 | Language | Translation | Google Brain | Industry | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | https://arxiv.org/abs/2006.16668 | 1.93E+05 | 9.10E+01 | 6.00E+11 | 2.60E+11 | Industry | ||||||||||||||
84 | GShard (dense) | 2020-06-30 | 2.6E+22 | 55219.61 | Google TPU V3 | Language | Translation | Google Brain | Industry | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | https://arxiv.org/abs/2006.16668 | 2.06E+06 | 9.10E+01 | 2.30E+09 | 2.60E+11 | Industry | ||||||||||||||
85 | ViT-H/14 | 2020-09-28 | 1.3E+22 | 25757.45 | Google TPU V3 | Vision | Image representation | Google Research, Brain Team | Industry | Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby | 2020 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://openreview.net/forum?id=YicbFdNTTy | 6.00E+04 | 1.91E+03 | Highly cited | Imagenet-1k | 1.28E+06 | Industry | |||||||||||||
86 | wav2vec 2.0 LARGE | 2020-10-22 | 4.3E+20 | 836.34 | 1569.38 | NVIDIA Tesla V100 PCIe | Speech | Speech completion | Facebook AI | Industry | Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli | 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | https://arxiv.org/abs/2006.11477 | 2.30E+04 | 4.10E+02 | SOTA improvement | 3.17E+08 | LibriSpeech | 4.37E+10 | Industry ||||||||||||
87 | KEPLER | 2020-11-23 | 1.2E+20 | 227.71 | 437.97 | NVIDIA Tesla V100 PCIe | Language | Relation Extraction | Tsinghua University, Princeton, Mila - Quebec AI Institute, University of Montreal, HEC, CIFAR | Academia | Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang | 2020 | KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation | https://arxiv.org/abs/1911.06136 | 9.60E+01 | 1.10E+08 | Wikipedia+BookCorpus | 3.30E+09 | Academia |||||||||||||
88 | CPM-Large | 2020-12-01 | 1.8E+21 | 3394.57 | 6569.51 | NVIDIA Tesla V100 PCIe | Language | Tsinghua University, BAAI | Industry - Academia Collaboration | Z Zhang, X Han, H Zhou, P Ke, Y Gu, D Ye, Y Qin, Y Su | 2020 | CPM: A Large-scale Generative Chinese Pre-trained Language Model | https://arxiv.org/abs/2012.00413 | 2.15E+04 | 1.00E+01 | 2.60E+09 | 1.67E+10 | Left-To-Right Transformer Decoder | Industry | |||||||||||||
89 | AraGPT2-Mega | 2020-12-31 | 2.0E+21 | 3685.43 | Google TPU V3 | Language | American University of Beirut | Academia | W Antoun, F Baly, H Hajj | 2020 | AraGPT2: Pre-Trained Transformer for Arabic Language Generation | https://arxiv.org/abs/2012.15520 | 4.00E+00 | 1.50E+09 | 8.80E+09 | Academia | ||||||||||||||||
90 | NEO (DLRM-2022) | 2021-09-15 | 1.1E+21 | 1661.08 | 2394.07 | NVIDIA A100 | Recommendation | Facebook AI | Industry | D Mudigere, Y Hao, J Huang, A Tulloch | 2021 | Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models | https://arxiv.org/abs/2104.05158 | 2.00E+00 | 3.00E+12 | Industry |||||||||||||||||
91 | Switch | 2021-01-11 | 8.2E+22 | 149825.60 | Google TPU V3 | Language | Text autocompletion | Google Brain | Industry | William Fedus, Barret Zoph, Noam Shazeer | 2021 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | https://arxiv.org/abs/2101.03961 | 8.00E+01 | 1.60E+12 | 4.32E+11 | Switch Transformer | Industry | ||||||||||||||
92 | CLIP (ViT L/14@336px) | 2021-01-05 | 1.1E+22 | 20191.82 | 40146.99 | NVIDIA Tesla V100 PCIe | Multimodal | Zero-shot image classification | OpenAI | Industry | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever | 2021 | Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/abs/2103.00020 | 7.37E+04 | 1.30E+02 | 3.70E+08 | Custom image-text pairs from the internet | 4.00E+08 | 1.10E+08 | 86016 | Industry ||||||||||
93 | DALL-E | 2021-01-05 | 4.7E+22 | 86274.16 | 171537.13 | NVIDIA Tesla V100 PCIe | Drawing | Text-to-image | OpenAI | Industry | Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever | 2021 | Zero-Shot Text-to-Image Generation | https://openai.com/blog/dall-e/ | 8.00E+01 | 1.20E+10 | 2.50E+08 | Industry | ||||||||||||||
94 | Meta Pseudo Labels | 2021-03-01 | 2.1E+23 | 369462.82 | Vision | Image Classification | Google AI, Brain Team | Industry | Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, and Quoc V. Le | 2021 | Meta Pseudo Labels | https://arxiv.org/abs/2003.10580 | 1.31E+02 | SOTA improvement | 4.80E+08 | ImageNet | 1.30E+08 | Industry ||||||||||||||
95 | GPT-Neo | 2021-03-21 | 7.9E+21 | 13685.99 | Language | EleutherAI | Research collective | 2021 | GPT-Neo | https://www.eleuther.ai/projects/gpt-neo/ | 2.70E+09 | The Pile | 8.86E+11 | Industry | ||||||||||||||||||
96 | PanGu-α | 2021-04-25 | 5.8E+22 | 97802.06 | Ascend 910 | Language | PanGu-α team | Industry | Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, Zhenzhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | https://arxiv.org/abs/2104.12369 | 5.00E+00 | 2.07E+11 | Custom dataset | 2.00E+11 | unidirectional transformer decoder | Industry ||||||||||||
97 | GPT-J-6B | 2021-05-01 | 1.5E+22 | 25176.80 | Language | EleutherAI | Research collective | Aran Komatsuzaki | 2021 | GPT-J-6B: 6B JAX-Based Transformer | https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/ | 6.05E+09 | 1.60E+11 | Industry |||||||||||||||||||||
98 | HyperCLOVA | 2021-05-25 | 6.3E+22 | 103802.31 | Language | Naver Corp | Industry | 2021 | HyperCLOVA | https://www.navercorp.com/promotion/pressReleasesView/30546 | 2.04E+11 | 5.60E+11 | Industry |||||||||||||||||||||
99 | ProtT5-XXL | 2021-05-04 | 7.4E+22 | 123918.36 | Google TPU V3 | Other | Proteins | Technical University of Munich, Med AI Technology, Google AI, NVIDIA, Oak Ridge National Laboratory | Industry - Academia Collaboration | A Elnaggar, M Heinzinger, C Dallago, G Rihawi | 2021 | ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning | https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3 | 5.70E+01 | 1.10E+10 | UniRef; BFD | 3.93E+11 | Industry ||||||||||||||
100 | ERNIE 3.0 | 2021-07-05 | 2.4E+18 | 3.83 | Language | Baidu Inc. | Industry | Y Sun, S Wang, S Feng, S Ding, C Pang | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | http://research.baidu.com/Blog/index-view?id=160 | 1.00E+00 | 1.00E+10 | 6.68E+11 | Transformer-XL: Transformer with auxiliary recurrence memory module | Industry