Model | Training end | Chip type | TFLOP/s (max) | Chip count | Wall clock time (days) | Total time (chip-hours) | Total time (chip-years) | Retail cost ($US) | MMLU | Check (calculated column)
---|---|---|---|---|---|---|---|---|---|---
GPT-3 | Apr/2020 | V100 | 130 | 10,000 | 15 | 3,552,000 | 405 | $9M | 43.9 | $8,808,960
Llama 1 | Jan/2023 | A100 | 312 | 2,048 | 21 | 1,032,192 | 118 | $4M | 63.4 | $4,056,515
Llama 2 | Jun/2023 | A100 | 312 | 2,048 | 35 | 1,720,320 | 196 | $7M | 68.0 | $6,760,858
Titan | Apr/2023 | A100 | 312 | 13,760 | 48 | 11,558,400 | 1,319 | $45M | 70.4 | $45,424,512
GPT-4 | Aug/2022 | A100 | 312 | 25,000 | 95 | 57,000,000 | 6,503 | $224M | 86.4 | $224,010,000
Gemini | Nov/2023 | TPUv4 | 275 | 57,000 | 100 | 136,800,000 | 15,606 | $440M | 90.0 | $440,496,000
Llama 3 70B | Apr/2024 | H100 | 989 | 24,576 | 11 | 6,300,000 | 719 | $7M | 82.0 | $7,560,000
Llama 3 405B | Apr/2024 | H100 | 989 | 24,576 | 50 | 29,491,200 | 3,364 | $125M | 88.6 | $125,337,600
GPT-5 | Mar/2024 | H100 | 989 | 50,000 | 120 | 144,000,000 | 16,428 | $612M | | $612,000,000
Olympus | Aug/2024 | H100 | 989 | | | | | | |
Grok 2 | Jun/2024 | H100 | 989 | 20,000 | 50 | 57,600,000 | 6,571 | $245M | | $244,800,000
Gemini 2 | Nov/2024 | TPUv6 | 1,847 | | | | | | |
Grok 3 | Dec/2024 | H100 | 989 | 100,000 | 50 | 288,000,000 | 32,855 | $1.2B | | $1,224,000,000
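The Total time and Check columns are straightforward arithmetic: chip count × wall-clock days × 24 gives chip-hours, and chip-hours × the retail $ per chip-hour (from the pricing table further down) gives the cost estimate. A minimal sketch of that arithmetic, with illustrative function names and rates copied from the pricing table:

```python
# Minimal sketch of the sheet's arithmetic (illustrative names; rates copied from the pricing table below).

HOURS_PER_DAY = 24
HOURS_PER_YEAR = 8_766  # 365.25 days x 24; the sheet's chip-years column rounds slightly differently in places

# Retail $ per chip-hour used in this sheet (see pricing table and note 3 below)
RATE_PER_CHIP_HOUR = {"V100": 2.48, "A100": 3.93, "H100": 4.25, "TPUv4": 3.22}

def chip_hours(chip_count: int, wall_clock_days: float) -> float:
    """Total time (chip-hours) = chip count x wall-clock days x 24."""
    return chip_count * wall_clock_days * HOURS_PER_DAY

def retail_cost(chip_count: int, wall_clock_days: float, chip_type: str) -> float:
    """Check (calculated column) = chip-hours x retail $ per chip-hour."""
    return chip_hours(chip_count, wall_clock_days) * RATE_PER_CHIP_HOUR[chip_type]

# Example row: Llama 2 -- 2,048 A100s for 35 days
hours = chip_hours(2_048, 35)          # 1,720,320 chip-hours
years = hours / HOURS_PER_YEAR         # ~196 chip-years
cost = retail_cost(2_048, 35, "A100")  # ~$6,760,858
print(f"{hours:,.0f} chip-hours, {years:,.0f} chip-years, ${cost:,.0f}")
```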
Sources:

Model | Source type (primary, analysis, informed estimate) | Link | Quote
---|---|---|---
GPT-3 | Primary | https://arxiv.org/abs/2005.14165 | “All models were trained on V100 GPU’s on part of a high-bandwidth cluster provided by Microsoft.”
GPT-3 | Analysis | https://arxiv.org/pdf/2204.05149.pdf | Note: this is a nearly-primary source by authors including Google's Dr Jeff Dean. “GPT-3 was trained on 10,000 V100 GPUs... GPT-3 took 405 V100 years to train in 2020.”
GPT-4 | Analysis | https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini cited in https://www.lesswrong.com/posts/tJAD2LG9uweeEfjwq/estimating-efficiency-improvements-in-llm-pre-training | “According to SemiAnalysis, GPT-4 was trained on 25,000 A100’s for roughly 95 days”
Llama 1 | Primary | https://arxiv.org/abs/2302.13971 | “When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days.”
Llama 2 | Primary | https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md | “Llama 2 70B: Time (GPU Hours) = 1720320”
Llama 2 | Estimate only | | 2,048 chips is an assumption based on Llama 1. Could/should be 2x-20x.
Titan | Analysis | https://importai.substack.com/p/import-ai-365-wmd-benchmark-amazon | "200B dense model on 4T tokens of data across 13,760 NVIDIA A100 chips (using 1,720 P4d nodes). It took 48 days to train."
Titan | Primary | https://aws.amazon.com/blogs/aws/build-rag-and-agent-based-generative-ai-applications-with-new-amazon-titan-text-premier-model-available-in-amazon-bedrock/ | "MMLU=70.4"
Gemini | Primary | https://arxiv.org/abs/2312.11805 | “Training Gemini Ultra used a large fleet of TPUv4 accelerators across multiple datacenters.”
Gemini | Analysis | https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini cited in https://www.lesswrong.com/posts/tJAD2LG9uweeEfjwq/estimating-efficiency-improvements-in-llm-pre-training | “The TPUv4 has a maximum performance of 275 TFLOP/s in bf16.”
Gemini | Analysis | https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini cited in https://www.lesswrong.com/posts/tJAD2LG9uweeEfjwq/estimating-efficiency-improvements-in-llm-pre-training | “According to SemiAnalysis... Gemini Ultra was trained on roughly 57,000 TPUv4’s for 100 days.”
GPT-5 | Estimate only | https://lifearchitect.ai/gpt-5/ | Alan's estimate based on comparison and extrapolation from a Morgan Stanley research note and other sources. May be 2x-5x.
Llama 3 | Primary | https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/#:~:text=Meta%20engineers%20trained%20Llama%203,NVIDIA%20Quantum%2D2%20InfiniBand%20networks. | "Meta engineers trained Llama 3 on computer clusters packing 24,576 NVIDIA H100 Tensor Core GPUs"
Llama 3 | Primary | https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md | "Llama 3 70B 6.4M [GPU hours]", extrapolated to 405B params
Olympus | | |
Grok 2 | Primary | https://www.datacenterdynamics.com/en/news/elon-musk-xais-grok-2-requires-20000-nvidia-h100-gpus-grok-3-may-need-100000/ | "In an interview with Norway wealth fund CEO Nicolai Tangen on Twitter/X spaces... Musk said training the Grok 2 model takes about 20,000 Nvidia H100 GPUs... training the Grok 3 model and beyond will require 100,000 Nvidia H100s."
Gemini 2 | | |
Grok 3 | Primary | https://www.datacenterdynamics.com/en/news/elon-musk-xais-grok-2-requires-20000-nvidia-h100-gpus-grok-3-may-need-100000/ | "In an interview with Norway wealth fund CEO Nicolai Tangen on Twitter/X spaces... Musk said training the Grok 2 model takes about 20,000 Nvidia H100 GPUs... training the Grok 3 model and beyond will require 100,000 Nvidia H100s."

Chip-hour pricing:

Chip type | Pricing date | $ per chip-hour | Source | $ per 1M chip-hours | Notes
---|---|---|---|---|---
V100 | 2020 | $0.66 | https://web.archive.org/web/20200611063415/https://lambdalabs.com/service/gpu-cloud | $660,000 | Big pricing disparity between Lambda and GCP
V100 | 2020 | $2.48 | https://www.top500.org/news/google-expands-its-gpu-cloud-options/ | $2,480,000 | Using this number for GPT-3 in the final table
A100 | 2023 | $3.93 | https://gpus.llm-utils.org/a100-gpu-cloud-availability-and-pricing/ | $3,930,000 |
H100 | 2023 | $4.25 | https://web.archive.org/web/20240108002155/https://coreweave.com/gpu-cloud-pricing | $4,250,000 | Todo:
TPUv4 | 2023 | $3.22 | https://web.archive.org/web/20240105115832/https://cloud.google.com/tpu/pricing | $3,220,000 |
TPUv5e | 2024 | $1.20 | https://web.archive.org/web/20240105115832/https://cloud.google.com/tpu/pricing | $1,200,000 |
TPUv5p | 2024 | $4.20 | https://web.archive.org/web/20240105115832/https://cloud.google.com/tpu/pricing | $4,200,000 |
TPUv6 (Trillium) | 2024 | | | |

Related: https://www.reddit.com/r/LocalLLaMA/comments/1hn8ams/deepseek_v3_was_trained_on_811x_less_the_normal/

This sheet is owned and maintained by Dr Alan D. Thompson at LifeArchitect.ai.
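The pricing table's right-hand column is just the hourly rate scaled to 1,000,000 chip-hours, and the Lambda/GCP disparity noted above is why the rate choice matters: the same GPT-3 run lands anywhere between roughly $2.3M and $8.8M depending on which 2020 V100 price is applied. A small sketch (helper name is illustrative; figures are taken from the tables above):

```python
# Sketch of the "$ per 1M chip-hours" column and why the rate choice matters
# (helper name is illustrative; figures are taken from the tables above).

def per_million_chip_hours(rate_per_chip_hour: float) -> float:
    """Cost of 1,000,000 chip-hours at a given hourly rate (pricing table, right-hand column)."""
    return rate_per_chip_hour * 1_000_000

# 2020 V100 pricing disparity noted above: Lambda vs GCP
lambda_rate, gcp_rate = 0.66, 2.48
print(f"${per_million_chip_hours(lambda_rate):,.0f}")  # $660,000
print(f"${per_million_chip_hours(gcp_rate):,.0f}")     # $2,480,000

# Applied to GPT-3's 3,552,000 V100-hours, the same run spans a wide retail range:
gpt3_hours = 3_552_000
print(f"${gpt3_hours * lambda_rate:,.0f}")  # $2,344,320 at Lambda pricing
print(f"${gpt3_hours * gcp_rate:,.0f}")     # $8,808,960 at GCP pricing -- the figure used in the final table
```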
Other notes:
1. Training end: most training end dates assume 1 month before release, for round figures.
2. Total time (chip-hours): column hidden in the final report.
3. US$ estimate: cost estimates use 2023 retail cloud pricing (see pricing table above): A100 @ $3.93/hr, TPUv4 @ $3.22/hr, H100 @ $4.25/hr.