This sheet is owned and maintained by Dr Alan D. Thompson at LifeArchitect.ai.
| Model | Training end | Chip type | TFLOP/s (max) | Chip count | Wall clock time (days) | Total chip-hours | Total chip-years | Retail cost ($US) | MMLU | Check: chip-hours × $/chip-hour (calculated) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-3 | Apr/2020 | V100 | 130 | 10,000 | 15 | 3,552,000 | 405 | $9M | 43.9 | $8,808,960 |
| Llama 1 | Jan/2023 | A100 | 312 | 2,048 | 21 | 1,032,192 | 118 | $4M | 63.4 | $4,056,515 |
| Llama 2 | Jun/2023 | A100 | 312 | 2,048 | 35 | 1,720,320 | 196 | $7M | 68.0 | $6,760,858 |
| Titan | Apr/2023 | A100 | 312 | 13,760 | 48 | 11,558,400 | 1,319 | $45M | 70.4 | $45,424,512 |
| GPT-4 | Aug/2022 | A100 | 312 | 25,000 | 95 | 57,000,000 | 6,503 | $224M | 86.4 | $224,010,000 |
| Gemini | Nov/2023 | TPUv4 | 275 | 57,000 | 100 | 136,800,000 | 15,606 | $440M | 90.0 | $440,496,000 |
| Llama 3 70B | Apr/2024 | H100 | 989 | 24,576 | 11 | 6,300,000 | 719 | $7M | 82.0 | $7,560,000 |
| Llama 3 405B | Apr/2024 | H100 | 989 | 24,576 | 50 | 29,491,200 | 3,364 | $125M | 88.6 | $125,337,600 |
| GPT-5 | Mar/2024 | H100 | 989 | 50,000 | 120 | 144,000,000 | 16,428 | $612M | | $612,000,000 |
| Olympus | Aug/2024 | H100 | 989 | | | | | | | |
| Grok 2 | Jun/2024 | H100 | 989 | 20,000 | 50 | 57,600,000 | 6,571 | $245M | | $244,800,000 |
| Gemini 2 | Nov/2024 | TPUv6 | 1,847 | | | | | | | |
| Grok 3 | Dec/2024 | H100 | 989 | 100,000 | 50 | 288,000,000 | 32,855 | $1.2B | | $1,224,000,000 |
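The three right-hand columns are derived values. A minimal sketch of the arithmetic, assuming 24 hours of utilisation per day, 8,766 hours per chip-year, and the per-chip-hour retail prices listed in the notes at the bottom of the sheet; the chip-year rounding may differ from the sheet by ±1:

```python
# Minimal sketch of the calculated columns above.
# Assumptions: 24 hours/day of utilisation, 8,766 hours per chip-year,
# and the per-chip-hour retail prices from the notes at the bottom of the sheet.
HOURS_PER_CHIP_YEAR = 8766  # 365.25 days x 24 h

def training_totals(chip_count: int, wall_clock_days: float, usd_per_chip_hour: float):
    """Return (total chip-hours, total chip-years, check cost in USD)."""
    chip_hours = chip_count * wall_clock_days * 24
    chip_years = chip_hours / HOURS_PER_CHIP_YEAR
    check_usd = chip_hours * usd_per_chip_hour
    return chip_hours, chip_years, check_usd

# GPT-4 row: 25,000 A100s for ~95 days at $3.93 per A100-hour.
hours, years, usd = training_totals(25_000, 95, 3.93)
print(f"{hours:,.0f} chip-hours, ~{years:,.0f} chip-years, ${usd:,.0f}")
# -> 57,000,000 chip-hours, ~6,502 chip-years, $224,010,000
```

The Retail cost ($US) column is the same check figure rounded for display (e.g. $224,010,000 → $224M).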
Sources

| Model | Source type (primary, analysis, informed estimate) | Link | Quote / note | Notes |
|---|---|---|---|---|
| GPT-3 | Primary | https://arxiv.org/abs/2005.14165 | “All models were trained on V100 GPU’s on part of a high-bandwidth cluster provided by Microsoft.” | |
| GPT-3 | Analysis | https://arxiv.org/pdf/2204.05149.pdf | Nearly-primary source, by authors including Google's Dr Jeff Dean: “GPT-3 was trained on 10,000 V100 GPUs... GPT-3 took 405 V100 years to train in 2020.” | |
| GPT-4 | Analysis | https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini cited in https://www.lesswrong.com/posts/tJAD2LG9uweeEfjwq/estimating-efficiency-improvements-in-llm-pre-training | “According to SemiAnalysis, GPT-4 was trained on 25,000 A100’s for roughly 95 days” | |
| Llama 1 | Primary | https://arxiv.org/abs/2302.13971 | “When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days.” | |
| Llama 2 | Primary | https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md | “Llama 2 70B: Time (GPU Hours) = 1720320” | Related: https://www.reddit.com/r/LocalLLaMA/comments/1hn8ams/deepseek_v3_was_trained_on_811x_less_the_normal/ |
| Llama 2 | Estimate only | | 2,048 chips is an assumption based on Llama 1; could/should be 2x-20x. | See the back-calculation sketch after the pricing table. |
| Titan | Analysis | https://importai.substack.com/p/import-ai-365-wmd-benchmark-amazon | "200B dense model on 4T tokens of data across 13,760 NVIDIA A100 chips (using 1,720 P4d nodes). It took 48 days to train." | |
| Titan | Primary | https://aws.amazon.com/blogs/aws/build-rag-and-agent-based-generative-ai-applications-with-new-amazon-titan-text-premier-model-available-in-amazon-bedrock/ | "MMLU=70.4" | |
| Gemini | Primary | https://arxiv.org/abs/2312.11805 | “Training Gemini Ultra used a large fleet of TPUv4 accelerators across multiple datacenters.” | |
| Gemini | Analysis | https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini cited in https://www.lesswrong.com/posts/tJAD2LG9uweeEfjwq/estimating-efficiency-improvements-in-llm-pre-training | “The TPUv4 has a maximum performance of 275 TFLOP/s in bf16.” | |
| Gemini | Analysis | https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini cited in https://www.lesswrong.com/posts/tJAD2LG9uweeEfjwq/estimating-efficiency-improvements-in-llm-pre-training | “According to SemiAnalysis... Gemini Ultra was trained on roughly 57,000 TPUv4’s for 100 days.” | |
| GPT-5 | Estimate only | https://lifearchitect.ai/gpt-5/ | Alan's estimate, based on comparison and extrapolation from a Morgan Stanley research note and other sources; may be 2x-5x. | |
| Llama 3 | Primary | https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/#:~:text=Meta%20engineers%20trained%20Llama%203,NVIDIA%20Quantum%2D2%20InfiniBand%20networks. | "Meta engineers trained Llama 3 on computer clusters packing 24,576 NVIDIA H100 Tensor Core GPUs" | |
| Llama 3 | Primary | https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md | "Llama 3 70B 6.4M [GPU hours]", extrapolated to 405B parameters | |
| Olympus | | | | |
| Grok 2 | Primary | https://www.datacenterdynamics.com/en/news/elon-musk-xais-grok-2-requires-20000-nvidia-h100-gpus-grok-3-may-need-100000/ | "In an interview with Norway wealth fund CEO Nicolai Tangen on Twitter/X spaces... Musk said training the Grok 2 model takes about 20,000 Nvidia H100 GPUs... training the Grok 3 model and beyond will require 100,000 Nvidia H100s." | |
| Gemini 2 | | | | |
| Grok 3 | Primary | https://www.datacenterdynamics.com/en/news/elon-musk-xais-grok-2-requires-20000-nvidia-h100-gpus-grok-3-may-need-100000/ | Same interview as Grok 2: "...training the Grok 3 model and beyond will require 100,000 Nvidia H100s." | |

Chip pricing (retail)

| Chip type | Pricing date | $ per chip-hour | Pricing source | Cost per 1M chip-hours | Notes |
|---|---|---|---|---|---|
| V100 | 2020 | $0.66 | https://web.archive.org/web/20200611063415/https://lambdalabs.com/service/gpu-cloud | $660,000 | Big pricing disparity between Lambda and GCP. |
| V100 | 2020 | $2.48 | https://www.top500.org/news/google-expands-its-gpu-cloud-options/ | $2,480,000 | Using this number for GPT-3 in the final table; see the sketch after the notes below. |
| A100 | 2023 | $3.93 | https://gpus.llm-utils.org/a100-gpu-cloud-availability-and-pricing/ | $3,930,000 | |
| H100 | 2023 | $4.25 | https://web.archive.org/web/20240108002155/https://coreweave.com/gpu-cloud-pricing | $4,250,000 | Todo: |
| TPUv4 | 2023 | $3.22 | https://web.archive.org/web/20240105115832/https://cloud.google.com/tpu/pricing | $3,220,000 | |
| TPUv5e | 2024 | $1.20 | https://web.archive.org/web/20240105115832/https://cloud.google.com/tpu/pricing | $1,200,000 | |
| TPUv5p | 2024 | $4.20 | https://web.archive.org/web/20240105115832/https://cloud.google.com/tpu/pricing | $4,200,000 | |
| TPUv6 (Trillium) | 2024 | | | | |
Other
1. Training end: most training end dates assume one month before release, for easy figures.
2. Total chip-hours: column hidden in the final report.
3. US$ estimate: cost estimates use 2023 retail cloud pricing: A100 @ $3.93/hr, TPUv4 @ $3.22/hr, H100 @ $4.25/hr (see the chip pricing table above and the sensitivity sketch below).
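The choice of retail price dominates these estimates. For GPT-3's 3,552,000 V100-hours, the two 2020 V100 prices in the pricing table give figures roughly 3.8x apart; a quick sketch of that disparity (the sheet uses the $2.48 GCP figure):

```python
# Sensitivity of the GPT-3 estimate to the V100 retail price:
# same 3,552,000 chip-hours, two 2020 price points from the pricing table.
gpt3_chip_hours = 3_552_000  # 10,000 V100s, ~405 V100-years

for vendor, usd_per_hour in [("Lambda", 0.66), ("GCP", 2.48)]:
    print(f"{vendor}: ${gpt3_chip_hours * usd_per_hour:,.0f}")
# -> Lambda: $2,344,320
# -> GCP: $8,808,960 (the figure used in the final table)
```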