Demand Assumptions
Number of people: 4,000,000,000
We have 8bn people in total, and approx. 4bn with the economic means to afford using AI in their personal and professional lives.

Tokens per request: 1,000
Let's assume each task consumes 1,000 output tokens per request, and that each person uses some sort of AI inference 10x per day.

# of requests per day: 10

Tokens per day per person: 10,000
For scale: the maximum number of output tokens for Claude is around 4,000 per request, so fewer than three full-length outputs would already reach 10,000 tokens.

Other use cases (by corporates, governments, militaries): 5,000
The per-person figure above excludes corporate, government, and military uses such as automated phone calls, transcription, data processing, written communication, drone guidance, missile guidance, remote sensing, automated security monitoring, predictive policing, and hundreds of other use cases I could think of; I add a flat 5,000 tokens per person per day to cover them.

11
Daily token demand15,000
I only consider "token" demand. In computer vision and other non-NLP task, the throughput would not be measured in tokens. As language models are currently a dominating modality in AI, I stick to "tokens" for simplification.
Token demand for the entire world per day: 60,000,000,000,000
That is 60tn tokens per day (4bn people × 15,000 tokens each).
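As a sanity check, here is the demand arithmetic as a minimal Python sketch (the variable names are mine; the values are the assumptions above):

    # Demand-side assumptions from above.
    people = 4_000_000_000        # people with the means to use AI
    tokens_per_request = 1_000    # output tokens per task
    requests_per_day = 10         # inference requests per person per day
    other_tokens_per_day = 5_000  # corporate/government/military allowance per person

    tokens_per_person_per_day = tokens_per_request * requests_per_day + other_tokens_per_day
    world_demand_per_day = people * tokens_per_person_per_day

    print(tokens_per_person_per_day)  # 15000
    print(world_demand_per_day)       # 60000000000000 -> 60tn tokens/day
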
Supply Assumptions

Number of available GPUs (conservative estimate; the real number is probably lower):

Volta: 400,000
Too low throughput to be economically viable for commercial inference.

Turing: 5,300,000
Too low throughput to be economically viable for commercial inference.

Ampere: 20,500,000
Too low throughput to be economically viable for commercial inference.

Hopper: 1,000,000
Ada Lovelace: 5,000,000
Blackwell: 500,000

Total number: 32,700,000
Total number of viable GPUs (excluding Volta, Turing, Ampere): 6,500,000
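The stock tally, as a quick Python sketch (the counts are the per-architecture estimates above):

    # GPU stock estimates from above, by architecture.
    stock = {
        "Volta": 400_000,
        "Turing": 5_300_000,
        "Ampere": 20_500_000,
        "Hopper": 1_000_000,
        "Ada Lovelace": 5_000_000,
        "Blackwell": 500_000,
    }
    viable_archs = ("Hopper", "Ada Lovelace", "Blackwell")

    total = sum(stock.values())
    viable = sum(stock[a] for a in viable_archs)

    print(total)   # 32700000
    print(viable)  # 6500000
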
Token throughput per second per GPU (Hopper): 100
Token throughput per second per GPU (Ada Lovelace): 10
Token throughput per second per GPU (Blackwell): 200
These are very rough simplifications, and probably among the major estimates to be contested here: throughput depends on model choice and quantization. I have assumed Llama-3.1 70B with 8-bit quantization, which is not a SOTA model.

Number of seconds per day per GPU: 86,400

Token throughput per day per GPU (Hopper): 8,640,000
Token throughput per day per GPU (Ada Lovelace): 864,000
Token throughput per day per GPU (Blackwell): 17,280,000
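Per-GPU daily throughput is just tokens per second times seconds per day; a sketch under the assumptions above (names are mine):

    SECONDS_PER_DAY = 86_400

    # Assumed tokens/second per GPU (Llama-3.1 70B, 8-bit, see above).
    tokens_per_second = {"Hopper": 100, "Ada Lovelace": 10, "Blackwell": 200}

    tokens_per_day = {arch: tps * SECONDS_PER_DAY for arch, tps in tokens_per_second.items()}
    print(tokens_per_day)  # {'Hopper': 8640000, 'Ada Lovelace': 864000, 'Blackwell': 17280000}
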
Token throughput per day for all existing GPUs (Hopper): 8,640,000,000,000
Token throughput per day for all existing GPUs (Ada Lovelace): 4,320,000,000,000
Token throughput per day for all existing GPUs (Blackwell): 8,640,000,000,000

Average throughput per GPU per day: 3,323,077
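The fleet totals and the count-weighted average, sketched in the same style (variable names are mine):

    # Viable GPU counts and per-GPU daily throughput from above.
    counts = {"Hopper": 1_000_000, "Ada Lovelace": 5_000_000, "Blackwell": 500_000}
    tokens_per_day = {"Hopper": 8_640_000, "Ada Lovelace": 864_000, "Blackwell": 17_280_000}

    fleet = {arch: counts[arch] * tokens_per_day[arch] for arch in counts}
    average = sum(fleet.values()) / sum(counts.values())

    print(fleet)    # Hopper/Blackwell: 8.64tn each, Ada Lovelace: 4.32tn
    print(average)  # ~3,323,077 tokens per GPU per day
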
Share of GPUs available for inference (vs. training, research, maintenance, and other downtime): 90%
Let us generously assume that 90% of GPUs are NOT used for training, research, maintenance, and other non-inference uses. Please note that there can be interaction effects between training and inference: GPUs may be used for training at night when inference demand is low.

GPU utilization for inference: 30%
We want to be able to serve all people at the same time, even at peak demand; therefore, the average utilization has to stay low.
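Taken together, the two factors mean each GPU delivers roughly 27% of its nameplate throughput (a one-line check; names are mine):

    inference_share = 0.90  # share of GPUs not tied up in training, research, maintenance
    peak_headroom = 0.30    # average utilization kept low to absorb demand peaks

    effective_fraction = inference_share * peak_headroom
    print(effective_fraction)  # ~0.27 of nameplate throughput
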
Effective token throughput per day per GPU (Hopper): 2,332,800
Effective token throughput per day per GPU (Ada Lovelace): 233,280
Effective token throughput per day per GPU (Blackwell): 4,665,600

Average effective throughput per GPU per day: 897,231

Token throughput for the entire world's GPU stock per day: 5,832,000,000,000
Roughly 5.8tn tokens of capacity per day!
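Applying the ~0.27 effective-utilization factor to the fleet gives the world capacity (a sketch under the assumptions above; names are mine):

    EFFECTIVE_FRACTION = 0.9 * 0.3  # availability x utilization, from above

    counts = {"Hopper": 1_000_000, "Ada Lovelace": 5_000_000, "Blackwell": 500_000}
    tokens_per_day = {"Hopper": 8_640_000, "Ada Lovelace": 864_000, "Blackwell": 17_280_000}

    effective = {arch: tpd * EFFECTIVE_FRACTION for arch, tpd in tokens_per_day.items()}
    capacity = sum(effective[arch] * counts[arch] for arch in counts)

    print(effective)  # Hopper ~2,332,800; Ada Lovelace ~233,280; Blackwell ~4,665,600
    print(capacity)   # ~5.832tn tokens/day
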
Results

GPUs needed to satisfy daily token demand: 66,872,428

GPUs in existence (only viable GPUs): 6,500,000
We need roughly 10x the GPUs we already have (assuming the average effective inference speed across recent GPU architectures).

Gap in daily inference tokens (minimum peak capacity needed): 54,168,000,000,000

Number of Blackwell GPUs needed to fill the gap: 11,610,082
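The results fall out directly from the figures derived above (a sketch; names are mine):

    demand = 60_000_000_000_000       # tokens/day, from the demand section
    capacity = 5_832_000_000_000      # tokens/day, from the supply section
    avg_effective = 897_231           # tokens/GPU/day, fleet average
    blackwell_effective = 4_665_600   # tokens/GPU/day, effective

    gpus_needed = demand / avg_effective               # ~66.9m GPUs
    gap = demand - capacity                            # 54.168tn tokens/day
    blackwell_to_fill_gap = gap / blackwell_effective  # ~11.61m GPUs

    print(gpus_needed, gap, blackwell_to_fill_gap)
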
Revenue Implications

Cost per Blackwell GPU: $20,000
Lower bound for an average price per GPU.

Revenue: $232,201,646,091
This is 2-3x NVIDIA's annual revenue.
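And the revenue figure is simply the gap-filling Blackwell count times the assumed price:

    blackwell_to_fill_gap = 54_168_000_000_000 / 4_665_600  # ~11.61m GPUs
    price_per_gpu = 20_000                                   # USD, lower-bound average

    revenue = blackwell_to_fill_gap * price_per_gpu
    print(revenue)  # ~232.2bn USD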