NVIDIA GPU Napkin Calculation

Row	Item	Value	Notes
1	Demand Assumptions
2
3	Number of people	4,000,000,000	Of the roughly 8bn people in total, approx. 4bn have the economic means to afford using AI in their personal and professional lives.
4
5	Output tokens per request	1,000	Assume each request consumes 1'000 output tokens and each person uses some form of AI inference 10x per day.
6	Requests per person per day	10
7	Output tokens per person per day	10,000	For comparison, Claude's maximum output is around 4'000 tokens per request, so just 2.5 full-length outputs would reach 10'000 tokens.
8
9	Other use cases per person per day (corporates, governments, militaries)	5,000	Allowance for uses not captured by the personal usage above: automated phone calls, transcription, data processing, written communication, drone guidance, missile guidance, remote sensing, automated security monitoring, predictive policing, and hundreds of other use cases.
10
11	Daily token demand per person	15,000	Sum of personal use (10'000) and other use cases (5'000). I only consider token demand. In computer vision and other non-NLP tasks, throughput would not be measured in tokens. As language models are currently the dominant modality in AI, I stick to tokens for simplicity.
12
13	Token demand for the entire world per day	60,000,000,000,000	60tn tokens per day (4bn people × 15'000 tokens).
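The demand rows reduce to two multiplications. Here is a minimal Python sketch of that arithmetic. The inputs are the assumptions stated in rows 3 to 9, and the constant names are purely illustrative:

```python
# Demand side of the napkin model (inputs from the assumption rows above).
PEOPLE = 4_000_000_000               # people who can afford to use AI
OUTPUT_TOKENS_PER_REQUEST = 1_000
REQUESTS_PER_PERSON_PER_DAY = 10
OTHER_USE_TOKENS_PER_PERSON = 5_000  # corporate, government, military allowance

personal_tokens_per_day = OUTPUT_TOKENS_PER_REQUEST * REQUESTS_PER_PERSON_PER_DAY  # 10,000
tokens_per_person_per_day = personal_tokens_per_day + OTHER_USE_TOKENS_PER_PERSON  # 15,000
world_tokens_per_day = PEOPLE * tokens_per_person_per_day                          # 60tn

print(f"Per person per day: {tokens_per_person_per_day:,} tokens")
print(f"World per day:      {world_tokens_per_day:,} tokens")
```

Python integers are arbitrary precision, so the 60tn total stays exact.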
14
15	Supply Assumptions
16
17	Number of available GPUs (a conservative estimate for this argument, as the real number is probably lower)
18	Volta	400,000	Throughput too low to be economically viable for commercial inference.
19	Turing	5,300,000	Throughput too low to be economically viable for commercial inference.
20	Ampere	20,500,000	Throughput too low to be economically viable for commercial inference.
21	Hopper	1,000,000
22	Ada Lovelace	5,000,000
23	Blackwell	500,000
24	Total number of GPUs (all architectures)	32,700,000
25	Total number of viable GPUs (excluding Volta, Turing, Ampere)	6,500,000
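The two totals can be checked with a short Python sketch. The GPU counts and the list of non-viable architectures come from the rows above. The dictionary layout is just one convenient way to express them:

```python
# GPU fleet by NVIDIA architecture (counts from the supply rows above).
gpu_counts = {
    "Volta": 400_000,
    "Turing": 5_300_000,
    "Ampere": 20_500_000,
    "Hopper": 1_000_000,
    "Ada Lovelace": 5_000_000,
    "Blackwell": 500_000,
}
# Architectures the sheet treats as too slow for commercial inference.
not_viable = {"Volta", "Turing", "Ampere"}

total_gpus = sum(gpu_counts.values())
viable_gpus = sum(n for arch, n in gpu_counts.items() if arch not in not_viable)

print(f"Total GPUs:  {total_gpus:,}")   # 32,700,000
print(f"Viable GPUs: {viable_gpus:,}")  # 6,500,000
```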
26
27	Token throughput per second per GPU (Hopper)	100	Very rough simplification and probably the most contestable estimate here, as it depends on model choice and quantization. I assume Llama-3.1 70B with 8-bit quantization, which is not a SOTA model.
28	Token throughput per second per GPU (Ada Lovelace)	10	Same assumptions as for Hopper (Llama-3.1 70B, 8-bit quantization).
29	Token throughput per second per GPU (Blackwell)	200	Same assumptions as for Hopper (Llama-3.1 70B, 8-bit quantization).
30
31	Number of seconds per day per GPU	86,400
32
33	Token throughput per day per GPU (Hopper)	8,640,000
34	Token throughput per day per GPU (Ada Lovelace)	864,000
35	Token throughput per day per GPU (Blackwell)	17,280,000
36
37	Token throughput per day for all existing GPUs (Hopper)	8,640,000,000,000
38	Token throughput per day for all existing GPUs (Ada Lovelace)	4,320,000,000,000
39	Token throughput per day for all existing GPUs (Blackwell)	8,640,000,000,000
40
41	Average throughput per GPU per day (weighted by GPU count)	3,323,077	Total daily throughput of all viable GPUs divided by 6,500,000 GPUs.
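The average in row 41 is weighted by GPU count: it divides total fleet throughput by the number of viable GPUs. It is not the plain mean of the three per-GPU rates. The Python sketch below shows both numbers side by side, using the sheet's GPU counts and assumed tokens per second:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# architecture: (number of GPUs, assumed output tokens per second per GPU)
fleet = {
    "Hopper": (1_000_000, 100),
    "Ada Lovelace": (5_000_000, 10),
    "Blackwell": (500_000, 200),
}

per_gpu_per_day = {arch: tps * SECONDS_PER_DAY for arch, (_, tps) in fleet.items()}
fleet_per_day = {arch: n * per_gpu_per_day[arch] for arch, (n, _) in fleet.items()}

viable_gpus = sum(n for n, _ in fleet.values())
# Fleet-weighted average: total daily tokens divided by the number of GPUs.
weighted_avg = sum(fleet_per_day.values()) / viable_gpus
# Unweighted mean of the three per-GPU rates, for comparison only.
simple_mean = sum(per_gpu_per_day.values()) / len(per_gpu_per_day)

print(f"Weighted average per GPU per day:   {weighted_avg:,.0f}")  # 3,323,077
print(f"Unweighted mean of the three rates: {simple_mean:,.0f}")   # 8,928,000
```

The weighted figure is the right one for sizing the fleet. The 5m slow Ada Lovelace GPUs dominate the count and pull the average down.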
42
43	Share of GPU capacity available for inference (vs. training, research, maintenance, and other downtime)	90%	Let us generously assume that 90% of GPU capacity is NOT used for training, research, maintenance, or other non-inference work. Note that there can be interaction effects between training and inference: GPUs may be used for training at night, when inference demand is low.
44	Average utilization of inference capacity	30%	We want to be able to serve everyone, even at peak demand. Because capacity must be sized for the peak, average utilization over the day will be low.
45
46	Effective token throughput per day per GPU (Hopper)	2,332,800	Raw daily throughput × 90% × 30% (i.e. 27%).
47	Effective token throughput per day per GPU (Ada Lovelace)	233,280
48	Effective token throughput per day per GPU (Blackwell)	4,665,600
49
50	Average effective throughput per GPU per day	897,231
51
52	Effective token throughput per day for the entire world's viable GPU stock	5,832,000,000,000	~5.8tn token capacity per day, vs. 60tn demanded!
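Both utilization factors scale every throughput number by the same amount, 0.90 × 0.30 = 0.27. The Python sketch below applies that factor to the sheet's fleet. Floats are rounded only when printed:

```python
SECONDS_PER_DAY = 86_400
INFERENCE_SHARE = 0.90      # share of GPU capacity not used for training, maintenance, etc.
AVERAGE_UTILIZATION = 0.30  # capacity is sized for peak, so average load is low
effective_factor = INFERENCE_SHARE * AVERAGE_UTILIZATION  # 0.27

fleet = {  # architecture: (number of GPUs, tokens per second per GPU)
    "Hopper": (1_000_000, 100),
    "Ada Lovelace": (5_000_000, 10),
    "Blackwell": (500_000, 200),
}

effective_per_gpu = {
    arch: tps * SECONDS_PER_DAY * effective_factor for arch, (_, tps) in fleet.items()
}
world_capacity = sum(n * effective_per_gpu[arch] for arch, (n, _) in fleet.items())
avg_effective = world_capacity / sum(n for n, _ in fleet.values())

for arch, tokens in effective_per_gpu.items():
    print(f"{arch}: {tokens:,.0f} effective tokens per GPU per day")
print(f"Average per GPU: {avg_effective:,.0f}")     # 897,231
print(f"World capacity: {world_capacity:,.0f} tokens per day")  # ~5.8tn
```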
53
54	Results
55
56	GPUs needed to satisfy daily token demand	66,872,428	60tn tokens ÷ 897,231 effective tokens per GPU per day.
57	GPUs in existence (only viable GPUs)	6,500,000	We need roughly 10x more GPUs than we already have (66.9m vs. 6.5m), assuming the average effective throughput of today's viable GPU mix.
58
59	Gap in daily inference tokens (minimum additional capacity needed)	54,168,000,000,000	60tn tokens demanded minus 5.832tn available.
60	Number of Blackwell GPUs needed to fill the gap	11,610,082	Gap ÷ 4,665,600 effective tokens per Blackwell GPU per day.
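The results rows can be reproduced from four numbers carried over from earlier rows. In this Python sketch those numbers are hard-coded as constants, so it runs on its own:

```python
DAILY_DEMAND = 60_000_000_000_000        # tokens per day (demand section)
WORLD_CAPACITY = 5_832_000_000_000       # effective tokens per day (supply section)
VIABLE_GPUS = 6_500_000
BLACKWELL_EFFECTIVE_PER_DAY = 4_665_600  # effective tokens per Blackwell GPU per day

avg_effective_per_gpu = WORLD_CAPACITY / VIABLE_GPUS
gpus_needed = DAILY_DEMAND / avg_effective_per_gpu
multiple_of_fleet = gpus_needed / VIABLE_GPUS

token_gap = DAILY_DEMAND - WORLD_CAPACITY
blackwell_needed = token_gap / BLACKWELL_EFFECTIVE_PER_DAY

print(f"GPUs needed:           {gpus_needed:,.0f}")        # 66,872,428
print(f"Multiple of fleet:     {multiple_of_fleet:.1f}x")  # 10.3x
print(f"Daily token gap:       {token_gap:,}")             # 54,168,000,000,000
print(f"Blackwell GPUs needed: {blackwell_needed:,.0f}")   # 11,610,082
```

The two ways of framing the shortfall differ. The first asks how many GPUs of today's average mix would be needed in total. The second asks how many new Blackwell GPUs would close the gap on top of the existing fleet.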
61
62	Revenue Implications
63
64	Cost per Blackwell GPU (USD)	20,000	Lower bound for the average price per GPU.
65	Implied revenue (USD)	232,201,646,091	About $232bn, roughly 2-3x NVIDIA's annual revenue.
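The revenue row multiplies the unrounded Blackwell GPU count by the assumed price, which is why it ends in ...091 rather than a round multiple of 20,000. A short Python sketch of that step, with the gap and per-GPU throughput carried over from the rows above:

```python
# Unrounded count: daily token gap / effective tokens per Blackwell GPU per day.
BLACKWELL_GPUS_NEEDED = 54_168_000_000_000 / 4_665_600
PRICE_PER_GPU_USD = 20_000  # lower-bound average price assumed in the sheet

implied_revenue_usd = BLACKWELL_GPUS_NEEDED * PRICE_PER_GPU_USD
print(f"Implied revenue: ${implied_revenue_usd / 1e9:,.0f}bn")  # $232bn
```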