| No. of nodes | No. of H100s | Total switches | Facility & ops lifetime cost | H100 node cost | Switch cost | InfiniBand transceivers & cables | Total cost (incl. facility & ops) | Avg. raw cost per node | Avg. raw cost per H100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 2 | 16 | 0 | $100,000 | $620,000 | $0 | $12,960 | $732,960 | $366,480 | $45,810 |
| 64 | 512 | 8 | $3,200,000 | $19,840,000 | $288,000 | $675,840 | $24,003,840 | $375,060 | $46,883 |
| 128 | 1,024 | 48 | $6,400,000 | $39,680,000 | $1,728,000 | $2,621,440 | $50,429,440 | $393,980 | $49,248 |
| 256 | 2,048 | 96 | $12,800,000 | $79,360,000 | $3,456,000 | $5,242,880 | $100,858,880 | $393,980 | $49,248 |
| 512 | 4,096 | 192 | $25,600,000 | $158,720,000 | $6,912,000 | $10,485,760 | $201,717,760 | $393,980 | $49,248 |
| 1,024 | 8,192 | 384 | $51,200,000 | $317,440,000 | $13,824,000 | $20,971,520 | $403,435,520 | $393,980 | $49,248 |
| 2,048 | 16,384 | 768 | $102,400,000 | $634,880,000 | $27,648,000 | $41,943,040 | $806,871,040 | $393,980 | $49,248 |
| 4,096 | 32,768 | 2,560 | $204,800,000 | $1,269,760,000 | $92,160,000 | $124,518,400 | $1,691,238,400 | $412,900 | $51,613 |
| 8,192 | 65,536 | 5,120 | $409,600,000 | $2,539,520,000 | $184,320,000 | $249,036,800 | $3,382,476,800 | $412,900 | $51,613 |
| 13,107 | 104,858 | 8,192 | $655,360,000 | $4,063,232,000 | $294,912,000 | $398,458,880 | $5,411,962,880 | $412,900 | $51,613 |
| 16,384 | 131,072 | 10,240 | $819,200,000 | $5,079,040,000 | $368,640,000 | $498,073,600 | $6,764,953,600 | $412,900 | $51,613 |
| 32,768 | 262,144 | 20,480 | $1,638,400,000 | $10,158,080,000 | $737,280,000 | $996,147,200 | $13,529,907,200 | $412,900 | $51,613 |
| 65,536 | 524,288 | 40,960 | $3,276,800,000 | $20,316,160,000 | $1,474,560,000 | $1,992,294,400 | $27,059,814,400 | $412,900 | $51,613 |

All amounts are in USD. The 13,107-node row is one fifth of the 65,536-node row (13,107.2 nodes, 104,857.6 H100s), so its node and H100 counts are rounded.
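The derived columns of the table are a straight sum. Below is a minimal Python sketch, assuming the unit prices stated in the notes ($310k per 8x H100 node, $36k per switch) and a $50k facility & ops cost applied per node (which is how the table's figures work out), with the switch count and InfiniBand cabling cost taken as inputs. `row_totals` and `round_half_up` are hypothetical helper names, not part of any tool.

```python
import math

# Unit prices assumed from the notes (Aug 2024); illustrative, not authoritative.
NODE_PRICE = 310_000        # one 8x H100 node, incl. InfiniBand cards
SWITCH_PRICE = 36_000       # one 64-port 400Gbps InfiniBand switch
FACILITY_PER_NODE = 50_000  # facility & ops lifetime cost, applied per node
GPUS_PER_NODE = 8


def round_half_up(x: float) -> int:
    """Round .5 upwards like a spreadsheet (Python's round() rounds half to even)."""
    return math.floor(x + 0.5)


def row_totals(nodes: int, switches: int, ib_cabling: int) -> dict:
    """Derive one table row from node count, switch count and cabling cost."""
    facility = nodes * FACILITY_PER_NODE
    node_cost = nodes * NODE_PRICE
    switch_cost = switches * SWITCH_PRICE
    total = facility + node_cost + switch_cost + ib_cabling
    return {
        "facility": facility,
        "node_cost": node_cost,
        "switch_cost": switch_cost,
        "total": total,
        "avg_per_node": round_half_up(total / nodes),
        "avg_per_h100": round_half_up(total / (nodes * GPUS_PER_NODE)),
    }


if __name__ == "__main__":
    # The 64-node row: 8 switches and $675,840 of transceivers and cables.
    row = row_totals(64, 8, 675_840)
    print(row["total"], row["avg_per_node"], row["avg_per_h100"])
    # 24003840 375060 46883
```

The half-up rounding matters: the 64-node row's per-H100 cost is exactly $46,882.50, which Python's built-in `round()` would turn into 46,882 rather than the spreadsheet's 46,883.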
Prices and projections are as of August 2024.

The price of an 8x H100 node is assumed to be USD $310k. This includes the InfiniBand cards for the full 3.2Tbps interconnect (8 x 400Gbps).
A Thinkmate quotation for such a node, at USD $317k, is archived here:
https://web.archive.org/web/20240822233446/https://www.thinkmate.com/quotation-request?a=YToxOntzOjI6ImlkIjtpOjc0OTY2Mzt9

A 64-port 400Gbps InfiniBand switch is assumed to cost $36k.
This is based on the FS.com variant, which is cheaper than the official Nvidia variant (approx. $50k):
https://web.archive.org/web/20240822234224/https://www.fs.com/products/206473.html?now_cid=4887

For networking, we assume the following transceivers and cables. Cable length is standardised to 30m for simplicity
(you will need much longer cables for the larger installs):
- 400G transceiver ($700): https://www.fs.com/products/200963.html
- 800G transceiver ($800): https://www.fs.com/products/205113.html
- 30m MTP cable ($220): https://www.fs.com/products/224209.html?attribute=98411&id=3616836
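These prices are enough to reproduce the cabling column for the two smallest rows of the table. A minimal sketch, assuming each 400G node port uses one 400G transceiver and one 30m cable, and that one 800G (twin-port) transceiver is shared by two 400G switch ports. Both are assumptions, chosen because they match the table's $12,960 and $675,840 figures. The 2-level and 3-level rows work out to $2,560 and $3,800 per H100 respectively, which this sketch does not try to decompose.

```python
# Prices from the networking note (FS.com, Aug 2024).
TRANSCEIVER_400G = 700
TRANSCEIVER_800G = 800  # assumption: twin-port module shared by two 400G switch ports
MTP_CABLE_30M = 220
LINKS_PER_NODE = 8      # 8 x 400Gbps InfiniBand ports per H100 node


def back_to_back_cost() -> int:
    """2 nodes wired directly (no switch): a 400G module on each end plus one cable."""
    per_link = 2 * TRANSCEIVER_400G + MTP_CABLE_30M
    return LINKS_PER_NODE * per_link


def one_level_cost(nodes: int) -> int:
    """Each node port runs to a switch: a 400G module, half an 800G module, one cable."""
    per_link = TRANSCEIVER_400G + TRANSCEIVER_800G // 2 + MTP_CABLE_30M
    return nodes * LINKS_PER_NODE * per_link


print(back_to_back_cost())  # 12960
print(one_level_cost(64))   # 675840
```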
We use the fat-tree network topology (the setup suggested in Nvidia's guidelines), optimized for the lowest switch count:
- <= 64 nodes: 8 x 1-level fat-tree network planes
- <= 2,048 nodes: 8 x 2-level fat-tree network planes
- <= 65,536 nodes: 8 x 3-level fat-tree network planes
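The switch counts in the table follow from these tiers. A minimal sketch, assuming 64-port switches in a non-blocking fat tree (lower-tier switches split their ports 32 down / 32 up, the top tier uses all 64 ports downward), one plane per node port, and back-to-back cabling with no switch for the 2-node case. The ceiling-based counts match the table's power-of-two sizes; a real design for other sizes may differ. `switches_per_plane` and `total_switches` are hypothetical names.

```python
import math

PLANES = 8          # assumed: one network plane per 400Gbps port on each node
RADIX = 64          # ports per InfiniBand switch
DOWN = RADIX // 2   # non-blocking: 32 ports down, 32 up below the top tier


def switches_per_plane(nodes: int) -> int:
    if nodes <= 2:
        return 0  # back-to-back cabling, no switch (the 2-node row)
    if nodes <= RADIX:
        return 1  # 1-level: a single switch per plane
    if nodes <= RADIX * DOWN:  # 2-level, up to 2,048 nodes
        leaves = math.ceil(nodes / DOWN)
        spines = math.ceil(nodes / RADIX)
        return leaves + spines
    if nodes <= RADIX * DOWN * DOWN:  # 3-level, up to 65,536 nodes
        leaves = math.ceil(nodes / DOWN)
        aggregation = math.ceil(nodes / DOWN)
        core = math.ceil(nodes / RADIX)
        return leaves + aggregation + core
    raise ValueError("more nodes than a 3-level fat tree of 64-port switches supports")


def total_switches(nodes: int) -> int:
    return PLANES * switches_per_plane(nodes)


for n in (64, 128, 2_048, 4_096, 65_536):
    print(n, total_switches(n))  # 8, 48, 768, 2560, 40960 switches respectively
```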
10GbE networking and switches for internet connectivity, and 1GbE networking and switches for IPMI, cost less than $1k per node; this is zeroed out here and assumed to be part of the facility cost.

We use an average of $50k per H100 node for labour, rack space, internet, facility cooling, parts replacement, other hardware setup, etc.
This is a conservative (likely low) estimate, given the high cost of the electrical generation and cabling required (which is outside my expertise),
and the high failure rate of H100 SXM cards (reliability below 99.9%).

To illustrate the sheer scale of energy consumed: each H100 node draws, on average, about as much power as 4 US households.
The energy consumption of some of these clusters is comparable to that of small to medium-sized US cities,
and they require dedicated electrical cabling and cooling (which isn't cheap).

For comparison, Nvidia projects facility and operation costs at 50% of total cost:
https://s201.q4cdn.com/141608511/files/doc_presentations/2023/Oct/01/ndr_presentation_oct_2023_final.pdf
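To put the $50k-per-node facility figure next to Nvidia's 50% projection, the sketch below computes the facility & ops share for three rows of the cost table, and what that line item would have to be for it to reach 50% of the total with hardware spend held fixed. Row values are copied from the table; the helper names are hypothetical.

```python
# (facility & ops, total cost) for a few table rows, in USD.
ROWS = {
    64: (3_200_000, 24_003_840),
    2_048: (102_400_000, 806_871_040),
    65_536: (3_276_800_000, 27_059_814_400),
}


def facility_share(facility: int, total: int) -> float:
    """Fraction of the total cost spent on facility & ops."""
    return facility / total


def facility_at_share(facility: int, total: int, share: float = 0.5) -> float:
    """Facility cost needed to be `share` of the total, holding hardware cost fixed."""
    hardware = total - facility
    # facility / (hardware + facility) = share  =>  facility = hardware * share / (1 - share)
    return hardware * share / (1 - share)


for nodes, (facility, total) in ROWS.items():
    print(f"{nodes:>6} nodes: {facility_share(facility, total):.1%} now, "
          f"${facility_at_share(facility, total):,.0f} at a 50% share")
```

With these rows the current share comes out at roughly 12 to 13%, so a 50% share would need about 6.5x to 7.3x the assumed facility budget.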