h/t Lukas Finnveden for help with this.
| Quantity | Value | Notes |
| --- | --- | --- |
| Training FLOP (physical FLOP) | 1.00E+29 | My best guess for the biggest training run by 2035. |
| k | 1.5 | If there are p parameters, a forward pass takes k*p FLOP. Bio Anchors uses k = 1.5. |
| m (train model for m*p tokens) | 20 | Chinchilla scaling. |
| Horizon length, h | 3 | Longer horizons increase training FLOP without increasing runtime FLOP/s. |
| Parameters, p | 1.92E+13 | Training takes 3hkmp^2 FLOP. [Why the factor of 3? Training takes (FLOP for forward pass) + (FLOP for backward pass) ≈ 3x FLOP for forward pass.] |
| Runtime FLOP/s of AI | 2.89E+13 | = k*p. |
| Duration of training (seconds) | 1.00E+07 | ≈ 4 months. |
| How many SOTA AIs, by reallocating training compute to inference | 3.46E+08 | = (training FLOP / training duration) / (runtime FLOP/s). Of course, you could run more AIs using other compute. |
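
To make the arithmetic easy to check, here is a minimal Python sketch reproducing the sheet's numbers. One assumption I am reading between the lines: the runtime figure equals k*p exactly, which implies the AI runs one forward pass per second; that rate is my inference from the numbers, not something the sheet states.

```python
# Minimal sketch of the spreadsheet's arithmetic. All inputs are the sheet's
# own guesses, not measurements.

training_flop = 1e29   # best guess for the biggest training run by 2035
k = 1.5                # forward pass takes k*p FLOP (Bio Anchors value)
m = 20                 # train on m*p tokens (Chinchilla scaling)
h = 3                  # horizon length
train_seconds = 1e7    # training duration, roughly 4 months

# Training FLOP = 3*h*k*m*p^2 (the factor of 3 covers the forward plus
# backward passes), so solve for the parameter count p.
p = (training_flop / (3 * h * k * m)) ** 0.5
print(f"parameters p:      {p:.2e}")                # ~1.92e13

# Runtime FLOP/s: assumed here to be one forward pass (k*p FLOP) per second.
runtime_flop_s = k * p
print(f"runtime FLOP/s:    {runtime_flop_s:.2e}")   # ~2.89e13

# Reallocate the training cluster's sustained FLOP/s to inference.
cluster_flop_s = training_flop / train_seconds      # 1e22 FLOP/s
n_ais = cluster_flop_s / runtime_flop_s
print(f"SOTA AIs runnable: {n_ais:.2e}")            # ~3.46e8
```

Running this reproduces the three derived cells in the table above (p, runtime FLOP/s, and the number of SOTA AIs).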