GPT-4.5/5 Forecasting Template [Public]

	A	B	C	D	E	F	G	H	I
1	The questions will resolve based on the best GPT-4.5-based system (including post-training enhancements) 6 months after the release of the first model named GPT-4.5. If nothing named GPT-4.5 is released, nothing resolves. If GPT-4.5 is not applied to a well-specified benchmark, that question resolves ambiguous; there is no analogous condition for non-benchmark questions. You could alternatively forecast for GPT-5, or for a specified date rather than a model.
2	All questions are optional but encouraged! The 10th and 90th percentiles are optional; the median is required. For yes/no questions, input your probability where the median would go.					Forecasts
3		References				Optional	Required	Optional	Highly encouraged
4	Question	GPT-3.5	Human	Expert Human	Current Best	10th percentile	Median	90th percentile	Rationale
5	GPQA	28%	34%	74%	60%
6	WebArena	6%	78%	-	14%
7	GAIA	-	92%		24%
8	SWE-Bench, unassisted	0.2%	-	-	13.86%
9	InterCode	46.50%	-	-	48.50%
10	Anthropic ASL-3 ARA evals (note: only 5 tasks)	0%	0%	100%	0%
11	Chatbot Arena Elo, relative to GPT-4-1106-preview	-138	-	-	0
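
The input rules in row 2 (median required, 10th and 90th percentiles optional) can be checked mechanically before a forecast row is submitted. The following is a minimal sketch, not part of the template: the function name `validate_forecast`, its parameters, and the choice to express yes/no probabilities on a 0 to 1 scale are all assumptions made for illustration. The ordering check relies only on the fact that a 10th percentile cannot exceed the median and a 90th percentile cannot fall below it.

```python
from typing import List, Optional


def validate_forecast(
    median: Optional[float],
    p10: Optional[float] = None,
    p90: Optional[float] = None,
    yes_no: bool = False,
) -> List[str]:
    """Return a list of problems with one forecast row (empty list means OK).

    Hypothetical helper: names and the 0-1 probability scale are assumptions.
    """
    problems: List[str] = []

    # The median is the only required field.
    if median is None:
        problems.append("median is required")
        return problems

    # For yes/no questions the median cell holds a probability (assumed 0-1 here).
    if yes_no and not 0.0 <= median <= 1.0:
        problems.append("yes/no probability must be between 0 and 1")

    # Percentiles are optional, but when given they must bracket the median.
    if p10 is not None and p10 > median:
        problems.append("10th percentile exceeds median")
    if p90 is not None and p90 < median:
        problems.append("90th percentile is below median")

    return problems
```

For example, `validate_forecast(0.5, 0.3, 0.7)` returns an empty list, while omitting the median returns a single "median is required" problem.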