| Question | Category | Language benchmark? | Reverse-code? | Exclude? | Resolution | Metaculus Prediction 25% | Metaculus Prediction 50% | Metaculus Prediction 75% | Optimism | Relative error | \|Relative error\| | Relative log error | CDF | CDF (rev) | Log score | Horizon / days | Forecasters | Predictions | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| What will the state-of-the-art object detection performance on COCO be, at 2021-06-14 in box Average Precision (box AP)? | Benchmark | | | | 58.70 | 57.32 | 57.36 | 57.61 | Q4 | -2% | 2% | 0.977 | 97% | 97% | -2.61 | 120 | 35 | 83 | Metaculus expected ~no change from the top score at the time but a new paper that beat the SotA came out ~1 month after question close and ~3 months before resolution. Looking at the chart plotting SotA over time on PapersWithCode, this doesn't seem like a big, unexpected jump. Unclear why the community was so overconfident on "no change" since no-one left any comments. |
| What will the value of the herein defined Image Classification Performance Index be on 2021-06-14? | Benchmark | | | | 120.87 | 115.08 | 115.47 | 117.09 | Q4 | -4% | 4% | 0.955 | 96% | 96% | 0.08 | 281 | 60 | 227 | The initial prediction was pretty close to the true value but it declined over time, presumably due to lack of progress. I guess people overupdated on that and didn't put enough probability mass on the possibility of a big jump. For short horizon questions (a few months) like this one, there's always the problem of seasonal publication dynamics (e.g. big conferences) swamping actual progress as the main factor. |
| What will the state-of-the-art performance on semantic segmentation on Cityscapes be at 2021-06-14 in mean IoU in percent (MIoU%)? | Benchmark | | | | 84.30 | 83.32 | 83.34 | 83.42 | Q4 | -1% | 1% | 0.989 | 96% | 96% | -0.39 | 281 | 60 | 195 | I have no idea what's going on here. Resolution and SotA at question open as implied by the question description don't match what's on PapersWithCode. The community prediction was consistently below the PapersWithCode SotA during all the time the question was open. |
| What will the state-of-the-art performance on semantic segmentation of PASCAL-Context be at 2021-06-14 in mean IoU in percent (MIoU%)? | Benchmark | | | | 60.50 | 58.94 | 58.95 | 59.01 | Q4 | -3% | 3% | 0.974 | 95% | 95% | -1.24 | 225 | 52 | 175 | Seems like another case of anchoring to current SOTA as time goes by without new results + something coming out after question close that beats the SOTA in a way that's totally unsurprising if one looks at the trend over several years. |
| What will the state-of-the-art object detection performance on COCO be, at 2022-01-14 in box average precision (box AP)? | Benchmark | | | | 63.30 | 57.59 | 58.86 | 62.07 | Q4 | -7% | 7% | 0.930 | 93% | 93% | -2.17 | 373 | 30 | 96 | Looking at this graph I don't see any obvious jumps between 2021 and 2022. Not sure why the community underestimated progress so much. |
| What will the state-of-the-art performance on image classification on ImageNet be at 2022-01-14 in top-1 accuracy? | Benchmark | | | | 88.30 | 86.53 | 86.91 | 87.58 | Q4 | -2% | 2% | 0.984 | 92% | 92% | -1.14 | 373 | 48 | 168 | |
| What will the state-of-the-art performance on semantic segmentation of PASCAL-Context be on 2023-02-14 in mean IoU in percent (MIoU%), amongst models not trained on extra data? | Benchmark | | | | 68.87 | 61.70 | 63.56 | 65.53 | Q4 | -8% | 8% | 0.923 | 92% | 92% | -1.95 | 670 | 28 | 119 | |
| What will the state-of-the-art language modelling performance on One Billion Word be on 2022-01-14, in perplexity? | Benchmark | | | | 20.25 | 21.16 | 21.46 | 21.54 | Q4 | -6% | 6% | 0.944 | 8% | 92% | -1.09 | 373 | 41 | 181 | SOTA was beaten a few days before question close and the Metaculus prediction shows a very narrow peak around that figure, while the community prediction is more spread out. This suggests to me that top forecasters predicted nothing would happen in the 9 months between close and resolution, while other forecasters either failed to update on the latest result or were less overconfident. |
| What will the state-of-the-art performance on semantic segmentation on Cityscapes be at 2022-01-14 in mean IoU in percent (MIoU%)? | Benchmark | | | | 84.40 | 83.32 | 83.41 | 83.88 | Q4 | -1% | 1% | 0.988 | 89% | 89% | -0.12 | 373 | 35 | 122 | |
| What will the highest Exact Match rate of the best-performing model on SQuAD2.0 be on 2021-06-14? | Benchmark | | | | 90.94 | 90.73 | 90.74 | 90.77 | Q4 | 0% | 0% | 0.998 | 88% | 88% | 0.64 | 121 | 59 | 208 | |
| What will be the state-of-the-art language modelling performance (in perplexity) on WikiText-103 by the following dates? (January 14, 2022) | Benchmark | | | | 14.80 | 15.30 | 15.71 | 15.76 | Q4 | -6% | 6% | 0.942 | 12% | 88% | 0.62 | 325 | 44 | 125 | This one was confusing because the fine print contained a restriction about how models had to be trained to count towards resolution that made it hard for people to find reliable info, e.g. the paperswithcode dataset reported models that didn't meet the restriction. See this comment thread for details. |
| What will the state-of-the-art performance on one-shot image classification on MiniImageNet be, on 2021-06-14, in accuracy? | Benchmark | | | | 84.81 | 82.94 | 82.97 | 83.77 | Q4 | -2% | 2% | 0.978 | 85% | 85% | -0.11 | 276 | 100 | 314 | |
| What will the state-of-the-art performance on image classification on ImageNet be at 2021-06-14 in top-1 accuracy? | Benchmark | | | | 86.78 | 86.46 | 86.53 | 86.64 | Q4 | 0% | 0% | 0.997 | 83% | 83% | 0.78 | 121 | 74 | 291 | |
| What will the state-of-the-art language modelling performance on WikiText-103 be on 2023-02-14 in perplexity, amongst models not trained on extra data? | Benchmark | | | | 14.80 | 15.35 | 15.73 | 15.76 | Q4 | -6% | 6% | 0.941 | 18% | 82% | -0.16 | 670 | 26 | 97 | |
| What will the state-of-the-art language modelling performance on One Billion Word be on 2023-02-14, in perplexity, amongst models not trained on extra data? | Benchmark | | | | 20.25 | 20.42 | 21.22 | 21.51 | Q4 | -5% | 5% | 0.954 | 21% | 79% | 0.82 | 670 | 36 | 109 | |
| What will the state-of-the-art object detection performance on COCO be, on 2023-02-14 in box average precision (box AP) amongst all models? | Benchmark | | | | 65.40 | 60.93 | 63.49 | 66.20 | Q3 | -3% | 3% | 0.971 | 78% | 78% | -0.17 | 671 | 31 | 94 | Same as the question below, no discontinuous jumps. |
| What will the state-of-the-art performance on one-shot image classification on miniImageNet be, at 2022-01-14 in accuracy? | Benchmark | | | | 84.81 | 82.95 | 83.71 | 85.31 | Q3 | -1% | 1% | 0.987 | 70% | 70% | -0.27 | 373 | 34 | 137 | |
| What share (in %) of the world's super-compute performance will be based in the United States in the November 2022 publication of TOP500 list? | Compute | | | | 43.74 | 31.64 | 38.64 | 45.74 | Q3 | -12% | 12% | 0.883 | 69% | 69% | 1.41 | 681 | 32 | 101 | IIRC there were some shenanigans re: China under-reporting these numbers. Lots of info in the comments of this similar INFER Pub question that I was heavily involved in, but I forgot the details. |
| What percent will software and information services contribute to US GDP in Q1 of 2021? | Econ | | | | 3.09 | 2.94 | 3.04 | 3.14 | Q3 | -2% | 2% | 0.983 | 64% | 64% | 1.91 | 217 | 83 | 226 | |
| What percent of total GDP will software and information services contribute to US GDP in Q3 of 2021? | Econ | | | | 3.22 | 3.05 | 3.17 | 3.29 | Q3 | -2% | 2% | 0.984 | 61% | 61% | 1.23 | 298 | 38 | 92 | |
| What will the state-of-the-art performance on semantic segmentation on Cityscapes be on 2023-02-14 in mean IoU in percent (MIoU%), amongst models not trained on extra data? | Benchmark | | | | 84.40 | 83.35 | 84.00 | 85.06 | Q3 | 0% | 0% | 0.995 | 61% | 61% | 0.69 | 670 | 29 | 93 | |
| What will the price of IGM be, on 2021-06-14? | Econ | | | | 392.89 | 363.61 | 387.07 | 411.36 | Q3 | -1% | 1% | 0.985 | 59% | 59% | 1.00 | 120 | 80 | 220 | |
| What share (in %) of the world's super-compute performance will be United States-based in the TOP500 list on the following dates? (June 2021) | Compute | | | | 30.55 | 26.03 | 30.28 | 36.31 | Q3 | -1% | 1% | 0.991 | 55% | 55% | 0.82 | 133 | 65 | 191 | |
| What will the value of the herein defined Object Detection Performance Index be on 2023-02-15? | Benchmark | | | | 135.27 | 129.12 | 134.28 | 140.20 | Q3 | -1% | 1% | 0.993 | 55% | 55% | 0.82 | 671 | 34 | 105 | |
| What will the value of the herein defined Image Classification Performance Index be on 2022-01-14? | Benchmark | | | | 123.71 | 120.49 | 123.46 | 126.66 | Q3 | 0% | 0% | 0.998 | 53% | 53% | 0.87 | 373 | 39 | 121 | |
| How many e-prints on AI Safety, Interpretability or Explainability will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | | | | 260.00 | 223.81 | 256.86 | 288.96 | Q3 | -1% | 1% | 0.988 | 53% | 53% | 1.10 | 121 | 84 | 221 | |
| How many e-prints on multi-modal machine learning will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | | | | 109.00 | 93.91 | 107.92 | 122.07 | Q3 | -1% | 1% | 0.990 | 51% | 51% | 1.10 | 121 | 65 | 196 | |
| What will the value of the herein defined Object Detection Performance Index be on 2021-06-14? | Benchmark | | | | 121.81 | 119.73 | 122.29 | 125.81 | Q2 | 0% | 0% | 1.004 | 51% | 51% | 0.90 | 281 | 54 | 198 | |
| What will the state-of-the-art performance on SuperGLUE be on 2021-06-14? | Benchmark | | | | 90.40 | 90.31 | 90.40 | 90.62 | Q3 | 0% | 0% | 1.000 | 51% | 51% | 2.27 | 120 | 71 | 266 | |
| What will be the state-of-the-art language modelling performance (in perplexity) on WikiText-103 by the following dates? (June 14, 2021) | Benchmark | | | | 15.79 | 15.72 | 15.76 | 15.77 | Q1 | 0% | 0% | 1.002 | 50% | 50% | 5.92 | 157 | 63 | 188 | |
| What will the state-of-the-art language text-to-SQL performance on WikiSQL be at 2021-06-14 in logical form test accuracy? | Benchmark | | | | 87.80 | 87.81 | 87.83 | 88.23 | Q1 | 0% | 0% | 1.000 | 50% | 50% | 5.94 | 120 | 66 | 216 | |
| What will the combined sector weighting of Information Technology and Communications be, in the S&P 500 on 2021-06-14? | Econ | | | | 39.16 | 38.08 | 39.20 | 40.36 | Q2 | 0% | 0% | 1.001 | 49% | 49% | 2.29 | 226 | 75 | 256 | |
| How many e-prints on Few-Shot Learning will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | | | | 744.00 | 694.73 | 745.57 | 812.96 | Q2 | 0% | 0% | 1.002 | 48% | 48% | 1.17 | 121 | 73 | 185 | |
| How much will the average degree of automation change for key US professions from December 2020 to the following dates? (January 2022) | Econ | | | | 1.31 | -0.45 | 1.75 | 4.46 | Q2 | 33% | 33% | 1.335 | 45% | 45% | 2.48 | 357 | 50 | 151 | Wide range, true value came close to median :shrug: |
| How many e-prints on multi-modal learning will be published on ArXiv over the 2021-12-14 to 2022-01-14 period? | Biblio | | | | 248.00 | 223.54 | 259.35 | 303.87 | Q2 | 5% | 5% | 1.046 | 43% | 43% | 1.77 | 311 | 35 | 117 | |
| How many e-prints on AI Safety, Interpretability or Explainability will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | | | | 560.00 | 515.60 | 575.86 | 636.73 | Q2 | 3% | 3% | 1.028 | 42% | 42% | 2.59 | 311 | 39 | 141 | |
| What will the value of the herein defined Object Detection Performance Index be on 2022-01-14? | Benchmark | | | | 125.77 | 123.59 | 126.96 | 130.36 | Q2 | 1% | 1% | 1.009 | 41% | 41% | 0.68 | 373 | 35 | 103 | |
| What will the value of the herein defined Image Classification Performance Index be on 2023-02-14? | Benchmark | | | | 128.94 | 125.71 | 130.87 | 136.61 | Q2 | 1% | 1% | 1.015 | 40% | 40% | 0.99 | 672 | 32 | 109 | |
| What will the Federal Reserves' Industrial Production Index be for April 2021, for semiconductors, printed circuit boards and related products? | Econ | | | | 198.80 | 197.98 | 199.82 | 201.86 | Q2 | 1% | 1% | 1.005 | 35% | 35% | 1.89 | 122 | 65 | 177 | |
| What will the highest Exact Match rate of the best-performing model on SQuAD2.0 be on 2022-01-14? | Benchmark | | | | 90.94 | 90.90 | 91.05 | 91.28 | Q2 | 0% | 0% | 1.001 | 35% | 35% | 1.01 | 326 | 39 | 152 | |
| What will the combined sector weighting of Information Technology and Communications be, in the S&P 500 on 2022-01-14? | Econ | | | | 38.35 | 37.67 | 39.17 | 40.69 | Q2 | 2% | 2% | 1.021 | 35% | 35% | 1.50 | 311 | 36 | 129 | |
| What will be the maximum compute (in petaFLOPS-days) ever used in training an AI experiment by the following dates? (Feb-2023) | Compute | | | | 31712.96 | 26933.26 | 57277.55 | 116671.87 | Q2 | 81% | 81% | 1.806 | 29% | 29% | 0.87 | 670 | 31 | 114 | |
| How many e-prints on Few-Shot Learning will be published on ArXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | | | | 4089.00 | 3999.78 | 4476.93 | 4978.83 | Q2 | 9% | 9% | 1.095 | 29% | 29% | 1.85 | 679 | 31 | 97 | |
| What will the performance be of the top-performing supercomputer (in exaFLOPS) in the TOP500 be according to their November 2022 list? | Compute | | | | 1.10 | 1.02 | 1.28 | 1.49 | Q2 | 16% | 16% | 1.157 | 28% | 28% | 1.31 | 680 | 35 | 122 | |
| How many e-prints on Few-Shot Learning will be published on ArXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | | | | 1671.00 | 1668.62 | 1833.78 | 1985.18 | Q2 | 10% | 10% | 1.097 | 25% | 25% | 1.46 | 311 | 36 | 123 | |
| How many e-prints on AI Safety, interpretability or explainability will be published on ArXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | | | | 1211.00 | 1230.03 | 1432.61 | 1652.76 | Q1 | 18% | 18% | 1.183 | 23% | 23% | 2.59 | 734 | 31 | 99 | |
| What will be the sum of the performance (in exaFLOPS) of the top 500 supercomputers in the following dates? (November 2022) | Compute | | | | 4.86 | 4.72 | 5.42 | 6.17 | Q2 | 11% | 11% | 1.114 | 22% | 22% | 1.71 | 680 | 39 | 108 | |
| What will be the sum of the performance (in exaFLOPS) of the top 500 supercomputers in the following dates? (June 2021) | Compute | | | | 2.80 | 2.87 | 3.17 | 3.99 | Q1 | 13% | 13% | 1.133 | 20% | 20% | -0.10 | 134 | 91 | 321 | |
| What will the state-of-the-art performance on one-shot image classification on miniImageNet be, on 2023-02-14 in accuracy, amongst models not trained on extra data? | Benchmark | | | | 86.11 | 87.06 | 89.93 | 91.01 | Q1 | 4% | 4% | 1.044 | 20% | 20% | -1.32 | 671 | 30 | 99 | |
| How many Reinforcement Learning e-prints will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | | | | 1590.00 | 1586.84 | 1700.14 | 1815.22 | Q2 | 7% | 7% | 1.069 | 19% | 19% | 1.77 | 121 | 59 | 163 | |
| What percent of total GDP will software and information services contribute to US GDP in Q3 of 2022? | Econ | | | | 3.13 | 3.16 | 3.29 | 3.43 | Q1 | 5% | 5% | 1.050 | 19% | 19% | 1.92 | 681 | 32 | 76 | |
| How many Computer Vision and Pattern Recognition e-prints will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | | | | 8207.00 | 8338.51 | 8713.27 | 9200.80 | Q1 | 6% | 6% | 1.062 | 18% | 18% | 0.94 | 120 | 60 | 172 | |
| How many Computation and Language e-prints will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | | | | 3938.00 | 4109.83 | 4431.17 | 4841.95 | Q1 | 13% | 13% | 1.125 | 15% | 15% | 1.22 | 120 | 62 | 179 | |
| What will be the maximum compute (in petaFLOPS-days) ever used in training an AI experiment by the following dates? (January 14, 2022) | Compute | | | | 13542.00 | 8407.30 | 16229.32 | 31621.25 | Q2 | 20% | 20% | 1.198 | 12% | 12% | 0.87 | 372 | 42 | 129 | |
| What will the highest Exact Match rate of the best-performing model on SQuAD2.0 be on 2023-02-14? | Benchmark | | | | 90.94 | 91.35 | 91.85 | 92.53 | Q1 | 1% | 1% | 1.010 | 11% | 11% | 0.34 | 672 | 30 | 118 | |
| What will be the sum of the performance (in exaFLOPS) of the top 500 supercomputers in the following dates? (November 2021) | Compute | | | | 3.04 | 3.47 | 3.95 | 4.47 | Q1 | 30% | 30% | 1.300 | 7% | 7% | 0.88 | 247 | 36 | 124 | |
| How many Reinforcement Learning e-prints will be published on arXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | | | | 7202.00 | 8148.60 | 8768.55 | 9391.50 | Q1 | 22% | 22% | 1.218 | 6% | 6% | 0.34 | 680 | 37 | 102 | |
| What will the price of IGM be, on 2023-02-14, in 2019 USD? | Econ | | | | 280.98 | 391.28 | 460.52 | 537.20 | Q1 | 64% | 64% | 1.639 | 4% | 4% | -1.07 | 671 | 32 | 119 | Probably a similar story, haven't run the numbers. |
| How many Computer Vision and Pattern Recognition e-prints will be published on arXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | | | | 37123.00 | 42429.86 | 44676.12 | 46966.40 | Q1 | 20% | 20% | 1.203 | 4% | 4% | -0.59 | 735 | 29 | 83 | |
| How many Natural Language Processing e-prints will be published on arXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | | | | 17199.00 | 19848.96 | 21082.88 | 22363.20 | Q1 | 23% | 23% | 1.226 | 3% | 3% | -0.39 | 679 | 31 | 106 | |
| How many Computer Vision and Pattern Recognition e-prints will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | | | | 17249.00 | 18920.18 | 19666.65 | 20399.52 | Q1 | 14% | 14% | 1.140 | 3% | 3% | 0.25 | 311 | 43 | 99 | |
| How many Reinforcement Learning e-prints will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | | | | 3375.00 | 3825.48 | 4015.12 | 4215.56 | Q1 | 19% | 19% | 1.190 | 2% | 2% | 0.06 | 310 | 37 | 93 | |
| How many Natural Language Processing e-prints will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | | | | 8066.00 | 9142.36 | 9581.54 | 10083.44 | Q1 | 19% | 19% | 1.188 | 2% | 2% | -0.26 | 311 | 41 | 98 | Growth was surprisingly weak compared to previous years. Metaculus bet on exponential growth continuing. |
| What will Alphabet Inc.'s market capitalisation be at market close on 2023-02-14? | Econ | | | | 1.05 | 1.54 | 1.84 | 2.20 | Q1 | 75% | 75% | 1.750 | 1% | 1% | -1.07 | 671 | 31 | 90 | Market cap at question close (2021/4/15) was ~$1.46T, so the median prediction implies ~12% growth per year. That's surprisingly bullish imo. Prediction looks even worse because 2022 saw a stock market slump. |
| What will be the average top price performance (in G3D Mark /$) of the best available GPU on the following dates? (June 14, 2021) | Compute | | | | 40.80 | 60.90 | 62.94 | 65.93 | Q1 | 54% | 54% | 1.543 | 0% | 0% | -8.07 | 120 | 74 | 203 | GPU price spike, see above. |
| What will be the average top price performance (in G3D Mark /$) of the best available GPU on the following dates? (January 14, 2022) | Compute | | | | 23.20 | 63.69 | 68.83 | 73.77 | Q1 | 197% | 197% | 2.967 | 0% | 0% | -5.67 | 307 | 45 | 133 | Biggest miss in terms of relative error. I blame it on the post-covid price spike (driven by supply chain disruptions and crypto mining afaik). I don't quite remember when exactly prices peaked but this piece from Nov 2021 suggests they were still on the way up back then. |
| What will the highest score be, on Atari 2600 Montezuma's Revenge, by any ML model that is un-augmented with domain knowledge on 2022-01-14? | Benchmark | | | | 43791.00 | 44321.88 | 45356.53 | 140920.33 | Q1 | 4% | 4% | 1.036 | 0% | 0% | 6.82 | 373 | 42 | 173 | Resolution coincided with the lower bound of the range, so this number is slightly fake. The Metaculus prediction was bunched up against the lower bound, but still predicted higher numbers though. |
| What will the highest score of any ML model that is un-augmented with domain knowledge on Atari 2600 Montezuma's Revenge be on 2023-02-14? | Benchmark | | | | 43791.00 | 98544.46 | 1483981.70 | 1501456.21 | Q1 | 3289% | 3289% | 33.888 | 0% | 0% | 5.34 | 670 | 35 | 151 | Same as above, though less so because there was a significant peak on the upper end too. |
| What will the state-of-the-art language text-to-SQL performance on WikiSQL be on 2023-02-14 in logical form test accuracy? | Benchmark | | | | 87.80 | 90.52 | 91.60 | 92.72 | Q1 | 4% | 4% | 1.043 | 0% | 0% | 2.47 | 670 | 29 | 85 | Roughly same as above, though much less so because there was a good chunk of probability mass in the upper part of the range. |
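
The sheet does not spell out how the Optimism, Relative error and Relative log error columns are computed. The following is a minimal sketch of definitions that reproduce the tabulated values for the rows checked, under these assumptions (none are stated in the source): "Relative log error" is the ratio of the median prediction to the resolution, "Relative error" is that ratio minus one, "Optimism" is the quartile bucket of the prediction that the resolution lands in, and questions where lower is better (e.g. perplexity) are reverse-coded by inverting the ratio and flipping the bucket. The function names and the `reverse` parameter are hypothetical. The CDF and Log score columns need the full predictive distribution and are not reconstructed here.

```python
from bisect import bisect_left


def error_ratio(resolution, median, reverse=False):
    """Assumed 'Relative log error': median / resolution, inverted if reverse-coded."""
    return resolution / median if reverse else median / resolution


def relative_error(resolution, median, reverse=False):
    """Assumed 'Relative error': the ratio above minus one (negative = too pessimistic)."""
    return error_ratio(resolution, median, reverse) - 1.0


def optimism_bucket(resolution, q25, q50, q75, reverse=False):
    """Assumed 'Optimism': which quartile bucket of the forecast the resolution falls in.

    Q4 means the outcome beat the 75th percentile (forecast was pessimistic),
    Q1 means it fell below the 25th percentile (forecast was optimistic).
    """
    idx = bisect_left([q25, q50, q75], resolution)  # 0..3
    if reverse:  # lower-is-better metrics: flip the ordering
        idx = 3 - idx
    return f"Q{idx + 1}"


# COCO box AP, 2021-06-14 (first data row)
print(round(error_ratio(58.70, 57.36), 3))            # ratio column
print(round(100 * relative_error(58.70, 57.36)))      # relative error in percent
print(optimism_bucket(58.70, 57.32, 57.36, 57.61))    # optimism bucket

# One Billion Word perplexity, 2022-01-14 (reverse-coded: lower is better)
print(round(error_ratio(20.25, 21.46, reverse=True), 3))
print(optimism_bucket(20.25, 21.16, 21.46, 21.54, reverse=True))
```

Because the table shows rounded inputs, rows where the resolution equals the displayed median (e.g. SuperGLUE) may land in a different bucket than the sheet reports.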