Row | Question | Category | Resolution | Metaculus prediction (25%) | Metaculus prediction (50%) | Metaculus prediction (75%) | Optimism (quartile) | Relative error | Absolute relative error | Error ratio (1 + relative error) | CDF | CDF (rev) | Log score | Horizon (days) | Forecasters | Predictions | Notes
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2 | What will the state-of-the-art object detection performance on COCO be, at 2021-06-14 in box Average Precision (box AP)? | Benchmark | 58.70 | 57.32 | 57.36 | 57.61 | Q4 | -2% | 2% | 0.977 | 97% | 97% | -2.61 | 120 | 35 | 83 | Metaculus expected ~no change from the top score at the time but a new paper that beat the SotA came out ~1 month after question close and ~3 months before resolution. Looking at the chart plotting SotA over time on PapersWithCode, this doesn't seem like a big, unexpected jump. Unclear why the community was so overconfident on "no change" since no-one left any comments. | ||||
3 | What will the value of the herein defined Image Classification Performance Index be on 2021-06-14? | Benchmark | 120.87 | 115.08 | 115.47 | 117.09 | Q4 | -4% | 4% | 0.955 | 96% | 96% | 0.08 | 281 | 60 | 227 | The initial prediction was pretty close to the true value, but it declined over time, presumably due to a lack of progress. I guess people over-updated on that and didn't put enough probability mass on a big jump. For short-horizon questions (a few months) like this one, seasonal publication dynamics (e.g. big conferences) can always swamp actual progress as the main factor. | ||||
4 | What will the state-of-the-art performance on semantic segmentation on Cityscapes be at 2021-06-14 in mean IoU in percent (MIoU%)? | Benchmark | 84.30 | 83.32 | 83.34 | 83.42 | Q4 | -1% | 1% | 0.989 | 96% | 96% | -0.39 | 281 | 60 | 195 | I have no idea what's going on here. Resolution and SotA at question open as implied by the question description don't match what's on PapersWithCode. The community prediction was consistently below the PapersWithCode SotA during all the time the question was open. | ||||
5 | What will the state-of-the-art performance on semantic segmentation of PASCAL-Context be at 2021-06-14 in mean IoU in percent (MIoU%)? | Benchmark | 60.50 | 58.94 | 58.95 | 59.01 | Q4 | -3% | 3% | 0.974 | 95% | 95% | -1.24 | 225 | 52 | 175 | Seems like another case of anchoring to the current SOTA as time goes by without new results, plus something coming out after question close that beats the SOTA in a way that's totally unsurprising given the trend over several years. | ||||
6 | What will the state-of-the-art object detection performance on COCO be, at 2022-01-14 in box average precision (box AP)? | Benchmark | 63.30 | 57.59 | 58.86 | 62.07 | Q4 | -7% | 7% | 0.930 | 93% | 93% | -2.17 | 373 | 30 | 96 | Looking at this graph I don't see any obvious jumps between 2021 and 2022. Not sure why the community underestimated progress so much. | ||||
7 | What will the state-of-the-art performance on image classification on ImageNet be at 2022-01-14 in top-1 accuracy? | Benchmark | 88.30 | 86.53 | 86.91 | 87.58 | Q4 | -2% | 2% | 0.984 | 92% | 92% | -1.14 | 373 | 48 | 168 | |||||
8 | What will the state-of-the-art performance on semantic segmentation of PASCAL-Context be on 2023-02-14 in mean IoU in percent (MIoU%), amongst models not trained on extra data? | Benchmark | 68.87 | 61.70 | 63.56 | 65.53 | Q4 | -8% | 8% | 0.923 | 92% | 92% | -1.95 | 670 | 28 | 119 | |||||
9 | What will the state-of-the-art language modelling performance on One Billion Word be on 2022-01-14, in perplexity? | Benchmark | 20.25 | 21.16 | 21.46 | 21.54 | Q4 | -6% | 6% | 0.944 | 8% | 92% | -1.09 | 373 | 41 | 181 | SOTA was beaten a few days before question close and the Metaculus prediction shows a very narrow peak around that figure, while the community prediction is more spread out. This suggests to me that top forecasters predicted nothing would happen in the 9 months between close and resolution, while other forecasters either failed to update on the latest result or were less overconfident. | ||||
10 | What will the state-of-the-art performance on semantic segmentation on Cityscapes be at 2022-01-14 in mean IoU in percent (MIoU%)? | Benchmark | 84.40 | 83.32 | 83.41 | 83.88 | Q4 | -1% | 1% | 0.988 | 89% | 89% | -0.12 | 373 | 35 | 122 | |||||
11 | What will the highest Exact Match rate of the best-performing model on SQuAD2.0 be on 2021-06-14? | Benchmark | 90.94 | 90.73 | 90.74 | 90.77 | Q4 | 0% | 0% | 0.998 | 88% | 88% | 0.64 | 121 | 59 | 208 | |||||
12 | What will be the state-of-the-art language modelling performance (in perplexity) on WikiText-103 by the following dates? (January 14, 2022) | Benchmark | 14.80 | 15.30 | 15.71 | 15.76 | Q4 | -6% | 6% | 0.942 | 12% | 88% | 0.62 | 325 | 44 | 125 | This one was confusing: the fine print restricted how models had to be trained to count towards resolution, which made reliable info hard to find (e.g. the PapersWithCode dataset listed models that didn't meet the restriction). See this comment thread for details. | ||||
13 | What will the state-of-the-art performance on one-shot image classification on MiniImageNet be, on 2021-06-14, in accuracy? | Benchmark | 84.81 | 82.94 | 82.97 | 83.77 | Q4 | -2% | 2% | 0.978 | 85% | 85% | -0.11 | 276 | 100 | 314 | |||||
14 | What will the state-of-the-art performance on image classification on ImageNet be at 2021-06-14 in top-1 accuracy? | Benchmark | 86.78 | 86.46 | 86.53 | 86.64 | Q4 | 0% | 0% | 0.997 | 83% | 83% | 0.78 | 121 | 74 | 291 | |||||
15 | What will the state-of-the-art language modelling performance on WikiText-103 be on 2023-02-14 in perplexity, amongst models not trained on extra data? | Benchmark | 14.80 | 15.35 | 15.73 | 15.76 | Q4 | -6% | 6% | 0.941 | 18% | 82% | -0.16 | 670 | 26 | 97 | |||||
16 | What will the state-of-the-art language modelling performance on One Billion Word be on 2023-02-14, in perplexity, amongst models not trained on extra data? | Benchmark | 20.25 | 20.42 | 21.22 | 21.51 | Q4 | -5% | 5% | 0.954 | 21% | 79% | 0.82 | 670 | 36 | 109 | |||||
17 | What will the state-of-the-art object detection performance on COCO be, on 2023-02-14 in box average precision (box AP) amongst all models? | Benchmark | 65.40 | 60.93 | 63.49 | 66.20 | Q3 | -3% | 3% | 0.971 | 78% | 78% | -0.17 | 671 | 31 | 94 | Same as the question below, no discontinuous jumps. | ||||
18 | What will the state-of-the-art performance on one-shot image classification on miniImageNet be, at 2022-01-14 in accuracy? | Benchmark | 84.81 | 82.95 | 83.71 | 85.31 | Q3 | -1% | 1% | 0.987 | 70% | 70% | -0.27 | 373 | 34 | 137 | |||||
19 | What share (in %) of the world's super-compute performance will be based in the United States in the November 2022 publication of TOP500 list? | Compute | 43.74 | 31.64 | 38.64 | 45.74 | Q3 | -12% | 12% | 0.883 | 69% | 69% | 1.41 | 681 | 32 | 101 | IIRC there were some shenanigans re: China under-reporting these numbers. Lots of info in the comments of this similar INFER Pub question that I was heavily involved in, but I forgot the details. | ||||
20 | What percent will software and information services contribute to US GDP in Q1 of 2021? | Econ | 3.09 | 2.94 | 3.04 | 3.14 | Q3 | -2% | 2% | 0.983 | 64% | 64% | 1.91 | 217 | 83 | 226 | |||||
21 | What percent of total GDP will software and information services contribute to US GDP in Q3 of 2021? | Econ | 3.22 | 3.05 | 3.17 | 3.29 | Q3 | -2% | 2% | 0.984 | 61% | 61% | 1.23 | 298 | 38 | 92 | |||||
22 | What will the state-of-the-art performance on semantic segmentation on Cityscapes be on 2023-02-14 in mean IoU in percent (MIoU%), amongst models not trained on extra data? | Benchmark | 84.40 | 83.35 | 84.00 | 85.06 | Q3 | 0% | 0% | 0.995 | 61% | 61% | 0.69 | 670 | 29 | 93 | |||||
23 | What will the price of IGM be, on 2021-06-14? | Econ | 392.89 | 363.61 | 387.07 | 411.36 | Q3 | -1% | 1% | 0.985 | 59% | 59% | 1.00 | 120 | 80 | 220 | |||||
24 | What share (in %) of the world's super-compute performance will be United States-based in the TOP500 list on the following dates? (June 2021) | Compute | 30.55 | 26.03 | 30.28 | 36.31 | Q3 | -1% | 1% | 0.991 | 55% | 55% | 0.82 | 133 | 65 | 191 | |||||
25 | What will the value of the herein defined Object Detection Performance Index be on 2023-02-15? | Benchmark | 135.27 | 129.12 | 134.28 | 140.20 | Q3 | -1% | 1% | 0.993 | 55% | 55% | 0.82 | 671 | 34 | 105 | |||||
26 | What will the value of the herein defined Image Classification Performance Index be on 2022-01-14? | Benchmark | 123.71 | 120.49 | 123.46 | 126.66 | Q3 | 0% | 0% | 0.998 | 53% | 53% | 0.87 | 373 | 39 | 121 | |||||
27 | How many e-prints on AI Safety, Interpretability or Explainability will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | 260.00 | 223.81 | 256.86 | 288.96 | Q3 | -1% | 1% | 0.988 | 53% | 53% | 1.10 | 121 | 84 | 221 | |||||
28 | How many e-prints on multi-modal machine learning will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | 109.00 | 93.91 | 107.92 | 122.07 | Q3 | -1% | 1% | 0.990 | 51% | 51% | 1.10 | 121 | 65 | 196 | |||||
29 | What will the value of the herein defined Object Detection Performance Index be on 2021-06-14? | Benchmark | 121.81 | 119.73 | 122.29 | 125.81 | Q2 | 0% | 0% | 1.004 | 51% | 51% | 0.90 | 281 | 54 | 198 | |||||
30 | What will the state-of-the-art performance on SuperGLUE be on 2021-06-14? | Benchmark | 90.40 | 90.31 | 90.40 | 90.62 | Q3 | 0% | 0% | 1.000 | 51% | 51% | 2.27 | 120 | 71 | 266 | |||||
31 | What will be the state-of-the-art language modelling performance (in perplexity) on WikiText-103 by the following dates? (June 14, 2021) | Benchmark | 15.79 | 15.72 | 15.76 | 15.77 | Q1 | 0% | 0% | 1.002 | 50% | 50% | 5.92 | 157 | 63 | 188 | |||||
32 | What will the state-of-the-art language text-to-SQL performance on WikiSQL be at 2021-06-14 in logical form test accuracy? | Benchmark | 87.80 | 87.81 | 87.83 | 88.23 | Q1 | 0% | 0% | 1.000 | 50% | 50% | 5.94 | 120 | 66 | 216 | |||||
33 | What will the combined sector weighting of Information Technology and Communications be, in the S&P 500 on 2021-06-14? | Econ | 39.16 | 38.08 | 39.20 | 40.36 | Q2 | 0% | 0% | 1.001 | 49% | 49% | 2.29 | 226 | 75 | 256 | |||||
34 | How many e-prints on Few-Shot Learning will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | 744.00 | 694.73 | 745.57 | 812.96 | Q2 | 0% | 0% | 1.002 | 48% | 48% | 1.17 | 121 | 73 | 185 | |||||
35 | How much will the average degree of automation change for key US professions from December 2020 to the following dates? (January 2022) | Econ | 1.31 | -0.45 | 1.75 | 4.46 | Q2 | 33% | 33% | 1.335 | 45% | 45% | 2.48 | 357 | 50 | 151 | Wide range, true value came close to median :shrug: | ||||
36 | How many e-prints on multi-modal learning will be published on ArXiv over the 2021-12-14 to 2022-01-14 period? | Biblio | 248.00 | 223.54 | 259.35 | 303.87 | Q2 | 5% | 5% | 1.046 | 43% | 43% | 1.77 | 311 | 35 | 117 | |||||
37 | How many e-prints on AI Safety, Interpretability or Explainability will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | 560.00 | 515.60 | 575.86 | 636.73 | Q2 | 3% | 3% | 1.028 | 42% | 42% | 2.59 | 311 | 39 | 141 | |||||
38 | What will the value of the herein defined Object Detection Performance Index be on 2022-01-14? | Benchmark | 125.77 | 123.59 | 126.96 | 130.36 | Q2 | 1% | 1% | 1.009 | 41% | 41% | 0.68 | 373 | 35 | 103 | |||||
39 | What will the value of the herein defined Image Classification Performance Index be on 2023-02-14? | Benchmark | 128.94 | 125.71 | 130.87 | 136.61 | Q2 | 1% | 1% | 1.015 | 40% | 40% | 0.99 | 672 | 32 | 109 | |||||
40 | What will the Federal Reserve's Industrial Production Index be for April 2021, for semiconductors, printed circuit boards and related products? | Econ | 198.80 | 197.98 | 199.82 | 201.86 | Q2 | 1% | 1% | 1.005 | 35% | 35% | 1.89 | 122 | 65 | 177 | |||||
41 | What will the highest Exact Match rate of the best-performing model on SQuAD2.0 be on 2022-01-14? | Benchmark | 90.94 | 90.90 | 91.05 | 91.28 | Q2 | 0% | 0% | 1.001 | 35% | 35% | 1.01 | 326 | 39 | 152 | |||||
42 | What will the combined sector weighting of Information Technology and Communications be, in the S&P 500 on 2022-01-14? | Econ | 38.35 | 37.67 | 39.17 | 40.69 | Q2 | 2% | 2% | 1.021 | 35% | 35% | 1.50 | 311 | 36 | 129 | |||||
43 | What will be the maximum compute (in petaFLOPS-days) ever used in training an AI experiment by the following dates? (Feb-2023) | Compute | 31712.96 | 26933.26 | 57277.55 | 116671.87 | Q2 | 81% | 81% | 1.806 | 29% | 29% | 0.87 | 670 | 31 | 114 | |||||
44 | How many e-prints on Few-Shot Learning will be published on ArXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | 4089.00 | 3999.78 | 4476.93 | 4978.83 | Q2 | 9% | 9% | 1.095 | 29% | 29% | 1.85 | 679 | 31 | 97 | |||||
45 | What will the performance of the top-performing supercomputer (in exaFLOPS) in the TOP500 be according to their November 2022 list? | Compute | 1.10 | 1.02 | 1.28 | 1.49 | Q2 | 16% | 16% | 1.157 | 28% | 28% | 1.31 | 680 | 35 | 122 | |||||
46 | How many e-prints on Few-Shot Learning will be published on ArXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | 1671.00 | 1668.62 | 1833.78 | 1985.18 | Q2 | 10% | 10% | 1.097 | 25% | 25% | 1.46 | 311 | 36 | 123 | |||||
47 | How many e-prints on AI Safety, interpretability or explainability will be published on ArXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | 1211.00 | 1230.03 | 1432.61 | 1652.76 | Q1 | 18% | 18% | 1.183 | 23% | 23% | 2.59 | 734 | 31 | 99 | |||||
48 | What will be the sum of the performance (in exaFLOPS) of the top 500 supercomputers on the following dates? (November 2022) | Compute | 4.86 | 4.72 | 5.42 | 6.17 | Q2 | 11% | 11% | 1.114 | 22% | 22% | 1.71 | 680 | 39 | 108 | |||||
49 | What will be the sum of the performance (in exaFLOPS) of the top 500 supercomputers on the following dates? (June 2021) | Compute | 2.80 | 2.87 | 3.17 | 3.99 | Q1 | 13% | 13% | 1.133 | 20% | 20% | -0.10 | 134 | 91 | 321 | |||||
50 | What will the state-of-the-art performance on one-shot image classification on miniImageNet be, on 2023-02-14 in accuracy, amongst models not trained on extra data? | Benchmark | 86.11 | 87.06 | 89.93 | 91.01 | Q1 | 4% | 4% | 1.044 | 20% | 20% | -1.32 | 671 | 30 | 99 | |||||
51 | How many Reinforcement Learning e-prints will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | 1590.00 | 1586.84 | 1700.14 | 1815.22 | Q2 | 7% | 7% | 1.069 | 19% | 19% | 1.77 | 121 | 59 | 163 | |||||
52 | What percent of total GDP will software and information services contribute to US GDP in Q3 of 2022? | Econ | 3.13 | 3.16 | 3.29 | 3.43 | Q1 | 5% | 5% | 1.050 | 19% | 19% | 1.92 | 681 | 32 | 76 | |||||
53 | How many Computer Vision and Pattern Recognition e-prints will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | 8207.00 | 8338.51 | 8713.27 | 9200.80 | Q1 | 6% | 6% | 1.062 | 18% | 18% | 0.94 | 120 | 60 | 172 | |||||
54 | How many Computation and Language e-prints will be published on arXiv over the 2020-12-14 to 2021-06-14 period? | Biblio | 3938.00 | 4109.83 | 4431.17 | 4841.95 | Q1 | 13% | 13% | 1.125 | 15% | 15% | 1.22 | 120 | 62 | 179 | |||||
55 | What will be the maximum compute (in petaFLOPS-days) ever used in training an AI experiment by the following dates? (January 14, 2022) | Compute | 13542.00 | 8407.30 | 16229.32 | 31621.25 | Q2 | 20% | 20% | 1.198 | 12% | 12% | 0.87 | 372 | 42 | 129 | |||||
56 | What will the highest Exact Match rate of the best-performing model on SQuAD2.0 be on 2023-02-14? | Benchmark | 90.94 | 91.35 | 91.85 | 92.53 | Q1 | 1% | 1% | 1.010 | 11% | 11% | 0.34 | 672 | 30 | 118 | |||||
57 | What will be the sum of the performance (in exaFLOPS) of the top 500 supercomputers on the following dates? (November 2021) | Compute | 3.04 | 3.47 | 3.95 | 4.47 | Q1 | 30% | 30% | 1.300 | 7% | 7% | 0.88 | 247 | 36 | 124 | |||||
58 | How many Reinforcement Learning e-prints will be published on arXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | 7202.00 | 8148.60 | 8768.55 | 9391.50 | Q1 | 22% | 22% | 1.218 | 6% | 6% | 0.34 | 680 | 37 | 102 | |||||
59 | What will the price of IGM be, on 2023-02-14, in 2019 USD? | Econ | 280.98 | 391.28 | 460.52 | 537.20 | Q1 | 64% | 64% | 1.639 | 4% | 4% | -1.07 | 671 | 32 | 119 | Probably a similar story, haven't run the numbers. | ||||
60 | How many Computer Vision and Pattern Recognition e-prints will be published on arXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | 37123.00 | 42429.86 | 44676.12 | 46966.40 | Q1 | 20% | 20% | 1.203 | 4% | 4% | -0.59 | 735 | 29 | 83 | |||||
61 | How many Natural Language Processing e-prints will be published on arXiv over the 2021-02-14 to 2023-02-14 period? | Biblio | 17199.00 | 19848.96 | 21082.88 | 22363.20 | Q1 | 23% | 23% | 1.226 | 3% | 3% | -0.39 | 679 | 31 | 106 | |||||
62 | How many Computer Vision and Pattern Recognition e-prints will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | 17249.00 | 18920.18 | 19666.65 | 20399.52 | Q1 | 14% | 14% | 1.140 | 3% | 3% | 0.25 | 311 | 43 | 99 | |||||
63 | How many Reinforcement Learning e-prints will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | 3375.00 | 3825.48 | 4015.12 | 4215.56 | Q1 | 19% | 19% | 1.190 | 2% | 2% | 0.06 | 310 | 37 | 93 | |||||
64 | How many Natural Language Processing e-prints will be published on arXiv over the 2021-01-14 to 2022-01-14 period? | Biblio | 8066.00 | 9142.36 | 9581.54 | 10083.44 | Q1 | 19% | 19% | 1.188 | 2% | 2% | -0.26 | 311 | 41 | 98 | Growth was surprisingly weak compared to previous years. Metaculus bet on exponential growth continuing. | ||||
65 | What will Alphabet Inc.'s market capitalisation be at market close on 2023-02-14? | Econ | 1.05 | 1.54 | 1.84 | 2.20 | Q1 | 75% | 75% | 1.750 | 1% | 1% | -1.07 | 671 | 31 | 90 | Market cap at question close (2021-04-15) was ~$1.46T, so the median prediction (~$1.84T) implies roughly 12-13% growth per year. That's surprisingly bullish imo. The prediction looks even worse because 2022 saw a stock market slump. | ||||
66 | What will be the average top price performance (in G3D Mark /$) of the best available GPU on the following dates? (June 14, 2021) | Compute | 40.80 | 60.90 | 62.94 | 65.93 | Q1 | 54% | 54% | 1.543 | 0% | 0% | -8.07 | 120 | 74 | 203 | GPU price spike; see the January 2022 GPU question below. | ||||
67 | What will be the average top price performance (in G3D Mark /$) of the best available GPU on the following dates? (January 14, 2022) | Compute | 23.20 | 63.69 | 68.83 | 73.77 | Q1 | 197% | 197% | 2.967 | 0% | 0% | -5.67 | 307 | 45 | 133 | Biggest miss in terms of relative error, apart from the 2023 Montezuma's Revenge question. I blame it on the post-COVID price spike (driven by supply-chain disruptions and crypto mining afaik). I don't quite remember when exactly prices peaked, but this piece from Nov 2021 suggests they were still on the way up back then. | ||||
68 | What will the highest score be, on Atari 2600 Montezuma's Revenge, by any ML model that is un-augmented with domain knowledge on 2022-01-14? | Benchmark | 43791.00 | 44321.88 | 45356.53 | 140920.33 | Q1 | 4% | 4% | 1.036 | 0% | 0% | 6.82 | 373 | 42 | 173 | Resolution coincided with the lower bound of the range, so this number is slightly fake. The Metaculus prediction was bunched up against the lower bound but still predicted higher numbers. | ||||
69 | What will the highest score of any ML model that is un-augmented with domain knowledge on Atari 2600 Montezuma's Revenge be on 2023-02-14? | Benchmark | 43791.00 | 98544.46 | 1483981.70 | 1501456.21 | Q1 | 3289% | 3289% | 33.888 | 0% | 0% | 5.34 | 670 | 35 | 151 | Same as above, though less so because there was a significant peak on the upper end too. | ||||
70 | What will the state-of-the-art language text-to-SQL performance on WikiSQL be on 2023-02-14 in logical form test accuracy? | Benchmark | 87.80 | 90.52 | 91.60 | 92.72 | Q1 | 4% | 4% | 1.043 | 0% | 0% | 2.47 | 670 | 29 | 85 | Roughly the same as above, though much less so because there was a good chunk of probability mass in the upper part of the range.
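
Several derived columns can be recomputed from just the resolution and the three prediction quartiles. The sketch below is a reconstruction from the numbers in the table, not the spreadsheet's actual formulas. It makes three assumptions. First, the ratio column (0.977 for the first COCO question) is the median divided by the resolution, inverted for lower-is-better questions such as perplexity. Second, relative error is that ratio minus 1. Third, "Optimism" is the forecast quartile the resolution landed in, counted in the direction of progress. `ForecastRow`, `error_ratio`, `relative_error` and `optimism_quartile` are hypothetical names. CDF and log score need the full predicted distribution, so they are not covered.

```python
from dataclasses import dataclass


@dataclass
class ForecastRow:
    """One question: its resolution and the Metaculus prediction quartiles (hypothetical type)."""
    resolution: float
    p25: float
    p50: float
    p75: float
    reverse: bool = False  # assumed flag: True when lower is better, e.g. perplexity


def error_ratio(row: ForecastRow) -> float:
    """Median forecast relative to the outcome, oriented so that values below 1
    mean progress was underestimated (assumed reading of the table)."""
    if row.reverse:
        return row.resolution / row.p50
    return row.p50 / row.resolution


def relative_error(row: ForecastRow) -> float:
    """Signed relative error of the median; negative means the forecast was too pessimistic."""
    return error_ratio(row) - 1.0


def optimism_quartile(row: ForecastRow) -> str:
    """Forecast quartile the outcome fell into, counted in the direction of progress.

    Q4: the outcome beat the 75th percentile; Q1: it fell short of the 25th.
    """
    quartiles = (row.p25, row.p50, row.p75)
    if row.reverse:
        passed = sum(row.resolution < q for q in quartiles)
    else:
        passed = sum(row.resolution > q for q in quartiles)
    return f"Q{passed + 1}"


if __name__ == "__main__":
    # Two rows copied from the table above.
    rows = {
        "COCO 2021-06-14": ForecastRow(58.70, 57.32, 57.36, 57.61),
        "One Billion Word 2022-01-14": ForecastRow(20.25, 21.16, 21.46, 21.54, reverse=True),
    }
    for name, row in rows.items():
        print(f"{name}: {relative_error(row):+.0%}, ratio {error_ratio(row):.3f}, {optimism_quartile(row)}")
```

Running it prints `-2%, ratio 0.977, Q4` for the 2021 COCO question and `-6%, ratio 0.944, Q4` for the 2022 One Billion Word question, which matches their rows. Flipping the comparison for reverse-coded questions, rather than negating values, keeps the ratio positive.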
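
Here is a quick check of the compounding behind the growth figure in the Alphabet note. It assumes the table's market-cap values are in trillions of USD, as the note's ~$1.46T suggests. It also assumes a 670-day span from question close (2021-04-15) to resolution (2023-02-14). Under those assumptions, the implied annual growth rate $g$ of the median forecast is:

$$
g = \left(\frac{1.84}{1.46}\right)^{365/670} - 1 \approx 1.260^{0.545} - 1 \approx 0.13
$$

That works out to about 13% per year. Rounding the span up to two years gives $\sqrt{1.260} - 1 \approx 0.12$ instead.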