Forecasting is often vibes. But so is expertise

People often say ‘forecasting is just vibes’.

Often, yes.

But so is expertise.

My argument goes as follows:

Much expertise is prediction
Predictions are often vibes
Vibes can have good track records

Let’s go through these one by one.

Much expertise is prediction

Often experts apply their knowledge in unfamiliar or uncertain situations. “How much will this policy decrease child poverty” “Will Biden leave in the next 3 days” “Will I feel better after taking painkillers?” “How much will sea levels rise?”

Consider the experts you see most often in your life. For me it’s political commentators, military buffs and science youtubers. Whenever they say “He will probably…” or “I expect…” or “It seems likely that…” they are making predictions.

In this sense there is not some gap between experts and forecasters, everyone does a bit of prediction. Doctors are predicting the effects of treatment, political experts are guessing how groups will vote, generals are anticipating the moves of other nations.

This discussion often comes up in AI policy. But there isn’t some privileged group who get to avoid doing prediction. Everyone is predicting the effects of AI on the economy, on our lives, on the human race. Some people think the effects will be big. Some people think they will be small.

There is no position on the future that doesn’t involve forecasts.

Predictions and expertise are vibes (and this is fine)

Look at this tweet from Brian Schwartz, a political finance reporter at CNBC.^[a]

When Brian said that there were no plans for Biden to drop out, where did this opinion come from? Did he consult a big table in the back of his journalism book? Has he hacked every computer owned by the DNC? Did God reveal it to him?

No. He made it up.

He sat down, considered the information and wrote this tweet, which matched his rough thoughts on the matter. I’m not saying this is bad. This is what all of us do when writing about something we aren’t certain of. Our brains consider different options and we try to find words that match the vibes on our heads.

Now if he were a forecaster, perhaps people would take more notice of the lack of a robust model or dataset and query why he is allowed to talk precisely about things, but he’s just a typical everyday expert so we expect him to come up with vibes like this all day.

Yo Brian, where is your reference class.

And importantly, people will cite his vibes. “Brian Schwarz says” “According to Brian Schwarz at CNBC”. These are typical things for people to say in response to the vibes-based thoughts of an expert.

Making things up isn’t bad. We want to hear people telling us how their expert internal model responds to new information. Where else could these numbers come from? Are they engraved in the firmament? Nope. People are making them up.

I am tired of people acting like forecasters putting numbers on things are doing something deeply different from many other experts working in fields without deep reference classes or well understood models. Just as we take experts seriously because of their years of study, I want to be taken seriously because of my track record. I am good at predicting things.

Come at me

In terms of AI discourse, there is a notion that AI Safety advocates are using numbers to launder their vibes, but everyone else, they really know stuff. So when people say to not regulate AI, they have seen 1000 civilisations develop this technology? Or do they understand LLMs well enough to predict the next token?

To me, it’s vibes all the way down. Nobody knows what is going to happen, chance is the best language to discuss it in, figure out how we disagree and try and figure out what’s going on.

Most experts are making things up

Vibes can have good track records

My younger brother is a good judge of character. When he doesn’t like someone I recall him often being correct. He catches things I miss. Somehow this gut feeling relates meaningfully to the world.

Vibes can be predictive.

And if you want to know how predictive. Just start tracking them. Every time your friend says someone gives her the ick, note it down and a year later, look at the list. Were the people disreputable? Maybe her ‘vibes’ are in fact a good source of information about the world.

Forecasts are the same. They are not given legitimacy by having come down from the sky. They are given legitimacy by having a track record of being correct.

And this is true whether it’s Nate Silver’s forecasts or your dog barking at someone. Both might be signals, if we take the time to check.

The question here isn’t “can we trust forecasters because of something within themselves?” it is “do they have a history of getting questions like this right”

Sometimes in AI people say ‘AI isn’t like other fields, the track records don’t carry over’. But this doesn’t let you ignore just the forecasts you don’t like. If we can’t forecast anything then all expert predictions are out the window. We would have to proceed with no knowledge at all.

The issues with forecasting apply to other expertise

My takeaway here is that issues with forecasting generally apply to other forms of expertise too.

To discuss AI specifically, people criticise AI forecasts for the following:

For being outside the forecasting horizon of 3-5 years
Lacking base rates

Lacking a robust model
There are order of magnitude disagreements between forecasters

And

Being too out of sample to use current track records

The first 4 of these are good arguments. These are reasons that forecasts are less good than we might want them to be. I wish we were better at long term forecasting, that there were easy base rates for AI and that forecasters and experts were more in agreement with one another.

But this isn’t reason to ignore forecasts. What endows other participants in the AI discourse to forecast 5 years ahead? Do they have reference classes to use? Do the order of magnitude differences mean we should ignore forecasts? Or do they point to uncertainty.

The final point seems wrong to me. I don’t see why track records in geopolitics, pandemics and technological development shouldn’t transfer to x-risk. These aren’t tiny percentages we are talking about. If someone says there is a 1% chance, they may have forecast that enough times to be well calibrated.

You have to check the track records

The actual issue here, I think, is that participants use numbers as a proxy for track records. They see someone using numbers and assume that because they are doing so, the norm is to treat their forecasts with respect.

Do I think Yudkowsky is a good forecaster? No. Show me his track record.^[b] I haven’t seen it. I don’t think I’ve seen the good forecasting track record of anyone with a P(doom) over 50% though I guess there are a few.

If someone presents a P(doom) number, before I take it at all seriously, I want to know how well calibrated they are in general. I might ask questions like:

Are they a Superforecaster™? It’s pretty hard to become one, so it’s a good signal
Do they use Metaculus? What’s their placement like in the leaderboard
Do they use Manifold or Polymarket? What’s their profit like? Is it across many serious markets?
Have they recorded personal forecasts, ideally publicly?

To try and empathise with people who criticise forecasting, if I thought that merely giving a number meant that policy experts should listen to me, I’d be annoyed that that was the norm. But that’s not the norm I advocate for.

If someone is giving a forecast, ask “What’s your track record?”

Comments on the AI Snake Oil piece

I’ve wanted to write this article for a while, but I was tipped over by reading this article. I think it’s sloppy work and I expect better from academics.

My main criticisms are the ones I have levied above - that all the criticisms they apply to forecasters apply to all AI experts:

Noone has a good track record of predicting policy outcomes 5+ years into the future with uncertain tech
Noone has agreed-upon reference classes for AI
Noone has a model of LLMs robust enough to predict capability advances

Yet the authors think that the policies of those they disagree with should be rejected, but now^[c] their own.

But they should reject the kind of policies that might seem compelling if we view x-risk as urgent and serious, notably: restricting AI development.

And they think that their forecasts should be listened to:

As we’ll argue in a future essay in this series, not only are such policies unnecessary, they are likely to increase x-risk.

Wait, but you said that we can’t forecast AI? Now you’re saying that policies you don’t like will increase risk? On what basis?

Both of these are special pleading. Either we can forecast AI or we can’t, but we don’t get to listen on^[d] to your experts and your forecasts.

Next they seem uninterested in track records. I see little acknowledgement in this piece that some forecasts might be a lot less trustworthy than others. Even when they reference superforecasters, who have some of the lowest AI x-risk forecasts, they say, emphasis mine:

As before, the estimates from forecasting experts (superforecasters) and AI experts differ by an order of magnitude or more. To the extent that we put any stock into these estimates, it should be the forecasting experts’ rather than the AI experts’ estimates. One important insight from past research is that domain experts perform worse than forecasting experts who have training in integrating diverse information and by minimizing psychological biases. Still, as we said above, even their forecasts may be vast overestimates, and we just can’t know for sure.

There is a better solution here. We weight forecasts according to the number of them and the track records of those involved. Some AI Safety folks have very high p(doom)s but in my experience those people don’t have a track record I can see. I take their predictions less seriously as a result. To me, this looks like laziness, not bothering to sort the baby from the bathwater.^[e]^[f]

Finally there is are some easy errors from an article written by academics to a subscriber list of 30,000 people

For instance, they repeatedly liken AI to an alien invasion.

If the two of us predicted an 80% probability of aliens landing on Earth in the next ten years, would you take this possibility seriously? Of course not.

Actually, If two Princeton academics said there was an 80% chance of disaster in a field I didn’t know about, I probably would take it seriously. And if there were baby aliens on Earth and 1000s of alien researchers were concerned then I’d be concerned too. But AI isn’t like aliens, we have AI tools rapidly increasing in capabilities that currently exist. This is beneath them.

Some things I liked

To give them some credit, I like their proposals:

Forecasting AI milestones, such as performance on certain capability benchmarks or economic impacts, is more achievable and meaningful. If a forecaster has demonstrated skill in predicting when various AI milestones would be reached, it does give us evidence that they will do well in the future.

I too would like to see more milestone forecasting. Doom forecasts aren’t helpful - by the time we find out if they are correct, it’s too late. Let’s figure out who is good at forecasting the growth of AI and where we think it’s likely to go. Then ideally we can differentially invest our resources, putting more work into things with more benefit and less risk.

Likewise, I think this is a pretty good general approach for policymaking, if we remove the context of ignoring x-risk:

Instead, governments should adopt policies that are compatible with a range of possible estimates of AI risk, and are on balance helpful even if the risk is negligible.

In Summary

“Forecasting is the worst way of predicting the future, except for all the others.”

Forecasting is mostly vibes and that is okay. Because so is expertise. Experts be just making up statements, from their own inscrutable models.

And if many experts’ vibes are that there is a significant (>0.001%) chance of AI killing all of us in any given year, then that’s a reasonable policy input. I think we’ll do better if we listen to that.

The AI Snake Oil article is lazy writing. All its conclusions apply to all AI experts, but somehow the authors only apply the conclusions to proposals they dislike. There is room for both forecasting and other forms of expertise to inform policy. The authors should do better.

[a]I know nothing about this man, it was just the first tweet I found.

[b]I am not great at finding people's prediction market history but I believe he is very active there and makes a lot of bets, so this should in fact be pretty answerable.

[c]Not

[d]Only?

[e]Not really on topic, but I notice one crux I often have on this topic is, e.g., the experts say the risk of AI takeover by 2050 is 60% and the super forecasters say 5% and everyone will get hung up on the discrepancy, but I say "they both agree that there's a very real risk that everyone, everything, our future, our kids, our love, the very idea of love, literally everything is going to get erased, that's all I need to know, I don't care if of it's 5% or 50%, those are both crazy high, this is madness, how is this unregulated, are you people nuts, I need a permit to have a barbeque, and we're arguing over 5% vs 50% on extermination and saying 'don't regulate'?" As I said, an aside, but I don't care much about discrepancies between expert predictions because even the low ones are very high when you multiply them by "eradication".

[f]Yes, this seems important. Though people are reasonably wary of very small numbers also.

Thanks for your comments here.