1 of 17

Lecture 8: Combining Forecasts

Jacob Steinhardt

Stat 157, Spring 2023

2 of 17

Warm-up Question

“How many 8.5 x 11 sheets of paper does the average tree produce?”

3 of 17

Histogram of Answers

Now that you’ve seen the class’s distribution, what would you guess?

4 of 17

Ways of Combining Forecasts

What are different techniques for combining forecasts?

  • Mean
  • Median
  • Trimmed mean
  • Weighted mean

General name for this: “ensembling” (also used in machine learning)

5 of 17

Mean

  • For probabilities: take average of the distributions
  • For numerical answers: take average of answers
  • One reason this is good: convexity
    • Brier score: (1-(p+q)/2)² ≤ ½ * ((1-p)² + (1-q)²), i.e. the averaged forecast scores at least as well as the average of the individual scores
    • Log score also convex
    • [Jensen derivation on board]
  • Any possible issues with this?
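The convexity claim can be checked numerically. The sketch below assumes a binary event that actually occurred (outcome = 1), so the Brier score of a forecast p is (1-p)²; the function names are illustrative and not from the lecture. For two forecasts p and q, the gap between the two sides works out to (p-q)²/4, which is never negative.

```python
def brier(p, outcome):
    """Brier score (lower is better) of probability p for a 0/1 outcome."""
    return (outcome - p) ** 2


def mean_forecast(probs):
    """Plain average of a list of probabilities."""
    return sum(probs) / len(probs)


def compare(probs, outcome):
    """Return (score of the averaged forecast, average of individual scores)."""
    pooled = brier(mean_forecast(probs), outcome)
    individual = sum(brier(p, outcome) for p in probs) / len(probs)
    return pooled, individual


# Two hypothetical forecasters, p = 0.6 and q = 0.9, for an event that happened.
pooled, individual = compare([0.6, 0.9], outcome=1)
gap = individual - pooled  # equals (p - q)^2 / 4 for two forecasts
print(pooled, individual, gap)
```

The same comparison holds for outcome = 0, since (0-p)² is also convex in p.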

6 of 17

Median

Unlike mean, robust to outliers

Also independent of scale: the median of the logs is the log of the median, so you get the same answer on a linear or log scale

Disadvantage: uses data less efficiently (only cares about middle values)
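A small sketch of both properties, using made-up answers with one wild outlier (the numbers are hypothetical, not class data). It aggregates once on the linear scale and once on the log scale, then maps the log-scale result back.

```python
import math
import statistics

# Hypothetical answers to a Fermi question, with one wild outlier.
answers = [5_000, 8_000, 10_000, 12_000, 1_000_000]

mean_linear = statistics.mean(answers)
median_linear = statistics.median(answers)

# Aggregate on a log scale, then map back to the original scale.
logs = [math.log(a) for a in answers]
mean_via_log = math.exp(statistics.mean(logs))      # this is the geometric mean
median_via_log = math.exp(statistics.median(logs))  # same middle answer as before

print(mean_linear, mean_via_log)      # the two means disagree substantially
print(median_linear, median_via_log)  # the two medians agree up to rounding
```

One caveat: with an even number of answers, `statistics.median` averages the two middle values, so the linear and log medians can differ slightly.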

7 of 17

Trimmed Mean

  • Remove top and bottom x% of data, then take mean of the remainder
    • Or remove the x% of data that is furthest from mean
  • Like median, robust to outliers
  • Like mean, makes use of most of data
  • For probabilities, implicitly “extremizes”:
    • Suppose 95% of class give p = 0.99, and 5% give p = 0.5
    • For x = 5%, the trimmed mean is 0.99, while the mean is 0.9655
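The example above can be reproduced with a few lines of code. This is a minimal sketch that assumes a class of 100 students and trims the same percentage from each end; `trimmed_mean` is an illustrative helper, not a library function.

```python
def trimmed_mean(values, trim_percent):
    """Drop the lowest and highest trim_percent% of values, then average the rest."""
    ordered = sorted(values)
    k = len(ordered) * trim_percent // 100  # number of values dropped from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trim_percent removes every value")
    return sum(kept) / len(kept)


# Assumed class of 100: 95 students say 0.99 and 5 students say 0.5.
forecasts = [0.99] * 95 + [0.5] * 5

plain_mean = sum(forecasts) / len(forecasts)
trimmed = trimmed_mean(forecasts, 5)
print(round(plain_mean, 4), round(trimmed, 4))
```

Trimming 5% from each end removes all five 0.5 forecasts (and five of the 0.99 forecasts), which is why the result moves toward the extreme value.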

8 of 17

Single question:

  • Mean: 32nd percentile
  • Median, trimmed mean: 38th percentile

Aggregate of 5 questions:

  • Mean: 8th percentile
  • Median: 14th percentile
  • Trimmed mean: 20th percentile

9 of 17

Weighted Mean

Exercise:

  • What is the atomic number of cadmium?
  • How confident are you in your answer (1-10)?

Trimmed mean is a special case of weighted mean, where we assign 0 weight to answers that are far from the rest.

  • Implicit reasoning: “answers far from rest are probably wrong”
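A sketch of the idea, using invented (answer, confidence) pairs rather than real class responses. It computes a confidence-weighted mean, then expresses trimming as 0/1 weights that drop the single lowest and highest answer.

```python
def weighted_mean(values, weights):
    """Average of values, each counted in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical answers and self-reported confidence (1-10).
answers = [48, 48, 47, 50, 30]
confidence = [9, 7, 4, 3, 1]

by_confidence = weighted_mean(answers, confidence)

# Trimming as 0/1 weights: zero out the lowest and the highest answer.
order = sorted(range(len(answers)), key=lambda i: answers[i])
trim_weights = [1] * len(answers)
trim_weights[order[0]] = 0
trim_weights[order[-1]] = 0
trimmed = weighted_mean(answers, trim_weights)

print(by_confidence, trimmed)
```

Using raw confidence as the weight is only one possible choice; any non-negative weighting fits the same function.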

10 of 17

Implications for Your Own Forecasting

  • Think of a number a few different times and take the average
    • E.g. I often waffle back and forth on what number to go with; sometimes best to just take the average of numbers you’ve considered and call it a day
  • When deciding what to believe, weigh various sources by how much you trust them and take a weighted average
  • Work in teams and take average across team
    • Related idea: the “Delphi method” (later in lecture)

11 of 17

Weighing Experts

We are changing our call for the February FOMC meeting from a 50 [basis point] hike to a 25bp hike, although we think markets should continue to place some probability on a larger-sized hike. (source, Jan 18)

Shared by an economist at Citigroup, the 3rd largest banking institution in the US.

Pricing Wednesday morning pointed to a 94.3% probability of a 0.25 percentage point hike at the central bank’s two-day meeting that concludes Feb. 1, according to CME Group data. (source, Jan 18)

The CME Group is the world's largest financial derivatives exchange. The CME FedWatch Tool uses futures pricing data (the 30-Day Fed Funds futures pricing data) to analyze the probabilities of changes to the Fed rate.

Markets expect the Fed to raise rates again on February 1, 2023, probably by 0.25 percentage points…. However, there’s a reasonable chance the Fed opts for a larger 0.5 percentage point hike. (source, Jan 2)

Simon Moore is a writer at Forbes. He provides an outsourced Chief Investment Officer service to institutional investors. He previously served as Chief Investment Officer at Moola and FutureAdvisor, both consumer investment startups that were subsequently acquired by S&P 500 firms. He has published two books, is a CFA Charterholder, and was educated at Oxford and Northwestern.

12 of 17

How do we choose the weights?

  • For experts: look at past track record
    • Improvement: track what type of questions they are good at
  • Mathematically: if estimates are unbiased and independent, and estimate i has standard deviation σᵢ, then weight by 1/σᵢ²
    • If not independent, downweight estimates that are more correlated with others
    • Hard to literally use in practice, but good conceptual motivation
    • Special case: finite sample error [standard deviation roughly 1/sqrt(k) for k samples, so weight grows in proportion to k]
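The 1/σᵢ² rule can be sketched as follows, under the stated assumptions that the estimates are unbiased and independent and that each σᵢ is known. The function name and the numbers are illustrative.

```python
import math


def inverse_variance_combine(estimates, stds):
    """Combine unbiased, independent estimates with weights 1/sigma_i^2.

    Returns the combined estimate and its standard deviation.
    """
    weights = [1.0 / s ** 2 for s in stds]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total
    combined_std = math.sqrt(1.0 / total)
    return combined, combined_std


# Hypothetical estimates: the second source is twice as noisy as the first,
# so it gets one quarter of the weight.
value, std = inverse_variance_combine([10.0, 16.0], [1.0, 2.0])
print(value, std)

# Finite-sample special case: k equally noisy samples give std / sqrt(k).
_, std_of_four = inverse_variance_combine([1.0, 2.0, 3.0, 4.0], [2.0] * 4)
print(std_of_four)
```

In practice the σᵢ are rarely known, which is why the slide treats this as conceptual motivation rather than a recipe.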

13 of 17

Working in Teams: The Delphi Method

Delphi method:

  • Forecasters individually come up with predictions and reasoning
  • Then provide predictions + reasoning to group
  • Individuals update based on group forecast [potentially multiple rounds]
  • At end, take average of all of the final individual forecasts

Variants:

  • Predictions + reasoning provided anonymously
  • Only reasoning given (not numerical predictions)

Question. Why come up with numbers individually (rather than working collaboratively the whole time)?

14 of 17

Ensembling with Yourself

What was the total annual budget of the US government in FY2022?

Come up with at least 3 distinct approaches to Fermi estimate this.

Then, decide how to combine the estimates together.

15 of 17

Combining Confidence Intervals

What if instead of point estimates, we have 80% confidence intervals?

  • [a₁, b₁], [a₂, b₂], … (aᵢ = lower end, bᵢ = upper end)
  • Simplest approach: take trimmed mean of upper/lower ends
  • Alternatives:
    • Mixture of distributions
      • Variance of mixture = mean of the component variances + variance of the component means
      • Implies widening width if means disagree
    • Treat as independent “measurements”
      • Implies narrowing width of intervals; need to be careful to avoid overconfidence
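The two alternatives can be compared in a short sketch. It assumes each 80% interval is read as a normal distribution centered on the interval midpoint, and it reports a normal approximation to the mixture (a true mixture is not normal). All names and the example intervals are hypothetical.

```python
import math

Z80 = 1.2816  # standard normal quantile bounding a central 80% interval


def to_mean_std(low, high):
    """Read an 80% interval as a normal distribution (an assumption)."""
    return (low + high) / 2, (high - low) / (2 * Z80)


def mixture(intervals):
    """Equal-weight mixture: variance = mean of variances + variance of means."""
    params = [to_mean_std(a, b) for a, b in intervals]
    means = [m for m, _ in params]
    mu = sum(means) / len(means)
    mean_of_vars = sum(s ** 2 for _, s in params) / len(params)
    var_of_means = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, math.sqrt(mean_of_vars + var_of_means)


def independent_measurements(intervals):
    """Inverse-variance pooling: the combined std shrinks as intervals are added."""
    params = [to_mean_std(a, b) for a, b in intervals]
    precision = sum(1 / s ** 2 for _, s in params)
    mu = sum(m / s ** 2 for m, s in params) / precision
    return mu, math.sqrt(1 / precision)


def to_interval(mu, std):
    """Convert a mean and std back into an approximate 80% interval."""
    return mu - Z80 * std, mu + Z80 * std


intervals = [(10, 20), (30, 40)]  # two hypothetical forecasters who disagree
wide = to_interval(*mixture(intervals))
narrow = to_interval(*independent_measurements(intervals))
print(wide, narrow)
```

With disagreeing forecasters, the mixture interval comes out wider than either input and the independent-measurement interval comes out narrower, which illustrates the overconfidence risk noted above.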

16 of 17

Combining Sums

What if we are predicting X + Y, and have confidence intervals for X and Y?

  • If expect errors to be independent, then std(X+Y) = sqrt(std(X)^2 + std(Y)^2)
  • If errors are perfectly correlated, then std(X+Y) = std(X) + std(Y)

For 70%/80% CI, treating the interval width as proportional to the stdev is usually a decent approximation

For extreme tails (99% CI), can be more complicated.

  • If X and Y are heavy-tailed, tail event comes from one of X or Y individually
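The two formulas for std(X+Y) are the endpoints of one general expression, std(X+Y)² = std(X)² + std(Y)² + 2·ρ·std(X)·std(Y), where ρ is the correlation between X and Y. The sketch below uses that expression (ρ as a parameter is an addition for illustration; the slide only covers ρ = 0 and ρ = 1).

```python
import math


def std_of_sum(std_x, std_y, correlation=0.0):
    """Standard deviation of X + Y given the correlation between X and Y."""
    variance = std_x ** 2 + std_y ** 2 + 2 * correlation * std_x * std_y
    return math.sqrt(variance)


independent = std_of_sum(3.0, 4.0, correlation=0.0)  # sqrt(3^2 + 4^2)
correlated = std_of_sum(3.0, 4.0, correlation=1.0)   # 3 + 4
print(independent, correlated)
```

Partial correlation gives a value in between, so the two cases on the slide bracket the answer whenever errors are non-negatively correlated.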

17 of 17

Summary

  • Averaging multiple approaches or experts often improves forecasts
  • Assess track record and accuracy of sources to determine weights
  • Consider working in teams and generating independent numbers
  • Combining confidence intervals: several ideas, no silver bullet (yet)