1 of 15

PREDICTING THE WINNER OF THE GREAT BRITISH BAKING SHOW

CHRIS DONNAY & EMILY GRIFFITH

2 of 15

The Great British Baking Show

  • British Baking Competition
  • 11 Seasons
  • 10-13 contestants per season
  • 3 bakes/challenges per episode
    • Signature
    • Technical
    • Showstopper

3 of 15

Our Data

4 of 15

Our Features

  • Age
  • Episodes won
  • Technical challenge winner
  • Technical challenge rank
  • Technical challenge top 3
  • Won bread week
  • Won biscuit week
  • Won cake week
  • Handshake from Paul
  • Chocolate in bake title

5 of 15

Regression or Classification?

Regression

  1. Regress on the rank.
  2. Would help explain the key features in winning the show.

Classification

  1. Classify on whether or not you make the top 3.
  2. Would help in high-stakes GBBS betting pools.
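Turning the final rank into the classification target is a simple threshold. A minimal sketch, assuming a hypothetical `final_rank` array of each baker's finishing position (1 = winner); the values are illustrative, not real season data:

```python
import numpy as np

# Hypothetical finishing positions for one 12-baker season (1 = winner); illustrative only.
final_rank = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Binary classification target: 1 if the baker finished in the top 3, else 0.
made_top3 = (final_rank <= 3).astype(int)
print(made_top3)  # [1 1 1 0 0 0 0 0 0 0 0 0]
```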

6 of 15

What happened with regression?

Explained variance: ~0.73

Probably not enough data for regression to be effective, and few clear linear relationships between the features and the rank.

Predictions can't be turned into unique integer ranks, and no one is ever predicted to rank 1.
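The regression attempt can be sketched as below. This is a sketch under stated assumptions: a plain `LinearRegression`, 5-fold out-of-fold predictions, and scikit-learn's `explained_variance_score` as the metric (the slides do not say which model or metric implementation was used). The data are synthetic stand-ins, not the real contestant table, so the score will not match ~0.73.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in: 120 bakers, 10 features, final rank 1-12 loosely
# driven by the first two features. NOT the real data.
X = rng.normal(size=(120, 10))
noisy_rank = 6.5 - 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=1.5, size=120)
rank = np.clip(np.round(noisy_rank), 1, 12)

# Out-of-fold predictions from 5-fold cross-validation.
pred = cross_val_predict(LinearRegression(), X, rank, cv=5)
ev = explained_variance_score(rank, pred)
print(f"explained variance: {ev:.2f}")

# The model outputs real numbers, so they do not form a unique 1..12 ordering.
print(pred[:5].round(2))
```

The last line shows the problem from the slide: the predictions are continuous values, not one distinct integer rank per baker.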

8 of 15

KNN

Ran 5-fold cross-validation; k = 3 neighbors gave the highest average accuracy.
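The neighbor search can be sketched with scikit-learn. Assumptions: the data are synthetic (`make_classification`, one column per feature, roughly 1 in 4 bakers in the top 3), the candidate range 1-15 is hypothetical, and features are standardized first because KNN is distance-based (the slides do not say whether scaling was used).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 10 features and a binary "made the top 3" label
# with roughly a 1-in-4 positive rate. NOT the real data.
X, y = make_classification(n_samples=120, n_features=10, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)

# Mean 5-fold cross-validated accuracy for each candidate neighbor count.
# Standardizing inside the pipeline keeps scaling fitted on training folds only.
mean_acc = {}
for k in range(1, 16):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    mean_acc[k] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

best_k = max(mean_acc, key=mean_acc.get)
print(f"best k = {best_k}, mean accuracy = {mean_acc[best_k]:.2f}")
```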

9 of 15

Random Forest

Ran 5-fold cross-validation; a max depth of 3 gave the highest accuracy, recall, and precision.
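A depth search that tracks all three metrics at once might look like this. Assumptions: synthetic data, a hypothetical grid of depths, 100 trees, and the final depth chosen by mean accuracy (`refit="accuracy"`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the contestant table. NOT the real data.
X, y = make_classification(n_samples=120, n_features=10, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)

# Score every candidate depth on accuracy, precision and recall over 5 folds;
# the refit model is chosen by mean accuracy.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5, None]},
    scoring=["accuracy", "precision", "recall"],
    refit="accuracy",
    cv=5,
)
search.fit(X, y)

res = search.cv_results_
for i, depth in enumerate(res["param_max_depth"]):
    print(f"max_depth={depth}: "
          f"accuracy={res['mean_test_accuracy'][i]:.2f}, "
          f"precision={res['mean_test_precision'][i]:.2f}, "
          f"recall={res['mean_test_recall'][i]:.2f}")
print("chosen max_depth:", search.best_params_["max_depth"])
```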

Its feature importances show that the technical-challenge features and star baker (episodes won) matter most.
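Reading the importances off a fitted forest is short. The column names below are hypothetical snake_case versions of the feature list, the data are synthetic, and this uses scikit-learn's impurity-based `feature_importances_` (the slides do not say which importance measure was plotted), so the ranking printed here says nothing about the real show.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names mirroring the feature slide.
feature_names = ["age", "episodes_won", "technical_winner", "technical_rank",
                 "technical_top3", "won_bread_week", "won_biscuit_week",
                 "won_cake_week", "paul_handshake", "chocolate_in_title"]

# Synthetic stand-in data. NOT the real data.
X, y = make_classification(n_samples=120, n_features=len(feature_names),
                           n_informative=5, weights=[0.75, 0.25], random_state=0)

rf = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; rank features from most to least important.
ranked = sorted(zip(feature_names, rf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:20s} {score:.3f}")
```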

10 of 15

What features are important?

11 of 15

Naive Bayes

It had much lower accuracy than either KNN or RF, so we eventually dropped it from the model.
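A like-for-like comparison scores all three models on the same folds. Assumptions: Gaussian Naive Bayes (the slides do not name the variant), the k = 3 and max depth 3 settings from the previous slides, shuffled stratified folds, and synthetic data, so the numbers will not reproduce the deck's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the contestant table. NOT the real data.
X, y = make_classification(n_samples=120, n_features=10, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)

# Same shuffled, stratified folds for every model.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "Naive Bayes": GaussianNB(),
    "KNN (k=3)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    "Random Forest (max_depth=3)": RandomForestClassifier(max_depth=3, random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=folds).mean()
          for name, model in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```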

12 of 15

Voting

Using KNN, RF, and NB as voters in an ensemble, we get these accuracy scores:

  • Random Forest: 0.96
  • Naive Bayes: 0.75
  • KNN: 0.83
  • Voting: 0.91
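The ensemble can be sketched with scikit-learn's `VotingClassifier`. Assumptions: hard (majority) voting, a 75/25 stratified train/test split, Gaussian NB, and synthetic data; the slides do not say which voting rule or evaluation split was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the contestant table. NOT the real data.
X, y = make_classification(n_samples=120, n_features=10, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hard voting: each model casts one vote for a label and the majority wins.
voter = VotingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))),
        ("rf", RandomForestClassifier(max_depth=3, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

# Held-out accuracy of each fitted member and of the ensemble itself.
member_acc = {name: accuracy_score(y_test, est.predict(X_test))
              for name, est in voter.named_estimators_.items()}
voting_acc = accuracy_score(y_test, voter.predict(X_test))
print(member_acc, f"voting: {voting_acc:.2f}")
```

With three voters and two classes a hard vote can never tie, which is one reason three members is a convenient ensemble size.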

13 of 15

Nothing outperformed the RF!

Let’s test it on the newest season! No spoilers :)

14 of 15

Random Forest Predicted Top 3 by Episode

Correctly identifies 1 of the 3 finalists
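The "1 of 3" score can be computed by taking the three bakers with the highest predicted probability of reaching the top 3 and counting how many actually made the final. A minimal sketch; the helper names, baker names, and probabilities are hypothetical placeholders (in practice the probabilities would come from something like `rf.predict_proba(X)[:, 1]`, recomputed after each episode):

```python
def predicted_top3(probabilities):
    """Return the three bakers with the highest predicted P(top 3)."""
    return sorted(probabilities, key=probabilities.get, reverse=True)[:3]


def finalists_found(probabilities, actual_finalists):
    """Count how many predicted top-3 bakers are actual finalists."""
    return len(set(predicted_top3(probabilities)) & set(actual_finalists))


# Hypothetical probabilities after one episode; not real model output.
episode_probs = {"Baker A": 0.62, "Baker B": 0.41, "Baker C": 0.35,
                 "Baker D": 0.20, "Baker E": 0.12}
actual = ["Baker A", "Baker D", "Baker E"]

print(predicted_top3(episode_probs))           # ['Baker A', 'Baker B', 'Baker C']
print(finalists_found(episode_probs, actual))  # 1
```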

15 of 15

Random Forest Predicted Top 3 Probability by Episode