Finding Expected + Unexpected Goals With R
Namita Nandakumar
@nnstats
What is hockey?
Why does hockey matter?
Some ways to evaluate hockey teams + players:
These have the nice statistical property of being one of the most granular things we can observe, but people complain that not all shots are equally dangerous...
And they’re right! So, what other features can we account for?
Let’s take a look.
A closer look...
What can the play-by-play data tell us?
What is the play-by-play data hiding?
This is where I might’ve labeled the shot.
5v5 unblocked shots have scored ~6% of the time this season… but I think TK’s shot was better than that.
Couch to 5K xG
FACTS.
OPINION.
xG Researchers, Past + Present
Data Source: NHL via @MoneyPuckdotcom
Some stuff he did for us that I did not want to do:
This is all the code you need to get started!
Fun w/ Heat Maps
Unblocked Shots
Goals
ggplot(aes(arena_adjusted_x_cord_abs, arena_adjusted_y_cord)) +
  gg_rink() +
  stat_density_2d(aes(fill = ..level..), geom = 'polygon', alpha = 0.8) +
  scale_fill_viridis(option = 'D') ...
Lazy Logistic Regression xG
glm(goal ~ poly(arena_adjusted_shot_distance, 2),
    data = shots, family = 'binomial')
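To try the call above without the real data, here is a minimal sketch on simulated shots. The simulated distances, the "true" goal probabilities, and the names `new_shots` and `x_goal_logistic` are assumptions made up for illustration; only the model formula comes from the slide.

```r
set.seed(1)

# Simulated stand-in for the shot data (NOT real NHL data):
# one row per shot, with a distance in feet and a 0/1 goal flag.
n <- 5000
shots <- data.frame(arena_adjusted_shot_distance = runif(n, 5, 70))
true_p <- plogis(-1 - 0.05 * shots$arena_adjusted_shot_distance)
shots$goal <- rbinom(n, 1, true_p)

# Same model as on the slide: goal probability as a quadratic in distance.
fit <- glm(goal ~ poly(arena_adjusted_shot_distance, 2),
           data = shots, family = 'binomial')

# Predicted goal probability (xG) for a close shot and a far shot.
new_shots <- data.frame(arena_adjusted_shot_distance = c(10, 50))
x_goal_logistic <- predict(fit, newdata = new_shots, type = 'response')
x_goal_logistic
```

`type = 'response'` returns probabilities rather than log-odds, which is what you want to read as xG.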
Slightly Less Lazy xgboost xG
Google image results for “machine learning”
Which variables am I not including?
Score State.
Shooters + Goalies.
Some unsurprising xgboost variable importance #s.
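A rough sketch of fitting an xgboost xG model and reading off variable importance. The data are simulated, and the feature names (`shot_distance`, `shot_angle`, `noise`) and tuning values are placeholders chosen for illustration, not the features or settings used in the talk.

```r
library(xgboost)
set.seed(1)

# Simulated features (assumed for illustration): two that drive goals, one that does not.
n <- 5000
features <- cbind(
  shot_distance = runif(n, 5, 70),
  shot_angle    = runif(n, 0, 80),
  noise         = rnorm(n)
)
goal <- rbinom(n, 1, plogis(-1 - 0.05 * features[, 'shot_distance'] -
                              0.01 * features[, 'shot_angle']))

# xgboost wants a numeric matrix plus a 0/1 label.
dtrain <- xgb.DMatrix(data = features, label = goal)

fit <- xgb.train(
  params = list(objective = 'binary:logistic',  # output is a probability
                eval_metric = 'logloss',
                max_depth = 3,
                eta = 0.1),
  data = dtrain,
  nrounds = 100
)

# In-sample predicted goal probabilities and per-feature importance.
x_goal_xgboost <- predict(fit, dtrain)
importance <- xgb.importance(model = fit)
importance
```

These predictions are in-sample; the CV results in the comparison table would need out-of-fold predictions instead (for example via `xgb.cv()`).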
I love using geom_smooth() to glance at calibration.
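One way the geom_smooth() calibration glance might look: plot the 0/1 outcome against the predicted probability and compare the smoothed curve to the diagonal. The simulated data and the column name `x_goal` are assumptions for illustration.

```r
library(ggplot2)
set.seed(1)

# Simulated, perfectly calibrated predictions (assumed, not real model output):
# each shot scores with exactly its predicted probability.
n <- 5000
shots <- data.frame(x_goal = runif(n, 0.01, 0.4))
shots$goal <- rbinom(n, 1, shots$x_goal)

calibration_plot <- ggplot(shots, aes(x_goal, goal)) +
  geom_smooth() +                                   # smoothed observed goal rate
  geom_abline(slope = 1, intercept = 0,
              linetype = 'dashed') +                # perfect calibration
  labs(x = 'Predicted goal probability', y = 'Observed goal rate')

# print(calibration_plot)  # uncomment to draw the plot
```

A well-calibrated model hugs the dashed line; a curve below it means the model is overconfident in that range.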
More Model Comparisons
xG Model | All Shots Are The Same | Shot Distance Logistic | xgboost (CV results) | MoneyPuck
Log Loss | 0.220 | 0.202 | 0.193 | 0.192
AUC | 0.500 | 0.728 | 0.769 | 0.771
TK Goal Probability | 5.7% | 9.9% | 11.8% | 14.6%
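For reference, a base-R sketch of the two metrics in the table. The helper names `log_loss()` and `auc()` and the toy data (57 goals on 1,000 shots) are assumptions for illustration, not the code behind the numbers above.

```r
# Log loss: average negative log-likelihood of the 0/1 outcomes. Lower is better.
log_loss <- function(goal, p) {
  p <- pmin(pmax(p, 1e-15), 1 - 1e-15)  # avoid log(0)
  -mean(goal * log(p) + (1 - goal) * log(1 - p))
}

# AUC via the rank-sum formula: the chance that a random goal gets a higher
# prediction than a random non-goal (ties count as half).
auc <- function(goal, p) {
  r <- rank(p)  # tied predictions get their average rank
  n_pos <- sum(goal == 1)
  n_neg <- sum(goal == 0)
  (sum(r[goal == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy "all shots are the same" model: every shot gets the overall goal rate.
goal <- c(rep(1, 57), rep(0, 943))
constant_p <- rep(mean(goal), length(goal))

constant_log_loss <- log_loss(goal, constant_p)
constant_auc <- auc(goal, constant_p)
```

A constant prediction cannot rank shots at all, which is why its AUC is 0.5 no matter what the goal rate is.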
Unexpected Goals: Shooters
shots %>%
  group_by(shooter_id, shooter_name) %>%
  summarize(exp_shooting_pct = mean(x_goal_xgboost),
            actual_shooting_pct = mean(goal),
            diff = actual_shooting_pct - exp_shooting_pct,
            shot_count = n()) %>%
  filter(shot_count > 300) %>%
  arrange(-diff) %>%
  View()
Thank you!!
+ thanks to @MoneyPuckdotcom, @mannyelk, @IneffectiveMath, @EvolvingWild, and all the other researchers I’ve read + cited. You can find me @nnstats.
Appendix: Goalies