Going Deep:
Models for Continuous-Time Within-Play Valuation of Game Outcomes in American Football with Tracking Data
Ronald Yurko1, Francesca Matano1, Lee F. Richardson1, Nicholas Granered2,
Taylor Pospisil1, Konstantinos Pelechrinis3, Samuel L. Ventura1
September 28, 2019
1Department of Statistics & Data Science, Carnegie Mellon University
2Department of Statistics, University of Pittsburgh
3School of Computing and Information, University of Pittsburgh
NESSIS 2017:
“Recent work in football analytics is not easily reproducible”
NESSIS 2019???
Thanks Max!
Play-by-play evaluation with nflscrapR
Expected points (EP): how many points have teams scored in similar situations?
Win probability (WP): have teams in similar situations won the game?
Between play value: expected points added (EPA) / win probability added (WPA)
What about continuous, within-play value?
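A minimal Python sketch of the between-play quantities, assuming the nflscrapR EP formulation described in Yurko et al. (2019): EP is the probability-weighted average of the point values of the seven possible next-score types, and EPA for a non-scoring play with no change of possession is the change in EP from the start to the end of the play. The function names and dictionary keys are hypothetical; WPA is computed the same way from win probabilities.

```python
# Point values of the seven next-score types in the nflscrapR EP model,
# from the perspective of the team with possession.
NEXT_SCORE_VALUES = {
    "touchdown": 7,
    "field_goal": 3,
    "safety": 2,
    "no_score": 0,
    "opp_safety": -2,
    "opp_field_goal": -3,
    "opp_touchdown": -7,
}


def expected_points(probs: dict[str, float]) -> float:
    """EP = sum over next-score types of P(type | situation) * value(type)."""
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("next-score probabilities must sum to 1")
    return sum(NEXT_SCORE_VALUES[kind] * p for kind, p in probs.items())


def expected_points_added(ep_start: float, ep_end: float) -> float:
    """EPA for a non-scoring play with no change of possession:
    EP at the end of the play minus EP at the start."""
    return ep_end - ep_start
```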
Continuous-time valuation with player-tracking data
Cervone et al. (2014, 2016): two-level Markov chain approach
Soccer extensions: Link et al. (2016), Fernandez et al. (2019)
Enter the Big Data Bowl...
NFL collects tracking data at 10 Hz with RFID chips in players’ shoulder pads and in the ball
December 2018: NFL (Mike Lopez) released data from weeks 1–6 of the 2017 season
Competition entries focus on receivers:
And more great entries! https://operations.nfl.com/the-game/big-data-bowl/
What does the data look like?
On-field (x, y) location, speed, and angle for each player (and the ball) are recorded at a rate of ten frames per second: 1,075,720 unique frames across 14,167 plays
NFL provides event annotations within plays (e.g., handoff, first contact)
Example play: Cordarrelle Patterson’s 47-yard jet sweep TD run
Patterson’s 47-yard jet sweep TD run
Continuous-time play value framework
GOAL: For each play, model the end-of-play yard line
General framework for any play?
[Figure: flowchart of the general framework, running from Start to End]
Play types: Run, Pass (dropback)
Models: ball-carrier model, QB decision model, global catch probability model, target probability model, individual catch probability model
Model outcomes: scramble or sack, throw away, catch (includes INT), no catch
Ball-carrier model
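We model the yards gained from the ball-carrier’s current position on the field at time $t$, $\mathbb{E}[Y_{\text{end}} - \ell_t \mid \mathbf{X}_t]$, where $Y_{\text{end}}$ is the end-of-play yard line and $\mathbf{X}_t$ is the tracking information at time $t$
Use the fact that $Y_{\text{end}} = \ell_t + (Y_{\text{end}} - \ell_t)$, where $\ell_t$ is the player’s current yard line
Then, by linearity of expectation (with $\ell_t$ known at time $t$),
$$\mathbb{E}[Y_{\text{end}} \mid \mathbf{X}_t] = \ell_t + \mathbb{E}[Y_{\text{end}} - \ell_t \mid \mathbf{X}_t]$$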
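The target construction and the linearity step can be sketched in Python. Writing $Y_{\text{end}}$ for the end-of-play yard line and $\ell_t$ for the ball-carrier's yard line at frame $t$, the training target at each frame is the remaining gain $Y_{\text{end}} - \ell_t$, and a model's predicted gain is shifted back by $\ell_t$. This assumes a hypothetical minimal schema (`Frame`, with yard lines measured in the offense's direction of travel) rather than the actual Big Data Bowl columns.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical minimal record for the ball-carrier in one tracking frame."""
    frame_id: int
    yard_line: float  # ball-carrier's yard line l_t, measured in the offense's direction


def remaining_gain_targets(frames: list[Frame], end_yard_line: float) -> list[float]:
    """Training target at each frame t: yards still to be gained, Y_end - l_t."""
    return [end_yard_line - frame.yard_line for frame in frames]


def predicted_end_yard_line(current_yard_line: float, predicted_gain: float) -> float:
    """Linearity of expectation: E[Y_end | X_t] = l_t + E[Y_end - l_t | X_t],
    since l_t is known at time t."""
    return current_yard_line + predicted_gain
```

Modeling the remaining gain lets the model spend its capacity on what the tracking features can change; the known part, $\ell_t$, is added back exactly.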
Ball-carrier model features
Ball-carrier model choices and qualities
| | Intercept-only | LASSO | XGBoost | FNN | LSTM |
|---|---|---|---|---|---|
| | | ✔️ | ✔️ | ✔️ | ✔️ |
| | | | ✔️ | ✔️ | ✔️ |
| | | | ✔️ | ✔️ | ✔️ |
| | | | | | ✔️ |
Gradient-boosted trees (XGBoost library)
Feedforward neural network (FNN) and long short-term memory (LSTM) network
Model validation
Leave-one-week-out (LOWO) cross-validation
Criteria for hold-out predictions:
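A minimal sketch of leave-one-week-out cross-validation, assuming each row (e.g., one frame) carries the week of its game. Holding out whole weeks keeps every frame of a play, and every play of a game, on the same side of the split, which avoids leaking near-duplicate frames between training and hold-out data. The `fit`/`predict` callables and the intercept-only baseline are illustrative stand-ins for any of the models above, and RMSE is used as the hold-out criterion.

```python
import math
from collections import defaultdict


def lowo_cv_rmse(weeks, X, y, fit, predict):
    """Leave-one-week-out CV: for each week, fit on all other weeks and
    score predictions on the held-out week. Returns (per-week RMSE, pooled RMSE)."""
    rows_by_week = defaultdict(list)
    for i, week in enumerate(weeks):
        rows_by_week[week].append(i)

    per_week_rmse = {}
    all_sq_errors = []
    for held_out in sorted(rows_by_week):
        test_rows = rows_by_week[held_out]
        train_rows = [i for i, week in enumerate(weeks) if week != held_out]
        model = fit([X[i] for i in train_rows], [y[i] for i in train_rows])
        preds = predict(model, [X[i] for i in test_rows])
        sq_errors = [(y[i] - p) ** 2 for i, p in zip(test_rows, preds)]
        per_week_rmse[held_out] = math.sqrt(sum(sq_errors) / len(sq_errors))
        all_sq_errors.extend(sq_errors)
    return per_week_rmse, math.sqrt(sum(all_sq_errors) / len(all_sq_errors))


def fit_intercept_only(X_train, y_train):
    """Baseline: ignore the features and predict the training mean."""
    return sum(y_train) / len(y_train)


def predict_intercept_only(model, X_test):
    """Predict the stored training mean for every hold-out row."""
    return [model] * len(X_test)
```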
LSTM displays best LOWO CV results
[Figure: LOWO CV hold-out errors by model; RMSE values shown: 7.67, 6.24, 5.86, 5.52, 6.11]
LSTM makes the smallest long-term errors
Example play: Patterson’s 47-yard jet sweep TD run
The covariates have a clearly visible impact on the predicted end-of-play yard line
Teams and players can be evaluated relative to expectation at key moments in a play
Compute continuous-time play value
Given our prediction for the end-of-play yard line, we proceed to update:
Generate point estimate for using nflscrapR multinomial logit model
Similarly for with GAM
(for now we use observed change in )
Generate point estimates for both EP and WP using nflscrapR models
Evaluate player movements using EP/WP within a continuous framework
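Once an EP (or WP) point estimate is available at every frame, within-play value can be read off as differences between frames. This sketch assumes a hypothetical list `ep_by_frame` holding one EP estimate per 10 Hz frame; the helper names are illustrative. Because the frame-level changes telescope, they sum to the total change in EP over the play, so credit assigned to individual moments stays consistent with the play-level EPA.

```python
def frame_value_added(ep_by_frame: list[float]) -> list[float]:
    """Change in EP between consecutive frames, a within-play analogue of EPA:
    element k is EP(frame k + 1) - EP(frame k)."""
    return [after - before for before, after in zip(ep_by_frame, ep_by_frame[1:])]


def value_added_between(ep_by_frame: list[float], start: int, end: int) -> float:
    """EP added between two frames of interest (e.g., handoff to first contact)."""
    if not 0 <= start <= end < len(ep_by_frame):
        raise IndexError("frame indices out of range")
    return ep_by_frame[end] - ep_by_frame[start]
```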
A note of caution...
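Plugging the predicted end-of-play yard line into the EP/WP models gives EP or WP evaluated at the expected yard line; this is NOT the expectation of EP or WP!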
Conditional density estimation (CDE) can be used to estimate the density curve for the end-of-play yard line
Pospisil and Lee (2018) developed a flexible methodology for CDE, e.g., RFCDE (shown above)
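Why the caution matters: EP is a nonlinear function of the yard line, so evaluating it at the expected yard line generally differs from averaging it over the distribution of the yard line. The sketch below uses a made-up logistic curve `toy_ep` in place of the nflscrapR model, and a discretized density given as yard lines with probabilities, such as one could obtain from a CDE method; none of these numbers come from the talk.

```python
import math


def toy_ep(yard_line: float) -> float:
    """HYPOTHETICAL nonlinear EP curve (not the nflscrapR model): rises
    steeply as the yard line approaches the opponent's goal line at 100."""
    return -1.0 + 8.0 / (1.0 + math.exp(-(yard_line - 75.0) / 8.0))


def ep_at_expected_yard_line(yard_lines: list[float], probs: list[float]) -> float:
    """Plug-in approach: EP evaluated at E[Y_end]."""
    mean_yard_line = sum(y * p for y, p in zip(yard_lines, probs))
    return toy_ep(mean_yard_line)


def expected_ep(yard_lines: list[float], probs: list[float]) -> float:
    """E[EP(Y_end)] under a discretized conditional density of Y_end."""
    return sum(toy_ep(y) * p for y, p in zip(yard_lines, probs))
```

Under this toy curve, a bimodal density with half its mass at the 40 and half at the goal line gives an expected EP more than a point higher than EP at the expected yard line; the two agree only when the density is a point mass.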
Recap and future work
Thank you Mike Lopez
More data please?
Photo credit: Gregory Matthews
Thank you and join us at #CMSAC19 Nov 1st-2nd!
Francesca Matano
Lee Richardson
Nick Granered
Taylor Pospisil
Kostas Pelechrinis
Sam Ventura
Register at stat.cmu.edu/cmsac/
References I
Yurko, R., Ventura, S. & Horowitz, M. (2019). nflWAR: a reproducible method for offensive player evaluation in football. Journal of Quantitative Analysis in Sports, 15(3), pp. 163–183.
Pospisil, T. & Lee, A. (2018). RFCDE: Random forests for conditional density estimation. URL: https://arxiv.org/abs/1804.05753.
Cervone, D., D’Amour, A., Bornn, L. & Goldsberry, K. (2016). A multiresolution stochastic process model for predicting basketball possession outcomes. Journal of the American Statistical Association, 111, pp. 585–599.
References II
Cervone, D., D’Amour, A., Bornn, L. & Goldsberry, K. (2014). Point-wise: Predicting points and valuing decisions in real time with NBA optical tracking data. MIT Sloan Sports Analytics Conference.
Link, D., Lang, S. & Seidenschwarz, P. (2016). Real time quantification of dangerousity in football using spatiotemporal tracking data. PLoS ONE, 11.
Fernandez, J., Bornn, L. & Cervone, D. (2019). Decomposing the immeasurable sport: A deep learning expected possession value framework for soccer. MIT Sloan Sports Analytics Conference.