
Introduction

Knowing the probability that a shot will be made is essential in basketball. Many factors can affect that probability, such as shot distance, nearest-defender distance, shot type, time left in the quarter, and whether the team is home or away. In this paper, we try to predict the probability of a successful shot for 133 players during the 2014-15 NBA regular season. After fitting classical machine learning models, we applied three additional methods in an attempt to improve accuracy, and compared the results. Because some of our predictors affect each player differently, all three are mixed effects models: a generalized linear mixed effects model (Julia), a mixed effects random forest model (MERF Python package), and a Bayesian hierarchical model (RStan).

Materials and methods

  • Classical Machine Learning Models
    • Logistic Regression, Random Forest Regression, Random Forest Classification, XGBoost
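Every model above outputs a predicted probability that a shot is made, so one common way to compare them (and the hierarchical models below) is accuracy and log loss on held-out shots. The poster does not state which metric was used; this is a minimal plain-Python sketch, and the function names and toy predictions are hypothetical.

```python
import math
from typing import Sequence


def accuracy(probs: Sequence[float], outcomes: Sequence[int], threshold: float = 0.5) -> float:
    """Share of shots whose predicted class (make if prob >= threshold) matches the outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of 0/1 outcomes; lower is better."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical held-out shots (1 = make, 0 = miss) and two models' predictions.
outcomes = [1, 0, 0, 1, 0]
model_a = [0.70, 0.40, 0.30, 0.55, 0.45]
model_b = [0.60, 0.60, 0.20, 0.45, 0.50]

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(probs, outcomes), round(log_loss(probs, outcomes), 3))
```

Log loss rewards well-calibrated probabilities, which matters here because the goal is the probability of a make, not just a make/miss label.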

  • Bayesian Hierarchical Modeling (RStan)

= Vector of prediction coefficients for player j

= Predictor of the ith shot for player j
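As an assumption (the link function, notation, and priors below are a common choice, not necessarily the authors' exact model), a hierarchical logistic regression with player-specific coefficients can be written as:

$$
\begin{aligned}
y_{ij} &\sim \operatorname{Bernoulli}(\pi_{ij}), \\
\operatorname{logit}(\pi_{ij}) &= x_{ij}^{\top}\beta_j, \\
\beta_j &\sim \mathcal{N}(\mu, \Sigma), \qquad j = 1, \dots, 133,
\end{aligned}
$$

where $y_{ij}$ is 1 if the $i$th shot by player $j$ is made, $x_{ij}$ holds that shot's predictors, $\beta_j$ is player $j$'s coefficient vector, and $\mu$ and $\Sigma$ receive their own hyperpriors. Because every $\beta_j$ is drawn from the same population distribution, players with few shots are pulled toward the league-wide $\mu$ (partial pooling).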

  • Generalized Linear Mixed Effects Model (Julia)

𝛽 = A vector of fixed effects coefficients

X = The fixed effects design matrix (the fixed effects covariates for each shot)

u = A vector of random effects (player-specific coefficients)

𝑍 = The random effects design matrix
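Assuming a logit link for the make/miss outcome, these terms combine as $\operatorname{logit}(p) = X\beta + Zu$. For a single shot by player $j$ that is one inner product for the fixed effects plus one for that player's random effects. A minimal plain-Python sketch; `shot_probability` and the numbers are hypothetical, not output from MixedModels.jl.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(a, b))


def shot_probability(x: Sequence[float], beta: Sequence[float],
                     z: Sequence[float], u_j: Sequence[float]) -> float:
    """P(make) for one shot by player j under an assumed logit-link GLMM.

    x: fixed effects covariates for this shot (one row of X)
    beta: fixed effects coefficients shared by all players
    z: random effects covariates for this shot (one row of Z)
    u_j: random effects for the shooter, player j
    """
    eta = dot(x, beta) + dot(z, u_j)  # linear predictor: x'beta + z'u_j
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit


# Hypothetical values: intercept plus shot distance (feet) as fixed effects,
# and a random intercept plus random distance slope for player j.
p = shot_probability(x=[1.0, 10.0], beta=[0.0, -0.05], z=[1.0, 10.0], u_j=[0.2, 0.0])
print(round(p, 3))  # 0.426
```

Setting `u_j` to zeros gives the prediction for an average player, which is how the model separates league-wide effects from player-specific deviations.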

  • Mixed Effects Random Forest (MERF package in Python)
    • Random Forest for Fixed Effects, Linear Random Effect

$$y_j = f(X_j) + Z_j b_j + e_j$$

$f(\cdot)$ = Random forest regression function for the fixed effects

$X_j$ = $(n_j \times p)$ matrix of fixed effects covariates for player $j$, where $n_j$ is the number of shots taken by player $j$ and $p$ is the number of fixed effects covariates

$Z_j$ = $(n_j \times q)$ matrix of random effects covariates for player $j$, where $q$ is the number of random effects covariates

$b_j$ = $(q \times 1)$ vector of random effects coefficients for player $j$, $b_j \sim N(0, \sigma_b^2)$

$e_j$ = $(n_j \times 1)$ vector of errors for player $j$, $e_j \sim N(0, \sigma_e^2)$
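In MERF the forest and the player effects are fit alternately: the forest is trained on $y_j - Z_j b_j$, then each $b_j$ is re-estimated from the residuals $r_j = y_j - f(X_j)$. The sketch below shows only that second step, for the special case of a random intercept ($Z_j$ a column of ones, $q = 1$) with $\sigma_b^2$ and $\sigma_e^2$ held fixed, where the update reduces to $\hat b_j = \frac{n_j \sigma_b^2}{n_j \sigma_b^2 + \sigma_e^2}\,\bar r_j$. The package's own loop also re-estimates the variances each iteration; the function names and numbers here are hypothetical.

```python
from typing import Dict, Sequence


def update_random_intercept(residuals: Sequence[float], sigma2_b: float, sigma2_e: float) -> float:
    """Estimate b_j for one player from residuals r_j = y_j - f(X_j).

    Assumes Z_j is a column of ones (random intercept only, q = 1) and treats
    sigma2_b and sigma2_e as known for this step.
    """
    n_j = len(residuals)
    if n_j == 0:
        return 0.0  # no shots: fall back to the prior mean of b_j
    shrink = n_j * sigma2_b / (n_j * sigma2_b + sigma2_e)  # between 0 and 1
    return shrink * (sum(residuals) / n_j)


def predict(f_value: float, player: str, b: Dict[str, float]) -> float:
    """Combine a fixed effects prediction f(x) with the player's intercept b_j.

    Players not seen during training get b_j = 0, i.e. the forest alone.
    """
    return f_value + b.get(player, 0.0)


# Hypothetical residuals: both players average +0.2 above the forest's predictions.
b = {
    "few_shots": update_random_intercept([0.2] * 4, sigma2_b=0.01, sigma2_e=0.25),
    "many_shots": update_random_intercept([0.2] * 400, sigma2_b=0.01, sigma2_e=0.25),
}
print(b)  # the 4-shot player's intercept is shrunk much harder toward 0
print(predict(0.45, "many_shots", b), predict(0.45, "new_player", b))
```

With identical average residuals, the 4-shot player gets an intercept of about 0.028 while the 400-shot player gets about 0.188: the more shots a player takes, the more the model trusts that player's own deviation from the forest.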

Results

Conclusions

Introducing random effects into shot-classification models proved thought-provoking. All three hierarchical techniques yielded intriguing results, and the MERF model even outperformed the traditional models. Beyond their predictive results, the models are also valuable because the random effects coefficients can be used to understand player strengths and weaknesses.

Potential Benefits

  • Objective measure for the good shot vs. bad shot debate (does our model give the shot a high probability of success?)
    • Could help coaches, players, announcers, etc.
    • Could settle debated cases such as Damian Lillard's shot to beat the OKC Thunder
  • Individual random effects show how specific players are affected by variables such as closest defender distance, clutch time, and garbage time
  • Can help determine the best shots for an individual player based on his qualities
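Reading a player's random effects amounts to adding that player's random slope to the shared fixed slope. A sketch assuming a model with a fixed slope and a per-player random slope for closest defender distance on the logit scale; the player labels, function names, and values are hypothetical, not fitted estimates.

```python
from typing import Dict, List


def total_slopes(fixed_slope: float, random_slopes: Dict[str, float]) -> Dict[str, float]:
    """Each player's effect for one covariate: shared fixed slope + player's random slope."""
    return {player: fixed_slope + u for player, u in random_slopes.items()}


def most_affected(fixed_slope: float, random_slopes: Dict[str, float], k: int = 3) -> List[str]:
    """Players whose total slope is largest in magnitude, i.e. most sensitive to the covariate."""
    totals = total_slopes(fixed_slope, random_slopes)
    return sorted(totals, key=lambda p: abs(totals[p]), reverse=True)[:k]


# Hypothetical logit-scale slopes for closest defender distance (per foot).
fixed = 0.05
random_slopes = {"player_a": 0.03, "player_b": -0.04, "player_c": 0.0}
print(total_slopes(fixed, random_slopes))
print(most_affected(fixed, random_slopes, k=2))  # ['player_a', 'player_c']
```

Here `player_a` gains the most from extra space, while `player_b`'s total slope is close to zero, suggesting defender distance matters little for that player.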

Areas for Improvement

  • Better Data
    • More predictors, player tracking data, more shots
  • Bayesian
    • Group shots both by player and by shot type (2- or 3-pointer), which could allow the use of informative priors
  • Mixed Effects Random Forest
    • Use XGBoost or a different machine learning technique for the fixed effects
    • Use non-linear random effects
    • Note: Both of these rely on the developers improving the package
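One interpretation of grouping shots by player and shot type is to give each (player, shot type) pair its own group. Stan arrays are indexed from 1, so a common preprocessing step is to map each pair to a consecutive integer starting at 1 before passing the data to the model. A sketch with hypothetical player labels; `group_indices` is an illustrative helper, not code from the poster.

```python
from typing import Dict, List, Sequence, Tuple


def group_indices(players: Sequence[str], pts_types: Sequence[int]
                  ) -> Tuple[List[int], Dict[Tuple[str, int], int]]:
    """Map each (player, shot type) pair to a 1-based group id, in order of first appearance."""
    if len(players) != len(pts_types):
        raise ValueError("length mismatch")
    lookup: Dict[Tuple[str, int], int] = {}
    ids: List[int] = []
    for key in zip(players, pts_types):
        if key not in lookup:
            lookup[key] = len(lookup) + 1  # Stan arrays start at 1
        ids.append(lookup[key])
    return ids, lookup


players = ["p1", "p1", "p2", "p1", "p2"]
pts_types = [2, 3, 2, 2, 3]
ids, lookup = group_indices(players, pts_types)
print(ids)  # [1, 2, 3, 1, 4]
```

Separate groups for 2- and 3-pointers would let each player's coefficients differ by shot type, and the shot-type level is where informative priors (for example, lower baseline make rates for 3-pointers) could enter.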

Dan Barlow, James Bury, Spencer Siegel (University of North Carolina at Chapel Hill)

Hierarchical Modeling to Predict Shot Outcome in NBA Games

Acknowledgments

Special thanks to Dr. Richard L. Smith of University of North Carolina at Chapel Hill for providing guidance and support throughout the project.

Data were obtained from Kaggle, https://eightthirtyfour.com/data, and Basketball Reference.

Literature cited

  • MixedModels.jl: A Julia package for fitting (statistical) mixed-effects models. https://github.com/dmbates/MixedModels.jl
  • Stan Development Team. 2018. Stan Modeling Language Users Guide and Reference Manual, Version 2.18.0. http://mc-stan.org
  • R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
  • MERF: A Python package for fitting Mixed-Effects Random Forests. https://github.com/manifoldai/merf/tree/master/merf

Mixed Effects Random Forest (MERF)

Example plot of a random effect (Pts Type: 2- or 3-pointer)

Players in the far-right bin: Steph Curry, Joe Johnson, Brandon Knight, Tim Duncan, Al Jefferson

Generalized Linear Mixed Effects Model