1 of 19

Big Data Analysis Project in

NBA Award Prediction

Mingzhe Hu | Yuting Zhou | Zichen Wang

Dec. 17, 2021

2 of 19

Dataset Design

We built three kinds of datasets:

  • Ordinary datasets with technical stats
  • Packed datasets with profile and news info
  • Differential datasets with player improvement data
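One way to picture a differential dataset row is as the season-over-season change in each technical stat. The sketch below assumes that interpretation; the function name `differential_row`, the `d_` prefix, and the sample numbers are illustrative and not taken from the project.

```python
from typing import Dict

# Technical stat columns named on the BigQuery slide.
STAT_KEYS = ("reb", "ast", "stl", "blk", "tov", "pts")

def differential_row(prev: Dict[str, float], curr: Dict[str, float]) -> Dict[str, float]:
    """Return the per-stat change from the previous season to the current one."""
    return {f"d_{key}": round(curr[key] - prev[key], 3) for key in STAT_KEYS}

# Made-up per-game averages for a single hypothetical player.
prev_season = {"reb": 4.0, "ast": 3.1, "stl": 0.8, "blk": 0.3, "tov": 2.0, "pts": 12.5}
curr_season = {"reb": 5.5, "ast": 5.0, "stl": 1.1, "blk": 0.3, "tov": 2.4, "pts": 19.0}

diff = differential_row(prev_season, curr_season)
print(diff)
```

Rows like this would plausibly feed the MIP model, since that award rewards improvement and not absolute level.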

Novelty

  • Caching in the browser's local storage
  • Login session kept alive for 1 hour
  • Automatic name completion
  • Debounce
  • Streaming data
  • Model stacking
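The one-hour login lifetime can be illustrated with a small expiry check. This is only a sketch of the idea in Python; the actual app keeps its state in the browser, and the class name `SessionStore` and its methods are hypothetical.

```python
import time
from typing import Callable, Dict

SESSION_TTL_SECONDS = 60 * 60  # one hour, as stated on the slide

class SessionStore:
    """Tracks when each user's login expires (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so the expiry logic can be tested
        self._expires_at: Dict[str, float] = {}

    def login(self, user: str) -> None:
        self._expires_at[user] = self._clock() + SESSION_TTL_SECONDS

    def is_logged_in(self, user: str) -> bool:
        expiry = self._expires_at.get(user)
        if expiry is None or self._clock() >= expiry:
            # Drop stale entries so the store does not grow without bound.
            self._expires_at.pop(user, None)
            return False
        return True

# Usage with a fake clock: the session is valid until exactly one hour passes.
now = [0.0]
store = SessionStore(clock=lambda: now[0])
store.login("alice")
now[0] = 3599.0
still_valid = store.is_logged_in("alice")
now[0] = 3600.0
expired = not store.is_logged_in("alice")
```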

3 of 19

BigQuery

Program triggered at 5 a.m. every day

  • id: unique id for player, primary key
  • name: player name, used for display
  • reb, ast, stl, blk, tov, pts: player’s technical data
  • src: player’s headshot URL, used for display
  • teamSrc, team: the player's team icon URL and team name, used for display
  • newsDate, newsIntro, newsTitle, newsUrl: latest news info of the player
  • Wikipedia search frequency (social impact) as a further feature
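The columns listed above can be written down as a typed record. The field names come from the slide; the Python types, the class name `PlayerRow`, and every sample value (including the example.com URLs) are assumptions made for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class PlayerRow:
    id: int          # unique player id, primary key (integer type is assumed)
    name: str        # display name
    reb: float       # technical stats
    ast: float
    stl: float
    blk: float
    tov: float
    pts: float
    src: str         # headshot URL
    teamSrc: str     # team icon URL
    team: str        # team name
    newsDate: str    # latest news item for the player
    newsIntro: str
    newsTitle: str
    newsUrl: str

# Placeholder row; none of these values come from the real table.
row = PlayerRow(
    id=1, name="Player One",
    reb=5.0, ast=4.0, stl=1.0, blk=0.5, tov=2.0, pts=18.0,
    src="https://example.com/headshot.png",
    teamSrc="https://example.com/team.png", team="Team A",
    newsDate="2021-12-17", newsIntro="placeholder intro",
    newsTitle="placeholder title", newsUrl="https://example.com/news",
)
column_names = [f.name for f in fields(PlayerRow)]
```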

4 of 19

Scheduler

Fetch Tech Data

Fetch News Data

Fetch Profile Info

Algorithm

Update RAW Dataset

Result Dataset
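The flow on this slide can be sketched as a daily job that merges the three fetch steps into one raw dataset and then scores each row. This assumes the fetch steps feed the raw dataset and the algorithm produces the result dataset; all function names, the stub data, and the toy scoring rule are hypothetical.

```python
from typing import Callable, Dict

Row = Dict[str, object]
Source = Callable[[], Dict[int, Row]]

def update_raw(*sources: Source) -> Dict[int, Row]:
    """Merge the output of every fetch step into one row per player id."""
    raw: Dict[int, Row] = {}
    for fetch in sources:
        for player_id, columns in fetch().items():
            raw.setdefault(player_id, {"id": player_id}).update(columns)
    return raw

# Stub fetchers standing in for the real stat, news, and profile sources.
def fetch_tech() -> Dict[int, Row]:
    return {1: {"pts": 20.0, "ast": 5.0}}

def fetch_news() -> Dict[int, Row]:
    return {1: {"newsTitle": "placeholder headline"}}

def fetch_profile() -> Dict[int, Row]:
    return {1: {"name": "Player One", "team": "Team A"}}

def run_daily_job(algorithm: Callable[[Row], float]) -> Dict[int, float]:
    """What the scheduler would trigger once a day."""
    raw = update_raw(fetch_tech, fetch_news, fetch_profile)
    return {player_id: algorithm(row) for player_id, row in raw.items()}

raw_dataset = update_raw(fetch_tech, fetch_news, fetch_profile)
# Toy scoring rule, not the project's model.
result_dataset = run_daily_job(lambda row: float(row["pts"]) / 40.0)
```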

5 of 19

Demo

Login interface

6 of 19

Demo

Layout

7 of 19

Demo

Fuzzy Search
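A fuzzy name lookup of this kind can be approximated with Python's standard `difflib` module. The sketch below is not the app's actual search code; the function name `fuzzy_search`, the `cutoff` value, and the short player list are assumptions.

```python
import difflib
from typing import List

PLAYERS = [
    "Stephen Curry",
    "Kevin Durant",
    "Giannis Antetokounmpo",
    "Nikola Jokic",
    "Luka Doncic",
]

def fuzzy_search(query: str, names: List[str], limit: int = 3, cutoff: float = 0.5) -> List[str]:
    """Return up to `limit` names that resemble `query`, best match first."""
    by_lowercase = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(by_lowercase), n=limit, cutoff=cutoff)
    return [by_lowercase[hit] for hit in hits]

print(fuzzy_search("stephen cury", PLAYERS))
print(fuzzy_search("gianis antetokumpo", PLAYERS))
```

Matching on lowercase keys keeps the search case-insensitive while still returning the display spelling.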

8 of 19

Demo

MVP Probability Inquiry

9 of 19

Demo

MIP Probability Inquiry

10 of 19

Demo

DPOY Probability Inquiry

11 of 19

Prediction Methodology

Data Preprocessing

12 of 19

Prediction Methodology

Model Stacking with PySpark in GCP

MLP

Random Forest

Decision Tree

13 of 19

Prediction Methodology

Model Stacking with PySpark in GCP

Naive Bayes

SVM

GBT

Logistic Regression
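The idea of combining several base models can be shown with accuracy-weighted blending of their predicted probabilities. This is a deliberately simplified stand-in: it is not the PySpark pipeline used in the project, and full stacking would train a meta-learner on the base predictions instead of using fixed weights. The function name and all numbers are illustrative.

```python
from typing import Dict, List

def blend_probabilities(
    base_probs: Dict[str, List[float]],
    val_accuracy: Dict[str, float],
) -> List[float]:
    """Average per-player probabilities, weighting each model by its validation accuracy."""
    total = sum(val_accuracy[model] for model in base_probs)
    n_players = len(next(iter(base_probs.values())))
    blended = [0.0] * n_players
    for model, probs in base_probs.items():
        weight = val_accuracy[model] / total
        for i, p in enumerate(probs):
            blended[i] += weight * p
    return blended

# Made-up award probabilities for two players from two base models.
base_probs = {"mlp": [0.9, 0.2], "random_forest": [0.7, 0.4]}
# Made-up validation accuracies, chosen so the weights come out as 0.75 and 0.25.
val_accuracy = {"mlp": 0.9, "random_forest": 0.3}

blended = blend_probabilities(base_probs, val_accuracy)
```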

14 of 19

Prediction Methodology

Model Stacking

Model                   MVP Accuracy   DPOY Accuracy   MIP Accuracy
SVM                     0.982          0.976           0.924
Random Forest           0.982          0.971           0.975
MLP                     0.991          0.971           0.930
Decision Tree           0.979          0.977           0.937
Naive Bayes             0.982          0.977           /
Gradient Boosted Tree   0.974          0.977           0.924
LR                      0.985          0.976           0.974

15 of 19

Correctness Argument

MVP Prediction rationality

Top 2 Google search results for MVP predictions:

the game day & sportsbettingdime

Our results: top-8

Stephen Curry

Kevin Durant

Giannis Antetokounmpo

Nikola Jokic

Luka Doncic

Jimmy Butler

Paul George

Anthony Davis

Luka Doncic

Joel Embiid

Kevin Durant

Giannis Antetokounmpo

Stephen Curry

Overlap rate: 75%
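An overlap rate like the one above can be computed as the share of our top-k names that also appear in the reference list. The slides do not define the denominator, so dividing by the size of our own list is an assumption; the function name and the placeholder names are hypothetical.

```python
from typing import List

def overlap_rate(ours: List[str], reference: List[str]) -> float:
    """Fraction of our predicted names that also appear in the reference list."""
    if not ours:
        return 0.0
    reference_set = {name.lower() for name in reference}
    shared = [name for name in ours if name.lower() in reference_set]
    return len(shared) / len(ours)

# Placeholder names, not actual predictions.
ours = ["Player A", "Player B", "Player C", "Player D"]
reference = ["Player B", "Player C", "Player D", "Player E"]

rate = overlap_rate(ours, reference)
print(f"Overlap rate: {rate:.1%}")
```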

16 of 19

Correctness Argument

MIP Prediction rationality

Top Google search result for MIP predictions: vegasinsider

Our results: top-20

Overlap rate: 70%+

{'Cole Anthony': 1.0, 'Anthony Edwards': 0.9592480992366217, 'Tyler Herro': 0.9267290278229765, 'Dejounte Murray': 0.9229780752106609, 'Paul George': 0.8986608961540634, 'LaMelo Ball': 0.8302039796910764, 'Jordan Poole': 0.8151593177692744, 'Ricky Rubio': 0.7644187240952561, 'Reggie Jackson': 0.7488390443341627, 'Tyrese Maxey': 0.735780055935347, 'Miles Bridges': 0.7224351567178886, 'Ja Morant': 0.716587205282251, 'Darius Garland': 0.6364049009684235, 'Spencer Dinwiddie': 0.6310338456575866, 'Fred VanVleet': 0.627110247793158, 'Anthony Davis': 0.5694527180906511, 'Bobby Portis': 0.5626971148317866, 'Jalen Brunson': 0.5623435853284228, 'Mohamed Bamba': 0.5488378774735984, 'Montrezl Harrell': 0.5478065664617654}

17 of 19

Correctness Argument

DPOY Prediction rationality

Top Google search result for DPOY predictions:

dimers

Our results: top-9

Rudy Gobert +300

Anthony Davis +350

Giannis Antetokounmpo +400

Bam Adebayo +1000

Ben Simmons +1400

Joel Embiid +1400

Myles Turner +1600

Jrue Holiday +2500

Draymond Green +2500

Overlap rate: 55.6%
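The figures such as +300 in the list above are American moneyline odds. To compare them with model probabilities, they can be converted to implied probabilities with the standard formula, as in the sketch below. The function name is hypothetical, and the conversion ignores the bookmaker's margin.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American moneyline odds to the implied win probability."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds > 0:
        # Underdog: +300 means a 100 stake wins 300.
        return 100 / (american_odds + 100)
    # Favorite: -150 means a 150 stake wins 100.
    stake = -american_odds
    return stake / (stake + 100)

# The three shortest odds from the list above.
for odds in (300, 350, 400):
    print(f"+{odds}: {implied_probability(odds):.1%}")
```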

18 of 19

Business Value

Keep track of players' latest trends

Provide real-time intelligent sports lottery analysis

Provide a simpler platform with no ads

19 of 19

Reference