1 of 20

Simulating Sports: The Inputs and the Engines

Paul Bessire

Product Manager, Quantitative Analysis and Content

FOX Sports Interactive, WhatIfSports.com

July 15, 2009

2 of 20

Table of Contents

  • WhatIfSports.com Overview

  • Challenges with Simulating Baseball

  • Plate Appearance Decision Tree

Or “Improving the log5 Normalization Model for Batter/Pitcher Matchups”

  • Pedro vs. Ruth (mostly second presentation)

3 of 20

About WhatIfSports.com

  • February 2000 - Launched in Cincinnati with SimMatchup

  • 2001 - SimLeague Baseball (like Strat-O-Matic) and Basketball; Paul Bessire runs free leagues

  • 2002 – SimLeague Football and Hockey; Paul wins own baseball league with “Streaking Ho-Hos”

  • 2004 – Hoops Dynasty and Gridiron Dynasty; Paul joins WIS part-time “between school”

  • 2005 – WhatIfSports.com acquired by FOX Interactive Media; Paul comes on full-time

  • 2006 – Hardball Dynasty and Clutch Racing Dynasty; All simulations rewritten with Paul’s help

  • 2008 – FC Dynasty

  • Present – 600,000+ registered users, part of FOX Sports TV group

4 of 20

Sports Simulation

  • Play-by-play

    • A “play” means something different for each sport

    • Probabilities for every individual outcome

    • Random number generation

    • Pitch-by-pitch (or basketball/hockey pass-by-pass) not needed

    • Account for every possible statistical interaction during a game

  • Can be recreated quickly

    • 200+ games/second

    • All data tracked

    • Every outcome is different

    • Boxscore (link)

    • Many relevant applications (second presentation)
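The play-by-play idea above (a probability for every outcome, resolved with a random number) can be sketched as follows. This is a minimal illustration, not the WhatIfSports engine: the function name `simulate_play` and the probabilities in `example_probs` are hypothetical placeholders.

```python
import random

def simulate_play(outcome_probs, rng):
    """Pick one outcome from a {name: probability} table using a single random draw."""
    draw = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for outcome, prob in outcome_probs.items():
        cumulative += prob
        if draw < cumulative:
            return outcome
    return outcome  # guard: floating-point shortfall falls into the last bucket

# Hypothetical probabilities for illustration only (not WIS values); they sum to 1.
example_probs = {"out": 0.68, "single": 0.15, "double": 0.05,
                 "triple": 0.01, "home_run": 0.03, "walk": 0.08}

rng = random.Random(2009)  # seeded here only so the sketch is repeatable
plays = [simulate_play(example_probs, rng) for _ in range(10_000)]
```

Leaving the generator unseeded gives a different sequence on every run, which matches the "every outcome is different" point above.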

5 of 20

Baseball Challenges

  • Missing Player Data

  • Defensive Metrics

  • Ballpark Effects

  • Era Adjustments

  • Assigning Value (Salaries ~ RC27# * PA or ERC# * BF + Fielding + Extremes)

  • Career “Seasons” (Pujols #3 in career $/PA, Musial #16; Gibson #31 in $/IP)

  • Fatigue (Projected PA vs Actual PA/162 or Projected IP & GP% vs Actual IP/162 & Historical GP%)

6 of 20

Missing Player Data

  • Typically solved with Regression

    • Linear: Pitchers’ 2B or 3B per hit allowed or Pitches Thrown per BF
    • Multivariate: Ballpark Effects
    • May be Era and/or Ballpark Adjusted

  • Discriminant Analysis/Cluster Analysis

    • Catcher’s Arm Ratings
    • Basketball Positional Effectiveness

  • Fitting to a curve/distribution

    • Player Generation and Development
    • Assigning Ratings or Grades
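As a rough sketch of the regression approach to missing data: fit a line on the seasons where a stat is recorded, then predict it where it is missing. The slides do not say which predictors are used, so the keys `"x"` and `"y"` and the toy rows below are placeholders, not real player data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def fill_missing(rows, x_key, y_key):
    """Return copies of rows with missing y values replaced by the fitted prediction."""
    known = [r for r in rows if r[y_key] is not None]
    slope, intercept = fit_line([r[x_key] for r in known],
                                [r[y_key] for r in known])
    filled = []
    for r in rows:
        if r[y_key] is None:
            r = dict(r, **{y_key: intercept + slope * r[x_key]})
        filled.append(r)
    return filled

# Toy rows (placeholders): the last row has no recorded y and gets a prediction.
rows = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0},
        {"x": 3.0, "y": 6.0}, {"x": 4.0, "y": None}]
filled = fill_missing(rows, "x", "y")
```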

7 of 20

Significant Stats (# = has missing data)

Pitchers

  • HBP/BF
  • BB/(BF – HBP)
  • OAV
  • 1B/Hit Allowed
  • 2B/Hit Allowed # (regression)
  • 3B/Hit Allowed # (regression)
  • HR/Hit Allowed
  • K/Out # (regression)
  • GO/FO # (regression for GO)
  • BF # (approx. outs + hits + BB + HBP)
  • Pitches Thrown/BF # (regression)
  • Relative Range Factor # (WIS formula)
  • Fielding Percentage # (fit to curve for grade)
  • Handedness (historical impact)
  • Ballpark Effects # (multivariate regression)
  • League Averages

Hitters

  • HBP/PA
  • BB/(PA – HBP)
  • AVG
  • 1B/Hit
  • 2B/Hit
  • 3B/Hit
  • HR/Hit
  • K/Out # (regression)
  • GO/FO # (regression for GO)
  • PA
  • Relative Range Factor # (WIS formula)
  • Fielding Percentage # (fit to curve for grade)
  • Catcher Arm Rating # (discriminant analysis)
  • CS% (Runner) # (regression for CS)
  • Speed Rating # (WIS formula)
  • Handedness (historical impact)
  • Ballpark Effects # (multivariate regression)
  • League Averages
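The first few pitcher rates in the list are conditional on one another (HBP per BF, then BB per non-HBP batter, then OAV), and BF itself is approximated when missing. A small sketch of that arithmetic, assuming at-bats are approximated as BF minus BB minus HBP (sacrifices ignored); the function names and the sample counts are hypothetical.

```python
def approx_batters_faced(outs, hits, walks, hbp):
    """Approximate BF as outs + hits + BB + HBP, as in the pitcher list."""
    return outs + hits + walks + hbp

def pitcher_rates(outs, hits, walks, hbp):
    """Conditional rates in the order they are consumed: HBP, then BB, then hit."""
    bf = approx_batters_faced(outs, hits, walks, hbp)
    at_bats = bf - hbp - walks  # assumption: no sacrifices or other non-AB events
    return {
        "hbp_per_bf": hbp / bf,
        "bb_per_non_hbp": walks / (bf - hbp),
        "oav": hits / at_bats,
    }

# Hypothetical season line, for illustration only.
rates = pitcher_rates(outs=600, hits=180, walks=50, hbp=5)
```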

8 of 20

Insignificant Stats

Pitchers

  • Wins

  • Losses

  • Saves

  • Holds

  • Complete Games

  • Shutouts

  • ERA (kind of – 2B and 3B approx)

  • Unearned Runs

  • Games Started

  • Pitch Types

  • Performance in Counts

  • Other Situational Stats

Hitters

  • RBI

  • IBB

  • Runs (kind of – in Speed Formula)

  • GIDP (kind of – in Speed Formula)

  • SF (kind of – in PA, but also situational)

  • SH (kind of – in PA, but also situational)

  • SBA (kind of – attempts, but also setting)

  • Performance in Counts

  • Other Situational Stats

9 of 20

WIS Relative Range Factor

  • Range Factor

    • Important because range can turn hits into outs and outs into hits

    • Generally defined as (Putouts + Assists)/(Innings/9)

    • Reliant on many factors

    • Wildly inconsistent across eras

    • Does not include errors

    • Need another metric…

  • WIS Relative Range Factor

    • Similar to Bill James RRF, but not as robust (data limitations)

    • Approximates plays made/possible plays made

    • Used to approximate + and – plays

    • Includes errors

    • Era-adjusted
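For reference, the general Range Factor definition quoted above is a one-line calculation. The sketch below covers only that standard formula; the WIS Relative Range Factor is not specified in these slides and is not reproduced here.

```python
def range_factor(putouts, assists, innings):
    """Range Factor per nine innings: (Putouts + Assists) / (Innings / 9)."""
    return (putouts + assists) / (innings / 9)

# Hypothetical fielder: 300 putouts and 400 assists over 1,260 innings.
example_rf = range_factor(300, 400, 1260)
```

Errors do not appear anywhere in the formula, which is one of the gaps the relative version is meant to close.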

10 of 20

RRF – Best (Min. 80 Games)

Position  Player          Season  Team                  RRF
1B        Hal Chase       1919    New York Giants       12.7
2B        Billy Goodman   1952    Boston Red Sox        7.2
SS        Dave Bancroft   1922    New York Giants       7.1
3B        Buddy Bell      1981    Texas Rangers         4.1
OF        Darin Erstad    2002    LA Angels             3.7
OF        Mike Cameron    2003    Seattle Mariners      3.6
OF        Taylor Douthit  1928    St. Louis Cardinals   3.6
P         Greg Maddux     2002    Atlanta Braves        3.8

11 of 20

Ballpark Effects

  • LINK

12 of 20

Ballparks – Extremes (Min. 3 seasons)

Effect         Ballpark            High    Ballpark                 Low
Hits           Coors Field         1.182   Petco Park               .908
2B             Baker Bowl          1.291   Dodger Stadium           .795
3B             Palace of the Fans  1.868   Great American Ballpark  .523
HR_RF          Coors Field         1.374   Municipal Stadium        .636
HR_LF          Coors Field         1.385   Municipal Stadium        .634
Runs (unused)  Coors Field         1.380   Petco Park               .830

13 of 20

PA Decision Tree - Normalization

Every step in PA uses modified* log5 normalization (Bill James AVG example):

H/AB = ((AVG * OAV) / LgAVG) / ((AVG * OAV) / LgAVG + (1 - AVG) * (1 - OAV) / (1 - LgAVG))

Where LgAVG = (PLgAVG + BLgAVG) / 2

2000 Pedro vs. 1923 Ruth Example:

H/AB = ((.393 * .167) / .2791) / ((.393 * .167) / .2791 + (1 - .393) * (1 - .167) / (1 - .2791))

Where LgAVG = (.283 + .276) / 2 or .2791

Result = .2504

* Modified due to a flaw in the assumption above that the batter and pitcher carry equal (50/50) weights on each possible outcome of the PA event. Also accounts for handedness and ballpark.
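The unmodified log5 calculation on this slide can be written as a short function. This is a sketch of the formula as stated, with hypothetical function names. With the rounded inputs printed on the slide it returns roughly .251, close to but not exactly the slide's .2504, presumably because the slide was computed from unrounded values.

```python
def combined_league_avg(pitcher_lg_avg, batter_lg_avg):
    """LgAVG as the plain average of the pitcher's and batter's league averages."""
    return (pitcher_lg_avg + batter_lg_avg) / 2

def log5_hit_prob(avg, oav, lg_avg):
    """Unmodified log5 estimate of H/AB for one batter/pitcher matchup."""
    hit = avg * oav / lg_avg
    no_hit = (1 - avg) * (1 - oav) / (1 - lg_avg)
    return hit / (hit + no_hit)

# 2000 Pedro (OAV .167) vs. 1923 Ruth (AVG .393), using the slide's LgAVG of .2791.
pedro_vs_ruth = log5_hit_prob(avg=0.393, oav=0.167, lg_avg=0.2791)
```

A useful sanity check on the form: when the pitcher's OAV equals the league average, the function returns the batter's own AVG unchanged.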

14 of 20

PA Decision Tree – Step 1*

Plate Appearance

  • Unusual Event (IBB, WP, PB, SB, CS, SH, Hit and Run, Pickoff, Balk)

  • Normal PA
    • HBP (per PA or BFP)
    • Not HBP
      • BB (per PA or BFP – HBP)
      • At Bat…

* No ballpark or handedness adjustments made yet.
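The normal-PA branch of this step can be sketched as two sequential draws: HBP first, then BB among the remaining plate appearances. The sketch assumes the matchup probabilities have already been normalized, skips the unusual-event branch entirely, and uses hypothetical names throughout.

```python
import random

def resolve_step1(p_hbp, p_bb_given_not_hbp, rng):
    """Resolve a normal PA into 'HBP', 'BB', or 'AB' (continue to the at-bat)."""
    if rng.random() < p_hbp:               # HBP is per PA (or BFP)
        return "HBP"
    if rng.random() < p_bb_given_not_hbp:  # BB is per (PA - HBP)
        return "BB"
    return "AB"

# Hypothetical matchup probabilities, for illustration only.
rng = random.Random(13)
results = [resolve_step1(0.01, 0.09, rng) for _ in range(1_000)]
```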

15 of 20

PA Decision Tree – Step 2

At-Bat

  • Out
    • Strikeout (K/Out)
    • Normal (Logic to determine direction and GO or FO)
      • Hit (Poor Play)
      • Error (Fielding Percentage)
      • Normal

  • Hit… (AVG vs. OAV)*

* Historical handedness adjustment and ballpark hits multiplier used.
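One way to read this step as code is shown below. The branch order (hit or out, then strikeout, then poor play, then error) is an interpretation of the diagram, and `p_poor_play` is a hypothetical input the slide does not quantify. The direction and GO/FO logic, handedness and ballpark adjustments are left out.

```python
import random

def resolve_at_bat(p_hit, p_k_given_out, p_poor_play, fielding_pct, rng):
    """Resolve an at-bat into HIT, K, HIT_ON_POOR_PLAY, ERROR, or OUT."""
    if rng.random() < p_hit:             # AVG vs. OAV matchup probability
        return "HIT"
    if rng.random() < p_k_given_out:     # K/Out: share of outs that are strikeouts
        return "K"
    if rng.random() < p_poor_play:       # assumed: poor range turns an out into a hit
        return "HIT_ON_POOR_PLAY"
    if rng.random() >= fielding_pct:     # fielder fails to convert the chance
        return "ERROR"
    return "OUT"

# Hypothetical inputs, for illustration only.
rng = random.Random(15)
outcomes = [resolve_at_bat(0.25, 0.20, 0.02, 0.98, rng) for _ in range(1_000)]
```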

16 of 20

PA Decision Tree – Step 3

Hit*

  • Normal – In Play
    • Out (Plus Play)
    • Normal Hit
      • 3B* (3B/Hit * multiplier for lost HR)
      • 2B* (2B/Hit * multiplier for lost HR)
      • 1B

  • HR* (HR/Hit)

* Ballpark multipliers used.
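A sketch of this step follows. It assumes the "multiplier for lost HR" rescales the 3B/Hit and 2B/Hit rates to the hits that remain once home runs are removed, that is 1 / (1 - HR/Hit). The slide does not spell this out, so treat it as an assumption. `p_plus_play` is likewise a hypothetical input, and ballpark multipliers are omitted.

```python
import random

def resolve_hit(hr_per_hit, triple_per_hit, double_per_hit, p_plus_play, rng):
    """Resolve a hit into HR, OUT_ON_PLUS_PLAY, 3B, 2B, or 1B."""
    if rng.random() < hr_per_hit:
        return "HR"
    if rng.random() < p_plus_play:        # assumed: good range turns a hit into an out
        return "OUT_ON_PLUS_PLAY"
    rescale = 1.0 / (1.0 - hr_per_hit)    # assumed form of the lost-HR multiplier
    p_triple = triple_per_hit * rescale
    p_double = double_per_hit * rescale
    draw = rng.random()
    if draw < p_triple:
        return "3B"
    if draw < p_triple + p_double:
        return "2B"
    return "1B"

# Hypothetical inputs, for illustration only.
rng = random.Random(16)
hit_types = [resolve_hit(0.20, 0.04, 0.20, 0.02, rng) for _ in range(1_000)]
```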

17 of 20

PA Decision Tree – Matchup Weights

Addresses previous 50/50 assumption using League-Adjusted Variance to form batter and pitcher weights for each step:

 

Step          Pitcher%  Hitter%
HBP/PA        47.8      52.2
BB/(PA-HBP)   43.5      56.5
H/AB          46.7      53.3
K/(OUT)       45.6      54.4
HR/HIT        39.7      60.3
2B/HIT        15.2      84.8
3B/HIT        11.6      88.4
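The 1.066 and .934 multipliers used in the weighted batting-average example appear to be twice the H/AB weights (2 × 53.3% and 2 × 46.7%), so that a 50/50 split gives 1.0 for both sides. A small sketch of that conversion, with the caveat that the doubling rule is inferred from those two numbers rather than stated on the slides:

```python
# Pitcher share of each step, in percent, as listed in the table above.
PITCHER_PCT = {
    "HBP/PA": 47.8, "BB/(PA-HBP)": 43.5, "H/AB": 46.7, "K/(OUT)": 45.6,
    "HR/HIT": 39.7, "2B/HIT": 15.2, "3B/HIT": 11.6,
}

def matchup_multipliers(pitcher_pct):
    """Return (hitter_multiplier, pitcher_multiplier); 50/50 maps to (1.0, 1.0)."""
    hitter_pct = 100.0 - pitcher_pct
    return 2 * hitter_pct / 100.0, 2 * pitcher_pct / 100.0

hitter_mult, pitcher_mult = matchup_multipliers(PITCHER_PCT["H/AB"])
```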

18 of 20

Matchup Weights: What does this mean?

  • Batter always has more control (even with HBP and BB)

    • Makes final decision (Swing or not)
    • Dictates strike zone
    • Less consistent

  • Doubles and Triples are (mostly) out of pitcher’s control (BABIP)

  • Does not necessarily mean batting is more important

    • 9 vs. 1
    • Fewer pitcher outliers means elite pitchers are more valuable

19 of 20

PA Decision Tree - Normalization

Batting Average Example using Matchup Weights:

H/AB = ((1.066 * AVG * .934 * OAV) / LgAVG) / ((1.066 * AVG * .934 * OAV) / LgAVG + (1.066 - 1.066 * AVG) * (.934 - .934 * OAV) / (1 - LgAVG))

Where LgAVG = (.934 * PLgAVG + 1.066 * BLgAVG) / 2

2000 Pedro vs. 1923 Ruth Example (with handedness):

H/AB = ((1.066 * .393 * .934 * .167) / .2795) / ((1.066 * .393 * .934 * .167) / .2795 + (1.066 - 1.066 * .393) * (.934 - .934 * .167) / (1 - .2795))

Where LgAVG = (1.066 * .283 + .934 * .276) / 2 or .2795

Result * Handedness = .2502 * 1.045

Final Result = .2614
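The weighted version can be sketched the same way. The function below follows the general formula on this slide with hypothetical names, and the 1.045 handedness factor is taken from the example. With the rounded inputs shown it returns about .2505 before handedness and about .262 after, near the slide's .2502 and .2614, which were presumably computed from unrounded values.

```python
def weighted_log5_hit_prob(avg, oav, batter_lg_avg, pitcher_lg_avg,
                           hitter_w=1.066, pitcher_w=0.934):
    """Log5 estimate of H/AB using matchup weights for the batter and pitcher."""
    lg_avg = (pitcher_w * pitcher_lg_avg + hitter_w * batter_lg_avg) / 2
    hit = (hitter_w * avg) * (pitcher_w * oav) / lg_avg
    no_hit = (hitter_w - hitter_w * avg) * (pitcher_w - pitcher_w * oav) / (1 - lg_avg)
    return hit / (hit + no_hit)

# 2000 Pedro (OAV .167, league .276) vs. 1923 Ruth (AVG .393, league .283).
base = weighted_log5_hit_prob(avg=0.393, oav=0.167,
                              batter_lg_avg=0.283, pitcher_lg_avg=0.276)
with_handedness = base * 1.045  # handedness factor from the slide's example
```

In this form the two multipliers scale the hit and no-hit terms by the same product, so their effect on the result comes through the weighted league average.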

20 of 20

Thanks

Questions? – @lunch or after second presentation

Email: PBessire@WhatIfSports.com

Phone: 513-291-0321

See me for business card with promo code