Simulating Sports: The Inputs and the Engines
Or “Improving the log5 Normalization Model for Batter/Pitcher Matchups”
Paul Bessire
Product Manager, Quantitative Analysis and Content
FOX Sports Interactive, WhatIfSports.com
July 15, 2009
Table of Contents
About WhatIfSports.com
Sports Simulation
Baseball Challenges
Missing Player Data
Significant Stats (# has missing data)
Pitchers
Hitters
Insignificant Stats
Pitchers
Hitters
WIS Relative Range Factor
RRF – Best (Min. 80 Games)
Position | Player | Season | Team | RRF |
1B | Hal Chase | 1919 | New York Giants | 12.7 |
2B | Billy Goodman | 1952 | Boston Red Sox | 7.2 |
SS | Dave Bancroft | 1922 | New York Giants | 7.1 |
3B | Buddy Bell | 1981 | Texas Rangers | 4.1 |
OF | Darin Erstad | 2002 | LA Angels | 3.7 |
OF | Mike Cameron | 2003 | Seattle Mariners | 3.6 |
OF | Taylor Douthit | 1928 | St. Louis Cardinals | 3.6 |
P | Greg Maddux | 2002 | Atlanta Braves | 3.8 |
Ballpark Effects
Ballparks – Extremes (Min. 3 seasons)
Effect | Ballpark | High | Ballpark | Low |
Hits | Coors Field | 1.182 | Petco Park | .908 |
2B | Baker Bowl | 1.291 | Dodger Stadium | .795 |
3B | Palace of the Fans | 1.868 | Great American Ball Park | .523 |
HR_RF | Coors Field | 1.374 | Municipal Stadium | .636 |
HR_LF | Coors Field | 1.385 | Municipal Stadium | .634 |
Runs (unused) | Coors Field | 1.380 | Petco Park | .830 |
PA Decision Tree - Normalization
Every step in the PA uses a modified* log5 normalization (Bill James's AVG example):
H/AB = ((AVG * OAV) / LgAVG) /
       ((AVG * OAV) / LgAVG + (1 - AVG) * (1 - OAV) / (1 - LgAVG))
where LgAVG = (PLgAVG + BLgAVG) / 2, the mean of the pitcher's and batter's league averages.
2000 Pedro vs. 1923 Ruth Example:
H/AB = ((.393 * .167) / .2791) /
       ((.393 * .167) / .2791 + (1 - .393) * (1 - .167) / (1 - .2791))
where LgAVG = (.283 + .276) / 2, or .2791
Result = .2504
* Modified because the formula above assumes that the batter and pitcher carry equal (50/50) weight on each possible outcome of the PA, and that assumption is flawed. The modified version also accounts for handedness and ballpark.
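The unmodified formula is easy to check numerically. The Python sketch below evaluates it for the Pedro/Ruth inputs; the function and argument names are illustrative and are not WhatIfSports code. Because the inputs shown on the slide are rounded, the sketch lands near .251 rather than reproducing the slide's .2504 exactly.

```python
def log5(avg, oav, lg_avg):
    """Unmodified log5 estimate of hits per at-bat for a batter/pitcher matchup."""
    hit = (avg * oav) / lg_avg                      # odds-style term for a hit
    no_hit = (1 - avg) * (1 - oav) / (1 - lg_avg)   # matching term for no hit
    return hit / (hit + no_hit)


# 1923 Ruth (AVG .393) vs. 2000 Pedro (OAV .167), league average as given on the slide.
estimate = log5(avg=0.393, oav=0.167, lg_avg=0.2791)
print(round(estimate, 4))
```

A useful sanity check on the formula: when the batter, the pitcher, and the league all share the same average, the estimate equals that average.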
PA Decision Tree – Steps 1*
Plate Appearance
  Unusual Event (IBB, WP, PB, SB, CS, SH, Hit and Run, Pickoff, Balk)
  Normal PA
    HBP (per PA or BFP)
    Not HBP
      BB (per PA or BFP – HBP)
      At Bat…
* No ballpark or handedness adjustments made yet.
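The first branches of the tree can be read as a sequence of conditional draws. The sketch below is one way to express that, assuming each probability has already been normalized for the matchup; the function name, the outcome labels, and the single `p_unusual` input are hypothetical simplifications, not the simulator's actual logic.

```python
import random


def step_one(p_unusual, p_hbp, p_bb, rng=random):
    """Walk the first branches of the PA tree and return an outcome label.

    p_unusual: chance of an unusual event (IBB, WP, PB, SB, ...); hypothetical input
    p_hbp:     HBP rate per PA (or per BFP)
    p_bb:      BB rate per (PA - HBP), so it is only drawn once HBP is ruled out
    """
    if rng.random() < p_unusual:
        return "UNUSUAL"
    if rng.random() < p_hbp:
        return "HBP"
    if rng.random() < p_bb:
        return "BB"
    return "AT_BAT"   # continue to the at-bat branches


# Illustrative rates only; a seeded generator keeps the run repeatable.
rng = random.Random(2009)
print([step_one(0.02, 0.01, 0.09, rng) for _ in range(5)])
```

Drawing BB only after HBP has been ruled out is what makes the "per PA or BFP – HBP" denominator the right one for that step.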
PA Decision Tree – Steps 2
At-Bat
  Hit… (AVG vs. OAV)*
  Out
    Strikeout (K/Out)
    Normal (logic to determine direction and GO or FO)
      Hit (Poor Play)
      Error (Fielding Percentage)
      Normal
* Historical handedness adjustment and ballpark hits multiplier used.
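A minimal sketch of this step follows. It assumes the handedness adjustment and the ballpark hits multiplier are applied multiplicatively to the normalized H/AB, which the slides suggest but do not spell out; `p_poor_play` and `p_error` are placeholder inputs, and the direction and GO/FO logic is omitted.

```python
import random


def step_two(p_hit, k_per_out, p_poor_play, p_error,
             handedness=1.0, park_hits=1.0, rng=random):
    """Resolve an at-bat into a coarse outcome label.

    p_hit is the normalized H/AB for the matchup. Multiplying it by the
    handedness and ballpark factors is an assumption about how the
    adjustments are applied.
    """
    adjusted_hit = min(1.0, p_hit * handedness * park_hits)
    if rng.random() < adjusted_hit:
        return "HIT"
    if rng.random() < k_per_out:       # K/Out: strikeouts as a share of outs
        return "STRIKEOUT"
    roll = rng.random()                # ball in play that would normally be an out
    if roll < p_poor_play:
        return "HIT_POOR_PLAY"
    if roll < p_poor_play + p_error:
        return "ERROR"
    return "OUT_IN_PLAY"


# Illustrative call: a .250 matchup at a park with a 1.182 hits multiplier.
rng = random.Random(7)
print(step_two(0.250, 0.20, 0.01, 0.02, park_hits=1.182, rng=rng))
```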
PA Decision Tree – Steps 3
Hit*
  HR* (HR/Hit)
  Normal – In Play
    Out (Plus Play)
    Normal Hit
      3B* (3B/Hit * multiplier for lost HR)
      2B* (2B/Hit * multiplier for lost HR)
      1B
* Ballpark multipliers used.
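The slides do not define the "multiplier for lost HR". One plausible reading, used in the sketch below and labeled as an assumption, is that once home runs have been split off, the per-hit triple and double rates are rescaled by 1 / (1 - HR/Hit) so that they apply to the remaining non-HR hits. All rates are taken to be already normalized and park-adjusted, and `p_plus_play` is a hypothetical input.

```python
import random


def step_three(hr_per_hit, triple_per_hit, double_per_hit,
               p_plus_play=0.0, rng=random):
    """Classify a hit as HR, an out on a plus play, 3B, 2B, or 1B."""
    if rng.random() < hr_per_hit:
        return "HR"
    if rng.random() < p_plus_play:     # fielder turns the hit into an out
        return "OUT_PLUS_PLAY"
    # Assumed form of the "multiplier for lost HR": convert per-hit rates
    # into per-non-HR-hit rates now that home runs have been removed.
    lost_hr = 1.0 / (1.0 - hr_per_hit)
    p_triple = triple_per_hit * lost_hr
    p_double = double_per_hit * lost_hr
    roll = rng.random()
    if roll < p_triple:
        return "3B"
    if roll < p_triple + p_double:
        return "2B"
    return "1B"


rng = random.Random(1923)
print([step_three(0.20, 0.06, 0.22, rng=rng) for _ in range(5)])
```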
PA Decision Tree – Matchup Weights
Addresses previous 50/50 assumption using League-Adjusted Variance to form batter and pitcher weights for each step:
Weight | HBP/PA | BB/(PA-HBP) | H/AB | K/Out | HR/Hit | 2B/Hit | 3B/Hit |
Pitcher% | 47.8 | 43.5 | 46.7 | 45.6 | 39.7 | 15.2 | 11.6 |
Hitter% | 52.2 | 56.5 | 53.3 | 54.4 | 60.3 | 84.8 | 88.4 |
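The batting average example in this deck uses the factors 1.066 and .934, which are twice the H/AB hitter and pitcher weights (2 * .533 and 2 * .467), so an even 50/50 split would give 1.0 for both. The sketch below applies the same conversion to every column; extending it beyond H/AB is an assumption, and the names are illustrative.

```python
# Pitcher weights (percent) from the table; hitter weight is the remainder.
PITCHER_PCT = {
    "HBP/PA": 47.8,
    "BB/(PA-HBP)": 43.5,
    "H/AB": 46.7,
    "K/Out": 45.6,
    "HR/Hit": 39.7,
    "2B/Hit": 15.2,
    "3B/Hit": 11.6,
}


def multipliers(pitcher_pct):
    """Return (hitter_multiplier, pitcher_multiplier), where 50/50 maps to (1.0, 1.0)."""
    hitter_pct = 100.0 - pitcher_pct
    return round(2 * hitter_pct / 100, 3), round(2 * pitcher_pct / 100, 3)


for step, pct in PITCHER_PCT.items():
    print(step, multipliers(pct))
```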
Matchup Weights: What does this mean?
PA Decision Tree - Normalization
Batting Average Example using Matchup Weights (1.066 = 2 * 53.3% hitter weight, .934 = 2 * 46.7% pitcher weight):
H/AB = ((1.066 * AVG * .934 * OAV) / LgAVG) /
       ((1.066 * AVG * .934 * OAV) / LgAVG + (1.066 - 1.066 * AVG) * (.934 - .934 * OAV) / (1 - LgAVG))
where LgAVG = (.934 * PLgAVG + 1.066 * BLgAVG) / 2
2000 Pedro vs. 1923 Ruth Example (with handedness):
H/AB = ((1.066 * .393 * .934 * .167) / .2795) /
       ((1.066 * .393 * .934 * .167) / .2795 + (1.066 - 1.066 * .393) * (.934 - .934 * .167) / (1 - .2795))
where LgAVG = (.934 * .276 + 1.066 * .283) / 2, or .2795
Result * Handedness = .2502 * 1.045
Final Result = .2614
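The weighted formula can be sketched in the same way. The function below follows the general form given for the batting average example, with the handedness factor applied as a final multiplier; the names are illustrative, and because the slide's inputs are rounded the output should be expected to agree with the slide's .2502 and .2614 only to about three decimal places.

```python
def weighted_log5(avg, oav, b_lg_avg, p_lg_avg,
                  w_bat=1.066, w_pit=0.934, handedness=1.0):
    """Matchup-weighted log5 estimate of H/AB, scaled by a handedness factor."""
    lg_avg = (w_pit * p_lg_avg + w_bat * b_lg_avg) / 2
    hit = (w_bat * avg * w_pit * oav) / lg_avg
    no_hit = (w_bat - w_bat * avg) * (w_pit - w_pit * oav) / (1 - lg_avg)
    return handedness * hit / (hit + no_hit)


# 1923 Ruth (.393, league .283) vs. 2000 Pedro (.167, league .276).
base = weighted_log5(0.393, 0.167, b_lg_avg=0.283, p_lg_avg=0.276)
final = weighted_log5(0.393, 0.167, b_lg_avg=0.283, p_lg_avg=0.276,
                      handedness=1.045)
print(round(base, 4), round(final, 4))
```

In this form the product of the two weights appears in both the hit and the no-hit term, so the weights change the estimate only through the weighted league average.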
Thanks
Questions? – @lunch or after second presentation
Email: PBessire@WhatIfSports.com
Phone: 513-291-0321
See me for business card with promo code