FanGraphs Official Position On: Single Season UZR

by Bradley Woodrum - September 20, 2011

One year of UZR data is enough.

That’s right. In fact, it’s so right I’ll write it again, embolden’d even: One year of UZR data is enough.

The proper question, however, is, “Enough for what?”

Welcome to the UZR portion of the FanGraphs Official Position series. Consider me an emissary of Ultimate Zone Rating (UZR), Mitchel Lichtman’s metric that is, in the view of many analysts, the cutting edge of defensive statistics.

With the MVP debate heating like a kettle on the stove, many writers, fans, and even analysts have begun decrying UZR and its inclusion in Wins Above Replacement (WAR).

“If it takes three seasons for UZR to stabilize,” the general argument goes, “then why is a single season included in WAR?”

Yes, it is true that a single season of UZR tells us almost nothing about a player’s true talent — by Lichtman’s own admission:

[A] player’s true talent UZR or what you might expect from him in the future is as close to that one-year number as it is to zero…

Visually, this is what he means:

A single season of UZR is a point along the spectrum of true talent, but the error range for that single point is so large as to mean almost nothing. So yes, one season of UZR is not enough to determine a player’s true talent level.
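Lichtman’s rule of thumb can be written as a simple shrinkage estimate. The sketch below is only an illustration, not the way UZR itself is computed: it assumes the league-average baseline is 0 runs and that a weight of 0.5 captures “as close to that one-year number as it is to zero.” The function name and the sample value of +10 runs are hypothetical.

```python
def estimate_true_talent(observed_uzr, weight_on_observed=0.5, baseline=0.0):
    """Shrink a one-year UZR toward the league-average baseline.

    weight_on_observed=0.5 is an assumption that mirrors the quote above:
    the estimate lands halfway between the observed number and zero.
    """
    return baseline + weight_on_observed * (observed_uzr - baseline)


# Hypothetical fielder who posted +10 runs in a single season.
one_year_uzr = 10.0
talent_estimate = estimate_true_talent(one_year_uzr)
print(talent_estimate)  # 5.0
```

The season itself still reads +10; only the guess about the fielder’s underlying ability moves toward average.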

Bronzing a player’s glove after a single season would be as foolish as enshrining a hitter after just 400 plate appearances.

HOWEVER! That is only when we are talking about true talent levels. Many people who use UZR suppose incorrectly that since a single season does not accurately report true talent levels we can effectively ignore that season until there’s a wealth of data.

But just because a single season does not tell us a player’s true talent, it does not mean that year’s UZR tells us nothing. To the contrary, the data attempts to show us the story of the season: it describes the season carefully and relatively effectively, even though it is not perfect.

Take, for instance, Troy Tulowitzki’s 2007 season, wherein he ranked among the best defensive shortstops in the league (13.2 UZR/150). The following year, he ranked almost perfectly average (0.1 UZR/150). Overall, we have come to discover Tulo is most likely a top 10 shortstop: very good defensively and, at age 26, possibly getting even better. But that does not mean 2007 and 2008 did not happen.

In 2007, Troy likely did make plays that very few shortstops in the league — even he himself — could usually make, and it is unfair to take that away from him and assume it just didn’t happen. Likewise, it would be unfair to take away his 2008 because it now seems like an outlier. In that year, in those plays he made and didn’t make, he likely played more as an average shortstop. His true talent level was probably never average, but that year, it’s quite possible that he made enough miscues to earn the label. Defensive performance is not static, and should not be treated as if it is.
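The two readings of Tulowitzki’s numbers can be kept side by side. This is a minimal sketch using only the two UZR/150 figures quoted above. The plain two-year mean is a stand-in assumption for a talent estimate (a real one would use more seasons and weight by playing time), and the function names are hypothetical.

```python
# UZR/150 figures quoted in the text for Troy Tulowitzki.
tulowitzki_uzr150 = {2007: 13.2, 2008: 0.1}


def describe_season(ratings, year):
    """Descriptive question: what did the metric record for that year?"""
    return ratings[year]


def rough_talent_estimate(ratings):
    """Predictive question: a crude multi-year average (an assumption here,
    not how any official projection is built)."""
    return sum(ratings.values()) / len(ratings)


print(describe_season(tulowitzki_uzr150, 2007))  # 13.2
print(describe_season(tulowitzki_uzr150, 2008))  # 0.1
print(round(rough_talent_estimate(tulowitzki_uzr150), 2))
```

The first two calls answer “what happened that year?”, and the last answers “what should we expect?” Neither answer erases the other.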

A few weeks ago, Mark Simon of ESPN took a look at some of the specific plays that have hurt Curtis Granderson’s defensive ratings. Using the Advanced Plus-Minus metric (a statistic similar to UZR), Simon closely examined six plays that dramatically affected Granderson’s overall rating. The truth is: Those plays that hurt him really happened, and he really did make mistakes that hurt his team. They weren’t aberrations in the data. They weren’t perfectly subjective events. They were good, honest, American mistakes.

Granderson, as the article notes, is a high-risk, high-reward fielder — which means he is bound to have some great, amazing seasons, and some terrible, awful seasons.

So this brings us back to the original statement: A single season of UZR is enough. It’s enough to tell the narrative of the season, to give us a description of the events that occurred without missing a play or warping events to our preconceptions. It is enough for descriptive purposes, though the uncertainty around measuring defense makes it less precise than offensive metrics; this can be adjusted for, of course.

In calculating WAR, should we replace a single season UZR with a prorated 3-year average? Heavens no! That would be like normalizing a player’s BABIP because it’s out of whack with career norms.

Should we bump Casey Kotchman (2.4 WAR) down to 0.6 WAR because his BABIP inflated his hitting this year? Should Adrian Gonzalez’s BABIP, which sits 60 points higher than his career norm, hurt his AL MVP chances? I should think not, but here is where we meet the fork in the road.

When it comes to considering MVPs and seasonal awards, each voter and fan has a different perspective. Some vote on who they think the truly best player is — true-talent-wise — while others may choose to vote for the player who had the best season or the player who proved most important to a needy or dominant team.

For looking at true talent levels, a single season of UZR does not work. However, I can think of no human who has ever said, “Albert Pujols should win the MVP award this year because he had been the best player for the last five years.”

Usually the argument goes, “Third baseman Bruce Wayne led the league in homers and RBI,” or, “Tony Stark led all outfielders in wRC+” (depending on the progressiveness of the speaker). These single-season statistics do not convey true talent levels, so here the issue is performance (who performed the best in a given season), and for that purpose, one season of UZR does as good a job as, or a better job than, anything else out there. (NOTE: UZR in a single season is not exact, but it’s far from worthless.)

Ideally, we will one day have public Field F/x data that perfectly and accurately adjusts for outfield positioning, precisely tracks ball speed and direction, and ends the tiered grounder/liner/fly/fliner ranking system. But, until then, UZR is a decent substitute. It’s not perfect, just like ERA is not a perfect way to evaluate pitchers, but it is useful information and shouldn’t be ignored.

43 Responses to “FanGraphs Official Position On: Single Season UZR”

Matt says:
September 20, 2011 at 11:10 am
If WAR is supposed to measure what actually occurred in a given year rather than a player’s true talent level, then why does it use FIP for pitchers instead of ERA?

Anon21 says:
September 20, 2011 at 11:11 am
“A single season of UZR is enough. It’s enough to tell the narrative of the season, to give us a description of the events that occurred without missing a play or warping events to our preconceptions. It is enough for any and all descriptive purposes.”
Look, I’m no expert on advanced fielding metrics. But to my reading, mgl himself has said, in multiple places, that your assertion here is just wrong. For example:
Frankly, I’m a little shocked to see a FanGraphs writer who apparently has so little understanding about the kind of error that UZR includes being tapped to write the official position on UZR article.

kinnerful says:
September 20, 2011 at 11:12 am
Then why do you use FIP in WAR calculations? To me FIP is more like a true talent/forecast variable than a variable that’s describing outcomes (“true” or “false”).

mcbrown says:
September 20, 2011 at 11:14 am
Nice article. Complaints about single-season UZR ultimately boil down to “what question are you trying to answer?” UZR itself is what it is; any “problems” arise from incorrect usage.
I will, however, point out that after only two “Official Position” articles FanGraphs now has an official inconsistency in its official positions. This article argues that UZR accurately captures single-season fielding performance, while the AL MVP article explicitly leans on the “uncertainty” of single-season UZR. I don’t really care, and normally I wouldn’t even mention it. But since these articles are supposed to lay out official FanGraphs editorial positions, and since I am left feeling like I don’t know FanGraphs’ actual official position on single-season UZR, I would appreciate some clarification.

Ryan says:
September 20, 2011 at 11:14 am
I described some of my displeasure with the AL MVP position yesterday, and this piece opens up further discussion for me with the AL MVP position and some inconsistencies with WAR.
The premise of this piece is that UZR accurately describes a single-season performance, but not true talent level. The premise of FIP, the component fWAR uses rather than simply runs allowed (ERA, basically), is that it is a measure of true talent level.
How can it be that a (huge) component of a pitcher’s WAR measures true talent level rather than single-season performance, while a component of a position player’s WAR measures single-season performance rather than true talent level?
This is at the very heart of the Verlander vs. Bautista and Ellsbury debate.

Eminor3rd says:
September 20, 2011 at 11:22 am
Nice article, Bradley. This is the type of clear, assertive position that we need to see in this series.

Telo says:
September 20, 2011 at 11:25 am
I think the fact that UZR is essentially a “black box” is why it is such a hot topic among saberists and traditional fans alike.
wOBA is painfully simple. (You hit a single? You get a point. You hit a homer? You get 2.5 points.) It’s the first place I start when I talk to my non-saber leaning friends about expanding their horizons.
But UZR is entirely different. I am completely on board with you when you claim that “One year of UZR is enough” because it measures what happened. It’s HOW it measures what happened that makes people feel uneasy. Nowhere do you address those concerns, which to me are paramount in the discussion.

Dustin says:
September 20, 2011 at 11:25 am
It isn’t enough if it really believes Casey McGehee is an above average 3B.

Matt H says:
September 20, 2011 at 11:26 am
“In calculating WAR, should we replace a single season UZR with a prorated 3-year average? Heavens no! That would be like normalizing a player’s BABIP because it’s out of whack with career norms.”
And the problem is?
Personally, I think we should regress a player’s BABIP TOWARDS his career norms, though not necessarily entirely there. This is because part of BABIP, and UZR, is luck. Now I realize that a bloop single to center or a misplayed fly that ends up as a triple are both events that actually happened, but should we really credit the batter for these events? No, of course not. Now, granted, we can’t look at a batter’s every hit and out and decide whether they deserve it or not, but the point is, sometimes batters have lucky seasons, not because they are playing out of their true talent level, but because they are simply getting lucky. This is part of BABIP and should be accounted for.
Similarly, UZR is not a perfect statistic. It tries to describe what actually happens as well as possible, but the truth of the matter is, it is going to overrate some players and underrate some players simply because of its inaccuracy as a metric. Some defensive seasons are going to look waaaay worse or waaay better than a player’s true talent level. Yes, sometimes it is because a few plays, which I agree should count, greatly affected it, but sometimes it is because UZR is not perfect. It makes assumptions and is not entirely precise, meaning it can misrepresent not only the player’s true talent level, but what actually happened on the field that the fielder was responsible for.
Long story short, luck exists in BABIP and UZR, and should be accounted for. Does this mean we normalize BABIP and UZR to career norms and 3-year averages respectively? No, of course not. But we need to keep these factors in mind when evaluating a player, and not necessarily take the measures at face value.

todmod says:
September 20, 2011 at 11:28 am
The bigger question: the accuracy of UZR to measure what happened. Why is this question completely ignored? Why should we trust UZR to be 100% accurate when there are other defensive metrics that its single season data doesn’t match up with?

MG says:
September 20, 2011 at 11:31 am
In several years, after Field f/x has been installed in every park and there are multiple years of data available, these kinds of debates will largely cease as UZR is either refined or added to the dustbin of history.

Jason says:
September 20, 2011 at 11:33 am
I have a lot of problems with this realization of the statistic. UZR really is not a record of what actually happened. It is a very poor approximation of what actually happened. Contrary to the author’s assertions, UZR has no precision in recording what actually happened as far as we know. The variance in UZR comes from two sources:
1) Randomness in opportunity. Sometimes players don’t get the difficult but makeable plays that are required for high UZR, sometimes they get lots of them, etc. This is out of the player’s control, but this variance will likely decrease with sample size (this type of variance may not equalize between players ever, however, because of differences in the pitching staffs they play behind and the ballparks they play in, etc.).
2) Error in assigning values. UZR makes continuous data into categorical variables. In real life balls aren’t hit into zones, they are hit some distance away from the fielder. Balls aren’t hit hard or soft, they are hit with some velocity. Balls aren’t liners or popups, they follow a trajectory. Making continuous data categorical is measurement error. The variance from this measurement error will also decrease with sample size, but it is never going away. Lots of balls that are deemed catchable by the best players are actually not catchable by anybody. Also, lots of balls deemed really difficult to catch would actually have been caught by just about everybody.
UZR is not a precise record of what actually happened, contrary to the author’s assertion. In actuality, UZR is an estimate of what happened. An estimate with lots of error.
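The second source of error described above can be sketched in a few lines. This is an illustration only: the linear catch-probability curve, the 10-foot zone width, and every name below are assumptions made for the example, not UZR’s actual zones or values. It shows how two balls that differ in true difficulty receive identical credit once distance is collapsed into a zone.

```python
ZONE_WIDTH_FT = 10.0  # hypothetical zone size, not UZR's real grid


def true_catch_probability(distance_ft):
    """Assumed continuous difficulty: drops linearly from 1.0 at 0 ft
    to 0.0 at 100 ft from the fielder."""
    return max(0.0, 1.0 - distance_ft / 100.0)


def zone_catch_probability(distance_ft):
    """Categorical version: every ball in a zone gets the probability
    evaluated at that zone's midpoint."""
    zone_index = int(distance_ft // ZONE_WIDTH_FT)
    midpoint = (zone_index + 0.5) * ZONE_WIDTH_FT
    return true_catch_probability(midpoint)


# Two balls that fall in the same zone, near its close and far edges.
for distance in (40.0, 49.0):
    true_p = true_catch_probability(distance)
    zone_p = zone_catch_probability(distance)
    # Columns: distance, true probability, zone probability, zone minus true
    print(distance, round(true_p, 2), round(zone_p, 2), round(zone_p - true_p, 2))
```

Under these assumptions the easier ball is scored as harder than it was and the harder ball as easier. Such errors partly cancel over many chances, but in this sketch they never vanish for any single play.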

mb21 says:
September 20, 2011 at 11:35 am
Thanks to others above for pointing it out, but UZR most certainly does not represent what the player did defensively in a single season.

grandbranyan says:
September 20, 2011 at 11:36 am
You can very accurately determine the number of runs created on the offensive side. With the amount of grey area that exists in UZR, that same certainty cannot be obtained on the defensive side.
Take Ryan Braun and Nyjer Morgan in 2009. You can be fairly certain Braun was in fact worth 5 wins because most of his value comes from offense. Morgan’s number, by contrast, depended almost entirely on his UZR that year, so maybe he was a 4 win player, maybe he was a 6 win player. That is significant.

Telo says:
September 20, 2011 at 11:45 am
Another point, from the UZR primer itself:
“[UZR] does not give us a perfect estimate of a player’s true talent or even an accurate picture of what actually happened on the field. The reason for that is that the data is imperfect.”
From Bradley:
“In calculating WAR, should we replace a single season UZR with a prorated 3-year average? Heavens no! That would be like normalizing a player’s BABIP because it’s out of whack with career norms.”
Those are simply not the same scenarios. We know exactly, definitively, what someone’s BABIP was from year to year. There are inherent inaccuracies and bias in fielding data that cannot be ignored! How many times does this have to be said!
Here’s a thought experiment:
X player has 600 PAs in 2011. But in 200 of them the data is WRONG. It’s simply completely wrong. It could be a HR instead of a K, a single instead of a triple. You have no idea. You have 400 accurate PAs and 200 gibberish PAs, and you can’t separate them.
You also have the single season hitting line for that player, with all 600 PAs (using the 200 gibberish PAs).
I ask you: what is your most accurate guess of the player’s true, expected 2011 season line?
Do you give me all 600 PAs with gibberish included? Or do you use his past performance to nudge the 2011 season towards what you expect the player to have performed at, since you know it’s likely that he is closer to his past performances than the 400 true outcomes, plus 200 randoms?
