Wednesday, October 21, 2009

There's a Stat for That: UZR

There is a question that TwinsCentric didn’t answer, or even raise. We’ll raise it today. Maybe we’ll answer it tomorrow.

One thing we tried to do within the TwinsCentric Offseason GM Handbook is pay increased attention to defense. For instance, I mentioned it in the first half of this review of Orlando Cabrera:

Last year at this time, Orlando Cabrera had plenty of reason to be hopeful about his impending free agency. He was a .300 hitter, a veteran shortstop, and had been a key component to a number of playoff teams over the latter half of the decade. He’d been so good that he easily found himself listed as a ‘Type A’ player by Elias in their year-end rankings.

That last fact turned ugly in a hurry. The White Sox offered Cabrera arbitration when neither side wanted him to return there. Since Cabrera turned down the arbitration and was an ‘A’ player, any team signing him would need to forfeit a first-round draft pick to the White Sox. And with that additional (and artificial) cost, the market for Cabrera dried up like a Sham-wow.

He finally signed with the Athletics in March, but he could only procure a one-year deal for $4 million, essentially signing a “make-good” contract after he had already made good. But give him and his agent credit for having learned their lesson. They reportedly made sure the new contract specified that whichever team has him can’t offer him arbitration if he’s a Type A free agent again.

It looks like he’ll be just that. Alas, the Twins won’t be offering him arbitration, so he might get a better deal despite having a worse year. His batting average and OPS fell to .284 and .705 respectively, but his defense was a bigger concern. The 34-year-old isn’t getting to groundballs the way he used to, and defensive metrics like a UZR of -14.9 confirm that.

(And yes, you can download that, along with 1/3 of the Handbook, absolutely free at TwinsCentric.com. Thanks for asking!)

You’ll notice in that last paragraph that we use UZR, or Ultimate Zone Rating. It’s cited a few times in the Handbook, and we referenced it behind the scenes multiple times. It’s become a standard defensive metric, but I worry a little that its popularity stems mostly from being readily available at FanGraphs.com.

So I was excited to stumble across this interview with the developer of UZR, Mitchel Lichtman at BaseballDailyDigest.com, conducted by Joel Hamrahi. (Who, apropos of nothing, calls the Handbook “every baseball fan’s dream.” But I digress.) So let’s see how this thing works.

UZR starts with a map of the field that divides it into 22 slices and then divides those up into distances of 30-35 feet. (Just so you can picture it, I threw together the dreadful little drawing on the left.) For each of those spots they know from lots of major league data what percentage of the time a ball is turned into an out.

But it doesn’t stop there. Each of those probabilities is broken down into more granular probabilities based on further conditions. Those other conditions are important, so he specifies them. They are:

- Type, which I think he means as type of hit, but the values are hard, medium, and soft
- handedness of the batter, because it influences positioning and he claims it influences the speed, which I don’t quite understand
- game situation, meaning the baserunners and outs, because it also influences positioning
- ground ball/fly ball ratio of the pitcher, which again he says influences the speed

So, basically you have a huge table that has location and the rest of these conditions as columns, and for every possible combination of those, it has the percentage of time a ball is turned into an out by each fielder.
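To make that table a little more concrete, here is a rough sketch of how I picture it, in Python. Everything in it is my own invention for illustration: the field names, the slice and distance numbers, and the out rates are made up, and the real table is surely organized differently.

```python
from typing import NamedTuple

class Bucket(NamedTuple):
    """One row key in the (hypothetical) lookup table."""
    slice_id: int       # which of the 22 slices of the field
    distance_band: int  # which 30-35 foot band outward from home plate
    speed: str          # "hard", "medium", or "soft"
    batter_hand: str    # "L" or "R"
    situation: str      # baserunners and outs
    pitcher_type: str   # ground-ball or fly-ball pitcher

# Made-up out rates for right fielders; the real ones come from league data.
RIGHT_FIELD_OUT_RATE = {
    Bucket(18, 9, "medium", "L", "bases empty, 1 out", "fly-ball"): 0.90,
    Bucket(19, 10, "hard", "L", "bases empty, 1 out", "fly-ball"): 0.60,
}

def out_rate(table, bucket):
    """Share of balls in this bucket that the average fielder turns into outs."""
    return table[bucket]
```

The only point is that every batted ball gets sorted into exactly one bucket, and each bucket carries a league-wide out percentage for each position.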

Based on those percentages, a fielder gets or loses varying amounts of credit for their performance. For instance, if Jermaine Dye catches a ball that 90% of right fielders catch, he gets credit for 10% of an out. If he misses a ball that 60% of right fielders catch, he loses 60% of an out.
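Here is that credit rule as a few lines of Python. This is just my reading of the interview, and the function name is mine, not Lichtman's.

```python
def out_credit(league_out_rate, made_play):
    """Fraction of an out credited (positive) or debited (negative) on one ball.

    league_out_rate: share of fielders at that position who make this play.
    made_play: True if our fielder turned the ball into an out.
    """
    if made_play:
        # He only gets credit for the part the average fielder would have missed.
        return 1.0 - league_out_rate
    # He is charged for the part the average fielder would have converted.
    return -league_out_rate

# Dye catches a ball that 90% of right fielders catch: about +0.10 of an out.
easy_catch = out_credit(0.90, made_play=True)
# Dye misses a ball that 60% of right fielders catch: -0.60 of an out.
missed_ball = out_credit(0.60, made_play=False)
```

Notice that routine plays barely move the needle either way when they are made, while botching one is expensive.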

Then UZR turns those plays into runs using a very high-level metric. It counts an out as .28 runs, an infield hit as .5 runs and an outfield hit as .6 runs. Since every ball is one or the other, I’m assuming that a play by an infielder credits or subtracts .78 runs, and an outfielder’s play counts as .88 runs. So I think that if we go back to our outfielder who got credit for 10% of a catch, he gains .088 runs for that catch.
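For the arithmetic-minded, here is my assumption written out in Python. The three run values come from the interview; adding the out value to the hit value to get the swing between an out and a hit is my guess at how they combine, not something Lichtman spells out, and the function name is made up.

```python
OUT_VALUE = 0.28           # runs saved by recording an out
INFIELD_HIT_VALUE = 0.50   # runs cost by an infield hit
OUTFIELD_HIT_VALUE = 0.60  # runs cost by an outfield hit

def runs_for_play(league_out_rate, made_play, outfield):
    """Runs credited (or debited) for one batted ball, under my assumption."""
    # Swing between an out and a hit: .88 for outfielders, .78 for infielders.
    hit_value = OUTFIELD_HIT_VALUE if outfield else INFIELD_HIT_VALUE
    swing = OUT_VALUE + hit_value
    # Same credit rule as before: (1 - rate) if made, -rate if missed.
    credit = (1.0 - league_out_rate) if made_play else -league_out_rate
    return credit * swing

# The outfielder who catches a ball that 90% of his peers catch.
print(round(runs_for_play(0.90, True, outfield=True), 3))   # 0.088
# The same outfielder missing a ball that 60% of his peers catch.
print(round(runs_for_play(0.60, False, outfield=True), 3))  # -0.528
```

Add those up over a season's worth of batted balls and you have something shaped like a UZR range number, before the arm and park adjustments.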

Lichtman goes into more detail, and I think a few of those details are important. First, he rightly points out that in his system “fielder positioning is inherent in the results.” He only tracks flyballs to outfielders and groundballs to infielders, so an outfielder who misplays ground balls isn’t penalized. He also adjusts the final numbers based on the outfielder’s throws. He adjusts for park factors. And finally, an outfielder is never penalized if a gap hit is caught by another outfielder, but he is if it falls for a hit.

That final point is an important one to me, because all this research was driven by a simple question – why is Denard Span listed as such a poor defender by UZR? That’s a topic we didn’t tackle, or even raise, in the Handbook. I hope to jump into some possible explanations tomorrow.

10 comments:

David said...

Awesome post. That's got to be the most concise yet thorough explanation of UZR I've ever read. How does it factor for outfield/infield shifts? Or does it?

Parker said...

OMG. I'm so excited you wrote about this I peed myself a little.

For those that have purchased the TwinsCentric Offseason GM Handbook, you'll notice that we referenced two different types of fielding data: the UZR and Dewan's Plus/Minus system. In July, Dewan answered a bunch of questions at length regarding how the two systems differ fundamentally. (Which you can find here: http://www.insidethebook.com/ee/index.php/site/article/john_dewan_and_research_assistant_speak/)

Both are based on the same BIS data, but they apply different calculations on whose responsibility the play was (e.g. a grounder between second and first), and Plus/Minus factors in some shifting (first baseman holding runners, hit-and-runs, etc). The most notable difference actually answers the Span question you posed. In UZR’s system, they blanket all fields the same, regardless of wall dimensions. Therefore, anyone playing left field in Fenway was penalized unfairly for shots off the Monster, as they were inventoried as potential plays that the left fielder “should” have made if he were in other ballparks. Dewan’s system accounts for that factor and adjusts accordingly, referring to it as the “Manny Adjustment” after Manny Ramirez, who was the most notable sufferer of the wall.

Circling back to Denard Span’s case, Span somehow posted a negative UZR (-9.5 UZR/150) in RF but was outstanding in the more spacious LF (13.2 UZR/150). How can this be? Is he like Derek Zoolander, who was not an ambi-turner? The Baggy at the Dome is another example of an element that affects the UZR numbers. Whereas fangraphs.com has him listed as a -9.5 RF, the Plus/Minus system acknowledges that balls hit halfway up the Baggy are unplayable, therefore not demeriting him for those plays. According to billjamesonline.net’s warehousing of P/M, Span was a +3 RF, saving two runs.

Same thing applies for Cuddyer. Where Cuddyer’s UZR numbers look downright vomit-inducing, he’s actually better in the P/M (still bad because of the range).

Parker said...

Shart. I probably just stepped on your next blog posting. Should have read the last paragraph more carefully...

Jake Lunemann said...

The types of hits are fly, fliner fly, fliner liner, and liner. To go along with these, plus ground balls, they have the soft, medium, or hard variations. Starting this year, Baseball Info Solutions (Dewan's company) has also started using a stopwatch to measure how long the ball was in flight, and they have started to figure in infield shifts. Outfield shifts are tough to do unless you have people at every single game, since TV does not show where the outfielders are positioned. It is not a perfect system yet but they are headed in the right direction.

TT said...

What are you rating? You are assuming that there is something universal you are measuring, but I am not sure what it is, much less able to evaluate whether this system does it accurately.

Parker points out that the Green Monster and the Baggie affect these numbers. But far from "penalizing" anyone, those park effects make the range to get to those balls less valuable. It's not a penalty, there is just less ground to cover and so being able to cover more ground is less valuable.

One of the effects of the Dome's turf was that a ground ball could go for extra bases if it got by an outfielder. That meant that infielders positioned themselves to knock down balls even where they might not be able to get the out. It made a strong infield arm, for longer throws, more valuable.

In the outfield, it meant players positioned themselves to prevent singles becoming doubles or triples. That sometimes meant letting balls fall for singles that in another park would have been outs.

The point is not that the numbers vary, but that it is not clear what you are measuring. The goal seems to be to measure some idealized version of player skill, rather than their actual performance.

It may be useful as evidence for a pre-conceived conclusion. But I don't think anyone should take it as a serious measure to compare players. Even at the extremes the difference between players might mean something, or it might just be there is some factor not considered which has a large impact.

Jason said...

It appears to simply be a measurement of probability. There is an X amount of chance that a ball will be hit to a certain spot, and on average X percentage of fielders will make or not make the play. There are some adjustments for other factors but its basic method is pretty simple. But its strength is in the large amount of numbers which can go into the calculation. Large number systems are predictable.

jim h said...

When I first started reading baseball/Twins blogs most saber/bloggers ignored defense when ranking, discussing or evaluating players because there was no saber/way of doing so. I often felt they were ignoring an important part of a player's value.

Now, I think people are using an evaluation tool that they do not really understand or if they understand it they are putting way too much weight in its numbers.

No two players will ever position themselves the same. Just to use one example a slower but strong armed shortstop will position himself differently than a quicker but weaker armed player-against the same hitter with the same pitcher on the mound. They will likely both make all the routine plays but will make different difficult plays. Trying to compare the two can be difficult for a trained scout. Using some sort of grid and then adding in various judgements as to how hard a ball is hit is unlikely to lead to any conclusions that are very reliable. I don't see how anyone can consider this as anything but a rather crude and rough comparison.

Trying to turn this into runs saved or runs lost makes it even worse. Baseball just doesn't work that way. One time a misplay might result in a huge inning or a game ending run, the next similar misplay will basically have no easily discernible effect.

Part of the problem here is that these systems tend to focus on range as the major/only factor in defense. Clearly arm strength/accuracy, positioning (which can be only vaguely accounted for), judgement, and any number of playing condition factors and weather factors can only be accounted for by an observer, which adds all sorts of biases.

I think possibly, a fielding metric could be useful if there were a large enough sample. I doubt if one year is nearly large enough, especially for part time players or players who play more than one position. The reason I think this is that most plays are basically major league routine. Most players should be able to catch most outs-most hits are pretty much uncatchable. The few plays in between are not enough plays to fairly evaluate a player.

BeefMaster said...

No two players will ever position themselves the same. Just to use one example a slower but strong armed shortstop will position himself differently than a quicker but weaker armed player-against the same hitter with the same pitcher on the mound. They will likely both make all the routine plays but will make different difficult plays. Trying to compare the two can be difficult for a trained scout.

I recall this being discussed a couple years ago when highlighting the differences between Derek Jeter and Adam Everett, who were basically at opposite ends of the advanced defensive metrics. Some analysis of their play showed that Everett generally positioned himself deeper than Jeter, and as a result, Everett's "great plays" tended to be plays in which he ranged deep into the hole, where Jeter's tended to be slow rollers (the kind that often result in a barehand pickup). Everett's defensive numbers were so much better largely because his case was more common.

Trying to turn this into runs saved or runs lost makes it even worse. Baseball just doesn't work that way. One time a misplay might result in a huge inning or a game ending run, the next similar misplay will basically have no easily discernible effect.

I'd argue that that's not the point. While I agree that a guy making a great play with the bases loaded directly saves more runs than the same play with no one on, making it more valuable, the play with no one on displays the same amount of fielding skill. Unless you believe that there is a reason why the player would not have made the play with the bases loaded (some sort of clutch fielding ability), adding the context makes the stat less useful for evaluating individual performance, in the same way that RBI are not useful for evaluating individual batters.

Part of the problem here is that these systems tend to focus on range as the major/only factor in defense. Clearly arm strength/accuracy, positioning (which can be only vaguely accounted for), judgement, and any number of playing condition factors and weather factors can only be accounted for by an observer, which adds all sorts of biases.

Positioning, certainly, is a weakness of the current systems, and I'd add that it is compounded by the fact that it's virtually impossible for an observer to determine whether that positioning was dictated by the player himself or his manager/coaches (a guy whose manager tells him to play in shouldn't have a bloop over his head counted against him). I think weather and field conditions are others - I think it would be interesting to see home/away UZR splits for players on turf, for example, or who routinely play in poor weather, and I'm not sure where to find that information.

The others, though, I'm not so sure. Arm quality is already accounted for at least somewhat in UZR, as it tracks the number of those plays resulting in outs - if Everett didn't have the arm to get the ball to first after those plays in the hole, they wouldn't end up as outs and count positively toward his UZR. I think it's much squishier for outfielders, as I don't know how much UZR attempts to measure outfield assists or the deterrent effect of a good outfield arm. I'd generally argue that range is more important than arm for outfielders, as there are likely fewer opportunities for outfield assists than for cutting off balls in the hole or catching flies, but that's not something I'm certain of. Judgment isn't something that can be all that directly quantified, but I don't know that it really comes into play that often - most plays are routine plays.

TT said...

"I'd generally argue that range is more important than arm for outfielders, as there are likely fewer opportunities for outfield assists than for cutting off balls in the hole"

I think that is right to some degree, but these systems don't measure how often a player cuts off a ball in the hole and prevents an extra base hit. They give credit for a diving catch that makes an out, but no penalty if that ball goes by him for a triple instead of a single.

jim h said...

A large part of the problem with UZR is that it turns its rating into runs- plus or minus. That can't possibly be reflective of reality. I think I agree with Beefmaster in that a great play stands on its own, but this system turns everything into plus or minus runs. It focuses on outs but tends to ignore base advancement, which also leads to runs.

TT is right, you get a bonus for making a great catch, but turning a single into a triple like Raburn did in the play-in game against the Twins wouldn't be penalized. There is no way to measure how many times Cuddyer's strong, accurate arm prevents baserunners from even trying to advance a base.

I agree that range is the biggest factor in a strong defender. I just don't see how you can ignore all the other factors that go into strong defense. Even when those factors are not ignored, you are now introducing observer biases, and judgements which are not necessarily that reliable.

When someone tells me that Player X is 10 runs a season more valuable to his team than Player Y based on UZR, I am going to laugh at that someone. UZR can't possibly determine that, accurately.