Sunday, February 10, 2008

Nathan More than his Saves

Let’s start with this concession: the Save is a bullcrap statistic.

I don’t really know how you can argue any differently. It is defined in three different ways, and really none of them have any objective basis. I’ll take it further. I believe it has slightly corrupted the game of baseball, resulting in a lower level of play and a truly inefficient use of resources.

But that does not mean that Joe Nathan hasn’t been one hell of a pitcher.

Because there is a non-bullcrap statistic that shows exactly how valuable he has been. It’s called Win Probability Added or WPA, and it should be a beautiful statistic, because it should bring the sabermetricians and traditionalists together. It’s truly a shame that it’s so rarely referenced in the media, but I suspect that’s partly because it isn’t particularly easy to explain, so let’s try to do so in fewer than 100 words.

“WPA measures how much a player does to help his team win games. It starts with a long list of probabilities pulled from 30 years of major league baseball. Every situation is listed along with the percentage of times that teams in that situation won. So, if a team goes into the bottom of the eighth down a run they have a 23% chance to win.

And from that starting point, each batter and pitcher can earn points by how much they help their team. So, if the leadoff batter gets on first base, his team now has a 31% chance to win, and the batter gets the 8-point difference, while the pitcher is penalized –8 points. If the next batter hits into a double play, then his team’s chances drop to 13%, and the second batter is penalized –18 points while the pitcher is rewarded another 18 points.”

Yikes, 149 words. Nobody is going to use that in a 750-word story. We’ll work on that. Feel free to take your stab at it in the comments section.
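For anyone who would rather see the bookkeeping than read it, here is a minimal Python sketch of that example. The win probabilities (23%, 31%, 13%) are the ones quoted above; the function name and the data layout are my own illustration and an assumption, not how any stats site actually computes WPA.

```python
def wpa_credits(start_wp, plays):
    """Split each change in the batting team's win probability
    between the batter and the pitcher.

    start_wp: batting team's win probability, in percentage points,
              before the first play.
    plays:    list of (batter, pitcher, win probability after the play).
    """
    credits = {}
    current = start_wp
    for batter, pitcher, after in plays:
        swing = after - current
        credits[batter] = credits.get(batter, 0) + swing    # batter gets the swing
        credits[pitcher] = credits.get(pitcher, 0) - swing  # pitcher gets the opposite
        current = after
    return credits


# Bottom of the eighth, batting team down a run: 23% -> 31% -> 13%
example = wpa_credits(23, [
    ("leadoff batter", "pitcher", 31),  # reaches first base
    ("second batter", "pitcher", 13),   # hits into a double play
])
print(example)
```

Over the two plays the pitcher ends up 10 points ahead: minus 8 for the baserunner, plus 18 for the double play.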

Anyway, over the last four years, the leader in these points on the Twins hasn’t been Johan Santana. And it hasn’t been Joe Mauer, and it hasn’t been Justin Morneau. In fact, it wasn’t Morneau even in the year where he was voted MVP. The leader over those four years, and individually in three of those four years, was Joe Nathan.

How can a guy who has only pitched about 70 innings per year be the most valuable guy on the team? Well, as you get nearer the end of a close game, every at-bat and every pitch count a lot more in WPA, because they count a lot more in a game. For instance, every time Nathan takes the mound at the bottom of the ninth with a one run lead and nails down the save, he gets 13 points.

Does that seem like a lot? Well, let’s not forget that these points haven’t just been assigned by some cognitive statistical society. They reflect the actual number of percentage points that players have helped their team over 30 years of major league history. He gets those 13 points because 13% of the time, a pitcher has failed to do his job in that situation.

Also, it works the other way, too. If Nathan gives up two runs in that half inning, he loses 87 points. That’s the way it works for a closer. It doesn’t take too many blown saves over a season to wipe out all the “saves” that a closer might accumulate.
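A back-of-the-envelope sketch of that trade-off, using only the 13 and 87 quoted above. It assumes every chance is the same one-run lead in the ninth and that every blown save costs the full 87 points, which real seasons won't match; the two season lines are hypothetical closers, not anyone's actual record.

```python
SAVE_POINTS = 13          # quoted above: nailing down a one-run lead in the ninth
BLOWN_SAVE_POINTS = -87   # quoted above: giving up two runs in that half inning


def season_points(saves, blown_saves):
    """Net points for a closer under the simplifying assumptions above."""
    return saves * SAVE_POINTS + blown_saves * BLOWN_SAVE_POINTS


# How many clean saves it takes to pay back one blown save.
saves_to_offset_one_blown = -BLOWN_SAVE_POINTS / SAVE_POINTS
print(round(saves_to_offset_one_blown, 1))  # about 6.7

print(season_points(36, 4))  # hypothetical closer, 36 for 40
print(season_points(34, 6))  # hypothetical closer, 34 for 40
```

Under those assumptions, two extra blown saves in 40 chances is the difference between a clearly positive season and a negative one.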

So Nathan’s ranking at the top of that list isn’t some statistical shell game. He really has been that valuable for the Twins. His impact on games is more than the number of saves he has, or his All-Star appearances, or his Cy Young votes (he’s finished in the top 5 twice). It’s even more than the intangible comfort that he gives the coaches and teammates. His impact can be measured objectively, and it’s been huge.

The ‘Save’ has had several negative impacts on baseball, but trashing it won’t get rid of it. The task at hand is far harder. It needs to be ignored, and replaced with a more concrete evaluation tool. We shouldn’t let a justifiable disdain of it cloud our recognition of Nathan’s performance.

38 comments:

Jon Marthaler said...

Question about WPA.

Consider the following scenario: the Twins, playing at home, begin the 8th inning with a 1-0 lead. Pat Neshek is on to pitch, and he retires the side in order. In the ninth, with the Twins still leading 1-0, Nathan comes on and pulls off a textbook save: three up, three down.

In WPA, Nathan's inning is more valuable. My question: does this make intuitive sense? (I'm asking because I can't decide.)

Nate said...

I'm with Jon. There is no way that the 9th inning is more valuable than the 1st inning or any other inning. That is my problem with WPA. Santana is far more valuable than Nathan and any team in the league would take Johan over Joe, even with the large difference in their contracts.

John said...

But it is ABSOLUTELY more valuable, and it is more valuable for a very specific reason:

you have a better chance of bouncing back from losing the lead in the eighth than you do in the ninth.

In the same way that it's more valuable to get a hit with the bases loaded than with the bases empty. Or to hit a game-winning home run against a rival when you're tied in the standings with them, than if you're 10 games back.
Or, for that matter, being on your 'A' game with that gorgeous co-worker you've admired from afar versus the wife of a friend.

The performance is the same, but the value of that performance is dramatically different. Timing is important, and the time of the ninth inning is more important than the time in the eighth inning. And the same performance there is more valuable.

TT said...

My question: does this make intuitive sense?

Yes. Although I think WPA is just another crappy statistic whose flaws are hidden in layers of complexity.

It's possible after Neshek's one inning that the Twins will score more runs in the top of the ninth, the game will no longer be close and his performance largely irrelevant. It's possible for Neshek to give up the lead, and the Twins can still win the game when they come to bat.

When Nathan is on the mound in the bottom of the ninth, the game is over if he gives the other team the lead. And if he doesn't the Twins win the game.

So, while the performance of the two pitchers may be the same, the importance of their performance to the team isn't. That's one of the reasons ninth inning closers have become popular. The other reason is managers discovered that a good pitcher going full blast is often going to be dominating over one inning if they aren't overused.

Drake33 said...

It's possible after Neshek's one inning that the Twins will score more runs in the top of the ninth, the game will no longer be close and his performance largely irrelevant.

No. Neshek's inning pitched is still HUGE in terms of WPA. He still gets the big WPA bump in pitching the eighth inning of a 1-0 game.

When Guerrier or Crain comes in to "mop up" a 6-0 game, his performance is not going to get any significant WPA adjustment.

It's kind of like hitting a grand slam home run when you are up 7-0 in the eighth inning is great for your stats and RBIs, but really, did it help your team win? Yeah, but not by a whole hell of a lot.

TT said...

Neshek's inning pitched is still HUGE in terms of WPA

But in real terms, it is huge only because Nathan isn't very likely to give up the lead. If it were August 2001 with LaTroy Hawkins closing games, getting a one run lead to him would not mean that much.

I'm not familiar with the details of WPA, why bother? But it seems to give Neshek points whether they actually result in the team winning or not.

John said...

tt,

Actually, Neshek would get credit. He would get the 11 probability points he earned for getting it to the 9th inning. If Nathan blew the save, he would then lose the 86 points.

Think of the WPA points as a thermometer, starting at 50%. Pushing it towards the win gains positive % points. Pushing it towards the loss gains negative percentage points. At the end of the game, the team cumulatively has +50 points or -50 points.

(In actuality, that's not quite right, but you get the idea.)
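Here is the thermometer idea as a small Python sketch. The readings are made up for illustration (they are not from any real game), and the function name is just a label I picked. The point is only that the play-by-play swings always add up to the final reading minus the starting 50.

```python
def team_total(win_probs):
    """Sum the play-by-play swings in a team's win probability,
    in percentage points."""
    swings = [after - before for before, after in zip(win_probs, win_probs[1:])]
    return sum(swings)


# Hypothetical thermometer readings over one game, starting at 50.
won_game = [50, 58, 44, 61, 76, 87, 100]   # ends in a win
lost_game = [50, 58, 44, 61, 76, 87, 0]    # same game, but the lead is blown at the end

print(team_total(won_game))   # 50
print(team_total(lost_game))  # -50
```

In the losing version, everyone who pushed the reading up to 87 keeps their points; the last swing, 87 down to 0, lands on whoever was on the mound for it.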

And don't be so crabby. Behind that crusty exterior, I suspect you like learning new stuff as much as the next guy. I would think you would love this stat. It is a much different statistic than OPS or EQa or whatever else.

Dan said...

Do you have Nathan's WPA compared to other closers in the league? Also, where did you get the stats? Can you post the WPA for all the Twins players for last season?

Anonymous said...

I'm curious how pitchers' WPA's compare to those of hitters. A pitcher gets credit for outs, but he's got 8 guys behind him to do a lot of the work. A hitter succeeds or fails on his own (I'm ignoring bloops, seeing-eye grounders and errors misscored as hits).

So the fact that Nathan has the highest WPA on the team - is that score biased because he's a pitcher or is WPA normalized for position?

David Wintheiser said...

I'm not really a fan of WPA, not for the reason already stated, but because of another one: it's very arbitrary as to who gets 'credit' in a WPA situation. This is more true on defense than on offense, but some examples should suffice; all of these are from a hypothetical Twins/Royals game at the Metrodome:

(All win expectancies calculated from the Win Expectancy Finder at http://winexp.walkoffbalk.com/expectancy/search)

1) Top of seventh, score tied. Neshek starts the inning on the mound. Mark Teahen leads off and swats a hard liner toward the left-center gap which Kubel gets a bad break on. Official scorer rules the play a double.

Win expectancy for home team goes from .520 to .391, a loss of nearly 13 points. Does Neshek take the hit? Does Kubel? How do you divide them if you divide them?

Next up is Billy Butler, who lines the ball to Punto, subbing in at second base. Punto is able to reach the bag before Teahen can get back, resulting in an unassisted double-play.

Win expectancy for the home team has just gone from .391 to .590, a gain of about 20 points. Who gets the gain? Neshek? Punto? Again, if you split them, how do you split them?

Note as well the inherent lack of fairness in the point assignation; no matter how you try to divvy out the points here, somebody is likely to get screwed (or unduly benefitted); if Neshek takes the full bore for both, he's gone up 7 points for the game, but neither Kubel's bad jump nor Punto's great play on the double play are recognized. The more you try to recognize Punto's play, though, the more of a hit Neshek takes for what, in effect, was a poor play by Kubel. And if you try to assign some blame to Kubel, you have to realize that you're effectively giving Neshek more of the benefit for Punto's play for each point you don't assign him for Kubel's.

And once you figure out how all of that works, go back and check the play-by-play of the actual game and discover that Neshek actually retired both batters on routine plays; a fly to Kubel and grounder to Punto. So do you only try to assign credit to fielders when the play is notable? Or do you still divvy up the points using the same method of division you figured out for the original example?

2) Bottom of the seventh, score still tied. Alexi Casilla leads off the inning with a base on balls. Easy call there: win expectancy goes from .609 to .669, meaning six points for Casilla.

Next up is Tyner, the leadoff DH, who smacks a single to right that reaches the wall. Tyner digs for second, Teahen comes up throwing, and Tyner is nailed trying to stretch the single to a double -- but while this play is going on Casilla rounds third and scores.

Win expectancy changes from .669 to .797, a gain of nearly 13 points. Who gets it? Tyner, as the hitter? Casilla, as the runner? The third-base coach, for sending Casilla on the throw to second? Oh, and what do you do about Tyner being thrown out at second? Note that if Tyner holds at first, and thus Casilla holds at third, the win expectancy is actually .842 (runners at the corners, no out), which means Tyner's advance, though it helped score a run, actually hurt the Twins' chances of winning the game. Do you penalize Tyner for that? Or do you chalk it up as a hypothetical (just as in the previous example where the plays that actually occurred were simpler than the plays we hypothesized) and call it a wash?
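For reference, the arithmetic behind both examples can be checked with a few lines of Python. The win expectancies are the ones quoted above; the variable and function names are only illustrative, and nothing here says who should get the credit, which is the open question.

```python
# Home-team win expectancies quoted in the two examples above.
top_seventh = [0.520, 0.391, 0.590]     # start, after the double, after the double play
bottom_seventh = [0.609, 0.669, 0.797]  # start, after the walk, after the single and the out at second


def swings_in_points(expectancies):
    """Change between consecutive win expectancies, in percentage points."""
    return [round((after - before) * 100, 1)
            for before, after in zip(expectancies, expectancies[1:])]


print(swings_in_points(top_seventh))     # the double, then the double play
print(swings_in_points(bottom_seventh))  # the walk, then the single with the out at second

# Gap between what happened (.797) and runners holding at the corners (.842).
print(round((0.842 - 0.797) * 100, 1))
```

The two swings in the top of the seventh net out to 7 points, the play at second is worth 12.8 points, and the gap to the hold-at-first scenario is 4.5 points.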

Some of these examples may be uncommon (the unassisted DP, the stretch of the single in a tie game), but I can't say any of them are unlikely; it's not as though I'm suggesting a scenario from the Knotty Problems of Baseball involving the ball lodging in the pocket of a passing marsupial or something. But if any situation more complex than 'batter walks, batter strikes out, batter hits home run' causes you to question how to divvy the points in this system, and how you divvy the points in this system is everything in this system, then how can you be confident the system works? After all, what if every time Nathan 'blows' a save, the scorer assigns the negative WPA to a fielding play and lets Nathan keep his high score?

(This doesn't even begin to get into the problem that, by its very nature, WPA doesn't carry across different eras well; run the same exercise with the game taking place in 1984 and you'll find that the WPA swings are even larger than they are today (because reduced scoring in the 1960s increased the value of runs and run-producing events, especially in close games, during that era). But this is already starting to get long-winded for a comment on someone else's blog...)

Jack Ungerleider said...

John,

It looks like you're saying we shouldn't care about the saves, it's the blown saves that are important. One can extrapolate that for starters as well. It's not the number of wins but the number of losses that should be looked at when evaluating the value of a starter.

Which of course leads to the well worn conclusion, "pitchers can't win games by themselves, but they can lose them."

John said...

Anon,

Your wish is my command. You can start getting all the info you need here:

http://www.fangraphs.com/winss.aspx?team=Twins&season=2007

John said...

David W

You're absolutely right that WPA simplifies the 'stuff on the field'. Generally, I think the pitcher gets credit if there's an out, and loses credit if there's a runner or a run, and it works the opposite way for the batter. The fielder or the baserunner don't get their proper credit. So it's not perfect.

Of course, there are precious few stats that give that kind of credit to fielder or baserunners, and fewer still (none?) that do so accurately.

So take it as a precise measurement of value (not performance) of the big stuff - hitting and pitching. I think it still has plenty of value that way. And I also think that given Nathan's propensity for generating easy outs (Ks in particular) that he might be deemed even more valuable if those were taken into account.

David Wintheiser said...

Of course, there are precious few stats that give that kind of credit to fielder or baserunners, and fewer still (none?) that do so accurately.

Very true. However, the existing stats do have one advantage WPA doesn't; an advantage you even made a humorous point about in your essay -- they're simpler.

Yes, we do have an offensive stat that recognizes Tyner's hypothetical single-thrown-out-trying-to-advance without directly recognizing Casilla's baserunning: it's called RBI. We have a defensive stat that recognizes Neshek's pitching without directly recognizing Punto's defense: it's called ERA. Why bother with WPA when it's significantly more complicated than RBI and ERA, and gives us only marginally better information than the other stats do?

I wouldn't have a problem with a limited form of WPA replacing saves -- call it a Leverage Index, as some others already do. The problem I have with WPA are the folks who think WPA is mature enough to join the ranks of the global measuring tools like Win Shares and VORP; though WPA can provide us with some interesting insights on occasion (such as the realization that Tyner's hypothetical play above was actually a bad play), it's not even in the same time zone with these other measures (and I say that as someone who's down on VORP nearly as much as I'm down on WPA).

John said...

The problem I have with WPA are the folks who think WPA is mature enough to join the ranks of the global measuring tools like Win Shares and VORP.

Well, I'm biased a bit on this. Those 'refinements' might be valuable for comparing different eras, or playing between ballparks. But in terms of clarifying what my eyes are seeing, I have trouble viewing them as anything more than another observer.

I also think that's like saying that oranges can't join the ranks of apples. They're totally different kinds of stats that measure different things. If one is talking about pure performance, go ahead and use VORP. But if one is talking about value, use WPA.

People get frustrated by WPA because of perceived limitations, but I think that most of that is based on the roto-ization of thought processes surrounding baseball. You can't use WPA for something like predicting players' performances in upcoming years, because it's so contextual.

What WPA does, within the limitations of not judging fielding and baserunning, is point out whether that performance translated into results for the team.

Nick N. said...

John,

Great article, I'm glad to see someone making an effort to emphasize Nathan's value. In the world of sabermetricians and elitist baseball fans, I feel like people are getting so worked up with expounding how overvalued closers have become that those people are starting to greatly undervalue a great closer.

I've seen so many people argue that the Twins should trade Nathan, reasoning that he could easily be replaced by Neshek. These people are missing on two key points: 1) as good as Neshek is, it would be very difficult for him to be as consistent and dominant as Nathan, who has been otherworldly since becoming the Twins' closer; 2) while Neshek could well do a satisfactory job as closer, losing Nathan would weaken the bullpen to the point where it's not even a strength for this team anymore. It's easy for people to say now that the Twins should dump Nathan because they don't have a legitimate shot at competing for a playoff spot without Santana, but once the season starts those people are going to be incredibly frustrated when late-game leads start slipping away because Gardenhire has to go to Carmen Cali or Julio DePaula in a crucial situation.

TT said...

I think the examples miss the essential problem. The actual chances a team will win or lose depend a lot on who the base runner is and who the next batter is in each situation.


The problem I have with WPA are the folks who think WPA is mature enough to join the ranks of the global measuring tools like Win Shares and VORP;

You mean it isn't yet complicated enough to completely obliterate any ability to evaluate it? All of these uber-stats are too crude to be useful.

I suspect you like learning new stuff as much as the next guy.

That is probably true. But sometimes you learn more by noticing what's missing. I think that is almost always true with baseball statistics. Their imperfections tell you a lot about the game.

AdamOnFirst said...

I'm going to have to argue with the idea that WPA is useful for objective assessment of performance. It is very interesting, but it is subject to the same level of team-dependency that flawed numbers like RBI and Runs are, perhaps even more so.

brianS said...

Frankly, I don't see what is too "complicated" about the concept of WPA. As John points out:

1) we have some 30 years of play-by-play data that allows us to know the proportion of games ending in wins for the target team (offense or defense), given a particular situation (e.g., top of 5th inning, runners on 1st and 3rd, two outs, visitors leading 1-0).

2) so by comparing two situations (e.g., the afore-mentioned to bottom of 5th, leadoff batter up, visitors leading 1-0), we get a change in win proportions from the intervening event(s).

with the availability of contingency tables, WPA is trivial to measure for any event in a game. so "complexity" really isn't a valid criticism. No magic wands are being waved to calculate WPA. It's just elementary-school arithmetic.
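To show how little machinery that takes, here is a minimal Python sketch of the two steps above: build the table of win proportions from game records, then take the difference between two situations. The records are made up purely to exercise the code, and the state tuple, function names and numbers are all assumptions for illustration, not the layout of any real play-by-play data set.

```python
from collections import defaultdict


def build_win_table(records):
    """Map each game state to the proportion of games the target team
    went on to win.

    records: iterable of (state, won) pairs, where state is any hashable
    description such as (inning, half, runners, outs, score_diff).
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [wins, games]
    for state, won in records:
        counts[state][0] += int(won)
        counts[state][1] += 1
    return {state: wins / games for state, (wins, games) in counts.items()}


def wpa(table, before, after):
    """WPA for the intervening event(s): a difference between two win proportions."""
    return table[after] - table[before]


# Made-up records for the visiting team, leading 1-0 in the fifth.
before = (5, "top", "1st and 3rd", 2, +1)
after = (5, "bottom", "empty", 0, +1)
records = ([(before, True)] * 7 + [(before, False)] * 3
           + [(after, True)] * 6 + [(after, False)] * 4)

table = build_win_table(records)
print(round(wpa(table, before, after), 2))
```

With these invented counts the visitors' win proportion slips from .70 to .60 when the rally dies, so the intervening out is worth -0.10. Who that -0.10 gets charged to is a separate question the table does not answer.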

again, as John points out, the controversial part seems to lie in attributing WPA to the players involved in the intervening event. This is a fair question. I like the solomonic approach that Will Young has tended to take on the issue, apportioning credit/blame according to his reading of the context. Unfortunately, this is a highly labor-intensive exercise and would require a LOT of work to standardize and apply broadly.

but my real comment question is this: are there any non-crappy statistics out there in tt's eyes? What could be more basic to baseball than asking "how will the odds of winning this game change if I do "X", given where we are RIGHT NOW?" Isn't that question fundamental (or at least, should be fundamental) to every strategic decision made by players and managers during a game?

TT said...

how will the odds of winning this game change if I do "X"

But that isn't the question being answered is it?

You are answering how often, past tense, the average team won in the average situation meeting some selected criteria - usually the score, the inning, the number of baserunners and number of outs - compared to how often the average team won in some other past average situation.

And you don't even know whether the teams, much less the players, that make up those average situations are average. It's likely they aren't. Some teams and players are probably in certain situations far more often than others.

Just as an example, let's assume Neshek holds that one run lead against the heart of the other team's order. And Nathan gets the last three outs against the seventh, eighth and ninth hitters. Does that change the value of their respective contributions? Of course it does.

The idea that once you determine the inning, the score, the number of outs and where there are runners on base you have accounted for all the important factors in how likely every team is to win is frankly ludicrous.


are there any non-crappy statistics out there in tt's eyes?

Of course there are. Real statistics that measure something that actually happened and the results. Not the results it might have gotten in the alternative universe of somebody's untested statistical model. Are they flawed? Sure, but you can subjectively adjust for whatever flaws they have because they measure results from real situations in real games, not imaginary ones.

Drake33 said...

Real statistics that measure something that actually happened and the results

WPA is not a predictive stat. It measures what happened and assigns a numeric value to it.

Some people like baseball broken down into numbers. Some people would prefer the written description of the play (Punto, unassisted DP) without quantifying it.

But WPA takes the context of the game into account. More than any other counting or ratio stat attempts to do. Is it flawed? Sure. At some basic level all stats are flawed. No one stat is ever going to completely encompass the game.

My guess is that the hard part with WPA is that each at bat is a major event in the computation of it, and it's not a very "intuitional" stat.

John needs to figure out how to describe it in 20 words or less. ;)

Anonymous said...

Most of the comments are to the effect that a lot more goes on in a game than WPA can capture. Well, of course. Even in a system that is completely describable mathematically (which baseball is NOT) any summary statistic involves loss of information. That's the nature of it.

But whether you use WPA or saves or any other measure, you can't escape the conclusion that John started out with: Nathan is a hell of a pitcher.

Get him extended, Bill Smith.

montanatwinsfan said...

Continuing with our "Judging Nathan by WPA" theme, I have a question.

What about the recent struggles we saw from Joe in 2007? In 2004,5 & 6 we could hand the ball off to Joe, turn the ninth inning off and spend a little time with our own little "voices of reason" knowing Joe had saved the game for us.

In 2007 we got similar results, but the innings were much wilder, and frankly only slightly less terrifying than:

a) a bungee jump;
b) having to pick the kids up at the widowed mother-in-law's house when you know she's feeling frisky; or
c) handing the ball off, in the ninth, to Eddie Guardado.

WPA would rank those years the same, but anyone with a brain saw Joe laboring to get those outs in 2007, and it certainly appeared to be an uphill struggle far too often.

How would you explain that anomaly in WPA?

brianS said...

TT sez: You are answering how often, past tense, the average team won in the average situation meeting some selected criteria ... compared to how often the average team won in some other past average situation.

Well, uh, yea. Sort of. Which is exactly what "the book" does for baseball managers, but without any serious effort to quantify experience. WPA contains a much more serious effort to quantify past experience in order to develop analytical context for viewing plays or, potentially, making strategic decisions.

Statistical models are all about reasoning from probability distributions to the real world. Typically, we work from maintained hypotheses about the data generating processes.

You seem to want to argue that almost every statistic employed by baseball technophiles is fundamentally flawed because the models aren't complex enough. In other words, you have your own (untested) theories about the appropriate statistical models in which we don't gain leverage on the events of interest by comparing them to what appear to most analysts to be very similar events that have been played out zillions of times.

At the same time, you seem to want to complain that the more complex measures are junk stats because they are "too complex".

So, which is it? Too complex or too simple?

Real statistics that measure something that actually happened and the results. Not the results it might have gotten in the alternative universe of somebody's untested statistical model.

WPA measures "something that actually happened and the results." It is completely, totally and utterly based on what has happened and the results.

What you do with statistics of any sort is draw inferences. The intended inference of a change in WPA is that the observed event(s) increased, decreased or did not change the focal team's chances of winning the game at hand, and by how much.

One man's junk is another man's fun, I guess.

TT said...

You seem to want to argue that almost every statistic employed by baseball technophiles is fundamentally flawed because the models aren't complex enough.

Not really. I don't dress to go out based on the average temperature when it's 25 below outside. Does that mean the weather forecasting models aren't complex enough? No. Because no one would seriously use the average temperature to predict the weather today.

WPA is not a predictive stat. It measures what happened and assigns a numeric value to it.

It assigns a number based entirely on a theoretical model. A model which relies on the notion that every day is average, every team is average and every player is average with the exception of the player being evaluated.

That is a lot like using the average temperature to determine today's temperature. The difference is that anyone with a thermometer could quickly prove that theoretical model hopelessly flawed for the purpose.

The intended inference of a change in WPA is that the observed event(s) increased, decreased or did not change the focal team's chances of winning the game at hand, and by how much.

An inference which no one who follows baseball would believe and, unlike today's temperature, can't be tested.

Just to put this in perspective for the discussion here. The more likely it is that Joe Nathan will hold a lead in the ninth, the more likely it is that Neshek holding the lead in the 8th will lead to a victory. Both fans and players understand that when watching the game.

The problem isn't the use of statistics, or even statistical models, it's their misuse. The average temperature in Minnesota may be in the 50's, but it's 20 below out today and no model will convince me a light jacket is sufficient.

Brian said...

1) When you look at individual scenarios for a particular game (e.g. someone makes a great catch) it is easy to find holes with the WPA, that is why you have to look at it over a period of time (e.g. full season) during which you expect the law of averages to even out to a certain extent...

2) It is important to use WPA when comparing to other players of the same position. One difficulty of comparing Nathan to Neshek is that the closer is typically put into more situations (9th inning) where they are in position to gain/lose a larger # of WPA points. Therefore the potential WPA ceiling for successful closers is higher than for other successful relievers

brianS said...

It assigns a number based entirely on a theoretical model. A model which relies on the notion that every day is average, every team is average and every player is average with the exception of the player being evaluated.

That is a lot like using the average temperature to determine today's temperature. The difference is that anyone with a thermometer could quickly prove that theoretical model hopelessly flawed for the purpose.


No. It is a lot like using a thermometer reading to determine whether you should put on a coat. Bazillions of thermometer measurements over centuries have led us to believe that when the thermometer reads "10 deg. F", it is cold outside and we shouldn't wear a wife-beater and gym shorts to go for a long walk.

The thermometer is a measurement device based on a theory. What we DO with the measurements is draw inferences about how we should best behave.

obviously, temperature is not the only parameter of interest for our comfort. Sometimes 85 deg. F is comfortable, sometimes not so much.

"The book" tells us when a hit-and-run, sacrifice or a steal is warranted, without respect to the individual baserunners, batters and pitchers involved. WPA tells us with MUCH greater (real) precision, because it also allows for an expression of the risks involved.

The decision-maker still has to actually, you know, think, when deciding whether or not to give Matty LeCroy the steal sign or asking Nicky Punto to lay down a bunt. Thinking is not WPA's job.

TT said...

you have to look at it over a period of time (e.g. full season) during which you expect the law of averages to even out to a certain extent...

When you look at the actual data, the "law of averages" doesn't "even out" over a season or even several seasons.

TT said...

The thermometer is a measurement device based on a theory.

You are mistaking the measurement for the thing itself. The temperature outside is a physical reality - the number we assign to it is just a way of representing it.

"The book" tells us when a hit-and-run, sacrifice or a steal is warranted, without respect to the individual baserunners, batters and pitchers involved.

What "book" is that?

WPA tells us with MUCH greater (real) precision, because it also allows for an expression of the risks involved.

No, it obviously doesn't do that. It looks at the average results from the decisions that were actually made. It tells you nothing at all about how different decisions would have affected the average outcomes. And it tells you almost nothing at all about what outcome to expect in a specific situation.

Anonymous said...

TT-
your weather example has a major flaw- to be consistent with WPA your weather metric would need to calculate the average temperature for the specific date, conditions and location, not just an average temp of all days in MN.

TT said...

your weather example has a major flaw- to be consistent with WPA your weather metric would need to calculate the average temperature for the specific date, conditions and location, not just an average temp of all days in MN.

This is way off the original topic. Arguing about how good the analogy is is irrelevant. The analogy isn't perfect, but WPA doesn't consider the specifics of anything. It looks solely at the differences in two averages.

The fact remains that how likely a team is to win when there is a pitching change is based on a lot more than the score, the inning and the number of base runners.

brianS said...

TT said: What "book" is that?

Now you are being deliberately obtuse.

I said: "WPA tells us with MUCH greater (real) precision, because it also allows for an expression of the risks involved."

TT said: No, it obviously doesn't do that. It looks at the average results from the decisions that were actually made. It tells you nothing at all about how different decisions would have affected the average outcomes. And it tells you almost nothing at all about what outcome to expect in a specific situation.

This is patently false. WPA compares the observed win percentages for all games in the data set with the same starting and ending points as defined by inning, runners, outs and score. There is nothing "average" about it. It is simply a difference between two proportions.
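
As a rough sketch of that "difference between two proportions", here it is in Python. The game counts are made up for illustration (chosen only so the proportions match the 23%, 31% and 13% figures from the post), and `HISTORY`, `win_proportion` and `wpa` are hypothetical names, not anyone's official implementation:

```python
# Hypothetical historical record: state -> (wins by batting team, games that reached the state).
# A state here is (inning, half, outs, runners, batting team's run differential).
# All counts are invented for illustration only.
HISTORY = {
    (8, "bottom", 0, "empty", -1): (230, 1000),
    (8, "bottom", 0, "first", -1): (310, 1000),
    (8, "bottom", 2, "empty", -1): (130, 1000),
}

def win_proportion(state):
    """Observed share of past games won from this state."""
    wins, games = HISTORY[state]
    return wins / games

def wpa(before, after):
    """Win probability added by one play: a difference of two observed proportions."""
    return win_proportion(after) - win_proportion(before)

leadoff = (8, "bottom", 0, "empty", -1)
runner_on = (8, "bottom", 0, "first", -1)
after_dp = (8, "bottom", 2, "empty", -1)

single = wpa(leadoff, runner_on)         # credited to the first batter
double_play = wpa(runner_on, after_dp)   # charged to the second batter
pitcher_total = -(single + double_play)  # the pitcher gets the opposite sign of each swing
```

With these assumed counts the first batter gains about 8 points, the second loses about 18, and the pitcher nets about 10. Nothing in the sketch knows who is batting or pitching; it only looks up what happened in past games that reached the same state.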

If you choose to ignore thousands of games' worth of experience when reading, watching or thinking about a specific baseball game, that's fine with me.

If you want to criticize WPA as a useless junk statistic, you have to provide some evidence beyond ad hominem attacks and dismissive analogies.

WPA has been used to evaluate marginal contributions (performance) by players. It's admittedly dicey to do that in any team sport. But your reaction to this (and every. other. statistic. ever. mentioned) mystifies me.

TT said...

There is nothing "average" about it. It is simply a difference between two proportions.

I think you need to think about that one, because your "proportions" are in fact averages. It is the difference between the average number of wins after one set of situations, defined using limited criteria, and the average number of wins after a second set of situations using those same criteria.

How likely the Twins were to win in August 2001 with LaTroy Hawkins on the mound has little to do with how likely they are to win with Joe Nathan on the mound. And neither one had anything to do with what happened, on average, with other teams.

Jeff_York said...

Wow...Excellent posts. I have learned a bunch tonight in one place from a bunch of posters (here and elsewhere) that I have been following for quite a while. Makes me think a bit and I do appreciate the civil debate as well. Great job guys. thanks!

brianS said...

I think you need to think about that one, because your "proportions" are in fact averages. It is the difference between the average number of wins after one set of situations, defined using limited criteria, and the average number of wins after a second set of situations using those same criteria.

Ok, fine. If you want, treat the past win/loss observations associated with a particular sample of games that reached a particular state as a set of trials. The observed proportion of wins for team X in situation x_1 can then be thought of as an estimate of the expected value of a binomial distribution associated with the population of x_1 context games. As the number of games in the sample goes to infinity, the binomial distribution converges on the normal distribution with the same expected value (and the EV of the binomial looks more and more like the "average" of the normal).

w00t.

The sample proportion underlying WPA is an exact statement about what has happened in past, similar (by inning, score, runners, outs) contexts. If you want to add further conditions (wind speed, temperature, hair color of the pitcher, whatever) to more exactly compare navel oranges to navel oranges or gala apples to gala apples, be my guest.
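
To put numbers on the binomial-to-normal point, here is a minimal Python sketch. It assumes made-up win/game counts and uses the textbook normal approximation to the binomial; `proportion_with_interval` is a hypothetical helper name:

```python
import math

def proportion_with_interval(wins, games, z=1.96):
    """Sample win proportion plus a normal-approximation interval around it.

    z=1.96 corresponds to roughly 95% coverage under the normal approximation.
    """
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)  # standard error of a binomial proportion
    return p, p - z * se, p + z * se

# Invented counts: the same 23% proportion observed in a small and a large sample.
small = proportion_with_interval(23, 100)
large = proportion_with_interval(2300, 10000)

print("small sample:", [round(x, 3) for x in small])
print("large sample:", [round(x, 3) for x in large])
```

The proportion itself is the same in both cases. What changes is how tightly it pins down the underlying win chance, and with 100 times as many games the interval is 10 times narrower.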

BeefMaster said...

How likely the Twins were to win in August 2001 with LaTroy Hawkins on the mound has little to do with how likely they are to win with Joe Nathan on the mound.

That's the whole point of the stat - Hawkins in the last half of 2001 would have a far lower WPA than Nathan in 2007, because his team didn't succeed when they would otherwise be expected to.

It looks at the average results from the decisions that were actually made. It tells you nothing at all about how different decisions would have affected the average outcomes. And it tells you almost nothing at all about what outcome to expect in a specific situation.

All statistics have this limitation - nothing will tell you exactly what to expect. The values of statistics, especially ones like WPA and VORP, are in roster construction and in making broad predictions (season-level, perhaps, in which averages are more useful as evaluators), not in predicting individual game outcomes. Even the worst team in baseball will win 60 or 70 games a year; obviously not everything comes out to the averages every time, and no one is suggesting otherwise.

TT said...

Hawkins in the last half of 2001 would have a far lower WPA than Nathan in 2007,

We don't need WPA to tell us that Hawkins blew more leads than Nathan.

All statistics have this limitation - nothing will tell you exactly what to expect.

Most statistics don't claim to predict the future, they record the past. The problem with WPA is that it isn't actually measuring anything. It is just assigning values to events based on a flawed theoretical model.


The values of statistics, especially ones like WPA and VORP, are in roster construction and in making broad predictions

So whenever anyone uses them, as they were used here, to evaluate an individual player's contribution they are misusing them? Why is it that every discussion of these uber-stats seems to end with a claim that they are really useful for some purpose other than the one they were just used for?