A Defense Of The Value Of Relievers
You can blame this extended column on John Dyer-Bennet, my Calc II professor back in 1985. He’s the guy that instilled in me a very high standard for what is “intuitively obvious.”
Yesterday, rumors heated up nationally about the Washington Nationals’ interest in swapping their closer Drew Storen for Twins center fielder Denard Span. The leading indicator of fan reaction, Twitter, nearly self-combusted. I’d estimate that 90-95% of the reactions ranged from “this is a terrible idea” to “the Twins need to get more than Storen.”
There is no doubt that some of that is a knee-jerk reaction to last year’s Matt Capps-Wilson Ramos trade. There are too many similarities to ignore: the Twins acquire the Nationals closer at midseason for a young, cost-controlled, up-the-middle defensive player. And as someone who ripped the hell out of that trade the day it was made, I can sympathize.
But Span is not Ramos, and Storen REALLY isn’t Capps. Comparing Storen to Capps because they’ve both been Nationals closers is akin to comparing gold to lead because they’re both metallic elements. Storen isn’t an average reliever who happened to be plugged into the closer role for the Nationals without throwing up all over himself for three months. He’s the real deal. I’ll let some other blogs (or the comments section) give the statistical breakdown, but if a deal goes down, rest assured that the Twins are getting a good ’un here. For lack of a better comp, think Joe Nathan, with a year less service time.
But the comparison that counts isn’t Storen-Capps, it’s Storen-Span. So let’s compare them.
If you look at the “other stuff” that we pay so much attention to – things like salary, age, service time, contract options, health – there is no doubt that Storen comes out ahead. He’s younger. He has less service time. His money isn’t guaranteed so there is less financial risk. He’s under team control longer. He’ll be cheaper for the next four years (and Span will be a FA by then). I suppose one could argue about the health risks inherent with a pitcher vs. a position player, but Span’s concussion history would seem to balance that out.
On all those fronts, Storen gets the checkmark. To me, that is intuitively obvious.
It also doesn’t appear that Span is any better at his role than Storen is at his. Without a lot of analysis, Span would appear to be better than about 2/3 of the center fielders in the majors. But there is no question that Storen is quite a bit better than 2/3 of the relief pitchers in the majors. That figure might be as high as 90%.
(Dyer-Bennet would hate that last paragraph. But this story is already gonna go extremely long. I’ll take the demerits and move on. If someone wants to challenge it or do the analysis, you can get the extra credit.)
Instead, what is intuitively obvious to everyone – save me, apparently – is that an everyday position player is much more valuable to a team than a relief pitcher. It is so obvious that several Twitter users were flummoxed that I would even ask why they believe that. I was accused of playing dumb or trolling.
Thankfully, some did reply, and I’d like to examine the arguments.
Everyday Players Play And Do A Lot More
We track a lot of statistics for baseball, and relievers usually have the fewest of those statistics. It’s reasonable to suggest this shows a higher level of value for position players.
But comparing the overall value of those stats becomes problematic. First, there is the problem that hitters and pitchers have different statistics: how does an RBI double compare to a scoreless eighth inning? Then there is the problem of context – how did those hits or outs impact the game?
Indeed, measuring the value of players in a single game is problematic, let alone for a full season. For instance, in last night’s 7-1 win, who was more valuable: Brian Duensing (6 2/3 IP, 1 run) or Joe Mauer (2-4, 3R, 2RBI)? You might have your opinion, I might have mine. There is no intuitively obvious answer. How would one measure such a thing?
One way would be to try to measure each player’s impact on a game. You know that Mauer’s single in the fourth inning helped the Twins and impacted the game. You know that Duensing’s scoreless sixth inning impacted the game. But you don’t know exactly how much each impacted the game.
But what if I told you that historically (counting thousands of MLB games), teams in the same position as the Twins when Mauer came to the plate had won 62% of their games, and that after that hit, teams in roughly that same position had won 73% of their games? It would be fair to give Mauer credit for that 11%, wouldn’t it?
And what if I said that when Duensing took the hill in the middle of the sixth to protect a 3-run lead, historically teams had won 83% of their games? But that teams who still had a 3-run lead at the end of the sixth had won 89% of their games? Wouldn’t it be reasonable to suggest that Duensing and the Twins defense should get credit for driving that game 6% closer to a win?
And if you’re trying to determine the impact of a player on a season, isn’t it reasonable to add up all those percentages – both positive and negative – and see how a player impacted his team?
This is the theory behind Win Probability Added (WPA).
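For the technically inclined, the bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any stats site actually computes WPA: the `Event` record and `wpa_by_player` helper are hypothetical names, and the only inputs are the two examples from last night’s game (62% to 73% around Mauer’s single, 83% to 89% around Duensing’s sixth).

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One plate appearance or half-inning (hypothetical record)."""
    player: str
    wp_before: float  # historical win probability before the event, 0 to 1
    wp_after: float   # historical win probability after the event, 0 to 1


def wpa_by_player(events):
    """Sum each player's win-probability changes; gains and losses both count."""
    totals = {}
    for e in events:
        totals[e.player] = totals.get(e.player, 0.0) + (e.wp_after - e.wp_before)
    return totals


events = [
    Event("Mauer", 0.62, 0.73),     # the fourth-inning single
    Event("Duensing", 0.83, 0.89),  # the scoreless sixth
]
totals = wpa_by_player(events)
for player, wpa in totals.items():
    print(f"{player}: {wpa:+.0%}")
```

Over a full season you would feed in every plate appearance or inning a player was involved in, and the negative swings count just as much as the positive ones.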
(And this is where I lose a big chunk of the sabermetric stats guys. While you might think that they would love this stat, my experience is that most of them dislike it. The most common criticism? They don’t like the results. It’s usually expressed by saying something like “But that says that Phil Dumatrait has been more valuable than Carl Pavano!”
And I gotta say, as someone who championed sabermetric stats closer to their infancy, that reaction makes me want to cry. Bill James talks about how he used to think that once he explained his discoveries to baseball teams, and proved his methods, they would accept them. Instead, they would say something like “That can’t be true – it shows that Darrin Erstad isn’t valuable! He’s a gamer!”
The parallels are obvious. It drives me crazy to think that the high priests who pride themselves on championing baseball research are those most passionate about discrediting stuff like this. I’m not kidding about the wanting to cry thing. I honestly feel a small buzzing below my ears when I hear people say crap like that. For those of you looking for a hot button, you found one.)
Anyway, there are flaws with WPA. One is that it gives credit to the pitcher for the defense behind him, which most traditional sabermetricians suggest accounts for about 1/3 of the value. Obviously, that also means fielders don’t get that credit, either. We’ll try to accommodate that a bit.
Another criticism is that even though WPA tries to value hits and scoreless innings in the context of the game, it doesn’t take that context far enough. The probabilities reflect an average opponent, not the actual team being faced. For instance, ideally it would assign a higher probability of holding a lead against the Royals than against the Yankees.
(There are likely other flaws, too. It took time to uncover some flaws in the Pythagorean Formula, Runs Created, UZR, VORP and WAR. We’ll likely find some more in WPA too, provided we continue to actually study it.)
In terms of impacting the game, Denard Span leads all Twins hitters, having added 84% to the team’s win probability in the 56 games he played. If you want to see all the Twins, both hitters and pitchers, you can do so here.
And Storen? +236%. Even if we give 1/3 of that credit to his defense, and even if we give Span an extra fifty points for the above average defense he has played in center field, Storen has impacted the Nationals a bit more than Span has impacted the Twins this year. He also has had that impact while being a closer on a team that is five games under .500.
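The adjustments in that paragraph are simple enough to check by hand, but a short sketch makes the assumptions explicit. The 1/3 defensive share and the fifty-point defensive bonus for Span are the rough figures used in this post, not measured values, and `adjusted_comparison` is a hypothetical helper.

```python
def adjusted_comparison(storen_wpa, span_wpa, defense_share=1/3, span_defense_bonus=50):
    """Return (Storen, Span) WPA, in percentage points, after the post's adjustments."""
    # Give a share of Storen's credit to the fielders behind him.
    storen_adj = storen_wpa * (1 - defense_share)
    # Credit Span with extra points for his above-average defense in center.
    span_adj = span_wpa + span_defense_bonus
    return storen_adj, span_adj


storen, span = adjusted_comparison(236, 84)
print(f"Storen {storen:.0f} vs. Span {span:.0f}")  # Storen 157 vs. Span 134

# Defensive share at which Storen's adjusted WPA would drop to Span's level.
break_even_share = 1 - span / 236
print(f"Break-even defensive share: {break_even_share:.0%}")  # Break-even defensive share: 43%
```

Under these assumptions, the fielders behind Storen would have to be credited with a bit over 43% of his value, well above the 1/3 figure, before Span pulled even.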
How can that be the case? Because one thing WPA shows is how a manager can leverage the value of a player at critical points. Very good relievers can have very high or very low WPA scores because a manager will consistently put them in the right place at the right (or very wrong) time. If they come through, they save the game and increase the probability of winning significantly. If they blow it, they can lose a ton of those probability points. For instance, for the Twins, Glen Perkins is second on the team with +151%. But Matt Capps is near the bottom at –90%. The swings for relievers can be volatile – which brings us to the next point.
Relievers are too volatile to be valuable.
This is the point that makes the least sense to me. If relievers are more volatile than position players, wouldn’t it mean that the relievers who perform are more valuable? There’s a reason that tech stocks that perform are valued sky-high. It’s because tech stocks are volatile, and those that perform are worth a lot more – even more than regular high-performing stocks.
I think what is really meant here is “I don’t trust Drew Storen, because relievers are volatile and Storen is a reliever.” I can’t make you trust Drew Storen. If it makes you feel better, most of the tweets I saw yesterday from Nationals fans about the trade had them rending their garments, too. Apparently they trust him.
Good center fielders are more rare than good relievers.
There is another definition of value beyond impact: rarity. The rarer a commodity, the more valuable it is. I argued this several times during the offseason when berating the Twins for offering arbitration to Matt Capps.
The problem with comparing Span and Storen on that basis is that they’re both exceedingly rare. One doesn’t find 27-year-old center fielders with a career OBP of .366 on the free agent market, and one doesn’t find 23-year-old fireballers with a sub-one WHIP on the free agent market, either. If we did, my best guess is that Storen would probably get a better contract than Span, but I can understand those that are wary of him being overpaid because of his “closer experience.”
But I’m sure about one thing: they’re close to each other in the rarity department. For this exercise, that’s enough.
An everyday player is harder to replace than a reliever.
Usually, this is demonstrated in one of two ways.
The most common is anecdotal. “The Rays signed Juan Cruz to a minor league deal and look what he did for them this year.” Or, more regionally, “Nobody thought Glen Perkins was going to be any good, and look what he did.” Certainly, there are several success stories throughout each season that are similar.
Of course, there are also a lot of disasters, too. There is a reason that at every trade deadline relievers are a hot commodity, and believe it or not, it’s not because every GM of every really good team is too stupid to sign good relievers. It’s because, going back to an earlier point, relievers are really volatile.
When you have a lot of volatile commodities, many are much better than you think they’re going to be and many are much worse. If you only look at the ones that overperform, you feel like an idiot. “Look how that tech stock came through. Why didn’t all the traders pick that? They could’ve had it at a record low price. It was easy. Why are they all such idiots?”
They aren’t idiots – they’re just in the business of picking volatile assets. A bunch of them are going to overperform and look good. A bunch are going to underperform and look dismal. But looking at the good ones and concluding that good tech stocks are easily replaceable is foolish.
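A toy simulation makes that selection problem concrete. Everything in it is an assumption for illustration: the outcomes are abstract numbers drawn from a normal distribution rather than real reliever or stock data, both pools have the same true average (zero), and they differ only in spread. The `simulate` and `decile_means` helpers are hypothetical.

```python
import random
import statistics


def simulate(n, spread, seed):
    """Draw n hypothetical outcomes centered on zero (an average performer)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, spread) for _ in range(n)]


def decile_means(outcomes):
    """Return (mean of the worst 10%, mean of the best 10%)."""
    ranked = sorted(outcomes)
    k = len(ranked) // 10
    return statistics.mean(ranked[:k]), statistics.mean(ranked[-k:])


steady = simulate(1000, spread=1.0, seed=1)    # low-volatility pool
volatile = simulate(1000, spread=3.0, seed=2)  # high-volatility pool

for label, pool in [("steady", steady), ("volatile", volatile)]:
    worst, best = decile_means(pool)
    print(f"{label:>8}: mean {statistics.mean(pool):+.2f}, "
          f"worst 10% {worst:+.2f}, best 10% {best:+.2f}")
```

Both pools average out to roughly zero, so a blind pick is worth about the same either way; the volatile pool simply produces a bigger crop of stars and a bigger crop of busts. Pointing only at its best 10% and concluding those results are easy to find is the mistake described above.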
The second way is to use a formula like VORP or WARP or something that ends in RP, which stands for “replacement player.” The problem with using those kinds of metrics when evaluating relievers is that they miss the context of what relievers do. They rely on the number of innings pitched, and since relievers don’t pitch many innings, the metrics rate them as not very valuable. We know that isn’t true because of the importance of the innings they are put in. They are really, really poor metrics for evaluating relievers.
It isn’t clear to me how to judge what a replacement player is at each position, at least not in an overnight entry. So instead of looking at a player in relation to a “replacement player,” which is supposed to be freely available talent in AAA, let’s look at him in relation to an average MLB player or pitcher.
Certainly, if you use that in relation to WPA, we’re going to get the same result as before. WPA compares both hitters’ and pitchers’ performance to how players have historically affected games, in other words to an average player. Storen has improved his team’s chances +236% over an average pitcher. Span has improved his team’s chances of winning +84% over an average hitter, plus he’s saved about 10 runs over an average center fielder. That’s what we came up with before.
If you prefer to use something like runs, I suppose we could compare Span’s runs created to the median center fielder’s runs created and tack on the defensive runs he saved. Then we could compare Storen’s runs saved to that using Baseball Prospectus’ great little report. But it’s after midnight, and I don’t see that report anywhere on their site right now.
That might show me I’m wrong – that Span’s impact is quite a bit greater compared to an average center fielder than Storen’s is to an average middle reliever. If someone wants to do that and post it somewhere, just let me know in the comments below. I’m cooked. Perhaps that is why it is still not obvious to me that Span (or any effective hitter) is more valuable than Storen (or any effective reliever). Or perhaps it is because it isn’t obvious at all.