Thursday, April 29, 2010

How Predictive are Stats?

We use stats a lot in our analysis, and I often cringe when they are used to "prove" something will happen. Frankly, I think anyone who has tried to do that and followed up later probably cringes, too. Because it doesn't take long before a few of those publicly proven predictions come back to bite you in the butt.

Besides cringing, I wonder if the stats really prove anything about the future. For instance, we might say that we expect Delmon Young to be a .280-.290 hitter this year because he hit .284 last year. Or that Michael Cuddyer will have a -25 UZR because it was -26.5 last year. Both seem like reasonable expectations. But are those stats truly predictive? Let's find out:

I pulled all players that qualified for a batting title from 2006 through 2009, and if they qualified in two consecutive years, I matched up their stats in a bunch of basic and advanced hitting statistics. Then I ran a statistical test that shows how predictive they are. It ranks each statistic from 1 to 100. 100 means that you can perfectly predict the following year based on the previous year. 1 means that the following year is totally random compared to the previous year.

I also did this for the UZR and UZR/150 fielding metrics, but for them I used a benchmark of 850 innings played at a position. Why 850? Because it came to about the same number of players that qualified for the batting title each year.
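
For anyone who wants to try something similar, here is a minimal sketch of one way to score year-to-year predictiveness. It is an assumption, not necessarily the test used for the numbers below: it pairs each qualified season with the same player's following season, computes a plain Pearson correlation, and multiplies by 100. The function names, the `k_pct` field, and the sample seasons are all made up for illustration.

```python
from math import sqrt

def pair_consecutive_seasons(rows, stat):
    """Match each qualified season with the same player's next season."""
    by_key = {(r["player"], r["year"]): r[stat] for r in rows}
    pairs = []
    for (player, year), value in by_key.items():
        following = by_key.get((player, year + 1))
        if following is not None:  # player qualified in both years
            pairs.append((value, following))
    return pairs

def pearson_r(pairs):
    """Pearson correlation between year N and year N+1 values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

def predictive_score(rows, stat):
    """Assumed scoring: correlation scaled to a 100-point scale."""
    return round(100 * pearson_r(pair_consecutive_seasons(rows, stat)))

# Made-up seasons purely for illustration (not real player data).
sample = [
    {"player": "A", "year": 2006, "k_pct": 12.0},
    {"player": "A", "year": 2007, "k_pct": 13.5},
    {"player": "B", "year": 2006, "k_pct": 22.0},
    {"player": "B", "year": 2007, "k_pct": 20.5},
    {"player": "C", "year": 2007, "k_pct": 17.0},
    {"player": "C", "year": 2008, "k_pct": 18.0},
    {"player": "D", "year": 2009, "k_pct": 25.0},  # no following season, so unpaired
]
print(predictive_score(sample, "k_pct"))
```

A player who qualifies three years running contributes two pairs under this approach. The sketch also assumes the stat actually varies across players; if every value were identical, the correlation would be undefined.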

Here are the results:

Stat Predictive
K% 91
SO 87
SB 87
Spd 82
BB% 80
BB 76
HR 75
ISO 74
BB/K 74
IBB 73
CS 71
SH 71
1B 70
RBI 69
HBP 65
3B 61
SLG 60
OPS 59
OBP 59
wRAA 57
wOBA 56
wRC 56
H 50
GDP 50
R 49
UZR/150 47
UZR 45
AVG 44
AB 42
PA 37
2B 33
SF 31
G 23

There are some surprises for me in there:
  • I'm surprised that batting average is fairly low. It ranks about the same as FIP did relative to ERA when we studied that a few months ago. That's also about the same as the correlation between Opening Day payroll and the number of wins a team has.
  • I'm surprised that OBP, SLG and OPS are lower than something like HR and RBI. Note to self: when someone says "He's a 20 HR, 85 RBI guy," don't start talking about his OPS.
  • I'm surprised that stats like BB% and K% are so far up at the top of that list. Those are the stats we talk about players improving as they develop plate discipline. It sure doesn't look like they vary very much from one year to the next.
  • I'm surprised that BABIP is so low. I've always heard it is fairly consistent and can be counted on to rebound. This doesn't support that at all.
  • I'm pleasantly surprised that UZR and UZR/150 aren't at the very bottom of the list. I still have some concerns over their limitations and feel they are often misused, but at least they're somewhat consistent.
That's it for this week, gang. Feel free to sound off with your own thoughts in the comments below.

Tuesday, April 27, 2010

Twins 2, Tigers 0: Play of the Game

Most of the attention for tonight's victory will (correctly) go to Francisco Liriano and his impressive pitching performance. After all, he went 8 innings on the road against a Detroit Tigers lineup that was desperate to show the Twins that they should be considered competitors this year. He overcame numerous offensive letdowns. And he was simply dominant, announcing to anyone that still doubts him that this winter, spring, and April have not been a fluke.

But for all that effort, the Twins still only led by two runs going into the ninth inning, and seven pitches later the game was seriously in doubt. Johnny Damon had singled on the second pitch and Magglio Ordonez had worked a full count. He bounced that eighth pitch to the left side of the diamond, far enough from third baseman Alexi Casilla (who was correctly hugging the third base line) that he wouldn't have had a chance. And in previous years, it was the kind of groundball that a Twins shortstop would've just missed or stopped but not converted to an out.

This was not previous years. JJ Hardy ranged to his right, picked it up and rifled a throw to second base to get the lead runner.

How big was that play? Ironically it was about as big as anything else anyone in the lineup did.

If you look at all MLB games played between 1977 and 2006, there have been 1689 games that reached the same situation that the Twins had when that play was over: a two-run lead, one out, in the bottom of the ninth with a runner on first base. The visiting team has lost just 155 of them, which means it has won 90.8%.

But if that play wasn't made? If it ended with runners on first and second and no outs, the visiting team has won just 72% of the games. So that single play increased the Twins' chances of winning that game by about 19 percentage points.
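
Here is the arithmetic behind that swing as a quick sketch. It assumes 155 is the number of those 1689 games the visiting team failed to win, since that is the reading that reproduces the 90.8% figure. The variable names are just illustrative.

```python
games = 1689              # games that reached this situation, 1977 through 2006
visitor_losses = 155      # assumption: games the visiting team did NOT win
win_prob_after_out = (games - visitor_losses) / games

win_prob_no_out = 0.72    # first and second, no outs, as quoted in the post

# Difference between the two win probabilities, in percentage points
swing_points = (win_prob_after_out - win_prob_no_out) * 100

print(f"With the out recorded: {win_prob_after_out:.1%}")
print(f"Swing from the play: {swing_points:.1f} percentage points")
```

That works out to a little under 19 percentage points, which is where the rounded figure comes from.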

I'm not going to claim there is a moral to this story. I just thought I should point it out, since I don't think there is a single statistic that is going to do it justice.