Thursday, April 29, 2010

How Predictive are Stats?

We use stats a lot in our analysis, and I often cringe when they are used to "prove" something will happen. Frankly, I think anyone who has tried to do that and followed up later probably cringes, too. Because it doesn't take long before a few of those publicly proven predictions come back to bite you in the butt.

Besides cringing, I wonder if the stats really prove anything about the future. For instance, we might say that we expect Delmon Young to be a .280-.290 hitter this year because he hit .284 last year. Or that Michael Cuddyer will have a -25 UZR because it was -26.5 last year. Both seem like reasonable expectations. But are those stats truly predictive? Let's find out:

I pulled all players that qualified for a batting title from 2006 through 2009, and if they qualified in two consecutive years, I matched up their stats in a bunch of basic and advanced hitting statistics. Then I ran a statistical test that shows how predictive they are. It scores each statistic from 1 to 100. 100 means that you can perfectly predict the following year based on the previous year. 1 means that the following year is totally random compared to the previous year.

I also did this for the UZR and UZR/150 fielding metrics, but for them I used a benchmark of 850 innings played at a position. Why 850? Because it came to about the same number of players that qualified for the batting title each year.
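
The post does not name the test. One plausible reading, and it is only an assumption here, is a year-to-year Pearson correlation multiplied by 100. A minimal Python sketch of that reading follows; the players, numbers, and function names are all made up for illustration and are not the original analysis.

```python
from math import sqrt

def pair_consecutive_seasons(rows):
    """rows: (player, year, value) tuples for qualifying seasons only.
    Returns (value in year N, value in year N+1) pairs."""
    by_key = {(player, year): value for player, year, value in rows}
    pairs = []
    for (player, year), value in sorted(by_key.items()):
        following = by_key.get((player, year + 1))
        if following is not None:  # player also qualified the next year
            pairs.append((value, following))
    return pairs

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def predictive_score(rows):
    """Assumed scoring: correlation of year N with year N+1, times 100."""
    pairs = pair_consecutive_seasons(rows)
    return round(100 * pearson([a for a, _ in pairs], [b for _, b in pairs]))

# Hypothetical batting averages; player C skips 2007, so he forms no pair.
rows = [
    ("A", 2006, 0.300), ("A", 2007, 0.310),
    ("B", 2006, 0.250), ("B", 2007, 0.260),
    ("C", 2006, 0.280), ("C", 2008, 0.290),
]
print(predictive_score(rows))
```

In this toy data every paired player moves by the same amount, so the score comes out at the top of the scale. Real data would land well below that.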

Here are the results:

Stat Predictive
K% 91
SO 87
SB 87
Spd 82
BB% 80
BB 76
HR 75
ISO 74
BB/K 74
IBB 73
CS 71
SH 71
1B 70
RBI 69
HBP 65
3B 61
SLG 60
OPS 59
OBP 59
wRAA 57
wOBA 56
wRC 56
H 50
GDP 50
R 49
UZR/150 47
UZR 45
AVG 44
AB 42
BABIP 37
PA 37
2B 33
SF 31
G 23

There are some surprises for me in there:
  • I'm surprised that batting average is fairly low. It scores about the same as FIP did against ERA when we studied that a few months ago. That's also about the same as the correlation between Opening Day payroll and the number of wins a team has.
  • I'm surprised that OBP, SLG and OPS are lower than something like HR and RBI. Note to self: when someone says "He's a 20 HR, 85 RBI guy," don't start talking about his OPS.
  • I'm surprised that stats like BB% and K% are so far up at the top of that list. Those are stats that we talk about players improving as they develop plate discipline. It sure doesn't look like that varies very much.
  • I'm surprised that BABIP is so low. I've always heard it is fairly consistent and can be counted on to rebound. This doesn't support that at all.
  • I'm pleasantly surprised that UZR and UZR/150 aren't at the very bottom of the list. I still have some concerns over UZR's limitations and feel it is often misused, but at least it's somewhat consistent.
That's it for this week, gang. Feel free to sound off with your own thoughts in the comments below.

14 comments:

Eric B. B. said...

Perhaps BABIP is consistent long term, but can vary wildly from year to year?

David Wintheiser said...

Kind of ironic that you should ask the question of how predictive stats are, and then use a stat (the correlation coefficient multiplied by a factor of 100, if I don't miss my guess) to try to answer it. : )

In seriousness, though, a correlation coefficient isn't really what we're looking at to talk 'consistent' here -- BABIP isn't consistent in the sense that a player tends to have the same BABIP every year, but rather that a player who finishes with an above-mean or below-mean BABIP will almost invariably regress toward the mean in the following year. That's not really what you're measuring with a correlation coefficient.

Ideally, what any projection system tries to identify are the two 'holy grails' of sabermetric analysis:

1) What's this player's mean level of performance, as measured statistically?

2) Is the player likely to exceed, meet, or fall below that mean level of performance based on known factors (age, past injury history, park factors, etc.)?

(The third holy grail is the correlation of individual performances to team success, which is the current new hotness.)

Most importantly, just because a stat can vary wildly doesn't mean it isn't significant -- even though games played is the least 'consistent' stat in your analysis, I doubt anybody would argue that the number of games a player participates in doesn't matter.

Chiasmus said...

Agree with both of the previous comments. I think the key issue here is that we're often trying to get at a player's "true talent"--what they would do on average if you could play the season over and over a thousand times. Statistics that are less variable from year to year are likely to be more reflective of true talent and less reflective of luck.

So the whole reason why we use K% and BB% to indicate that a player has progressed is precisely because they're not that variable from year to year--a big change in those numbers probably means a real change in the player's abilities and not just random chance. And as the others said, BABIP is a good diagnostic for flukey seasons because it's so variable and hence prone to outlier seasons followed by heavy regression toward the mean.
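
A rough way to put numbers on "heavy regression toward the mean": if two seasons share the same mean and spread, the best linear guess for next year shrinks this year's gap from the mean by the year-to-year correlation r. The sketch below assumes the post's BABIP score of 37 means r = 0.37, and the .300 league mean and .350 hitter are hypothetical.

```python
def regress_to_mean(observed, league_mean, r):
    """Best linear guess for next season, assuming both seasons have the
    same mean and variance and a year-to-year correlation of r."""
    return league_mean + r * (observed - league_mean)

# Hypothetical hitter with a .350 BABIP against an assumed .300 league mean.
expected_babip = regress_to_mean(0.350, 0.300, 0.37)

# Same gap, but for a made-up stat as sticky as the post's top score (r = 0.91).
expected_sticky = regress_to_mean(0.350, 0.300, 0.91)

print(f"{expected_babip:.4f} {expected_sticky:.4f}")
```

Under these assumptions the .350 hitter is expected to give back most of his 50 points, while a stat with r = 0.91 would keep nearly all of the gap.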

TT said...

"Statistics that are less variable from year to year are likely to be more reflective of true talent and less reflective of luck."

That simply isn't true. You are talking about volatility from year to year, and any number of things having nothing to do with talent affect that: the number of left- and right-handed pitchers they faced, for instance.

The main problem with this analysis is that it is not using a random sample. There is little doubt that whether someone gets enough plate appearances to qualify for the batting title depends on their performance. So you are starting with a subset managers have likely selected for their consistency from year to year. The limited range of PA and G of that small sample probably explains the low correlation.

Those players that get consistent plate appearances are also likely going to be used consistently in other ways. It's also probably more likely that they are veterans who are consistent in their approach rather than young players who are still making adjustments.

In short - how predictive are they? Not very. You have simply selected for the players who are consistent. I would bet that well less than half the players who appeared in the big leagues last year were in your sample.
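
The range-restriction point can be illustrated with a toy simulation. Everything in it is invented: the "games" scale, the noise level, and the cutoff are assumptions, not estimates from real data. It only shows the mechanism described above, that keeping players who clear a cutoff in both years lowers the measured year-to-year correlation.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def simulate(n_players=5000, cutoff=120.0, seed=1):
    """Return (r for everyone, r for players above cutoff both years, count kept)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_players):
        # Hypothetical underlying playing-time level (toy scale, no 162 cap).
        level = rng.gauss(120.0, 20.0)
        year1 = level + rng.gauss(0.0, 10.0)
        year2 = level + rng.gauss(0.0, 10.0)
        pairs.append((year1, year2))
    # Keep only players who cleared the cutoff in BOTH seasons.
    kept = [(a, b) for a, b in pairs if a >= cutoff and b >= cutoff]
    r_all = pearson(*zip(*pairs))
    r_kept = pearson(*zip(*kept))
    return r_all, r_kept, len(kept)

r_all, r_kept, n_kept = simulate()
print(f"all players: r={r_all:.2f}  qualified both years: r={r_kept:.2f}  (n={n_kept})")
```

The same players have the same underlying consistency in both calculations. Only the selection rule changes, which is why a low score for G or PA in a qualified-only sample says little about playing time in general.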

Jack Ungerleider said...

Is a major league baseball player consistent from year to year?

The answer of course is... it depends. I agree with the comment that you are looking at a pre-screened sample here.

For example, there is anecdotal evidence that if Torii Hunter's BB% increased last season, it was the impact of batting behind Bobby Abreu. (Search it, Torii said that in an interview.) So you also have the randomization of who is on the team. For example: Does Mauer get better pitches when Morneau is batting behind him than when Cuddyer or Kubel does? Does the presence of Jim Thome in the lineup affect the way the hitters in front and behind are approached?

The number of variables involved makes me think of weather predictions. The longer range the prediction, the less accurate it is. I suspect that the stats might be useful in planning match-ups going into a series, like the one starting tonight in Cleveland. Current trends and past performance can be used to make a prediction on how someone will perform in the specific situation. The manager and coaches use it all the time as they decide who to put into the lineup and who to call in from the bullpen.

And just like the local Meteorologists they cross their fingers and hope they are right.

toby said...

I always explain the deal with BABIP like this: there's lots of luck in it (a lot of which boils down to the PRECISE trajectories a guy's batted balls take and whether there were fielders positioned appropriately), but there's still an underlying true skill (in large part, how hard you tend to hit the ball), thus the statistically significant but not dazzling correlation coefficients. You look at all-star, HOF type guys who rake year after year and their career BABIP is gonna be higher than the average bear's career BABIP.

When people first started writing about this they found the year-to-year correlation to be around .30, while a 2-year BABIP number correlated with a 3rd year's BABIP to the tune of .37 or so. Breaking down BABIP by batted ball types and looking a little longer term to get enough data (three-year BABIPs predicting a 4th-year BABIP), BABIP on grounders (where speed, a near-constant, is critical) had the highest correlation while BABIP on line drives was second, which jibes with the intuitive idea that a better hitter hits harder line drives than Nick Punto, and harder line drives will fall for hits more often.

But as Mr. Wintheiser points out, it's the fact that BABIP is relatively random that makes people say "this guy's numbers are flukey, there's every reason to suppose they'll level out when his BABIP stops being [randomly] weird." As Matt Swartz at BP put it: "It’s very clear who is good at making contact, and who is good at hitting home runs, but it is harder to know who is good at getting hits on balls in play. That’s because the difference between the best and worst hitters at BABIP is much smaller [than the difference in HR or K rates], which explains why the year-to-year correlation of a batter’s HR/AB is about .74, and the year-to-year correlation of a batter’s K/PA is about .84, while the year-to-year correlation between a hitter’s BABIP is only about .37."
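
Swartz's explanation can be sketched with a toy model in which each season is true talent plus luck. The spreads below are invented for illustration, not fitted to real data. With the same amount of luck, a stat whose talent spread is narrow shows a much lower year-to-year correlation.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_to_year_r(talent_sd, luck_sd, n_players=2000, seed=7):
    """Simulate two seasons per player: same talent, independent luck."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)
        year1.append(talent + rng.gauss(0.0, luck_sd))
        year2.append(talent + rng.gauss(0.0, luck_sd))
    return pearson(year1, year2)

def expected_r(talent_sd, luck_sd):
    """Theoretical correlation for this model: talent variance / total variance."""
    return talent_sd ** 2 / (talent_sd ** 2 + luck_sd ** 2)

# Assumed spreads: a wide-talent stat versus a narrow-talent one, equal luck.
wide = year_to_year_r(talent_sd=0.050, luck_sd=0.025)
narrow = year_to_year_r(talent_sd=0.015, luck_sd=0.025)
print(f"wide talent spread: r={wide:.2f}  narrow talent spread: r={narrow:.2f}")
```

In this model the correlation depends only on the ratio of talent spread to luck, so a low number can mean "hitters differ little" rather than "the stat is pure noise."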

When looking to see if something's changed in a guy's approach that might make me believe an observed BABIP change is "real" (that is, the underlying skill level we all infer to be driving that .30 correlation), I like to look at his batted ball types. 3 year GB% correlates to 4th year at .77; 3 year OFFB% to 4th is .80; even LD% (a one-year lucky change in which can drive a "lucky" BABIP) correlates at .39. If something major changes in a guy's GB/FB make-up (and, to a lesser extent, in his LD%), I figure there's definitely something afoot with his overall approach.

It's something I'm keeping a close eye on in terms of one D. Young -- in spring training he was driving the ball and hardly hitting ANYTHING on the ground, which is the opposite of what he's done in his career, and while that hasn't carried over so much in the regular season, so many other things have changed I'm convinced I wasn't totally off base to be excited. His 2010 BABIP on GBs is anemic right now yet it's obvious he's two steps faster down the line than he was, so I see good things in the future for his batting average.

FWIW, the single "neatest" insight I've found is that strikeout rate actually correlates with a HIGHER babip, because most guys who whiff a lot swing harder and thus hit the ball harder when they do make contact. A lot of the newest xBABIP models for hitters use K/BB ratio, which if I had the statistical chops and data access and whatnot I'd love to delve into as I've got a hunch the correlation they find using that ratio is due to the BB component and is WEAKENED by throwing the strikeouts into the mix -- unless the earlier research that just looked at K rate was ALL screwed up.

TT said...

"
FWIW, the single "neatest" insight I've found is that strikeout rate actually correlates with a HIGHER babip, because most guys who whiff a lot swing harder and thus hit the ball harder when they do make contact. "

FWIW - your explanation is pure speculation. The fact that someone who strikes out more has a higher babip is a statistical artifact. Strikeouts aren't balls in play. So if you compare two hitters with the same number of hits (other than home runs), the one who strikes out more often will have a higher babip since he has fewer balls in play per hit.
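
The arithmetic behind this claim can be checked against the usual BABIP definition, (H - HR) / (AB - K - HR + SF). The two stat lines below are hypothetical and identical except for strikeouts. This covers only the case described here, equal non-homer hits. It does not show whether hitters who strike out more actually keep the same hit totals.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)

# Two made-up hitters: same hits, homers, at-bats, and sac flies.
low_k = babip(h=140, hr=20, ab=500, k=60, sf=5)    # 120 / 425
high_k = babip(h=140, hr=20, ab=500, k=120, sf=5)  # 120 / 365

print(f"{low_k:.3f} {high_k:.3f}")  # prints "0.282 0.329"
```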

toby said...

I didn't say anything about comparing two players with identical numbers of hits on balls in play but differing numbers of strikeouts. I said that across all MLB hitters, strikeout rate and BABIP have been (repeatedly, by multiple researchers) found to have a positive correlation. You can assume this is pure happenstance or you can assume there's an underlying reason that accounts for the correlation and makes sense, like "people who swing harder tend to (1) miss more balls and (2) hit the ball harder when they do." Of course it's always technically "speculation" to proffer a real-world explanation for a statistical correlation, but I don't see how that's a meaningful criticism.

BeefMaster said...

It makes sense that RBI would be high on the list - players, especially (as TT pointed out) guys who are playing enough to qualify for the batting title, are at least somewhat likely to stay in the same position in the same batting order from year to year. Justin Morneau, even in an off year, is still a powerful hitter batting behind Denard Span and Joe Mauer - he's still going to drive in a ton of runs. Span, even in a great year, is still a high-OBP/low-SLG hitter batting behind Nick Punto - he's not going to drive in a ton of runs.

Moreover, I think that actually lends credence to the "RBI are largely luck" school of thought - if RBI are more consistent for a player than their actual underlying production (BA/OBP/SLG), that seems to be a sign that there are other factors that are very important in determining the number of runs a player will drive in.

TT said...

"I said that across all MLB hitters, strikeout rate and BABIP have been (repeatedly, by multiple researchers) found to have a positive correlation. You can assume this is pure happenstance or you can assume there's an underlying reason that accounts for the correlation"

Or you can understand that it is a necessary statistical result. Since the more someone strikes out the fewer balls they put in play, it would be odd if there was no correlation. It tells you nothing at all about the balls they do put in play.

"I think that actually lends credence to the "RBI are largely luck" school of thought "

Beefmaster, do you really think it is "largely luck" that Morneau is batting cleanup? It's obvious his performance has a lot to do with that.

No baseball results are from purely random situations. There is constant management of every possible detail of every situation to optimize outcomes. The question is whether we are comparing what actually happened or what might have happened in some artificial alternative universe.

BeefMaster said...

Perhaps "circumstance" would be a better word choice than "luck" - obviously, better hitters are, in general, placed in situations in which they are more likely to drive in runs - but that doesn't really change the gist of my statement or John's initial comment. John's point was that since RBI are dependent on circumstance and not purely driven by a player's raw hitting results, he was surprised that they had such a strong correlation from year to year. My point was that this should not be entirely unexpected, because a player's circumstances (particularly team and general lineup position) are also fairly likely to be similar from year to year, leading to a decent chance of similar RBI numbers.

On an unrelated note, you're still missing toby's point - to say that "It tells you nothing at all about the balls they do put in play" is completely wrong, because BABIP only counts balls in play (hence the last three letters of the acronym). According to the research he's familiar with, players who strike out more often generally maintain a higher average on the balls they do put into play than players who strike out less often.

Anonymous said...

No, TT, a positive correlation between a batter's strikeout rate and BABIP is NOT "a necessary statistical result." (Even your own next sentence argues against it: if it were indeed a NECESSARY result, it wouldn't be "odd" for there to be no [positive] correlation--it would be impossible.)

If you misunderstood the definition of the statistic, then BeefMaster's last post should have cleared things up. Another possibility, suggested by your previous post, is that you are operating from some assumption about strikeouts coming (at least partly?) at the expense of other outs. But that's an empirical question--there's nothing necessarily true about it. It would be quite possible for hitters who strike out more to be weak hitters overall, in which case the correlation between strikeouts and BABIP would be negative. Case in point: National League pitchers.

Maybe it's not surprising that there aren't enough players fitting that profile to produce an overall negative correlation; aside from pitchers, we're talking about "good field, no hit" guys, and most teams don't want more than a couple of those. But again, this is in the realm of empirical questions, not statistical necessities.

TT said...

"No, TT, a positive correlation between a batter's strikeout rate and BABIP is NOT "a necessary statistical result." (Even your own next sentence argues against it: if it were indeed a NECESSARY result, it wouldn't be "odd" for there to be no [positive] correlation--it would be impossible.)"

You will need to explain how a necessary result is also impossible.

"is that you are operating from some assumption about strikeouts coming (at least partly?) at the expense of other outs. But that's an empirical question--there's nothing necessarily true about it."

That depends on what you consider necessary. For a player to stay in the major leagues they need to get a minimum number of hits per plate appearance. A player who both strikes out a lot and fails to get hits on balls in play is quickly taken out of the pool of major league players. Unless, as you point out, they are pitchers. Of course that isn't true for every player, but it will be true often enough to virtually guarantee there will be a statistical correlation between players with high strike out rates and high BABIP.

AMusingFool said...

Anonymous said: "No, TT, a positive correlation between a batter's strikeout rate and BABIP is NOT "a necessary statistical result." (Even your own next sentence argues against it: if it were indeed a NECESSARY result, it wouldn't be "odd" for there to be no [positive] correlation--it would be impossible.)"

TT replied: You will need to explain how a necessary result is also impossible.

You said it was odd if there were no correlation, after saying it was impossible to not have a correlation. Now do you see the problem?

TT also said: "That depends on what you consider necessary. For a player to stay in the major leagues they need to get a minimum number of hits per plate appearance."

So, what you're saying is that the only way to contribute to a baseball team is to get singles with a certain frequency? You already busted that by admitting pitchers contribute in other ways. You might also recall that having a very good glove, being extremely fast, hitting the ball particularly hard, walking to first base frequently are all ways to keep yourself employed at the major league level. Jose Canseco, for instance, stayed in MLB a lot longer than one would expect based purely on his hit rate.

Theoretically, an exceptional ability to foul pitches off (not that I've ever heard of such) might also.

Anyway, getting back to the main post: a cool idea to measure these things. Anyone thinking of building their own projection system should definitely know this.