Tuesday, October 16, 2007

Pythagorean Limits

This will shock you, I know, but I love mathematical magic tricks.

You know the kind I'm talking about. You'll get an email that tells you to pick a number and then add 30 and square it and add pi, and by the end of the email, it's telling you your birthday. I love that stuff.

And in some ways, that's what baseball's pythagorean theorem is. (If you visit this site, you likely know what I'm talking about, and if you don't, this is a nice primer.) You square a few numbers and presto-chango, you have a win-loss record. It's magical. And we all know from history what happens to anything magical.

That's right. It becomes misused. Because those who know the tricks like the attention or the power, and those who don't can become pretty gullible. That's probably overstating what is happening now, and it is surely twisting the motives. But baseball's pythagorean theorem has some limits that its practitioners have found convenient to ignore. I'd like to go through a couple of those over the next couple of days, starting with:

1. It ain't gospel.

You will undoubtedly see, at some point in next year's season previews, someone pointing to a team's pythagorean record as proof that the team will be worse or better next year. For instance, the Braves underperformed their pythagorean record last year, so they're really a better team than you might have thought. This conclusion is justified because (as is often stated) a team's pythagorean record is a better indicator of its true ability than its actual record.

And it is, but just barely. Here you'll find GameDay's study that looks at teams over the last ten years. The spreadsheet compares each team's record to:
1) its record the previous year and
2) its pythagorean record the previous year.

The results are consistent with every study I've read about this. The pythagorean record has a slightly higher correlation with the next year's record than the actual record does, but not by much. In our study, the correlations were .582 and .555, respectively. That's a difference that is technically known in the statistical community as "pretty much a wash".

Or, in other words, flip a coin as to which one you want to pay more attention to.

Which doesn't mean that you should ignore a team's pythagorean record. Just understand that the team also has a real record, and it's nearly as valid an indicator of how good that team is. So don't treat it as gospel. Or even a minor miracle.

It's just a neat magic trick.
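
For anyone who wants to look behind the curtain, here is a minimal sketch of the trick and of the kind of comparison described above. It assumes the classic form of the formula with an exponent of 2 (runs scored squared, divided by runs scored squared plus runs allowed squared). The function names and the five seasons of data are made up for illustration; this is not the spreadsheet from the study, and the numbers it produces mean nothing.

```python
from math import sqrt


def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Assumes the classic exponent of 2; other exponents are sometimes used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data, invented for illustration only. Each row is:
# (wins, losses, runs scored, runs allowed, next year's wins, next year's losses)
SEASONS = [
    (95, 67, 820, 700, 90, 72),
    (84, 78, 790, 720, 88, 74),
    (70, 92, 690, 800, 75, 87),
    (88, 74, 760, 740, 81, 81),
    (62, 100, 650, 850, 68, 94),
]


def next_year_correlations(seasons):
    """Return (actual-record correlation, pythagorean-record correlation)
    with the following year's winning percentage."""
    actual = [w / (w + l) for w, l, *_ in seasons]
    pythag = [pythagorean_pct(rs, ra) for _, _, rs, ra, _, _ in seasons]
    following = [nw / (nw + nl) for *_, nw, nl in seasons]
    return pearson(actual, following), pearson(pythag, following)


r_actual, r_pythag = next_year_correlations(SEASONS)
print(f"actual record vs. next year:      {r_actual:.3f}")
print(f"pythagorean record vs. next year: {r_pythag:.3f}")
```

Swap in real standings and run totals for as many team-seasons as you can find, and the two printed numbers are the ones to compare. With only a handful of rows, as here, neither correlation tells you anything.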


Anonymous said...

"pretty much a wash"="pretty much meaningless"

The problem here is that neither one is particularly good at predicting the next year's record. The difference between the two is essentially irrelevant. It's measuring inches when all the other measurements are in miles.

Jack Ungerleider said...

First things first, all I know about most of these obscure statistical methods is what I read here. I did read the primer that John linked to above.

What I came away from the primer with was that given enough data and enough computing power you can squeeze any set of data onto any line. (Or into any other desired pattern.) This is no great revelation. None of these calculations can be used as a prediction tool. The only way you can use one to predict a future outcome is to ensure that the inputs are constant, within a reasonable range. I would suggest that this is as possible with baseball as it is with the equity trading markets.