Monday, February 18, 2008

Reviewing PECOTA

So, yesterday I didn't post, and to be honest, it just kind of slipped my mind. And then it almost did again today. You would think after doing this for five years, it would sort of be second nature, and for the most part it is, but occasionally, I just kind of forget.

Fortunately, the editor of MNGameday.com, Tom Genrich, provided links to a couple of other awfully well done stories. At the top of the page, he linked to Nick Nelson's solid spring training preview, which I appreciated because I just hadn't taken the time to think about all that stuff. I'd love to give my two cents on a bunch of that stuff, and hope to soon.

And then over at SBG's site, Ubelmann pointed out that Baseball Prospectus' PECOTA standings (subscription required) came out and had the Twins winning 74 games and finishing 4th in the AL Central. Ubelmann thinks that sounds about right. To me, it seems a bit low, but I haven't even done any back-of-the-napkin figuring on that, and we'll need to get to that later. Hey, that's my second future column idea.

PECOTA, in case you're wondering, stands for Player Empirical Comparison and Optimization Test Algorithm. For those of you who made it through that acronym and are still awake, it is what it says it is - it's a formula for comparing players to previous players based on their performance. The hope is that by looking at how similar players developed, one can predict how current players will develop. And then, when you see how close it came to the actual results, it allows the owner to try and optimize the algorithm for the next year with new data.

You see it referenced more and more, and I'll admit I've become increasingly uncomfortable with the weight it's given. (3rd column idea!) It would seem to be especially difficult to trust it when used for a purpose for which it isn't really intended - predicting the results of teams instead of individual players.

After all, even if it nailed exactly the rates at which each player would perform, the results are still dependent on playing time. And then you need to convert those runs scored and runs allowed to wins and losses. Every layer adds a level of uncertainty that makes the process seem increasingly masturbatory - kinda fun but mostly kind of pathetic. Kinda like blogging.
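
For the curious, here's a rough sketch of that runs-to-wins step. It uses Bill James's Pythagorean expectation, which is one common way of turning run totals into a winning percentage. I'm assuming it purely for illustration and am not claiming it's the conversion Baseball Prospectus uses. The function names, the exponent of 2, and the run totals are all my own placeholders.

```python
# Sketch only: Pythagorean expectation is an assumed stand-in here,
# not necessarily the method behind the PECOTA standings.

def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Scale the estimated winning percentage to a full season."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical run totals, not real projections for any team.
even_team = projected_wins(750, 750)    # scores as many as it allows
losing_team = projected_wins(700, 750)  # allows 50 more than it scores

print(round(even_team, 1))
print(round(losing_team, 1))
```

A team that scores exactly as many runs as it allows comes out at .500, so 81 wins. The thing to notice is that any miss in the projected run totals flows straight through this formula and into the win total.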

Anyhoo, in true PECOTA fashion, it seems like there should be a nice empirical test that we can run to get a sense of whether we should pay any attention to it, and that is to compare last year's results to what PECOTA predicted. How did it do?

Well, for all my doubts, it didn't do a half bad job. It got three of the four playoff teams in the AL (just missed the Twins) and about three of the four (more or less) in the NL, too. Overall, on average, it's usually off by a handful of games, somewhere from 3.5 to 5. It was fairly accurate at predicting the general finish of a team.

On the other hand, it was off by seven games or more for a third of the teams in the league, and seven games is plenty significant when the worst and best teams in a league are 30 games apart. The Twins need to hope that PECOTA pegs them as poorly as it did last year - only in the other direction.
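
If you want to run this kind of check yourself, the arithmetic is simple: take the gap between projected and actual wins for each team, average the gaps, and count how many teams missed by seven or more. Here's a minimal sketch. The team names and win totals below are invented placeholders, not PECOTA's projections or anyone's real record, and the function name is hypothetical.

```python
# Sketch only: every number below is a made-up placeholder.

def summarize_misses(projected, actual, big_miss=7):
    """Return (average miss in games, share of teams off by big_miss or more)."""
    misses = [abs(actual[team] - projected[team]) for team in projected]
    mean_abs_error = sum(misses) / len(misses)
    share_big = sum(1 for m in misses if m >= big_miss) / len(misses)
    return mean_abs_error, share_big


# Hypothetical projected and actual win totals for three imaginary teams.
projected = {"Team A": 90, "Team B": 81, "Team C": 74}
actual = {"Team A": 88, "Team B": 90, "Team C": 79}

mae, share = summarize_misses(projected, actual)
print(f"average miss: {mae:.1f} games")
print(f"share off by 7 or more: {share:.0%}")
```

The average miss and the share of big misses tell you different things, which is why a system can look fine on the first number and still whiff badly on a third of the league.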





6 comments:

neckrolls said...

If you had to make a prediction, I think it'd be fair to guess that the Twins wouldn't win as many games as last year. But there are a lot of variables at play:

Is Kubel's past performance indicative of what he'll do in 2008? How about the newly-slimmed-down Boof?

How many at-bats are they expecting from Punto and Monroe? How many innings from Livan?

And then there's simple, dumb luck (which I think could account for all 6 of the Indians' extra wins last year).

Guess they'll just have to play the games!

Anonymous said...

What do PECOTA, masturbating, and blogging all have in common?

I've always wondered, and now I know!

Mike C said...

So far I have to say that the PECOTA ratings seem pretty solid. Obviously, it can't account for teams that end up having a lot of injuries and such, but it seems to be fairly accurate.

sploorp said...

Very interesting. PECOTA had them picked 1st. I remember being frustrated with all of the baseball magazines' season previews projecting the Twins 4th. And I still say that they were a much better team last year than their record would lead a person to believe. A lot of things went wrong for them to finish where they did - especially in the last 4-6 weeks. If even just a couple of things had gone more their way, it could have been a very different season.

I would be curious to see what PECOTA projected for the team in 2005 and 2006. How were each of the teams ranked at the start and how did they finish? Did PECOTA see those breakout seasons coming or did it predict disaster those years? I sure wish I could delve into all those team and individual numbers and see why PECOTA came up with the results it did, but that would be way beyond the scope of this blog. But is there any way you could post PECOTA CL team predictions and results going back several years?

Individually, there were a lot of so-so years in 2005, then breakout years in 2006, followed by injury-plagued, sub-par performances in 2007. I guess what I'm looking for is trends. How much do the stats from one season affect PECOTA's predictions for the next year? Did the breakout years in 2006 create a rosier picture for PECOTA in 2007? Likewise, how much is the disastrous 2007 casting a cloud over 2008?

I've also read that PECOTA isn't nearly so accurate at projecting a player's performance as they make the jump from minors to majors. That could be a huge blind spot this year for the Twins.
