Wednesday, April 20, 2011

Sabermetrics 101: Runs Created

How many runs is Joe Mauer worth?

When Bill James showed how runs could be cleanly converted to wins and losses (using the Pythagorean Formula from last week), it opened the door to studying the impact pitchers have on wins and losses. After all, we measure pitchers in runs: earned runs usually, but runs nonetheless. It's fairly trivial to swap one pitcher for another, compare their ERAs, and estimate how many fewer runs one would give up than the other. Once those runs could be converted to wins, it was easy to estimate the impact on the team.
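Here is a quick sketch of that pitcher swap. ERA is earned runs per nine innings, so the gap in runs over a stretch of innings is just the ERA gap scaled by innings / 9. The two ERAs and the 180 innings below are made-up numbers for illustration, and the sketch ignores unearned runs entirely.

```python
def runs_saved(era_current: float, era_replacement: float, innings: float) -> float:
    """Estimate how many fewer earned runs the replacement pitcher allows.

    ERA = earned runs per 9 innings, so the run difference over
    `innings` is the ERA difference times innings / 9.
    """
    return (era_current - era_replacement) * innings / 9


# Hypothetical swap: a 4.50 ERA starter replaced by a 3.60 ERA starter
# over 180 innings.
saved = runs_saved(4.50, 3.60, 180)
print(round(saved, 1))  # 18.0
```

Those 18 or so runs are what you would then feed into the Pythagorean Formula to see how many wins they are worth.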

But how do you do the same for batters? Count their RBI? Their runs? Some average?

James studied the problem by backing away from it. Instead of looking at individual players, he looked at teams. Could you guess how many runs a team would produce given some of their other statistics? What he found shaped a generation of analysis. Just like predicting wins, it was a fairly simple formula - so simple that what was striking was what was NOT in it.

If you added up all the total bases for the Twins last year, you would get 2347. (By total bases, I mean one base for each single, two for each double, three for each triple, and four for each home run.) Those bases were the result of 1521 hits. The Twins also drew 559 walks and had 5568 at-bats. What James found is that with just that information, he could estimate how many runs the Twins, or any other team, would score. You just do the following:

1. Add hits (H) and walks (BB), essentially the number of times the team got on base.
2. Multiply that by the team's total bases (TB), essentially the power the team displayed.
3. Divide by at-bats (AB) plus walks, essentially the team's plate appearances.
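If you'd rather let a computer do the arithmetic, the three steps fit in a few lines. This is a minimal sketch of the basic formula only, (H + BB) * TB / (AB + BB), with no stolen bases or hit-by-pitches; the function and variable names are just illustrative.

```python
def runs_created(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    times_on_base = hits + walks            # step 1: getting on base
    weighted = times_on_base * total_bases  # step 2: times power
    opportunities = at_bats + walks         # step 3: roughly plate appearances
    return weighted / opportunities


# 2010 Minnesota Twins team totals from the paragraph above
twins_rc = runs_created(hits=1521, walks=559, total_bases=2347, at_bats=5568)
actual_runs = 781
error_pct = 100 * (twins_rc - actual_runs) / actual_runs
print(f"Runs Created: {twins_rc:.1f}  actual: {actual_runs}  off by {error_pct:.1f}%")
```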

He called this value Runs Created. Go ahead and figure it out for the Twins. I’ll be over here playing Angry Birds.

The Twins actually scored 781 runs, sixteen fewer than what you just figured out using James' formula, so it was off by about 2%. If you go through the whole American League last year, every team's actual runs were within 10% of its Runs Created, and only four teams missed by more than 5%.

Now you have a way to measure runs for hitters, because players have these stats, too. If a team of players can produce that many runs with those stats, it seems fair to credit each player with the runs implied by the stats he was responsible for. So if Joe Mauer tallied 239 total bases, 167 hits, and 65 walks in 510 at-bats in 2010, you could figure out how many "runs created" he was worth:

(167 + 65) * 239 / (510 + 65).

(I won’t ruin the surprise, though I suppose someone in the comments could. Figure it out and you’ll see why sabermetric guys tend to like Joe Mauer a lot. ESPECIALLY if you take the time to figure out how many Runs Created Drew Butera is worth.)

The most striking thing about that formula is what it does NOT contain. No stolen bases. No clutch hitting. No bunting, no moving the runners over, no little things. It contains two things: getting on base and hitting for power. In fact, you could even rewrite the formula in terms of the two stats that MEASURE getting on base and power:

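On-Base Percentage * Slugging Percentage * At-Bats

(This is exact if on-base percentage is figured simply as (H + BB) / (AB + BB) and slugging percentage as TB / AB. The official on-base percentage also counts hit-by-pitches and sacrifice flies, so with it the result is close but not identical.)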
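To convince yourself the two versions match, here is a small check using Joe Mauer's 2010 line from above. It assumes the simplified on-base percentage (no hit-by-pitches or sacrifice flies); the helper names are made up for this sketch. The at-bats cancel against the slugging denominator, leaving (H + BB) * TB / (AB + BB).

```python
import math


def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def simple_obp(hits, walks, at_bats):
    """Simplified on-base percentage: ignores HBP and sacrifice flies."""
    return (hits + walks) / (at_bats + walks)


def slugging(total_bases, at_bats):
    """Slugging percentage: total bases per at-bat."""
    return total_bases / at_bats


# Joe Mauer, 2010, as quoted above
h, bb, tb, ab = 167, 65, 239, 510
product_form = simple_obp(h, bb, ab) * slugging(tb, ab) * ab
direct_form = runs_created(h, bb, tb, ab)
print(math.isclose(product_form, direct_form))  # True
```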

So where is that other stuff? It has to be somewhere, right? Well, James revised his formula to add stolen bases, and then to add being hit by a pitch. Then others took the same idea and started adding more factors, and this is where a good chunk of the alphabet soup that plagues sabermetrics came from. Each attempt tried to get a little better at predicting a team’s total runs, and then applied that formula to individual players.

But it didn’t stop with adding more stats for more precision. The next step was comparing the impact of hitters to that of pitchers. Or comparing hitters of one era (where power might be more plentiful) to hitters of another era (where speed or getting on base was more prevalent). Or major leaguers to minor leaguers. Or trying to account for defensive ability.

But almost all of these formulae have Runs Created and the Pythagorean Formula deep inside them as their engine:

1. On base times power equals runs.
2. Runs can be converted to wins.
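Here is a rough sketch of the two cornerstones chained together: Runs Created turns the Twins' batting totals into runs, and the Pythagorean Formula turns runs into a winning percentage. It assumes the Pythagorean version James originally proposed, with an exponent of 2 (later variants use other exponents, and last week's post may have used one of them). The runs-allowed figure is a made-up placeholder, not the Twins' actual total.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Cornerstone 1: on base times power equals runs."""
    return (hits + walks) * total_bases / (at_bats + walks)


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Cornerstone 2: runs converted to an expected winning percentage."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# 2010 Twins batting totals from above; runs allowed is hypothetical.
offense = runs_created(hits=1521, walks=559, total_bases=2347, at_bats=5568)
runs_allowed = 700  # placeholder value for illustration only
win_pct = pythagorean_win_pct(offense, runs_allowed)
print(f"Expected wins over 162 games: {162 * win_pct:.0f}")
```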

For better or worse, these are two of the cornerstones upon which a good chunk of sabermetric study is built.