
Did Run Production Change in 2010? June 2, 2011

Posted by tomflesher in Baseball, Economics.

Part of the narrative of last year’s season was the compelling “Year of the Pitcher” storyline prompted by an unusual number of no-hitters and perfect games. Though it’s too early in the season to say the same thing is happening this year, a few bloggers have suggested that run production is down in 2011 and we might see the same sort of story starting again.

As a quick and dirty check of this, I’d like to compare production in the 2000-2009 sample I used in a previous post to production in 2010. This introduces a few problems. First, a single year’s worth of data is a small sample, so the 2010 results may be spurious. Second, the success of the pitchers may itself be a result of the strategy used to generate runs: if pitchers get better and batting strategy doesn’t change, then pitchers are taking advantage of inefficiencies in that strategy. In that case we should see a change in the structure of run production, since the areas worked over by hitters (for example, walks and strikeouts) will see shifts in their relative importance in scoring runs.

Hypothesis: A regression model of runs against hits, doubles, triples, home runs, stolen bases, times caught stealing, walks, times hit by pitch, sacrifice bunts, and sacrifice flies using two datasets, one with team-level season-long data for each year from 2000 to 2009 and the other from 2010 only, will yield statistically similar beta coefficients.

Method: Chow test.

Result: There is a difference, significant at the 10% level but not at the 5% level (that is, with 90% but not 95% confidence). That might be a result of a change in strategy or of pitchers exploiting strategic inefficiencies.
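For readers who want the mechanics, here is a minimal Python sketch of the Chow statistic. It is not the R code referenced in this post: the helper names `ols_ssr` and `chow_f` and the toy data are assumptions made up for illustration. The idea is to fit the regression once on the pooled data and once on each sample separately, then compare sums of squared residuals: F = ((SSR_pooled - (SSR_1 + SSR_2)) / k) / ((SSR_1 + SSR_2) / (n_1 + n_2 - 2k)), where k counts the coefficients including the intercept.

```python
# Chow test sketch using only the standard library.
# All numbers are toy data, not baseball statistics.

def ols_ssr(X, y):
    """Fit y = X b by least squares (normal equations) and return the
    sum of squared residuals. Each row of X must already include a
    leading 1.0 for the intercept."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def chow_f(X1, y1, X2, y2):
    """Return (F statistic, numerator df, denominator df)."""
    k = len(X1[0])
    ssr_pooled = ols_ssr(X1 + X2, y1 + y2)   # one set of coefficients
    ssr_split = ols_ssr(X1, y1) + ols_ssr(X2, y2)  # one set per sample
    df2 = len(y1) + len(y2) - 2 * k
    f = ((ssr_pooled - ssr_split) / k) / (ssr_split / df2)
    return f, k, df2

# Toy example: intercept plus one regressor, different slope per sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X_a = [[1.0, x] for x in xs]
y_a = [2.9, 5.2, 6.8, 9.1, 11.0, 12.9]    # roughly 1 + 2x
X_b = [[1.0, x] for x in xs]
y_b = [4.1, 6.9, 10.2, 12.8, 16.1, 19.0]  # roughly 1 + 3x

f_stat, df1, df2 = chow_f(X_a, y_a, X_b, y_b)
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

The statistic is compared with an F distribution on k and n_1 + n_2 - 2k degrees of freedom. The sketch stops short of a p-value because the Python standard library has no F distribution.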

R code behind the cut.


Micah Owings and Cobb-Douglas Production July 22, 2010

Posted by tomflesher in Baseball, Economics.

Micah Owings, who is one of the best two-way players in baseball since Brooks Kieschnick, was sent down to the minors by the Cincinnati Reds yesterday. As big a fan as I am of Micah (really, look at the blog), I think this was probably the right decision.

Owings was being used as a long reliever. For a big-hitting pitcher like Micah, that’s death to begin with. Relievers need to be available to pitch, so the Reds couldn’t get their money’s worth from Owings as a pinch hitter, since he wouldn’t be available to pitch later in the game unless they put him on the mound immediately. They also weren’t getting their money’s worth as a pitcher, since, as Cincinnati.com notes, the Reds’ starting pitching was doing very well and so long relief wasn’t being used very often.

Letting Owings start in AAA will give him the best possible outcome – he’ll have regular opportunities to pitch, so he won’t rust, and he’ll get to bat at least some of the time. Owings needs to be cultivated as a batter because that’s where his comparative advantage is. I doubt he’ll ever be at the top of the rotation, but he could be a competent fifth starter. If he pitches often enough to get there, he’ll add significant value to the team in terms of his OBP above the expected pitcher. He’ll get on base more, so he’ll both advance runners and avoid making an out.

A baseball player is a factory for producing run differential. He does so using two inputs: defensive ability (pitching and fielding) and offensive ability (batting). In the National League, if a player can’t hit at all, he’s likely to produce very little in the way of run differential, but at the same time, if he’s a liability on defense, he’s not likely to be very useful either. Defense produces marginal runs by preventing opposing runners from scoring, and offense produces marginal runs by scoring runs. Having either one set to zero (in the case of a pitcher who can’t hit at all) or a negative value (an actively bad pitcher) would negatively affect the player’s run production. This is similar to a factory situation where labor and equipment are used to produce goods, and that situation is usually modeled using a Cobb-Douglas production function:

Y = z \times K^{\alpha} \times L^{1 - \alpha}

with Y = production, z = a productivity constant, K = equipment and technology, L = labor input, and \alpha a constant between 0 and 1 that represents how important K is relative to L. K might be, for example, operating expenses for a machine to produce widgets, and L might be the wages paid to the operators of the machine. This function has the nice property that if we think both inputs are equally important (that is, \alpha = .5) then, for a fixed total amount to split between the two inputs, production is maximized when the inputs are equal.
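The equal-inputs property is easy to check numerically. The sketch below is only an illustration: the budget of 10 units and the function names are assumptions, and it searches whole-number splits rather than solving the maximization analytically.

```python
def cobb_douglas(K, L, alpha=0.5, z=1.0):
    """Y = z * K^alpha * L^(1 - alpha)."""
    return z * K ** alpha * L ** (1 - alpha)

def best_split(budget, alpha):
    """Try every whole-number split of a fixed budget between K and L
    and return the (K, L) pair with the highest output."""
    splits = [(K, budget - K) for K in range(1, budget)]
    return max(splits, key=lambda s: cobb_douglas(s[0], s[1], alpha))

# Hypothetical budget of 10 units to divide between the two inputs.
equal_weights = best_split(10, alpha=0.5)
capital_heavy = best_split(10, alpha=0.7)
print(equal_weights, capital_heavy)
```

With \alpha = .5 the best split is even, and as \alpha moves toward 1 the best split tilts toward K in the same proportion.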

In general, production of run differential could be modeled using the same method. For example:

RD = P^{\alpha} \times F^{\beta} \times B^{1 - \alpha - \beta}

where P = pitching contribution, F = fielding contribution, B = batting contribution, and \alpha and \beta are both between 0 and 1 and would vary based on position. For example, David Ortiz is a designated hitter. His pitching ability is totally irrelevant, and so is his fielding ability outside of interleague games. The DH’s \alpha would be 0 and his \beta would be very close to 0. On the other hand, an American League pitcher would have an \alpha very close to 1 since pitcher fielding is not as important as pitching and his hitting is entirely inconsequential in the AL. Catchers would have \alpha at 0 but \beta much higher than other positions.

The upshot of this method of modeling production is that it shows Owings can make up for being a less than stellar pitcher by helping his team score runs and be a considerably better investment than a pitcher with a slightly lower ERA but no run production.
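To make that comparison concrete, here is a rough Python sketch of the run differential function. Everything numeric in it is made up for illustration: the inputs are treated as unitless indexes where 1.0 means league average for the position, and the weights \alpha = .7 and \beta = .1 for a National League pitcher are assumptions, not estimates.

```python
def run_differential(P, F, B, alpha, beta):
    """RD = P^alpha * F^beta * B^(1 - alpha - beta)."""
    return P ** alpha * F ** beta * B ** (1 - alpha - beta)

# Hypothetical weights for an NL pitcher; batting gets the remaining 0.2.
ALPHA, BETA = 0.7, 0.1

# A slightly below-average pitcher who hits well for his position...
hitting_pitcher = run_differential(P=0.9, F=1.0, B=2.0, alpha=ALPHA, beta=BETA)
# ...against an average pitcher who hits poorly for his position.
typical_pitcher = run_differential(P=1.0, F=1.0, B=0.5, alpha=ALPHA, beta=BETA)

print(hitting_pitcher > typical_pitcher)

# A designated hitter with alpha = beta = 0 is judged on batting alone.
dh = run_differential(P=0.0, F=0.0, B=1.3, alpha=0, beta=0)
```

Under these invented numbers the hitting pitcher comes out ahead, but the conclusion depends entirely on the weights chosen.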

Modeling Run Production June 19, 2010

Posted by tomflesher in Baseball, Economics.

A baseball team can be thought of as a factory which uses a single crew to operate two machines. The first machine produces runs while the team bats, and the second machine produces outs while the team is in the field. This is a somewhat abstract way to look at the process of winning games, because ordinarily machines have a fixed input and a fixed output. In a box factory, the input comprises man-hours and corrugated board, and the output is a finished box. Here, the input isn’t as well-defined.

Runs are a function of total bases, certainly, but total bases are functions of things like hits, home runs, and walks. Basically, runs are a function of getting on base and of advancing people who are already on base. Obviously, the best measure of getting on base is On-Base Percentage, and Slugging Average (expected number of bases per at-bat) is a good measure of advancement.
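For reference, both rates can be computed from counting stats using the standard definitions. The function and argument names below are my own, and the season line is invented purely to show the arithmetic.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base per qualifying plate appearance."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(h, doubles, triples, hr, ab):
    """Slugging average: total bases per at-bat."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab

# Invented season line, for illustration only.
season_obp = obp(h=150, bb=50, hbp=5, ab=500, sf=5)
season_slg = slg(h=150, doubles=30, triples=5, hr=20, ab=500)
print(round(season_obp, 3), round(season_slg, 3))
```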

OBP wraps up a lot of things – walks, hits, and hit-by-pitch appearances – and SLG corrects for the greater effects of doubles, triples, and home runs. That doesn’t account for a few other things, though, like stolen bases, sacrifice flies, and sacrifice hits. It also doesn’t reflect batter ability directly, but that’s okay – the stats we have should represent batter ability since the defensive side is trying to prevent run production. The model might look something like this, then:

\hat{Runs} = \hat{\beta_0} + \hat{\beta_1} OBP + \hat{\beta_2} SLG + \hat{\beta_3} SB + \hat{\beta_4} SF + \hat{\beta_5} SH

This is the simplest model we can start with: each factor contributes a fixed number of runs per unit. If we need to (and we probably will), we can add terms to capture concavity of the marginal effect of different stats, or (more likely) an interaction term for SLG and, say, SB, so that a stolen base is worth more on a team where you’re more likely to be brought home by a batter because he’s more likely to give you extra bases. As it is, however, we can test this model with linear regression. The details of it are behind the cut.
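As a sketch of what the interaction version might look like (the extra coefficient \hat{\beta_6} is an assumption for illustration, not something estimated in this post):

\hat{Runs} = \hat{\beta_0} + \hat{\beta_1} OBP + \hat{\beta_2} SLG + \hat{\beta_3} SB + \hat{\beta_4} SF + \hat{\beta_5} SH + \hat{\beta_6} (SLG \times SB)

In this form the marginal value of a stolen base is \hat{\beta_3} + \hat{\beta_6} SLG, so a steal is worth more on a harder-slugging team whenever \hat{\beta_6} > 0.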