## What is the effect of the Designated Hitter?

*May 30, 2010*

*Posted by tomflesher in Baseball.*

Tags: baseball-reference.com, designated hitter, R, regression


Intuitively, the designated hitter rule seems like it should increase scoring. By getting on base more often than the pitcher would have, the designated hitter helps produce runs by hitting, by being on base so that other players can drive him in, and by not accumulating outs by bunting or striking out as often as the pitcher does. However, there should be an offsetting effect from pitchers being left in the game longer: without the DH, a manager who needs offense may pinch-hit for his starting pitcher, and a better pitcher who remains in the game might get more outs than the reliever who would have replaced him.

Behind the cut, I’ll explain the testing I did to determine whether the effect of a DH is positive (hint: it is) and look at how big an effect is actually there.

MLB is the perfect setting for natural experiments about the DH rule for obvious reasons – the American League uses it, the National League doesn’t, and the talent pool is exactly the same. There are very few restrictions on player transfers between the leagues, so players are probably as good as randomly assigned to the leagues. With that in mind, if there is a difference between the leagues, then it can probably be attributed to the DH rule.

Using Baseball-Reference.com, I pulled this dataset of batting by league for both leagues from 1955 on (1955 being the first year for which all of B-R.com’s data was available). I changed Year to t and subtracted 1954 so that I could do a trend analysis, and added a binary variable called “DH” that takes the value 1 if the Designated Hitter rule was in use and 0 otherwise. Assuming the leagues are otherwise identical, my null hypothesis is that the coefficient on DH is zero, $\beta_{DH} = 0$; that is, the effect of the DH rule is nonexistent.
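The variable construction described above can be sketched in R as follows. This is only an illustration: the data frame name `seasons`, the column names `Year` and `Lg`, and the 2009 end year are assumptions, as is reading `tsq` as the square of `t`. The DH dummy assumes the American League from 1973 onward, the year the AL adopted the rule.

```r
# Hypothetical skeleton of the league-season data: one row per league per year.
# Column names (Year, Lg) and the 2009 end year are assumptions, not the
# author's actual file layout. No OBP values are included here.
seasons <- expand.grid(Year = 1955:2009, Lg = c("AL", "NL"),
                       stringsAsFactors = FALSE)

# Trend variable: subtracting 1954 makes 1955 the first period (t = 1).
seasons$t <- seasons$Year - 1954

# Squared trend term, assumed to be what the "tsq" regressor holds.
seasons$tsq <- seasons$t^2

# DH dummy: 1 for American League seasons from 1973 onward, 0 otherwise.
seasons$DH <- as.integer(seasons$Lg == "AL" & seasons$Year >= 1973)

# Quick look at how many league-seasons fall under each rule.
table(seasons$Lg, seasons$DH)
```

With the OBP column from the actual dataset merged in, this frame would have everything the regression needs.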

I used R to run the following regression on the data:

$$\mathrm{OBP} = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_{DH}\,\mathrm{DH} + \varepsilon$$

and got the following results:

Call:

`lm(formula = OBP ~ t + tsq + DH)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
| --- | --- | --- | --- | --- |
| -0.0219984 | -0.0041721 | 0.0003126 | 0.0048915 | 0.0187776 |

Coefficients:

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
| --- | --- | --- | --- | --- |
| (Intercept) | 0.323100 | 0.002243 | 144.055 | < 2e-16 *** |
| t | -0.000470 | 0.000188 | -2.503 | 0.013827 * |
| tsq | 0.000013 | 0.000003 | 4.039 | 0.000101 *** |
| DH | 0.008036 | 0.001677 | 4.793 | 5.27e-06 *** |

The `***` suffix indicates significance at the 0.1% level (p < 0.001), and the single `*` on t indicates significance at the 5% level. A Breusch-Pagan test for heteroskedasticity returned a BP statistic of 3.0789 and a p-value of .3796, which means we cannot reject the null hypothesis of homoskedasticity (that is, the usual standard errors and t-tests are appropriate for this data).
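As a sanity check on the Breusch-Pagan result, the reported p-value can be recovered from the reported statistic. The sketch below assumes the test statistic follows a chi-squared distribution with 3 degrees of freedom, one per regressor (t, tsq, DH). The commented-out lines show roughly how the test could be run with the `lmtest` package; `league_batting` is a hypothetical name for the data frame, which is not reproduced here.

```r
# Values reported in the post.
bp_stat <- 3.0789

# Assumption: one degree of freedom per regressor in the model (t, tsq, DH).
bp_df <- 3

# Upper-tail chi-squared probability; should land close to the reported .3796.
p_value <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(round(p_value, 4))

# With the real data in a data frame (hypothetically named league_batting)
# and the lmtest package installed, the test itself would look roughly like:
# fit <- lm(OBP ~ t + tsq + DH, data = league_batting)
# lmtest::bptest(fit)
```

A p-value this large is well above any conventional cutoff, which is why homoskedasticity is not rejected.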

Across MLB, the fitted time trend in OBP dips through the early 1970s and rises afterward (the coefficient on t is negative and the coefficient on tsq is positive), and the DH rule adds roughly .008 to the league’s average OBP after accounting for that trend. An increase of .008 is 0.8 percentage points, meaning you’d get slightly less than one additional trip to first in 100 plate appearances. Assuming a leaguewide mean of 38.5 plate appearances per team per game, that translates to about .3 extra trips to first per game.
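The arithmetic in the paragraph above can be reproduced from the reported coefficients. This is a small sketch using only numbers from the regression output and the 38.5 plate appearances per game stated above; the variable names are illustrative. Because the printed coefficients are rounded, the turning point of the trend is approximate.

```r
# Coefficients as printed in the regression output (rounded).
b_t   <- -0.000470
b_tsq <-  0.000013
b_dh  <-  0.008036

# Plate appearances per team per game, as assumed in the post.
pa_per_game <- 38.5

# Extra times on base per 100 plate appearances attributable to the DH.
extra_per_100_pa <- b_dh * 100          # 0.8036

# Extra times on base per team per game.
extra_per_game <- b_dh * pa_per_game    # about 0.31

# Turning point of the quadratic trend: where b_t + 2 * b_tsq * t = 0.
turning_t    <- -b_t / (2 * b_tsq)      # about 18
turning_year <- 1954 + turning_t        # roughly 1972

print(c(per_100_pa = extra_per_100_pa,
        per_game = extra_per_game,
        turning_year = turning_year))
```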

Link to the dataset appears to be broken.

Sorry about that. I recently switched hosts and WordPress.com is somewhat restrictive in the way files can be shared. I’m in the process of fixing the old links – the data should be available now.