Diagnosing the AL
December 22, 2010
Posted by tomflesher in Baseball, Economics. Tags: 2010, American League, baseball-reference.com, R, regression, statistics, Year of the Pitcher
In the previous post, I crunched some numbers on an earlier forecast of mine and found that it was a pretty crappy forecast. (That's the fun of forecasting, of course: sometimes you're right and sometimes you're wrong.) The interesting part, though, is that the observed home runs per game for the American League came in so far off (3.4 standard errors below the predicted value) that it's highly unlikely the regression model I used controls for all relevant variables. That's not surprising, since the model was only a time trend (plus its square) with a dummy variable for the designated hitter.
There are a few things to check immediately. The first is the most common explanation thrown around when home runs drop: steroids. If the drop in home runs were due to better control of performance-enhancing drugs, the effect should be concentrated in home runs and other measures of power. Specifically, intentional walks should probably come in below expectation, since intentional walks are used to pitch around home run hitters. Unintentional walks should be about as expected, since walks are a function of plate discipline and pitcher control, not of strength. On-base percentage should drop by fewer standard errors than home runs, since some balls that would have been home runs will stay in the park as singles, doubles, or triples rather than all becoming fly-outs; there will be a drop, but it won't be as big. Finally, slugging average should drop, because a loss of power without a corresponding gain in speed lowers total bases.
I analyze each of these below with R.
Using R, I fitted time-series models of the same functional form as the home runs per game model. I pulled the data from the Baseball-Reference.com AL Batting Encyclopedia and regressed the variable of interest on a time trend, its square, and a dummy for the designated hitter.
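A minimal R sketch of that setup follows. The function names `add_trend_terms` and `fit_trend` are hypothetical, and so are two details the post does not state: that the data frame has a `Year` column, and that `t` is the season minus 1954 (inferred from the transcripts below, where 2010 enters as `t = 56`). The DH dummy switches on in 1973, the first season the AL used the designated hitter.

```r
# Sketch: build the trend regressors used in the models below and fit one of them.
# Assumptions (not from the original post): the data frame has a Year column,
# t is centered at 1954 so that 2010 maps to t = 56 as in the transcripts,
# and DH = 1 from 1973, the AL's first designated-hitter season.
add_trend_terms <- function(df) {
  df$t   <- df$Year - 1954                 # assumed centering; 2010 -> 56
  df$tsq <- df$t^2                         # quadratic trend term
  df$DH  <- as.numeric(df$Year >= 1973)    # designated-hitter dummy
  df
}

# Fit response ~ t + tsq + DH for any column name, e.g. "IBB" or "SLG".
fit_trend <- function(df, response) {
  lm(reformulate(c("t", "tsq", "DH"), response = response), data = df)
}

# Example with a hypothetical range of seasons (no real data attached).
seasons <- add_trend_terms(data.frame(Year = 1901:2010))
```

With a real data frame `al` that has, say, an `IBB` column, `fit_trend(add_trend_terms(al), "IBB")` should match the `lm(IBB ~ t + tsq + DH)` call in the transcript, provided the assumed centering is the one actually used.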
First Assumption: Intentional walks should decrease.
Results:
```
> ibb.lm <- lm(IBB ~ t + tsq + DH)
> summary(ibb.lm)

Call:
lm(formula = IBB ~ t + tsq + DH)

Residuals:
       Min         1Q     Median         3Q        Max
-0.1350376 -0.0261969  0.0005516  0.0294412  0.1534536

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.656e-01  1.408e-02  18.870  < 2e-16 ***
t            8.037e-03  1.199e-03   6.706 1.01e-09 ***
tsq         -1.393e-04  2.024e-05  -6.882 4.30e-10 ***
DH          -1.140e-01  1.055e-02 -10.805  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04689 on 106 degrees of freedom
Multiple R-squared: 0.5961, Adjusted R-squared: 0.5847
F-statistic: 52.14 on 3 and 106 DF,  p-value: < 2.2e-16

> ibb.2010.fitted <- (2.656e-01) + (8.037e-03)*56 + (-1.393e-04)*(56**2) + (-1.140e-01)
> ibb.2010.obs <- .2
> residual.ibb <- ibb.2010.obs - ibb.2010.fitted
> se.ibb <- .04689
> residual.ibb/se.ibb
[1] 0.750113
```
Intentional walks per game came in above the model's prediction rather than below it, but by less than one standard error (a standardized residual of 0.75). Statistically, intentional walks were in line with expectations.
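The 2010 fitted values in these transcripts are computed by typing the printed coefficients back in by hand. A small helper (hypothetical, not part of the original code) does the same arithmetic from a coefficient vector ordered as intercept, `t`, `tsq`, `DH`. With the fitted model in hand you could pass `coef(ibb.lm)` and `summary(ibb.lm)$sigma` instead of retyped numbers, which would also avoid the rounding in the printed estimates.

```r
# Standardized residual: (observed - fitted) / residual standard error,
# where fitted = b0 + b1*t + b2*t^2 + b3*DH, matching the hand calculation above.
standardized_residual <- function(coefs, t, dh, observed, sigma) {
  x <- c(1, t, t^2, dh)        # regressor vector in coefficient order
  fitted <- sum(coefs * x)
  (observed - fitted) / sigma
}

# Coefficients as printed in the IBB summary (intercept, t, tsq, DH).
ibb_coefs <- c(2.656e-01, 8.037e-03, -1.393e-04, -1.140e-01)
z_ibb <- standardized_residual(ibb_coefs, t = 56, dh = 1,
                               observed = 0.2, sigma = 0.04689)
print(round(z_ibb, 4))   # [1] 0.7501
```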
Second Assumption: Unintentional walks should not change.
Results:
```
> uBB <- (BB-IBB)
> ubb.lm <- lm(uBB ~ t + tsq + DH)
> summary(ubb.lm)

Call:
lm(formula = uBB ~ t + tsq + DH)

Residuals:
     Min       1Q   Median       3Q      Max
-0.69256 -0.12758 -0.01390  0.13178  0.77866

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.0879505  0.0732669  42.147  < 2e-16 ***
t           -0.0190285  0.0062392  -3.050 0.002892 **
tsq          0.0003623  0.0001054   3.439 0.000837 ***
DH           0.1812598  0.0549094   3.301 0.001313 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2441 on 106 degrees of freedom
Multiple R-squared: 0.1876, Adjusted R-squared: 0.1647
F-statistic: 8.162 on 3 and 106 DF,  p-value: 6.127e-05

> ubb.2010.fitted <- 3.0879505 + (-.0190285)*56 + (.0003623)*(56**2) + .1812598
> ubb.2010.obs <- 3.25 - .2
> residual.ubb <- ubb.2010.obs - ubb.2010.fitted
> se.ubb <- .2441
> residual.ubb/se.ubb
[1] -1.187166
```
Unintentional walks came in a bit more than one standard error below the prediction (a standardized residual of -1.19). Again, that isn't a large enough deviation to call statistically different from our expectation.
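To put a number on "not statistically different," one option (an addition here, not the author's method) is to treat each standardized residual as a t statistic with the models' 106 residual degrees of freedom and compute a two-sided p-value. Because it divides by the residual standard error alone, this approximation if anything overstates significance, so p-values above 0.05 support the conclusions for both walk measures.

```r
# Two-sided p-values for the walk residuals reported above, treating each
# standardized residual as t-distributed with the models' 106 residual df.
# Approximation: coefficient uncertainty is ignored.
z <- c(IBB = 0.750113, uBB = -1.187166)
p_two_sided <- 2 * pt(-abs(z), df = 106)
print(round(p_two_sided, 3))
```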
Third Assumption: OBP drops, but by somewhat less than 3.4 standard errors.
Results:
```
> obp.lm <- lm(OBP ~ t + tsq + DH)
> summary(obp.lm)

Call:
lm(formula = OBP ~ t + tsq + DH)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0217348 -0.0044903  0.0002799  0.0046695  0.0182481

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.238e-01  2.230e-03 145.199  < 2e-16 ***
t           -5.703e-04  1.899e-04  -3.003  0.00334 **
tsq          1.472e-05  3.207e-06   4.591 1.22e-05 ***
DH           8.245e-03  1.671e-03   4.933 3.02e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.00743 on 106 degrees of freedom
Multiple R-squared: 0.487, Adjusted R-squared: 0.4724
F-statistic: 33.54 on 3 and 106 DF,  p-value: 2.532e-15

> obp.2010.fitted <- (3.238e-01) + (-5.703e-04)*56 + (1.472e-05)*(56**2) + 8.245e-03
> obp.2010.obs <- .327
> residual.obp <- obp.2010.obs - obp.2010.fitted
> se.obp <- .00743
> residual.obp/se.obp
[1] -2.593556
```
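OBP came in 2.59 standard errors below the prediction. That is less than the 3.4 standard errors for home runs, as expected, but it is still a large drop. Without more information it's hard to judge whether a change of this magnitude is due to better pitching or to power being taken away from hitters.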
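The standardized residuals in this post divide by the residual standard error, which reflects year-to-year noise but not uncertainty in the estimated coefficients. That uncertainty grows toward the edge of the sample, where 2010 sits. The sketch below shows one way to fold it in with `predict(..., se.fit = TRUE)`. It runs on synthetic data generated to loosely resemble the OBP model, not on the Baseball-Reference series, so its numbers say nothing about 2010; `obp_sim`, the seed, the season range, and the simulation parameters are all made up for illustration.

```r
# Sketch on SYNTHETIC data: residual SE alone vs. full prediction SE
# for a season outside the fitting sample.
set.seed(1)
sim <- data.frame(t = -53:55)              # hypothetical seasons, centered as assumed earlier
sim$tsq <- sim$t^2
sim$DH  <- as.numeric(sim$t >= 19)         # 1973 - 1954 = 19
sim$obp_sim <- 0.324 - 5.7e-4 * sim$t + 1.5e-5 * sim$tsq +
  8.2e-3 * sim$DH + rnorm(nrow(sim), sd = 0.0074)

fit <- lm(obp_sim ~ t + tsq + DH, data = sim)
new_season <- data.frame(t = 56, tsq = 56^2, DH = 1)   # the 2010 row

pred <- predict(fit, newdata = new_season, se.fit = TRUE)
resid_se <- summary(fit)$sigma
pred_se  <- sqrt(pred$se.fit^2 + resid_se^2)   # SE for a new observation

observed <- 0.327
z_resid_only <- (observed - pred$fit) / resid_se   # the post's approach
z_prediction <- (observed - pred$fit) / pred_se    # includes coefficient uncertainty
```

Because the prediction standard error is always at least as large as the residual standard error, standardized residuals computed this way are somewhat smaller in magnitude than ones that divide by the residual standard error alone.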
Fourth Assumption: Slugging average will drop.
Results:
```
> slg.lm <- lm(SLG ~ t + tsq + DH)
> summary(slg.lm)

Call:
lm(formula = SLG ~ t + tsq + DH)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0357646 -0.0087050 -0.0007988  0.0115133  0.0317497

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.937e-01  4.471e-03  88.050  < 2e-16 ***
t           -2.058e-03  3.807e-04  -5.404 4.04e-07 ***
tsq          5.049e-05  6.429e-06   7.853 3.51e-12 ***
DH           1.693e-02  3.351e-03   5.054 1.82e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01489 on 106 degrees of freedom
Multiple R-squared: 0.6452, Adjusted R-squared: 0.6352
F-statistic: 64.27 on 3 and 106 DF,  p-value: < 2.2e-16

> slg.2010.fitted <- (3.937e-01) + (-2.058e-03)*56 + (5.049e-05)*(56**2) + (1.693e-02)
> slg.2010.obs <- .407
> residual.slg <- slg.2010.obs - slg.2010.fitted
> se.slg <- .01489
> residual.slg/se.slg
[1] -3.137585
```
Slugging average came in more than three standard errors (3.14) below the prediction, which suggests that something has either sapped hitters' power specifically or hurt their ability to hit in general. These results are consistent with both explanations.
This isn't evidence that reduced steroid use is behind the drop. In fact, the same results would be consistent with a shift in talent toward pitching. More work needs to be done on this year's data before conclusions can be drawn. However, the results do suggest that, at least in the American League, the Year of the Pitcher narrative has some statistical foundation.
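Pulling the numbers together makes the pattern easier to see. The sketch below tabulates the standardized residuals reported in this post, plus the -3.4 for home runs from the previous post, next to the direction predicted at the start; the layout, column names, and the two-standard-error cutoff are choices made here, not the author's.

```r
# Standardized 2010 residuals as reported in this post (HR from the previous post),
# alongside the expectation stated before running the regressions.
results <- data.frame(
  stat     = c("HR", "IBB", "uBB", "OBP", "SLG"),
  expected = c("drop", "drop", "no change", "smaller drop", "drop"),
  z        = c(-3.4, 0.750113, -1.187166, -2.593556, -3.137585),
  stringsAsFactors = FALSE
)
# Flag deviations beyond two standard errors (a rough 5% cutoff).
results$beyond_2se <- abs(results$z) > 2

# Order from largest drop to largest rise.
results <- results[order(results$z), ]
print(results, digits = 3)
```

Sorted this way, the three power-sensitive measures (HR, SLG, OBP) all sit beyond two standard errors below trend, while both walk measures stay within it.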