Did Run Production Change in 2010?
June 2, 2011
Posted by tomflesher in Baseball, Economics. Tags: Chow test, run production, Year of the Pitcher
Part of the narrative of last year’s season was the compelling “Year of the Pitcher” storyline prompted by an unusual number of no-hitters and perfect games. Though it’s too early in the season to say the same thing is happening this year, a few bloggers have suggested that run production is down in 2011 and we might see the same sort of story starting again.
As a quick and dirty check of this, I’d like to compare run production in the 2000-2009 sample I used in a previous post with run production in 2010. This introduces a few problems. First, a single season is a small sample, so the 2010 estimates may be spurious. Second, the success of the pitchers may itself be a result of the strategy teams use to generate runs. That is, if pitchers get better and batting strategy doesn’t change, then pitchers are taking advantage of inefficiencies in that strategy. In that case we should see a change in the structure of run production, since the areas worked over by hitters (walks and strikeouts, for example) will shift in their relative importance in scoring runs.
Hypothesis: A regression of runs on hits, doubles, triples, home runs, stolen bases, times caught stealing, walks, strikeouts, times hit by pitch, sacrifice bunts, and sacrifice flies, fit to two datasets (team-level, season-long American League data for each year from 2000 to 2009, and the same data for 2010 only), will yield statistically similar beta coefficients.
Method: Chow test.
Result: There is a difference, significant at the 90% but not 95% level. That might be a result of a change in strategy or of pitchers exploiting strategic inefficiencies.
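For reference, a common textbook statement of the Chow statistic is sketched below. The notation is an assumption introduced here, not taken from the analysis itself: S_p is the residual sum of squares of the pooled regression, S_1 and S_2 are the residual sums of squares of the two separate regressions, N_1 and N_2 are their sample sizes, and k is the number of coefficients being compared.

```latex
F = \frac{\bigl(S_p - (S_1 + S_2)\bigr) / k}{(S_1 + S_2) / (N_1 + N_2 - 2k)}
```

Under the null hypothesis of identical coefficients, F follows an F distribution with k and N_1 + N_2 - 2k degrees of freedom.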
R code and output follow.
The AL 2000-2009 regression is in the linked post; its residual sum of squares, 68548, enters the Chow calculation below.
> al2010.lm <- lm(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(al2010.lm)

Call:
lm(formula = R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)

Residuals:
      1       2       3       4       5       6       7       8       9      10
 1.6811 -0.4015 -4.7134 -4.4748  2.3670 -0.6163 10.9223 -5.9141  3.6778  8.7488
     11      12      13      14
-6.5535 -1.8978 -1.3545 -1.4711

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -143.38548  216.54661  -0.662   0.5760
H             -0.02621    0.14614  -0.179   0.8742
X2B            0.39989    0.22205   1.801   0.2135
X3B            2.64850    1.11995   2.365   0.1418
HR             1.60310    0.18286   8.767   0.0128 *
SB            -0.59883    0.32574  -1.838   0.2074
CS            -0.34655    0.92426  -0.375   0.7437
BB             0.34067    0.18183   1.874   0.2018
SO            -0.03886    0.09155  -0.424   0.7125
HBP            1.68038    0.76545   2.195   0.1593
SH             5.12139    1.71244   2.991   0.0960 .
SF             2.72034    1.26899   2.144   0.1653
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.15 on 2 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.9802
F-statistic: 59.46 on 11 and 2 DF, p-value: 0.01665

> al2010.aov <- aov(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(al2010.aov)
            Df Sum Sq Mean Sq  F value   Pr(>F)
H            1  36454   36454 210.6561 0.004714 **
X2B          1  22048   22048 127.4102 0.007757 **
X3B          1   6688    6688  38.6481 0.024912 *
HR           1  33354   33354 192.7387 0.005148 **
SB           1   5595    5595  32.3342 0.029562 *
CS           1      1       1   0.0084 0.935262
BB           1   7329    7329  42.3497 0.022808 *
SO           1     52      52   0.2983 0.639733
HBP          1     10      10   0.0592 0.830500
SH           1    855     855   4.9422 0.156254
SF           1    795     795   4.5955 0.165277
Residuals    2    346     173
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> altotal.lm <- lm(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(altotal.lm)

Call:
lm(formula = R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)

Residuals:
    Min      1Q  Median      3Q     Max
-51.037 -13.182  -3.234  14.527  48.900

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -423.13734   54.39431  -7.779 1.36e-12 ***
H              0.47931    0.03674  13.044  < 2e-16 ***
X2B            0.25617    0.09160   2.797 0.005879 **
X3B            1.03417    0.24231   4.268 3.58e-05 ***
HR             0.95164    0.06822  13.950  < 2e-16 ***
SB             0.15907    0.08224   1.934 0.055070 .
CS            -0.37668    0.22847  -1.649 0.101428
BB             0.35750    0.03157  11.324  < 2e-16 ***
SO            -0.05014    0.02363  -2.122 0.035574 *
HBP            0.55322    0.14781   3.743 0.000264 ***
SH             0.44079    0.19665   2.241 0.026546 *
SF             0.65236    0.26675   2.446 0.015684 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.56 on 142 degrees of freedom
Multiple R-squared: 0.9248, Adjusted R-squared: 0.919
F-statistic: 158.8 on 11 and 142 DF, p-value: < 2.2e-16

> altotal.aov <- aov(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(altotal.aov)
             Df Sum Sq Mean Sq   F value    Pr(>F)
H             1 595470  595470 1072.4898 < 2.2e-16 ***
X2B           1  33266   33266   59.9146 1.689e-12 ***
X3B           1   8866    8866   15.9678 0.0001031 ***
HR            1 214542  214542  386.4084 < 2.2e-16 ***
SB            1   4296    4296    7.7368 0.0061460 **
CS            1   5465    5465    9.8435 0.0020727 **
BB            1  90402   90402  162.8211 < 2.2e-16 ***
SO            1   3049    3049    5.4914 0.0204978 *
HBP           1   7850    7850   14.1386 0.0002475 ***
SH            1   3412    3412    6.1461 0.0143408 *
SF            1   3321    3321    5.9810 0.0156843 *
Residuals   142  78842     555
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> # Chow test: 78842 = pooled RSS, 346 = 2010 RSS, 68548 = 2000-2009 RSS
> numerator <- (78842 - (346 + 68548))/11
> denominator <- (346 + 68548)/(140 + 14 - 22)
> numerator/denominator
[1] 1.732749
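The hand calculation above can also be wrapped in a small helper that reads the residual sums of squares directly from fitted models. This is only a sketch: `chow_f` and the toy data frame are hypothetical names invented for illustration, and the data are simulated, not baseball data. By default the helper counts every coefficient of the pooled fit, intercept included, which is a common textbook convention; passing `k = 11` would instead mirror the divisor used in the calculation above.

```r
# Chow F statistic from three fitted lm objects (hypothetical helper).
# k = number of coefficients compared; defaults to all coefficients,
# including the intercept, of the pooled model.
chow_f <- function(pooled, fit_a, fit_b, k = length(coef(pooled))) {
  rss_pooled <- deviance(pooled)                   # RSS of the pooled fit
  rss_split  <- deviance(fit_a) + deviance(fit_b)  # RSS of the separate fits
  df2        <- nobs(fit_a) + nobs(fit_b) - 2 * k  # denominator df
  f          <- ((rss_pooled - rss_split) / k) / (rss_split / df2)
  c(F = f, df1 = k, df2 = df2, p = pf(f, k, df2, lower.tail = FALSE))
}

# Simulated toy data: the slope on x1 shifts in group "b".
set.seed(1)
toy <- data.frame(x1 = rnorm(60), x2 = rnorm(60),
                  grp = rep(c("a", "b"), each = 30))
toy$y <- 1 + 2 * toy$x1 - toy$x2 +
  ifelse(toy$grp == "b", 1.5 * toy$x1, 0) + rnorm(60)

pooled <- lm(y ~ x1 + x2, data = toy)
fit_a  <- lm(y ~ x1 + x2, data = subset(toy, grp == "a"))
fit_b  <- lm(y ~ x1 + x2, data = subset(toy, grp == "b"))

res <- chow_f(pooled, fit_a, fit_b)
print(res)
```

With the default `k`, the statistic should agree with the F test from `anova()` comparing the pooled model against a model that interacts every regressor with the group indicator, which offers a convenient cross-check.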
Chow’s test yields an F value of approximately 1.73. On 11 and 132 degrees of freedom, the critical value for 90% significance is 1.62. The critical value for 95% significance is 1.86. Since the F value falls between these critical values, it is significant at the 90% level but not 95%.
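The critical values quoted above can be checked with R's built-in F distribution functions, `qf` and `pf`. The following is a minimal sketch that reuses only the sums of squares and degrees of freedom from the calculation above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom as used in the calculation above.
rss_pooled <- 78842          # pooled 2000-2010 fit
rss_split  <- 346 + 68548    # 2010 fit plus 2000-2009 fit
df1 <- 11                    # numerator degrees of freedom
df2 <- 140 + 14 - 22         # denominator degrees of freedom (132)

f_stat <- ((rss_pooled - rss_split) / df1) / (rss_split / df2)

crit_90 <- qf(0.90, df1, df2)   # critical value at the 90% level
crit_95 <- qf(0.95, df1, df2)   # critical value at the 95% level
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)  # upper-tail p-value

print(round(c(F = f_stat, crit_90 = crit_90, crit_95 = crit_95, p = p_value), 4))
```

Reporting the p-value alongside the two critical values shows where the statistic falls between the 90% and 95% thresholds, not just which side of each it is on.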