
Did Run Production Change in 2010?

June 2, 2011

Posted by tomflesher in Baseball, Economics.

Part of the narrative of last year’s season was the compelling “Year of the Pitcher” storyline prompted by an unusual number of no-hitters and perfect games. Though it’s too early in the season to say the same thing is happening this year, a few bloggers have suggested that run production is down in 2011 and we might see the same sort of story starting again.

As a quick and dirty check, I’d like to compare run production in the 2000-2009 American League sample I used in a previous post with production in 2010. This approach has two problems. First, one season of data gives only 14 team observations, so the 2010 estimates may be spurious. Second, the pitchers’ success may itself be a response to the strategy hitters use to generate runs. That is, if pitchers get better while batting strategy stays the same, we should see pitchers exploiting inefficiencies in that strategy. In that case the structure of run production should change, because the outcomes decided in the pitcher-hitter matchup, such as walks and strikeouts, will shift in their relative importance in scoring runs.

Hypothesis: A regression of runs on hits, doubles, triples, home runs, stolen bases, times caught stealing, walks, strikeouts, times hit by pitch, sacrifice bunts, and sacrifice flies, estimated on two American League datasets (team-level season totals for each year from 2000 to 2009, and for 2010 alone), will yield statistically similar beta coefficients.
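Written out, the model and the null hypothesis look roughly like this. The notation is mine, not from the original analysis: 2B and 3B stand for the X2B and X3B columns in the R output, H is hits, and j indexes the 12 coefficients including the intercept.

```latex
R = \beta_0 + \beta_1\,\mathrm{H} + \beta_2\,\mathrm{2B} + \beta_3\,\mathrm{3B} + \beta_4\,\mathrm{HR}
  + \beta_5\,\mathrm{SB} + \beta_6\,\mathrm{CS} + \beta_7\,\mathrm{BB} + \beta_8\,\mathrm{SO}
  + \beta_9\,\mathrm{HBP} + \beta_{10}\,\mathrm{SH} + \beta_{11}\,\mathrm{SF} + \varepsilon

H_0:\ \beta_j^{(2000\text{--}2009)} = \beta_j^{(2010)} \quad \text{for all } j = 0, 1, \dots, 11
```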

Method: Chow test.
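The Chow test compares the residual sum of squares (SSR) of one pooled regression with the combined SSR of separate regressions on each sample. Below is a minimal R sketch under my own assumptions: `chow_test` is a hypothetical helper (not a base R or package function), k counts every coefficient including the intercept, and the data are simulated rather than baseball numbers. As a cross-check, the same F statistic comes out of `anova()` comparing the pooled model with a fully interacted one.

```r
# Hypothetical helper: Chow test from a pooled fit and two sub-sample fits.
chow_test <- function(pooled, fit1, fit2) {
  k     <- length(coef(pooled))                 # coefficients per regression, intercept included
  ssr_p <- deviance(pooled)                     # residual sum of squares, pooled fit
  ssr_s <- deviance(fit1) + deviance(fit2)      # combined residual SS of the separate fits
  df2   <- nobs(fit1) + nobs(fit2) - 2 * k      # denominator degrees of freedom
  f     <- ((ssr_p - ssr_s) / k) / (ssr_s / df2)
  list(F = f, df1 = k, df2 = df2,
       p.value = pf(f, k, df2, lower.tail = FALSE))
}

# Simulated demo data (not baseball data): two periods with different slopes.
set.seed(1)
n <- 40
d <- data.frame(x = rnorm(2 * n),
                period = factor(rep(c("early", "late"), each = n)))
d$y <- 1 + ifelse(d$period == "early", 2, 2.5) * d$x + rnorm(2 * n)

pooled <- lm(y ~ x, data = d)
early  <- lm(y ~ x, data = subset(d, period == "early"))
late   <- lm(y ~ x, data = subset(d, period == "late"))
res    <- chow_test(pooled, early, late)
res$F

# Same F as comparing the pooled model with a fully interacted model.
full <- lm(y ~ x * period, data = d)
anova(pooled, full)$F[2]
```

The interaction form is often more convenient in practice, since it gives the Chow statistic and its p-value in one `anova()` call.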

Result: There is a difference, significant at the 90% but not 95% level. That might be a result of a change in strategy or of pitchers exploiting strategic inefficiencies.

The R code follows.

The AL 2000-2009 regression is in the linked post.

> al2010.lm <- lm(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(al2010.lm)

Call:
lm(formula = R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP +
    SH + SF)

Residuals:
      1       2       3       4       5       6       7       8       9      10
 1.6811 -0.4015 -4.7134 -4.4748  2.3670 -0.6163 10.9223 -5.9141  3.6778  8.7488
     11      12      13      14
-6.5535 -1.8978 -1.3545 -1.4711

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -143.38548  216.54661  -0.662   0.5760
H             -0.02621    0.14614  -0.179   0.8742
X2B            0.39989    0.22205   1.801   0.2135
X3B            2.64850    1.11995   2.365   0.1418
HR             1.60310    0.18286   8.767   0.0128 *
SB            -0.59883    0.32574  -1.838   0.2074
CS            -0.34655    0.92426  -0.375   0.7437
BB             0.34067    0.18183   1.874   0.2018
SO            -0.03886    0.09155  -0.424   0.7125
HBP            1.68038    0.76545   2.195   0.1593
SH             5.12139    1.71244   2.991   0.0960 .
SF             2.72034    1.26899   2.144   0.1653
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.15 on 2 degrees of freedom
Multiple R-squared: 0.997,      Adjusted R-squared: 0.9802
F-statistic: 59.46 on 11 and 2 DF,  p-value: 0.01665
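The residual standard error reported by `summary()` is the square root of SSR divided by the residual degrees of freedom, so the 2010 residual sum of squares used in the Chow calculation can be recovered from the figures printed above. With 14 teams and 12 estimated coefficients, only 2 residual degrees of freedom remain, which is a big part of why the 2010 estimates are so noisy. A quick arithmetic check, using values copied from the output (so rounded as printed):

```r
# Figures copied from the 2010 summary() output above.
n_obs  <- 14               # AL teams in 2010 (residuals 1-14)
n_coef <- 12               # intercept plus 11 slopes
df_res <- n_obs - n_coef   # residual degrees of freedom
rse    <- 13.15            # "Residual standard error"
ssr    <- rse^2 * df_res   # residual sum of squares implied by the RSE
ssr
```

The result, about 345.8, agrees up to rounding with the 346 residual sum of squares in the aov table.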

> al2010.aov <- aov(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(al2010.aov)
            Df Sum Sq Mean Sq  F value   Pr(>F)
H            1  36454   36454 210.6561 0.004714 **
X2B          1  22048   22048 127.4102 0.007757 **
X3B          1   6688    6688  38.6481 0.024912 *
HR           1  33354   33354 192.7387 0.005148 **
SB           1   5595    5595  32.3342 0.029562 *
CS           1      1       1   0.0084 0.935262
BB           1   7329    7329  42.3497 0.022808 *
SO           1     52      52   0.2983 0.639733
HBP          1     10      10   0.0592 0.830500
SH           1    855     855   4.9422 0.156254
SF           1    795     795   4.5955 0.165277
Residuals    2    346     173
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
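Note that `summary(aov(...))` reports sequential (Type I) sums of squares: each term is credited only with the variation it explains after the terms listed before it. That is why H looks highly significant in this table (p = 0.0047) but not in the lm summary (p = 0.87). A small sketch on simulated data (not the baseball data) shows that the sequential sums of squares add up to the total and that reordering correlated predictors changes how the shared variation is split:

```r
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1
y  <- 1 + x1 + x2 + rnorm(n)

# Sequential sums of squares for two orderings of the same predictors.
ss_order1 <- summary(aov(y ~ x1 + x2))[[1]][["Sum Sq"]]
ss_order2 <- summary(aov(y ~ x2 + x1))[[1]][["Sum Sq"]]

# Terms plus residuals add up to the total sum of squares about the mean.
total_ss <- sum((y - mean(y))^2)
all.equal(sum(ss_order1), total_ss)

# The first term listed absorbs the shared variation, so the splits differ.
ss_order1[1:2]
ss_order2[1:2]
```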

The pooled regression below uses all AL team-seasons from 2000 through 2010 (140 + 14 = 154 observations):

> altotal.lm <- lm(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(altotal.lm)

Call:
lm(formula = R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP +
    SH + SF)

Residuals:
    Min      1Q  Median      3Q     Max
-51.037 -13.182  -3.234  14.527  48.900

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -423.13734   54.39431  -7.779 1.36e-12 ***
H              0.47931    0.03674  13.044  < 2e-16 ***
X2B            0.25617    0.09160   2.797 0.005879 **
X3B            1.03417    0.24231   4.268 3.58e-05 ***
HR             0.95164    0.06822  13.950  < 2e-16 ***
SB             0.15907    0.08224   1.934 0.055070 .
CS            -0.37668    0.22847  -1.649 0.101428
BB             0.35750    0.03157  11.324  < 2e-16 ***
SO            -0.05014    0.02363  -2.122 0.035574 *
HBP            0.55322    0.14781   3.743 0.000264 ***
SH             0.44079    0.19665   2.241 0.026546 *
SF             0.65236    0.26675   2.446 0.015684 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.56 on 142 degrees of freedom
Multiple R-squared: 0.9248,     Adjusted R-squared: 0.919
F-statistic: 158.8 on 11 and 142 DF,  p-value: < 2.2e-16

> altotal.aov <- aov(R ~ H + X2B + X3B + HR + SB + CS + BB + SO + HBP + SH + SF)
> summary(altotal.aov)
             Df Sum Sq Mean Sq   F value    Pr(>F)
H             1 595470  595470 1072.4898 < 2.2e-16 ***
X2B           1  33266   33266   59.9146 1.689e-12 ***
X3B           1   8866    8866   15.9678 0.0001031 ***
HR            1 214542  214542  386.4084 < 2.2e-16 ***
SB            1   4296    4296    7.7368 0.0061460 **
CS            1   5465    5465    9.8435 0.0020727 **
BB            1  90402   90402  162.8211 < 2.2e-16 ***
SO            1   3049    3049    5.4914 0.0204978 *
HBP           1   7850    7850   14.1386 0.0002475 ***
SH            1   3412    3412    6.1461 0.0143408 *
SF            1   3321    3321    5.9810 0.0156843 *
Residuals   142  78842     555
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> numerator <- (78842 - (346 + 68548))/11      # (pooled SSR - (2010 SSR + 2000-2009 SSR)) / 11
> denominator <- (346 + 68548)/(140 + 14 - 22)  # combined SSR / (140 + 14 observations - 22)
> numerator/denominator
[1] 1.732749
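In the usual textbook statement of Chow's test, k counts every coefficient allowed to differ between the samples, intercept included. For this model that would be 12 rather than 11 (the 2010 fit has 14 observations and 2 residual degrees of freedom), giving 12 and 130 degrees of freedom instead of 11 and 132. The helper below is a sketch with the hypothetical name `chow_f`; it takes the SSR figures from the transcript and shows how much the choice of k matters.

```r
# Hypothetical helper: Chow F statistic from residual sums of squares.
# k = number of coefficients restricted to be equal across the two samples.
chow_f <- function(ssr_pooled, ssr_1, ssr_2, n_1, n_2, k) {
  ssr_split <- ssr_1 + ssr_2                    # SSR when each sample gets its own fit
  num <- (ssr_pooled - ssr_split) / k           # cost of imposing equal coefficients
  den <- ssr_split / (n_1 + n_2 - 2 * k)        # unrestricted error variance estimate
  num / den
}

# SSRs: pooled 2000-2010, 2000-2009 (earlier post), 2010.
chow_f(78842, 68548, 346, n_1 = 140, n_2 = 14, k = 11)  # as calculated above
chow_f(78842, 68548, 346, n_1 = 140, n_2 = 14, k = 12)  # counting the intercept
```

With k = 12 the statistic drops to about 1.56, close to the 90% cutoff, so the borderline result is worth rechecking with `qf()` on 12 and 130 degrees of freedom.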


Chow’s test yields an F value of approximately 1.73. On 11 and 132 degrees of freedom, the critical value for 90% significance is 1.62. The critical value for 95% significance is 1.86. Since the F value falls between these critical values, it is significant at the 90% level but not 95%.
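The critical values quoted above can be reproduced with R's F-distribution functions instead of a printed table, and `pf()` also gives the exact p-value. A short sketch using the F statistic and degrees of freedom from the transcript:

```r
f_stat  <- 1.732749                                  # Chow statistic from the transcript
crit_90 <- qf(0.90, df1 = 11, df2 = 132)             # 90% critical value
crit_95 <- qf(0.95, df1 = 11, df2 = 132)             # 95% critical value
p_value <- pf(f_stat, 11, 132, lower.tail = FALSE)   # upper-tail p-value
c(crit_90 = crit_90, crit_95 = crit_95, p_value = p_value)
```

A p-value between 0.05 and 0.10 is the same statement as "significant at the 90% level but not 95%."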
