Quickie: MLB Playoffs by Pitching Statistics

February 23, 2010

Posted by tomflesher in Baseball.
Tags: Baseball, OLS, playoffs, probit, regression

It’s cold out today. Last night, Buffalo was covered in a thin layer of freezing rain. I’m trying to stay warm by turning up my hot stove the way only an economist can – crunching the numbers on playoffs.

I’m re-using the dataset from my Cy Young Predictor a few entries ago in the interest of parsimony. It contains dummy variables teamdivwin and teamwildcard, which take the value 1 if the pitcher’s team won the division or the wild card, respectively. I then created a variable playoffs equal to the sum of teamdivwin and teamwildcard. Since no team can be both a division winner and the wild card in the same season, that sum is always 0 or 1 – just a playoff dummy variable.
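As a quick illustration, here is a minimal sketch in R of how a dummy like playoffs could be built. The data frame name pitchers and its four rows are invented for the example; only the column names come from the dataset described above.

```r
# Toy data: column names follow the post, the rows are made up for illustration.
pitchers <- data.frame(
  teamdivwin   = c(1, 0, 0, 1),
  teamwildcard = c(0, 1, 0, 0)
)

# A team cannot be both division winner and wild card, so the sum is 0 or 1.
stopifnot(all(pitchers$teamdivwin + pitchers$teamwildcard <= 1))

# The playoff dummy is just the sum of the two mutually exclusive dummies.
pitchers$playoffs <- pitchers$teamdivwin + pitchers$teamwildcard
print(pitchers)
```

The stopifnot() guard is there because the sum only works as a dummy if the two inputs never overlap.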

Using a probit model and a standard OLS regression (in effect a linear probability model, since playoffs is a dummy), I estimated the effects of individual pitching stats on playoffs. Neither model has very strong predictive value (the linear model has an R-squared of about .05), which is unsurprising since neither takes the team’s batting into account at all. None of the coefficient values are shocking. In the American League (designated as Lg = 1), teams have a higher probability of making the playoffs because there are fewer teams in the league. Complete games appear to have a negative effect, but the positive shutout coefficient is larger in magnitude in both models and more than makes up for that, although it is only significant at the 10% level. I’m interested in whether complete game wins and complete game losses have differential effects – that will probably be my next snowy-day project.
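The probit has no R-squared of its own, but a rough analogue can be worked out from the null and residual deviances reported in the probit results in this post. The sketch below uses McFadden's pseudo-R-squared and a likelihood-ratio test; both are my choice of yardstick here, not something from the original analysis, and the variable names are made up for the example.

```r
# Deviances as reported in the probit summary in this post.
null_deviance     <- 3423.4
residual_deviance <- 3251.9
df_difference     <- 3221 - 3214  # seven predictors

# McFadden's pseudo-R-squared: for 0/1 outcomes, deviance = -2 * log-likelihood,
# so the deviance ratio equals the log-likelihood ratio.
mcfadden_r2 <- 1 - residual_deviance / null_deviance

# Likelihood-ratio test of the full model against the intercept-only model.
lr_statistic <- null_deviance - residual_deviance
lr_p_value   <- pchisq(lr_statistic, df = df_difference, lower.tail = FALSE)

cat("McFadden pseudo-R^2:", round(mcfadden_r2, 4), "\n")
cat("LR statistic:", lr_statistic, "on", df_difference, "df, p =", lr_p_value, "\n")
```

The pseudo-R-squared comes out at about 0.05, in line with the linear model: the predictors are jointly significant but explain very little.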

Results:

Call:
glm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV +
    Lg + R, family = binomial(link = "probit"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8444  -0.7356  -0.6261  -0.3803   2.4768

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.756627   0.046176 -16.386  < 2e-16 ***
W              0.123523   0.011183  11.046  < 2e-16 ***
SHO            0.187091   0.107494   1.740 0.081774 .
CG            -0.140882   0.060472  -2.330 0.019822 *
weightedsaves -0.076265   0.020332  -3.751 0.000176 ***
SV             0.097770   0.025446   3.842 0.000122 ***
Lg             0.190521   0.050481   3.774 0.000161 ***
R             -0.015532   0.001556  -9.985  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 3423.4 on 3221 degrees of freedom
Residual deviance: 3251.9 on 3214 degrees of freedom
AIC: 3267.9

Number of Fisher Scoring iterations: 4
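Probit coefficients are not probabilities, so it may help to see how one would turn them into one. The sketch below plugs the estimates above into the standard normal CDF for a single hypothetical pitcher-season. The stat line is invented for illustration and is not a row from the dataset, and I am assuming R means runs allowed.

```r
# Probit coefficients copied from the summary above.
probit_coef <- c(
  "(Intercept)" = -0.756627, W = 0.123523, SHO = 0.187091, CG = -0.140882,
  weightedsaves = -0.076265, SV = 0.097770, Lg = 0.190521, R = -0.015532
)

# Hypothetical AL starter (invented, not from the dataset):
# 15 wins, 2 complete games, 1 shutout, no saves, 80 runs (assumed: runs allowed).
pitcher <- c(
  "(Intercept)" = 1, W = 15, SHO = 1, CG = 2,
  weightedsaves = 0, SV = 0, Lg = 1, R = 80
)

# Linear index x'b, then the standard normal CDF gives the predicted probability.
index <- sum(probit_coef * pitcher[names(probit_coef)])
prob  <- pnorm(index)

# Marginal effect of one more win at this point: phi(x'b) * b_W.
marginal_effect_W <- dnorm(index) * probit_coef[["W"]]

cat("Index:", index, " P(playoffs):", prob, " dP/dW:", marginal_effect_W, "\n")
```

For this made-up pitcher the predicted probability is a bit under one half, and an extra win is worth roughly five percentage points. The marginal effect depends on where the index sits, so it will differ for other stat lines.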

Call:
lm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV + Lg +
    R)

Residuals:
     Min       1Q   Median       3Q      Max
-0.72890 -0.24345 -0.18725 -0.04165  1.01024

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.225013   0.013119  17.151  < 2e-16 ***
W              0.035344   0.003105  11.382  < 2e-16 ***
SHO            0.058328   0.030826   1.892 0.058560 .
CG            -0.040513   0.017029  -2.379 0.017417 *
weightedsaves -0.022451   0.005671  -3.959 7.70e-05 ***
SV             0.029226   0.007193   4.063 4.96e-05 ***
Lg             0.055360   0.014435   3.835 0.000128 ***
R             -0.004171   0.000401 -10.401  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.406 on 3214 degrees of freedom
Multiple R-squared: 0.05262, Adjusted R-squared: 0.05056
F-statistic: 25.5 on 7 and 3214 DF, p-value: < 2.2e-16
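Because playoffs is a 0/1 variable, the OLS fit can be read as a linear probability model, where each coefficient is a change in the probability itself. The sketch below shows that reading for one hypothetical pitcher-season. The stat line is invented and not from the dataset, and the net shutout figure assumes every shutout in the data is also recorded as a complete game.

```r
# OLS (linear probability model) coefficients copied from the summary above.
ols_coef <- c(
  "(Intercept)" = 0.225013, W = 0.035344, SHO = 0.058328, CG = -0.040513,
  weightedsaves = -0.022451, SV = 0.029226, Lg = 0.055360, R = -0.004171
)

# Hypothetical AL starter (invented, not from the dataset):
# 15 wins, 2 complete games, 1 shutout, no saves, 80 runs (assumed: runs allowed).
pitcher <- c(
  "(Intercept)" = 1, W = 15, SHO = 1, CG = 2,
  weightedsaves = 0, SV = 0, Lg = 1, R = 80
)

# In a linear probability model the fitted value is read directly as a probability.
lpm_prob <- sum(ols_coef * pitcher[names(ols_coef)])

# Assumption: a shutout also counts as a complete game, so its net effect is SHO + CG.
net_shutout <- ols_coef[["SHO"]] + ols_coef[["CG"]]

# The largest residual reported above is 1.01024. With a 0/1 outcome that is only
# possible if the fitted value is negative; if that pitcher's team made the
# playoffs, the fitted value is 1 - 1.01024.
implied_fitted <- 1 - 1.01024

cat("LPM P(playoffs):", lpm_prob, "\n")
cat("Net effect of a shutout:", net_shutout, "\n")
cat("Implied fitted value at the max residual:", implied_fitted, "\n")
```

The fitted value for this made-up pitcher is about 0.45, and a shutout nets out to just under two percentage points. The negative implied fitted value is the usual caveat with linear probability models: nothing keeps predictions inside the 0 to 1 range, which is one reason to run the probit alongside.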

The World's Worst Sports Blog