Quickie: MLB Playoffs by Pitching Statistics
February 23, 2010
Posted by tomflesher in Baseball. Tags: Baseball, OLS, playoffs, probit, regression
It’s cold out today. Last night, Buffalo was covered in a thin layer of freezing rain. I’m trying to stay warm by turning up my hot stove the way only an economist can: by crunching the numbers on the playoffs.
I’m re-using the dataset from my Cy Young Predictor post a few entries ago in the interest of parsimony. It contains the dummy variables teamdivwin and teamwildcard, which take the value 1 if the pitcher’s team won its division or the wild card, respectively. I then created a variable, playoffs, equal to the sum of teamdivwin and teamwildcard. Because a team cannot both win its division and take the wild card, this is simply a playoff dummy variable.
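A minimal sketch of the playoffs construction, written in Python rather than the R used for the regressions below. The three rows are made-up placeholders and the helper name `add_playoffs` is hypothetical; only the column names teamdivwin and teamwildcard come from the dataset described above. The check that the sum is 0 or 1 encodes the fact that a team cannot both win its division and take the wild card.

```python
# Hypothetical pitcher-season rows (placeholders, not real data);
# only the two dummy columns matter here.
rows = [
    {"pitcher": "A", "teamdivwin": 1, "teamwildcard": 0},
    {"pitcher": "B", "teamdivwin": 0, "teamwildcard": 1},
    {"pitcher": "C", "teamdivwin": 0, "teamwildcard": 0},
]


def add_playoffs(rows):
    """Add playoffs = teamdivwin + teamwildcard to each row, in place."""
    for row in rows:
        total = row["teamdivwin"] + row["teamwildcard"]
        # Division winners and the wild card are mutually exclusive,
        # so the sum should only ever be 0 or 1.
        if total not in (0, 1):
            raise ValueError(f"row {row!r} has both flags set")
        row["playoffs"] = total
    return rows


add_playoffs(rows)
print([r["playoffs"] for r in rows])  # [1, 1, 0]
```

Because the two flags never overlap, the sum works as a playoff indicator without any extra recoding.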
Using a probit model and a standard OLS (linear probability) regression model, I estimated the effects of individual pitching stats on playoffs. Neither model has very strong predictive value (the linear model’s R-squared is about .05), which is unsurprising since neither takes the team’s batting into account at all. None of the coefficient values are shocking. Pitchers in the American League (coded as Lg = 1) have a higher probability of making the playoffs because the AL has fewer teams. And although complete games appear to have a negative effect, the shutout coefficient is larger in magnitude than the complete-game coefficient in both models, so a complete-game shutout (which adds one to both CG and SHO) still has a net positive effect. SHO is only marginally significant, though (p ≈ .08 in the probit model and .06 in the linear model). I’m interested in whether complete-game wins and complete-game losses have differential effects; that will probably be my next snowy-day project.
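To make the shutout arithmetic concrete: an individual shutout is by definition a complete game, so one more shutout raises both SHO and CG by one, and the net change is the sum of the two coefficients (0.187091 - 0.140882 = 0.046209 on the probit index; 0.058328 - 0.040513 = 0.017815 in the linear model). The Python sketch below uses the coefficients from the regression output in this post. A probit coefficient moves the index, not the probability, so the probability effect depends on the starting point; `BASELINE_INDEX = 0.0` (a 50% playoff chance) is a hypothetical choice, not a value from the data.

```python
import math

# Coefficients copied from the probit and OLS output in this post.
PROBIT = {"SHO": 0.187091, "CG": -0.140882}
LINEAR = {"SHO": 0.058328, "CG": -0.040513}


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def shutout_net(coefs: dict) -> float:
    """Net effect of one more complete-game shutout (SHO +1 and CG +1)."""
    return coefs["SHO"] + coefs["CG"]


probit_net = shutout_net(PROBIT)  # change in the probit index
linear_net = shutout_net(LINEAR)  # change in probability, read directly

# Hypothetical starting index (not estimated from the data).
BASELINE_INDEX = 0.0
probit_prob_change = (std_normal_cdf(BASELINE_INDEX + probit_net)
                      - std_normal_cdf(BASELINE_INDEX))

print(f"probit index change:       {probit_net:.6f}")
print(f"linear probability change: {linear_net:.6f}")
print(f"probit probability change at index {BASELINE_INDEX}: {probit_prob_change:.4f}")
```

At that hypothetical baseline the probit net effect works out to roughly 1.8 percentage points, close to the linear model’s 1.8 points. Away from a 50% baseline the probit effect shrinks, since the normal density is largest at zero.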
Results:
Call:
glm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV +
    Lg + R, family = binomial(link = "probit"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8444  -0.7356  -0.6261  -0.3803   2.4768

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.756627   0.046176 -16.386  < 2e-16 ***
W              0.123523   0.011183  11.046  < 2e-16 ***
SHO            0.187091   0.107494   1.740 0.081774 .
CG            -0.140882   0.060472  -2.330 0.019822 *
weightedsaves -0.076265   0.020332  -3.751 0.000176 ***
SV             0.097770   0.025446   3.842 0.000122 ***
Lg             0.190521   0.050481   3.774 0.000161 ***
R             -0.015532   0.001556  -9.985  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3423.4  on 3221  degrees of freedom
Residual deviance: 3251.9  on 3214  degrees of freedom
AIC: 3267.9

Number of Fisher Scoring iterations: 4
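The probit output above has no R-squared line. One rough analogue is McFadden’s pseudo-R-squared, 1 minus the ratio of residual to null deviance; this is a swapped-in measure, not something reported in the post, and it is not on the same scale as an OLS R-squared. The Python sketch below also checks the reported AIC, which for a 0/1 outcome equals the residual deviance plus twice the number of estimated coefficients (8 here: the intercept and seven regressors).

```python
# Figures copied from the probit glm() summary above.
NULL_DEVIANCE = 3423.4
RESIDUAL_DEVIANCE = 3251.9
N_PARAMS = 8  # intercept plus the seven regressors in the formula

# With a 0/1 outcome the saturated log-likelihood is 0,
# so each deviance equals -2 times the corresponding log-likelihood.
aic = RESIDUAL_DEVIANCE + 2 * N_PARAMS
mcfadden_r2 = 1.0 - RESIDUAL_DEVIANCE / NULL_DEVIANCE
lr_stat = NULL_DEVIANCE - RESIDUAL_DEVIANCE  # chi-square on 3221 - 3214 = 7 df

print(f"AIC: {aic:.1f}")
print(f"McFadden pseudo-R^2: {mcfadden_r2:.4f}")
print(f"LR statistic: {lr_stat:.1f} on 7 df")
```

This reproduces the reported AIC of 3267.9 and gives a pseudo-R-squared of about 0.050, which squares with the weak fit of the linear model.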
Call:
lm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV + Lg +
    R)

Residuals:
     Min       1Q   Median       3Q      Max
-0.72890 -0.24345 -0.18725 -0.04165  1.01024

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.225013   0.013119  17.151  < 2e-16 ***
W              0.035344   0.003105  11.382  < 2e-16 ***
SHO            0.058328   0.030826   1.892 0.058560 .
CG            -0.040513   0.017029  -2.379 0.017417 *
weightedsaves -0.022451   0.005671  -3.959 7.70e-05 ***
SV             0.029226   0.007193   4.063 4.96e-05 ***
Lg             0.055360   0.014435   3.835 0.000128 ***
R             -0.004171   0.000401 -10.401  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.406 on 3214 degrees of freedom
Multiple R-squared: 0.05262, Adjusted R-squared: 0.05056
F-statistic: 25.5 on 7 and 3214 DF, p-value: < 2.2e-16
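As a consistency check on the linear model’s fit statistics, the adjusted R-squared and F-statistic can be recomputed from the multiple R-squared and the degrees of freedom using the standard formulas. In this Python sketch the sample size of 3222 is inferred from 3214 residual degrees of freedom plus 7 regressors and an intercept; it is not stated directly in the post.

```python
# Figures copied from the lm() summary above.
R2 = 0.05262
K = 7                  # regressors, excluding the intercept
DF_RESID = 3214        # residual degrees of freedom
N = DF_RESID + K + 1   # inferred number of pitcher-seasons

# Standard OLS formulas for adjusted R-squared and the overall F-test.
adj_r2 = 1.0 - (1.0 - R2) * (N - 1) / DF_RESID
f_stat = (R2 / K) / ((1.0 - R2) / DF_RESID)

print(f"n = {N}")
print(f"adjusted R^2 = {adj_r2:.5f}")
print(f"F = {f_stat:.1f} on {K} and {DF_RESID} DF")
```

Both recomputed figures match the reported 0.05056 and 25.5: pitching statistics alone explain only about 5% of the variation in playoff appearances, even though the F-test rejects the hypothesis that they explain nothing.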