Posted by tomflesher in Baseball, Economics.
Tags: , , , , , , , , ,
1 comment so far

I was all set to fire up the Choke Index again this year. Unfortunately, Derek Jeter foiled my plan by making his 3000th hit right on time, so I can’t get any mileage out of that. Perhaps Jim Thome will start choking around #600 – but, frankly, I hope not. Since Jeter had such a callous disregard for the World’s Worst Sports Blog’s material, I’m forced to make up a new statistic.

This actually plays into an earlier post I made, which was about home field advantage for the Giants. It started off as a very simple regression for National League teams to see if the Giants’ pattern – a negative effect on runs scored at home, no real effect from the DH – held across the league. Those results are interesting and hold with the pattern that we’ll see below – I’ll probably slice them into a later entry.

The first thing I wanted to do, though, was find team effects on runs scored. Basically, I want to know how many runs an average team of Greys will score, how many more runs they’ll score at home, how many more runs they’ll score on the road if they have a DH, and then how many more runs the Phillies, the Mets, or any other team will score above their total. I’m doing this by converting Baseball Reference’s schedules and results for each team through their last game on July 10 to a data file, adding dummy variables for each team, and then running a linear regression of runs scored by each team against dummy variables for playing at home, playing with a DH, and the team dummies. In equation form,

$\hat{R} = \beta_0 + \beta_1 Home + \beta_2 DH + \delta_{PHI} + \delta_{ATL} + ... + \delta_{COL}$

For technical reasons, I needed to leave a team out, and so I chose the team that had the most negative coefficient: the Padres. Basically, then, the $\delta$ terms represent how many runs the team scores above what the Padres would score. I call this “RAP,” for Runs Above Padres. I then ran the same equation, but rather than runs scored by the team, I estimated runs allowed by the team’s defense. That, logically enough, was called “ARAP,” for Allowed Runs Above Padres. A positive RAP means that a team scores more runs than the Padres, while a negative ARAP means the team doesn’t allow as many runs as the Padres. Finally, to pull it all together, one handy number shows how many more runs better off a team is than the Padres:

$Padre Differential = RAP - ARAP$

That is, the Padre Differential shows whether a team’s per-game run differential is higher or lower than the Padres’.

The table below shows each team in the National League, sorted by Padre Differential. By definition, San Diego’s Padre Differential is zero. ‘Sig95’ represents whether or not the value is statistically significant at the 95% level.

$\begin{tabular}{|r||r|r|r|r|r|} \hline \textbf{Team}&\textbf{RAP}&\textbf{Sig95}&\textbf{ARAP}&\textbf{Sig95}&\textbf{Padre Differential}\\ \hline PHI&0.915521&1&\textbf{-0.41136}&0&\textbf{1.326881}\\ \hline ATL&0.662871&0&-0.26506&0&0.927931\\ \hline CIN&\textbf{1.44507}&1&0.75882&0&0.68625\\ \hline STL&1.402174&1&0.75&0&0.652174\\ \hline NYM&1.079943&1&0.58458&0&0.495363\\ \hline ARI&1.217101&1&0.74589&0&0.471211\\ \hline SFG&0.304031&0&-0.15842&0&0.462451\\ \hline PIT&0.628821&0&0.1873&0&0.441521\\ \hline MIL&1.097899&1&0.74016&0&0.357739\\ \hline WSN&0.521739&0&0.17391&0&0.347829\\ \hline COL&1.036033&0&0.81422&0&0.221813\\ \hline LAD&0.391595&0&0.38454&0&0.007055\\ \hline FLA&0.564074&0&0.66097&0&-0.0969\\ \hline CHC&0.771739&0&1.31522&1&-0.54348\\ \hline HOU&0.586857&0&1.38814&1&-0.80128\\ \hline \end{tabular}$

Unsurprisingly, the Phillies – the best team in baseball – have the highest Padre Differential in the league, with over 1.3 runs on average better than the Padres. Houston, in the cellar of the NL Central, is the worst team in the league and is .8 runs worse than the Padres per game. Florida and Chicago are both worse than the Padres and are both close to (Florida, 43) or below (Chicago, 37) the Padres’ 40-win total.

## Is scoring different in the AL and the NL?May 31, 2011

Posted by tomflesher in Baseball, Economics.
Tags: , , , , , , , ,
1 comment so far

The American League and the National League have one important difference. Specifically, the AL allows the use of a player known as the Designated Hitter, who does not play a position in the field, hits every time the pitcher would bat, and cannot be moved to a defensive position without forfeiting the right to use the DH. As a result, there are a couple of notable differences between the AL and the NL – in theory, there should be slightly more home runs and slightly fewer sacrifice bunts in the AL, since pitchers have to bat in the NL and they tend to be pretty poor hitters. How much can we quantify that difference? To answer that question, I decided to sample a ten-year period (2000 until 2009) from each league and run a linear regression of the form

$\hat{R} = \beta_0 + \beta_1 H + \beta_2 2B + \beta_3 3B + \beta_4 HR + \beta_5 SB + \beta_6 CS + \\ \beta_7 BB + \beta_8 K + \beta_9 HBP + \beta_{10} Bunt + \beta_{11} SF$

Where runs are presumed to be a function of hits, doubles, triples, home runs, stolen bases, times caught stealing, walks, strikeouts, hit batsmen, bunts, and sacrifice flies. My expectations are:

• The sacrifice bunt coefficient should be smaller in the NL than in the AL – in the American League, bunting is used strategically, whereas NL teams are more likely to bunt whenever a pitcher appears, so in any randomly-chosen string of plate appearances, the chance that a bunt is the optimal strategy given an average hitter is much lower. (That is, pitchers bunt a lot, even when a normal hitter would swing away.) A smaller coefficient means each bunt produces fewer runs, on average.
• The strategy from league to league should be different, as measured by different coefficients for different factors from league to league. That is, the designated hitter rule causes different strategies to be used. I’ll use a technique called the Chow test to test that. That means I’ll run the linear model on all of MLB, then separately on the AL and the NL, and look at the size of the errors generated.

The results:

• In the AL, a sac bunt produces about .43 runs, on average, and that number is significant at the 95% level. In the NL, a bunt produces about .02 runs, and the number is not significantly different from saying that a bunt has no effect on run production.
• The Chow Test tells us at about a 90% confidence level that the process of producing runs in the AL is different than the process of producing runs in the NL. That is, in Major League Baseball, the designated hitter has a statistically significant effect on strategy. There’s structural break.

R code is behind the cut.