jump to navigation

Is scoring different in the AL and the NL? May 31, 2011

Posted by tomflesher in Baseball, Economics.
Tags: , , , , , , , ,
1 comment so far

The American League and the National League have one important difference. Specifically, the AL allows the use of a player known as the Designated Hitter, who does not play a position in the field, hits every time the pitcher would bat, and cannot be moved to a defensive position without forfeiting the right to use the DH. As a result, there are a couple of notable differences between the AL and the NL – in theory, there should be slightly more home runs and slightly fewer sacrifice bunts in the AL, since pitchers have to bat in the NL and they tend to be pretty poor hitters. How much can we quantify that difference? To answer that question, I decided to sample a ten-year period (2000 until 2009) from each league and run a linear regression of the form

\hat{R} = \beta_0 + \beta_1 H + \beta_2 2B + \beta_3 3B + \beta_4 HR + \beta_5 SB + \beta_6 CS + \\    \beta_7 BB + \beta_8 K + \beta_9 HBP + \beta_{10} Bunt + \beta_{11} SF

Where runs are presumed to be a function of hits, doubles, triples, home runs, stolen bases, times caught stealing, walks, strikeouts, hit batsmen, bunts, and sacrifice flies. My expectations are:

  • The sacrifice bunt coefficient should be smaller in the NL than in the AL – in the American League, bunting is used strategically, whereas NL teams are more likely to bunt whenever a pitcher appears, so in any randomly-chosen string of plate appearances, the chance that a bunt is the optimal strategy given an average hitter is much lower. (That is, pitchers bunt a lot, even when a normal hitter would swing away.) A smaller coefficient means each bunt produces fewer runs, on average.
  • The strategy from league to league should be different, as measured by different coefficients for different factors from league to league. That is, the designated hitter rule causes different strategies to be used. I’ll use a technique called the Chow test to test that. That means I’ll run the linear model on all of MLB, then separately on the AL and the NL, and look at the size of the errors generated.

The results:

  • In the AL, a sac bunt produces about .43 runs, on average, and that number is significant at the 95% level. In the NL, a bunt produces about .02 runs, and the number is not significantly different from saying that a bunt has no effect on run production.
  • The Chow Test tells us at about a 90% confidence level that the process of producing runs in the AL is different than the process of producing runs in the NL. That is, in Major League Baseball, the designated hitter has a statistically significant effect on strategy. There’s structural break.

R code is behind the cut.


Is ‘luck’ persistent? May 25, 2011

Posted by tomflesher in Baseball, Economics.
Tags: , , ,

I’ve been listening to Scott Patterson’s The Quants in my spare time recently. One of the recurring jokes is Wall Street traders’ use of the word ‘Alpha’ (which usually represents abnormal returns in finance) to refer to a general quality of being skillful or having talent. That led me to think about an old concept I haven’t played with in a while – wins above expectation.

As a quick review, wins above expectation relate a team’s actual wins to its Pythagorean expectation. If the team wins more than expected, it has a positive WAE number, and if it loses more than expected, it has wins below expectation, or, equivalently, a negative WAE. It’s tempting to think of WAE as representing a sort of ‘alpha’ in the traders’ sense – since the Pythagorean Expectation involves groups of runs scored and runs allowed, it generates a probability that a team with a history represented by its runs scored/runs allowed stats will win a given game. If a team has a lot more wins than expected, it seems like that represents efficiency – scoring runs at crucial times, not wasting them on blowing out opponents – or especially skillful management. Alternatively, it could just be luck. Is there any way to test which it is?

It’s difficult. However, let’s break down what the efficiency factor would imply. In general, it would represent some combination of individual player skill (such as the alleged clutch hitting ability) and team chemistry, whether that boils down to on- or off-field factors. Assuming rosters don’t change much over the course of the year, then, efficiency also shouldn’t change much over the course of the year. Similarly, if a manager’s skill was the primary determinant of wins above expectation, then for teams that don’t change managers midyear, we wouldn’t expect much of a change throughout the course of the season. Most managers work up through the minors, so there probably isn’t a major on-the-job training effect to consider.

On the other hand, if wins above expectation are just luck, then we wouldn’t need to place any restrictions on them. Maybe they’d change. Maybe they wouldn’t. Who knows?

In order to test that idea, I pulled some data for the American League off Baseball Reference from last season. I split the season into pre- and post-All-Star Break sets and calculated the Pythagorean expectation (using the 1.81 exponent  referred to in Wikipedia) for each team. I found WAE for each team in each session, then found each team’s ‘Alpha’ for that session by dividing WAE by the number of games played. Basically, I assumed that WAE represented extra win probability in some fashion and assumed it existed in every game at about the same level. The results:

\begin{tabular}{ | l | c | c | c| r | }  \hline  Team & WAE1 & Alpha1 & WAE2 & Alpha2 \\ \hline  NYY & 0.823 & 0.009 & -2.474 & -0.033 \\ \hline  TBR & -0.5 & -0.003 & 0.207 & 0.003 \\ \hline  BOS & 0.494 & 0.006 & 0.900 & 0.012 \\ \hline  TEX & -1.041 & -0.012 & 0.291 & 0.004 \\ \hline  CHW & 2.379 & 0.027 & -0.244 & -0.003 \\ \hline  DET & 3.918 & 0.046 & -4.706 & -0.062 \\ \hline  MIN & -1.67 & -0.019 &.3.693 & 0.05 \\ \hline  LAA & 3.83 & 0.042 & -2.860 & -0.040 \\ \hline  TOR & -0.202 & -0.002 & 1.555 & 0.021 \\ \hline  OAK & -1.939 & -0.022 & -2.418 & -0.033 \\ \hline  KCR & 0.023 & 0.000 & 1.976 & 0.027 \\ \hline  SEA & 0.225 & 0.003 & 2.188 & 0.03 \\ \hline  CLE & -2.096 & -0.023 & 0.907 & 0.012 \\ \hline  BAL & -1.028 & -0.012 & 8.900 & 0.120 \\ \hline  \end{tabular}

As is evident from the table, a whopping 10 out of the 14 teams see a change in the sign of Alpha from before the All-Star Game to after the All-Star Game. The correlation coefficient of Alpha from pre- to post-All-Star is -.549, which is a pretty noisy correlation. (Note also that this very closely describes regression to the mean.) It’s not 0, but it’s also negative, implying one of two things: Either teams become less efficient and/or more badly managed, on average, after the break, or Alpha represents very little more than a realization of a random process, which might just as well be described as luck.

Diagnosing the AL December 22, 2010

Posted by tomflesher in Baseball, Economics.
Tags: , , , , , ,
add a comment

In the previous post, I crunched some numbers on a previous forecast I’d made and figured out that it was a pretty crappy forecast. (That’s the fun of forecasting, of course – sometimes you’re right and sometimes you’re wrong.) The funny part of it, though, is that the predicted home runs per game for the American League was so far off – 3.4 standard errors below the predicted value – that it’s highly unlikely that the regression model I used controls for all relevant variables. That’s not surprising, since it was only a time trend with a dummy variable for the designated hitter.

There are a couple of things to check for immediately. The first is the most common explanation thrown around when home runs drop – steroids. It seems to me that if the drop in home runs were due to better control of performance-enhancing drugs, then it should mostly be home runs that are affected. For example, intentional walks should probably be below expectation, since intentional walks are used to protect against a home run hitter. Unintentional walks should probably be about as expected, since walks are a function of plate discipline and pitcher control, not of strength. On-base percentage should probably drop at a lower magnitude than home runs, since some hits that would have been home runs will stay in the park as singles, doubles, or triples rather than all being fly-outs. There will be a drop but it won’t be as big. Finally, slugging average should drop because a loss in power without a corresponding increase in speed will lower total bases.

I’ll analyze these with pretty new R code behind the cut.