
Are This Year’s Home Runs Really That Different? December 22, 2010

Posted by tomflesher in Baseball, Economics.

This year’s home runs are quite confounding. On the one hand, home runs per game in the AL have dropped precipitously (as noted and examined in the two previous posts). On the other hand, Jose Bautista had an absolutely outstanding year. How different is this year’s distribution from those of previous years? To answer that question, I went to Baseball Reference and pulled the list of all players with at least one plate appearance, sorted by home runs.

There are several parameters of interest when discussing the distribution of events. The first is the mean. This year’s mean was 5.43: among players with at least one plate appearance, the average player hit 5.43 homers. That’s down from 6.53 last year and 5.66 in 2008.

Next, consider the variance and standard deviation. The variance is the square of the standard deviation, so the two measure the same spread on different scales. A low variance means that the totals are clumped tightly around the mean. This year’s variance was 68.4, down from last year’s 84.64 but up from 2008’s 66.44.
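
As a minimal sketch of how these numbers come out of a list of home run totals, here is some Python using only the standard library. It assumes a plain list with one total per player (a hypothetical stand-in for the Baseball Reference list) and uses the sample variance with an n - 1 denominator; the post doesn't say which convention produced 68.4, so that choice is an assumption.

```python
import statistics


def hr_mean_and_spread(hr_totals):
    """Summarize per-player home run totals.

    Returns (mean, sample variance, sample standard deviation).
    The n - 1 (sample) convention is an assumption, not necessarily
    what was used for the figures in the post.
    """
    mean = statistics.mean(hr_totals)
    var = statistics.variance(hr_totals)  # sample variance, n - 1 denominator
    sd = statistics.stdev(hr_totals)      # square root of the sample variance
    return mean, var, sd


# Hypothetical toy totals, not real 2010 data.
toy = [0, 0, 1, 2, 5, 12, 30]
m, v, s = hr_mean_and_spread(toy)
print(round(m, 2), round(v, 2), round(s, 2))
```

Swapping in `statistics.pvariance` and `statistics.pstdev` gives the population versions, which divide by n instead of n - 1; with several hundred players the two barely differ.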

The skewness and kurtosis describe the tails of the distribution. Skewness measures asymmetry: a positive skew means the right tail is longer than the left. Since most players hit very few home runs, every year’s distribution is going to be positively skewed. Roughly, that means there are observations far larger than the mean, but none that are far smaller. That makes sense, since there’s no such thing as a negative home run total. Kurtosis measures how much of the distribution is found in the tails, which is often described as how pointy the distribution is.
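
The two tail measures can be sketched the same way. This version uses the plain moment definitions (skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3, where m_k is the k-th central moment). Many statistics packages apply small-sample corrections instead, so these functions won't necessarily reproduce the 7.72, 4.56, and 5.55 kurtosis figures quoted below. The lists in the usage lines are made-up toy data, not real home run totals.

```python
def central_moments(xs):
    """Return the mean and the 2nd, 3rd, and 4th central moments (population form)."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m3, m4


def skewness(xs):
    """Moment skewness: positive when the right tail is longer than the left."""
    _, m2, m3, _ = central_moments(xs)
    return m3 / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment kurtosis minus 3, so a normal distribution scores 0."""
    _, m2, _, m4 = central_moments(xs)
    return m4 / m2 ** 2 - 3.0


# Toy data: a league with no standout versus one with a runaway leader.
tight_race = list(range(1, 11))          # 1..10, evenly spread
runaway = list(range(1, 10)) + [30]      # 1..9 plus one big outlier
print(round(excess_kurtosis(tight_race), 3), round(excess_kurtosis(runaway), 3))
print(round(skewness(runaway), 3))
```

A single total far out in the right tail is enough to push both numbers up sharply, which is the kind of effect one runaway home run leader would have on a season's distribution.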

For example, in 2009, Mark Teixeira and Carlos Pena tied for the American League lead in home runs with 39. The mean was high and so was the variance, but the tail was relatively thin. By contrast, this year Bautista led his nearest competitor (Paul Konerko) by 15 home runs and only 8 players were over 30 home runs, while 2009 saw 15 players above 30 home runs and a tight race for the lead. Kurtosis in 2010 was 7.72, compared with 4.56 in 2009 and 5.55 in 2008. (In 2008, 11 players were above the 30 mark, and Miguel Cabrera’s 37 home runs edged Carlos Quentin by just one.)

The numbers say that 2008 and 2009 were much more similar to each other than either is to 2010. A quick look at the distributions bears that out: this was a weird year.

Home Run Derby: Does it ruin swings? December 15, 2010

Posted by tomflesher in Baseball, Economics.

Earlier this year, there was a lot of discussion about the alleged Home Run Derby curse. This post by Andy on Baseball-Reference.com asked whether the Home Run Derby is bad for baseball, and this Hardball Times piece agrees with him that it is not. The standard explanation involves selection bias: sure, players tend to hit fewer home runs in the second half after they hit in the Derby, but that’s because players get invited to the Derby precisely because they hit an abnormally high number of home runs in the first half.

Though this deserves a much more thorough macro-level treatment, let’s just take a look at the density of home runs in either half of the season for each player who participated in the Home Run Derby. Those players include David Ortiz, Hanley Ramirez, Chris Young, Nick Swisher, Corey Hart, Miguel Cabrera, Matt Holliday, and Vernon Wells.

For each player, plus Robinson Cano (who was of interest to Andy in the Baseball-Reference.com post), I took the percentage of the team’s 162 games played before the Derby and compared it with the percentage of the player’s season home runs hit before the Derby. If the Ruined Swing theory holds, then we’d expect

$$ g(\mathrm{HR}) \equiv \frac{\mathrm{HR}_{\mathrm{before}}}{\mathrm{HR}_{\mathrm{Season}}} > g(\mathrm{Games}) \equiv \frac{\mathrm{Games}_{\mathrm{before}}}{162} $$
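
To make the inequality concrete, here is Ortiz's row from the table worked through. The 88 team games before the Derby is not stated in the post; it is backed out of the table's g(Games) value, since 0.54321 × 162 = 88.0.

$$
\begin{aligned}
g(\mathrm{HR})_{\mathrm{Ortiz}} &= \frac{18}{32} = 0.5625 \\
g(\mathrm{Games})_{\mathrm{Ortiz}} &= \frac{88}{162} \approx 0.54321 \\
g(\mathrm{HR}) - g(\mathrm{Games}) &\approx 0.5625 - 0.54321 = 0.01929 > 0
\end{aligned}
$$

A positive difference means Ortiz hit a larger share of his home runs before the Derby than the share of his team's games played by then, which is the pattern the Ruined Swing theory predicts, and also the pattern selection bias alone would produce.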

The table below shows that in almost every case, including Cano (who did not participate), the density of home runs was higher in the pre-Derby games than in the post-Derby games. Swisher is the only exception, and for Young and Cano the gap is small.

Player     HR before  HR total  g(Games)  g(HR)      Diff = g(HR) - g(Games)
Ortiz      18         32        0.54321   0.5625      0.01929
Hanley     13         21        0.54321   0.619048    0.075838
Swisher    15         29        0.537037  0.517241   -0.0198
Wells      19         31        0.549383  0.612903    0.063521
Holliday   16         28        0.54321   0.571429    0.028219
Hart       21         31        0.549383  0.677419    0.128037
Cabrera    22         38        0.530864  0.578947    0.048083
Young      15         27        0.549383  0.555556    0.006173
Cano       16         29        0.537037  0.551724    0.014687
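
The Diff column can be reproduced from the raw counts with a few lines of Python. The team games before the Derby are not listed in the post; the values below are inferred from the g(Games) column (for example, 0.549383 × 162 = 89), so treat them as reconstructed inputs rather than source data.

```python
SEASON_GAMES = 162

# (player, HR before Derby, HR season total, team games before Derby)
# Games-before counts are inferred from g(Games) * 162, not taken from the post.
rows = [
    ("Ortiz", 18, 32, 88),
    ("Hanley", 13, 21, 88),
    ("Swisher", 15, 29, 87),
    ("Wells", 19, 31, 89),
    ("Holliday", 16, 28, 88),
    ("Hart", 21, 31, 89),
    ("Cabrera", 22, 38, 86),
    ("Young", 15, 27, 89),
    ("Cano", 16, 29, 87),
]


def derby_split(hr_before, hr_total, games_before, season_games=SEASON_GAMES):
    """Return (g(Games), g(HR), g(HR) - g(Games)) for one player."""
    g_games = games_before / season_games
    g_hr = hr_before / hr_total
    return g_games, g_hr, g_hr - g_games


for name, hr_b, hr_t, gm_b in rows:
    g_games, g_hr, diff = derby_split(hr_b, hr_t, gm_b)
    print(f"{name:<9} {g_games:.6f} {g_hr:.6f} {diff:+.6f}")
```

Running it shows Swisher as the only negative row and Hart as the largest positive one, matching the table.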

Is this evidence that the Derby causes home run percentages to drop off? Certainly not. There are some caveats:

  • This should be normalized based on games the player played, instead of team games.
  • It would probably even be better to look at a home run per plate appearance rate instead.
  • It should be corrected for regression toward the mean, which is how selection bias would show up.
  • Cano’s numbers are almost identical to Swisher’s, and they play for the same team. Since Swisher hit in the Derby and Cano did not, any Derby effect should show up as a gap between them, and it doesn’t.
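
The first two caveats amount to changing the denominators. A minimal sketch, assuming you have each player's games played, plate appearances, and home runs split at the Derby; the `SplitLine` record and its fields are hypothetical, and none of these splits appear in the post.

```python
from dataclasses import dataclass


@dataclass
class SplitLine:
    """Hypothetical pre- or post-Derby batting split for one player."""
    games: int  # games the player actually appeared in (caveat 1)
    pa: int     # plate appearances (caveat 2)
    hr: int     # home runs


def hr_rates(before: SplitLine, after: SplitLine):
    """Return (HR/PA before, HR/PA after, after minus before).

    A negative difference means the player's home run rate fell after the Derby.
    """
    if before.pa <= 0 or after.pa <= 0:
        raise ValueError("each split needs at least one plate appearance")
    rate_before = before.hr / before.pa
    rate_after = after.hr / after.pa
    return rate_before, rate_after, rate_after - rate_before


def player_game_share(before: SplitLine, after: SplitLine):
    """g(Games) recomputed from the player's own games instead of 162 team games."""
    return before.games / (before.games + after.games)


# Made-up numbers for illustration only.
pre = SplitLine(games=80, pa=350, hr=20)
post = SplitLine(games=70, pa=300, hr=12)
print(hr_rates(pre, post), player_game_share(pre, post))
```

Comparing rates per plate appearance sidesteps both injuries and days off, which the team-games denominator in the table cannot see.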

Once finals are up, I’ll dig into this a little more deeply.