
Pitchers Hit This Year (or, Two Guys Named Buchholz) December 23, 2010

Posted by tomflesher in Baseball.

Okay, I admit it. This post was originally conceived as a way to talk about the supremely weird line put up by Gustavo Chacin, who in his only plate appearance for Houston hit a home run to leave him with the maximum season OPS of 5.0. Unfortunately, Raphy at Baseball Reference beat me to it. Instead, I noticed while I was browsing the NL’s home run log to prepare to run some diagnostics on it that Kenley Jansen had two plate appearances comprising one hit and one walk. (Seriously, is there anything this kid can’t do?)

In Kenley’s case, that’s not entirely surprising, since he was a catcher until this season. His numbers weren’t great, but he was competent. What surprised me was that 75 pitchers since 2000 have finished the season with a perfect batting average. Nine were from this year, including Clay Buchholz and his distant cousin Taylor Buchholz. Evan Meek and Bruce Chen matched Jansen’s two plate appearances without an out. None of the perfect batting average crowd had an extra-base hit except for Chacin.

Since 2000, the most plate appearances by a pitcher to keep the perfect batting average was 4 by Manny Aybar in 2000.

At the other end of the spectrum, this year only three pitchers managed a perfect 1.000 on-base percentage without getting any hits at all. George Sherrill and Matt Reynolds both walked in their only plate appearances; Jack Taschner went them one better by recording a sacrifice hit in a second plate appearance.

Finally, to round things out, this year saw Joe Blanton and Heureusement, ici, c’est le Blog’s favorite pitcher, Yovani Gallardo, each get hit by two pitches. Gallardo had clearly angered other pitchers by being so much more awesome than they were.
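Going back to Chacin's line at the top of this post: the 5.0 ceiling falls out of the definition of OPS, since on-base percentage tops out at 1.000 and slugging at 4.000. A minimal sketch of that arithmetic follows; the function name `ops` and its parameters are my own labels, not anything taken from Baseball Reference.

```python
from fractions import Fraction


def ops(at_bats, hits, total_bases, walks=0, hbp=0, sac_flies=0):
    """On-base plus slugging from raw counting stats (hypothetical helper)."""
    # OBP = times on base / (AB + BB + HBP + SF)
    obp = Fraction(hits + walks + hbp, at_bats + walks + hbp + sac_flies)
    # SLG = total bases / at-bats
    slg = Fraction(total_bases, at_bats)
    return float(obp + slg)


# Chacin: one plate appearance, one home run (4 total bases).
print(ops(at_bats=1, hits=1, total_bases=4))

# Jansen: a single and a walk in two plate appearances.
print(ops(at_bats=1, hits=1, total_bases=1, walks=1))
```

A home run in every at-bat is the only way to reach the ceiling, which is why no line with more plate appearances is likely to match it.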

Are This Year’s Home Runs Really That Different? December 22, 2010

Posted by tomflesher in Baseball, Economics.

This year’s home runs are quite confounding. On the one hand, home runs per game in the AL have dropped precipitously (as noted and examined in the two previous posts). On the other hand, Jose Bautista had an absolutely outstanding year. How much different is this year’s distribution than those of previous years? To answer that question, I took off to Baseball Reference and found the list of all players with at least one plate appearance, sorted by home runs.

There are several parameters that are of interest when discussing the distribution of events. The first is the mean. This year’s mean was 5.43, meaning that of the players with at least one plate appearance, on average each one hit 5.43 homers. That’s down from 6.53 last year and 5.66 in 2008.

Next, consider the variance and standard deviation. (The variance is the standard deviation squared, so the numbers derive similarly.) A low variance means that the numbers are clumped tightly around the mean. This year’s variance was 68.4, down from last year’s 84.64 but up from 2008’s 66.44.

The skewness and kurtosis represent the length and thickness of the tails, respectively. Since a lot of people have very few home runs, the skewness of every year’s distribution is going to be positive. Roughly, that means that there are observations far larger than the mean, but very few that are far smaller. That makes sense, since there’s no such thing as a negative home run total. The kurtosis number represents how pointy the distribution is, or alternatively how much of the distribution is found in the tail.

For example, in 2009, Mark Teixeira and Carlos Pena jointly led the American League in home runs with 39. There was a high mean, but the tail was relatively thin with a high variance. Compared with this year, when Bautista led his nearest competitor (Paul Konerko) by 15 home runs and only 8 players were over 30 home runs, 2009 saw 15 players above 30 home runs with a pretty tight race for the lead. Kurtosis in 2010 was 7.72 compared with 2009’s 4.56 and 2008’s 5.55. (In 2008, 11 players were above the 30-mark, and Miguel Cabrera’s 37 home runs edged Carlos Quentin by just one.)

The numbers say that 2008 and 2009 were much more similar than either of them is to 2010. A quick look at the distributions bears that out – this was a weird year.
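The four summary statistics used in this post can be sketched in a few lines. This is only an illustration: the `sample` list is made up rather than the real Baseball Reference data, and I am assuming population moments and raw (not excess) kurtosis, which may not match the conventions behind the figures quoted above.

```python
def moments(values):
    """Mean, variance, skewness and kurtosis from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "variance": m2,
        "std_dev": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,  # positive when the right tail is longer
        "kurtosis": m4 / m2 ** 2,    # raw kurtosis; subtract 3 for excess
    }


# Hypothetical home run totals: many low values and one runaway leader.
sample = [0, 0, 0, 1, 2, 3, 5, 8, 12, 20, 54]

for name, value in moments(sample).items():
    print(f"{name}: {value:.2f}")
```

A single outlier like the 54 in the made-up list pulls both skewness and kurtosis up sharply, which is the same mechanism the Bautista season illustrates.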

Diagnosing the AL December 22, 2010

Posted by tomflesher in Baseball, Economics.

In the previous post, I crunched some numbers on a previous forecast I’d made and figured out that it was a pretty crappy forecast. (That’s the fun of forecasting, of course – sometimes you’re right and sometimes you’re wrong.) The funny part of it, though, is that the observed home runs per game for the American League was so far off – 3.4 standard errors below the predicted value – that it’s highly unlikely that the regression model I used controls for all relevant variables. That’s not surprising, since it was only a time trend with a dummy variable for the designated hitter.

There are a couple of things to check for immediately. The first is the most common explanation thrown around when home runs drop – steroids. It seems to me that if the drop in home runs were due to better control of performance-enhancing drugs, then it should mostly be home runs that are affected. For example, intentional walks should probably be below expectation, since intentional walks are used to protect against a home run hitter. Unintentional walks should probably be about as expected, since walks are a function of plate discipline and pitcher control, not of strength. On-base percentage should probably drop at a lower magnitude than home runs, since some hits that would have been home runs will stay in the park as singles, doubles, or triples rather than all being fly-outs. There will be a drop but it won’t be as big. Finally, slugging average should drop because a loss in power without a corresponding increase in speed will lower total bases.

I’ll analyze these with pretty new R code behind the cut.


What Happened to Home Runs This Year? December 22, 2010

Posted by tomflesher in Baseball, Economics.

I was talking to Jim, the writer behind Apparently, I’m An Angels Fan, who’s gamely trying to learn baseball because he wants to be just like me. Jim wondered aloud how much the vaunted “Year of the Pitcher” has affected home run production. Sure enough, on checking the AL Batting Encyclopedia at Baseball-Reference.com, production dropped by about .16 home runs per game (from 1.13 to .97). Is that normal statistical variation or does it show that this year was really different?

In two previous posts, I looked at the trend of home runs per game to examine Stuff Keith Hernandez Says and then examined Japanese baseball’s data for evidence of structural break. I used the Batting Encyclopedia to run a time-series regression for a quadratic trend and added a dummy variable for the Designated Hitter. I found that the time trend and DH control account for approximately 56% of the variation in home runs per year, and that the functional form is

\hat{HR} = .957 - .0188 \times t + .0004 \times t^2 + .0911 \times DH

with t=1 in 1955, t=2 in 1956, and so on. That means t=56 in 2010. Consequently, we’d expect home run production per game in 2010 in the American League to be approximately

\hat{HR} = .957 - .0188 \times 56 + .0004 \times 3136 + .0911 \approx 1.25

That means we expected production to increase this year, and instead it dropped precipitously, for a residual of -.28. The residual standard error on the original regression was .1092 on 106 degrees of freedom, so the critical t-value using Texas A&M’s table is 1.984 (approximating using 100 df). That means we can be 95% confident that the actual number of home runs per game should fall within .1092*1.984, or about .2167, of the expected value. The lower bound would be about 1.03, meaning we’re still significantly below what we’d expect. In fact, the observed number is about 3.4 standard errors below the expected number. In other words, we’d expect that to happen by chance less than .1% (that is, less than one tenth of one percent) of the time.
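The forecast and the 95% band can be re-run from the rounded numbers quoted in this post. This is a rough sketch, not the original regression code; the function name `predicted_hr_per_game` is hypothetical, and the band uses the residual standard error times the table t-value, exactly as described above.

```python
def predicted_hr_per_game(year, dh=1):
    """Quadratic time trend plus DH dummy, using the rounded coefficients."""
    t = year - 1954  # t = 1 in 1955, so t = 56 in 2010
    return 0.957 - 0.0188 * t + 0.0004 * t ** 2 + 0.0911 * dh


expected = predicted_hr_per_game(2010)  # AL, so the DH dummy is 1
observed = 0.97                         # 2010 AL home runs per game
residual = observed - expected

rse = 0.1092     # residual standard error of the regression
t_crit = 1.984   # two-tailed 95% critical value at 100 df
margin = rse * t_crit
lower, upper = expected - margin, expected + margin

print(f"expected {expected:.3f}, residual {residual:.3f}")
print(f"95% band: {lower:.3f} to {upper:.3f}")
print(f"residual / rse = {residual / rse:.2f}")
```

On these rounded inputs the residual comes out nearer 2.6 residual standard errors than the 3.4 quoted, so treat the sketch as an illustration of the method; the exact multiple may depend on unrounded coefficients or a different standard error. Either way, the observed .97 sits below the lower edge of the band.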

Clearly, something else is in play.

Home Run Derby: Does it ruin swings? December 15, 2010

Posted by tomflesher in Baseball, Economics.

Earlier this year, there was a lot of discussion about the alleged home run derby curse. This post by Andy on Baseball-Reference.com asked if the Home Run Derby is bad for baseball, and this Hardball Times piece agrees with him that it is not. The standard explanation involves selection bias – sure, players tend to hit fewer home runs in the second half after they hit in the Derby, but that’s because the people who hit in the Derby get invited to do so because they had an abnormally high number of home runs in the first half.

Though this deserves a much more thorough macro-level treatment, let’s just take a look at the density of home runs in either half of the season for each player who participated in the Home Run Derby. Those players include David Ortiz, Hanley Ramirez, Chris Young, Nick Swisher, Corey Hart, Miguel Cabrera, Matt Holliday, and Vernon Wells.

For each player, plus Robinson Cano (who was of interest to Andy in the Baseball-Reference.com post), I took the percentage of games before the Derby and compared it with the percentage of home runs before the Derby. If the Ruined Swing theory holds, then we’d expect

g(HR) \equiv HR_{before}/HR_{Season} > g(Games) \equiv Games_{before}/162

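The table below shows that in almost every case, including Cano (who did not participate), the share of home runs hit in the pre-Derby games was higher than the share of games played, so home runs were denser before the Derby than after it. Swisher is the only exception. (Diff is g(HR) minus g(Games).)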

Player     HR before   HR season   g(Games)   g(HR)      Diff
Ortiz      18          32          0.54321    0.5625     0.01929
Hanley     13          21          0.54321    0.619048   0.075838
Swisher    15          29          0.537037   0.517241   -0.0198
Wells      19          31          0.549383   0.612903   0.063521
Holliday   16          28          0.54321    0.571429   0.028219
Hart       21          31          0.549383   0.677419   0.128037
Cabrera    22          38          0.530864   0.578947   0.048083
Young      15          27          0.549383   0.555556   0.006173
Cano       16          29          0.537037   0.551724   0.014687

Is this evidence that the Derby causes home run percentages to drop off? Certainly not. There are some caveats:

  • This should be normalized based on games the player played, instead of team games.
  • It would probably even be better to look at a home run per plate appearance rate instead.
  • It could stand to be corrected for deviation from the mean to explain selection bias.
  • Cano’s numbers are almost identical to Swisher’s. They play for the same team, but only Swisher hit in the Derby. If there were an effect to be seen, it would probably show up here, and it doesn’t.

Once finals are up, I’ll dig into this a little more deeply.
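For anyone who wants to recompute the g(HR) and Diff columns, here is a minimal sketch. The home run counts and g(Games) values are copied from the table; the variable names are my own, and the check is the same team-games comparison, so it inherits all the caveats listed above.

```python
# (player, HR before the Derby, HR for the season, g(Games) from the table)
rows = [
    ("Ortiz", 18, 32, 0.54321),
    ("Hanley", 13, 21, 0.54321),
    ("Swisher", 15, 29, 0.537037),
    ("Wells", 19, 31, 0.549383),
    ("Holliday", 16, 28, 0.54321),
    ("Hart", 21, 31, 0.549383),
    ("Cabrera", 22, 38, 0.530864),
    ("Young", 15, 27, 0.549383),
    ("Cano", 16, 29, 0.537037),
]

results = {}
for player, hr_before, hr_season, g_games in rows:
    g_hr = hr_before / hr_season        # share of home runs hit pre-Derby
    results[player] = g_hr - g_games    # positive: homers front-loaded

for player, diff in results.items():
    flag = "front-loaded" if diff > 0 else "back-loaded"
    print(f"{player:<9} {diff:+.4f}  {flag}")
```

Swapping g(Games) for each player's own games played, or for plate appearances, would address the first two caveats without changing the structure of the loop.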

In Memoriam November 11, 2010

Posted by tomflesher in Baseball.

In Flanders Fields the poppies blow
Between the crosses row on row,
That mark our place; and in the sky
The larks, still bravely singing, fly
Scarce heard amid the guns below.

We are the Dead. Short days ago
We lived, felt dawn, saw sunset glow,
Loved and were loved, and now we lie
In Flanders fields.

Take up our quarrel with the foe:
To you from failing hands we throw
The torch; be yours to hold it high.
If ye break faith with us who die
We shall not sleep, though poppies grow
In Flanders fields.

John McCrae

Fire Up The Hot Stove November 2, 2010

Posted by tomflesher in Baseball.

Although I’m usually fairly heavy on the statistical content, I can’t help but mention a few impressions from Game 5 of the World Series last night.

  • If I didn’t have Baseball-Reference.com to tell me different, I’d have assumed Aubrey Huff wasn’t an everyday first baseman from the way he played last night. He was competent and made some nice picks, but he didn’t seem to have the ankle-preservation instinct that most everyday 1Bs do. He seemed to have his heels back quite far on the bag most of the time.
  • The rumors about the Yankees pursuing Cliff Lee strike me as cartoonish supervillainy. “If I cannot defeat you, I will simply BUY you!”
  • Game 3 was the Lee vs. Tim Lincecum gem that we all assumed Game 1 would be.
  • Somewhere, Bengie Molina is secretly pouring champagne all over himself.
  • If the postseason came before voting, Buster Posey would be a lock for Rookie of the

Quickie: Ryan Howard’s Choke Index October 25, 2010

Posted by tomflesher in Baseball.

The Choke Index is alive and well.

Previous to 2010, Ryan Howard of the Philadelphia Phillies hit home runs in three consecutive postseasons. He managed 7 in his 140 plate appearances, averaging out to .05 home runs per plate appearance. Not too shabby. It’s a bit below his regular season rate of about .067, but there are a bunch of things that could account for that.

This year, Ryan made 38 plate appearances and hit a grand total of 0 home runs in the postseason. What’s the likelihood of that happening? I use the Choke Index (one minus the probability of hitting 0 home runs in a given number of plate appearances) to measure that. As always, the closer a player gets to 1, the more unlikely his homer-free streak is.

The binomial probability can be calculated using the formula

f(k;n,p) = \Pr(K = k) = {n\choose k}p^k(1-p)^{n-k}

Or, since we’re looking for the probability of the event NOT occurring in any of the n plate appearances (that is, k = 0),

(1-p)^n

or .95^{38} \approx .142

using his career postseason numbers. That means that Ryan Howard’s 2010 postseason Choke Index is .858. Pretty impressive!
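The calculation is short enough to sketch directly. The function name `choke_index` is just my label for the definition above, and it assumes every plate appearance is an independent trial with the same home run probability.

```python
def choke_index(hr_rate, plate_appearances):
    """One minus the binomial probability of zero home runs in n PA."""
    p_no_home_runs = (1 - hr_rate) ** plate_appearances  # the k = 0 term
    return 1 - p_no_home_runs


career_postseason_rate = 7 / 140  # 7 HR in 140 PA before 2010

print(round(choke_index(career_postseason_rate, 38), 3))  # 0.858
```

Plugging in the regular season rate of about .067 instead would push the index higher, since a homerless stretch is less likely for a more frequent home run hitter.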

Burnett, Hughes, and Playoff Rotations October 12, 2010

Posted by tomflesher in Baseball.

There was much discussion of the Yankees’ specialized playoff rotation for the American League Division Series. As is conventional in the ALDS, Joe Girardi went with a three-man rotation. CC Sabathia and Andy Pettitte were locks; the third starter could have been A.J. Burnett, Javier Vazquez, or Dustin Moseley. Girardi went with young All-Star Phil Hughes in the third slot. That, of course, led to a sweep of the Minnesota Twins to advance to the American League Championship Series.

First of all, I think it was probably the right decision. Hughes pitched 176 1/3 innings and gave up 82 earned runs, for an ER/IP of about .47. In Burnett’s 186 2/3 innings, he allowed 109 earned runs for an ER/IP of about .58. Surprisingly, Burnett allowed 9 unearned runs for a rate of about .048 unearned runs per inning pitched, whereas Hughes had only one unearned run for a rate of about .006, but of course those numbers probably don’t say anything significant. With 730 batters faced, Hughes allowed about .11 earned runs per batter, or about 1 earned run every 9 batters faced, while Burnett’s 829 batters faced give him similar numbers: .13 earned runs per batter, or about 1 earned run every 7.6 batters.

Most importantly to me, Hughes was much more predictable. Burnett faced, on average, 4.68 batters per inning pitched, with a variance of .92. Hughes faced over half a batter fewer per inning – 4.13 – and had a variance of .33. That means that not only did Burnett allow more baserunners, but when he was off, he was very off. Although the decision gets tougher when you have a higher BF/IP and a lower variance, Hughes was both better and more consistent in a similar number of innings, so he has to get the nod.

(That said, it’s shocking that such similar numbers produced one 18-8 pitcher and one 10-15 pitcher.)
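The per-inning and per-batter rates for the two pitchers can be recomputed from the season totals quoted above. This sketch uses only those totals; the helper names `innings` and `rates` are hypothetical, and fractional innings are handled exactly so that 176 1/3 is not mistaken for 176.1.

```python
from fractions import Fraction


def innings(whole, thirds=0):
    """Convert baseball innings notation (e.g. 176 1/3) to an exact value."""
    return whole + Fraction(thirds, 3)


def rates(ip, er, uer, bf):
    """Earned and unearned run rates per inning and per batter faced."""
    return {
        "ER/IP": float(er / ip),
        "UER/IP": float(uer / ip),
        "ER/BF": er / bf,
        "BF per ER": bf / er,
    }


hughes = rates(innings(176, 1), er=82, uer=1, bf=730)
burnett = rates(innings(186, 2), er=109, uer=9, bf=829)

for name, stats in (("Hughes", hughes), ("Burnett", burnett)):
    line = ", ".join(f"{key} {value:.3f}" for key, value in stats.items())
    print(f"{name}: {line}")
```

The variance comparison needs start-by-start batters faced and innings, which are not reproduced in this post, so it is left out of the sketch.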

The only question now is what order to pitch the announced four-man rotation for the ALCS. Of the choices,

OPTION 3
Sabathia
Hughes
Pettitte
Burnett
Sabathia
Hughes
Pettitte

seems clearly superior to me. It allows Burnett to start but avoids starting him twice, gets Hughes in play quite often, and puts the very reliable Andy Pettitte in play for a potential Game Seven. The linked article lists as a con that Pettitte is considered the number 2 starter, but at the Major League level a manager can’t be concerned with such frivolities. Besides, Pettitte is an established company man. I’d be surprised if he balked at a rotation that both maximized the team’s chances to win and put him in position to be the clutch hero.

Incidentally, this option lends itself to using the same rotation in the World Series. Option 2:

Sabathia
Pettitte
Hughes
Sabathia
Burnett
Pettitte
Sabathia

leaves Sabathia unavailable to start Game 1 of the World Series and might put Pettitte on short rest depending on the schedule to start Game 1. I can’t see starting the Series with Hughes or Burnett.

Jim Thome’s Marginal Value October 5, 2010

Posted by tomflesher in Baseball, Economics.

I’ve alluded to the similarity between Manny Ramirez and Jim Thome quite a bit. They both played in Cleveland for a few years before moving on to other teams. They’re each in the DH phase of their careers. Thome is about two years older than Ramirez, but otherwise they’ve had relatively similar production. That’s why it was so odd for the White Sox to let Thome go a few years back only to pick up an injured, probably going-downhill Manny for about a quarter of the season, when Ramirez is making about $18 million and Thome’s maximum salary was about $15.7 million. There’s an argument that Manny still has more productive years left than Thome, of course. (I happen to think that argument is wrong, but that’s just me.)

Just for fun, let’s take a look at their production since Manny’s trade.

In the last 24 games he played, Ramirez had 88 plate appearances, a respectable .420 OBP, and a Jeteresque .261 batting average. His win probability added was -.273, for those of you who are into that sort of thing. Meanwhile, over the same number of games, the flagging, decrepit Thome had only 79 plate appearances, with a paltry .333 batting average, and only a .494 OBP.

Thome’s salary this year for the Twins was $1.5 million.

I think the winner here is clear.