forecasting | The World's Worst Sports Blog

Jim Thome, Revised July 14, 2011

Posted by tomflesher in Baseball.
Tags: 600 home runs, Baseball, forecasting, Jim Thome

In an earlier post, I predicted that if Jim Thome stayed healthy, he’d hit the 600 home run mark at some point in late July, with a loose prediction that he’d hit it around July 26 (the Twins’ 100th game). Since he got hurt, and since he’s been playing hurt for a while, it’s worth refiguring the date.

Thome needs five home runs.

This year, Thome has hit 6 home runs in 128 plate appearances for a rate of .046875 home runs per plate appearance, or one home run every 21 1/3 plate appearances. That’s down quite a bit from his career rate, which worked out to one home run every 13.5 plate appearances. Since his return, though, he’s hit 2 home runs in 34 plate appearances, or one every 17. If that represents his true production, then he’ll need about 5*17 = 85 plate appearances to hit five more home runs.

Since his return, Thome has averaged 2.8 plate appearances per game he played in, but he’s had two nights off. Per team game, that works out to 2.4 plate appearances. That means, roughly, he’ll need about 85/2.4 = 35.4 team games to hit those 5 home runs, or, rounding off, he’ll probably hit his 600th about 35 games from now. That 35th game is team game #124, at home against the Yankees on August 18th. If he maintains his 2.4 plate appearances per team game and he produces at his career rate (one every 13.5 plate appearances), he’ll need about 68 plate appearances, or 28 games and change. The 29th game is on Friday, August 12, in Cleveland. (Wouldn’t that be sweet for Thome?) If he continues hitting one every 21 1/3 plate appearances, he’ll need about 107 plate appearances, or 44 games and change. The 45th game is August 27, at home against Detroit.
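The three scenarios above all reduce to the same two-step calculation: home runs needed times plate appearances per home run, then divided by plate appearances per team game. Here is a minimal Python sketch of that arithmetic, using only the figures quoted in this post. The names (`project`, `pa_per_hr`, and so on) are mine, and rounding plate appearances up to a whole number is an assumption chosen to match the 85, 68, and 107 figures above.

```python
import math

HOMERS_NEEDED = 5
PA_PER_TEAM_GAME = 2.4  # plate appearances per team game, as quoted in the post

# Plate appearances per home run under each scenario from the post.
pa_per_hr = {
    "since return": 34 / 2,   # 2 HR in 34 PA
    "career": 13.5,
    "2011 season": 128 / 6,   # 6 HR in 128 PA
}


def project(pa_per_homer, homers=HOMERS_NEEDED, pa_per_game=PA_PER_TEAM_GAME):
    """Return (plate appearances needed, team games needed)."""
    # Assumption: round plate appearances up to a whole number.
    pa_needed = math.ceil(homers * pa_per_homer)
    return pa_needed, pa_needed / pa_per_game


projections = {name: project(rate) for name, rate in pa_per_hr.items()}

for name, (pa, games) in projections.items():
    print(f"{name}: {pa} PA, {games:.1f} team games")
```

The sketch stops at a fractional number of team games; turning that into a calendar date still means counting forward on the Twins’ schedule by hand.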

It’ll become easier to nail down, but there’s about a ten-day window where I’d lay my odds for Thome to hit #600. If I had to narrow it down to a week, I’d shoot for the six-game stretch that starts on the road at Detroit on August 15 and ends at home against the Yankees on August 21. That accounts for Thome’s depressed home run production but doesn’t penalize him for playing hurt the way that assuming his pre-injury rate would.

What Happened to Home Runs This Year? December 22, 2010

Posted by tomflesher in Baseball, Economics.
Tags: baseball-reference.com, forecasting, home runs, R, regression, standard error, statistics, time series, Year of the Pitcher

I was talking to Jim, the writer behind Apparently, I’m An Angels Fan, who’s gamely trying to learn baseball because he wants to be just like me. Jim wondered aloud how much the vaunted “Year of the Pitcher” has affected home run production. Sure enough, on checking the AL Batting Encyclopedia at Baseball-Reference.com, production dropped by about .16 home runs per game (from 1.13 to .97). Is that normal statistical variation or does it show that this year was really different?

In two previous posts, I looked at the trend of home runs per game to examine Stuff Keith Hernandez Says and then examined Japanese baseball’s data for evidence of a structural break. I used the Batting Encyclopedia to run a time-series regression for a quadratic trend and added a dummy variable for the Designated Hitter. I found that the time trend and DH control account for approximately 56% of the variation in home runs per game, and that the functional form is

$\hat{HR} = .957 - .0188 \times t + .0004 \times t^2 + .0911 \times DH$

with t=1 in 1955, t=2 in 1956, and so on. That means t=56 in 2010. Consequently, we’d expect home run production per game in 2010 in the American League to be approximately

$\hat{HR} = .957 - .0188 \times 56 + .0004 \times 3136 + .0911 \approx 1.25$
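For anyone who wants to plug in other years, the fitted trend is easy to wrap in a function. This is a rough sketch that assumes the rounded coefficients shown above (the regression’s unrounded coefficients aren’t reproduced here, so results may differ slightly from the original fit). The function name and the `dh` argument are mine.

```python
def predicted_hr_per_game(year, dh=1):
    """Evaluate the fitted quadratic trend for AL home runs per game.

    Uses the rounded coefficients as displayed in the post.
    t = 1 in 1955, so t = year - 1954; dh is the Designated Hitter dummy (0 or 1).
    """
    t = year - 1954
    return 0.957 - 0.0188 * t + 0.0004 * t**2 + 0.0911 * dh


expected_2010 = predicted_hr_per_game(2010)
print(f"Expected AL HR per game in 2010: {expected_2010:.2f}")
```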

That means we expected production to increase this year, and instead it dropped precipitously, for a residual of -.28. The residual standard error on the original regression was .1092 on 106 degrees of freedom, so the t-value using Texas A&M’s table is 1.984 (approximating using 100 df). That means we can be 95% confident that the actual number of home runs per game should fall within .1092*1.984, or about .2167, of the expected value. The lower bound would be about 1.03, meaning we’re still significantly below what we’d expect. In fact, the observed number is about 2.6 standard errors below the expected number. In other words, we’d expect a gap that large to happen by chance only about 1% of the time.
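The interval arithmetic can be checked in a few lines of Python. This is a rough sketch using only the figures quoted in this post (expected 1.25, observed .97, residual standard error .1092, critical value 1.984). The variable names are mine, and the tail probability uses a normal approximation rather than the t distribution on 106 degrees of freedom, so treat it as a ballpark figure.

```python
from statistics import NormalDist

expected = 1.25     # fitted value for 2010, from the trend equation
observed = 0.97     # actual AL home runs per game in 2010
resid_se = 0.1092   # residual standard error of the regression
t_crit = 1.984      # 95% two-sided critical value, approximated with 100 df

margin = resid_se * t_crit           # half-width of the 95% band
lower_bound = expected - margin      # bottom of the band
residual = observed - expected
std_resid = residual / resid_se      # residual in standard-error units

# Assumption: normal approximation to the t distribution for the tail area.
two_sided_p = 2 * NormalDist().cdf(-abs(std_resid))

print(f"margin: {margin:.4f}")
print(f"lower bound: {lower_bound:.2f}")
print(f"standardized residual: {std_resid:.2f}")
print(f"approximate two-sided tail probability: {two_sided_p:.3f}")
```

Note that this treats the residual standard error as the whole story. A proper prediction interval would be a little wider, since it would also account for uncertainty in the fitted coefficients.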

Clearly, something else is in play.
