Mets Fans, Meet Your New Closer July 17, 2011
Posted by tomflesher in Baseball, Economics. Tags: Bobby Parnell, closers, Francisco Rodriguez, Jason Isringhausen, Mets, Pedro Beato
It’s been a while since the Mets traded Francisco Rodriguez, the 1982 model, to the Milwaukee Brewers. Mets manager Terry Collins has indicated that Rule 5 draft pick Pedro Beato, cranky old man Jason Isringhausen, and veteran Met Bobby Parnell are in competition for the closer role. Rodriguez had a reputation for being unpredictable, and watching him certainly gave that impression – he pitched wildly and emotionally.
I decided to dig out K-Rod’s stats for this year and figure out what his numbers looked like, using a couple of measures of control: his K/BB ratio (aka ‘control ratio’), his K/9 and BB/9, and then his batters faced per out (BFPO). If Rodriguez is unpredictable, then he should have a relatively high standard deviation for BFPO. With that in mind, if predictability is an important factor in selecting a closer, these stats are relevant for Beato, Isringhausen, and Parnell as well. Here they are, for 2011:
The best number overall is bolded. The best from among the three closer candidates is italicized.
Rodriguez had the best KBB and BB9, as well as the lowest standard deviation, but his BFPO was the highest in the group. Since he wasn’t walking many batters, that indicates that he was giving up a lot of hits or otherwise allowing lots of runners. That’s not good – it breeds high-pressure situations, some of which are bound to result in runs.
Beato had the lowest BFPO, but Parnell led all the other categories among current Mets and also had a better K/9 than Rodriguez. Parnell’s BFPO was only .02 below Frankie’s, and was .15 higher than Beato’s (and about .05 greater than Izzy’s). Without a lot more data, it’s hard to compare these numbers meaningfully. However, over the course of 70 innings (210 outs), that .15 differential adds up to 31.5 extra baserunners for Parnell above Beato. Parnell’s lower standard deviation means that those runners are going to be spread a bit more evenly than Beato’s, but it’s tough to distinguish the best choice. Isringhausen has been strong as a setup man, and Beato, as a rookie, is still unpredictable.
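As a quick check on that 31.5 figure, here is a minimal Python sketch of the arithmetic. The function name is mine, and it assumes that a gap in batters faced per out simply scales with the number of outs recorded.

```python
def extra_batters(bfpo_gap: float, innings: float) -> float:
    """Extra batters faced implied by a BFPO gap over a number of innings."""
    outs = innings * 3  # three outs per inning
    return bfpo_gap * outs


# Parnell's BFPO is .15 higher than Beato's; project that over 70 innings.
print(round(extra_batters(0.15, 70), 1))
```

The same function works for the other gaps in the post, such as the .02 between Parnell and Rodriguez.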
Parnell will probably come out of this with the closer’s job, but Collins would be a fool not to leave Isringhausen where he is.
Jim Thome, Revised July 14, 2011
Posted by tomflesher in Baseball. Tags: 600 home runs, Baseball, forecasting, Jim Thome
In an earlier post, I predicted that if Jim Thome stayed healthy, he’d hit the 600 home run mark at some point in late July, with a loose prediction that he’d hit it around July 26 (the Twins’ 100th game). Since he got hurt, and since he’s been playing hurt for a while, it’s worth refiguring the date.
Thome needs five home runs.
This year, Thome has hit 6 home runs in 128 plate appearances for a rate of .046875 home runs per plate appearance, or one home run every 21 1/3 plate appearances. That’s down quite a bit from his career rate, which worked out to one home run every 13.5 plate appearances. Since his return, though, he’s hit 2 home runs in 34 plate appearances, or one every 17. If that represents his true production, then he’ll need about 5*17 = 85 plate appearances to hit five more home runs.
Since his return, Thome has averaged 2.8 plate appearances per game he played in, but he’s had two nights off. Per team game, that works out to 2.4 plate appearances. That means, roughly, he’ll need about 85/2.4 = 35.4 team games to hit those 5 home runs, or, rounding off, he’ll probably hit his 600th about 35 games from now. That 35th game is team game #124, at home against the Yankees on August 18th. If he maintains his 2.4 plate appearances per team game and he produces at his career rate (every 13.5 plate appearances), he’ll need about 68 plate appearances, or 28 games and change. The 29th game is on Friday, August 12, in Cleveland. (Wouldn’t that be sweet for Thome?) If he continues hitting every 21 1/3 plate appearances, that means he’ll need about 107 plate appearances, or about 44 games and change. The 45th game is August 27, at home against Detroit.
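The three scenarios above all use the same two-step calculation, so here is a rough Python sketch of it. The function and scenario labels are my own, and it does not round plate appearances up before dividing, so the career-rate and season-rate results come out slightly below the figures in the text.

```python
def games_needed(hr_needed: int, pa_per_hr: float, pa_per_team_game: float) -> float:
    """Team games required to reach a home run target at a given pace."""
    pa_needed = hr_needed * pa_per_hr      # plate appearances required
    return pa_needed / pa_per_team_game    # convert to team games


# Plate appearances per home run under each assumption from the post.
scenarios = {
    "since return": 17.0,
    "career rate": 13.5,
    "2011 season": 128 / 6,
}

for label, pa_per_hr in scenarios.items():
    games = games_needed(5, pa_per_hr, 2.4)
    print(f"{label}: {games:.1f} team games")
```

Mapping a game count onto a calendar date still needs the Twins’ schedule, which this sketch leaves out.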
It’ll become easier to nail down, but there’s about a ten-day window where I’d lay my odds for Thome to hit #600. If I had to narrow it down to a week, I’d shoot for the six-game series that starts on the road at Detroit on August 15 and ends at home against the Yankees on August 21. That accounts for Thome’s depressed home run production but doesn’t penalize him for playing hurt the way that assuming his pre-injury rate would.
Quickie: Halladay’s All-Star No-Hit Bid July 13, 2011
Posted by tomflesher in Baseball.
The All-Star Game is managed strangely. That’s a given. It’s the only place where Roy Halladay could start a game by retiring six consecutive batters and then be relieved by Cliff Lee. It’s the only place where you get a single inning pitched by Jered Weaver, no hits (one walk), and immediate relief in the second from David Robertson. (Robertson faced the minimum, but allowed a hit to Lance Berkman. Berk was then caught attempting to steal second.) Lee also pitched a no-hit inning before running into trouble in the third, requiring a call to the bullpen for the eventual winning pitcher, Tyler Clippard.
There has never been an All-Star Game where both starting pitchers were lifted with no hits. Since 1994, the norm has been to allow the starter to pitch no more than two innings. (Greg Maddux in 1994, Dwight Gooden in 1988, and Bret Saberhagen in 1987 each pitched three, but they’re the only ones since 1986 when Gooden and Roger Clemens each went three.) Even if we grant that Weaver only pitched one inning, the past three All-Star Games didn’t even feature no-hit first innings:
- 2010: David Price pitched a perfect top of the first, but Ubaldo Jimenez gave up a one-out walk and single.
- 2009: Tim Lincecum gave up a leadoff single to Ichiro Suzuki despite Halladay’s perfect bottom of the first.
- 2008: Lee pitched a perfect top of the first, but Ben Sheets gave up a one-out single to Derek Jeter.
Halladay’s batting average against this year has been .240, and his OBP against is .264. That means the probability of two perfect innings is
$$(1 - .264)^6 = .736^6 \approx .159$$
or odds of about 5.29 against.
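For anyone who wants to check the odds, here is a small Python sketch. It assumes, as the calculation above does, that each of the six batters reaches base independently with probability equal to Halladay’s OBP against. The function names are mine.

```python
def prob_consecutive_outs(obp_against: float, batters: int) -> float:
    """Probability of retiring every batter in a row, treating each as independent."""
    return (1 - obp_against) ** batters


def odds_against(p: float) -> float:
    """Convert a probability into odds against the event."""
    return (1 - p) / p


p = prob_consecutive_outs(0.264, 6)
print(f"probability: {p:.3f}, odds against: {odds_against(p):.2f}")
```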
Since the management of the All-Star Game is focused mostly on getting as many players in the game as possible, you can’t really fault Bruce Bochy for lifting Halladay. I have to say, though, I was pretty disappointed when Lee came out in relief to start the third.
A list of the thirteen All-Star Game pitchers prior to Halladay to be lifted after a no-hit start is here.
Home Field Advantage Again July 12, 2011
Posted by tomflesher in Baseball, Economics. Tags: attendance effects, Baseball, Giants, home field advantage, linear regression, probability, probit, statistics
In an earlier post, I discussed the San Francisco Giants’ vaunted home field advantage and came to the conclusion that, while a home field advantage exists, it’s not related to the Giants scoring more runs at home than on the road. That was done with about 90 games’ worth of data. In order to come up with a more robust measure of home field advantage, I grabbed game-by-game data for the National League from the first half of the 2011 season and crunched some numbers.
I have three questions:
- Is there a statistically significant increase in winning probability while playing at home?
- Is that effect statistically distinct from any effect due to attendance?
- If it exists, does that effect differ from team to team? (I’ll attack this in a future post.)
Methodology: Using data with, among other things, per-game run totals, win-loss data, and attendance, I’ll run three regressions. The first will be a linear probability model of winning on Home, Attendance, and AttH,
where Home is a binary variable for playing at home, Attendance is announced attendance at the game, and AttH is listed attendance only if the team is at home and 0 if the team is on the road. Thus, I expect a negative coefficient on Attendance and a positive coefficient on AttH,
so that a team on the road suffers from a larger crowd but a team at home reaps a larger benefit from a larger crowd. The linear probability model is easy to interpret, but not very rigorous and subject to some problems.
As such, I’ll also run a Probit model of the same equation to avoid problems caused by the simplicity of the linear probability model.
Finally, just as a sanity check, I’ll run the same regression, but for runs, instead of win probability. Since runs aren’t binary, I’ll use ordinary least squares, and also control for the possibility that games played in American League parks lead to higher run totals by controlling for the designated hitter:
Since runs are a factor in winning, I have the same expectations about the signs of the beta values as above.
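To make the variable definitions concrete, here is a minimal Python sketch of how one game’s regressors could be built. The function name and the attendance figure are hypothetical; only the definitions of Home, Attendance, AttH, and the DH control come from the description above.

```python
def build_row(home: bool, attendance: int, dh: bool = False) -> dict:
    """Regressors for one team-game, following the definitions in the text."""
    return {
        "Home": 1 if home else 0,          # binary: playing at home
        "Attendance": attendance,          # announced attendance
        "AttH": attendance if home else 0, # attendance only when at home
        "DH": 1 if dh else 0,              # designated hitter in use
    }


# The same (made-up) crowd of 41,000 seen from each dugout.
print(build_row(home=True, attendance=41000))
print(build_row(home=False, attendance=41000))
```

Because AttH is zero on the road, its coefficient picks up only the extra effect of a crowd on the home team.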
Results:
Regression 1 (Linear Probability Model):
So, my prediction about the attendance betas was incorrect, but only because I failed to account for the squared terms. The effect from home attendance increases as we approach full attendance; the effect from road attendance decreases at about the same rate. There’s still a net positive effect.
Regression 2 (Probit Model):
Note that in both cases, there’s a statistically significant coefficient on Home, meaning that teams are more likely to win at home, and that for large values of attendance, the Home effect outweighs the attendance effect entirely. That indicates that the attendance effect is probably spurious.
Finally, the regression on runs:
Regression 3 (Predicted Runs):
Again, with runs, there is a statistically significant effect from being at home, and a variety of possible attendance effects. For low attendance values, the Home effect is probably swamped by the negative attendance effect, but for high attendance games, the Home effect probably outweighs the attendance effect or the attendance effect becomes positive.
Again, the Home effect is statistically significant no matter which model we use, so at least in the National League, there is a noticeable home field advantage.
Padre Differential July 11, 2011
Posted by tomflesher in Baseball, Economics. Tags: Baseball, baseball-reference.com, linear regression, National League, Padre Differential, Padres, Phillies, runs allowed, runs scored, statistics
I was all set to fire up the Choke Index again this year. Unfortunately, Derek Jeter foiled my plan by making his 3000th hit right on time, so I can’t get any mileage out of that. Perhaps Jim Thome will start choking around #600 – but, frankly, I hope not. Since Jeter had such a callous disregard for the World’s Worst Sports Blog’s material, I’m forced to make up a new statistic.
This actually plays into an earlier post I made, which was about home field advantage for the Giants. It started off as a very simple regression for National League teams to see if the Giants’ pattern – a negative effect on runs scored at home, no real effect from the DH – held across the league. Those results are interesting and hold with the pattern that we’ll see below – I’ll probably slice them into a later entry.
The first thing I wanted to do, though, was find team effects on runs scored. Basically, I want to know how many runs an average team of Greys will score, how many more runs they’ll score at home, how many more runs they’ll score on the road if they have a DH, and then how many more runs the Phillies, the Mets, or any other team will score above their total. I’m doing this by converting Baseball Reference’s schedules and results for each team through their last game on July 10 to a data file, adding dummy variables for each team, and then running a linear regression of runs scored by each team against dummy variables for playing at home, playing with a DH, and the team dummies. In equation form,
$$\text{Runs} = \beta_0 + \beta_1\,\text{Home} + \beta_2\,\text{DH} + \sum_i \delta_i\,\text{Team}_i + \varepsilon$$
For technical reasons, I needed to leave a team out, and so I chose the team that had the most negative coefficient: the Padres. Basically, then, the team-dummy coefficients represent how many runs the team scores above what the Padres would score. I call this “RAP,” for Runs Above Padres. I then ran the same equation, but rather than runs scored by the team, I estimated runs allowed by the team’s defense. That, logically enough, was called “ARAP,” for Allowed Runs Above Padres. A positive RAP means that a team scores more runs than the Padres, while a negative ARAP means the team doesn’t allow as many runs as the Padres. Finally, to pull it all together, one handy number shows how many runs better off a team is than the Padres:
$$\text{Padre Differential} = \text{RAP} - \text{ARAP}$$
That is, the Padre Differential shows whether a team’s per-game run differential is higher or lower than the Padres’.
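Here is a short Python sketch of the two mechanical pieces: the team dummies with the Padres left out as the reference team, and the differential itself computed as RAP minus ARAP. The team abbreviations and coefficient values are invented for illustration and are not the regression results.

```python
def team_dummies(team: str, teams: list, reference: str = "SDP") -> list:
    """One 0/1 indicator per team, omitting the reference team."""
    return [1 if t == team else 0 for t in teams if t != reference]


def padre_differential(rap: float, arap: float) -> float:
    """Runs per game better than the Padres: runs scored above them
    minus runs allowed above them."""
    return rap - arap


teams = ["PHI", "NYM", "SDP"]        # hypothetical three-team league
print(team_dummies("PHI", teams))    # PHI switched on
print(team_dummies("SDP", teams))    # all zeros: the baseline team

# Made-up coefficients: scores 0.9 more and allows 0.4 fewer than the Padres.
print(round(padre_differential(0.9, -0.4), 2))
```

Leaving the Padres out is what makes every other team’s coefficient read as a difference from San Diego.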
The table below shows each team in the National League, sorted by Padre Differential. By definition, San Diego’s Padre Differential is zero. ‘Sig95’ represents whether or not the value is statistically significant at the 95% level.
Unsurprisingly, the Phillies – the best team in baseball – have the highest Padre Differential in the league, at over 1.3 runs per game better than the Padres. Houston, in the cellar of the NL Central, is the worst team in the league and is .8 runs worse than the Padres per game. Florida and Chicago are both worse than the Padres and are both close to (Florida, 43) or below (Chicago, 37) the Padres’ 40-win total.
Don Kelly Wears The Crown Once Again July 9, 2011
Posted by tomflesher in Baseball. Tags: Brandon Inge, catching, Don Kelly, Super utility dervish, utility player
As reigning Utility King, I didn’t think Don Kelly could do much more to impress me. It seems I was wrong. 
I’m not sure how I missed it, but about a week ago, Kelly made a fool of me for saying that he had played every position except catcher. I can’t say it much better than Samara Pearlstein did at Roar of the Tigers, though:
There was only one option left, and that option was the infinitely versatile Don Kelly. This season alone he has played every outfield position, first base, third base, DH, and pitcher– and now catcher. In previous seasons he has seen (brief) time at second base and shortstop as well.
(The photo is courtesy Samara’s ridiculously generous sharing policy. Thanks!)
… so is Brandon Inge angry about Don Kelly taking his emergency catcher job? For whatever reason, Jim Leyland ruled that out in the spring, so it looks like Kelly has taken over as the Tigers’ top utilityman.
Kelly last played shortstop for the Pirates in 2007 and second base for the Tigers in 2009. In addition to playing both sides of the battery, he’s played first, third, left, center, right, and designated hitter, and he’s been used as a pinch hitter and a pinch runner this year.
Home Field Advantage July 9, 2011
Posted by tomflesher in Baseball, Economics. Tags: Giants, home field advantage, linear regression
The Mets unfortunately played a 10 PM game in San Francisco last night, so I’m short on sleep today. I do remember, though, that Gary Cohen mentioned, repeatedly, the Giants’ significant home field advantage. Even after last night’s loss at the hands of Carlos Beltran (coming from a rare blown save by Brian Wilson), the Giants have a .619 winning percentage at home (26-16) versus a .500 winning percentage on the road (24-24). Interestingly, their run differential is much worse at home – they’ve scored 205 and allowed 184 on the road for a total differential of +21, but their run differential at home is actually negative. They’ve scored 120 but allowed 135 for a differential of -15.
Some of that is due to the way walk-offs are scored – they end an inning immediately, so a scoring inning at home is cut short when the same inning on the road would continue and might lead to further scoring – but it’s still quite shocking to see that large a split. So far, the Giants have only scored 11 walk-off RBIs, compared with only 7 RBIs in the 9th inning on the road that came with the Giants ahead. So, even adding in an extra few runs wouldn’t account for the difference.
Last year, there wasn’t much of a home field effect at all. Running a very simple linear regression of runs scored against dummy variables for playing at home and playing with a DH, I found that only the intercept term, which represents (essentially) the unconditional average number of runs the Giants score, was significant.
For this year, the numbers are quite different, with both the intercept and Home terms significant at the 95% level. It’s clear that the Giants are winning more at home, but it’s not because they’re scoring more at home.
Ichiro’s Body Armor July 8, 2011
Posted by tomflesher in Baseball.
In the previous post, I looked at Kevin Youkilis and his uncanny ability to be hit by pitches during the regular season. This time, I’d like to go in the opposite direction – who’s started the year with the fewest times hit by a pitch? That’d be Ichiro Suzuki, who’s gone 87 games (he had one night off this year) and 389 plate appearances without being plunked yet this year. Last year he was hit three times in 732 plate appearances, or about 4 in every 1000 plate appearances, for a rate of .004. Assuming his batting and league pitching haven’t changed, that means that any given streak of 389 plate appearances had a probability of occurring of
$$(1 - .004)^{389} \approx .21$$
or roughly one in five. Ichiro has played eleven seasons in the US majors and made 7728 plate appearances, so it’s totally unsurprising that he’d have such a streak without being hit.
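The one-in-five figure is easy to reproduce. This is a minimal Python sketch assuming every plate appearance is an independent trial with the same .004 chance of a plunking; the function name is mine.

```python
def prob_no_hbp(rate: float, plate_appearances: int) -> float:
    """Chance of going a given number of plate appearances without being hit."""
    return (1 - rate) ** plate_appearances


# Last year's rate of .004 applied to this year's 389 plate appearances.
print(round(prob_no_hbp(0.004, 389), 2))
```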
In fact, Ichiro is currently on a streak that started on July 7, 2010, of 155 games with at least one plate appearance that don’t involve being plunked. In order to make sense of this, let’s normalize his stats to per-game rather than per-plate-appearance. Over his career, Ichiro has been hit 47 times in 1675 games, meaning that on average he gets hit .028 times per game, or once every 35.6 games. Equivalently, the probability that he does not get hit is (1-.028) = .972. The likelihood of a streak of 155 games, then, is
$$.972^{155} \approx .012$$
It’s highly unlikely to occur, but assuming Ichiro is the same batter he always has been, and assuming he plays the remaining 74 games, the likelihood that he won’t be hit at all is
$$.972^{74} \approx .122$$
Using the binomial distribution, we can determine that there’s about a 26% chance he’ll be hit once in his next 74 games, and about a 27.4% chance he’ll be hit twice. After that, it drops off sharply. Finally, the probability that he’ll be hit in all 74 games remaining is 1.229e(-115), or so small as to be equivalent to zero for our purposes. (It’s about 1 behind a decimal point and 114 zeros.)
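The binomial figures can be checked with a few lines of Python. This sketch treats each of the 74 remaining games as an independent trial with a .028 chance that Ichiro is hit, which is the simplification used above; the helper name is my own.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_hit = 0.028   # career hit-by-pitch rate per game
games = 74      # games remaining

for k in range(3):
    print(f"hit in exactly {k} games: {binom_pmf(k, games, p_hit):.3f}")

# Hit in every remaining game: vanishingly small.
print(f"hit in all {games} games: {binom_pmf(games, games, p_hit):.3e}")
```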
Take Your Base July 7, 2011
Posted by tomflesher in Baseball, Economics. Tags: hit batsman, hit batsmen, hit by pitch, Kevin Youkilis, statistics
As usual, Kevin Youkilis is getting hit at an alarming rate this year. A quick check of his stats from Baseball Reference shows that from 2004 to 2010, he got hit at about a 2% clip and was intentionally walked about .5% of the time. This year, he’s been hit nine times in 340 plate appearances, for about 2.6% of plate appearances ending in the phrase “Take your base.” He’s only been intentionally walked once, which isn’t out of line from his three IBBs last year. In contrast, he was “only” hit ten times last year, so he’s one away from matching that mark and six away from tying his record 15 times hit (in 2007). Interestingly, Kevin has never been hit in the postseason.
It would be oversimplistic to say that guys who get hit a lot get hit because they’re jerks. There’s a plausible argument that Youkilis’ unorthodox batting stance is responsible for his high rate, and some guys just get hit more often. Crashburn Alley makes the point that getting hit is a legitimate skill, and Plunk Everyone has a truly dizzying array of information about players getting hit. My question, though, is whether it could be the case that Youkilis is hit less often in the postseason because pitchers are more careful.
In 2007, 2008, and 2009, Youkilis made a total of 123 postseason plate appearances. During that time, he was never hit, nor was he intentionally walked. His OBP was .376, compared with a .397 regular-season OBP over those years. It’s possible that he was simply slumping and not seen as a threat.
It’s also possible that Youk’s failure to get hit at a respectable 2% rate (we’d have expected about 2 1/2 plunks) was simply chance. As a quick check, assume that his regular season stats during 2007, 2008, and 2009 represent “true” information, and that the 123 plate appearances he made in the postseasons were all random draws from the same distribution. Since he was hit 43 times in 1834 plate appearances across 2007-09, his true rate would be 2.3% (closer to 2.34, but I rounded down – note that this cuts Youk a little extra slack). Then, 95% of 123-appearance distributions should have hit-by-pitch rates that fall within the window
$$.023 \pm 1.96 \cdot se$$
where se is the standard error, calculated as
$$se = \sqrt{\frac{.023\,(1 - .023)}{123}} \approx .0135$$
Thus, 95 out of 100 123-appearance runs should fall within the window
$$.023 \pm 1.96\,(.0135) \approx (-.003,\ .049)$$
Obviously, since there can’t be a negative number of hit batsmen, zero is included in that interval. Youkilis isn’t necessarily being pitched around more effectively in the postseason – he’s just unlucky enough not to get plunked.
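Here is a rough Python sketch of that interval check. It assumes the usual normal approximation for a proportion, with a 1.96 multiplier for 95% coverage; the multiplier and the function name are my assumptions rather than anything specific to this data.

```python
from math import sqrt


def proportion_interval(p: float, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation interval for an observed rate over n trials."""
    se = sqrt(p * (1 - p) / n)   # standard error of a proportion
    return p - z * se, p + z * se


# "True" rate of .023 and 123 postseason plate appearances.
low, high = proportion_interval(0.023, 123)
print(f"95% window: ({low:.4f}, {high:.4f})")
print("zero plunks inside the window:", low <= 0 <= high)
```

With a rate this small and only 123 trials, the normal approximation is crude, which is why the lower bound dips below zero.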