Mariano’s Walk-Off Beanball September 12, 2010
Posted by tomflesher in Baseball.Tags: As, David Robertson, Derek Jeter, hit batsman, hit by pitch, Jeff Francoeur, Jose Molina, Lenny DiNardo, Mariano Rivera, Nelson Cruz, odds, probability, Rangers, Yankees
Mariano Rivera did something strange tonight: He plunked in the winning run. He hit Jeff Francoeur of the Texas Rangers to force in Nelson Cruz for the winning run in extra innings. It was his fourth hit batsman of the year and only his third loss.
A walk-off beaning requires an extraordinary set of circumstances. First of all, like all walk-off plays, it requires the home team to be at bat in the bottom of the inning. In this case, it was in extra innings rather than the bottom of the 9th. It additionally requires a tied game in the bottom of said inning. Finally, it requires the bases to be loaded when the plunking occurs.
This is all magnified by the fact that Rivera does not ordinarily load the bases. Assuming his 2010 OBP against (.214) held, the probability of the bases being loaded with two outs or fewer is:
Then, if that situation occurs, we still have to deal with the unlikely event of Mariano hitting a player with a pitch. Before this evening, Mo had hit three batters in 196 plate appearances, for a rate of about .0153. Thus, the probability of Mariano Rivera hitting a batter with a pitch after having loaded the bases is
That means that in 10,000 innings, we would expect this to occur about 4 times, assuming that Mariano wasn’t removed after having loaded the bases (which would obviously introduce some bias).
Oddly, the last walk-off hit by pitch also involved the Yankees, albeit on the other side, way back on July 19 of 2008. That night, the A’s’ Lenny DiNardo hit Jose Molina with a pitch to force in Derek Jeter, again in extra innings. David Robertson grabbed the win that night.
Teixeira and Cano: Picking up slack? August 5, 2010
Posted by tomflesher in Baseball, Economics.Tags: A-Rod, Alex Rodriguez, binomial distribution, Mark Teixeira, probability, Robinson Cano, statistics, Yankees
Michael Kay, the YES broadcaster for the Yankees, often pointed out between July 22 and August 4 that the Yankees were turning up their offense to make up for Alex Rodriguez’s lack of home run production. That seems like it might be subject to significant confirmation bias – seeing a few guys hit home runs when you wouldn’t expect them to might lead you to believe that the team in general has increased its production. So, did the Yankees produce more home runs during A-Rod’s drought?
During the first 93 games of the season, the Yankees hit 109 home runs in 3660 plate appearances for rates of 1.17 home runs per game and .0298 home runs per plate appearance. From July 23 to August 3, they hit 17 home runs in 451 plate appearances over 12 games for rates of 1.42 home runs per game and .0377 home runs per plate appearance. Obviously those numbers are quite a bit higher than expected, but could they be due simply to chance?
Assume for the moment that the first 93 games represent the team’s true production capabilities. Then, using the binomial distribution, the likelihood of hitting at least 17 home runs in 451 plate appearances is
$$P(X \ge 17) = \sum_{k=17}^{451} \binom{451}{k} (.0298)^{k} (.9702)^{451-k} = 1 - P(X \le 16)$$
The cumulative probability at 17 is about .868, meaning the probability of hitting 17 or fewer home runs is .868 and the probability of hitting more than that is about .132. The probability of hitting 16 or fewer is .805, which means that out of 100 strings of 451 plate appearances, about 81 should end with 16 or fewer home runs and about 19 with 17 or more. This is a perfectly reasonable number and not inherently indicative of a special performance by A-Rod’s teammates.
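Here is a quick way to check those cumulative probabilities. It is only a sketch: the helper name `binom_cdf` is made up for illustration, and it assumes every plate appearance is an independent trial at the early-season rate of .0298.

```python
from math import comb

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 451, 0.0298  # plate appearances in the stretch, baseline HR rate

at_most_17 = binom_cdf(17, n, p)
at_most_16 = binom_cdf(16, n, p)

print(f"P(X <= 17) = {at_most_17:.3f}")
print(f"P(X <= 16) = {at_most_16:.3f}")
print(f"P(X >= 17) = {1 - at_most_16:.3f}")
```

The last line is the "at least 17" probability, which is just the complement of the 16-or-fewer figure.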
Kay frequently cited Mark Teixeira and Robinson Cano as upping their games. Teixeira hit 18 home runs over the first 93 games and made 423 plate appearances for rates of .194 home runs per game and .0426 home runs per plate appearance. From July 23 to August 3, he had 5 home runs in 12 games and 54 plate appearances for rates of .417 per game and .0926 per plate appearance. At his earlier rate, the probability of hitting at least 5 home runs in 54 plate appearances is about 8%, meaning that either Teixeira did up his game considerably or he was exceptionally lucky.
Cano played 92 games up to July 21, hitting 18 home runs in 400 plate appearances for rates of .196 home runs per game and .045 per plate appearance. During A-Rod’s drought, he hit 3 home runs in 50 plate appearances over 12 games for rates of .25 and .06. At his earlier rate, the probability of hitting at least 3 home runs in 50 plate appearances is about 39%, which means we don’t have enough evidence to reject the idea that Cano’s performance (though better than usual) is just a random fluctuation.
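The 8% and 39% figures look like upper-tail binomial probabilities (the chance of hitting at least that many home runs at the earlier rate). That reading is an assumption, but a sketch with a hypothetical helper `prob_at_least` lands on both numbers:

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - below

# (home runs, plate appearances during the drought, earlier HR rate per PA)
teixeira = prob_at_least(5, 54, 0.0426)
cano = prob_at_least(3, 50, 0.045)

print(f"Teixeira: {teixeira:.3f}")
print(f"Cano:     {cano:.3f}")
```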
It will be interesting to see if Teixeira slows down as a home-run hitter now that Rodriguez’s drought is over.
Quickie: 600th Home Run for A-Rod August 4, 2010
Posted by tomflesher in Baseball.Tags: 599 home runs, 600 home runs, A-Rod, Alex Rodriguez, Choke Index
Alex Rodriguez finally hit #600 deep to center field in Yankee Stadium on the third anniversary of his 500th home run. A-Rod hit the home run in his first plate appearance. There were 51 plate appearances since #599. He had a final Choke Index of .944, but luckily he won’t run into another milestone home run for at least a few years.
The ball landed in Monument Park, so the Yankees didn’t need to negotiate with a fan to get it back. (A security guard picked it up.) According to Michael Kay, if the ball had landed in the stands, the Yankees would have been willing to pay for the person who caught the ball to have lunch with Alex Rodriguez and Cameron Diaz in exchange for getting the ball back, on top of an autographed baseball, hat, and bat. That opens interesting questions of valuation, much like those that came up after Doug Mientkiewicz attempted to keep the ball that he caught to make the final out in the 2004 World Series.
Is A-Rod’s Performance Different? August 3, 2010
Posted by tomflesher in Baseball, Economics.Tags: A-Rod, Alex Rodriguez, Choke Index, OBP, p-value, probability, SLG, statistics, t-value, Yankees
In games between milestone home runs, is Alex Rodriguez’ hitting similar to other times? (This is all a very polite way of asking, “Does A-Rod choke?”) It’s difficult to answer, because there’s so little data about those milestone home runs. A-Rod, though, has some statistically improbable results and it would be interesting to look at it a bit more closely.
Over 2008-2009, Alex played in 262 games and had 1129 plate appearances with 281 hits, 65 home runs, a triple:double ratio of 1:50, an OBP of .397, and a SLG of .553. His OBP has a standard error of .0146, so we can be roughly 95% confident that over those years his baseline production was somewhere between .368 and .426 (two standard errors either side of .397). Absent any time or age effect, that is the range in which A-Rod should produce for any given period.
Two recent milestone home runs come to mind as examples of Rodriguez’s reputed choking. First, the stretch between home run #499 and #500 was 8 games and 36 plate appearances. (I’m intentionally ignoring extra plate appearances on the days he hit #499 and #500.) During that time, Alex had an OBP of only .306. That’s a difference of .091 over 36 plate appearances and that performance has a standard error of about .078 when compared with his regular performance, implying a t-value of about 1.16. With 35 degrees of freedom, Texas A&M’s t Calculator gives a p-value of about .127, so this difference is marginally within the realm of chance. (The usual cutoff for significance would be .05.)
A-Rod hit his last home run on July 22. Discounting the plate appearances after his last home run, he’s played in 11 games with a paltry .255 OBP and .238 SLG over 47 plate appearances. His .255 OBP has a difference of about .142 and a standard error of about .064. That implies a t-value of about 2.21, with a p-value of about .016. That is, the probability of this difference occurring by chance is less than 2%. That gives us one result as close to significant and one as probably significant.
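The standard errors and t-values for both stretches can be reproduced roughly as follows. This is a sketch under one assumption: that the standard error is sqrt(p(1 − p)/(n − 1)) for the stretch in question, which happens to match the .078 and .064 quoted above. The p-values still need a t table or calculator, since Python's standard library has no t distribution.

```python
from math import sqrt

BASELINE_OBP = 0.397  # 2008-2009 OBP used as the comparison point

def t_value(obp, plate_appearances, baseline=BASELINE_OBP):
    """Return (t, standard error) for a stretch's OBP against the baseline.

    Assumes SE = sqrt(p * (1 - p) / (n - 1)), one reading of the figures above.
    """
    se = sqrt(obp * (1 - obp) / (plate_appearances - 1))
    return (baseline - obp) / se, se

t_500, se_500 = t_value(0.306, 36)  # stretch between #499 and #500
t_600, se_600 = t_value(0.255, 47)  # stretch since the July 22 home run

print(f"#500 stretch: SE = {se_500:.3f}, t = {t_500:.2f}")
print(f"#600 stretch: SE = {se_600:.3f}, t = {t_600:.2f}")
```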
As a side note, A-Rod’s Choke Index continues to rise. He’s gone 48 plate appearances without a home run, and at a rate of .055 home runs per plate appearance the probability of that occurring by chance is about .066. That leaves his Choke Index at .934.
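The Choke Index arithmetic is short enough to sketch. Going by the numbers above, the index is one minus the probability of going that many plate appearances without a home run; the function name is just for illustration.

```python
def choke_index(hr_rate, plate_appearances):
    """One minus the chance of a homerless streak of this length."""
    drought_probability = (1 - hr_rate) ** plate_appearances
    return 1 - drought_probability

drought = (1 - 0.055) ** 48
index = choke_index(0.055, 48)

print(f"P(no HR in 48 PA) = {drought:.3f}")
print(f"Choke Index       = {index:.3f}")
```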
The Best Game Ever July 30, 2010
Posted by tomflesher in Baseball.Tags: 600 home runs, Alex Rodriguez, Andy Marte, Chan Ho Park, Colin Curtis, designated hitter, Frank Hermann, Gabe Kapler, Indians, Jess Todd, Joe Girardi, Joe Smith, losing DH, Marcus Thames, Mitch Talbot, Nick Swisher, position players pitching, probability, Rafael Perez, statistics, Tony Sipp, Yankees
Two of my favorite things about baseball happened during tonight’s game between the Yankees and the Indians.
First of all, in the top of the ninth inning, corner infielder Andy Marte pitched for the Indians. Marte pitched a perfect ninth and coincidentally struck out Nick Swisher, who was brought in to pitch for the Yankees in a similar situation last year and struck out Gabe Kapler of the Tampa Bay Rays. I can’t promise it’s true, but I think that puts Swisher at the top of the list for involvement in position player pitcher strikeouts.
Marte’s presence was necessary because the Indians used seven other pitchers. Starter Mitch Talbot went only two innings, and the Indians got another two out of Rafael Perez. Frank Hermann took the loss for the Indians during his 1 1/3 innings. Tony Sipp pitched another 1 1/3, and Joe Smith managed to give up four earned runs in 1/3 of an inning before being removed for Jess Todd for an inning. By the top of the 9th, Marte was all the Indians had left.
Not to be outdone, Joe Girardi gave up his designated hitter by moving his DH – funnily enough, it was Swisher – into right field as part of a triple switch. Swisher moved to right field; Colin Curtis moved from right field to left field; Marcus Thames moved from left field to third base; finally, pitcher Chan Ho Park was put into the batting order in place of Alex Rodriguez, who came out of the game.
Finally, A-Rod is up to 33 plate appearances without a home run. Assuming his standard rate of .064 home runs per plate appearance, the likelihood of this happening by chance is $(1 - .064)^{33} \approx .113$. I stand by my belief that there’s something other than chance (i.e. distraction or other mental factors) causing Rodriguez’s hitting to suffer.
Matt Garza, Fifth No-Hitter of 2010 July 26, 2010
Posted by tomflesher in Baseball.Tags: Dallas Braden, Edwin Jackson, Matt Garza, no-hitters, Roy Halladay, Ubaldo Jimenez, Year of the Pitcher
Tonight, Matt Garza pitched the fifth no-hitter of 2010. He joins Edwin Jackson, Roy Halladay, Dallas Braden, and Ubaldo Jimenez in the Year of the Pitcher club.
As I pointed out when Jackson threw his no-hitter, no-hit games are probably Poisson distributed. Let’s update the chart.
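The Poisson distribution has probability mass function
$$f(n; \lambda) = \frac{\lambda^{n} e^{-\lambda}}{n!}$$
Maintaining our prior rate of 2.45 no-hitters per season, that means $\lambda = 2.45$. Our function is then
$$f(n) = \frac{2.45^{n} e^{-2.45}}{n!}$$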
The probabilities remain the same:
| n | p | cumulative |
| 0 | 0.0863 | 0.0863 |
| 1 | 0.2114 | 0.2977 |
| 2 | 0.2590 | 0.5567 |
| 3 | 0.2115 | 0.7683 |
| 4 | 0.1296 | 0.8978 |
| 5 | 0.0635 | 0.9613 |
| 6 | 0.0259 | 0.9872 |
| 7 | 0.0091 | 0.9963 |
| 8 | 0.0028 | 0.9991 |
| 9 | 0.0008 | 0.9998 |
| 10 | 0.0002 | 1.0000 |
And though the expectation (E(49)) and cumulative expectation (C(49)) remain the same, the observed values shift slightly:
| n | Expected seasons, E(49) | Observed seasons | Cumulative expected, C(49) | Cumulative observed |
| 0 | 4.23 | 5 | 4.23 | 5 |
| 1 | 10.36 | 11 | 14.59 | 16 |
| 2 | 12.69 | 8 | 27.28 | 24 |
| 3 | 10.36 | 17 | 37.65 | 41 |
| 4 | 6.35 | 1 | 43.99 | 42 |
| 5 | 3.11 | 5 | 47.10 | 47 |
| 6 | 1.27 | 1 | 48.37 | 48 |
| 7 | 0.44 | 0 | 48.82 | 48 |
| 8 | 0.14 | 1 | 48.95 | 49 |
| 9 | 0.04 | 0 | 48.99 | 49 |
| 10 | 0.01 | 0 | 49.00 | 49 |
The tailing observations (say, for 4+ no-hitters) don’t quite match the expected frequencies, but the cumulative values match quite nicely. There might be some unobserved variables that explain the weirdness in the upper tail. Still, cumulatively, we have 47 seasons with 5 or fewer no-hitters, which is almost exactly what’s expected. This is unusual, but not outside the realm of statistical expectation.
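Both tables can be regenerated with a few lines. This is a sketch assuming the standard Poisson formula with a rate of 2.45 no-hitters per season and 49 seasons of data; the variable names are just for illustration.

```python
from math import exp, factorial

RATE = 2.45    # no-hitters per season
SEASONS = 49   # seasons in the sample

def poisson_pmf(n, rate=RATE):
    """Probability of exactly n events at the given average rate."""
    return rate**n * exp(-rate) / factorial(n)

rows = []
cumulative = 0.0
for n in range(11):
    p = poisson_pmf(n)
    cumulative += p
    # n, probability, cumulative probability, expected seasons, cumulative expected
    rows.append((n, p, cumulative, SEASONS * p, SEASONS * cumulative))

for n, p, c, expected, cum_expected in rows:
    print(f"{n:2d}  {p:.4f}  {c:.4f}  {expected:5.2f}  {cum_expected:5.2f}")
```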
600 Home Runs: Who’s Second? July 25, 2010
Posted by tomflesher in Baseball, Economics.Tags: 600 home runs, Alex Rodriguez, binomial distribution, Dodgers, home runs, Jim Thome, Manny Ramirez, quick and dirty stats, Twins
Alex Rodriguez is, as I’m writing this, sitting at 599 home runs. Almost certainly, he’ll be the next player to hit the 600 home-run milestone, since the next two active players are Jim Thome at 575 and Manny Ramirez at 554. Today’s Toyota Text Poll (which runs during Yankee games on YES) asked which of those two players would reach #600 sooner.
There are a few levels of abstraction to answering this question. First of all, without looking at the players’ stats, Thome gets the nod at the first order because he’s significantly closer than Ramirez is. Hitting 25 home runs is easier than hitting 46, so Thome will probably get there first.
At the second order, we should take a look at the players’ respective rates. Over the past two seasons, Thome has averaged a rate of .053 home runs per plate appearance, while Ramirez has averaged .041 home runs per plate appearance. With fewer home runs to hit and a higher likelihood of hitting one each time he makes it to the plate, Thome stays more likely to hit #600 before Ramirez does… but how much more likely?
Using the binomial distribution, I tested the likelihood that each player would hit his required number of home runs in different numbers of plate appearances to see where that likelihood reached a maximum. For Thome, the probability increases until 471 plate appearances, then starts decreasing, so roughly, I expect Thome to hit his 25th home run within 471 plate appearances. For Manny, that maximum doesn’t occur until 1121 plate appearances. Again, the nod has to go to Thome. He’ll probably reach the milestone in less than half as many plate appearances.
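That search for the maximum can be sketched as below. It assumes the quantity being maximized is the binomial probability of exactly the required number of home runs in n plate appearances; the helper names and the search cap of 5000 are hypothetical. Working in logs keeps the large binomial coefficients manageable.

```python
from math import lgamma, log

def log_prob_exactly(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)

def peak_plate_appearances(k, p, max_n=5000):
    """Number of plate appearances at which P(exactly k home runs) peaks."""
    return max(range(k, max_n + 1), key=lambda n: log_prob_exactly(k, n, p))

thome = peak_plate_appearances(25, 0.053)    # 25 home runs needed
ramirez = peak_plate_appearances(46, 0.041)  # 46 home runs needed

print(thome, ramirez)
```

The peak sits at the largest whole number of plate appearances not exceeding (home runs needed) / (rate), so the same answers come from 25/.053 and 46/.041 rounded down.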
But wait. How many plate appearances is that, anyway? Until recently, Manny played 80-90% of the games in a season. Last year, he played 64%. So far the Dodgers have played 99 games and Manny appeared in 61 of them, but of course he has spent time on the disabled list this year. Let’s make the generous assumption that Manny will play in 75% of the games in each season starting with this one. Then, let’s look at his average plate appearances per game. For most of his career, he averaged between 4.1 and 4.3 plate appearances per game, but this year he’s down to 3.6. Let’s make the (again, generous) assumption that he’ll get 4 plate appearances in each game from now on. At that rate, to get 1121 plate appearances, he needs to play in 280.25 games, which works out to about 1.73 seasons of 162 games or about 2.31 seasons at 75% playing time.
Thome, on the other hand, has consistently played in 80% or more of his team’s games but suffered last year and this year because he hasn’t been serving as an everyday player. He pinch-hit in the National League last year and has, in Minnesota, played in about 69% of the games averaging only 3 plate appearances in each. Let’s give Jim the benefit of the doubt and assume that from here on out he’ll hit in 70% of the games and get 3.5 appearances (fewer games and fewer appearances than Ramirez). He’d need about 134.6 games (471 plate appearances at 3.5 per game), which equates to about 83% of a 162-game season or about 1.19 seasons with 70% playing time. Even if we downgrade Thome to 2.5 PA per game and 66% playing time, that still gives us an expectation that he’ll hit #600 within the next 1.8 real-time seasons.
Since Thome and Ramirez are the same age, there’s probably no good reason to expect one to retire before the other, and they’ll probably both be hitting as designated hitters in the AL next year. As a result, it’s very fair to expect Thome to A) reach 600 home runs and B) do it before Manny Ramirez.
Micah Owings and Cobb-Douglas Production July 22, 2010
Posted by tomflesher in Baseball, Economics.Tags: Brooks Kieschnick, Cobb-Douglas function, David Ortiz, Micah Owings, Reds, run production
Micah Owings, who is one of the best two-way players in baseball since Brooks Kieschnick, was sent down to the minors by the Cincinnati Reds yesterday. As big a fan as I am of Micah (really, look at the blog), I think this was probably the right decision.
Owings was being used as a long reliever. For a big-hitting pitcher like Micah, that’s death to begin with. Relievers need to be available to pitch, so the Reds couldn’t get their money’s worth from Owings as a pinch hitter, since he wouldn’t be available to re-enter the game as a pitcher unless they used him immediately. They also weren’t getting their money’s worth as a pitcher, since, as Cincinnati.com notes, the Reds’ starting pitching was doing very well and so long relief wasn’t being used very often.
Letting Owings start in AAA will give him the best possible outcome – he’ll have regular opportunities to pitch, so he won’t rust, and he’ll get to bat at least some of the time. Owings needs to be cultivated as a batter because that’s where his comparative advantage is. I doubt he’ll ever be at the top of the rotation, but he could be a competent fifth starter. If he pitches often enough to get there, he’ll add significant value to the team in terms of his OBP above the expected pitcher. He’ll get on base more, so he’ll both advance runners and avoid making an out.
A baseball player is a factory for producing run differential. He does so using two inputs: defensive ability (pitching and fielding) and offensive ability (batting). In the National League, if a player can’t hit at all, he’s likely to produce very little in the way of run differential, but at the same time, if he’s a liability on defense, he’s not likely to be very useful either. Defense produces marginal runs by preventing opposing runs from scoring, and offense produces marginal runs by scoring runs. Having either one set to zero (in the case of a pitcher who can’t hit at all) or a negative value (an actively bad pitcher) would negatively affect the player’s run production. This is similar to a factory situation where labor and equipment are used to produce goods, and that situation is usually modeled using a Cobb-Douglas production function:
$$Y = z K^{\alpha} L^{1-\alpha}$$
with Y = production, z = a productivity constant, K = equipment and technology, L = labor input, and $\alpha$ a constant between 0 and 1 that represents how important each input is relative to the other. K might be, for example, operating expenses for a machine to produce widgets, and L might be the wages paid to the operators of the machine. This function has the nice property that if we think both inputs are equally important (that is, $\alpha$ = .5) then, for a fixed total of inputs, production is maximized when the inputs are equal.
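To see the equal-inputs property in action, here is a small sketch. It assumes the standard two-input Cobb-Douglas form with exponents α and 1 − α, and a made-up budget of 10 units to split between equipment and labor.

```python
def cobb_douglas(capital, labor, alpha=0.5, z=1.0):
    """Standard two-input Cobb-Douglas output: z * K^alpha * L^(1 - alpha)."""
    return z * capital**alpha * labor**(1 - alpha)

TOTAL = 10  # hypothetical budget, split between the two inputs

# Every whole-number split of the budget that leaves both inputs positive.
splits = [(k, TOTAL - k) for k in range(1, TOTAL)]
best = max(splits, key=lambda split: cobb_douglas(*split))

for capital, labor in splits:
    print(capital, labor, round(cobb_douglas(capital, labor), 3))
print("best split:", best)
```

With α = .5 the even split comes out on top; moving α away from .5 shifts the best split toward the more heavily weighted input.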
In general, production of run differential could be modeled using the same method. For example:
$$Y = z P^{\alpha} F^{\beta} B^{1-\alpha-\beta}$$
where Y is now run differential, P = pitching contribution, F = fielding contribution, B = batting contribution, and $\alpha$ and $\beta$ are both between 0 and 1 and would vary based on position. For example, David Ortiz is a designated hitter. His pitching ability is totally irrelevant, and so is his fielding ability outside of interleague games. The DH’s $\alpha$ would be 0 and his $\beta$ would be very close to 0. On the other hand, an American League pitcher would have an $\alpha$ very close to 1 since pitcher fielding is not as important as pitching and his hitting is entirely inconsequential in the AL. Catchers would have $\alpha$ at 0 but $\beta$ much higher than other positions.
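A toy version of the run-differential model might look like this. Everything numeric here is invented for illustration: the exponents for a National League pitcher, the contribution values, and the assumption that batting gets whatever exponent is left over after pitching (α) and fielding (β).

```python
def run_production(pitching, fielding, batting, alpha, beta, z=1.0):
    """Three-input Cobb-Douglas sketch: z * P^alpha * F^beta * B^(1-alpha-beta)."""
    gamma = 1 - alpha - beta  # assumed exponent on batting
    return z * pitching**alpha * fielding**beta * batting**gamma

# Hypothetical weights for a National League pitcher.
NL_PITCHER = dict(alpha=0.7, beta=0.1)

# Hypothetical contributions, with 1.0 meaning "typical for the position".
owings_like = run_production(pitching=0.9, fielding=1.0, batting=1.5, **NL_PITCHER)
typical_pitcher = run_production(pitching=1.0, fielding=1.0, batting=0.5, **NL_PITCHER)

# A DH with alpha = beta = 0 is judged on batting alone.
designated_hitter = run_production(pitching=0.3, fielding=0.4, batting=2.0, alpha=0, beta=0)

print(round(owings_like, 3), round(typical_pitcher, 3), designated_hitter)
```

Under these made-up numbers, the slightly worse pitcher who can hit produces more than the better pitcher who can't.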
The upshot of this method of modeling production is that it shows Owings can make up for being a less than stellar pitcher by helping his team score runs and be a considerably better investment than a pitcher with a slightly lower ERA but no run production.