I’m Still With 47 September 21, 2015
Posted by tomflesher in Baseball, Sports. Tags: #imwith47, gratuitous Hansel Robles, Mets, Subway Series, Yankees
Hansel Robles took the loss last night on an ugly line – 2/3 of an inning pitched, 5 runs on 3 hits, a walk, and a wild pitch. It was a tough way to lose – the story of the game was Matt Harvey leaving after five shutout innings of one-hit baseball and, so the narrative goes, Robles coming in to crap it up. I’d like to suggest that it’s not entirely fair to throw this all on Robles.
Robles’ first batter was Jacoby Ellsbury, who reached on a throwing error by second baseman Daniel Murphy. His second batter was Brett Gardner, who reached on a fielder’s choice. Ellsbury was safe on the fielder’s choice due to a catching error by David Wright. Let’s keep track of that – although the official scorer considers Gardner to be Robles’ only earned run, Gardner should have grounded out.
At that point, Carlos Beltran hit a double, which should have been a completely innocuous hit with no one on. Brian McCann struck out – inning over, in a parallel universe where Juan Uribe hadn’t suffered an injury coming out of the game (or where Wilmer Flores or Kelly Johnson comes in to play second, rather than Murphy). Even allowing for Gardner to reach safely and Beltran to bat him home, that gets followed up by a wild pitch with Greg Bird at the plate, followed by walking Bird, and a swinging strikeout of Chase Headley. Worst case scenario, Robles gives up the tying run.
From there, it’s a totally different ballgame – Sean Gilmartin or Addison Reed comes in to at worst a tied game in the 7th, followed up by a chance for Tyler Clippard or Reed to take the eighth and Jeurys Familia closing to the strains of “Danza Kuduro” in the ninth. Don’t get me wrong – Collins has made a lot of excellent moves this season. Last night’s sixth was a comedy of (literally) errors, but a few other moves made it look like Collins had decided the game was already out of hand by the seventh.
What does a long game do to teams? April 13, 2015
Posted by tomflesher in Baseball. Tags: extra innings, file drawer, linear model, linear regression, Red Sox, Yankees
Friday, the Red Sox took a 19-inning contest from the Yankees. Both teams had the unfortunate circumstance of finishing a game around 2:15 A.M. and having to be back on the field at 1:05 P.M. Everyone, including the announcers, discussed how tired the teams would be; in particular, first baseman Mark Teixeira spent a long night on the bag to keep backup first baseman and apparent emergency pitcher Garrett Jones fresh, leading Alex Rodriguez to make his first career appearance at first base on Saturday.
Teixeira wasn’t the only player to sit out the next day – center fielders Jacoby Ellsbury and Mookie Betts, catchers Brian McCann and Sandy Leon, and most of the bullpen all sat out, among others. The Yankees called up pitcher Matt Tracy for a cup of coffee and sent Chasen Shreve down, then swapped Tracy back down to Scranton for Kyle Davies. Boston activated starter Joe Kelly from the disabled list, sending winning pitcher Steven Wright down to make room. Shreve and Wright each had solid outings, with Wright pitching five innings with 2 runs and Shreve pitching 3 1/3 scoreless.
All those moves provide some explanation for a surprising result. Interested in what the effect of these long games is, I dug up all of the games from 2014 that lasted 14 innings or more. In a quick and dirty data set, I traced the scores for each team in their next games along with the number of outs pitched and the length in minutes of the game.
I fitted two linear models and two log models: one of each with the next game’s runs as the dependent variable and one of each with the difference in runs (next game’s runs – long game’s runs) as the dependent variable. Each used the length of the game in minutes, the number of outs, the average runs scored by the team during 2014, and an indicator variable for the presence of a designated hitter in each game. For each dependent variable, I modeled all variables in a linear form once, and once with the natural log of outs and the natural log of the length of the game.
With runs scored as the dependent variable, nothing was significant. That is, no variable correlated strongly with an increase or decrease in the number of runs scored.
With a run difference model, the length of the game in minutes became marginally significant. For the linear model, extending the length of the game by one minute lowers the difference in runs by about .043 runs – that is, normalizing for the number of runs scored the previous day, extending the game by one minute lowered the runs the next day by about .043. In the semilog model, extending the game by about 1% lowered the run difference by about 14; this was offset by an extremely high intercept term. This is a very high semielasticity, and both coefficients had p-values between .01 and .015. Nothing else was even close.
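For reference, the two run-difference specifications described above can be written out roughly as follows. This is only a sketch: the symbols (ΔR for the run difference, R-bar for the team's 2014 average runs, DH for the designated hitter indicator, and the β and γ coefficients) are my own labels, not notation from the original analysis.

```latex
\begin{align*}
\Delta R_i &= \beta_0 + \beta_1\,\text{minutes}_i + \beta_2\,\text{outs}_i
             + \beta_3\,\overline{R}_i + \beta_4\,\text{DH}_i + \varepsilon_i \\
\Delta R_i &= \gamma_0 + \gamma_1 \ln(\text{minutes}_i) + \gamma_2 \ln(\text{outs}_i)
             + \gamma_3\,\overline{R}_i + \gamma_4\,\text{DH}_i + u_i
\end{align*}
```

In the first form, β1 is the change in run difference per extra minute. Under the usual reading of the second (level-log) form, a 1% increase in game length shifts the run difference by roughly γ1/100.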
With all of the usual caveats about statistical analysis, this shows that teams are actually pretty good at bouncing back from long games, either due to the fact that most of the time they’re playing the same team (so teams are equally fatigued) or due to smart roster moves. Either way, it’s a surprise.
RBIs with Two Outs July 4, 2011
Posted by tomflesher in Baseball, Economics. Tags: Boone Logan, Daniel Murphy, Hector Noesi, Jason Bay, Mets, Ramiro Pena, RBIs, Scott Hairston, statistics, Subway Series, two-out RBIs, Yankees
Sunday’s Subway Series game between the Mets and Yankees ended with a bang – Jason Bay hit a single off Hector Noesi that brought home Scott Hairston. The tenth inning should have been over, but Ramiro Pena committed an error at shortstop that put Daniel Murphy on base for Boone Logan. Hairston’s run was unearned, but no matter – Noesi took the loss and the Mets won the game.
The final score was 3-2, and the interesting thing about the game was that all three of the Mets’ runs came with two outs. (My fiancée, Katie, suggested that this was unusual, and motivated most of the rest of this post.) In fact, so far, the Mets have had 347 RBIs (of 375 runs scored), and 147 of them have come with two outs. That’s about 42.4% of their RBIs. By contrast, only 1070 of 3274 plate appearances – 32.7% – come with two outs. (Less than a third of plate appearances come with two outs because of the double play, among other reasons.) The plurality come with no outs (about 34.8%), with the remainder coming with one out. It seems like the high concentration of 2-out RBIs might be explained by the use of the sacrifice bunt, but the Mets have only had 31 sacrifice bunts this season – not nearly enough to account for the difference between 32.7% of plate appearances and 42.4% of RBIs.
Is that pattern common across baseball? So far, there have been 10,037 RBIs in Major League Baseball in the 2011 season. 3686 of them – about 36.7% – came with two outs. Excluding the Mets’ numbers, that’s 3539 out of 9690, or 36.5%. For the National League only, there were 1928 two-out RBIs of 5212 total, or 37%, with 1781 of 4865 (36.6%) of National League RBIs coming with two outs if you exclude the Mets. (Note that I’m defining ‘in the National League’ as ‘in National League parks,’ since what I’m interested in is whether the Mets’ concentration of RBIs can be partially explained by the rules requiring pitchers to bat.)
Assume that the Mets’ RBIs are drawn from the same distribution as all others’. Then, 95% of the time, I’d expect the proportion of RBIs that come with two outs to be within two standard errors of the National League’s proportion, excluding the Mets. (The ‘two standard errors’ comes from the fact that a t-distribution’s critical value for a large number of trials for 95% significance is 1.96. For less than an infinite number, two standard errors is a handy approximation.) For the Mets’ 347 RBIs, the standard error would be

$$SE = \sqrt{\frac{.366 \times .634}{347}} \approx .026$$
Thus, 95% of the time, the Mets should be within the interval of (.366 – .052, .366+.052), or (.314, .418). Since, again, the Mets’ proportion is .424, the Mets are slightly outside the 95% confidence interval. That’s pretty close, and certainly could happen by chance, but it’s surprising nonetheless. The question then is whether this is due to some sort of strategy employed by the Mets’ management or to some sort of clutch playing ability by the Mets. Again, there’s more data to collect and crunch (as always).
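As a quick check on the arithmetic, here is a minimal Python sketch of the two-standard-error interval. The function and variable names are mine, and it assumes the usual standard error for a proportion, sqrt(p(1 - p)/n), with the league proportion of .366 and the Mets' 347 RBIs taken from the text above.

```python
from math import sqrt

def two_se_interval(p, n, width=2.0):
    """Return (low, high): p plus or minus `width` standard errors for n trials."""
    se = sqrt(p * (1 - p) / n)
    return p - width * se, p + width * se

league_p = 0.366      # NL two-out RBI share, excluding the Mets (from the post)
mets_rbis = 347       # Mets' RBIs so far
mets_two_out = 147    # Mets' two-out RBIs so far

low, high = two_se_interval(league_p, mets_rbis)
mets_p = mets_two_out / mets_rbis

print(f"interval: ({low:.3f}, {high:.3f}), Mets: {mets_p:.3f}")
print("inside interval" if low <= mets_p <= high else "outside interval")
```

Changing `width` to 1.96 narrows the interval slightly, which is why two standard errors is only a handy approximation.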
Chad Billingsley’s Home Run June 6, 2011
Posted by tomflesher in Baseball. Tags: Casey Blake, Chad Billingsley, Diamondbacks, Dodgers, James Loney, Keith Osik, Matt Kemp, Nationals, Nick Swisher, Pitchers batting, position players pitching, Rays, Reds, Travis Wood, Yankees, Zach Duke
Chad Billingsley had what was by all accounts an unremarkable start on the mound last night: 5 IP, 8 H, 4 R, all of them earned, 3 walks, 3 strikeouts, 1 HBP. Considering that the Dodgers have seven tough losses already (only the Rays and the Nationals have more), this would ordinarily be a short entry commenting on how Billingsley needs some work.
Actually, scratch that. I wouldn’t make that entry – the folks over at Mike Scioscia’s Tragic Illness would.
Billingsley managed to earn a mention last night by hitting the second home run of his career (solo in the second) and going 2 for 2 with a walk. Billingsley’s Win Probability Added (WPA) from the plate was a team-leading .215 (Matt Kemp was second with .168). Of course, he evened that out by actually subtracting WPA as a pitcher. Still, his walk in the third forced Casey Blake in for a second RBI, and his double in the fifth brought James Loney home and ultimately pulled Reds starter Travis Wood out of the game.
Oddly, Wood himself managed a three-RBI night back on May 9, as did the Diamondbacks’ Zach Duke on May 28. Like Billingsley, both of them took the win in those games.
The most stylish home runs by pitchers happen when the player doesn’t even know he’s a pitcher, though – on April 13, 2009, Nick Swisher hit a home run in the top of the fourth inning while playing first base and then was called on to pitch the bottom of the 8th in a 15-5 loss to the Rays. He’s the only player in the last 10 years to start the game as a position player, hit a home run, and pitch. Admittedly, that’s a weird set of conditions. Luckily, there’s another instance that almost fits, so I don’t feel like I’m cheating. Keith Osik didn’t start on May 20, 2000, but came in as part of a triple-switch in the top of the 8th to play third base. Osik hit a two-run homer to bring Mike Benjamin home in the bottom of the 8th, then gave up 5 earned runs on 5 hits in the top of the 9th.
Hopefully Billingsley will repeat his performance at the plate and will continue cleaning up on the mound. Last night was his first Cheap Win of the year, and he already has two Tough Losses. Not a bad showing as far as ability goes.
Teixeira’s Ability to Pick Up Slack: Re-Evaluating April 12, 2011
Posted by tomflesher in Baseball, Economics. Tags: Alex Rodriguez, binomial distribution, home runs, Mark Teixeira, Michael Kaye, Robinson Cano, Yankees
In an earlier post, I discussed Yankees broadcaster Michael Kaye’s belief that Mark Teixeira and Robinson Cano were picking up slack during the time in which Alex Rodriguez was struggling to hit his 600th home run. I noticed that Teixeira had hit 18 home runs in 423 plate appearances during the first 93 games of the season for rates of .194 home runs per game and .0426 home runs per plate appearance. During the time between A-Rod’s #599 and #600, Teixeira’s performance was different in a statistically significant way: his production per game was up to .417 home runs per game and .0926 home runs per plate appearance.
Now, let’s take a look at the home stretch of the season. Teixeira played in 52 games, starting 51 of them, and hit 10 home runs in 230 plate appearances. That works out to .1923 home runs per game or .0435 per plate appearance. Those numbers are exceptionally similar to Teixeira’s production in the first stretch of the season, so it seems reasonable to say that those rates represent his standard rate of production.
This is prima facie evidence that Teixeira was working to hit more home runs, consciously or subconsciously, during the time that Rodriguez was struggling. The question then becomes, is there a reason to expect production to increase during the stretch between late July and early August? What if Mark was just operating better following the All-Star Break?
I chose a twelve-game stretch immediately following the All-Star Break to evaluate. This period overlaps with the drought between A-Rod’s 599th and 600th home runs, stretching from July 16 to July 28, so six games overlap and six do not. During that time, Teixeira hit 3 home runs in 56 plate appearances. His rate was therefore .0536 home runs per plate appearance.
If we assume that Teixeira’s true rate of production is about .043 home runs per plate appearance (his average over the season, excluding the drought), then the probability of his hitting exactly 3 home runs in a random 56-plate-appearance stretch is

$$P(X = 3) = \binom{56}{3}(.043)^{3}(.957)^{53} \approx .21$$

He has a 43% chance of hitting 3 or more, compared with the complementary 57% probability of hitting fewer than 3. It’s well within the normal expected range. So, the All-Star Break effect is unlikely to explain Teixeira’s abnormal production last July.
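The binomial figures can be reproduced with a short Python sketch. The helper names are mine, and it assumes each plate appearance is an independent trial at the .043 rate quoted above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

rate = 0.043   # assumed true home runs per plate appearance (from the post)
pa = 56        # plate appearances in the twelve-game stretch
hr = 3         # home runs hit in that stretch

exactly_three = binom_pmf(hr, pa, rate)
three_or_more = binom_at_least(hr, pa, rate)

print(f"P(exactly 3) = {exactly_three:.3f}")
print(f"P(3 or more) = {three_or_more:.3f}")
```

The second figure is the one that matters for the argument: it should land near the 43% quoted in the text.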
Fire Up The Hot Stove November 2, 2010
Posted by tomflesher in Baseball. Tags: Aubrey Huff, Buster Posey, Cliff Lee, Giants, Rangers, Tim Lincecum, Yankees
Although I’m usually fairly heavy on the statistical content, I can’t help but mention a few impressions from Game 5 of the World Series last night.
- If I didn’t have Baseball-Reference.com to tell me different, I’d have assumed Aubrey Huff wasn’t an everyday first baseman from the way he played last night. He was competent and made some nice picks, but he didn’t seem to have the ankle-preservation instinct that most everyday 1Bs do. He seemed to have his heels back quite far on the bag most of the time.
- The rumors about the Yankees pursuing Cliff Lee strike me as cartoonish supervillainy. “If I cannot defeat you, I will simply BUY you!”
- Game 3 was the Lee vs. Tim Lincecum gem that we all assumed Game 1 would be.
- Somewhere, Bengie Molina is secretly pouring champagne all over himself.
- If the postseason came before voting, Buster Posey would be a lock for Rookie of the
Mariano’s Walk-Off Beanball September 12, 2010
Posted by tomflesher in Baseball. Tags: As, David Robertson, Derek Jeter, hit batsman, hit by pitch, Jeff Francoeur, Jose Molina, Lenny DiNardo, Mariano Rivera, Nelson Cruz, odds, probability, Rangers, Yankees
Mariano Rivera did something strange tonight: He plunked in the winning run. He hit Jeff Francoeur of the Texas Rangers to force in Nelson Cruz for the winning run in extra innings. It was his fourth hit batsman of the year and only his third loss.
A walk-off beaning requires an extraordinary set of circumstances. First of all, like all walk-off plays, it requires the home team to be at bat in the bottom of the inning. In this case, it was in extra innings rather than the bottom of the 9th. It additionally requires a tied game in the bottom of said inning. Finally, it requires the bases to be loaded when the plunking occurs.
This is all magnified by the fact that Rivera does not ordinarily load the bases. Assuming his 2010 OBP against (.214) held, the probability of the bases being loaded with two outs or fewer is:
Then, if that situation occurs, we still have to deal with the unlikely event of Mariano hitting a player with a pitch. Before this evening, Mo had hit three batters in 196 plate appearances, for a rate of about .0153. Thus, the probability of Mariano Rivera hitting a batter with a pitch after having loaded the bases is
That means that in 10,000 innings, we would expect that to occur about 4 times, assuming that Mariano wasn’t removed after having loaded the bases (which would obviously introduce some bias).
Oddly, the last walk-off hit by pitch also involved the Yankees, albeit on the other side, way back on July 19 of 2008. That night, the A’s’ Lenny DiNardo hit Jose Molina with a pitch to force in Derek Jeter, again in extra innings. David Robertson grabbed the win that night.
Teixeira and Cano: Picking up slack? August 5, 2010
Posted by tomflesher in Baseball, Economics. Tags: A-Rod, Alex Rodriguez, binomial distribution, Mark Teixeira, probability, Robinson Cano, statistics, Yankees
Michael Kaye, the YES broadcaster for the Yankees, often pointed out between July 22 and August 4 that the Yankees were turning up their offense to make up for Alex Rodriguez‘s lack of home run production. That seems like it might be subject to significant confirmation bias – seeing a few guys hit home runs when you wouldn’t expect them to might lead you to believe that the team in general has increased its production. So, did the Yankees produce more home runs during A-Rod’s drought?
During the first 93 games of the season, the Yankees hit 109 home runs in 3660 plate appearances for rates of 1.17 home runs per game and .0298 home runs per plate appearance. From July 23 to August 3, they hit 17 home runs in 451 plate appearances over 12 games for rates of 1.42 home runs per game and .0377 home runs per plate appearance. Obviously those numbers are quite a bit higher than expected, but can it be due simply to chance?
Assume for the moment that the first 93 games represent the team’s true production capabilities. Then, using the binomial distribution, the likelihood of hitting 17 or fewer home runs in 451 plate appearances is

$$P(X \le 17) = \sum_{k=0}^{17} \binom{451}{k}(.0298)^{k}(.9702)^{451-k}$$

The cumulative probability is about .868, meaning the probability of hitting 17 or fewer home runs is .868 and the probability of hitting more than that is about .132. The probability of hitting 16 or fewer is .805, so the probability of hitting at least 17 is about .195; put another way, out of 100 strings of 451 plate appearances, about 81 of them should end with 16 or fewer home runs. This is a perfectly reasonable number and not inherently indicative of a special performance by A-Rod’s teammates.
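A minimal Python sketch of the team-level calculation follows. The names are mine, and it assumes independent plate appearances at the first-93-games rate of 109 home runs in 3660 plate appearances.

```python
from math import comb

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

baseline_rate = 109 / 3660   # home runs per plate appearance, first 93 games
drought_pa = 451             # team plate appearances, July 23 to August 3

at_most_17 = binom_cdf(17, drought_pa, baseline_rate)
at_most_16 = binom_cdf(16, drought_pa, baseline_rate)
at_least_17 = 1 - at_most_16

print(f"P(17 or fewer) = {at_most_17:.3f}")
print(f"P(16 or fewer) = {at_most_16:.3f}")
print(f"P(at least 17) = {at_least_17:.3f}")
```

The "at least 17" figure is the one that answers whether the drought-period output was unusual; it comes from the complement of the "16 or fewer" value.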
Kaye frequently cited Mark Teixeira and Robinson Cano as upping their games. Teixeira hit 18 home runs over the first 93 games and made 423 plate appearances for rates of .194 home runs per game and .0426 home runs per plate appearance. From July 23 to August 3, he had 5 home runs in 12 games and 54 plate appearances for rates of .417 per game and .0926 per plate appearance. At his earlier rate, hitting at least 5 home runs in 54 plate appearances is about 8% likely, meaning that either Teixeira did up his game considerably or he was exceptionally lucky.
Cano played 92 games up to July 21, hitting 18 home runs in 400 plate appearances for rates of .196 home runs per game and .045 per plate appearance. During A-Rod’s drought, he hit 3 home runs in 50 plate appearances over 12 games for rates of .25 and .06. At his earlier rate, hitting at least 3 home runs in 50 plate appearances is about 39% likely, which means we don’t have enough evidence to reject the idea that Cano’s performance (though better than usual) is just a random fluctuation.
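The same binomial approach applies player by player. In the sketch below (names are mine), the figures are read as upper-tail probabilities, that is, the chance of hitting at least the observed number of home runs at each player's earlier rate. That reading is my assumption; it is the one that comes out near the 8% and 39% quoted above.

```python
from math import comb

def binom_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - below

# (baseline HR, baseline PA, drought HR, drought PA), all from the post
players = {
    "Teixeira": (18, 423, 5, 54),
    "Cano": (18, 400, 3, 50),
}

tail = {}
for name, (base_hr, base_pa, hr, pa) in players.items():
    rate = base_hr / base_pa            # baseline home runs per plate appearance
    tail[name] = binom_at_least(hr, pa, rate)
    print(f"{name}: P(at least {hr} HR in {pa} PA) = {tail[name]:.3f}")
```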
It will be interesting to see if Teixeira slows down as a home-run hitter now that Rodriguez’s drought is over.
Is A-Rod’s Performance Different? August 3, 2010
Posted by tomflesher in Baseball, Economics. Tags: A-Rod, Alex Rodriguez, Choke Index, OBP, p-value, probability, SLG, statistics, t-value, Yankees
In games between milestone home runs, is Alex Rodriguez’ hitting similar to other times? (This is all a very polite way of asking, “Does A-Rod choke?”) It’s difficult to answer, because there’s so little data about those milestone home runs. A-Rod, though, has some statistically improbable results and it would be interesting to look at it a bit more closely.
Over 2008-2009, Alex played in 262 games and had 1129 plate appearances with 281 hits, 65 home runs, a triple:double ratio of 1:50, an OBP of .397, and a SLG of .553. His OBP has a standard error of .0146, so, allowing two standard errors on either side, we can be 95% confident that over those years his baseline production would be somewhere between .368 and .426. Absent any time or age effect, that is the range in which A-Rod should produce for any given period.
Two recent milestone home runs come to mind as examples of Rodriguez’s reputed choking. First, the stretch between home run #499 and #500 was 8 games and 36 plate appearances. (I’m intentionally ignoring extra plate appearances on the days he hit #499 and #500.) During that time, Alex had an OBP of only .306. That’s a difference of .091 over 36 plate appearances and that performance has a standard error of about .078 when compared with his regular performance, implying a t-value of about 1.16. With 35 degrees of freedom, Texas A&M’s t Calculator gives a p-value of about .127, so this difference is marginally within the realm of chance. (The usual cutoff for significance would be .05.)
A-Rod hit his last home run on July 22. Discounting the plate appearances after his last home run, he’s played in 11 games with a paltry .255 OBP and .238 SLG over 47 plate appearances. His .255 OBP has a difference of about .142 and a standard error of about .064. That implies a t-value of about 2.21, with a p-value of about .016. That is, the probability of this difference occurring by chance is less than 2%. That gives us one result as close to significant and one as probably significant.
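For anyone who wants to retrace the standard errors and t-values, here is a hedged Python sketch. It assumes the standard error of the difference combines the baseline and stretch errors as sqrt(SE_base² + SE_stretch²), which is my guess at the method: it reproduces the .078 and 1.16 for the #499 to #500 stretch and lands close to, though not exactly on, the figures for the current drought. The function names are mine, and the p-values are not computed here since the standard library has no t-distribution.

```python
from math import sqrt

def prop_se(p, n):
    """Standard error of a proportion p observed over n trials."""
    return sqrt(p * (1 - p) / n)

def t_for_difference(p_base, n_base, p_stretch, n_stretch):
    """Return (t, se) for the gap between a baseline OBP and a stretch OBP."""
    se = sqrt(prop_se(p_base, n_base) ** 2 + prop_se(p_stretch, n_stretch) ** 2)
    return (p_base - p_stretch) / se, se

base_obp, base_pa = 0.397, 1129    # 2008-2009 baseline (from the post)

t_500, se_500 = t_for_difference(base_obp, base_pa, 0.306, 36)   # #499 to #500
t_600, se_600 = t_for_difference(base_obp, base_pa, 0.255, 47)   # #599 onward

print(f"baseline SE: {prop_se(base_obp, base_pa):.4f}")
print(f"#500 stretch: se = {se_500:.3f}, t = {t_500:.2f}")
print(f"#600 stretch: se = {se_600:.3f}, t = {t_600:.2f}")
```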
As a side note, A-Rod’s Choke Index continues to rise. He’s gone 48 plate appearances without a home run, and at a rate of .055 home runs per plate appearance the probability of that occurring by chance is about .066. That leaves his Choke Index at .934.
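The Choke Index arithmetic is simple enough to sketch in a few lines of Python. The function name is mine, and it assumes every plate appearance is an independent trial at the .055 rate given above.

```python
def homerless_probability(rate, plate_appearances):
    """Chance of hitting no home runs in a run of independent plate appearances."""
    return (1 - rate) ** plate_appearances

hr_rate = 0.055   # home runs per plate appearance (from the post)
drought = 48      # plate appearances without a home run

p_drought = homerless_probability(hr_rate, drought)
choke_index = 1 - p_drought

print(f"P(no HR in {drought} PA) = {p_drought:.3f}")
print(f"Choke Index = {choke_index:.3f}")
```

Each additional homerless plate appearance multiplies the drought probability by .945, so the index creeps toward 1 the longer the streak runs.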