BABIP as a Defensive Metric October 11, 2014 Posted by tomflesher in Baseball, Economics.
Tags: BABIP, BJ Upton, models, statistics
I follow OOTP on Facebook, and this Reddit thread about editing the Braves to go 0-162 popped up the other day.
I went into commissioner mode and basically ranked everyone’s stats to go 0-550 with 550 Ks (although when I went back, OOTP changed it to give them all a few hits and a couple of walks, etc.) I did not have to edit BJ Upton, as he was already programmed to do so.
One reply asked whether 1-BABIP is a valid defensive metric, and that got the wheels turning. (Note that for statistical purposes, summary statistics for 1-BABIP will be the same magnitude and the opposite sign as statistics for BABIP, so I went ahead and just used BABIP.)
For a quick check, I checked in at Baseball Reference to get the National League’s team-level statistics for the last 5 years, then correlated BABIP to runs allowed by the team. That correlation is .741 – that’s a pretty strong correlation. Similarly, the correlation between BABIP and team wins was about -.549. It’s a weaker and negative correlation, which is expected – negative because an added point of opposing team BABIP would mean more balls in play were falling in as hits, and weaker because it ignores the team’s offensive production entirely.
If BABIP accurately describes a team’s defensive power, then a statistical model that models team runs allowed as a function of fielding-independent pitching and pitching-independent fielding should explain a large proportion, but not all, of the runs allowed by a team, and thereby explain a significant but smaller proportion of the team’s wins.
I crunched two models to test this, each with the same functional form: Dependent Variable = a + b*FIP + c*BABIP. With Runs as the dependent variable, the R² of the model was .8625; with Wins as the dependent variable, the R² was .5246. Since R² roughly describes the share of variation explained by the model, this makes a lot of sense. In the Runs model, about 14% of the variation in runs allowed comes from something other than home runs, walks, or hits, such as baserunning and errors; in the Wins model, about 48% of the variation in team wins is explained by something other than defense and pitching. (Say…. offense? That’s crazy.) In both models, the coefficients are statistically significant at the 99% level.
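For readers who want to try the same functional form on their own numbers, here is a minimal sketch of a least-squares fit of y = a + b*FIP + c*BABIP using only the standard library. The function name and the sample figures are made up for illustration; they are not the Baseball Reference data behind the results above, and this is not necessarily how the original models were estimated.

```python
def fit_two_predictor_ols(y, x1, x2):
    """Least-squares fit of y = a + b*x1 + c*x2; returns (a, b, c, r_squared)."""
    n = len(y)
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations (X'X) beta = X'y, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(3):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [p - factor * q for p, q in zip(m[k], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    mean_y = sum(y) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    ss_res = sum((t - (a + b * u + c * v)) ** 2 for t, u, v in zip(y, x1, x2))
    return a, b, c, 1.0 - ss_res / ss_tot


# Hypothetical team-seasons, generated from a known formula so the fit can be checked.
fip = [3.2, 3.8, 4.1, 3.5, 4.4]
babip = [0.290, 0.305, 0.298, 0.310, 0.285]
runs = [100 + 150 * f + 1000 * b for f, b in zip(fip, babip)]

a, b, c, r2 = fit_two_predictor_ols(runs, fip, babip)
print(round(a, 3), round(b, 3), round(c, 3), round(r2, 6))
```

Because the made-up runs are an exact linear function of the two predictors, the fit recovers the generating coefficients and an R² of essentially 1; real team data would leave residual variation, as the .8625 and .5246 figures show.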
BABIP’s coefficient in the Runs model is 3444.44, which means that, holding FIP fixed, raising a team’s batting average on balls in play against by a full 1.000 would add about 3444 runs allowed over a season; more realistically, if BABIP increases by .01, that would translate to about 34 more runs allowed per season. Its coefficient in the Wins model is -328.757, meaning that an increase of .01 in BABIP corresponds to about 3.29 extra losses. That’s surprisingly close to the 10 runs-1 win ratio that Bill James uses as a rule of thumb.
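The runs-per-win comparison is just the ratio of the two coefficients. A quick sketch of that arithmetic, using the coefficients quoted above (the variable names are mine):

```python
runs_coef = 3444.44    # Runs model: runs allowed per 1.000 of BABIP
wins_coef = -328.757   # Wins model: wins per 1.000 of BABIP
delta = 0.01           # a ten-point rise in BABIP

extra_runs = runs_coef * delta       # about 34.4 more runs allowed
extra_losses = -wins_coef * delta    # about 3.29 more losses
runs_per_win = extra_runs / extra_losses

print(round(extra_runs, 2), round(extra_losses, 2), round(runs_per_win, 2))
```

The ratio works out to roughly 10.5 runs per win, which is why it lines up so neatly with the 10-to-1 rule of thumb.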
Since the correlations were strong, this bears a closer look at game-level rather than simply team-level data.
Mets Run Support by Starting Pitcher August 1, 2014 Posted by tomflesher in Baseball.
Tags: Jacob deGrom, Mets, pitching, run support, Zack Wheeler
Yesterday’s post discussed distributional wins and losses based on the Mets’ inconsistent bunching of runs together. Since the boys didn’t play last night, I had a pretty stable dataset to work with, and the opportunity to crunch some numbers to see if the hypothesis that we’re working with is true. In addition, I took a look at each of our current starting rotation’s run support numbers and found some surprising things.
First of all, no pitcher’s run support was statistically significantly different from any other’s. Although Dillon Gee’s run support is .77 lower than the average pitcher’s, for example, the p-value is .44, meaning a gap at least that large would show up about 44% of the time by chance alone even if there were no real effect. Jacob deGrom has a similar number – .796 runs below the average, but a .42 p-value. The only pitcher with a positive effect on run support is Bartolo Colon, but his p-value is a whopping .72, so his number can’t be distinguished from a statistical artifact.
The runs allowed are a bit more stable – deGrom allows 1.18 runs fewer than average with a .2 p-value – but Gee, Jonathon Niese, Colon, and Zack Wheeler all have no statistically detectable effect on runs allowed. Their p-values are, respectively, .91, .84, .64, and .79. Basically, this means that an effect would have to be really big to show up in such a small sample; not even all 108 games are covered in the sample.
Another way of tracking pitcher run support is to track team wins and losses in the games started by those pitchers and compare it to the team’s Pythagorean expectation in those games. This is a bit more revealing; for example, the Mets are 6-8 in starts by deGrom, but would have a Pythagorean expectation of about .568, or about 8-6, in those games. Wheeler also ends up with a Pythagorean expectation better than his record, predicting the Mets would have won 11 rather than 10 of his 22 games. The other pitchers are more or less in line with their expectations, although, like Zack, the pitchers don’t always get credit for the wins they pitched in.
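As a rough sketch of the comparison above: the classic Pythagorean expectation is runs scored squared over the sum of runs scored squared and runs allowed squared. The exponent of 2 is an assumption here, since the post doesn’t say which variant was used, and the function name is mine.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed (exponent assumed to be 2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Turning the .568 expectation quoted above into a record over deGrom's 14 starts.
starts = 14
expected_wins = round(0.568 * starts)
expected_losses = starts - expected_wins
print(expected_wins, expected_losses)
```

A team that scores exactly as many runs as it allows comes out at .500 under this formula; the further runs scored pull ahead of runs allowed, the higher the expected winning percentage.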
Behind the cut is the table of regression results for a linear model with a dummy variable for each pitcher’s starts, plus a totally useless Away game dummy to look for home field advantage. (Surprise: There is none for the Mets, but all pitchers do allow roughly .74 more runs on the road than at home.)
What If The Mets Spread Their Runs More Evenly? July 31, 2014 Posted by tomflesher in Baseball.
Tags: Mets, Runs, statistics
The Mets have had quite a run lately – they sandwiched a 6-0 shutout loss on Tuesday between a 7-1 rout and an 11-2 dismantling of the Phillies. The whole series is a microcosm of the Mets’ season – the wildly inconsistent run production, the overuse of Josh Edgin, the disappointing start from Dillon Gee, and the totally unnecessary hit by Jeurys Familia. (Familia is 2 for 2 on the year with a 2.000 OPS.) If the Mets had spread out those 18 runs among the 3 games, there would have been a slightly different result – free baseball on Tuesday, but let’s assume the Mets would have lost the game anyway. In fact, the Mets have an average of 3.92 runs over the first 108 games of the season, and they’ve allowed an average of 3.79. If the Mets had spread out all of those runs evenly, then on average, the Mets would have won every game. (Fractional runs mess this up a little.) Of course, the Mets have been pretty wild with the runs they allow, as the graph at right suggests.
Let’s leave a little bit more to the opponents and just examine the Mets’ distribution. Above, the same graph shows the Mets’ distribution of runs. What would happen if they scored exactly 3.92 runs in every game? That would surely have taken a couple of wins off their docket, but probably earned them a couple of wins, as well. In fact, there are 15 games where the Mets scored below their average that they could have won if they’d scored over 3 runs. These losses are disproportionately spread over the Mets’ younger starting pitchers. Although Jonathon Niese, Dillon Gee, Jenrry Mejia, Rafael Montero and Daisuke Matsuzaka each started one of these games, and Bartolo Colon started two, Zack Wheeler and Jacob deGrom each started four. Those aren’t all starting pitcher losses, but Wheeler and deGrom have both had several tough losses that could have been taken away through some better run support.
On the other hand, there were 11 games the Mets won that they would have lost by scoring only 3.92 runs. Mejia, Matsuzaka and deGrom each started one of these games, with Wheeler and Colon each starting two, but Niese is clearly the beneficiary of a lot of convenient run support here – he started four of these games that would have been losses.
After 108 games, the Mets have a 52-56 mark. Turning 11 of those wins into losses and 15 of those losses into wins means that number could be reversed – to a 56-52 mark – with more consistent run support for the starting pitchers. They have the capability to score those runs, and have definitely benefited from bunching those runs up at times, but on the whole deGrom and Wheeler would be better off, as would the entire team, with a bit more consistency.
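The counting behind that swap can be sketched as follows. The function name is mine and the short game list is purely illustrative (the first three scores are the Phillies series mentioned above, the last two are invented); the full 108-game log isn’t reproduced here.

```python
def flipped_games(games, even_runs):
    """games: list of (runs_scored, runs_allowed) pairs.

    Returns (losses_that_become_wins, wins_that_become_losses) if the team
    had scored exactly even_runs in every game.
    """
    gained = sum(1 for rs, ra in games if rs < ra and even_runs > ra)
    lost = sum(1 for rs, ra in games if rs > ra and even_runs < ra)
    return gained, lost


# Illustrative scores only, not the real season log.
sample = [(7, 1), (0, 6), (11, 2), (2, 3), (9, 8)]
print(flipped_games(sample, 3.92))

# Applying the post's counts to the 52-56 record.
wins, losses = 52, 56
gained, lost = 15, 11
new_wins = wins - lost + gained
new_losses = losses - gained + lost
print(new_wins, new_losses)
```

Since 3.92 is never a whole number, no game ends in a tie under this scheme, so every game lands cleanly as a win or a loss.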
John Baker Gets the W July 30, 2014 Posted by tomflesher in Baseball.
Tags: John Baker, position players pitching, utility pitchers
In more ways than one!
Much like Madison Bumgarner a few weeks ago, John Baker managed to be the winning pitcher and score the game-winning run for Chicago in last night’s game against the Rockies. Baker, a light-hitting backup catcher, came in from the bullpen for his first professional pitching appearance and pitched a clean 16th inning, walking 1 and striking out none on eleven pitches. Immediately after Baker got off the mound, Rockies left-hander Tyler Matzek walked him, and he was then bunted over to second by utilityman Emilio Bonifacio. Arismendy Alcantara added some levity by getting plunked, Anthony Rizzo singled Baker over to third, and Starlin Castro lined a sacrifice fly to right field to bring Baker home for the win.
Welington Castillo deserves an honorable mention for catching all sixteen innings of the game. We can only hope he gets tonight’s game off.
Holy Cow, More On Ruben Tejada’s OBP July 29, 2014 Posted by tomflesher in Baseball.
Tags: OBP, Ruben Tejada
Last night, Ruben Tejada once again hit in the 8th batting order position. In four plate appearances, he walked once, in the bottom of the 8th; there’s been some discussion that Tejada’s OBP is inflated by intentional walks being thrown to get to the pitcher’s spot, though that definitely wasn’t the case here because the next player was lefty specialist Josh Edgin. As expected, Edgin was lifted for pinch hitter Bobby Abreu, who grounded into a double play. (Hmm. Maybe that was the intent. But Abreu only has 3 GIDPs on 140 plate appearances this year.)
Tejada’s stats by batting order position show some patterns. As an eighth-position hitter, Tejada has 198 plate appearances, 34 hits, 2 home runs, 32 walks, and 31 strikeouts, for a .213/.354/.288 line. In other order positions, he has 128 plate appearances, 27 hits, 0 homers, 14 walks, and 30 strikeouts, giving him a .245/.320/.275 line. Let’s assume, for the moment, that that .320 OBP line is Ruben’s true mark. That means his mark in the 8th position should be, with 95% probability, somewhere in the range of .320 +/- .066, or somewhere between .254 and .388. Obviously, .354 is in that range. In fact, the .034 difference is about 1 standard error out, meaning there’s roughly a 30% chance of seeing a gap at least that large by chance alone.
In other words, it doesn’t look like there’s a statistically significant effect for Ruben batting in the 8th position. If we remove Ruben’s 9 intentional walks received in the 8th position and replace them with 2 hits and 7 outs, we’re left with a truly terrible .297 OBP, which is surprisingly even worse than his OBP while batting elsewhere, and one within one standard error of his .320 mark. That is, of course, a worst case scenario, assuming he wouldn’t walk at all in those 9 appearances. If he walked 3 out of 9 times, as his other stats would indicate, that would put him at a still not great .313 OBP.
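The interval arithmetic can be sketched with the usual standard error of a proportion, treating each of the 198 plate appearances as an independent trial with a true on-base probability of .320. That independence, the normal approximation and the variable names are my assumptions; the multiplier of 1.96 gives a margin slightly under the .066 quoted above, which corresponds to a round multiplier of 2.

```python
import math

true_obp = 0.320   # assumed "true" mark, from the non-8th-position line
n = 198            # plate appearances in the 8th position
observed = 0.354   # OBP in the 8th position

se = math.sqrt(true_obp * (1 - true_obp) / n)   # standard error of a proportion
margin = 1.96 * se                              # 95% margin under a normal approximation
z = (observed - true_obp) / se                  # gap measured in standard errors
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # chance of a gap at least this large

print(round(se, 4), round(margin, 3), round(z, 2), round(p_two_sided, 2))
```

The gap comes out at roughly one standard error, and the two-sided tail probability is near 0.3, nowhere near the conventional .05 cutoff.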
Tags: extra innings, free baseball, reader questions
Occasionally the World’s Worst Sports Blog likes to answer reader questions, which come in either by email at TheBadEconomist@gmail.com or through search engine queries. Today’s reader question: Which teams do the worst in extra innings? There are three measures we can take to see which teams are really the worst in extra innings.
The first is to look at the bare number of extra-innings losses. The Miami Marlins, with an extra-innings record of 6-9, hold that honor. That gives them an extra-innings win-loss percentage of .400, which isn’t great, but it’s well within the realm of chance. In fact, if extra-innings games really are a statistical crapshoot, then the standard error of the win percentage over 15 games is about .130, so .400 is less than one standard error away from .500.
There are a few teams that do worse in extra innings than Miami, assuming you ignore the number of games played. Both the Texas Rangers and the Toronto Blue Jays are 1-3 in extras for a win-loss of .250, and the Washington Nationals and Los Angeles Dodgers aren’t much better with records of 3-8 and attendant win percentages of .273. Those are still within the margin of error for such a small sample size. In fact, almost no teams are statistically better than chance in extra innings – only the Orioles, with a .786 win-loss mark in 14 games, are statistically outside the margin of error.
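A minimal sketch of the coin-flip check: if every extra-innings game is a 50/50 proposition, the standard error of a win percentage over n games is sqrt(.25/n), and a team stands out only if it sits about two standard errors from .500. The function name is mine, and the Orioles’ 11-3 record is inferred from the .786 mark in 14 games rather than quoted directly.

```python
import math


def coin_flip_z(wins, losses):
    """Return (z, standard_error) for a record, assuming each game is a 50/50 toss-up."""
    n = wins + losses
    se = math.sqrt(0.25 / n)
    z = (wins / n - 0.5) / se
    return z, se


records = {
    "Marlins": (6, 9),
    "Rangers": (1, 3),
    "Nationals": (3, 8),
    "Orioles": (11, 3),   # inferred from .786 in 14 games
}

for team, (w, l) in records.items():
    z, se = coin_flip_z(w, l)
    flag = "outside" if abs(z) > 1.96 else "within"
    print(f"{team}: se={se:.3f}, z={z:+.2f}, {flag} two standard errors")
```

With so few games per team, even lopsided records like 1-3 sit only one standard error from .500, which is why sample size matters more than the raw percentage here.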
There are a few teams that are much worse than even their scores would lead us to expect. These are teams with really lousy Pythagorean luck – that is, their runs allowed and runs scored predict they’d have a much better record than they actually do.
The unluckiest team so far has been the Chicago White Sox, with a Pythagorean expectation in extra-innings games of .450 and an actual win percentage of .286, for a mark of -.164. Texas and Toronto come in at -.159 and -.156, respectively, with the Dodgers, the Nationals, the Reds, the Mariners, and the Cubs all coming in at -.100 or worse. The Giants are the luckiest team, with a luck number of .222.
What reader questions would you like me to address? Use the form below to make a request!
Tags: Bartolo Colon, Mets
Bartolo Colon’s previous start gave a solid 6 2/3 innings of perfect baseball before Robinson Cano broke it up with a single. Though Bart had raised some concerns earlier in the year with his inconsistent performance, he’s shown he still has the capability to throw an excellent ballgame and not lose control when it gets broken up.
The Mets have a perfectly cromulent rotation – Jonathon Niese, Dillon Gee, Zack Wheeler, and Jacob deGrom are currently in the rotation, and Daisuke Matsuzaka, Dana Eveland, and Carlos Torres each have the capability to function as a swing starter – and a bullpen that is slowly becoming more reliable. Though the Mets are allowing a below-average 3.8 runs per game, they’re also scoring a below-average 3.9, indicating that the highest marginal benefit is probably to disassemble Colon for a bat or two.
Trading Colon would leave a hole in the starting rotation that could be filled with one of the bullpen arms; Eveland and Josh Edgin are both operating as lefty bullpen arms, so Eveland might be the more reasonable choice. In the alternative, a AAA starter, rather than a bullpen pitcher, might be promoted. In either case, that leaves a net zero change in the balance between bats and arms. With Wilmer Flores up from Vegas, we can avoid the unfortunate situation of Eric Campbell playing shortstop again. Wilmer may also be able to help by keeping Campbell out of defensive-replacement scenarios, allowing him to focus on pinch hitting. Alternatively, grabbing a low-budget DH player to function as a professional pinch hitter would also be an option, and allow Flores to continue to develop in Las Vegas.
Essentially, the team needs to start supporting its pitchers more consistently. Dropping Colon would eliminate some variance in run support and open up the possibility of using the extra budget room to develop more run support.
July 18, 2014: Tales of Interest July 19, 2014 Posted by tomflesher in Baseball.
Tags: Mets, Tales of Interest
- Kirk Nieuwenhuis has a .580 slugging average. Let me put that into slightly different terms for you. When Kirk walks up to the plate, assuming he doesn’t walk, he’s averaged over HALF A BASE. ASSUMING HE DOESN’T WALK. And that’s including his rough start! In 37 plate appearances since returning from Las Vegas, he’s at .656.
- Another day, another intentional walk for Ruben Tejada. Ruben’s OBP is .358, and in the 8th position (usually with the pitcher behind him) it jumps to a filthy .375. Yeah, it’s a bit inflated, but even if you removed his ten intentional walks from the season entirely, you still end up with 92 times on base and 287 plate appearances for a .320 OBP. The median OBP for qualified shortstops is .317; I never would have guessed Tejada for an above-average batter. Yeah, yeah, he’s got the pitcher behind him. He’s also costing us less than $4,000 per plate appearance (and falling).
- Bobby Abreu’s OBP, meanwhile, is .377. I’m so glad we have a credible threat off the bench. The man’s even got a bunch of doubles, which would be triples if Kirk were hitting them.
- Lucas Duda (.482), Curtis Granderson (.422), David Wright (.416) and Daniel Murphy (.408) are qualified and have SLG above .400. On the other hand, since coming back from the disabled list, Juan Lagares hasn’t walked at all in 63 plate appearances. Last night, Juan was 1 for 4 with 2 RBIs.
- Since moving to relief, Jenrry Mejia has a 2.25 ERA, including his two blown saves. That’s 2.95 in save situations, but 0.69 in successfully converted saves. When it rains, it pours.
Oh, Madison, You’ll Make Fools Of Us All July 14, 2014 Posted by tomflesher in Baseball.
Tags: Jacob deGrom, Madison Bumgarner, Pitchers batting
Just the other day, I said that pitchers don’t reliably hit well enough to consistently earn themselves cheap wins, and then Madison Bumgarner goes and hits a go-ahead (and game-winning) grand slam. Jacob deGrom hit an RBI of his own, but it was as part of a 9-1 Mets rout of the Marlins. Bumgarner actually earned his win (a cheap one, at that) by hitting the go-ahead RBI. Sickeningly, he did the same thing back in April.
Interestingly, Travis Wood is another pitcher who has twice this year had at least as many RBIs as the margin of victory for his team – once in April, once in May, and once in June – although in one case the save was blown. Dan Haren and Edinson Volquez each have two games as well, although Volquez only nabbed one win. A handful of other pitchers have at least one RBI in one-run games as well.
So, Madison, mea culpa. I’m sorry I ever doubted you.
Quality Starts and Differential Luck July 12, 2014 Posted by tomflesher in Baseball, Economics.
Tags: quality starts, Zack Wheeler
On July 11, Zack Wheeler gave the Mets a quality start by either definition – he pitched 6 2/3 innings and allowed only one run for a game score of 64. The Mets managed to convert it into a win, which they’ve managed to do in 27 of their 46 wins thus far this year. Zack’s made 12 quality starts this year (by the sabermetric definition of a game score of 50 or more), but the Mets have managed to convert only 5 of them into Ws for Zack; the team is 7-5 in those games, while Zack himself is 5-2. That’s a far cry from the Giants’ freakish Tim Lincecum (9-0 in 12 quality starts) and the Angels’ Garrett Richards (10-0 in 15 quality starts). (The whole list of pitchers with quality starts so far is here.)
That got me thinking – which teams do the best at converting quality starts into wins? Which teams are the worst? What’s the relationship? I grabbed all of these numbers and put them together into a spreadsheet in order to play with them.
First, a quick review of terms: A cheap win is a pitcher win in a non-quality start. A tough loss is a pitcher loss in a quality start. “Luck” is whatever I happen to be measuring at the moment, but today ‘luck differential’ refers to the difference between the percentage of wins that are cheap and the percentage of losses that are tough; in other words, luck differential = 100*[(CW/W) - (TL/L)]. For an individual pitcher, these are fairly random occurrences – no pitcher in MLB today hits reliably enough to consistently earn himself cheap wins – but it seems that aggregating by team allows for the quality of batting to smooth out over a large number of games.
The Texas Rangers lead the league in this sort of luck differential, with 4 of their 38 wins coming cheaply for over 10% cheap wins but only 2 of their 55 losses tough (3.64%); the Atlanta Braves have the worst luck differential in the league with a high proportion of tough losses (17/43, or 39.53%) and a low number of cheap wins (3/50, or 6%) for a total of -33.53. The Mets themselves convert less than 50% of their quality starts into wins for the starting pitcher.
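The luck differential formula, 100*[(CW/W) - (TL/L)], is easy to sketch. The function name is mine; the Braves’ loss total is taken as 43, the denominator that reproduces the quoted 39.53% of losses being tough.

```python
def luck_differential(cheap_wins, wins, tough_losses, losses):
    """Percentage of wins that are cheap minus percentage of losses that are tough."""
    return 100 * (cheap_wins / wins - tough_losses / losses)


rangers = luck_differential(cheap_wins=4, wins=38, tough_losses=2, losses=55)
braves = luck_differential(cheap_wins=3, wins=50, tough_losses=17, losses=43)

print(round(rangers, 2), round(braves, 2))
```

A positive number means a team picks up more than its share of wins in starts that weren’t quality starts; a negative number means it is wasting good starts.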
These numbers are indicative of a general trend. The more quality starts a team has, the more negative its luck differential is (ρ = -.72 – an extremely strong correlation) and the more wins a team has, the more negative its luck differential is (ρ = -.20 – a bit weaker). Essentially, teams with more quality starts generate more wins (ρ = .56), regardless of the fact that sometimes they lose those quality starts, too. Surprisingly, the Mets have a -21.67 luck differential, one of the most negative in the league, probably due to the fact that they convert so few quality starts into wins.