Kirk’s Big Spring
March 20, 2015. Posted by tomflesher in Baseball, Economics.
Tags: BABIP, KBB, Kirk Nieuwenhuis, Spring training
Kirk Nieuwenhuis is having an incredible spring. All the usual caveats are in play – it’s spring training, so the stats are useless – but Kirk’s production has been exceptional. His slash line is .469/.553/.625 on 38 plate appearances. Let’s hit some sanity checks on Kirk’s production.
First of all, his BAbip is off the charts. This spring, Kirk’s batting average on balls in play is .536, which is ridiculously high. Kirk won’t be able to maintain that into the season. If he’s still got a .536 BAbip by the trade deadline, I’ll eat my hat and post the video. Kirk’s BAbip has been pretty streaky, though. During his rough April in 2014, Kirk had a .300 BAbip, about the league average over the season; after coming back up in late June, he had a .377 BAbip over the remainder of the season, broken up as .625 over five June games with 11 at-bats, .267 over 28 at-bats in July, .400 over 23 August at-bats, and .348 over 32 at-bats in September.
From 2012 to 2013, Kirk’s BAbip dropped from about .379 to .246, and then shot back up to .370 in 2014. Using those numbers and taking first differences, then using the ratio of differences, that means we’d expect Kirk’s BAbip to drop to about .254 this season. Kirk’s platoon splits are also huge – in 2014, he had a .040/.050/.283 advantage against right-handed pitchers (although he only had 9 at-bats and 10 plate appearances against left-handed pitchers). Though Kirk’s spring splits aren’t readily available, it’s possible that his big spring is a residue of facing mostly right-handers.
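One way to read the "ratio of differences" step is sketched below. The function name is made up for illustration, and the assumption is that next year's change equals this year's change scaled by the ratio of the last two changes; that reading does land on the .254 quoted above.

```python
def project_by_difference_ratio(values):
    """Extrapolate one step ahead from the last three observations.

    Assumed method: take the last two first differences, form their ratio,
    and apply that ratio to the most recent difference.
    """
    previous_change = values[-2] - values[-3]
    latest_change = values[-1] - values[-2]
    ratio = latest_change / previous_change
    return values[-1] + latest_change * ratio


# BAbip for 2012, 2013, and 2014, as quoted in the post.
babip_by_year = [0.379, 0.246, 0.370]
projected_2015 = project_by_difference_ratio(babip_by_year)
print(f"Projected 2015 BAbip: {projected_2015:.3f}")
```

With only three data points this is a curiosity rather than a forecast, which is about how seriously the post takes it.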
In the spring, Kirk’s BAbip denominator (AB – HR – K + SF) is 28 and the numerator (H – HR) is 15. If we take Kirk’s previous-year .377 BAbip, over 28 trials we’d expect 15 or more successes to occur about 2.86% of the time. That’s just barely within the bounds of statistical significance (which would indicate we’d expect Kirk to hit between 6 and 15 times about 95% of the time), and well outside if we assume Kirk has a true mean of .254 (which would put our confidence interval at around 3-11 successes in 28 trials).
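For anyone who wants to rerun the tail probability, here is a minimal sketch that treats each ball in play as an independent Bernoulli trial (an assumption, and a generous one). The exact figure depends on whether 15 itself is counted, so it prints both versions rather than committing to one.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


trials = 28          # balls in play this spring, from the post
true_babip = 0.377   # the prior BAbip the post plugs in

p_at_least_15 = binom_tail(trials, 15, true_babip)
p_at_least_16 = binom_tail(trials, 16, true_babip)

print(f"P(15 or more hits)   = {p_at_least_15:.4f}")
print(f"P(more than 15 hits) = {p_at_least_16:.4f}")
```

Swapping `true_babip` for .254 gives the corresponding tail under the gloomier projection.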
Second, take a look at Kirk’s K/BB ratio. Kirk has typically had a strikeout-to-walk ratio above 1; in 2013, he struck out about 2.67 times for every time he walked, and in 2014 it was about 2.44 strikeouts per walk. Over this small spring sample size, Kirk’s K/BB has actually dipped below 1, at 4/6 (or .667). Assuming Kirk walked 6 times anyway, using a conservative 2:1 K/BB ratio would turn 8 of Kirk’s hits into strikeouts. That would make Kirk’s BAbip tighten up to .350. Still strong, but not the obscene .536 we’ve seen. Even if we convert one walk to a strikeout and maintain a 2 K/BB, that would leave Kirk at .409, a very respectable spring.
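The strikeout adjustment is easy to check by hand, but here it is as a sketch. The helper name is hypothetical, and the assumption is that every extra strikeout replaces a hit on a ball in play, which removes one from both the numerator and the denominator. That accounting reproduces both numbers in the paragraph above, with 8 and then 6 hits converted.

```python
def babip_after_converting_hits(hits_in_play, balls_in_play, hits_to_strikeouts):
    """BAbip if some hits on balls in play had been strikeouts instead."""
    return (hits_in_play - hits_to_strikeouts) / (balls_in_play - hits_to_strikeouts)


spring_hits, spring_balls_in_play = 15, 28  # from the post
actual_strikeouts = 4

# 6 walks at a 2:1 K/BB ratio means 12 strikeouts, so 8 more than he had.
extra_k_six_walks = 2 * 6 - actual_strikeouts
# 5 walks at a 2:1 K/BB ratio means 10 strikeouts, so 6 more than he had.
extra_k_five_walks = 2 * 5 - actual_strikeouts

print(round(babip_after_converting_hits(spring_hits, spring_balls_in_play, extra_k_six_walks), 3))
print(round(babip_after_converting_hits(spring_hits, spring_balls_in_play, extra_k_five_walks), 3))
```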
Kirk’s numbers have been shocking, and of course he’s out of options, so he’s extremely likely to make the team. As a left-handed bat, he’d be a strong everyday player if the outfield weren’t so crowded, but with Michael Cuddyer and Juan Lagares in the mix already along with lefties Curtis Granderson and Matt den Dekker, it’s going to be tough to find Kirk a clean platoon spot.
What is BAbip?
March 16, 2015. Posted by tomflesher in Baseball.
Tags: BABIP, evergreen
The first stat we all learned about as kids was the batting average, where you calculate what proportion of at-bats end with getting a hit. Then, of course, we start thinking about why there are weird exceptions – why doesn’t getting hit by a pitch count? Why don’t walks count? Why doesn’t advancing to first on catcher’s interference count? OBP, or on-base percentage, fixes that. (Well, maybe not the catcher’s interference part…)
Batting average has some interesting properties, though. It captures events that have unpredictable outcomes, and leaves out the ones that don’t – when you walk, it’s basically impossible to be put out on your way to first. Ditto being hit by a pitch. Of course, BA does have some of those determined outcomes, too – home runs and strikeouts don’t have much dynamic nature to them, although you’ll occasionally see brilliant defense save a sure homer (a la Carl Crawford’s MVP performance) or a sloppy catcher mishandle a third strike and forget to tag the batter. (I’m looking at you, Josh Paul.) Nonetheless, balls in play – balls that the batter makes contact with, forcing the defense to try to make a play – are a major source of variation in the game.
BAbip is measured as $\frac{H - HR}{AB - K - HR + SF}$, meaning it takes the strikeouts and home runs out of the equation and (like all sane measures should!) includes sacrifice flies.
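As a quick sketch, the standard definition, (H – HR) / (AB – K – HR + SF), fits in one small function. The sample line below is Kirk Nieuwenhuis’s spring: the 15 hits and 4 strikeouts come from the post above, while the 32 at-bats, 0 home runs, and 0 sacrifice flies are my assumptions, chosen so the denominator matches the 28 balls in play quoted there.

```python
def babip(h, hr, ab, k, sf=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


# Assumed spring line: AB, HR, and SF are inferred, not quoted in the post.
kirk_spring = babip(h=15, hr=0, ab=32, k=4, sf=0)
print(round(kirk_spring, 3))
```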
Since the ball is out of the pitcher’s control as soon as it leaves his hand, BAbip measures things that the pitcher isn’t responsible for – that is, it’s handy as a measure of pitching luck, or, teamwide, as a measure of defensive effectiveness. The NL team BAbip average was .299, and AL average BAbip was about .298.
Use Cases for BAbip:
- Evaluating hitting development. If a batter has had a stable BAbip for a while and his BAbip increases significantly, be suspicious! Particularly if his walk rate hasn’t increased, his home run rate hasn’t increased, and his strikeout rate hasn’t decreased, this might be a function of lucky hitting against bad or inefficient defenses. If the biggest part of an increase in production has been on balls in play, your hitter may not have actually improved. On the other hand, if you can see physical changes, or you have an explanation (e.g., went to AAA to work on his swing), you may see a more balanced improvement in OBP.
- Evaluating pitching luck. Most of the time, all the pitchers for the same team pitch in front of the same defense. Even with a personal catcher in the mix, expect most pitchers on a team to have similar batting averages on balls in play. If you have one pitcher whose BAbip is much higher than the rest of the pitchers, he may be pitching against bad luck. With that in mind, you can expect that pitcher to improve going forward.
- Comparing defenses. In 2014, Oakland had a .274 BAbip and allowed 572 runs – the best BAbip in the American League, and just 18 runs allowed behind Seattle – while Minnesota had a .317 BAbip and allowed 777 runs, the worst in both categories in the league. Defensive efficiency (a measure of 1 – BAbip) tracks closely with runs allowed. BAbip can operate as a quick and dirty check on how well a defense is performing behind a pitcher.
Spitballing: Pi Day
March 14, 2015. Posted by tomflesher in Baseball.
Tags: Pi Day
Happy Pi Day! In honor of Pi Day, I’d like to share a few leaderboards.
First, league median BAbip was .297. Here are four pitchers who got a little less lucky than the average, since their BAbip was .314:
Then, let’s follow up with the other side: the unlucky hitters who would only get on base 3.14 out of every 10 plate appearances:
|Alejandro De Aza||8||.314||2014||TOT||*78/HD9|
In addition, let’s take a look at the two hitters from 2014 who hit the ball 3.14 out of every 10 at-bats:
Yeah, Robbie Cano managed to hit Pi in 2013 AND 2014. The boy must love his geometry.
Our last mention: This year’s Pi Day mascot is Stephen Strasburg, who had an ERA of 3.14. The league-average ERA for pitchers who started 60% of their games was 3.86, so Stephen was in pretty good shape.
Spring Training: Still Useless For Predicting Stats
March 12, 2015. Posted by tomflesher in Baseball.
Tags: Spring training
A few days ago, I watched a Mets-Marlins spring training game that ended in a brutal 13-2 loss. It had all of the usual spring training fun – Zack Wheeler working too far inside and hitting two batters, Michael Cuddyer starting at first with Lucas Duda out, and Don Kelly’s hustle allowing him to draw a walk, steal a base, and score on a single, even while Cliff Floyd was snickering about how Jim Leyland kept him on the roster for no apparent reason in the playoffs.
(Yeah, I know, Kelly’s a Marlin. Shut up.)
During the game, I tweeted out a link to a file-drawer post from last year that indicated that there’s almost no correlation between spring performance and regular-season performance. I thought I’d run a quick update on that, so I dug up the Mets’ individual performance in spring training and analyzed it against the regular season.
There were 15 Mets who had 30 plate appearances in Spring Training and 100 plate appearances in the regular season. That’s a really small sample, so accuracywise we’d better keep our fingers crossed, but it’s enough data to spitball a little.
I ran four correlations on this – spring and regular season batting average, OBP, SLG, and OPS – and then created an additional stat to measure whether hitters changed hitting style from spring to the regular season. This was a quick and dirty attempt to measure whether hitters favored OBP or SLG, so I took the ratio (SLG/OPS) and reasoned that a power hitter will have a larger ratio and a singles hitter will have a smaller. I measured this correlation, too, to determine if there were big changes.
The results are unsurprising – the correlations are really low. Batting average correlates at around .019, and SLG at .305. OBP actually had a negative correlation of about -.25, indicating that a high spring OBP may be a bad sign for the regular season. This is probably sampling error from the tiny number of observations, driven almost entirely by Anthony Recker’s magical .426 spring and average regular season. It also explains why OPS has a -.05 (near-zero) correlation – that big flip in OBP is going to offset the OPS correlation, too.
The strongest correlation was style – at about .619, it’s a pretty good indicator that if a hitter’s SLG is how he scores, he’ll maintain that hitting style throughout the season.
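If you want to try this on your own team, the whole exercise is a Pearson correlation plus one ratio. The sketch below uses made-up numbers for five hypothetical hitters, not the actual Mets data, and the function names are mine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def style(obp, slg):
    """Share of OPS that comes from slugging: SLG / (OBP + SLG)."""
    return slg / (obp + slg)


# Hypothetical (OBP, SLG) pairs, invented purely for illustration.
spring = [(0.350, 0.480), (0.300, 0.390), (0.426, 0.500), (0.280, 0.450), (0.330, 0.360)]
season = [(0.330, 0.450), (0.310, 0.370), (0.290, 0.380), (0.300, 0.470), (0.340, 0.350)]

spring_style = [style(obp, slg) for obp, slg in spring]
season_style = [style(obp, slg) for obp, slg in season]
print(f"style correlation: {pearson(spring_style, season_style):.3f}")
```

The same `pearson` call works for batting average, OBP, SLG, or OPS; only the input lists change.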
What is OPS?
January 12, 2015. Posted by tomflesher in Baseball.
Tags: evergreen, OBP, OPS, SLG, statistics
Sabermetricians (which is what baseball stat-heads call ourselves to feel important) disregard batting average in favor of on-base percentage for a few reasons. The main one is that it really doesn’t matter to us whether a batter gets to first base through a gutsy drag bunt, an excuse-me grounder, a bloop single, a liner into the outfield, or a walk. In fact, we don’t even care if the batter got there through a judicious lean-in to take one for the team by accepting a hit-by-pitch. Batting average counts some of these trips to first, but not a base on balls or a hit batsman. It’s evident that plate discipline is a skill that results in higher returns for the team, and there’s a colorable argument that ability to be hit by a pitch is a skill. OBP is $\frac{H + BB + HBP}{AB + BB + HBP + SF}$.
We also care a lot about how productive a batter is, and a productive batter is one who can clear the bases or advance without trouble. Sure, a plucky baserunner will swipe second base and score from second, or go first to third on a deep single. In an emergency, a light-hitting pitcher will just bunt him over. However, all of these involve an increased probability of an out, while a guy who can just hit a double, or a speedster who takes that double and turns it into a triple, will save his team a lot of trouble. Obviously, a guy who snags four bases by hitting a home run makes life a lot easier for his teammates. Slugging percentage measures how many bases, on average, a player is worth every time he steps up to the plate and doesn’t walk or get hit by a pitch. Slugging percentage is $\frac{1B + 2 \times 2B + 3 \times 3B + 4 \times HR}{AB}$. If a player hits a home run in every at-bat, he’ll have an OBP of 1.000 and a SLG of 4.000.
OPS is just On-Base Percentage plus Slugging Percentage. It doesn’t lend itself to a useful interpretation – OPS isn’t, for example, the average number of bases per hit, or anything useful like that. It does, however, provide a quick and dirty way to compare different sorts of hitters. A runner who moves quickly may have a low OBP but a high SLG due to his ability to leg out an extra base and turn a single into a double or a double into a triple. A slow-moving runner who can only move station to station but who walks reliably will have a low SLG (unless he’s a home-run hitter) but a high OBP. An OPS of 1.000 or more is a difficult measure to meet, but it’s a reliable indicator of quality.
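Here is a minimal sketch of all three stats using the standard definitions: OBP as (H + BB + HBP) / (AB + BB + HBP + SF), SLG as total bases over at-bats, and OPS as their sum. The function and argument names are mine. The example is the all-homers hitter from above.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base over the plate appearances that count."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


def ops(on_base, slugging):
    """OPS is simply OBP plus SLG."""
    return on_base + slugging


# A hitter who homers in every one of 10 at-bats, with no walks, HBP, or sac flies.
all_homers_obp = obp(h=10, bb=0, hbp=0, ab=10, sf=0)
all_homers_slg = slg(singles=0, doubles=0, triples=0, hr=10, ab=10)
print(all_homers_obp, all_homers_slg, ops(all_homers_obp, all_homers_slg))
```

That puts the ceiling for OPS at 5.000, which helps explain why 1.000 is already rare air.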
The Hall of Fame Black Ink Test
January 11, 2015. Posted by tomflesher in Baseball.
Tags: Black Ink, evergreen, Hall of Fame
The Baseball Hall of Fame‘s mission is “Preserving History, Honoring Excellence, Connecting Generations.” An important measure of the excellence honored in Cooperstown is called the Black Ink Test. “Black ink” refers to the boldface type used to show the league’s leader in an important category.
The categories used for the Black Ink Test are, of course, different for pitchers and batters, but they also vary depending on the importance of the stat. A batter who excels in hitting home runs is more valuable to a team than one who takes the most at-bats regardless of outcome. For batters, points are awarded as follows:
- One point for games, at-bats, or triples
- Two points for doubles, walks, or stolen bases
- Three points for runs scored, hits, or slugging percentage
- Four points for home runs, RBIs, or batting average
For pitchers, points are awarded as follows:
- One point for appearances, starts, or shutouts
- Two points for complete games, lowest Walks/9, or lowest Hits/9
- Three points for innings pitched, saves, or win-loss percentage
- Four points for wins, ERA, or strikeouts
That means that there are 30 black-ink points per year for batters and 30 for pitchers. (Multiple black-ink points can be awarded; for example, this year, at least 10 pitchers started 34 games in the National League, each of whom earns 1 point.) However, while it’s conceivable that a single batter could monopolize most of the categories, it’s not likely that a pitcher could – appearances and saves will go to a reliever, while most of the categories will go to a starter.
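The point tables above translate directly into a lookup. This is only a sketch: the dictionary keys and the `black_ink` helper are my own naming, and the sample season is hypothetical.

```python
BATTER_POINTS = {
    "games": 1, "at-bats": 1, "triples": 1,
    "doubles": 2, "walks": 2, "stolen bases": 2,
    "runs scored": 3, "hits": 3, "slugging percentage": 3,
    "home runs": 4, "RBIs": 4, "batting average": 4,
}

PITCHER_POINTS = {
    "appearances": 1, "starts": 1, "shutouts": 1,
    "complete games": 2, "walks per 9": 2, "hits per 9": 2,
    "innings pitched": 3, "saves": 3, "win-loss percentage": 3,
    "wins": 4, "ERA": 4, "strikeouts": 4,
}


def black_ink(categories_led, points):
    """Total black ink for one season, given the categories a player led."""
    return sum(points[category] for category in categories_led)


# A hypothetical Triple Crown season: led the league in HR, RBIs, and average.
triple_crown = black_ink(["home runs", "RBIs", "batting average"], BATTER_POINTS)
print(triple_crown, sum(BATTER_POINTS.values()), sum(PITCHER_POINTS.values()))
```

A career total is just this sum repeated over every season, with ties each earning full credit.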
Because black ink requires a player lead his league, it’s hard to come by – and when there are more teams in a league, even the best players may not lead the league. One notable example of the bias toward older players is Ross Barnes, who was active for nine seasons from 1871 to 1881. (He didn’t play in 1878 or 1880.) Although Ross isn’t eligible for the Hall because he didn’t play ten seasons, he amassed an astonishing 60 points of black ink in the National Association by the age of 31. Since the National Association was only 9 teams, he competed against around 115 other batters for those points. During the 2014 season, the same 30 points of black ink were spread over 672 National League batters. Though Ross was truly an outstanding player, leading the league in nearly every category in 1873 and 1876, it was a lot easier to get those points then.
As of today, the batters with the most black ink not to be elected to the Hall of Fame are Barry Bonds (69), Pete Rose (68), and Alex Rodriguez (64). A-Rod and Rose, of course, aren’t eligible (A-Rod is still active). New Hall of Famer Craig Biggio had 17 and mediocre, forgettable middle-infielder Derek Jeter comes in at a whopping 10.
The pitchers with the most black ink not to be elected are Roger Clemens (100), Roy Halladay (48), Bucky Walters (48), and Justin Verlander (46). Verlander is still active and Halladay retired too recently to be elected, but Walters is truly a baffling case. New Hall of Famers this year were Randy Johnson (99), Pedro Martinez (58), and John Smoltz (34).
The Spectrum Club: 2014 Edition
January 1, 2015. Posted by tomflesher in Baseball.
Tags: Spectrum Club
2013 and 2014 were unusually large Spectrum Clubs. The prestigious¹ Spectrum Club consists of players who played as designated hitter and also pitched for their teams. Though there surely are a couple of people caught in this table who were primarily pitchers and just came in listed as a DH on the batting order, 2013 shows the largest Spectrum Club since the introduction of the designated hitter, with 2014 following closely behind. The list of all Spectrum Club members is here.
This year inducted nine brand-new members. Although Mitch Maier and Darnell McDonald repeated from 2010 to 2011, everyone this year was a first-time pitcher/DH. As usual, though, they were all primarily position players.
This year’s inductees are:
Congratulations to this year’s inductees!
1 Not a guarantee.
BABIP as a Defensive Metric
October 11, 2014. Posted by tomflesher in Baseball, Economics.
Tags: BABIP, BJ Upton, models, statistics
I follow OOTP on Facebook, and this Reddit thread about editing the Braves to go 0-162 popped up the other day.
I went into commissioner mode and basically ranked everyone’s stats to go 0-550 with 550 Ks (although when I went back, OOTP changed it to give them all a few hits and a couple of walks, etc.) I did not have to edit BJ Upton, as he was already programmed to do so.
One reply asked whether 1-BABIP is a valid defensive metric, and that got the wheels turning. (Note that for statistical purposes, summary statistics for 1-BABIP will be the same magnitude and the opposite sign as statistics for BABIP, so I went ahead and just used BABIP.)
For a quick check, I checked in at Baseball Reference to get the National League’s team-level statistics for the last 5 years, then correlated BABIP to runs allowed by the team. That correlation is .741 – that’s a pretty strong correlation. Similarly, the correlation between BABIP and team wins was about -.549. It’s a weaker and negative correlation, which is expected – negative because an added point of opposing team BABIP would mean more balls in play were falling in as hits, and weaker because it ignores the team’s offensive production entirely.
If BABIP accurately describes a team’s defensive power, then a statistical model that models team runs allowed as a function of fielding-independent pitching and pitching-independent fielding should explain a large proportion, but not all, of the runs allowed by a team, and thereby explain a significant but smaller proportion of the team’s wins.
I crunched two models to test this, each with the same functional form: Dependent Variable = a + b*FIP + c*BABIP. With Runs as the dependent variable, the R² of the model was .8625; with Wins as the dependent variable, the R² was .5246. Since R² roughly describes the percent of variation explained by the model, this makes a lot of sense. In the Runs model, about 14% of runs come due to something other than home runs, walks, or hits, such as baserunning and errors; in the Wins model, about 48% of team wins are explained by something other than defense and pitching. (Say…. offense? That’s crazy.) In both models, the coefficients are statistically significant at the 99% level.
BABIP’s coefficient in the Runs model is 3444.44, which means that, holding FIP fixed, a batting average on balls in play of 1.000 would lead to about 3444 more runs allowed over a season than a BABIP of zero; more realistically, if BABIP increases by .01, that would translate to about 34 runs per season. Its coefficient in the Wins model is -328.757, meaning that an increase of .01 in BABIP corresponds to about 3.29 extra losses. That’s surprisingly close to the 10 runs-1 win ratio that Bill James uses as a rule of thumb.
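The arithmetic behind that comparison is short enough to sketch. The two coefficients are the ones reported above; the variable names are mine, and the only assumption is that both models are linear, so a small BABIP change scales straight through.

```python
RUNS_COEFFICIENT = 3444.44   # runs allowed per 1.000 of BABIP, from the Runs model
WINS_COEFFICIENT = -328.757  # wins per 1.000 of BABIP, from the Wins model

babip_change = 0.01  # ten points of BABIP

extra_runs = RUNS_COEFFICIENT * babip_change
extra_losses = -WINS_COEFFICIENT * babip_change
implied_runs_per_win = extra_runs / extra_losses

print(f"{extra_runs:.1f} more runs allowed")
print(f"{extra_losses:.2f} more losses")
print(f"{implied_runs_per_win:.2f} runs per win implied")
```

The implied ratio is just the ratio of the two coefficients, so it does not depend on the size of the BABIP change.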
Since the correlations were strong, this bears a closer look at game-level rather than simply team-level data.
Mets Run Support by Starting Pitcher
August 1, 2014. Posted by tomflesher in Baseball.
Tags: Jacob deGrom, Mets, pitching, run support, Zack Wheeler
Yesterday’s post discussed distributional wins and losses based on the Mets’ inconsistent bunching of runs together. Since the boys didn’t play last night, I had a pretty stable dataset to work with, and the opportunity to crunch some numbers to see if the hypothesis that we’re working with is true. In addition, I took a look at each of our current starting rotation’s run support numbers and found some surprising things.
First of all, no pitcher’s run support was statistically significantly different from any other’s. Although Dillon Gee’s run support is .77 lower than the average pitcher’s, for example, the p-value is .44, meaning that if Gee truly had no effect, we’d still see a gap that large about 44% of the time. Jacob deGrom has a similar number – .796 runs below the average, but a .42 p-value. The only pitcher with a positive effect on run support is Bartolo Colon, but his p-value is a whopping .72, meaning his number is entirely consistent with a statistical artifact.
The runs allowed are a bit more stable – deGrom allows 1.18 runs fewer than average with a .2 p-value – but Gee, Jonathon Niese, Colon, and Zack Wheeler all have statistically 0 effect on runs allowed. Their ps are, respectively, .91, .84, .64, and .79. Basically, this means that an effect would have to be really big to show up in such a small sample size; not even all 108 games are covered in the sample.
Another way of tracking pitcher run support is to track team wins and losses in the games started by those pitchers and compare it to the team’s Pythagorean expectation in those games. This is a bit more revealing; for example, the Mets are 6-8 in starts by deGrom, but would have a Pythagorean expectation of about .568, or about 8-6, in those games. Wheeler also ends up with a Pythagorean expectation better than his record, predicting the Mets would have won 11 rather than 10 of his 22 games. The other pitchers are more or less in line with their expectations, although, like Zack, the pitchers don’t always get credit for the wins they pitched in.
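For reference, here is a sketch of the Pythagorean expectation in its classic Bill James form, with an exponent of 2 (an assumption; the exact exponent used for the numbers above isn’t stated). The runs scored and allowed in each pitcher’s starts aren’t listed here, so the example converts the quoted .568 into a record and, separately, plugs in the season-long averages of 3.92 runs scored and 3.79 allowed per game over 108 games.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_record(win_pct, games):
    """Round an expected winning percentage to a whole-number record."""
    wins = round(win_pct * games)
    return wins, games - wins


# deGrom's 14 starts at the quoted .568 expectation.
print(expected_record(0.568, 14))

# Whole-season check using per-game averages (the ratio is the same as with totals).
season_pct = pythagorean_pct(3.92, 3.79)
print(round(season_pct, 3), expected_record(season_pct, 108))
```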
Behind the cut is the table of regression results for a linear model with a dummy variable for each pitcher’s starts, plus a totally useless Away game dummy to look for home field advantage. (Surprise: There is none for the Mets, but all pitchers do allow roughly .74 more runs on the road than at home.)
What If The Mets Spread Their Runs More Evenly?
July 31, 2014. Posted by tomflesher in Baseball.
Tags: Mets, Runs, statistics
The Mets have had quite a run lately – they sandwiched a 6-0 shutout loss on Tuesday between a 7-1 rout and an 11-2 dismantling of the Phillies. The whole series is a microcosm of the Mets’ season – the wildly inconsistent run production, the overuse of Josh Edgin, the disappointing start from Dillon Gee, and the totally unnecessary hit by Jeurys Familia. (Familia is 2 for 2 on the year with a 2.000 OPS.) If the Mets had spread out those 18 runs among the 3 games, there would have been a slightly different result – free baseball on Tuesday, but let’s assume the Mets would have lost the game anyway. In fact, the Mets have an average of 3.92 runs over the first 108 games of the season, and they’ve allowed an average of 3.79. If the Mets had spread out all of those runs evenly, then on average, the Mets would have won every game. (Fractional runs mess this up a little.) Of course, the Mets have been pretty wild with the runs they allow, as the graph at right suggests.
Let’s leave a little bit more to the opponents and just examine the Mets’ distribution. Above, the same graph shows the Mets’ distribution of runs. What would happen if they scored exactly 3.92 runs in every game? That would surely have taken a couple of wins off their docket, but would probably have earned them a couple of others, as well. In fact, there are 15 games where the Mets scored below their average that they could have won if they’d scored over 3 runs. These losses are disproportionately spread over the Mets’ younger starting pitchers. Although Jonathon Niese, Dillon Gee, Jenrry Mejia, Rafael Montero and Daisuke Matsuzaka each started one of these games, and Bartolo Colon started two, Zack Wheeler and Jacob deGrom each started four. Those aren’t all starting pitcher losses, but Wheeler and deGrom have both had several tough losses that could have been taken away through some better run support.
On the other hand, there were 11 games the Mets won that they would have lost by scoring only 3.92 runs. Mejia, Matsuzaka and deGrom each started one of these games, with Wheeler and Colon each starting two, but Niese is clearly the beneficiary of a lot of convenient run support here – he started four of these games that would have been losses.
After 108 games, the Mets have a 52-56 mark. Turning 11 of those wins into losses and 15 of those losses into wins means that number could be reversed – to a 56-52 mark – with more consistent run support for the starting pitchers. They have the capability to score those runs, and have definitely benefited from bunching those runs up at times, but on the whole deGrom and Wheeler would be better off, as would the entire team, with a bit more consistency.
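The counterfactual is simple enough to sketch. The function names are mine, and the only game-level data used is the Phillies series described above (7-1, 0-6, 11-2); the full 108-game log isn’t reproduced here. The second helper just redoes the record-flipping arithmetic.

```python
def record_with_even_scoring(opponent_runs, runs_per_game):
    """Wins, losses, and ties if the team scored the same amount every game."""
    wins = sum(1 for allowed in opponent_runs if runs_per_game > allowed)
    losses = sum(1 for allowed in opponent_runs if runs_per_game < allowed)
    ties = len(opponent_runs) - wins - losses
    return wins, losses, ties


def flip_record(wins, losses, wins_turned_to_losses, losses_turned_to_wins):
    """Apply the swapped outcomes to an actual record."""
    new_wins = wins - wins_turned_to_losses + losses_turned_to_wins
    new_losses = losses - losses_turned_to_wins + wins_turned_to_losses
    return new_wins, new_losses


# The Phillies series: Mets scored 7, 0, and 11; Phillies scored 1, 6, and 2.
mets_runs = [7, 0, 11]
phillies_runs = [1, 6, 2]
even_runs = sum(mets_runs) / len(mets_runs)

print(record_with_even_scoring(phillies_runs, even_runs))  # ties go to extra innings
print(flip_record(52, 56, wins_turned_to_losses=11, losses_turned_to_wins=15))
```

Passing 3.92 as `runs_per_game` against a full list of runs allowed would rerun the season-long version of the exercise.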