What is the Corsi statistic? (And why is there a Fenwick number?) November 16, 2015

Growing up in Buffalo, I was surrounded by hockey, whether it was watching the Sabres or heading to the rink to watch my brother play defense as a bantam or high schooler. During those years, my father, who could barely skate, often served as a volunteer coach for my brother’s teams. Like Malcolm Gladwell’s story of Vivek Ranadivé leading his “little blonde girls” to success using out-of-the-box basketball coaching, my father felt he was bringing an outsider’s perspective to the game by emphasizing a simple philosophy: own the puck.

This is easier said than done, of course, and when a group of squirts, peewees, or bantams heads out onto the ice, they need to apply some serious skill in order to “own the puck.” Overall, though, the point of owning that puck is to put it into the net. So, logically, the more a team controls the puck, the more likely it is to control the game.

It’s possible, of course, for a team to take many more shots and still lose, but the Corsi stat is meant to measure overall control. It therefore counts every shot attempt, not just shots on goal: Corsi is defined as Shots + Attempted Shots – Shots Against – Attempted Shots Against, where “attempted shots” are the attempts that miss the net or get blocked. This gives you a simple differential in shot attempts.

You’ll also see the following stats:

  • Corsi For: Shots + Attempted Shots by the team. Looking at Corsi For and Corsi Against separately makes it possible to isolate whether a team is taking too few shots or allowing too many
  • Corsi Against: Shots + Attempted Shots by the opposing team
  • Corsi For Percentage (CF%): 100*Corsi For/(Corsi For + Corsi Against), giving a ratio rather than a simple differential. This measures what percentage of shots and shot attempts a team makes compared to its opponents. A CF% above 50% means a team attempts more shots than its opponent.
  • Corsi On: A team’s Corsi while a particular player is on the ice, scaled up to 60 minutes of ice time, effectively measuring whether the player’s Corsi is as good as, better than, or worse than the team’s as a whole. A Corsi On greater than the team’s means the player contributes proportionally more to the team than ice time would indicate.
  • Corsi Relative (Corsi REL): Corsi On – Corsi Off, where Corsi Off is the team’s Corsi while the player is off the ice, scaled to 60 minutes the same way. This shows whether a team performs better or worse with a player on the ice. If Corsi REL is positive, the team does a better job with the player on the ice.
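To make the list above concrete, here is a minimal Python sketch of those stats. The class name, function names, and the single-game numbers are all made up for illustration; "attempts" means every shot attempt, whether it was on goal, missed, or blocked.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """All shot attempts (on goal, missed, blocked) for and against."""
    attempts_for: int
    attempts_against: int


def corsi(c: ShotCounts) -> int:
    """Plain Corsi: the differential in shot attempts."""
    return c.attempts_for - c.attempts_against


def corsi_for_pct(c: ShotCounts) -> float:
    """CF%: share of all shot attempts that belong to the team."""
    return 100 * c.attempts_for / (c.attempts_for + c.attempts_against)


def corsi_per_60(c: ShotCounts, minutes: float) -> float:
    """Corsi scaled up to 60 minutes of ice time."""
    return corsi(c) * 60 / minutes


def corsi_rel(on: ShotCounts, on_minutes: float,
              off: ShotCounts, off_minutes: float) -> float:
    """Corsi REL: Corsi On minus Corsi Off, both per 60 minutes."""
    return corsi_per_60(on, on_minutes) - corsi_per_60(off, off_minutes)


# Hypothetical 60-minute game, split by whether one skater was on the ice.
player_on = ShotCounts(attempts_for=22, attempts_against=14)   # 20 minutes
player_off = ShotCounts(attempts_for=33, attempts_against=31)  # 40 minutes
team = ShotCounts(attempts_for=55, attempts_against=45)

print(corsi(team))                                   # 10
print(corsi_for_pct(team))                           # 55.0
print(corsi_per_60(player_on, 20))                   # 24.0
print(corsi_rel(player_on, 20, player_off, 40))      # 21.0
```

In this invented game the team out-attempts its opponent overall, but most of that edge shows up during the 20 minutes the skater is on the ice, which is exactly what a positive Corsi REL is meant to flag.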

Corsi was named after a Buffalo Sabres goaltending coach. Bob McKenzie of TSN shared the story of the Corsi number in 2014. Financial analyst Tim Barnes, writing under the pseudonym Vic Ferrari, heard Sabres GM Darcy Regier discussing shot attempts and save percentage as a goalie metric, but Ferrari didn’t care for the name “Regier Number” or “Ruff Number” (for Sabres coach Lindy Ruff). After browsing photos of the Sabres staff, Ferrari settled on Jim Corsi as the eponym for the statistic. Interestingly, Corsi actually did come up with the idea and planted it in Regier’s head.

A similar stat, the Fenwick, works the same way as Corsi but leaves blocked shots out of the count, on the theory that blocking shots is a skill.
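A quick sketch of the difference, assuming shot attempts are tracked in three buckets (on goal, missed, and blocked). The function names and the counts are hypothetical.

```python
def corsi_events(on_goal: int, missed: int, blocked: int) -> int:
    """Corsi counts every shot attempt."""
    return on_goal + missed + blocked


def fenwick_events(on_goal: int, missed: int, blocked: int) -> int:
    """Fenwick counts unblocked attempts only; `blocked` is ignored."""
    return on_goal + missed


# Hypothetical game: (on goal, missed, blocked) for each side.
ours = (30, 12, 13)
theirs = (25, 10, 10)

corsi_diff = corsi_events(*ours) - corsi_events(*theirs)
fenwick_diff = fenwick_events(*ours) - fenwick_events(*theirs)

print(corsi_diff)    # 10
print(fenwick_diff)  # 7
```

The two numbers drift apart whenever one side has more of its attempts blocked than the other.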

What is BAbip? March 16, 2015

The first stat we all learned about as kids was the batting average, where you calculate what proportion of at-bats end with getting a hit. Then, of course, we start thinking about why there are weird exceptions – why doesn’t getting hit by a pitch count? Why don’t walks count? Why doesn’t advancing to first on catcher’s interference count? OBP, or on-base percentage, fixes that. (Well, maybe not the catcher’s interference part…)

Batting average has some interesting properties, though. It captures events that have unpredictable outcomes and leaves out events whose outcomes are settled – when you walk, it’s basically impossible to be put out on your way to first. Ditto being hit by a pitch. Of course, BA does have some of those determined outcomes, too – home runs and strikeouts don’t have much dynamic nature to them, although you’ll occasionally see brilliant defense save a sure homer (a la Carl Crawford’s MVP performance) or a sloppy catcher mishandle a third strike and forget to tag the batter. (I’m looking at you, Josh Paul.) Nonetheless, balls in play – balls that the batter makes contact with, forcing the defense to try to make a play – are a major source of variation in the game.

BAbip is measured as \frac{H - HR}{AB - SO - HR + SF}, meaning it takes the strikeouts and home runs out of the equation and (like all sane measures should!) includes sacrifice flies.
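Here is a small Python sketch of that calculation, assuming the standard definition in which home runs come out of both the numerator and the denominator. The function name and the stat line are invented for illustration.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play.

    Numerator: hits that stayed in the park (H - HR).
    Denominator: at-bats that put a ball in play (AB - SO - HR),
    plus sacrifice flies, which are balls in play but not at-bats.
    """
    return (h - hr) / (ab - so - hr + sf)


# Hypothetical season line: 150 hits, 20 homers, 500 at-bats,
# 100 strikeouts, 5 sacrifice flies.
season = babip(h=150, hr=20, ab=500, so=100, sf=5)
print(round(season, 3))  # 0.338
```

That works out to 130 hits on 385 balls in play.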

Since the ball is out of the pitcher’s control as soon as it leaves his hand, BAbip measures things that the pitcher isn’t responsible for – that is, it’s handy as a measure of pitching luck, or, teamwide, as a measure of defensive effectiveness. The NL team BAbip average was .299, and AL average BAbip was about .298.

Use Cases for BAbip:

– Evaluating hitting development. If a batter has had a stable BAbip for a while and his BAbip increases significantly, be suspicious! Particularly if his walk rate hasn’t increased, his home run rate hasn’t increased, and his strikeout rate hasn’t decreased, this might be a function of lucky hitting against bad or inefficient defenses. If the biggest part of an increase in production has been on balls in play, your hitter may not have actually improved. On the other hand, if you can see physical changes, or you have an explanation (e.g., went to AAA to work on his swing), you may see a more balanced improvement in OBP.

– Evaluating pitching luck. Most of the time, all the pitchers for the same team pitch in front of the same defense. Even with a personal catcher in the mix, expect most pitchers on a team to have similar batting averages on balls in play. If you have one pitcher whose BAbip is much higher than the rest of the staff’s, he may simply be running into bad luck. With that in mind, you can expect that pitcher to improve going forward.

– Comparing defenses. In 2014, Oakland had a .274 BAbip and allowed 572 runs – the best BAbip in the American League and 18 runs behind Seattle in runs allowed – while Minnesota had a .317 BAbip and allowed 777 runs, the worst in both categories in the league. Defensive efficiency (a measure of 1 – BAbip) tracks closely with runs allowed. BAbip can operate as a quick and dirty check on how well a defense is performing behind a pitcher.
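The last two use cases can be sketched in a few lines of Python. The team figures are the 2014 numbers quoted above; the pitcher and staff BAbips in the second half are invented, and the function names are mine.

```python
from statistics import mean


def defensive_efficiency(babip_allowed: float) -> float:
    """Share of balls in play that the defense turns into outs."""
    return 1 - babip_allowed


def babip_gap(pitcher_babip: float, teammates_babip: list[float]) -> float:
    """How far one pitcher's BAbip sits above the rest of the staff."""
    return pitcher_babip - mean(teammates_babip)


# 2014 team BAbip allowed, from the comparison above.
oakland = round(defensive_efficiency(0.274), 3)
minnesota = round(defensive_efficiency(0.317), 3)
print(oakland, minnesota)  # 0.726 0.683

# Hypothetical staff: one pitcher at .340, three teammates near .300.
gap = round(babip_gap(0.340, [0.295, 0.300, 0.305]), 3)
print(gap)  # 0.04
```

A 40-point gap over teammates working in front of the same fielders is the kind of thing that hints at bad luck rather than bad pitching, though how big a gap counts as "much higher" is a judgment call.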

What is OPS? January 12, 2015

Sabermetricians (which is what baseball stat-heads call ourselves to feel important) disregard batting average in favor of on-base percentage for a few reasons. The main one is that it really doesn’t matter to us whether a batter gets to first base through a gutsy drag bunt, an excuse-me grounder, a bloop single, a liner into the outfield, or a walk. In fact, we don’t even care if the batter got there through a judicious lean-in to take one for the team by accepting a hit-by-pitch. Batting average counts some of these trips to first, but not a base on balls or a hit batsman. It’s evident that plate discipline is a skill that results in higher returns for the team, and there’s a colorable argument that ability to be hit by a pitch is a skill. OBP is \frac{H+BB+HBP}{AB+BB+HBP+SF}.

We also care a lot about how productive a batter is, and a productive batter is one who can clear the bases or advance without trouble. Sure, a plucky baserunner will swipe second base and score from second, or go first to third on a deep single. In an emergency, a light-hitting pitcher will just bunt him over. However, all of these involve an increased probability of an out, while a guy who can just hit a double, or a speedster who takes that double and turns it into a triple, will save his team a lot of trouble. Obviously, a guy who snags four bases by hitting a home run makes life a lot easier for his teammates. Slugging percentage measures how many bases, on average, a player is worth every time he steps up to the plate and doesn’t walk or get hit by a pitch. Slugging percentage is \frac{(\mathit{1B}) + (2 \times \mathit{2B}) + (3 \times \mathit{3B}) + (4 \times \mathit{HR})}{AB} = \frac{\text{Total Bases}}{AB}. If a player hits a home run in every at-bat, he’ll have an OBP of 1.000 and a SLG of 4.000.
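Both formulas are easy to check in Python. This is only a sketch: the function names and the second stat line are made up, and the first example is the homer-in-every-at-bat case from the paragraph above.

```python
def obp(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """On-base percentage: times on base over plate appearances counted."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# A home run in every one of 10 at-bats, with no walks, HBP, or sac flies.
print(obp(h=10, bb=0, hbp=0, ab=10, sf=0))                # 1.0
print(slg(singles=0, doubles=0, triples=0, hr=10, ab=10))  # 4.0

# Hypothetical full season: 100 singles, 30 doubles, 5 triples, 20 homers
# in 500 at-bats is 255 total bases.
print(round(slg(100, 30, 5, 20, 500), 3))                  # 0.51
```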

OPS is just On-Base Percentage plus Slugging Percentage. It doesn’t lend itself to a useful interpretation – OPS isn’t, for example, the average number of bases per hit, or anything useful like that. It does, however, provide a quick and dirty way to compare different sorts of hitters. A runner who moves quickly may have a low OBP but a high SLG due to his ability to leg out an extra base and turn a single into a double or a double into a triple. A slow-moving runner who can only move station to station but who walks reliably will have a low SLG (unless he’s a home-run hitter) but a high OBP. An OPS of 1.000 or more is a difficult measure to meet, but it’s a reliable indicator of quality.
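A tiny sketch shows why OPS works as a common yardstick. Both hitters below are invented, and the function name is mine: one gets his value from extra bases, the other from getting on base.

```python
def ops(obp: float, slg: float) -> float:
    """OPS is simply on-base percentage plus slugging percentage."""
    return obp + slg


# Hypothetical rate stats for two very different hitters.
speedster = round(ops(obp=0.320, slg=0.480), 3)
walker = round(ops(obp=0.400, slg=0.400), 3)

print(speedster, walker)  # 0.8 0.8
```

The two profiles land on the same OPS, which is the point: the sum lets you compare them at a glance, even though it hides how each one got there.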

The Hall of Fame Black Ink Test January 11, 2015

The Baseball Hall of Fame’s mission is “Preserving History, Honoring Excellence, Connecting Generations.” An important measure of the excellence honored in Cooperstown is called the Black Ink Test. “Black ink” refers to the boldface type used to show the league’s leader in an important category.

The categories used for the Black Ink Test are, of course, different for pitchers and batters, but they also vary depending on the importance of the stat. A batter who excels in hitting home runs is more valuable to a team than one who takes the most at-bats regardless of outcome. For batters, points are awarded as follows:

  1. One point for games, at-bats, or triples
  2. Two points for doubles, walks, or stolen bases
  3. Three points for runs scored, hits, or slugging percentage
  4. Four points for home runs, RBIs, or batting average

Pitchers receive:

  1. One point for appearances, starts, or shutouts
  2. Two points for complete games, lowest Walks/9, or lowest Hits/9
  3. Three points for innings pitched, saves, or win-loss percentage
  4. Four points for wins, ERA, or strikeouts

That means that there are 30 black-ink points per year for batters and 30 for pitchers. (When players tie for a league lead, each of them gets the points; for example, this year, at least 10 pitchers started 34 games in the National League, each of whom earns 1 point.) However, while it’s conceivable that a single batter could monopolize most of the categories, it’s not likely that a pitcher could – appearances and saves will go to a reliever, while most of the categories will go to a starter.
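The scoring tables above translate directly into a lookup. Here is a minimal Python sketch; the dictionary keys and function name are my own labels, and the example season is hypothetical.

```python
# Points for leading the league in each category, per the lists above.
BATTER_POINTS = {
    "games": 1, "at_bats": 1, "triples": 1,
    "doubles": 2, "walks": 2, "stolen_bases": 2,
    "runs": 3, "hits": 3, "slugging_pct": 3,
    "home_runs": 4, "rbi": 4, "batting_avg": 4,
}

PITCHER_POINTS = {
    "appearances": 1, "starts": 1, "shutouts": 1,
    "complete_games": 2, "walks_per_9": 2, "hits_per_9": 2,
    "innings": 3, "saves": 3, "win_loss_pct": 3,
    "wins": 4, "era": 4, "strikeouts": 4,
}


def black_ink(categories_led: list[str], table: dict[str, int]) -> int:
    """Black-ink points for one season, given the categories a player led."""
    return sum(table[category] for category in categories_led)


# Each table adds up to the 30 points available per league per year.
print(sum(BATTER_POINTS.values()), sum(PITCHER_POINTS.values()))  # 30 30

# Hypothetical season: a batter leads in home runs, RBIs, and batting average.
print(black_ink(["home_runs", "rbi", "batting_avg"], BATTER_POINTS))  # 12
```

To get a career total, you would sum `black_ink` over every season a player led something.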

Because black ink requires that a player lead his league, it’s hard to come by – and when there are more teams in a league, even the best players may not lead the league. One notable example of the bias toward players from earlier eras is Ross Barnes, who was active for nine seasons from 1871 to 1881. (He didn’t play in 1878 or 1880.) Although Barnes isn’t eligible for the Hall because he didn’t play ten seasons, he amassed an astonishing 60 points of black ink in the National Association by the age of 31. Since the National Association had only 9 teams, he competed against around 115 other batters for those points. During the 2014 season, the same 30 points of black ink were spread over 672 National League batters. Though Barnes was truly an outstanding player, leading the league in nearly every category in 1873 and 1876, it was a lot easier to get those points then.
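For a rough sense of scale, here is the arithmetic behind that comparison as a Python sketch. It uses only the figures quoted above and makes the crude assumption that points are spread evenly across batters, which they obviously aren’t; the function name is mine.

```python
def points_per_batter(points: int, batters: int) -> float:
    """Black-ink points available per batter if spread evenly."""
    return points / batters


early = points_per_batter(30, 115)    # Barnes's competition, roughly
modern = points_per_batter(30, 672)   # 2014 National League

print(round(early, 3))           # 0.261
print(round(modern, 3))          # 0.045
print(round(early / modern, 1))  # 5.8
```

By this back-of-the-envelope measure, there was nearly six times as much black ink to go around per batter in Barnes’s day.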

As of today, the batters with the most black ink not to be elected to the Hall of Fame are Barry Bonds (69), Pete Rose (68), and Alex Rodriguez (64). A-Rod and Rose, of course, aren’t eligible (A-Rod is still active). New Hall of Famer Craig Biggio had 17 and mediocre, forgettable middle-infielder Derek Jeter comes in at a whopping 10.

The pitchers with the most black ink not to be elected are Roger Clemens (100), Roy Halladay (48), Bucky Walters (48), and Justin Verlander (46). Verlander is still active and Halladay retired too recently to be elected, but Walters is truly a baffling case. New Hall of Famers this year were Randy Johnson (99), Pedro Martinez (58), and John Smoltz (34).