## How has hitting changed this year? Evidence from the first half of 2017

*August 22, 2017*

*Posted by tomflesher in Baseball.*

Tags: Baseball, binomial distribution, home runs, MLB, probability, Stuff Gary Cohen Says


It’s no secret that MLB hitters are hitting more home runs this year. In June, USA Today’s Ted Berg called the uptick “so outrageous and so unprecedented” as to require additional examination, and he offered a “juiced” ball as a possibility (along with “juiced” players and statistical changes to players’ approaches). DJ Gallo noted a “strange ambivalence” toward the huge increase in home runs, and June set a record for the most home runs in a month. Neil Greenberg makes a convincing case that the number of homers is due to better understanding of the physics of hitting.

How big a shift are we talking about here? Well, take a look at the numbers from 2016’s first half. (That’s defined as games before the All-Star Game.) That comprises 32670 games and 101450 plate appearances. In that time period, hitters got on base at a .323 clip. About 65% of hits were singles, with 19.6% doubles, 2.09% triples, and 13.2% home runs. Home runs came in about 3.04% of plate appearances (3082 home runs in 101450 plate appearances).

Taking that rate as our prior, how different are this year’s numbers? For one, batters are getting on base only a little more – the league’s OBP is .324 – but more of their hits are going for extra bases. Only 63.7% of hits in the first half of 2017 were singles, with 19.97% of hits landing as doubles, 1.78% triples, and 14.5% home runs. There were, incidentally, more homers (3343) in fewer plate appearances (101269). Let’s assume for the moment that those numbers are significantly different from last year – that the statistical fluctuation isn’t due to weather, “dumb luck,” or anything else, but has to be due to some internal factor. There weren’t that many extra hits – again, OBP only increased by .001 – but the distribution of hits changed noticeably. Almost all of the “extra” hits went to the home run column, rather than more hits landing as singles or doubles.
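To put the two first halves side by side, here is a minimal Python sketch that uses only the home run and plate appearance totals quoted above. The variable names are mine, and the output is plain arithmetic on those four numbers rather than anything from the original analysis.

```python
# First-half totals quoted in the post.
hr_2016, pa_2016 = 3082, 101450
hr_2017, pa_2017 = 3343, 101269

# Home runs per plate appearance in each first half.
rate_2016 = hr_2016 / pa_2016
rate_2017 = hr_2017 / pa_2017

# Home runs we would expect in 2017's plate appearances at 2016's rate,
# and how far the actual 2017 total sits above that expectation.
expected_2017 = rate_2016 * pa_2017
extra_hr = hr_2017 - expected_2017

# Relative change in the per-plate-appearance home run rate.
relative_increase = rate_2017 / rate_2016 - 1

print(f"2016 HR/PA: {rate_2016:.4%}")
print(f"2017 HR/PA: {rate_2017:.4%}")
print(f"Expected 2017 HR at 2016's rate: {expected_2017:.1f}")
print(f"Extra HR over expectation: {extra_hr:.1f}")
print(f"Relative increase in HR rate: {relative_increase:.1%}")
```

The expectation line is the useful one: it is the benchmark that any "is this just luck?" test has to compare 3343 against.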

In fact, there were more fly balls this year – the leaguewide grounder-to-flyer ratio fell from .83 in 2016 to .80 this year. That still doesn’t explain everything, though, since the percentage of fly balls that went out of the park rose from 9.2% to 10%. (Note that those are yearlong numbers, not first-half specific.) Not only are there more fly balls, but more of them are leaving the stadium as home runs. The number of fly balls on the infield has stayed steady at 12%, and although there are slightly more walks (8.6% this year versus 8.2% last year), the strikeout rate rose by about the same number (21.5% this year, 21.1% last year).

Using last year’s rate of 3082 homers per 101450 plate appearances, I simulated 100,000 seasons each consisting of 101269 plate appearances – the number of appearances made in the first half of 2017. To keep the code simple, I recorded only the number of home runs in each season. If the rates were the same, the numbers would be clustered around 3077. In fact, in those 100,000 seasons, the median and mean were both 3076, and the distribution shown above has a clear peak in that region. Note in the bottom right corner, the distribution’s tail basically disappears above 3300; in those 100,000 seasons, the most home runs recorded was 3340 – 3 fewer than this year’s numbers. In fact, the probability of having fewer than 3343 home runs is 0.9999992. If everything is the same as last year, the probability of seeing at least 3343 home runs simply by chance is .0000008, or roughly 8 in 10 million.
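The post doesn't include its simulation code, so what follows is only a sketch of one way to reproduce the idea in plain Python, not the original method. It assumes every plate appearance is an independent trial with last year's home run probability, so each season's total is a binomial draw. The function names, the seed, and the reduced count of 300 simulated seasons (to keep the run short) are my choices. The second function sums the binomial tail directly as a cross-check on the simulation.

```python
import math
import random
import statistics

N_PA_2017 = 101269          # first-half 2017 plate appearances (from the post)
P_HR_2016 = 3082 / 101450   # first-half 2016 home runs per plate appearance
HR_2017 = 3343              # first-half 2017 home runs (from the post)


def simulate_season_hr(n_pa, p_hr, rng):
    """Draw one Binomial(n_pa, p_hr) home run total.

    Instead of flipping n_pa coins, jump from one home run to the next
    using geometric gaps; the resulting count has the same distribution.
    """
    log_q = math.log1p(-p_hr)
    position = 0
    homers = 0
    while True:
        u = 1.0 - rng.random()                     # uniform on (0, 1]
        position += int(math.log(u) / log_q) + 1   # plate appearances until next HR
        if position > n_pa:
            return homers
        homers += 1


def binom_upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summing the pmf term by term."""
    log_p, log_q = math.log(p), math.log1p(-p)

    def log_pmf(k):
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * log_p + (n - k) * log_q)

    return math.fsum(math.exp(log_pmf(k)) for k in range(k_min, n + 1))


rng = random.Random(2017)   # fixed seed so the run is repeatable
sim_totals = [simulate_season_hr(N_PA_2017, P_HR_2016, rng) for _ in range(300)]
p_tail = binom_upper_tail(HR_2017, N_PA_2017, P_HR_2016)

print("simulated mean:  ", statistics.mean(sim_totals))
print("simulated median:", statistics.median(sim_totals))
print("simulated max:   ", max(sim_totals))
print("P(at least 3343 HR at 2016's rate):", p_tail)
```

With only 300 seasons the simulated maximum will usually fall well short of the 3340 seen in 100,000 seasons; the direct tail sum is the part to compare against the roughly 8-in-10-million figure.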

“Let’s assume for the moment that those numbers are significantly different from last year – that the statistical fluctuation isn’t due to weather, “dumb luck,” or anything else….”

OK, but what happens to your results if we do not assume this? Can we track weather/temperatures and find out if a ball travels better at certain specific temperatures (or during some other weather related event); and see if maybe more games have been played during such an event/degree this year? (hey, they keep telling me global warming is a thing, don’t ya know).

Is it worthwhile to determine whether this phenomenon is happening in every stadium or if it is only happening in some? That could possibly lead to determining that certain factors are in fact not contributing to the result.

Sure, in principle those are things we could track, although it would involve a more concerted data collection effort. For example, it’s probably humidity, rather than temperature, that has the biggest weather-related effect on ball travel, but weather reports for baseball games typically only include precipitation and temperature.

As far as stadium-level effects, there’s probably a two-stage strategy for evaluating the stadium and correcting for the unbalanced nature of first-half schedules. (Some teams face the Dodgers a lot before the break, and some teams face the Padres.) Controlling for the DH would be trivial (in fact, I’ve done it in much much older posts).

So, yeah, it would be possible, but it would be hard to write up in 20 minutes and 500 words.