
Kirk’s Big Spring March 20, 2015

Posted by tomflesher in Baseball, Economics.

Kirk Nieuwenhuis is having an incredible spring. All the usual caveats are in play – it’s spring training, so the stats are useless – but Kirk’s production has been exceptional. His slash line is .469/.553/.625 on 38 plate appearances. Let’s hit some sanity checks on Kirk’s production.

First of all, his BAbip is off the charts. This spring, Kirk’s batting average on balls in play is .536, which is ridiculously high. Kirk won’t be able to maintain that into the season. If he’s still got a .536 BAbip by the trade deadline, I’ll eat my hat and post the video. Kirk’s BAbip has been pretty streaky, though. During his rough April, Kirk had a .300 BAbip, about the league average over the season; after coming back up in late June, he had a .377 BAbip over the remainder of the season, broken up as .625 over five June games with 11 at-bats, .267 over 28 at-bats in July, .400 over 23 August at-bats, and .348 over 32 at-bats in September.

From 2012 to 2013, Kirk’s BAbip dropped from about .379 to .246, and then shot back up to .370 in 2014. Taking first differences of those numbers and assuming the ratio of successive differences holds, we’d expect Kirk’s BAbip to drop to about .254 this season. Nonetheless, Kirk’s platoon splits are huge – against right-handed pitchers, in 2014, he’s got a .040/.050/.283 split (although he only made 9 at-bats and 10 plate appearances against left-handed pitchers). Though Kirk’s spring splits aren’t readily available, it’s possible that his big spring is a residual of facing mostly right-handers.
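The first-differences projection can be reproduced in a few lines of R. This is a minimal sketch using only the three season figures quoted above; the variable names are mine, and the assumption that the ratio of successive changes stays constant is the post's, not a general forecasting rule.

```r
# Season BAbip figures quoted in the post (2012, 2013, 2014)
babip <- c(0.379, 0.246, 0.370)

# First differences: the year-over-year changes
d <- diff(babip)               # 2012->2013, then 2013->2014

# Ratio of successive differences, assumed to hold for the next year
r <- d[2] / d[1]

# Projected change for 2015 and the resulting BAbip
next_change <- d[2] * r
projected <- babip[3] + next_change
round(projected, 3)            # 0.254
```

With only three data points this is extrapolation from a single ratio, so it is best read as a rough sanity check rather than a forecast.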

In the spring, Kirk’s BAbip denominator (AB – HR – K + SF) is 28 and the numerator (H – HR) is 15. If we take Kirk’s previous-year .377 BAbip, over 28 trials we’d expect 15 or more successes to occur about 2.86% of the time. That’s just barely within the bounds of statistical significance (which would indicate we’d expect Kirk to hit between 6 and 15 times about 95% of the time), and well outside if we assume Kirk has a true mean of .254 (which would put our confidence interval at around 3-11 successes in 28 trials).
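For anyone who wants to rerun this check, here is a hedged sketch using R's built-in binomial functions. It assumes each ball in play is an independent trial with a fixed success probability, which is a simplification. I have left the outputs unstated on purpose: the tail probability depends on whether exactly 15 hits is counted, so both versions are computed.

```r
n      <- 28      # balls in play this spring (BAbip denominator)
hits   <- 15      # hits on balls in play (BAbip numerator)
p_prev <- 0.377   # BAbip after the late-June call-up, from the post
p_proj <- 0.254   # projected BAbip from the first-differences estimate

spring_babip <- hits / n

# Upper-tail probabilities under the previous-year rate.
# pbinom(q, ..., lower.tail = FALSE) returns P(X > q), so the boundary matters.
p_at_least_15  <- pbinom(hits - 1, n, p_prev, lower.tail = FALSE)  # P(X >= 15)
p_more_than_15 <- pbinom(hits,     n, p_prev, lower.tail = FALSE)  # P(X > 15)

# Central 95% ranges of hit counts under each assumed true rate
range_prev <- qbinom(c(0.025, 0.975), n, p_prev)
range_proj <- qbinom(c(0.025, 0.975), n, p_proj)

p_at_least_15; p_more_than_15
range_prev; range_proj
```

Comparing the two tail values shows how much of the probability sits on exactly 15 hits, which is worth knowing when a result is described as "just barely" significant.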

Second, take a look at Kirk’s K/BB ratio. Kirk has typically had a strikeout-to-walk ratio above 1; in 2013, he struck out about 2.67 times for every time he walked, and in 2014 it was about 2.44 strikeouts per walk. Over this small spring sample size, Kirk’s K/BB has actually dipped below 1, at 4/6 (or .667). Assuming Kirk walked 6 times anyway, using a conservative 2:1 K/BB ratio would turn 8 of Kirk’s hits into strikeouts. That would make Kirk’s BAbip tighten up to .350. Still strong, but not the obscene .536 we’ve seen. Even if we convert one walk to a strikeout and maintain a 2 K/BB, that would leave Kirk at .409, a very respectable spring.
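The arithmetic behind those adjusted figures is easy to sketch. A ball-in-play hit that becomes a strikeout drops out of both the numerator and the denominator of BAbip. The function below is hypothetical (the name and defaults are mine), and reading the .409 figure as six hits turned into strikeouts is my assumption about how the second scenario was computed.

```r
# Spring totals from the post
walks      <- 6
strikeouts <- 4

# Extra strikeouts needed to reach a 2:1 K/BB ratio with walks held fixed
extra_k <- 2 * walks - strikeouts     # 8

# Hypothetical helper: BAbip after converting hits in play into strikeouts.
# Each converted hit leaves both the numerator and the denominator.
babip_adjusted <- function(hits_to_k, num = 15, den = 28) {
  (num - hits_to_k) / (den - hits_to_k)
}

babip_adjusted(extra_k)               # 7/20 = 0.35
round(babip_adjusted(6), 3)           # 9/22 = 0.409
```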

Kirk’s numbers have been shocking, and of course he’s out of options, so he’s extremely likely to make the team. As a left-handed bat, he’d be a strong everyday player if the outfield weren’t so crowded, but with Michael Cuddyer and Juan Lagares in the mix already along with lefties Curtis Granderson and Matt den Dekker, it’s going to be tough to find Kirk a clean platoon spot.


Spring Training: Still Useless For Predicting Stats March 12, 2015

Posted by tomflesher in Baseball.

A few days ago, I watched a Mets-Marlins spring training game that ended in a brutal 13-2 loss. It had all of the usual spring training fun – Zack Wheeler working too far inside and hitting two batters, Michael Cuddyer starting at first with Lucas Duda out, and Don Kelly’s hustle allowing him to draw a walk, steal a base, and score on a single, even while Cliff Floyd was snickering about how Jim Leyland kept him on the roster for no apparent reason in the playoffs.

(Yeah, I know, Kelly’s a Marlin. Shut up.)

During the game, I tweeted out a link to a file-drawer post from last year that indicated that there’s almost no correlation between spring performance and regular-season performance. I thought I’d run a quick update on that, so I dug up the Mets’ individual performance in spring training and analyzed it against the regular season.

There were 15 Mets who had at least 30 plate appearances in Spring Training and at least 100 plate appearances in the regular season. That’s a really small sample, so accuracywise we’d better keep our fingers crossed, but it’s enough data to spitball a little.

I ran four correlations on this – spring and regular season batting average, OBP, SLG, and OPS – and then created an additional stat to measure whether hitters changed hitting style from spring to the regular season. This was a quick and dirty attempt to measure whether hitters favored OBP or SLG, so I took the ratio (SLG/OPS) and reasoned that a power hitter will have a larger ratio and a singles hitter will have a smaller. I measured this correlation, too, to determine if there were big changes.

The results are unsurprising – the correlations are really low. Batting average correlates at around .019, and SLG at .305. OBP actually had a negative correlation, indicating that a high spring OBP may be a bad sign for the regular season. This is probably sampling error from the tiny number of observations, driven almost entirely by Anthony Recker’s magical .426 spring and average regular season. That was about a -.25 correlation, which explains why OPS has a -.05 (near-zero) correlation – that big flip in OBP is going to drag down the OPS correlation, too.

The strongest correlation was style – at about .619, it’s a pretty good indicator that if a hitter’s SLG is how he scores, he’ll maintain that hitting style throughout the season.
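Here is a rough sketch of how the style measure could be computed. The five hitters below are made-up illustrative numbers, not the Mets data behind the correlations above, and the function and column names are my own. The only thing taken from the post is the definition of the ratio, SLG/OPS, with OPS = OBP + SLG.

```r
# Style ratio from the post: share of OPS that comes from slugging
style <- function(obp, slg) slg / (obp + slg)

# MADE-UP numbers for illustration only (not real player data)
spring  <- data.frame(obp = c(0.320, 0.350, 0.300, 0.410, 0.330),
                      slg = c(0.450, 0.380, 0.520, 0.400, 0.360))
regular <- data.frame(obp = c(0.310, 0.340, 0.305, 0.330, 0.325),
                      slg = c(0.430, 0.370, 0.480, 0.390, 0.350))

spring_style  <- style(spring$obp,  spring$slg)
regular_style <- style(regular$obp, regular$slg)

# Correlation between spring and regular-season hitting style
style_cor <- cor(spring_style, regular_style)
style_cor
```

Because the ratio divides out the overall level of production, a hitter can have a hot or cold spring and still land on a similar style value, which may be part of why this measure carries over better than the raw rates.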

From the File Drawer: Does Spring Training Predict Wins? March 18, 2014

Posted by tomflesher in Baseball.


It’s an idle curiosity, and more information is never a bad thing, but first, you need to establish whether there’s actually any information being generated. It would be useful, potentially, to have a sense of at least how the first few weeks of the season might go, so I decided to crunch some numbers to see whether I could torture the data far enough to get a good predictive measure. I grabbed the spring training and regular season stats from 2012 and 2013 and started at it.

First round.

Correlation! Correlations are useful. The correlation of spring winning percentage and regular-season winning percentage? A paltry .069. That’s not even worth looking at. This is going to be harder than I thought.

Second round.

Well, maybe if we try a Pythagorean expectation, we might get something useful. Let’s try the 2 exponent…. Hm. That correlation is even worse (.063). Well, maybe the 1.82 “true” exponent will help…. .065. This isn’t going to work very well.
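For reference, the Pythagorean expectation can be sketched as a one-line R function. This is my own illustration, written in the run-ratio form (runs allowed over runs scored) that the optimization code in this post uses; the function name and the run totals in the usage lines are hypothetical.

```r
# Pythagorean expectation: expected winning percentage from runs scored (rs)
# and runs allowed (ra), with exponent B (2 is the classic choice).
pyth <- function(rs, ra, B = 2) {
  1 / (1 + (ra / rs)^B)
}

# Equivalent textbook form, rs^B / (rs^B + ra^B), for comparison
pyth_classic <- function(rs, ra, B = 2) {
  rs^B / (rs^B + ra^B)
}

# Hypothetical run totals, for illustration only
pyth(700, 700)            # an even run differential gives 0.5
pyth(750, 650, B = 1.82)  # outscoring opponents gives a value above 0.5
```

The two forms are algebraically identical; the run-ratio version just makes the exponent the only free parameter, which is convenient for fitting.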

Third round.

Okay. This is going to involve some functional-form assumptions if we really want to go all Mythbusters on the data’s ass and figure out something that works. First, let’s validate the Pythagorean expectation by running an optimization to minimize the sum of squared errors, with runratio = Runs Allowed/Runs Scored and perc = regular season winning percentage:

> min.RSS <- function(data,B) {with(data,sum((1/(1 + runratio^B) - perc)^2))}
> result<-optimize(min.RSS, c(0,10),data=data)
> result
$minimum
[1] 1.799245

$objective
[1] 0.04660422

That “$minimum” value means that the optimal value for B (the pythagorean exponent) is around 1.80 (to the nearest hundredth). The “$objective” value is the sum of squared errors. Let’s try the same thing with the Spring data:

> spring.RSS <- function(data,SprB) {with(data,sum((1/(1 + runratio.spr^SprB) - Sprperc)^2))}
> springresult<-optimize(spring.RSS, c(0,10),data=data)
> springresult
$minimum
[1] 2.243336

$objective
[1] 0.1253673

Alarmingly, even with the same amount of data, the sum of squared errors is almost triple the same measure for the regular-season data. The exponent is also pretty far off. Now for some cross-over: can we set up a model where the spring run ratio yields a useful measure of regular-season win percentage? Let’s try it out:

> cross.RSS <- function(data,crossB) {with(data,sum((1/(1 + runratio.spr^crossB) - perc)^2))}
> crossresult<-optimize(cross.RSS,c(0,10),data=data)
> crossresult
$minimum
[1] 0.08985465

$objective
[1] 0.3214856

> crossperc <- with(data, 1/(1 + runratio.spr^crossresult$minimum))
> cor(data$perc,crossperc)
[1] 0.05433157

.054, everybody! That’s the worst one yet!

Now, if anyone ever asks, go ahead and tell them that at least based on an afternoon of noodling around with R, spring training will not predict regular-season wins.

Just for the record, the correlation between the Pythagorean expectation and wins is enormous:

> pythperc<-with(data, 1/(1 + runratio^result$minimum))
> cor(data$perc,pythperc)
[1] 0.9250366