Addendum on Pythagorean Expectation May 20, 2010
Posted by tomflesher in Baseball, Economics. Tags: Baseball, economics, Pythagorean expectation, statistics
I noted below that the sample size of 13 games is too small to determine whether the proportions of games won by the team each condition predicts to win (the home team, the team with the higher Pythagorean expectation, the team with more runs scored, and the team with the higher run differential) are significantly different from chance. If chance were the only determinant of the winner, we would expect each proportion to be .5, since a randomly selected home team should win half its games, a randomly selected team with the higher run differential should win half its games, and so on.
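The chance model in the paragraph above can also be checked with an exact binomial calculation. This is a swapped-in technique, not the method used in this post, and it assumes that the reported proportions .46 and .31 correspond to 6 and 4 correct calls out of 13 games (6/13 ≈ .46, 4/13 ≈ .31). Under pure chance, the number of correct calls follows Binomial(13, .5), so the sketch below computes how often chance alone would do as badly or worse.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p): the chance that a purely random
    criterion picks the winner in k or fewer of n games."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


N_GAMES = 13

# Assumption: .46 and .31 are 6/13 and 4/13 correct calls, respectively.
for hits in (6, 4):
    share = hits / N_GAMES
    tail = prob_at_most(hits, N_GAMES)
    print(f"{hits}/{N_GAMES} = {share:.2f}: P(X <= {hits}) under chance = {tail:.4f}")
```

For 4 of 13 the one-sided tail is 1093/8192, about .13; doubling it for a two-sided test gives about .27, so even the weakest criterion is well within what a coin flip would produce in 13 games.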
Making the standard statistical assumptions, the standard error of a sample proportion $p$ from $n$ observations is

$$SE = \sqrt{\frac{p(1-p)}{n}}.$$

Three of the proportions were .46, meaning that the standard error would be

$$SE = \sqrt{\frac{.46 \times .54}{13}},$$

which simplifies to

$$SE = \sqrt{\frac{.2484}{13}} \approx .1382.$$

Using 12 degrees of freedom, a t-table shows that the two-tailed critical value for 95% confidence is 2.18. Thus, the binomial confidence interval method tells us we can be 95% sure that the true value of the proportion lies within the range .46 ± 2.18 × .1382 = .46 ± .30, or .16 to .76. Clearly, this range is far too wide to conclude that the proportion is significantly different from .5.
For the simple measure of more runs, the proportion was .31, meaning that the standard error is

$$SE = \sqrt{\frac{.31 \times .69}{13}} = \sqrt{\frac{.2139}{13}} \approx .1283.$$

The 95% confidence interval around .31 is .31 ± 2.18 × .1283 = .31 ± .2797, or .03 to .59. Again, .5 is included in this range.
Quickie: Dallas Braden's Perfect Game May 11, 2010
Posted by tomflesher in Baseball. Tags: Baseball, Braden's perfect game, Buehrle's perfect game, Dallas Braden, Oakland As, probability, sabermetrics, Tampa Bay Rays
Dallas Braden of the Oakland A's pitched a perfect game on Sunday, Mother's Day. Under the methods discussed last year after Buehrle's perfect game, Braden, who has been active for four seasons, has an OBP-against of .328. That means that in any given plate appearance, the probability that the batter does not reach base is .672.
Since he retired 27 consecutive batters, the probability of that event (treating each plate appearance as independent) is $(.672)^{27}$, or about .0000218; equivalently, given his current stats, a bit over 2 in every 100,000 games that Braden pitches should be perfect games.
Over the same period (2007-2010), the American League OBP has ranged from .331 (this year) to .338 (2007). The mode was .336 (2008 and 2009), so I'll use it to estimate that the chance of a perfect game against a league-average team would be $(.664)^{27}$, or about .0000158; equivalently, about 1.6 out of every 100,000 games should be perfect games. As you can see, a perfect game is more likely for Braden than for the average pitcher, but not by much.
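The perfect-game calculation is just the probability of not reaching base raised to the 27th power. A minimal sketch, assuming (as the method does) independent plate appearances with a constant on-base probability; the function name `perfect_game_prob` is illustrative.

```python
def perfect_game_prob(obp: float, batters: int = 27) -> float:
    """Probability that `batters` consecutive plate appearances all end without
    the batter reaching base, assuming independent PAs with a constant OBP."""
    return (1 - obp) ** batters


braden = perfect_game_prob(0.328)  # Braden's career OBP-against
league = perfect_game_prob(0.336)  # modal AL OBP over 2007-2010

print(f"Braden: {braden:.7f} (~{braden * 100_000:.1f} per 100,000 games)")
print(f"League: {league:.7f} (~{league * 100_000:.1f} per 100,000 games)")
print(f"Ratio:  {braden / league:.2f}")
```

Rounded to three significant figures, the league figure is about .0000158, and the ratio of roughly 1.38 puts a number on "more likely, but not by much."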
Nice job, Dallas!
As a side note, the Tampa Bay Rays were the victims of BOTH perfect games. Their team OBP was .343 in 2009, giving a probability of not reaching base of .657, so the probability of 27 consecutive batters being retired is $(.657)^{27}$, or about 1.2 in 100,000. Since many other teams have lower team OBPs, it's very surprising that the Rays were the victims of both games.
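Another way to read these tiny probabilities is as the expected number of games until a perfect game occurs, which is the reciprocal of the per-game probability (the mean of a geometric distribution). A sketch under the same independence assumption; `games_per_perfect_game` is an illustrative name.

```python
def games_per_perfect_game(team_obp: float, batters: int = 27) -> float:
    """Expected number of games until a perfect game against a team with this
    OBP, i.e. 1 / (1 - OBP)**27 assuming independent plate appearances."""
    return 1 / (1 - team_obp) ** batters


rays = games_per_perfect_game(0.343)    # Rays' 2009 team OBP
league = games_per_perfect_game(0.336)  # modal AL OBP over 2007-2010

print(f"Rays:   one perfect game per ~{rays:,.0f} games")
print(f"League: one perfect game per ~{league:,.0f} games")
```

A higher team OBP stretches the wait, which is why the Rays were, if anything, less likely than a league-average lineup to be on the wrong end of either game.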
Quickie: MLB Playoffs by Pitching Statistics February 23, 2010
Posted by tomflesher in Baseball. Tags: Baseball, OLS, playoffs, probit, regression
It’s cold out today. Last night, Buffalo was covered in a thin layer of freezing rain. I’m trying to stay warm by turning up my hot stove the way only an economist can – crunching the numbers on playoffs.
I'm re-using the dataset from my Cy Young Predictor a few entries ago in the interest of parsimony. It contains dummy variables teamdivwin and teamwildcard, which take the value 1 if the pitcher's team won the division or the wildcard, respectively. I then created a variable playoffs equal to the sum of teamdivwin and teamwildcard; since no team can win both, this is just a playoff dummy variable.
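Constructing the playoffs dummy can be sketched as follows. Only the variable names teamdivwin, teamwildcard, and playoffs come from the post; the `PitcherSeason` record layout and the pitcher names are hypothetical, and the original analysis was not necessarily done in Python.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical record layout; only teamdivwin and teamwildcard come from the post.
    name: str
    teamdivwin: int    # 1 if the pitcher's team won its division
    teamwildcard: int  # 1 if the pitcher's team won the wildcard

    @property
    def playoffs(self) -> int:
        # A team cannot win both its division and the wildcard, so the sum is 0 or 1.
        total = self.teamdivwin + self.teamwildcard
        assert total in (0, 1), "division winner and wildcard should be mutually exclusive"
        return total


rows = [
    PitcherSeason("Pitcher A", 1, 0),  # hypothetical division winner
    PitcherSeason("Pitcher B", 0, 1),  # hypothetical wildcard team
    PitcherSeason("Pitcher C", 0, 0),  # hypothetical non-playoff team
]
print([r.playoffs for r in rows])  # [1, 1, 0]
```

The assertion documents the assumption that makes the sum a valid 0/1 dummy rather than a count.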
Using a probit model and a standard OLS regression model, I estimated the effects of individual pitching stats on playoffs. Neither model has very strong predictive value (the linear model has an R-squared of about .05), which is unsurprising since neither takes the team's batting into account at all. None of the coefficient values are shocking: in the American League (designated as lg = 1), teams have a higher probability of making the playoffs because there are fewer teams, and although complete games appear to have a negative effect, the positive shutout effect more than makes up for that in both models. I'm interested in whether complete-game wins and complete-game losses have differential effects; that will probably be my next snowy-day project.
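The shutout and complete-game interplay is easier to see with a probit prediction, P(playoffs = 1) = Φ(x'b). A pitcher is credited with a shutout only for a complete game in which he allows no runs, so every shutout also adds a complete game; its net effect is the shutout coefficient plus the complete-game coefficient. The coefficients below are hypothetical, chosen only to mimic the signs described above, and are NOT the post's estimates; the function names are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# HYPOTHETICAL probit coefficients for illustration only; NOT the post's estimates.
COEFS = {"const": -1.0, "lg": 0.25, "shutouts": 0.30, "complete_games": -0.10}


def playoff_prob(lg: int, shutouts: int, complete_games: int) -> float:
    """Probit prediction P(playoffs = 1) = Phi(x'b) for one pitcher-season."""
    xb = (COEFS["const"]
          + COEFS["lg"] * lg
          + COEFS["shutouts"] * shutouts
          + COEFS["complete_games"] * complete_games)
    return std_normal_cdf(xb)


base = playoff_prob(lg=0, shutouts=2, complete_games=3)
extra_cg = playoff_prob(lg=0, shutouts=2, complete_games=4)          # one more CG, not a shutout
extra_cg_shutout = playoff_prob(lg=0, shutouts=3, complete_games=4)  # one more CG that is a shutout
same_line_al = playoff_prob(lg=1, shutouts=2, complete_games=3)      # lg = 1 is the AL

print(f"NL baseline:           {base:.3f}")
print(f"+1 complete game:      {extra_cg:.3f}")
print(f"+1 complete-game SHO:  {extra_cg_shutout:.3f}")
print(f"Same line in the AL:   {same_line_al:.3f}")
```

Because Φ is nonlinear, the change in probability from an extra shutout depends on the starting point, unlike the OLS version, where each coefficient is a constant change in probability.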
Results are behind the cut.
Cy Young gives me a headache. January 15, 2010
Posted by tomflesher in Baseball, Economics. Tags: Baseball, baseball-reference.com, Bill James, Cy Young predictor, economics, Eric Gagne, linear regression, R, Rob Neyer, sabermetrics, Tim Lincecum, Weighted saves, Weighted shutouts
As usual, I've started my yearly struggle to build a Cy Young predictor. Bill James and Rob Neyer's predictor (which I've preserved for posterity here) did a pretty poor job this year, predicting the wrong winner in both leagues and getting the order badly wrong compared to the actual results. Inside, I'd like to share some of my pain, since I can't seem to do much better.
Sabernomics on A-Rod and Steroid Use February 11, 2009
Posted by tomflesher in Baseball. Tags: Alex Rodriguez, Baseball, Sabernomics, steroids
At Sabernomics, JC Bradbury crunches the numbers on Alex Rodriguez's home runs during the seasons in which he has admitted to using steroids:
So, what were A-Rod’s steroids worth? 2.37 home runs over two seasons, or a little over one home run a season. At least, that is the estimate based on the method I laid out above; however, it’s probably best to say that there was no observed effect.
In the comments section, Bradbury crunches the walk numbers to control for the possibility that a more powerful A-Rod was less selective at the plate and, again, finds no observable effect. There are some moderately outlandish hypotheses that could account for this, such as the league’s pitchers cycling steroids coincident with Rodriguez, so that a roided-up A-Rod would hit against roided-up pitchers and a clean A-Rod would hit against clean pitchers, but, well, Occam’s Razor.
Statistical evidence that the Rays are outclassed. October 27, 2008
Posted by tomflesher in Baseball. Tags: Baseball, Phillies, Rays
The series thus far.
Q.E.D.
Poor Kazmir. October 17, 2008
Posted by tomflesher in Baseball. Tags: ALCS, Baseball, Cy Young, John Smoltz, Mike Mussina, Rays, Red Sox, Scott Kazmir, weird lines
Last night, Scott Kazmir pitched 6 scoreless innings in Game 5 of the ALCS, giving up 2 hits and 3 walks while striking out 7 batters, for a game score of 72. His bullpen then gave up 8 runs, allowing the Red Sox to come back and win the game (thus extending the series to Game 6).
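The game score of 72 can be reproduced from Kazmir's line, assuming the post used Bill James's original Game Score formula (a later revision exists, but the original reproduces 72 exactly). The function and parameter names are illustrative.

```python
def game_score(outs: int, innings_completed: int, strikeouts: int, hits: int,
               earned_runs: int, unearned_runs: int, walks: int) -> int:
    """Bill James's original Game Score for a starting pitcher."""
    score = 50
    score += outs                               # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)  # +2 per inning completed after the 4th
    score += strikeouts                         # +1 per strikeout
    score -= 2 * hits                           # -2 per hit allowed
    score -= 4 * earned_runs                    # -4 per earned run
    score -= 2 * unearned_runs                  # -2 per unearned run
    score -= walks                              # -1 per walk
    return score


# Kazmir's line from the post: 6 IP (18 outs), 2 H, 0 R, 3 BB, 7 K.
kazmir = game_score(outs=18, innings_completed=6, strikeouts=7, hits=2,
                    earned_runs=0, unearned_runs=0, walks=3)
print(kazmir)  # 72
```

Working it through by hand: 50 + 18 outs + 4 for the 5th and 6th innings + 7 strikeouts, minus 4 for two hits and 3 for three walks, gives 72.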
Has Scotty suffered the greatest postseason indignity ever? Nope. Not even close. That honor belongs to Mike Mussina of the 1997 Orioles.
Bailouts! September 25, 2008
Posted by tomflesher in Baseball. Tags: Andy Pettitte, Baseball, John Garland, Nate Robertson, Research, Reverse quality starts, Sean Bergman, Sidney Ponson, weird lines
That’s right… in the interest of keeping up with this week’s news about the $700b bailout of the financial sector, I’m going to take a look at key instances of bailouts by the bullpen.