Those who accept and use Wins Above Replacement (WAR) as a tool to think about baseball have enough knowledge of, or faith in, its framework and methodology that they often treat WAR’s general accuracy more as a given than as a point of contention. On the other hand, a large number of baseball “traditionalists” refuse to acknowledge the possibility that the calculation could work at all. Still more are simply turned off by the cold indifference of numbers purporting to represent their heroes’ worth in the abstract. Unlike a home run or a double, WAR is not something a player can do. It’s an artificial number derived by measuring a player’s performance against that of a person who does not even exist.
As a result, public debate about WAR (and sabermetrics in general) tends to focus on whether the calculation works rather than how well it works. This isn’t exactly surprising. Given sports media’s widespread fixation on yes/no verdicts rather than reasoned discussion of contested issues, the actual mechanics of such a polarizing thing can take a back seat to the perceived Big Picture. The “yes” crowd, while continuing to use advanced analytics to frame its discussion of the game, retreats to niche forums to discuss the ocean of grey that exists in baseball numbers. Naysayers soldier on, discounting advanced metrics as nerdy attempts to rewrite a narrative made clear through obvious stats while scoffing at WAR for its ignorance of mysterious baseball attributes such as “grit” and “the will to win.” The fertile middle ground, as it were, remains largely unplowed. Even as things like WAR and BABIP (batting average on balls in play) make more frequent appearances in mainstream baseball journalism, the presence of “sabermetric” stats in mass-consumption media often feels more like casually hip name-dropping than an invitation to discuss why and how well they work.
Most relevant to the research I’ve done, there is little widely seen public discussion of WAR’s correlation – if any – with overall team success. This should be surprising; after all, the primary purpose of the WAR calculation (and most “counting” statistics, for that matter) is to help determine how valuable players are to their respective teams. WAR depends on a predetermined (replacement) level of team success, and derives player value by estimating the number of hypothetical “wins” a player has added to the base replacement level. It follows, then, that a general – and I emphasize general – idea of WAR’s success as a player valuation tool can be had by seeing how well the combined total Wins Above Replacement of a team’s players matches the team’s actual win/loss record. Of course, extremely detailed and intelligent analysis of concepts like WAR can be found in abundance at places like Baseball Prospectus, the Society for American Baseball Research (SABR), FanGraphs, Baseball Think Factory, and elsewhere. As I mentioned, though, such discussion often acknowledges the validity of advanced metrics as a matter of course.
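For readers who like to see the arithmetic, here is a minimal sketch of that comparison in Python. It is not the method used in the later installments, only an illustration of the idea: add a team's combined WAR to a replacement-level baseline, compare the result with the team's actual wins, and measure how closely the two move together. The baseline of a .294 winning percentage (roughly 48 wins over 162 games) is my assumption about where replacement level sits, and the four teams and their numbers are invented for the example; none of them are real 2013 figures.

```python
from math import sqrt

# Assumption (not from the article): a replacement-level team wins
# about 29.4% of its games, i.e. roughly 48 wins in a 162-game season.
REPLACEMENT_WIN_PCT = 0.294
GAMES_PER_SEASON = 162


def expected_wins(team_war, games=GAMES_PER_SEASON, repl_pct=REPLACEMENT_WIN_PCT):
    """Wins implied by combined WAR: replacement baseline plus team WAR."""
    return repl_pct * games + team_war


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical teams: (combined player WAR, actual wins).
# These values are made up for illustration only.
teams = {
    "Team A": (50.0, 96),
    "Team B": (38.5, 88),
    "Team C": (30.0, 76),
    "Team D": (18.0, 68),
}

for name, (war, wins) in teams.items():
    implied = expected_wins(war)
    print(f"{name}: implied {implied:.1f} wins, actual {wins}, gap {wins - implied:+.1f}")

war_totals = [war for war, _ in teams.values()]
win_totals = [wins for _, wins in teams.values()]
print(f"Correlation between team WAR and wins: {pearson_r(war_totals, win_totals):.3f}")
```

A correlation near 1 would mean combined WAR tracks team wins closely, while the per-team gaps show where the two disagree. With real data, the same two steps would be run over all thirty teams.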
“Sabes,” as they are lovingly labeled by a longtime columnist who covers my hometown team, regularly complain that so many people reflexively reject advanced baseball analysis because they don’t understand it. This may be true, and a lot of the blame may actually lie with sabermetricians and their followers. For one, the snark oozing from both sides of The Great Stats Debate makes people with opposing viewpoints less likely to listen to each other at all. Looking past that, much substantive discussion of non-traditional analytical tools in baseball is written in a style that can seem daunting (inaccessible, even) to uninitiated fans or writers. In the installments that follow this introduction, I’ll do my best to foster discussion from all sides by presenting information related to WAR in what I hope is a readable, easily understandable way.
The question, then: in 2013, how closely did combined individual WAR totals reflect team success? Depending on which side you take in the debate over baseball analytics, the results may or may not surprise you. In short, the answer is that WAR did pretty well. In the second installment of this piece, I’ll start unpacking the data.