In the world of Advanced Stats, I'm a little down about the state of basketball. Plus-minus and on/off statistics seem to be all the rage, and the reality is that while newer data can bring more understanding to basketball, the old box score is a pretty useful tool. When I read the book Moneyball, so many of the lessons espoused by Billy Beane and the Oakland A's made so much sense. For example:
And yet, over the years, I've heard the following myth thrown out: "Advanced Stats in baseball are easy, but basketball has so many interaction effects, it's a different game." I figured I'd go over a few of the familiar arguments in this line of thinking.
This tends to be the argument behind why the baseball box score is trustworthy. Except, it's not true. Sorry. On the simplest of plays, a strikeout, three people are involved: the pitcher, the catcher, and the batter. As soon as the batter puts the ball in play, it gets more complicated. In a routine groundout, you add a fielder and the first baseman (five people involved). If the batter hits to the outfield, you can get a fielder, the cutoff man, and whoever is covering the base (six people involved). The same is true of a double play. If there are runners on base, then we can get nine-plus players involved in a given play.
Now, I've heard that some of these players don't matter. I don't buy it, given that the baseball box score has a column for recording when a player messes up a routine play. I have a theory about why people think this. Basketball is a lot like baseball, except it's played in fast forward. In baseball, each team gets twenty-seven outs, so a given game is roughly fifty-four possessions. And baseball games take around three hours. That's about three minutes and twenty seconds a possession. In basketball, an average game takes about two and a half hours and has close to 200 possessions. That means each possession takes about forty-five seconds. So an NBA game is played at a bit over four times the speed of a baseball game. Sure, a simple throw to first base doesn't seem important played in slow motion, but speed it up and it starts looking like basketball!
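As a quick sanity check on that arithmetic, here is a minimal Python sketch. The function name `seconds_per_possession` and the variable names are made up for illustration, and the inputs are just the round numbers from the paragraph above (a three-hour baseball game, a two-and-a-half-hour basketball game with about 200 possessions), not measured league averages.

```python
# Back-of-the-envelope pace comparison using the round numbers from the text.
# All inputs are rough approximations, not measured league averages.

def seconds_per_possession(game_minutes: float, possessions: float) -> float:
    """Average wall-clock seconds per possession."""
    return game_minutes * 60 / possessions

# Baseball: 27 outs per team, with each out treated as one "possession".
baseball_possessions = 27 * 2
baseball = seconds_per_possession(game_minutes=180, possessions=baseball_possessions)

# Basketball: about 200 possessions in a game lasting about 2.5 hours.
basketball = seconds_per_possession(game_minutes=150, possessions=200)

# How many times faster basketball cycles through possessions.
speed_ratio = baseball / basketball

print(f"Baseball:   {baseball:.0f} seconds per possession")
print(f"Basketball: {basketball:.0f} seconds per possession")
print(f"Basketball runs about {speed_ratio:.1f}x faster")
```

With these inputs, baseball works out to 200 seconds per possession (about three minutes and twenty seconds) against 45 seconds for basketball, a ratio of roughly 4.4.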
We have over a hundred years of data in baseball. Additionally, it's some of the cleanest sports data out there. What's more, thanks to the "Moneyball revolution", the cool new technology was adopted more quickly, so we even have the exact location of pitches, their speed, and their spin. Compared to the NBA, which has barely forty years of the modern box score, it seems like a given that baseball would be better. Except, this brings up an important point about data. How valuable data is does not always line up with how clean it is.
Assists, for instance, are a subjective stat. They rely on scorekeepers. It's hard to know how much to credit the passer versus the scorer (I'm repeating a great interview from Dean Oliver here), and yet, they are very important. They are also one of the most consistent statistics in basketball year to year. In fact, virtually all of the stats in basketball are more consistent year to year than the most consistent stats in baseball! Yes, baseball has cleaner data, but basketball has more useful data. A problem we have in data analysis is that having data that's easy to work with often trumps having useful data. And in that regard, it is easier to do a lot more baseball analysis, but that doesn't change the fact that the information found in basketball data is more useful.
This is where I'll upset a lot of people. Sports, relative to other fields, just are not that complicated. The math involved in many "advanced stats" might not get you by in an introductory-level class in statistics. Of course, many people still mess it up, but that's another post. And sports, in general, are not that complex. A key reason is that they have rules! Unlike business, where the rules constantly change, or physics, where we don't actually know all the rules, sports have all the rules written down. And they're not that complex!
One of my favorite scenes on the subject.
It's possible basketball is more complicated than baseball. But both have very clear rules about how to win, and both have useful stats identifying players that win. In basketball, the bottleneck is that there are only so many stars to go around. In baseball, the bottleneck is that stuff is random, and the best team doesn't always win. But regardless, the holdup in either sport is not complexity! And it's remarkable to me that this line still works on people.
Baseball got a few decades' head start on basketball with regard to the "data revolution." Basketball has jumped in with gusto, and candidly, I'm just not impressed. However, it's odd to me that the people who started all of the data analysis haven't looked at basketball and gone: "Oh, it's the same problem, but with more consistent players, this is much simpler." Rather, many have gone: "Of course, basketball is more complicated! Better data is needed!" The insight in Moneyball wasn't that more data was required. The data had been there all along! The "secret" was connecting the data to the right question: "What wins games?" Maybe we'll get there one day in basketball. The good news is that some teams have clearly figured it out, but they're not in any hurry to write best-selling books giving away the recipe.