Why London 2012 “Statistics” Show the Need for Big Data
Doug Hadden, VP Products
There’s been a lot of interest in sports statistics, popularized by the book (and movie) Moneyball. The premise of Moneyball is that traditional measures used in sports are often incorrect. The advent of “big data” and big data techniques such as visualization promises to change our preconceptions about sports. The Olympics Per Capita web site is an example: it shows (at this moment) that Grenada is the most successful country at London 2012 based on population. I’ve noticed a lot of tweets and comments about this. And there is some nice map visualization.
Of course, it’s all claptrap and has little to do with reality. It doesn’t even pass the statistics smell test.
That’s not big data
The strength of big data is the ability to analyze more information (volume) from different sources (variety) arriving at greater speed (velocity). Yet this analysis covers very little information (only medals) from a single source (the Olympic medal count), arriving a handful at a time each day. And GDP per capita and population are highly aggregated “little data” constructs.
What would a big data analysis of London 2012 success consider?
- All Olympic performances compared within categories: coming in 5th in one event may be a better performance than coming in 2nd in another
- Biometric information like distance traveled, heart rate etc.
- Number of athletes for each sport world-wide
- Skewing of results from team sports that favour countries with larger populations, sports that have multiple similar contests (e.g. swimming vs. running) and sports that require more expensive equipment and coaching
- Skewing of results where a single win is enough to propel Grenada (or Dominica) to number 1
- Potential impact of factors such as training season, jet lag, elevation
- Skewing of results based on degree of judging by sport
- Importance of Winter Olympic sports results
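The single-win skew in the list above is easy to see with a little arithmetic. The sketch below uses made-up countries and made-up figures (Smallland, Midland and Bigland are hypothetical, not London 2012 results) to show how one medal in a tiny population dominates a per capita table, and how fragile that ranking is.

```python
# Hypothetical figures for illustration only; these are NOT real Olympic results.
countries = {
    "Smallland": {"medals": 1, "population": 100_000},
    "Midland": {"medals": 20, "population": 30_000_000},
    "Bigland": {"medals": 100, "population": 300_000_000},
}


def medals_per_million(entry):
    """Medals won per one million inhabitants."""
    return entry["medals"] / entry["population"] * 1_000_000


def per_capita_ranking(data):
    """Country names sorted from most to fewest medals per capita."""
    return sorted(data, key=lambda name: medals_per_million(data[name]), reverse=True)


ranking = per_capita_ranking(countries)
print(ranking)

# Take away the single medal and the former leader falls to last place.
without_win = {**countries, "Smallland": {"medals": 0, "population": 100_000}}
ranking_without_win = per_capita_ranking(without_win)
print(ranking_without_win)
```

With these invented numbers, one medal is the whole difference between first and last place for the smallest country, which is why a per capita table says so little about overall sporting success.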
Statistics and Confirmation Bias
“Little data” analysis helps to confirm our biases. We can rearrange the information to confirm the bias that Canada is outperforming the United States. Or we can rank countries by number of medals won, as the press in Canada (and NBC) does, which puts Canada 12th as of this writing. Or, like the BBC, we can weight each medal, which puts Canada 32nd. We can then select the measure that most supports our point of view.
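To see how much the choice of measure matters, here is a minimal sketch that ranks the same medal table three ways. The countries and counts are invented, and the 3-2-1 weighting is only an assumption for illustration; it is not claimed to be the scheme any broadcaster actually uses.

```python
# Hypothetical (gold, silver, bronze) counts; NOT real London 2012 results.
medals = {
    "Country A": (1, 2, 9),
    "Country B": (3, 4, 1),
    "Country C": (4, 0, 1),
}


def total_medals(counts):
    """Every medal counts the same."""
    return sum(counts)


def gold_first(counts):
    """Compare golds first, then silvers, then bronzes (tuple ordering)."""
    return counts


def weighted(counts, weights=(3, 2, 1)):
    """Assumed weighting: 3 points per gold, 2 per silver, 1 per bronze."""
    return sum(w * c for w, c in zip(weights, counts))


def rank(measure):
    """Country names sorted best to worst under the given measure."""
    return sorted(medals, key=lambda name: measure(medals[name]), reverse=True)


by_total = rank(total_medals)
by_gold = rank(gold_first)
by_weight = rank(weighted)

for label, ranking in [("total", by_total), ("gold first", by_gold), ("weighted", by_weight)]:
    print(f"{label:>10}: {ranking}")
```

In this invented table each measure crowns a different country, so whoever picks the measure effectively picks the winner.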
Big data is about eliminating theory (or reducing the impact of theory) to achieve insight. It is not about making up your mind and then finding support in the statistics. This is bringing sports down to the level of political campaigns!