I’m a sucker for competitions with lots of prize money… So I went fishing on the web looking for data mining contests. I only found three results – do you know of any others? Comment on this post and I can update the list for everyone. Here are the competitions I found:
1. Of course, round 1 of the Netflix competition has ended, but did you know there’s a round 2, also with a $1 million prize? Round 2 will be a time-limited contest involving sparse datasets. The full details for the Netflix 2 prize will be announced in the near future on their website. Once the contest has officially started, it will have a progress prize at 6 months and then finish at 18 months.
2. There’s a statistical methods competition called the OMOP Cup: Method Competition. It’s organized by the Observational Medical Outcomes Partnership. The purpose is to improve on current methods of utilizing real-time data to ensure drug safety. There are two parts to the competition (taken from the website):
- Challenge 1 explores how well your method works when provided an entire dataset, so the goal is accurate classification of which drugs are associated with which outcomes.
- Challenge 2 evaluates the timeliness of detection of drug-event associations by having your methods run against data sequentially as it accumulates over time.
The total prize money is $20,000. Visit the OMOP Cup: Method Competition website for full details.
3. Every year, KDD (Knowledge Discovery and Data Mining) sponsors a data-mining competition with a cash prize of around $5,000. The competition is usually announced in spring, so apologies for mentioning it now – you will have to wait until 2010. You can look at info on past competitions here.
An algorithm from researchers at Cornell has managed to data-mine the underlying laws of physics in just under one day.
In another example of data-mining, researchers were able to answer one part of a hieroglyphic mystery that has perplexed archeologists to this day. The Indus Script from 4,000 years ago has remained undecipherable. Some linguists have insisted that it is no language at all, but merely a set of political symbols (like the Democratic donkey or Republican elephant) that were important in that day. The problem is that the longest chain of Indus Script contains only 27 characters, making it extremely difficult to crack. A group of researchers has now managed to “prove” that it is a real language by showing that the entropy of the ordering of the Indus characters is very similar to that of human language.
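To give a rough feel for what “entropy of the ordering” can mean, here is a minimal Python sketch that estimates the first-order conditional entropy of a symbol sequence, i.e. how unpredictable the next symbol is given the current one. This is only one common way to measure ordering; I am assuming it for illustration, and it is not necessarily the exact procedure the researchers used. The function name `conditional_entropy` is my own.

```python
import math
from collections import Counter


def conditional_entropy(seq):
    """Estimate H(next symbol | current symbol) in bits from adjacent pairs.

    Illustrative only: an assumed measure, not the researchers' exact method.
    """
    pairs = list(zip(seq, seq[1:]))  # all adjacent (current, next) pairs
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    first_counts = Counter(current for current, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (current, _next), n in pair_counts.items():
        p_pair = n / total                        # P(current, next)
        p_next_given = n / first_counts[current]  # P(next | current)
        h -= p_pair * math.log2(p_next_given)
    return h
```

A rigidly ordered sequence such as `"ABABABAB"` scores zero (the next symbol is always predictable), while a sequence with no ordering constraints scores higher. The idea, as I understand it, is that natural languages tend to fall somewhere between those two extremes.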
In other fascinating news, did you know that you can’t tie your shoelaces? If you don’t believe me, visit 

