
Data mining contests

I’m a sucker for competitions with lots of prize money, so I went fishing on the web for data mining contests. I found only three. Do you know of any others? Comment on this post and I'll update the list for everyone. Here are the competitions I found:

1. Of course, round 1 of the Netflix competition has ended, but did you know there’s a round 2, also with a $1 million prize? Round 2 will be a time-limited contest involving sparse datasets. Full details for the second Netflix Prize will be announced on the Netflix Prize website in the near future. Once the contest officially starts, it will award a progress prize at 6 months and end at 18 months.

2. There’s a statistical methods competition called the OMOP Cup: Method Competition, organized by the Observational Medical Outcomes Partnership. Its purpose is to improve current methods of using real-time data to ensure drug safety. The competition has two parts (taken from the website):

  • Challenge 1 explores how well your method works when provided an entire dataset, so the goal is accurate classification of which drugs are associated with which outcomes.
  • Challenge 2 evaluates the timeliness of detection of drug-event associations by having your methods run against data sequentially as it accumulates over time.

The total prize money is $20,000. Visit the OMOP Cup: Method Competition website for full details.

3. Every year, KDD (Knowledge Discovery and Data Mining) sponsors a data mining competition with a cash prize of around $5,000. The competition is usually announced in the spring, so apologies for mentioning it now; you will have to wait until 2010. You can find information on past competitions on the KDD Cup website.

Google’s Palimpsest cancelled

I was surprised to read that Google has decided to cancel its data hosting service, code-named Palimpsest. The name comes from the Archimedes Palimpsest manuscript. You can read more about the project that recently restored the Archimedes Palimpsest on the project's website.

Google had originally planned to host terabytes of data, including astronomical data and large government data sets. Many bloggers had hailed Google’s service as the start of a new era of transparency for the USA.

The decision to cancel Palimpsest came just a week ago, and the sharp economic downturn played a key role. Google’s stock has fallen from an all-time high of about $700 a year ago to around $300 today, which led Google to sharply curtail “experimental” programs with no guaranteed revenue.