Archive for the 'datasets' Category

One researcher’s peek into the online gaming world

Professor Noshir Contractor from Northwestern University and his colleagues recently data-mined the massive 60-terabyte dataset released by EverQuest II. In case someone out there has never heard of EverQuest, it’s an online role-playing game occupying the spare time of about 45 million people.

Noshir's analysis turned up some interesting results. First off, those 45 million people are NOT mostly teens. The average age was much higher. And despite having the entire world literally at their fingertips, people still tended to network with their geographical neighbors.

“People end up playing with people nearby, often with people they already know,” Contractor said. “It’s not creating new networks. It’s reinforcing existing networks. You can talk to anyone anywhere, and yet individuals 10 kilometers away from each other are five times more likely to be partners than those who are 100 kilometers away from each other.”

I would have expected some grouping due to language barriers, but the geographical localization far exceeds that. I wonder if the even stronger statement could be made that players tend to play with people they previously knew?

A survey was also distributed to 7,000 players. Using the results of the survey, Professor Contractor found disproportionate rates of self-reported depression vs. the general population. Additionally, he found that players tended to underestimate how much time they devoted to the game, and that women are generally the most devoted and content (but apparently don’t like to play with other women!).

Hmmm…. So perhaps the guys are depressed because after long hours of playing EverQuest with mainly other guys they realize after the game has been switched off they still don’t have a girlfriend? Whereas women spend their time mostly getting attention from lots of admiring EverQuest men so it makes up for not having a boyfriend?

One thing this shows is that patterns can exist in data, but the reasons behind those patterns may still be a mystery. I wonder if Professor Contractor has plans to follow up his research with a further survey to try to tease out of players why they interact the way they do – now that he knows how they interact!

Whatever the next step, it was an interesting series of finds extracted from a massive dataset. Please refer to the original article written by Megan Fellman from Northwestern University’s news center. You can also visit Professor Noshir Contractor’s blog.

-Lyndie Chiou

Surprising uses of corporate data mining

The NY Times recently described an interesting policy by American Express. The credit card company had been lowering the credit limits of customers who had shopped at certain retail outlets. Using its proprietary dataset gathered over its customer base, American Express had identified certain retailers whose customers had a hard time paying their credit card bills. It concluded that all customers who shopped at these stores were a credit risk and lowered their credit limits accordingly. The catch here is that American Express did not reveal its retailer blacklist, so customers had no way of knowing which stores to avoid. Walmart? Neiman Marcus? Baskin-Robbins?

Once the story appeared in the press, American Express dropped the policy. In fact, they went one step further and insisted this had never been their policy, despite thousands of letters to “curtailed” customers that explicitly detailed otherwise. Read the original NY Times report of the surprising use of American Express data mining.

In other news about creative uses of data mining…

Microsoft Live Search is thinking about inserting social technology into its search service. They found that a relatively new technique called “groupization” turned up more relevant results in an internal test run. The idea is to use a person’s social network to influence the results that are returned to that person. A user searches using a set of keywords, which are then correlated with the results the user’s social group found relevant. While Microsoft was keen on the idea, they were also worried that implementing it on a large scale might be nearly impossible. You can read a summary of the idea at Ars Technica.
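To make the idea concrete, here is a toy sketch in Python of what re-ranking by group relevance could look like. This is not Microsoft's algorithm, which the summary does not spell out. The function name `groupize`, the `weight` knob, the scoring formula, and the example URLs are all assumptions made up for illustration.

```python
from collections import Counter


def groupize(results, group_clicks, weight=0.5):
    """Re-rank search results using what a user's group found relevant.

    results:      list of (url, base_score) pairs from an ordinary keyword search
    group_clicks: list of URLs that group members marked relevant for similar queries
    weight:       hypothetical knob controlling how much the group signal counts
    """
    votes = Counter(group_clicks)
    top = max(votes.values(), default=0)

    def combined(item):
        url, base_score = item
        # Scale the group's votes to the range 0..1 before mixing them in.
        group_score = votes[url] / top if top else 0.0
        return (1 - weight) * base_score + weight * group_score

    # sorted() is stable, so ties keep the original search order.
    return sorted(results, key=combined, reverse=True)


# Made-up data: the keyword search prefers "a", the group prefers "c".
results = [("a.example", 0.9), ("b.example", 0.8), ("c.example", 0.4)]
group_clicks = ["c.example", "c.example", "b.example"]
reranked = [url for url, _ in groupize(results, group_clicks)]
print(reranked)
```

With this made-up data the group's favorite moves from last place to first. The scaling worry is easy to see even here: a real service would need a record of what each group found relevant for every user and every kind of query.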

Personally, I have my doubts that this would really improve my own search results. I’ve noticed that the advertisements on the social networking site Facebook are 180 degrees away from what I’m actually interested in. I get a bunch of ads for the acai berry diet, movies, and how to get long eyelashes. In case anyone from Facebook is reading this, I don’t want to detangle my eyelashes in the morning! I think the low relevance of the ads on Facebook has to do with the fact that it’s a social networking site and therefore relatively “fluffy”. Perhaps Microsoft could create its own pre-defined community titles, and users could click their interests/hobbies when they create an account profile. Then, based on these broad categories, Microsoft could perform the “groupization”. This might be a scalable approach to their strategy.

-Lyndie Chiou

Data Scraping

I came across a useful post on the blog ouseful.wordpress.com. The blogger, Tony Hirst, blogs about whatever he finds interesting. He figured out a way to scrape data off of Wikipedia using the Google spreadsheet function =importHTML("","table",N).

The blogger gives detailed instructions on how to extract a table containing population data from England using the Google spreadsheet function. He then uses Yahoo! Pipes to geocode the data and create a Google mashup. It’s an ingenious method of extracting limited datasets!
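For anyone who would rather do the table-grabbing step outside a spreadsheet, here is a rough Python equivalent using only the standard library. It is a sketch of the same idea (pull the N-th table out of a page), not Tony Hirst's method. The names `TableExtractor` and `import_html_table` and the sample page are made up for illustration, and the code assumes tidy HTML where every row and cell has a closing tag.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every top-level <table> as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per top-level table
        self._depth = 0    # nesting level of <table> tags
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, n):
    """Return the n-th table on the page, counting from 1 like the spreadsheet function."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[n - 1]


# Made-up sample page; in practice the HTML would be downloaded from the article's URL.
SAMPLE = """
<table><tr><th>ignored</th></tr></table>
<table>
  <tr><th>District</th><th>Population</th></tr>
  <tr><td>Exampleton</td><td>12,345</td></tr>
  <tr><td>Sampleford</td><td>6,789</td></tr>
</table>
"""

rows = import_html_table(SAMPLE, 2)
for row in rows:
    print(row)
```

The rows come back as plain lists of strings, so they can be written out as CSV and handed to whatever geocoding step comes next.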

Another blogger, GoogleMapsMania, suggested using Batchgeocode to get the latitude and longitude data and then feeding that data into the Google Spreadsheet Map Wizard to map it out.

-Lyndie Chiou

Google’s Palimpsest cancelled

I was surprised to read that Google has decided to cancel its data hosting service, formerly known as Palimpsest. The name Palimpsest came from the book, Archimedes Palimpsest. A separate project recently restored the Archimedes Palimpsest itself.

Google was originally planning to host terabytes of data, including astronomical and large governmental datasets. Many bloggers had been hailing Google’s service as the start of a new era of transparency for the USA.

The decision to cancel Google’s Palimpsest came just a week ago. The sharp downturn in the economy played a key role. Google’s stock fell from an all-time high of about $700 a year ago to around $300 today. This caused Google to sharply curtail “experimental” programs that have no guaranteed revenues.

-Lyndie Chiou
