Tag Archive for 'Google'

International Women’s Day

Thanks to Google’s ngrams project page, I have wasted my scarce spare hours looking at micro-trends in literature. A couple of months ago, the Google ngrams project presented a database of all the words from Google’s extensive book collection. Making the books freely available presents copyright issues, but a database of word frequency in a collection of books is legal. They even created a simple graphing tool so you can play with the data, or you can download the entire dataset for your own purposes. Micro-trends in literature might not sound very exciting, but once I started trying words, the tool became an addictive way to test my zany cultural theories.
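For anyone curious what "word frequency by year" boils down to, here is a minimal Python sketch. The records are made-up toy numbers, not real ngram data, and the function name `relative_frequency` is my own. I am also assuming the graph plots each word's share of a year's words; in this toy version the yearly total covers only the three listed words rather than a whole book collection.

```python
from collections import defaultdict

# Hypothetical toy records: (word, year, occurrences). Not real ngram data.
RECORDS = [
    ("men", 1900, 60),
    ("women", 1900, 20),
    ("children", 1900, 20),
    ("men", 2000, 30),
    ("women", 2000, 40),
    ("children", 2000, 30),
]


def relative_frequency(records):
    """Return {word: {year: share of that year's counted words}}."""
    # First pass: total occurrences per year across all listed words.
    totals = defaultdict(int)
    for _, year, count in records:
        totals[year] += count

    # Second pass: divide each word's count by its year's total.
    shares = defaultdict(dict)
    for word, year, count in records:
        shares[word][year] = count / totals[year]
    return {word: dict(by_year) for word, by_year in shares.items()}


if __name__ == "__main__":
    for word, by_year in relative_frequency(RECORDS).items():
        print(word, by_year)
```

Plotting each word's shares against the year gives one line per word, which is the kind of picture the graphing tool draws.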

One graph seemed very appropriate for today, International Women’s Day. I plotted the words “men”, “women” and “children” versus time. And look!

“men” in blue, “women” in red, “children” in green

The years range from 1800 to 2008, and you can see clearly that the word “men” (the blue line) rules by a long shot up until about 1920. To be fair, “men” can be used in a generalized sense to mean both men and women, similar to the word “mankind”. Since there’s no context, I can’t tell what percentage of the occurrences actually refer to both sexes.

But the interesting part of the graph is the uptick in the usage of “women” starting during the era of 1960s feminism. Even more interesting, “women” overtakes “men” in the mid-1990s.

Shortly afterwards, “women” decreases and “men” once again rules. A decline in feminism? Or perhaps the bubble in the 1990s was due to the peak in so-called chick-lit, which has since gone out of favor. To provide a cultural reference point, Bridget Jones’s Diary, the epitome of chick-lit, came out in 1996.

“Children” seems to have a steady increase all the way from the 1800s to the present day. The slow rate of increase in the word “children” surprises me, since there’s been an explosion of children’s books since the days of Beatrix Potter. Perhaps Google has disdained uploading children’s literature into its database? I also tried the words “boy” and “girl”, and they show a lower percentage of usage than “children”:

“children” in blue, “boy” in red, “girl” in green

Happy International Women’s Day.

-Lyndie Chiou

Data Scraping

I came across a useful post on the blog ouseful.wordpress.com. The blogger, Tony Hirst, blogs about whatever he finds interesting. He figured out a way to scrape table data off Wikipedia using the Google spreadsheet function =importHTML("","table",N), where the first argument is the page URL and N picks which table on the page to import.

The blogger gives detailed instructions on how to extract a table containing population data from England using the Google spreadsheet function. He then uses Yahoo! Pipes to geocode the data and create a Google mashup. It’s a very ingenious method of extracting limited datasets!
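To give a feel for what "import the Nth table" involves, here is a rough Python sketch using only the standard library. It is not how the spreadsheet function is implemented and not Tony Hirst's method; `TableExtractor`, `import_table` and the sample page are hypothetical names and made-up data. I am assuming the table index counts from 1, and the sketch ignores nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return the index-th table (counting from 1) as rows of strings."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up sample page: the first table is a decoy, the second holds the data.
SAMPLE = """
<table><tr><th>Ignore</th></tr></table>
<table>
  <tr><th>Place</th><th>Population</th></tr>
  <tr><td>Town A</td><td>1000</td></tr>
  <tr><td>Town B</td><td>2500</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in import_table(SAMPLE, 2):
        print(row)
```

For a real page you would first download the HTML (for example with urllib.request) and pass the text to `import_table`; the spreadsheet function does the fetching and parsing in one step.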

Another blogger, GoogleMapsMania, suggested using Batchgeocode to get the latitude and longitude data and then feeding that data into the Google Spreadsheet Map Wizard to map it out.

-Lyndie Chiou

Google’s Palimpsest cancelled

I was surprised to read that Google has decided to cancel its data hosting service, originally named Palimpsest. The name Palimpsest came from the book, the Archimedes Palimpsest. To read more on the project that recently restored the Archimedes Palimpsest, click here.

Google was originally planning to host terabytes of data, including astronomical and large governmental data sets. Many bloggers had been hailing Google’s service as the start of a new era of transparency for the USA.

The decision to cancel Google’s Palimpsest came just a week ago. The sharp downturn in the economy played a key role. Google’s stock fell from an all-time high of about $700 a year ago to around $300 today. This prompted Google to sharply curtail “experimental” programs that have no guaranteed revenues.

-Lyndie Chiou



