Data-mining in school districts

Starting a decade ago, more and more schools across the US began implementing data-mining programs to improve school performance.

It was claimed that data-mining, both in real time and after the school year had ended, could identify students who were in danger of not graduating. Real-time data-mining would be a way to find these students early on, so they could be given targeted assistance in their weak subjects.

Data-mining after the school year was over provided a method to check how programs performed and to evaluate the weaknesses and strengths of individual teachers and schools.

After a media splash, more and more school districts turned to data-mining as a way to improve their overall performance.

I decided to follow up on schools to try to data-mine the effect of data-mining. OK, that was what I originally wanted to do, but I very quickly realized it would be a long research project rather than a short one. So this analysis is not that sophisticated. I just looked to see whether graduation rates at one school improved after it started its data-mining program.

One of the earliest to jump onto the data-mining bandwagon was Broward County School District in Florida. This is a large district with a low graduation rate. It is among the nation’s 10 largest school districts, with almost 250,000 students. It was featured in a 2000 CNN article detailing its plans to provide data-mining services via a $2 million grant from IBM.

The year the data-mining project began, the graduation rate was 62.3%. In the 2008-2009 school year, the most recent school year with reported rates, it was 76.3%. Below is a graph of the graduation rate from 1998 to 2008 (data-mining began in 2000).

Given Broward High School’s class size of about 1,200 kids per grade, that’s about 1,300 extra students who graduated during the 8-year time span who otherwise would not have made it, which is about the same size as a whole class.
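For anyone who wants to check that arithmetic, here is a rough sketch in Python. It uses only the figures quoted above (62.3%, 76.3%, about 1,200 kids per grade, 8 years). The two scenarios are my own assumptions, since the year-by-year rates are not listed here: one applies the full gain to every class, the other assumes the rate climbed in equal steps. The estimate of about 1,300 appears to match the first, more generous scenario.

```python
# Figures quoted in the post
RATE_START = 0.623  # graduation rate the year data-mining began
RATE_END = 0.763    # graduation rate in 2008-2009
CLASS_SIZE = 1200   # approximate students per grade
YEARS = 8           # length of the time span


def extra_graduates_full_gain(rate_start, rate_end, class_size, years):
    """Assumption: every class in the span gets the full final gain (upper bound)."""
    return (rate_end - rate_start) * class_size * years


def extra_graduates_linear_ramp(rate_start, rate_end, class_size, years):
    """Assumption: the rate rises in equal yearly steps, reaching rate_end in the last year."""
    step = (rate_end - rate_start) / years
    return sum(step * year * class_size for year in range(1, years + 1))


full_gain = extra_graduates_full_gain(RATE_START, RATE_END, CLASS_SIZE, YEARS)
linear_ramp = extra_graduates_linear_ramp(RATE_START, RATE_END, CLASS_SIZE, YEARS)

print(f"Full gain every year: about {full_gain:.0f} extra graduates")
print(f"Equal yearly steps:   about {linear_ramp:.0f} extra graduates")
```

The real number depends on how the rate actually moved from year to year, so both results are bounds on a guess rather than a measurement.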

Of course, it’s hard to tell if this can all be put down to the benefits of real-time data-mining. To do this study properly, all the schools in the country that had implemented data-mining programs should be compared against all the schools that hadn’t. This brings me to another point… Data in the education domain is very difficult to obtain and seemingly unreliable! I found at least 3 different websites with conflicting data for the same clearly defined measure of graduation rate. In the end, I went with the stats listed by the Florida Department of Education, which differed from the Broward County website data, which in turn differed from the National Center for Education Statistics data! These are all government sources and therefore reliable, one would have thought…
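To make the "compare against schools that hadn’t" idea concrete, one common way to frame it is a difference-in-differences: take the change at the schools with a program and subtract the change at the schools without one. The sketch below is only an illustration of that calculation, not something I actually ran. The Broward figures are the two rates quoted above, while the comparison-group rates are hypothetical placeholders invented to show the mechanics.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Broward's rates (percent), as quoted in the post.
broward_before, broward_after = 62.3, 76.3

# HYPOTHETICAL comparison-group rates (percent), made up for illustration only.
comparison_before, comparison_after = 65.0, 72.0

effect = difference_in_differences(
    broward_before, broward_after, comparison_before, comparison_after
)
print(f"Gain beyond the comparison group: {effect:.1f} percentage points")
```

With real numbers for schools that never adopted data-mining, whatever gain is left over after the subtraction is the part that could plausibly be credited to the program, and even then only if the two groups were otherwise similar.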

Thus I drop the ball of examining the usefulness of data-mining in education right here… I wish education data were more centralized and therefore easier to access and hopefully more reliable. Any parent who is trying to decide where to live in order to put their kids in the best schools has probably wished the exact same thing, as have researchers looking for ways to improve education.

UPDATE: After I wrote this post, I discovered a mine-load of data (although mainly test scores, incidents, etc., not graduation rates) at this Florida Dept of Education link. I may revisit this subject to form a more proper conclusion about Broward District’s results sometime in the future!