New fantasy app: GIGO for public databases and websites

[Image: Eva Martin, ex-slave]

One of the most important and powerful features of computational journalism is the ability to pull information from multiple databases and remix it in a variety of ways. Of course, that means that errors in those databases will be compounded and remixed as well. I wrote a bit about this problem in an October 27, 2009 post for Blogher:

“Last April, Amy Gahran blogged a Los Angeles Times story revealing that a crime map featured on the Los Angeles police department website was producing faulty data because of an error in the software that plotted the locations of specific crimes. Thus, crime clusters were showing up in low-crime neighborhoods, and some high-crime areas appeared deceptively safe. The error was particularly vexing for the high-profile news aggregator that relied on the maps as part of its coverage.”

The thing is, that kind of error is relatively easy to solve, compared to other kinds of errors that crop up in public records.

For example, sometimes we learn that database information is erroneous long after it is created. Police corruption scandals, for instance, can throw years of crime data into doubt. In Philadelphia in the 1990s, revelations of drug dealing and other criminal acts by officers in the city’s 39th precinct cast doubt on 1,400 prior criminal convictions. However, if I obtain records from the Philadelphia courts or district attorney’s office for that period, can I necessarily be sure that the appropriate asterisks have been retroactively applied to those cases?

Here’s a more challenging example — not about errors in a database, but potential errors in data interpretation. About 10 years ago, I taught an interdisciplinary humanities course for which I used the University of Virginia’s online exhibit drawn from the WPA slave narratives. It’s an invaluable collection that includes transcripts and even some audio recordings from the late 1930s. The collection has an equally invaluable disclaimer designed to help contemporary readers place the narratives in appropriate historical context:

Often the full meanings of the narratives will remain unclear, but the ambiguities themselves bear careful consideration. When Emma Crockett spoke about whippings, she said that “All I knowed, ’twas bad times and folks got whupped, but I kain’t say who was to blame; some was good and some was bad.” We might discern a number of reasons for her inability or unwillingness to name names, to be more specific about brutalities suffered under slavery. She admitted that her memory was failing her, not unreasonable for an eighty-year-old. She also told her interviewer that under slavery she lived on the “plantation right over yander,”and it is likely that the children or grandchildren of her former masters, or her former overseers, still lived nearby; the threat of retribution could have made her hold her tongue.

Even with the disclaimers, I found that some students concluded that the slaves interviewed had not suffered that much in captivity. I had to help them read the documents in historical and cultural context. As more primary documents become accessible to people who aren’t experts in the subject matter, the opportunities for misreading and missing the context of those documents multiply.

So I was thinking: what if there were a kind of wiki for collecting errors in public databases, enhanced with a widget that could be embedded in any website? Call it GIGO: Garbage In, Garbage Out. Create an online form that would allow people to submit errors – with appropriate documentation, of course. Perhaps use the kind of vetting process Hakia uses to come up with a list of credible sites in response to a given search request. (Here’s an example of a Hakia search on global warming.) What do you think?
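To make the idea concrete, here is a minimal sketch of what a GIGO error report and its community-vetting step might look like. Everything here is hypothetical – the field names, the vote threshold, and the `credible_reports` helper are my illustrative assumptions, not an existing schema or service:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical GIGO error report. Field names are illustrative
# assumptions, not a real schema.
@dataclass
class ErrorReport:
    database: str             # which public database or site is affected
    description: str          # what the error is
    documentation: list[str]  # URLs or citations backing up the claim
    reported_on: date
    votes: int = 0            # simple community-vetting signal
    verified: bool = False    # set once moderators confirm the documentation

def credible_reports(reports: list[ErrorReport], min_votes: int = 3) -> list[ErrorReport]:
    """Keep only verified reports with enough community support --
    a stand-in for the vetting process imagined above."""
    return [r for r in reports if r.verified and r.votes >= min_votes]
```

An embeddable widget could then simply render the output of `credible_reports` for whichever database the host site covers.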

Posted in Civic media, Computational Thinking, Journalism, Research.


My professional background is in public information, magazine journalism, blogging and journalism education. My current research is founded on the premise that democracy requires the broad participation of a computationally fluent citizenry. Civic media industries must reflect the communities they serve at the level of ownership, research and development, news gathering, presentation and community engagement. This adds greater urgency to the already critical need to broaden participation in computing. To that end, I have collaborated on curricular models for infusing computing into journalism education at both the scholastic and collegiate levels, and for promoting civic engagement in computer science education. My current interest is in exploring the potential of stochastic networks as an enhancement to social computing tools for broadening civic participation.
While most of this blog is devoted to my research in computational journalism and trends in journalism education, I occasionally do some storytelling of my own. This blog picks up where my other blogs, Professor Kim’s News Notes and The Nancybelle Project, left off.