One of the most important and powerful features of computational journalism is the ability to pull information from multiple databases and remix it in a variety of ways. Of course, that means that errors in those databases will be compounded and remixed as well. I wrote a bit about this problem in an October 27, 2009, post for BlogHer:
“Last April, Amy Gahran blogged a Los Angeles Times story revealing that a crime map featured on the Los Angeles police department website was producing faulty data because of an error in the software that plotted the locations of specific crimes. Thus, crime clusters were showing up in low-crime neighborhoods, and some high-crime areas appeared deceptively safe. The error was particularly vexing for the high-profile news aggregator, Everyblock.com, which relied on the maps as part of its coverage.”
The thing is, that kind of error is relatively easy to solve, compared to other kinds of errors that crop up in public records.
Sometimes we learn that database information is erroneous long after it is created. For example, police corruption scandals can throw years of crime data into doubt. In Philadelphia in the 1990s, revelations of drug dealing and other criminal acts by officers in the city’s 39th precinct cast doubt on 1,400 prior criminal convictions. If I obtain records from the Philadelphia courts or district attorney’s office for that period, can I be sure that the appropriate asterisks have been retroactively applied to those cases?
Here’s a more challenging example — not about errors in a database, but potential errors in data interpretation. About 10 years ago, I taught an interdisciplinary humanities course for which I used the University of Virginia’s online exhibit drawn from the WPA slave narratives. It’s an invaluable collection that includes transcripts and even some audio recordings from the late 1930s. The collection has an equally invaluable disclaimer designed to help contemporary readers place the narratives in appropriate historical context:
Often the full meanings of the narratives will remain unclear, but the ambiguities themselves bear careful consideration. When Emma Crockett spoke about whippings, she said that “All I knowed, ’twas bad times and folks got whupped, but I kain’t say who was to blame; some was good and some was bad.” We might discern a number of reasons for her inability or unwillingness to name names, to be more specific about brutalities suffered under slavery. She admitted that her memory was failing her, not unreasonable for an eighty-year-old. She also told her interviewer that under slavery she lived on the “plantation right over yander,” and it is likely that the children or grandchildren of her former masters, or her former overseers, still lived nearby; the threat of retribution could have made her hold her tongue.
Even with the disclaimers, I found that some students concluded that the slaves interviewed had not suffered that much in captivity. I had to help them read the documents in historical and cultural context. As more primary documents become accessible to people who aren’t experts in the subject matter, the opportunities for misreading those documents and missing their context multiply.
So I was thinking, what if there were a kind of wiki for collecting errors in public databases, enhanced with a widget that could be embedded in any website? Call it GIGO: Garbage In, Garbage Out. Create an online form that would allow people to submit errors, with appropriate documentation, of course. Perhaps use the kind of vetting process Hakia.com uses to come up with a list of credible sites in response to a given search request. (Here’s an example of a Hakia search on global warming.) What do you think?
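To make the idea a little more concrete, here is a rough Python sketch of what a single GIGO error report and a tiny registry behind the submission form might look like. Everything in it is an assumption made for illustration: the names `ErrorReport` and `ErrorRegistry`, the field names, and the three status values are hypothetical, and the vetting step is just a placeholder for whatever review process such a site would really use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ErrorReport:
    """One submitted error in a public database (hypothetical schema)."""
    dataset: str                 # which public database the error is in
    record_id: str               # which record looks wrong
    description: str             # what the submitter believes is wrong
    evidence_urls: List[str] = field(default_factory=list)  # supporting documentation
    status: str = "pending"      # "pending", "confirmed", or "rejected"


class ErrorRegistry:
    """In-memory stand-in for the wiki's store of reports (illustrative only)."""

    def __init__(self) -> None:
        self._reports: Dict[str, List[ErrorReport]] = {}

    def submit(self, report: ErrorReport) -> None:
        # Refuse undocumented claims: the form requires documentation.
        if not report.evidence_urls:
            raise ValueError("an error report needs at least one piece of documentation")
        self._reports.setdefault(report.dataset, []).append(report)

    def vet(self, report: ErrorReport, accepted: bool) -> None:
        # Placeholder for a real vetting process run by reviewers.
        report.status = "confirmed" if accepted else "rejected"

    def confirmed_for(self, dataset: str) -> List[ErrorReport]:
        # What an embedded widget might display next to a dataset.
        return [r for r in self._reports.get(dataset, []) if r.status == "confirmed"]
```

An embedded widget could then call something like `confirmed_for("some-dataset")` and show readers the vetted problems alongside the data. Pending and rejected reports stay out of view, which is one possible design choice among several.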