What is a computational journalist?

A friend posed this question on Facebook in response to my last blog post, and I was tempted to respond, “We’re still figuring it out.” Then I was tempted to be glib and say, “It’s CAR (computer assisted reporting) on the Information Superhighway.” There’s a sense in which both of these statements are true, and yet, there are some things that can be said with some degree of confidence.

Computational journalism is the application of computing tools and processes to the traditional tasks of defining, gathering and presenting news. This definition is what I was reaching for in my May 2009 essay, “How Computational Thinking is Changing Journalism and What’s Next.” As Adrian Holovaty explained in this September 2006 blog post, computers aggregate and manipulate structured data, so we make the best use of the technology when we organize our content accordingly. This means not only cataloging our content in ways that make it easier to find (SEO metadata, tags, links and trackbacks, for example), but also choosing the most efficient and effective forms of information-gathering and presentation for the task and audience at hand.

One example that I used in my essay involved building a module into a local newspaper’s content management system that would pick up specific pieces of metadata from a wire service’s RSS feed (such as the time stamp and the dateline) and automatically dump the headline into a breaking news field that loads on the front page.
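To make the idea concrete, here is a minimal sketch of what such a module might do, assuming a standard RSS 2.0 feed. The sample item and the choice of fields are invented for illustration, not taken from any real wire service or CMS:

```python
import xml.etree.ElementTree as ET

# An invented sample of the kind of wire-service RSS item described above.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Storm closes area schools</title>
      <pubDate>Mon, 15 Nov 2010 08:30:00 GMT</pubDate>
      <category>TRENTON</category>
    </item>
  </channel>
</rss>"""

def extract_breaking_news(feed_xml):
    """Pull the headline, time stamp and dateline out of each feed item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "headline": item.findtext("title"),
            "timestamp": item.findtext("pubDate"),
            "dateline": item.findtext("category"),
        })
    return items

print(extract_breaking_news(SAMPLE_FEED))
```

A real module would fetch the feed over HTTP on a schedule and write the extracted headline into the CMS’s breaking news field; this sketch only shows the structured-data extraction step.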

This kind of automation is one way in which computing technologies can help make the newsgathering process more efficient and timely. Megan Taylor’s July 2010 post for Poynter reported on how companies such as the New York Times are building applications that automate the retrieval and manipulation of certain kinds of information, such as congressional votes. Taylor also noted that news operations routinely employ algorithms, or step-by-step procedures that can be codified and sometimes translated into software applications that aid reporting and editing. The third important quality is abstraction, which is a way of generalizing about objects or processes. For example, this web page is governed by a cascading style sheet that is built on a set of abstractions such as “text,” “header,” “link,” “post” and “footer.” Each of these “objects” has properties, such as font, color and alignment, that define its “style.” The web page interacts with a database organized according to its own set of abstractions.
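For readers who want to see abstraction outside of style sheets, here is a toy Python sketch of the same idea: hypothetical page “objects” that share generic style properties. None of these names come from a real content management system; they are invented to illustrate the concept:

```python
# A sketch of abstraction: each page "object" (header, post, footer)
# is a specialization of one generic, styled element.

class StyledElement:
    """The abstraction: anything on the page with font, color and alignment."""
    def __init__(self, font="serif", color="black", alignment="left"):
        self.font = font
        self.color = color
        self.alignment = alignment

    def style(self):
        return {"font": self.font, "color": self.color,
                "alignment": self.alignment}

class Header(StyledElement):
    pass  # inherits the generic style properties unchanged

class Post(StyledElement):
    def __init__(self, text, **style):
        super().__init__(**style)
        self.text = text  # a post adds content on top of the shared style

post = Post("Hello, readers", font="sans-serif")
print(post.style())
```

The point is the same one the style sheet makes: once “post” and “header” are defined as abstractions, software can manipulate every instance of them without caring about any particular article.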

Why is this useful for the non-programmer journalist to understand? For one thing, I’ve found it helps me understand what programmers are talking about when we are collaborating. For example, when I worked with my computer science colleague Monisha Pulimood and our students to create the content management systems for our campus online magazine Unbound and our Interactive Journalism Institute for Middle Schoolers, our programmers had to ask detailed questions about the journalists’ workflow in order to create the databases and interfaces for each system. It took a while to understand what was most useful and relevant on both sides when we worked on Unbound, but the process was much smoother during the IJIMS project because we were more practiced at the conversation.

Computational journalism includes, but is not limited to, computer-assisted reporting.

In the 2009 report “Accountability through Algorithm: Developing the Field of Computational Journalism” (.pdf), Sarah Cohen, Duke University’s Knight Foundation Chair in Computational Journalism, envisions new tools that will help reporters gather, analyze and present data and interact with news consumers and sources in more efficient, useful and engaging ways.

One simple example is Gumshoe, the database manager that Pulimood and her students built to help another TCNJ journalism colleague, Donna Shaw, analyze data she’d obtained about the disposition of gun crimes in the Philadelphia municipal courts. Using a sample of data from just a two-month period in 2006, Shaw and her students were able to document that hundreds of cases weren’t going to trial, often because evidence and/or witnesses disappeared. Shaw’s findings were part of the document trail that led to “Justice: Delayed, Dismissed, Denied,” a multi-part Philadelphia Inquirer series on problems in the Philadelphia court system that ran in 2009. (One of the reporters on that project, Emilie Lounsberry, has since joined our TCNJ journalism faculty.) (See endnote 1.)

Social network analysis is another great computational tool. I really like this 2006 project, created by students from Emerson College, that illuminated how social networks affected the transmission of health information in Boston’s Chinatown. The network maps are accompanied by a series of video podcasts about health care issues in the neighborhood.
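For the curious, the simplest measure behind such network maps can be sketched in a few lines. This toy example computes degree centrality, asking who has the most direct ties and is therefore best placed to pass information along; the nodes and ties here are invented, not drawn from the Emerson project:

```python
from collections import defaultdict

# Hypothetical ties among community information sources; all names invented.
edges = [("clinic", "pastor"), ("pastor", "resident_a"),
         ("pastor", "resident_b"), ("clinic", "newspaper"),
         ("newspaper", "resident_b")]

# Build an undirected graph as an adjacency map.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# Degree centrality: count each node's direct ties.
centrality = {node: len(neighbors) for node, neighbors in graph.items()}
print(max(centrality, key=centrality.get))  # prints "pastor"
```

Real social network analysis uses richer measures (betweenness, clustering, and so on), but even this crude count can suggest which community figures a health campaign should reach first.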

News games are another important area of development, and I think that collaboration between journalists and game developers is going to lead to the emergence of multithreaded interactive non-fiction narratives. Another TCNJ colleague, Ursula Wolz, has been helping me think about the possibilities of this field for the last several years. In 2007, we published a paper and a Poynter.org post outlining our idea for a multi-threaded non-fiction storytelling engine. We’ve made progress since then, which I hope to be able to demonstrate in more detail in the coming months. For the moment, here is a very primitive example of a fictional multithreaded story that I wrote in Scratch using a simple storytelling engine that Wolz wrote for my interactive storytelling class last spring. (This was actually part of a larger collaboration supported by the CPATH Distributed Expertise project, which Wolz and I will be presenting, along with our Villanova colleagues, Tom Way and Lillian Cassel, at the SIGCSE conference next March.)


Endnotes

  1. Shaw, Donna, Pulimood, Sarah Monisha, and Lounsberry, Emilie. “The Gumshoe Project: A Model for Collaboration Between a Small College and a Large Newspaper.” Paper presented at the annual meeting of the Association for Education in Journalism and Mass Communication, The Denver Sheraton, Denver, CO, Aug. 4, 2010. Retrieved 2010-11-15.
  2. (with U. Wolz) “Multi-threaded Interactive Storytelling for Literary Journalism,” The New Media Consortium Summer Conference 2007: Sparking Innovative Learning and Creativity, invited expanded paper, http://www.nmc.org/publications, pp. 38–45, 2007.

Superintendent: IJIMS Strengthened Learning and Professional Development in New Jersey School District

In June 2009, my colleague Ursula Wolz and I had a chat with outgoing Ewing, New Jersey, Public Schools Superintendent Raymond Broach about his views on the IJIMS project. IJIMS, or the Interactive Journalism Institute for Middle Schoolers, is a collaboration between Ewing Township’s middle school and The College of New Jersey that is supported by the National Science Foundation’s Broadening Participation in Computing program. Wolz is the project’s principal investigator; I am a co-PI along with Monisha Pulimood. The other TCNJ members of our team are gender equity specialist Mary Switzer, several TCNJ student research assistants, and a select group of volunteer mentors. Meredith Stone is our external evaluator.

Our hypothesis was that students who don’t think of themselves as “computing types”  can be successfully introduced to computing and programming concepts by learning to do multimedia journalism about their own communities. Our research results more than validate our hypothesis.

In this interview, Dr. Broach lauded the constructivist nature of the IJIMS model, a method of teaching that emphasizes collaboration and discovery, making students participants in creating knowledge rather than mere absorbers of it. Broach noted that the Fisher teachers and guidance counselor who collaborated with us also received training in multimedia journalism and programming in Scratch. This, he said, was a departure from the usual professional development model, because it required the teachers to learn skills that weren’t necessarily part of their training.

By the way, one of the Fisher teachers, Laura Fay, recently presented her experience teaching the Scratch programming language in the 8th grade language arts classroom at a meeting for investigators in the BPC program. You can read the notes from the presentation she and Ursula Wolz gave on the IJIMS project:

New fantasy app: GIGO for public databases and websites

[Photo: Eva Martin, ex-slave]

One of the most important and powerful features of computational journalism is the ability to pull information from multiple databases and remix it in a variety of ways. Of course, that means that errors in those databases will be compounded and remixed as well. I wrote a bit about this problem in an October 27, 2009, post for BlogHer:

“Last April, Amy Gahran blogged a Los Angeles Times story revealing that a crime map featured on the Los Angeles police department website was producing faulty data because of an error in the software that plotted the locations of specific crimes. Thus, crime clusters were showing up in low-crime neighborhoods, and some high-crime areas appeared deceptively safe. The error was particularly vexing for the high-profile news aggregator Everyblock.com, which relied on the maps as part of its coverage.”

The thing is, that kind of error is relatively easy to solve, compared to other kinds of errors that crop up in public records.

Sometimes, for example, we learn that database information is erroneous long after it is created. Police corruption scandals can throw years of crime data into doubt: in Philadelphia in the 1990s, revelations of drug dealing and other criminal acts by officers in the city’s 39th precinct cast doubt on 1,400 prior criminal convictions. But if I obtain records from the Philadelphia courts or district attorney’s office for that period, can I be sure that the appropriate asterisks have been retroactively applied to those cases?

Here’s a more challenging example — not about errors in a database, but potential errors in data interpretation. About 10 years ago, I taught an interdisciplinary humanities course for which I used the University of Virginia’s online exhibit drawn from the WPA slave narratives. It’s an invaluable collection that includes transcripts and even some audio recordings from the late 1930s. The collection has an equally invaluable disclaimer designed to help contemporary readers place the narratives in appropriate historical context:

Often the full meanings of the narratives will remain unclear, but the ambiguities themselves bear careful consideration. When Emma Crockett spoke about whippings, she said that “All I knowed, ’twas bad times and folks got whupped, but I kain’t say who was to blame; some was good and some was bad.” We might discern a number of reasons for her inability or unwillingness to name names, to be more specific about brutalities suffered under slavery. She admitted that her memory was failing her, not unreasonable for an eighty-year-old. She also told her interviewer that under slavery she lived on the “plantation right over yander,” and it is likely that the children or grandchildren of her former masters, or her former overseers, still lived nearby; the threat of retribution could have made her hold her tongue.

Even with the disclaimers, I found that some students concluded that the slaves interviewed had not suffered that much in captivity. I had to help them read the documents in historical and cultural context. As more primary documents become accessible to people who aren’t experts in the subject matter, the opportunities for misreading and missing the context of those documents multiply.

So I was thinking: what if there were a kind of wiki for collecting errors in public databases, enhanced with a widget that could be embedded in any website? Call it GIGO: Garbage In, Garbage Out. Create an online form that would allow people to submit errors, with appropriate documentation, of course. Perhaps use the kind of vetting process Hakia.com uses to come up with a list of credible sites in response to a given search request. (Here’s an example of a Hakia search on global warming.) What do you think?
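To make the fantasy a bit more concrete, here is a toy sketch of what a GIGO error report and a vote-based vetting rule might look like. Every field name and the vote threshold are assumptions for illustration, not a spec for any real system:

```python
from dataclasses import dataclass

@dataclass
class ErrorReport:
    database: str       # which public database contains the error
    record_id: str      # which record is claimed to be wrong
    description: str    # what is wrong with it
    documentation: str  # link or citation backing the claim
    votes: int = 0      # community endorsements, Hakia-style vetting

def credible(report, threshold=3):
    """Treat a report as vetted once it is documented and endorsed enough."""
    return bool(report.documentation) and report.votes >= threshold

# A hypothetical report inspired by the LAPD crime-map error above.
r = ErrorReport("LAPD crime map", "case-4711",
                "crime plotted in the wrong neighborhood",
                "LA Times story, April 2009", votes=4)
print(credible(r))  # prints True
```

The embeddable widget would then simply query this store for vetted reports matching whatever database a site republishes, so that the caveats travel with the data.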