What is a computational journalist?

A friend posed this question on Facebook in response to my last blog post, and I was tempted to respond, “We’re still figuring it out.” Then I was tempted to be glib and say, “It’s CAR (computer-assisted reporting) on the Information Superhighway.” There’s a sense in which both of these statements are true, and yet there are some things that can be said with confidence.

Computational journalism is the application of computing tools and processes to the traditional tasks of defining, gathering and presenting news. This definition is what I was reaching for in my May 2009 essay, “How Computational Thinking is Changing Journalism and What’s Next.” As Adrian Holovaty explained in this September 2006 blog post, computers aggregate and manipulate structured data, so we make the best use of the technology when we organize our content accordingly. This means not only cataloging our content in ways that make it easier to find (SEO metadata, tags, links and trackbacks, for example), but also choosing the most efficient and effective forms of information-gathering and presentation for the task and audience at hand.

One example that I used in my essay involved building a module into a local newspaper’s content management system that would pick up specific pieces of metadata from a wire service’s RSS feed (such as the time stamp and the dateline) and automatically dump the headline into a breaking news field that loads on the front page.
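A module like that could be sketched in a few lines. The sketch below is only illustrative: the tag names (`title`, `pubDate`, `source`) and the sample feed are assumptions, since real wire services vary in their metadata conventions, and the function name is invented.

```python
import xml.etree.ElementTree as ET

def extract_breaking_news(rss_xml):
    """Pull the headline, time stamp and dateline from the first item
    in a wire-service RSS feed (tag names are illustrative)."""
    root = ET.fromstring(rss_xml)
    item = root.find("./channel/item")
    if item is None:
        return None
    return {
        "headline": item.findtext("title", default="").strip(),
        "timestamp": item.findtext("pubDate", default="").strip(),
        "dateline": item.findtext("source", default="").strip(),
    }

# A made-up sample feed for demonstration
sample = """<rss><channel><item>
  <title>Council approves budget</title>
  <pubDate>Mon, 15 Nov 2010 09:00:00 EST</pubDate>
  <source>TRENTON</source>
</item></channel></rss>"""

print(extract_breaking_news(sample)["headline"])  # Council approves budget
```

A real CMS module would poll the feed on a schedule and write the extracted headline into the front-page breaking news field, but the structured-data idea is the same.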

This kind of automation is one way in which computing technologies can help make the newsgathering process more efficient and timely. Megan Taylor’s July 2010 post for Poynter reported on how companies such as the New York Times are building applications that automate the retrieval and manipulation of certain kinds of information, such as congressional votes. Taylor also noted that news operations routinely employ algorithms, or step-by-step procedures that can be codified and sometimes translated into software applications that aid reporting and editing. The third important quality is abstraction, which is a way of generalizing about objects or processes. For example, this web page is governed by a cascading style sheet that is built on a set of abstractions such as “text,” “header,” “link,” “post” and “footer.” Each of these “objects” has properties, such as font, color and alignment, that define its “style.” The webpage interacts with a database organized according to its own set of abstractions.
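To make the idea of abstraction concrete, here is a minimal sketch, in Python rather than CSS, of a “style” abstraction like the one described above. The class name, property names and default values are all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Style:
    """An abstraction: every page element shares these properties."""
    font: str = "Georgia"
    color: str = "#333333"
    alignment: str = "left"

# Concrete "objects" -- header, link, footer -- are instances of the abstraction.
# Each states only what differs from the shared defaults.
header = Style(font="Helvetica", color="#000000", alignment="center")
link = Style(color="#0645ad")
footer = Style(alignment="right")

print(link.font)  # Georgia -- inherited from the abstraction, not restated
```

The payoff is the same as in a style sheet: change the abstraction once, and every object built on it changes consistently.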

Why is this useful for the non-programmer journalist to understand? For one thing, I’ve found it helps me understand what programmers are talking about when we are collaborating. For example, when I worked with my computer science colleague Monisha Pulimood and our students to create the content management systems for our campus online magazine Unbound and our Interactive Journalism Institute for Middle Schoolers, our programmers had to ask detailed questions about the journalists’ workflow in order to create the databases and interfaces for each system. When we worked on Unbound, it took a while to understand what was most useful and relevant on both sides, but the process was much smoother during the IJIMS project because we were more practiced at the conversation.

Computational journalism includes, but is not limited to, computer-assisted reporting.

In her 2009 report, “Accountability through Algorithm: Developing the Field of Computational Journalism” (.pdf), Sarah Cohen, Duke University’s Knight Foundation Chair in Computational Journalism, envisions new tools that will help reporters gather, analyze and present data and interact with news consumers and sources in more efficient, useful and engaging ways.

One simple example is Gumshoe, the database manager that Pulimood and her students built to help another TCNJ journalism colleague, Donna Shaw, analyze data she’d obtained about the disposition of gun crimes in the Philadelphia municipal courts. Using a sample of data from just a two-month period in 2006, Shaw and her students were able to document the fact that hundreds of cases weren’t going to trial, often because evidence and/or witnesses disappeared. Shaw’s findings were part of the document trail that led to “Justice: Delayed, Dismissed, Denied,” a multi-part Philadelphia Inquirer series on problems in the Philadelphia court system that ran in 2009. (One of the reporters on that project, Emilie Lounsberry, has since joined our TCNJ journalism faculty.) (Reference)

Social network analysis is another great computational tool. I really like a 2006 project created by students from Emerson College that illuminated how social networks affected the transmission of health information in Boston’s Chinatown. The network maps are accompanied by a series of video podcasts about health care issues in the neighborhood.
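For readers curious what such an analysis involves at its simplest, here is a minimal sketch, with invented names and ties, that computes degree centrality (who in a small network has the most direct connections) using only the Python standard library. Projects like the Emerson one use far richer data and methods; this only shows the basic idea.

```python
from collections import defaultdict

# Hypothetical ties: who shares health information with whom
ties = [
    ("clinic", "Mei"), ("Mei", "Lin"), ("Mei", "Wong"),
    ("Lin", "Chen"), ("Wong", "Chen"), ("clinic", "Wong"),
]

# Build an undirected adjacency map
neighbors = defaultdict(set)
for a, b in ties:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Degree centrality: fraction of the other nodes each actor reaches directly
n = len(neighbors)
centrality = {node: len(links) / (n - 1) for node, links in neighbors.items()}

for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

In this toy network, “Mei” and “Wong” emerge as the best-connected nodes, which is exactly the kind of finding a reporter would then go verify with shoe-leather interviewing.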

News games are another important area of development, and I think that collaboration between journalists and game developers is going to lead to the emergence of multithreaded interactive non-fiction narratives. Another TCNJ colleague, Ursula Wolz, has been helping me think about the possibilities of this field for the last several years. In 2007, we published a paper and a Poynter.org post outlining our idea for a multi-threaded non-fiction storytelling engine. We’ve made progress since then, which I hope to be able to demonstrate in more detail in the coming months. For the moment, here is a very primitive example of a fictional multithreaded story that I wrote in Scratch using a simple storytelling engine that Wolz wrote for my interactive storytelling class last spring. (This was actually part of a larger collaboration supported by the CPATH distributed expertise project, which Wolz and I will be presenting, along with our Villanova colleagues, Tom Way and Lillian Cassel, at the SIGCSE conference next March.)


  1. Shaw, Donna, Pulimood, Sarah Monisha, and Lounsberry, Emilie. “The Gumshoe Project: A Model for Collaboration Between a Small College and a Large Newspaper.” Paper presented at the annual meeting of the Association for Education in Journalism and Mass Communication, The Denver Sheraton, Denver, CO, Aug. 4, 2010.
  2. (with U. Wolz) “Multi-threaded Interactive Storytelling for Literary Journalism.” The New Media Consortium Summer Conference 2007: Sparking Innovative Learning and Creativity, invited expanded paper, http://www.nmc.org/publications, pp. 38–45, 2007.
Posted in Civic media, Computational Thinking, Journalism, Research, Technology, The Craft of Writing.


My professional background is in public information, magazine journalism, blogging and journalism education. My current research is founded on the premise that democracy requires the broad participation of a computationally fluent citizenry. Civic media industries must reflect the communities they serve at the level of ownership, research and development, news gathering, presentation and community engagement. This adds greater urgency to the already critical need to broaden participation in computing. To that end, I have collaborated on curricular models for infusing computing into journalism education at both the scholastic and collegiate levels, and for promoting civic engagement in computer science education. My current interest is in exploring the potential of stochastic networks as an enhancement to social computing tools for broadening civic participation.
While most of this blog is devoted to my research in computational journalism and trends in journalism education, I occasionally do some storytelling of my own. This blog picks up where my other blogs, Professor Kim’s News Notes (http://professorkim.blogspot.com) and The Nancybelle Project (http://kimpearson.net/nancybelle.html) left off.