The basic, two-page document every data journalist should have read and signed. Crowdsourced here.

#ddj #ijf17 #perugiapledge



Initiated by Mirko Lorenz and Bahareh Heravi

What is this?

This is a new approach to crowdsource a manifesto of rules, advice and principles for the practice of data-driven journalism. Imagine you are just starting to work with data in a newsroom, or you find that you gradually do more and more data work. What are the basic rules for your work? What are the most important aspects, and what are the mistakes to avoid? Inspiration, good conduct or taboo. Or in short: What are the principles, approaches, practices and tools every data journalist should know about and adhere to? We leave the question of “signing” this to a later point, when it becomes clear whether we reach (a) a certain quality and (b) some consensus about this.


We are crowdsourcing a document of 1-2 pages here. Leave your mark, start contributing on page 2. The result will be presented and discussed during a panel at the International Journalism Festival in Perugia.

This document:

Share what you think is important for all data journalists, for data-driven journalism projects and data-driven newsrooms. It should be short, relevant, and memorable. A piece of advice. A relevant principle for any stage of working with data. We are collecting until April 2, 2017.  We are leaving this document open until we reach a satisfactory level of depth, guidance and contributions from the community.

A brief word about the scope: Two examples illustrate what we are trying to achieve here. First, the “Munich Declaration of the Duties and Rights of Journalists” from 1971, which was written specifically for journalists. Second, the “necessary and proportionate principles”, developed by a number of NGOs to define common rules for their communication. Q: What should be in a similar document written for the use of data in newsrooms?

How to start your sentence:

As a data journalist I …

Data journalism should …

A data journalist should always…

A newsroom using data must...


UPDATE March 30, 2017: We will ask data experts from around the world for contributions.

We encourage everyone contributing to leave their Twitter name under their contribution.

Plus, in preparation for the session at the International Journalism Festival we are going to structure this paper into topics, step by step.

The first iteration of this paper will be presented at our session at the International Journalism Festival in Perugia on Saturday, April 8, 2017, at 11:45 in the Hotel Brufani, Sala Priori.


START WRITING HERE (we encourage you to include your Twitter name)

This paper aims to be a starting point for any journalist or communicator to understand how data can be used to enhance stories. The goal is that after reading through this you have a better understanding of the scope and goals of this discipline.


In an increasingly connected world with constant information flows, data can be used in numerous ways: for comparison, for verification, for probing, to debunk false claims, and to break down big developments for regions, communities and single users.

Data-driven journalism is neither new nor fundamentally different from standard journalism practices: You decide on a topic, you research, you verify, you aim to write the best story possible.

On the other hand, data as a source is often not even considered as an element, even in stories where data findings are the core. The goal of data-driven journalism is to use data effectively, regularly and correctly.


Familiarize yourself with the history of the field. Good starting points are articles about William Playfair, Charles Joseph Minard, Florence Nightingale and John Snow. When reading their biographies and looking at their works, ask yourself: How did they overcome barriers, and why did their work with data have an impact?


Explore the work of modern thought leaders on data and data visualization: Read the books or blogs of Philip Meyer, who pioneered the use of social science methods for journalism and defined CAR (computer-assisted reporting). A free PDF of his book is available here. Understand why visualisations are better than just tables of numbers, based on Anscombe’s Quartet. For visualisations, must-reads are the books of Edward Tufte and Stephen Few. For the basics, if you are a starter, read the book by Dona M. Wong. Watch the TED talks of Hans Rosling. Try to regularly visit the blogs of Nathan Yau and Andy Kirk and the datadrivenjournalism.net website. Other experts or websites? Please add them here.
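The point about Anscombe’s Quartet can be checked in a few lines. The following is a minimal sketch in Python, using the four datasets as published by Anscombe in 1973. The helper names (pearson, summarize) are made up for this illustration. It shows that all four datasets share the same rounded summary statistics, although they look completely different once plotted.

```python
from statistics import mean, variance

# Anscombe's Quartet: four datasets with 11 (x, y) points each.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """The kind of summary a table of numbers would give you."""
    return {
        "mean_x": round(mean(xs), 2),
        "var_x": round(variance(xs), 2),
        "mean_y": round(mean(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, summary in summaries.items():
    print(name, summary)
```

The summaries come out identical after rounding. Only a chart reveals that one dataset is a curve, one is a straight line with a single outlier, and one has all but one point at the same x value.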

Finding data

Use the phone: Instead of just searching and getting frustrated by finding either too much or nothing at all (especially on deadline), pick up the phone. Trust that there are experts at many sources who are happy to help. Often calling those experts will also protect you against drawing conclusions too quickly, as they will point out both the strengths and the weaknesses of their specific data collections.

Big data sources: Simply going to the biggest data sources is one way to start. But to really use them you need to build up experience and knowledge about specific data sources, from the WHO and the World Bank data store to national statistics offices and aggregators. Do not just glance at what the big sources offer. Try to familiarize yourself with the specific structure by working - say - with WHO data for a few days and building a number of charts, maps or simply short insights extracted from the data.

Specialized data sources: Just as a reporter needs a good list of contacts to call for questions, interviews and tips on their beat, a data journalist should build up a specific list of data sources for a wide variety of topics. Some you might only need once, on occasion. Others are a treasure trove for ongoing, important reporting. Here are just a few examples, to show why knowing that such collections exist can give your reporting an edge:

For health topics, the European Centre for Disease Prevention and Control (ECDC) in Sweden reports on the spread of the Zika virus. It also publishes ongoing information about the equally threatening spread of antimicrobial resistance. Call them if you are new to this field. Do not get confused: there is a similarly named agency in the US, the Centers for Disease Control and Prevention (CDC).

Another example: Flightradar24 has live air traffic, including data.

Cleaning, checking, merging data: Familiarize yourself with Excel, which is the most widespread tool. To start, try to spend some time reading Excel tutorials. There is an abundance of those, either as blog posts or as videos on YouTube. Sometimes it helps to first view the videos and then work through the tutorials. Try to make a list of things you know how to do in Excel.

An alternative and by now equally powerful choice is Google Sheets. The key principles and a number of options are similar or even the same. Google Sheets has one big advantage: it is fully online, so you can use Sheets as a shared resource or even as a database.

Cleaning messy datasets:

OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. For example, if you see that one and the same country is in the database under multiple different names (e.g. “US”, “USA” and “United States”), you can quickly clean this up with Refine.

Use R, a powerful and free statistical software with a huge community. There are numerous examples of how R can be used by journalists.
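To make the country-name example concrete, here is a minimal sketch of the same clean-up idea in plain Python. This is not how OpenRefine works internally. The mapping table and the function name normalize_country are assumptions made up for this illustration.

```python
# Hypothetical lookup table: known spelling variants -> one canonical name.
CANONICAL = {
    "us": "United States",
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize_country(raw):
    """Return the canonical name, or the tidied input if the variant is unknown."""
    tidy = " ".join(raw.split())  # trim and collapse repeated spaces
    return CANONICAL.get(tidy.lower(), tidy)


rows = ["US", "usa ", "United  States", "UK", "France"]
cleaned = [normalize_country(r) for r in rows]
print(cleaned)
```

Unknown values such as “France” pass through unchanged, so nothing is silently rewritten. Always keep the original column next to the cleaned one so that every change can be checked.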

REGEX: For database queries and text searches, know that there is regex (regular expressions). An excellent, interactive tutorial is available at https://regexone.com/lesson/kleene_operators
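As a small taste of what regular expressions can do, here is a minimal sketch using Python’s built-in re module. The sample sentence and the figures in it are invented for this illustration. The pattern uses the “+” (one or more) and “*” (zero or more) operators to pull amounts with thousands separators out of free text.

```python
import re

# Invented sample text, as it might appear in a press release.
text = "The city spent 1,250,000 euros in 2016 and 980,500 euros in 2017."

# \d+          one or more digits
# (?:,\d{3})*  zero or more groups of a comma followed by three digits
# " euros"     only keep numbers that are followed by the currency word
pattern = re.compile(r"(\d+(?:,\d{3})*) euros")

matches = pattern.findall(text)
amounts = [int(m.replace(",", "")) for m in matches]
print(matches)
print(amounts)
```

The years 2016 and 2017 are skipped because they are not followed by “euros”. Real documents are messier, so treat any pattern as a first pass and spot-check the results by hand.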


Other: Every year at NICAR, Chrys Wu (@macdiva) compiles a list of sessions and tutorials. It might be a bit overwhelming, but if you want to advance beyond the basics, her annual collection is a rich and helpful resource.

Checking, validating, investigating: The suggestion is to treat learning how to use a spreadsheet program and learning data validation as separate tasks. Data validation means that you take a hard look at whether the data is actually correct and whether there are mistakes in it. For example: Calculate whether the sums of a budget are actually correct, which is simple. Try to find outliers and ask yourself whether they are “too good to be true”. The great thing about being a journalist is that both cases can lead to an important story: If you find a surprisingly positive development, it can lead to a story. The same goes for a blatant error, a false claim or a clearly negative development.
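The two checks just described, recalculating a sum and looking for outliers, can be sketched in a few lines of Python. All figures and names below are invented for this illustration. The outlier rule (more than three median absolute deviations from the median) is just one common rule of thumb, not a standard you must follow.

```python
from statistics import median

# Check 1: does the reported total match the sum of the line items?
budget = {"Education": 420, "Health": 310, "Transport": 150, "Culture": 45}
reported_total = 935  # invented figure, as stated in a hypothetical report

computed_total = sum(budget.values())
difference = reported_total - computed_total
print("computed:", computed_total, "reported:", reported_total, "gap:", difference)


# Check 2: which values sit suspiciously far from the rest?
def find_outliers(values_by_name, k=3):
    """Flag entries more than k median absolute deviations from the median."""
    values = list(values_by_name.values())
    mid = median(values)
    mad = median(abs(v - mid) for v in values)
    if mad == 0:
        return []  # no spread at all: this rule cannot flag anything
    return [name for name, v in values_by_name.items() if abs(v - mid) > k * mad]


# Invented unemployment rates (percent) per district.
rates = {"North": 5.1, "South": 4.8, "East": 5.3, "West": 5.0,
         "Centre": 4.9, "Harbour": 0.5, "Hills": 5.2}
suspicious = find_outliers(rates)
print("check these:", suspicious)
```

A flagged value is not proof of an error. It is a reason to call the source and ask whether the number is a typo, a different definition, or a real story.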

Approach data validation critically, but not cynically: Data collections are traces of human activity. If they are done wrong, they are wrong - which should be reported. If they are done well, they can help to visualize a context, development, an issue, an event better than words, interviews or photos in some cases.

Collection of input (to be integrated step-by-step above)


You can of course tweet your contribution. Please use these hashtags to do that:
#ddj #perugiapledge #ijf17


Final document:

The final document will be presented at the International Journalism Festival in Perugia in this session: Data Journalism globally: Where is it heading?

For questions please contact Mirko Lorenz or Bahareh Heravi.
