Yesterday I attended the Data and News Sourcing workshop co-organised by the Media Standards Trust and the BBC College of Journalism. Two sessions ran in parallel, and Martin Belam will no doubt write about the crowdsourcing news and crime data sessions I did not attend.
The first session was titled Open Government data, data mining and the semantic web. This is an area I have a degree of familiarity with, but it was interesting to hear stories of wrestling with data on a day-to-day basis, and about the general lack of journalism being done with the data published to date.
Alex Wood gave an interesting account of a BBC World Service data journalism project looking at the global occurrence of road accidents. Working initially with World Health Organisation data, Alex made the point that the initial dataset helps you ask the right questions but does not necessarily give you the final answer. Scatter plots quickly show anomalies in the data and raise questions about how it was collected and categorised. That is when you need to start talking to the people who understand and collect the data.
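As a rough sketch of that workflow (the file and column names here are invented, not the actual WHO dataset), a few lines of Python are enough to plot reported figures against estimates and spot the countries that fall far from the trend:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical extract of WHO-style data: one row per country, with
# officially reported road deaths and an independently estimated figure.
df = pd.read_csv("road_deaths.csv")  # assumed columns: country, reported, estimated

# Countries far from the diagonal are the anomalies worth chasing up
# with the people who collect and categorise the data.
plt.scatter(df["reported"], df["estimated"])
limit = df["reported"].max()
plt.plot([0, limit], [0, limit], linestyle="--")  # reported == estimated line
plt.xlabel("Reported road deaths")
plt.ylabel("Estimated road deaths")
plt.title("Reported vs estimated road deaths by country")
plt.show()
```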
Chris Taggart spoke of similar challenges when dealing with local data. As the founder of openlylocal.com he has dedicated a number of years to the task of collecting data about and from local councils and politicians. The messiness of the data, its varying formats and the lack of IDs to stitch datasets together mean openlylocal would not exist had passionate individuals not dedicated time and resources to it. Chris's most recent collaboration, OpenCorporates, represents a similar labour of love, and co-founder Rob McKinnon spoke of the challenge of stitching together datasets when governments and councils have no common notion of a corporation running through their data. Nigel Shadbolt was quick to point out that common addresses (URIs) to tie together disparate datasets are an important outcome of the data.gov.uk work and its embrace of an approach that works with the web.
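A minimal sketch (with invented data) shows why those shared identifiers matter: once two datasets agree on a common URI for a company, joining them is a one-liner; with only free-text names it is guesswork.

```python
import pandas as pd

# Invented council spending data: "B.B.C. Ltd" and "BBC Limited" could
# never be matched reliably by name alone, but the URI is unambiguous.
spending = pd.DataFrame({
    "supplier_name": ["B.B.C. Ltd", "Acme Widgets"],
    "company_uri": ["http://example.org/company/123",
                    "http://example.org/company/456"],
    "spend_gbp": [10000, 2500],
})

# Invented company registry keyed on the same URIs.
registry = pd.DataFrame({
    "company_uri": ["http://example.org/company/123",
                    "http://example.org/company/456"],
    "registered_name": ["BBC Limited", "Acme Widgets Limited"],
})

# With a common identifier the join is trivial.
print(spending.merge(registry, on="company_uri"))
```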
Aside from the challenges of collecting the data and shaping it into something meaningful, Alex emphasised the importance of telling stories with the data. Kevin Marsh (College of Journalism) made the interesting counterpoint that a story is not always needed. Newspapers have for years provided data alongside stories: weather information, TV listings and stock prices. Much of the value of a newspaper is shared between stories and pure information, and in a digital environment this is no different. In fact, the collection of a dataset like openlylocal can facilitate services that provide useful information at a very local and targeted level. This, it was suggested, is the modern equivalent of the local newspaper's role as information/data provider. Very local data has a potentially huge, and currently unrecognised, value to audiences.
It was clear from the discussion that working with data is an involved and time-intensive process. Perhaps this is why we have not seen more stories or applications come out of the initial successes of opening up data in the UK. Chris did question why organisations like the BBC were not biting his hand off to get access to the OpenCorporates dataset.
The second session was Expert sources in science and health. There was some interesting discussion of the use of expert sources in the media, and of the role organisations like the Science Media Centre play in ensuring expert scientists are available. Ben Goldacre raised the issue of transparency in journalism and how few stories link through to original sources such as research papers. He cited his long-running battle with the BBC to get them to link to sources from their science stories.
Mark Henderson of The Times spoke about the difficulties journalists face in linking to sources. Often, at the time of writing a story, a research paper will not yet be published online, or will be hard to find. Even if you do link to the source, publishers are notorious for changing the URLs of papers. Some of these issues have been addressed by digital object identifiers (DOIs), but these are not used consistently across the publishing community.
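The point of a DOI is that it decouples the citation from the publisher's URL: the central doi.org resolver redirects to wherever the paper currently lives. A minimal sketch (the DOI shown is the DOI Handbook's own):

```python
import requests

# doi.org redirects a DOI to the publisher's current location for the
# document, so a link via the DOI survives publisher URL reshuffles.
doi = "10.1000/182"
response = requests.get(f"https://doi.org/{doi}", allow_redirects=True)
print(response.url)  # wherever the document lives today
```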
The relative importance of communication and investigation to journalism was questioned. The panellists emphasised the importance, for the mainstream press, of communication: relaying the developments of science and the playing out of the scientific process, in contrast to pointing to experts as a source of facts. Investigative journalism in science is particularly difficult because it requires deep technical expertise in a given area. For this reason it was suggested that the professional blogger community is better placed to provide this analysis: bloggers often focus solely on their particular area of expertise, have the freedom to explore topics, and have the range of necessary contacts to draw upon.
It did occur to me that the challenges of investigative journalism in science are comparable to those of doing journalism with the datasets currently being opened up. It will take a community of passionate experts to interrogate, analyse and uncover stories in very complex and specialised datasets. Like the blogging community, that community will need the support of the mainstream media if it is to be sustainable and to encourage the best data journalism.
At the same time there is a role for technologies like Linked Data to reduce the cost of collecting and analysing data, and so make data-sourced journalism easier to do. A common theme across the sessions was the need for clear and persistent URLs: both for documents, to aid linking to sources, and for common things (like corporations), to enable the joining up of the data where the interesting stories lie. As Ben Goldacre said, the information architecture of journalism needs to be vastly improved.
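To make that concrete, here is a minimal Linked Data sketch using rdflib (the URIs and property names are invented for illustration): because two sources use the same URI for a corporation, their statements merge into one queryable graph.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")
company = URIRef("http://example.org/company/123")  # one shared URI for the company

g = Graph()
# A statement from a hypothetical company registry...
g.add((company, RDFS.label, Literal("BBC Limited")))
# ...and one from hypothetical council spending data.
g.add((company, EX.contractValue, Literal(10000)))

# The shared URI is what lets us ask questions across both sources.
for predicate, obj in g.predicate_objects(company):
    print(predicate, obj)
```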