In my last post I presented the case for using collections as an editorial layer on top of a metadata-driven site. One of the most common types of collection in online journalism is the list of links around a story – commonly referred to as link journalism.
Link journalism is linking to other reporting on the web to enhance, complement, source, or add more context to a journalist’s original reporting.
How can these collections of links be best used to serve the core principles of journalism?
The BBC’s use of external links to cite sources has been criticised for not linking to the original source of a story.
Paul Bradshaw has written an excellent post on the subject and makes the following point:
In an online environment one of the biggest signals in how we build a picture of the trustworthiness of someone or something is the links surrounding it. Who is that person friends with? What does this website link to? Who gathers here? What do they say? What else does this person do? What is their background, their interests, their beliefs?
I find the distinction between the curation of content and the curation of context very useful. Paul highlights the value of using links to place the story in its context as opposed to merely pointing to similar content about the same story. In addition it also puts the source referenced by the BBC into context by saying something about how the BBC regards it.
BBC Journalism currently uses several quite different strategies for linking to both related BBC stories and other sites on the web. The most common are the ‘see alsos’ and the ‘related internet links’ that appear on stories.
These links are picked by the journalist as related in some way to the story. Generally the links sit in a template that is reused for similar stories, so they tend to be fairly non-specific, often pointing to the home pages of sites rather than deep links to sources. They typically perform poorly in terms of click-throughs.
A different strategy is illustrated by the BBC Sport football gossip column. The column is created daily and provides an overview of the day’s football gossip. Short summaries of stories are written and then published with a link to the full story in the original source.
In comparison to the related internet links on story pages, a significant amount of BBC Journalism’s external referrals go through this one page. The gossip column is a testament to how external links can be used in a meaningful and useful way.
How do you retain a sense of editorial voice and craft as information architectures become increasingly metadata driven?
The step change was in creating a populated domain model for the games. The things that made up this vocabulary were used by journalists to tag their stories. The tagged stories were then aggregated automatically onto sports indexes. This allowed us to create many more indexes than would have been possible with manual management.
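A minimal sketch of that aggregation, assuming stories are simple records whose tags name things in the domain model (the stories, tags, and function names here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical tagged stories: each story carries the domain-model
# things the journalist tagged it with.
stories = [
    {"title": "Bolt storms to 100m gold", "tags": ["athletics", "100m", "usain-bolt"]},
    {"title": "Phelps wins eighth medal", "tags": ["swimming", "michael-phelps"]},
    {"title": "Relay team sets record", "tags": ["athletics", "4x100m-relay"]},
]

def build_indexes(stories):
    """Aggregate tagged stories onto one index per domain-model thing."""
    indexes = defaultdict(list)
    for story in stories:
        for tag in story["tags"]:
            indexes[tag].append(story["title"])
    return dict(indexes)

indexes = build_indexes(stories)
print(indexes["athletics"])
# → ['Bolt storms to 100m gold', 'Relay team sets record']
```

Every tag automatically yields an index – six indexes from three stories here – which is how the approach scales far beyond what manual management allows.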
Overall the project was a great success but it raised some interesting questions. The design of the indexes was created by the user experience team. The algorithms were written by developers and informed the ordering of the stories. This left journalists to simply tag stories and watch their stories appear on indexes they had no control over. It certainly felt like their influence on part of the product had moved a step away from them. This was reflected in journalists’ feedback and the frequent questions about how to game the system to control the order of stories on indexes.
So the questions are:
- How do you enable the journalists to feel in control of the storytelling?
- How do you do this without introducing tags for value judgements?
- How do you ensure that the site has voice and feels editorialised – as opposed to being simply lists of dynamically aggregated data?
Tom Scott has convinced me the answer is the concept of the collection (and variations on this theme). The collection replicates the manually managed index of stories with a structured list of things. The Wildlife finder example is David Attenborough’s favourite moments. A very simple example for sport might be the best goals of the World Cup. Although this does not seem particularly radical, the beauty of it is that the curatorial layer is built on top of a domain modelled approach.
Because the things that live in our model are associated with assets and data, the journalist, in selecting a thing to include in a collection, pulls data through the system.
Take the same example of the best goals of the World Cup. A journalist would select their top ten goals of the tournament. As the journalist pulls each goal through the system into the collection, the context around it is pulled with it: the game it was scored in, its importance, and information about the goal scorer’s record in the tournament.
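A minimal sketch of that pull-through, with an invented data store and collection builder – the field names are illustrative, not an actual production model:

```python
# Hypothetical domain model: each "thing" (here, a goal) already has
# context stored against it in the system.
GOALS = {
    "goal-42": {
        "scorer": "Maxi Rodriguez",
        "match": "Argentina v Mexico",
        "stage": "Round of 16",
        "scorer_tournament_goals": 2,
    },
    "goal-17": {
        "scorer": "Joe Cole",
        "match": "England v Sweden",
        "stage": "Group B",
        "scorer_tournament_goals": 1,
    },
}

def build_collection(title, thing_ids):
    """A journalist picks things; the context stored against each thing
    is pulled through into the collection automatically."""
    return {"title": title, "items": [GOALS[i] for i in thing_ids]}

best_goals = build_collection("Best goals of the World Cup", ["goal-42", "goal-17"])
```

The journalist only supplies the title and the ordered list of things; everything else in the collection comes from the domain model.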
Why it is not tagging:
It is important to distinguish the process of creating a collection from the act of tagging. Tagging associates content with things in the domain model. Journalists tagging stories ensure we build up a consistent mapping of the editorial content to the things (and/or concepts) in our domain.
The process of creating collections is closely tied to the editorial judgement of those curating them. Tagging clips with the tag ‘good goal’ and then anonymously aggregating them is not.
Why it empowers journalists:
The Guardian has found the balance in their topic pages by allowing an editor to pick a story to be displayed at the top of every automated page. But does this go far enough? This still sits very much within the document model of storytelling. What a collection (or similar) begins to allow is a true web adaptation of a news story.
It is the curatorial layer and the use of collections that will allow organisations to reflect voice, perspective and expertise. How this will improve the experience for the news reader will be the subject of this blog over the forthcoming months.
Could the means by which news organisations adapt their storytelling using tools like collections be the key to their ongoing survival?
I recently spoke at the News Linked Data Summit, a pan-news industry event looking at the potential of Linked Data. Martin Belam and the Media Standards Trust have already blogged about aspects of the day, but I wanted to add my slides and a perspective on the discussion.
A topic that interests me is the relationship between controlled vocabularies and Linked Data’s call for vocabularies that are, to steal a phrase from Tom Coates, ‘native to the web’.
Let’s look at it this way – if you were asked to create a web presence for an individual or organisation today you might propose the following:
- Make interesting documents public.
- Publish using web standards such as HTML.
- Provide useful information about the individual or organisation.
- Link to similar documents where you can.
- Then if the documents are useful and you are gracious in linking to others they will link back to you.
It is apparent that Linked Data asks the same of controlled vocabularies.
- Make your vocabularies public.
- Publish using the web standards of Linked Data.
- For each concept provide useful information for humans and machines.
- Link to other vocabularies (map concepts) where you can.
- If you have provided a useful set of concepts and relationships, others will link back to you, increasing the value of your vocabulary.
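As a sketch of what those steps produce, here is a tiny generator for a SKOS concept in Turtle – the vocabulary URI, the concept, and the DBpedia mapping are invented examples, not a real published vocabulary:

```python
def skos_concept(base, slug, label, exact_match=None):
    """Render one concept as Turtle (SKOS): its own URI, a human-readable
    label, and a mapping to another vocabulary where one is known."""
    lines = [
        f"<{base}/{slug}> a skos:Concept ;",
        f'    skos:prefLabel "{label}"@en',
    ]
    if exact_match:
        lines[-1] += " ;"
        lines.append(f"    skos:exactMatch <{exact_match}>")
    lines.append("    .")
    return "\n".join(lines)

PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"

# Hypothetical sport vocabulary, mapped out to DBpedia.
doc = PREFIX + skos_concept(
    "http://example.org/vocab/sport", "football", "Football",
    exact_match="http://dbpedia.org/resource/Association_football",
)
print(doc)
```

The `skos:exactMatch` link is the ‘link to other vocabularies’ step in miniature: it is what lets someone else’s data and yours meet at a shared concept.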
It could seem crazy at the moment to give away your taxonomy for free, but it would have been similarly difficult to convince an organisation to have a web presence ten or fifteen years ago.
Linked Data is already showing the benefits of this approach. When we open-source vocabularies we can be much more ambitious in the richness of relationships and complexity of structures. In my talk I mentioned that the wonderful Wildlife Finder would not have been feasible had the ontologies not been publicly available to use and build upon. A Wildlife Finder built on a far simpler BBC bespoke taxonomy of animals, habitats and behaviours would have been a far poorer and more costly proposition. Martin expands on this in his Guardian post.
Recently we have seen the likes of LCSH and New York Times vocabularies joining the Linked Data cloud and becoming web native vocabularies. I suspect the success and survival of many vocabularies will depend on how quickly their owners can grasp the importance of becoming open and native to the web.
This comment from Peter Krantz articulates the data publishing process and emphasises the role of vocabularies.
1. Publish whatever you have in whatever format it currently is in. This provides data for people to start tinkering with and ask…
2. While data is out there, start thinking about the context it lives in. We are looking at harmonizing the way agencies publish their vocabularies as a first step (e.g. OWL).
3. Gradually adapt your data to make it use common identifiers for…
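The third step – gradually adopting common identifiers – can be sketched as a simple rewrite from local identifiers to shared URIs; the mapping table, field names, and URIs below are invented:

```python
# Hypothetical mapping from an agency's local identifiers to common,
# shared URIs agreed across publishers.
SHARED_URIS = {
    "agency-dept-014": "http://example.gov/id/department/education",
}

def harmonise(record):
    """Swap a local identifier for the common URI where a mapping
    exists, keeping the local id for provenance."""
    local = record["department"]
    if local in SHARED_URIS:
        record = dict(record, department=SHARED_URIS[local], local_id=local)
    return record

row = harmonise({"department": "agency-dept-014", "spend": 120000})
```

The point of the gradual approach is visible here: records without a known mapping pass through untouched, so publishing never has to wait for the vocabulary work to finish.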
Having just recovered from last week’s London Linked Data meetup, I thought it was time to collect together the talks and commentary from the day.
‘The day was a storming success, with talks and presentations from all over the Linked Data community: from academia to startups. I think the organisers were slightly overwhelmed, because in the end there were nearly 200 people there, making use of the Talis-sponsored bar well into the evening. Apart from being a good opportunity to catch up with people, this meetup had the feeling of a guild-meet of Linked Data professionals—with lots of different perspectives over similar problems.’
Here are links to the presentations so far; I will add the rest as they become available:
Tom Scott / Yves Raimond (BBC)
The BBC, following the Linked Data principles, now publishes a URI for every TV and radio programme it broadcasts; this allows people to browse by schedule, genre, format and A–Z.
More recently we have published URIs for music artists, animal species and habitats – these pages not only provide useful information in their own right but also allow us to re-contextualise the programme information helping users to discover new content and new patterns.
Leigh Dodds (Talis)
This talk will introduce the dataincubator.org project which, supported by the Talis Connected Commons scheme, provides an umbrella project for publishing public domain linked data, with the aim of demonstrating to the original publishers the benefits of Linked Data, as well as a means to build on the community’s efforts. The talk will review the project and some of the datasets that have currently been made available.
Andrew Walkingshaw (Timetric)
“Time to build: storing, sharing and analysing statistics with Timetric, a Web-native service for managing numbers”
Timetric is a Web service which lets users upload, download, visualize and set up calculations on over a hundred thousand different measurements, the values of all of which are tracked over time. But how would you build that, and when you have, who’d want it? In this talk, we’ll discuss the lessons we’ve learned in building a service for sharing open data on the Web and in building a business around that service.
Michael Smethurst, Matthew Wood (BBC)
Georgi Kobilarov (Freie Universität Berlin / DBpedia)
“Integrating Linked Data”
Nigel Shadbolt (University of Southampton)
“Hard Research Challenges in the Web Of Linked Data: The EPSRC EnAKTinG Project”
The UK Engineering and Physical Sciences Research Council (EPSRC) has funded a three year two million pound project at the University of Southampton to investigate the challenges represented by the Web of Linked Data. Nigel Shadbolt and Tim Berners-Lee are two of the Principal Investigators on this project. In this brief presentation the projects aims and ambition will be outlined – together with progress to date.
Libby Miller (BBC / NoTube)
“Beancounter – telling you about you”
Increasing automation means that lots of data is available about what you do, including what you watch and listen to. This means that companies or researchers can mine information about your activities and use them to make predictions about what you might like, and what they might be able to sell you. Beancounter uses attention data from multiple sources, enhanced by linked data, to tell you what you are *really* interested in – rather than what you *think* you are interested in. It puts the control about what sources can be mined in your hands, and limits what companies can do with the outputs. Beancounter is a product of the NoTube EU project.’
Richard Cyganiak (DERI Galway)
“Sig.ma – Live Views on the Web of Data”
Increasing amounts of high-quality data are being published on the web of data, but a lack of applications for searching and browsing it makes access and exploration difficult. Sig.ma is a new user interface that improves upon previous ones by offering fine-grained control over source selection, fuzzy entity matching, and schema and value consolidation. Sig.ma is online at http://sig.ma/… and provides the fastest way yet to get an overview about the data available on a given topic.
Jun Zhao (University of Oxford)
“Linked Data for Connecting Medicine Knowledge”
Mischa Tuffield / Steve Harris (Garlik)
“Making FOAF useful: http://foaf.qdos.com/ “
Since the beginning of the Linked Data Movement, a fair chunk of the resolvable RDF found on the web has been FOAF data. This talk will involve a brief overview of what FOAF represents, a list of the services we provide, how we go about saving public and private FOAF data, whilst presenting insight into the technologies used to underpin the services on foaf.qdos.com.
Ian Millard (University of Southampton)
The RKBExplorer.com application provides a simple interface over multiple Linked Data sources to assist with the discovery and exploration of related activities with the academic research domain.
This talk will briefly summarise issues and experiences regarding interoperation of multiple sources, and outline some of the services we offer that can be used by all.
Panel: Government Data
Chair: Carol Tullo (Office of Public Sector Information)
Paul Miller (Cloud of Data)
Nigel Shadbolt (University of Southampton)
Mark Birbeck (webBackplane)
John Goodwin (Ordnance Survey)
‘It gave a good sense of what is happening at the moment with Linked Data and what the issues are. Tim Berners-Lee (inventor of the Web) and Nigel Shadbolt talked about the decision to prioritise UK government data within the Linked Data project – clearly it is of great value for a whole host of reasons, and a critical mass of data can be achieved if the government are on board, and also we should not forget that it is ‘our data’ so it should be opened up to us – public sector data touches all of us, businesses, institutions, individuals, groups, processes, etc.’
Thank you to Carol Tullo for doing such a good job of chairing the session.
Panel: Future of Journalism
Chair: Paul Bradshaw (Online Journalism)
Martin Belam (The Guardian)
John O’Donovan (BBC)
Leigh Dodds (Talis)
Tom Heath’s “Linked Data – The Story So Far” was a fantastic way to finish the evening and really captured the challenges that lie ahead.
One point that I thought was particularly interesting was the potential role of Linked Data for SEO.
Consuming Linked Open Data (LOD) can help you publish more URLs for things (for example a music artist or country). These nodes act as topical points of aggregation for resources on your site, but they also increase the surface area – the number of useful points of access – for search engines. In addition, Linked Data can help in scaling the cross-linking between nodes and resources, which is really the subject of the paper.
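A sketch of that first point – one URL per thing consumed from the LOD cloud, each acting as an aggregation node; the slugs, base URL, and `sameAs` targets are illustrative:

```python
# Things consumed from the Linked Open Data cloud, each anchored to an
# external URI (here, hypothetical DBpedia resources).
things = [
    {"slug": "muse", "sameAs": "http://dbpedia.org/resource/Muse_(band)"},
    {"slug": "germany", "sameAs": "http://dbpedia.org/resource/Germany"},
]

def page_urls(base, things):
    """One page URL per thing: each is a topical point of aggregation
    and an extra entry point for search engines."""
    return [f"{base}/{t['slug']}" for t in things]

urls = page_urls("http://example.org/topics", things)
```

Every new thing consumed from the cloud yields another crawlable, linkable page without any additional editorial effort.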