Collections part 3: You as a collection

Posted on March 20, 2010
Filed Under information architecture, journalism | Leave a Comment

We express our identities through our collections. Online, these collections take the form of Amazon wishlists, Last.fm playlists and lists of friends on Facebook. Perhaps less consciously, we also leave behind search histories, purchase profiles and a trail of cookies picked up from website visits.

In his book Pull, David Siegel posits a future where our personal details are consolidated in a private space on the web:

Your personal data locker will store your personal ontology. It helps you find television shows and movies, it helps you learn about wines you might enjoy, it helps you find bargains online, plan a trip, find events you might want to attend, or spot a new restaurant, and it can help with dating life if you’re single. Hook it in to your everyday activities and you’ll build an ontology with millions of triples, all of which make your data locker into a ‘smart’ virtual assistant that continues to learn as you go through the day.

Few news organisations have attempted to bridge this gap between the news story and our personal profiles. The New York Times is perhaps the exception: it takes a user’s LinkedIn account, looks at their area of work and then serves contextual stories and ads related to it.

In some respects SEO (and the optimising of keywords in story titles) could be considered a crude attempt by news organisations at mapping stories to the profiles (keyword search patterns) of their intended audience. We have recently seen a move away from SEO effort in the news industry in favour of building more meaningful relationships with loyal customers.

I suspect that with time we will see effort focused on mapping the model of the news domain to the domain of the user (the personal data locker), relating the context of the story to the things of importance in our world: the topics, events, work, people and hobbies.
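As a rough sketch of what that mapping might look like, assume the data locker is a list of subject-predicate-object triples and that each story is tagged with things from the news domain model. Every name here (the `Story` class, the predicates, the identifiers) is invented for illustration, and counting overlaps is only the crudest possible notion of relevance.

```python
from dataclasses import dataclass

# A triple is (subject, predicate, object). All identifiers are made up.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Story:
    headline: str
    things: frozenset[str]  # things in the news domain model the story is about


def interests(locker: list[Triple], person: str) -> set[str]:
    """Collect every thing the person is linked to in their data locker."""
    return {obj for subj, _pred, obj in locker if subj == person}


def rank_stories(stories: list[Story], locker: list[Triple], person: str):
    """Order stories by how many of their things overlap the person's interests."""
    wanted = interests(locker, person)
    scored = [(story, len(story.things & wanted)) for story in stories]
    relevant = [pair for pair in scored if pair[1] > 0]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data.
locker = [
    ("me", "worksIn", "architecture"),
    ("me", "supports", "arsenal"),
    ("me", "enjoys", "cycling"),
]
stories = [
    Story("Arsenal sign new striker", frozenset({"arsenal", "football"})),
    Story("Cycling architects redesign the commute", frozenset({"architecture", "cycling"})),
    Story("Budget day live", frozenset({"economy"})),
]
ranked = rank_stories(stories, locker, "me")
```

The point of the sketch is that the story and the reader are described with the same kind of identifiers, so relating one to the other becomes a lookup rather than a guess based on keywords.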

Collections part 2: Collections of things

Posted on March 19, 2010
Filed Under information architecture | Leave a Comment

The initial impetus for writing this series of posts was the increasing presence of information architectures driven by metadata and the impact this has on editorial curation.

How does moving from a document-focused view of the world to a thing-focused view change the role of the collection?

We took Wildlife Finder as our example. Wildlife Finder is built upon a domain modelled approach and dynamically aggregates content and data around the ‘things’ in the model. Collections can then be used to build editorial layers on top.  As Tom Scott points out:

Collections allow us to curate a set of resources – to group and sequence clips and other resources to tell stories like the plight of the tiger or the year’s work of the BBC’s natural history unit.

Tom goes on to say that releasing the data for Wildlife Finder means that “our audiences and ‘users’ could also build stories”.

Perhaps the most striking example of how user-created collections can be used to tell stories is the use of data filtering tools such as Parallax and Microsoft Pivot.

In Pivot’s own words:

In short, datasets are organized as collections. Results can be as granular or as big-picture as the user desires, and correlations and patterns are easy to see and examine through powerful but simple visualizations. Imagine browsing through thumbnails representing Kiva loans, then sorting the loans by the different types of businesses they helped establish.

In order for Pivot to work, datasets need to be in a certain format. I suspect that Linked Data will lend itself to these types of tools, and that products like Wildlife Finder, which have focused on curating context as opposed to curating content, will benefit greatly.

Collections part 1: Collections of links

Posted on March 17, 2010
Filed Under information architecture, journalism | Leave a Comment

In my last post I presented the case for the use of collections as an editorial layer on top of a metadata driven site. One of the most common types of collection in online journalism is the list of links around a story, commonly referred to as link journalism.

Link journalism is linking to other reporting on the web to enhance, complement, source, or add more context to a journalist’s original reporting.

Scott Karp

How can these collections of links be best used to serve the core principles of journalism?

The BBC’s use of external links to cite sources has been criticised for not linking to the original source of a story.

Paul Bradshaw has written an excellent post on the subject and makes the following point:

In an online environment one of the biggest signals in how we build a picture of the trustworthiness of someone or something is the links surrounding it. Who is that person friends with? What does this website link to? Who gathers here? What do they say? What else does this person do? What is their background, their interests, their beliefs?

While we increasingly talk about the role of publishers as curators of content [caveat], we should perhaps start thinking about how publishers are also curators of context.

I find the distinction between the curation of content and the curation of context very useful. Paul highlights the value of using links to place the story in its context as opposed to merely pointing to similar content about the same story. Linking also puts the source referenced by the BBC into context, by saying something about how the BBC regards it.

BBC Journalism currently use several quite different strategies for linking to both related BBC stories and other sites on the web. The most common are the ‘see alsos’ and the ‘related internet links’ that appear on stories.

related links on the BBC News site

These links are picked by the journalist as related in some way to the story.  Generally the links sit in a template that is reused for similar stories so they tend to be fairly non-specific, often linking to home pages of sites rather than deep links to sources.  They typically perform poorly in terms of click-throughs.

A different strategy is illustrated by the BBC Sport football gossip column. The column is created daily and provides an overview of the day’s football gossip. Short summaries of stories are written and then published with a link to the full story in the original source.

By comparison with the related internet links on story pages, a significant share of BBC Journalism’s external referrals goes through this one page. The gossip column is a testament to how external links can be used in a meaningful and useful way.

The importance of curation in a metadata driven information architecture

Posted on March 6, 2010
Filed Under Semantic Web, information architecture, journalism | 5 Comments

How do you retain a sense of editorial voice and craft as information architectures become increasingly metadata driven?

In my work with BBC Journalism we have been attempting to take the philosophy of Tom Scott’s Wildlife Finder and apply it to News and Sport. Our starting point has been the Winter Olympics.

The step change was in creating a populated domain model for the games. The things that made up this vocabulary were used by journalists to tag their stories. The tagged stories were then aggregated automatically onto sports indexes. This allowed us to create many more indexes than would have been possible with manual management.
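To make the mechanics concrete, here is a minimal sketch of tagging and automatic aggregation. It is not the BBC's implementation: the `Story` class, the tag identifiers and the sample headlines are invented, and ordering each index newest-first is an assumption standing in for whatever the real algorithms did.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Story:
    headline: str
    published: datetime
    tags: list[str]  # identifiers of things in the domain model


def build_indexes(stories):
    """Group stories under every thing they are tagged with, newest first."""
    indexes = defaultdict(list)
    for story in stories:
        for tag in story.tags:
            indexes[tag].append(story)
    for tagged in indexes.values():
        # Assumed ordering rule: most recently published story at the top.
        tagged.sort(key=lambda s: s.published, reverse=True)
    return dict(indexes)


# Invented sample stories and tags.
stories = [
    Story("Downhill report", datetime(2010, 2, 15, 21, 0), ["alpine-skiing", "switzerland"]),
    Story("Curling preview", datetime(2010, 2, 16, 9, 0), ["curling", "great-britain"]),
    Story("Super-G preview", datetime(2010, 2, 18, 8, 0), ["alpine-skiing"]),
]
indexes = build_indexes(stories)
```

Every thing in the vocabulary gets an index for free, which is why far more indexes become possible than with manual management. The journalist's only lever, though, is the list of tags.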

Overall the project was a great success but it raised some interesting questions. The design of the indexes was created by the user experience team. The algorithms were written by developers and informed the ordering of the stories.  This left journalists to simply tag stories and watch their stories appear on indexes they had no control over. It certainly felt like their influence on part of the product had moved a step away from them. This was reflected in journalists’ feedback and the frequent questions about how to game the system to control the order of stories on indexes.

So the questions are:

  • How do you enable the journalists to feel in control of the story telling?
  • How do you do this without introducing tags for value judgements?
  • How do you ensure that the site has voice and feels editorialised – as opposed to being simply lists of dynamically aggregated data?

Tom Scott has convinced me the answer is the concept of the collection (and variations on this theme). The collection replicates the manually managed index of stories with a structured list of things. The Wildlife Finder example is David Attenborough’s favourite moments. A very simple example for sport might be the best goals of the World Cup. Although this does not seem particularly radical, the beauty of it is that the curatorial layer is built on top of a domain modelled approach.

Because the things that live in our model are associated with assets and data, the journalist, in selecting a thing to include in a collection, pulls data through the system.

Take the same example of the best goals of the World Cup. A journalist would select their top ten goals of the tournament. As the journalist identifies and pulls things through the system into the collection, the context around those goals is pulled with them: the game they were scored in, the importance it had and information about the goal scorer’s record in the tournament.
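A small sketch of that idea follows, under some loud assumptions: the `Goal` and `Collection` classes, the `resolve` function and all of the sample data are hypothetical, and a real domain model would hold far richer context than three fields.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A thing in the domain model, already linked to its context."""
    scorer: str
    match: str
    stage: str


@dataclass
class Collection:
    title: str
    curator: str
    # (thing id, editorial note) pairs, kept in the curator's chosen order.
    items: list = field(default_factory=list)

    def add(self, thing_id, note):
        self.items.append((thing_id, note))


def resolve(collection, model):
    """Expand each curated entry with the context the model holds for it."""
    tally = Counter(goal.scorer for goal in model.values())
    resolved = []
    for rank, (thing_id, note) in enumerate(collection.items, start=1):
        goal = model[thing_id]
        resolved.append({
            "rank": rank,
            "note": note,
            "scorer": goal.scorer,
            "match": goal.match,
            "stage": goal.stage,
            "scorer_goals_in_tournament": tally[goal.scorer],
        })
    return resolved


# Placeholder data, not real fixtures or players.
model = {
    "g1": Goal("Player A", "Team X v Team Y", "group"),
    "g2": Goal("Player B", "Team X v Team Z", "semi-final"),
    "g3": Goal("Player A", "Team X v Team W", "final"),
}
best_goals = Collection("Best goals of the World Cup", "A journalist")
best_goals.add("g3", "The goal that won the tournament")
best_goals.add("g2", "A volley from the edge of the area")
resolved = resolve(best_goals, model)
```

The journalist supplies only the selection, the sequence and the note. Everything else in each resolved entry comes from the model.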

Why it is not tagging:

It is important to distinguish the process of creating a collection from the act of tagging. Tagging associates content with things in the domain model. Journalists tagging stories ensure we build up a consistent mapping of the editorial content to the things (and/or concepts) in our domain.

The process of creating collections is closely tied to the editorial judgement of those curating them. Tagging clips with a tag such as ‘good goal’ and then anonymously aggregating them is not.

Why it empowers journalists:

The Guardian has found the balance in their topic pages by allowing an editor to pick a story to be displayed at the top of every automated page. But does this go far enough? This still sits very much within the document model of storytelling. What a collection (or similar) begins to allow is a true web adaptation of a news story.

It is the curatorial layer and the use of collections that will allow organisations to reflect voice, perspective and expertise.  How this will improve the experience for the news reader will be the subject of this blog over the forthcoming months.

Could the means by which news organisations adapt their story telling using tools like collections be the key to their ongoing survival?

News Linked Data Summit and the call for native to the web vocabularies

Posted on January 25, 2010
Filed Under Semantic Web, information architecture, journalism | Comments Off

I recently spoke at the News Linked Data Summit, a pan-news industry event looking at the potential of Linked Data. Martin Belam and the Media Standards Trust have already blogged about aspects of the day but I wanted to add my slides and a perspective on the discussion.

A topic that interests me is the relationship between Linked Data and controlled vocabularies and, to steal a phrase from Tom Coates, Linked Data’s call for vocabularies that are ‘native to the web’.

Let’s look at it this way – if you were asked to create a web presence for an individual or organisation today you might propose the following:

  1. Make interesting documents public.
  2. Publish using web standards such as HTML.
  3. Provide useful information about the individual or organisation.
  4. Link to similar documents where you can.
  5. Then if the documents are useful and you are gracious in linking to others they will link back to you.

It is apparent that Linked Data asks the same of controlled vocabularies.

  1. Make your vocabularies public.
  2. Publish using the web standards of Linked Data.
  3. For each concept provide useful information for humans and machines.
  4. Link to other vocabularies (map concepts) where you can.
  5. If you have provided a useful set of concepts and relationships others will link back to you, increasing the value of your CV.
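As an illustration of steps 2 to 4, here is a sketch that publishes a single concept as SKOS in Turtle. SKOS is one common choice for expressing controlled vocabularies as Linked Data, not the only one. The `Concept` class and `to_turtle` function are hypothetical helpers, the example.org URI is a placeholder, and the DBpedia mapping is there purely to show the shape of a link to another vocabulary.

```python
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    uri: str          # a public, dereferenceable identifier for the concept
    label: str        # useful information for humans
    definition: str
    exact_matches: list = field(default_factory=list)  # links to other vocabularies


def to_turtle(concept):
    """Serialise one concept as Turtle. Literals are not escaped in this sketch."""
    statements = [
        "a skos:Concept",
        f'skos:prefLabel "{concept.label}"@en',
        f'skos:definition "{concept.definition}"@en',
    ]
    statements += [f"skos:exactMatch <{uri}>" for uri in concept.exact_matches]
    body = " ;\n    ".join(statements)
    return f"@prefix skos: <{SKOS}> .\n\n<{concept.uri}>\n    {body} .\n"


# Placeholder concept; the URIs are illustrative only.
tiger = Concept(
    uri="http://example.org/vocab/tiger",
    label="Tiger",
    definition="A large cat native to Asia",
    exact_matches=["http://dbpedia.org/resource/Tiger"],
)
turtle = to_turtle(tiger)
print(turtle)
```

The `skos:exactMatch` statement is the vocabulary equivalent of an outbound hyperlink: it is what lets others find your concept from theirs, and the other way round.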

It could seem crazy at the moment to give away your taxonomy for free, but it would have been similarly difficult to convince an organisation to have a web presence ten or fifteen years ago.

Linked Data is already showing the benefits of this approach. When we open-source vocabularies we can be much more ambitious in the richness of relationships and complexity of structures. In my talk I mentioned that the wonderful Wildlife Finder would not have been feasible had the ontologies not been publicly available to use and build upon. A Wildlife Finder built on a far simpler bespoke BBC taxonomy of animals, habitats and behaviours would have been a far poorer and more costly proposition. Martin expands on this in his Guardian post.

Recently we have seen the likes of LCSH and New York Times vocabularies joining the Linked Data cloud and becoming web native vocabularies. I suspect the success and survival of many vocabularies will depend on how quickly their owners can grasp the importance of becoming open and native to the web.

News Linked Data Summit – BBC News and Linked Data

Update

This comment from Peter Krantz articulates the data publishing process and emphasises the role of vocabularies.

1. Publish whatever you have in whatever format it currently is in. This provides data for people to start tinkering with and ask questions about.
2. While data is out there, start thinking about the context it lives in. We are looking at harmonizing the way agencies publish their vocabularies as a first step (e.g. OWL).
3. Gradually adapt your data to make it use common identifiers for common things.