In particular I found the following quote from Dan Conver compelling:
“[The] raw material of this information economy is essentially like oil shale: the latent value is obvious, but the cost of extracting these information resources from today’s existing deposits (think web archives) is so high given today’s technology that no one is going to spend a dime to start the project.”
Stijn comments further on this point:
“…Both approaches [emphasis on structured news formats, and rock solid metadata at the story level] wish to extract more value from journalism through structure and relationships. Both approaches have you trade a little hurt during content creation for yet-to-materialize advantages. That’s unavoidable — no such thing as a free lunch.”
Essentially, this means annotating news articles with controlled vocabularies. I see the potential impact of the semantic web slightly differently, though I do not disagree in principle that annotating journalistic output is a useful activity. I think perhaps too much emphasis is placed on the extraction of knowledge from editorial assets. I believe the oil shale of journalism is the by-product of the process itself.
The Guardian is a case in point. I get the impression that the datablog started out with a hunch that it might be of interest to publish some of the spreadsheets Guardian journalists collected and curated in the process of writing stories. What has been particularly remarkable is that the datablog's success has probably been greater than that of the Open Platform. Why? Because it gave access to something that had not been available before.
These spreadsheets, to me, are the true oil shale of journalism.
In my opinion, the part the semantic web and linked data play is no more than reducing the cost, and increasing the ease, of using these data sets in a useful way. I have written before about how we have used semantic web technologies at the BBC to build websites. Combining BBC editorial assets with commercial data and open data sources enabled the BBC to do things it would never have dreamed of doing with internally managed data sets or bespoke taxonomies.
By using linked data techniques and simple tools like Google Refine (with the DERI RDF extension), it would be relatively simple to map the datablog spreadsheets to common RDF vocabularies and identifiers. These data sets could then be used to add context and navigation, and to weave new narrative threads through the Guardian’s editorial output, or anyone else’s for that matter, in much the same way that Wildlife Finder has used open data sets like DBpedia to support the delivery of BBC wildlife programmes.
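To make the mapping concrete, here is a minimal sketch of the idea, not a real datablog pipeline: a couple of rows from a made-up spreadsheet are turned into RDF triples (N-Triples syntax), with countries identified by DBpedia resource URIs so that other data sets can link to the same things. The `gdp_per_capita` column, the figures, and the `example.org` property namespace are all invented for illustration.

```python
import csv
import io

# Hypothetical excerpt from a datablog-style spreadsheet (made-up numbers).
CSV_DATA = "country,gdp_per_capita\nIceland,38000\nNorway,52000\n"

# Reuse shared identifiers (here, DBpedia resources) rather than minting
# private ones, so other data sets can point at the same thing.
DBPEDIA = "http://dbpedia.org/resource/"
# Placeholder property namespace; a real mapping would reuse an
# established vocabulary where one exists.
EX = "http://example.org/stat/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def row_to_ntriples(row):
    """Map one spreadsheet row to a list of N-Triples statements."""
    subject = f"<{DBPEDIA}{row['country']}>"
    return [
        f'{subject} <{RDFS_LABEL}> "{row["country"]}" .',
        f'{subject} <{EX}gdpPerCapita> "{row["gdp_per_capita"]}" .',
    ]


triples = []
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    triples.extend(row_to_ntriples(row))

print("\n".join(triples))
```

Because the subjects are shared URIs rather than local spreadsheet keys, the output can be loaded into a triple store alongside any other data set that also talks about `dbpedia.org/resource/Iceland`, which is the point of the exercise.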
The main issue for the semantic web and linked data is the cost incurred by the current lack of expertise, the barrier to learning new (and in places complicated and unintuitive) things, and the relative immaturity of the technologies. With time this will change, and the savings from easier integration of disparate data sets, together with the value of mining the ‘raw material of this information economy’, will justify the costs.