Media meets the Semantic Web
Posted on July 2, 2009
Filed Under Semantic Web | Leave a Comment
Georgi and I presented a jointly written (BBC, DBpedia and Rattle) paper at the European Semantic Web Conference a couple of weeks ago. My half of the presentation is available on SlideShare.
One point that I thought was particularly interesting was the potential role of Linked Data for SEO.
Consuming Linked Open Data (LOD) can help you publish more URLs for things (for example a music artist or a country). These nodes act as topical points of aggregation for resources on your site but also increase the surface area (the number of useful points of access) for search engines to get at. In addition, Linked Data can help in scaling cross-linking between nodes and resources, which is really the subject of the paper.
Web-scalable narratives
Posted on November 23, 2008
Filed Under Semantic Web, information architecture | 4 Comments
As we build larger and larger websites it becomes increasingly difficult to scale meaningful user journeys. Success is dependent on identifying your key user journeys (narrative structures) and ensuring these can be dynamically populated as the site grows.
Some of the largest and most successful websites have taken simple narrative structures and made them scale successfully. In the mold of the fairytale “once upon a time” and “they all lived happily ever after” these sites have come to own their simple narrative structures and this has played a significant part in their success. Some familiar examples:
- Customers Who Bought This Item Also Bought - noun (book) verb (also bought) noun (book)
- Buy it now - noun (user) verb (buy) noun (item)
- Such and such wrote on your Wall - noun (friend) verb (wrote on) noun (wall)
These simple noun-verb-noun narratives should be familiar and are very much part of the brand of these sites. This is a result of them getting these narratives to scale and ensuring there is the quality of data to back them up.
Now, in order to make sure these narratives are applied consistently as the site accumulates content, these structures need to be understood by your application. This means the noun-verb-noun structures must be encoded into your domain model (and so your database) from the outset. Designing the site in this way means that as new content, pages and data are added to the site these narrative structures will be automatically created. This guarantees new pages are incorporated into the site and automatically become a scene in the site’s larger story.
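As a rough illustration of what encoding a noun-verb-noun structure into a domain model could look like, here is a minimal Python sketch. The class names (`Noun`, `NarrativeStore`), the `also_bought` method and the sample customers and books are all hypothetical, invented for this example; no real site is claimed to work this way.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Noun:
    """A thing on the site, such as a customer or a book (illustrative only)."""
    kind: str
    name: str


@dataclass
class NarrativeStore:
    """Holds noun-verb-noun statements and derives a narrative from them."""
    statements: set = field(default_factory=set)

    def add(self, subject: Noun, verb: str, obj: Noun) -> None:
        # Each statement is a (noun, verb, noun) triple; duplicates collapse in the set.
        self.statements.add((subject, verb, obj))

    def also_bought(self, item: Noun) -> Counter:
        """Count the other items bought by customers who bought `item`."""
        buyers = {s for s, v, o in self.statements if v == "bought" and o == item}
        return Counter(
            o for s, v, o in self.statements
            if v == "bought" and s in buyers and o != item
        )


# Hypothetical sample data: three customers and three books.
store = NarrativeStore()
alice, bob, carol = (Noun("customer", n) for n in ("Alice", "Bob", "Carol"))
book_a, book_b, book_c = (Noun("book", n) for n in ("Book A", "Book B", "Book C"))

for customer, book in [
    (alice, book_a), (alice, book_b),
    (bob, book_a), (bob, book_b), (bob, book_c),
    (carol, book_c),
]:
    store.add(customer, "bought", book)

# Every new "bought" statement automatically feeds the same narrative.
print(store.also_bought(book_a).most_common())
```

The point of the sketch is that the narrative is a query over the model rather than hand-made links, so each new statement extends it without any editorial work.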
Weak and strong narrative structures
As we move from flat published pages to large dynamically created sites we need to think more and more about the primary narrative structures. These user journeys will be encoded into the very core of the site and you will want to be confident you have selected the right ones and that there is the data to back them up.
One of the strengths of the BBC News site is its contextual navigation with strong narrative. For example a BBC News story about Kosovo will carry an explicit user journey to the background story of the independence of Kosovo. This is in contrast to tags. Tags help to open up new user journeys but are weak in narrative, taking the form ‘this content is about this tag’. Related links also often fall into this category of weak narrative. One of the problems with rich narrative structures is that they are difficult to scale, which poses a significant challenge.
Web-scale narratives
When George Lucas was looking for a narrative structure for the beginning of his Star Wars films he used a well understood simple narrative structure, ‘once upon a time’.
He knew that this would be something that his audience would immediately understand.
The dream of the Semantic Web project follows a similar logic. Take the simple narrative structures that have been so successful in creating user journeys within large scalable websites and apply them to the web at large. This means narratives (in the form of domain models and ontologies) that are not limited to a single site: not just ‘people who bought this on Amazon also bought this’ but ‘people who bought this on the web also bought this’. These are web-scale narrative structures. They will not only help create more coherent user journeys across the web but also provide more structure to help machine understanding.
Who killed the networked fridge?
Posted on November 8, 2008
Filed Under Semantic Web | 1 Comment
One of the most memorable parts of the Euro IA conference was Adam Greenfield’s comment during his keynote regarding the networked fridge.
“Unless anyone here works for Philips, I’m fairly certain that nobody in this room wants or will ever buy a networked fridge.”
http://www.currybet.net/cbet_blog/2008/09/euroia2008_part1.php
Fair point, but I wanted to revisit the concept with regard to the big challenge of this century: climate change and energy conservation. Thomas Friedman’s book Hot, Flat and Crowded is a nice summary of some of the issues and possible solutions.
The problem:
If we continue on our current path CO2 levels will double (to 560ppm) around the mid-century and will triple by 2075. A situation we have not been in for 650,000 years. We don’t know what it will be like to live in a 560ppm CO2 world let alone an 800ppm one.
“So now we have a target: We want to avoid the doubling of CO2 by mid-century, to do it we need to avoid emission of 200 billion tons of carbon as we grow between now and then.”
Thomas Friedman
Solutions:
Friedman identifies a number of targets that need to be met. One of them is to cut electricity use in homes, offices, and stores by 25%. One way this might be achieved, according to Friedman, is to become more intelligent about energy use and to develop an Energy Internet. Energy distribution and consumption is currently stuck in the 1950s and has failed to embrace the IT revolution.
The concept of an Energy Internet was originally conceived in an Economist article:
“Energy visionaries imagine a “self-healing” grid with real-time sensors and “plug and play” software that can allow scattered generators or energy-storage devices to attach to it. In other words, an energy internet.”
http://www.economist.com/science/tq/displaystory.cfm?story_id=E1_NQSGJRR
Amongst other things this would mean more intelligent appliances in the home that can negotiate their energy needs with the grid as well as communicating to the homeowner the worst offenders in growing energy bills. Friedman imagines what it might be like to live with a smart grid.
“..an Energy Internet in which every device - from light switches to air conditioners, to basement boilers, to car batteries and power lines and power stations - incorporate microchips that could inform your utility of the energy level at which it was operating, take instructions from you or your utility as to when it should operate and at what level of power, and tell your utility when it wanted to purchase or sell electricity. You and your utility now have two-way communications.”
Thomas Friedman
So the smart fridge is not dead but it just won’t be doing the weekly shop for us, it will be helping save the planet (or at least your energy bill).
Clearly ubiquitous computing is closely tied to the Semantic Web. Until machines can parse the web on our behalf we are stuck with large screens so that we can parse the data for them. The smart fridge will need Semantic Web technologies and so link into a larger body of data about our energy use; where it comes from (clean or dirty), how it is being used in the home and the damage it is doing. Targeted advertising will illustrate how new, more efficient, appliances will impact our energy use and so on into the graph…
Currently we have little context to understand our energy use and context is king when it comes to education and driving real changes in behaviour. Perhaps the Energy Internet and the tackling of one of the big problems will be the making of the Semantic Web project.
Update: The Talis people have written a nice post about semweb and the home: http://blogs.talis.com/nodalities/2008/12/smart-stuff.php
URLs for Information Architects
Posted on November 8, 2008
Filed Under information architecture | Comments Off
Deanna Marbeck and I presented recently at the EuroIA conference in Amsterdam. The research has come on since then, building on the anecdotal evidence we presented. We hope to publish a white paper later in the year with the full research findings.
Wikipedia as controlled vocabulary
Posted on July 28, 2008
Filed Under Semantic Web, information architecture | 2 Comments
Chris Sizemore and I gave this presentation a while back at the Essentials of Metadata and Taxonomy event. The presentation looked at the use of Wikipedia as a source of controlled vocabulary.
Chris covers most of the issues we discussed in his post. But one thing we did not cover is the interesting way Wikipedia handles categories. I have tried to find discussion about this approach with no success. It will be an interesting issue to raise in the ISKO mail group.
Rather than making one entry a parent of another, as we might do in a taxonomy, Wikipedia treats categories (groupings of concepts) as a completely different type of entry. This means we can have groups like Dog-related_professions_and_professionals without falling into the trap of treating them like concepts (entries) in their own right.
The two activities of defining a concept and grouping associated concepts are separated. The result of this is that Wikipedia entries remain topics of interest that are discrete and clearly defined. This makes them ideal for sanity checking that our choices of topic aggregation page are discrete and clearly defined, as discussed in the previous ‘what makes a good topic aggregation page?’ post.
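The separation described above can be sketched as a small data model in which a grouping is a different type from a concept. This is only an illustration: the `Entry` and `Category` classes, the `topic_candidates` function and the sample titles are assumptions made for this example, not Wikipedia’s actual schema or membership data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """A discrete, clearly defined concept (a candidate topic)."""
    title: str


@dataclass
class Category:
    """A grouping of entries. It is deliberately not an Entry itself."""
    name: str
    entries: set = field(default_factory=set)
    subcategories: list = field(default_factory=list)


def topic_candidates(category: Category) -> set:
    """Collect entries from a category and its subcategories.

    Categories are only ever traversed, never returned, so a grouping
    can never be mistaken for a concept in its own right.
    """
    found = set(category.entries)
    for sub in category.subcategories:
        found |= topic_candidates(sub)
    return found


# Hypothetical membership, for illustration only.
professions = Category(
    "Dog-related_professions_and_professionals",
    entries={Entry("Dog trainer")},
)
dogs = Category("Dogs", entries={Entry("Dog")}, subcategories=[professions])

print(sorted(entry.title for entry in topic_candidates(dogs)))
```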
Update: Bob Bater has written a nice post on this topic on the ISKO UK blog.
Conclusion? The Wikipedia categorization system reflects but does not consistently apply the principles of KO as expounded in the formal literature. It is nevertheless interesting because it might well represent what results when folksonomy meets formal KO and agrees to a compromise.
http://iskouk.wordpress.com/2008/09/22/wikipedias-approach-to-categorization/
What makes a good topic aggregation page?
Posted on July 28, 2008
Filed Under information architecture, psychology and psychotherapy | Comments Off
Topic-based aggregation pages are all the rage at the moment. David Weinberger comments on the addition of the Huffington Post to the list of those adopting content aggregation. At the BBC we have also started to create our own Topic Pages. These pages are primarily designed to attract traffic from search engines or via browsing through contextual navigation.
In order to fulfill these goals I have been thinking about what makes a good topic for aggregation? To answer this question I went back to have a look at Donna Maurer’s discussion of basic level categories. Most of the introduction below comes from her presentation.
What are basic level categories?
How to spot a basic level category:
• A basic level category is somewhere in the middle of a hierarchy and is cognitively basic
• It is the level that is learned earliest
• Usually has a short name and is used frequently
• Highest level at which a single mental image can reflect the category
• There is no definitive basic level for a hierarchy; it is dependent on the audience
Why are basic level categories relevant to Topic Pages?
Basic level categories have some characteristics that make them of interest to Topic Pages:
• Things are remembered more readily at basic level.
• People name things more readily at basic level.
• Languages have simpler names at basic level.
Donna explains…
In short, people naturally, at a deep cognitive level, deal easier with basic level categories.
It is important to understand that basic level categories are not just easier on a superficial level, because they are shorter or something. Cognitive scientists say that basic level categories are cognitively real. They seem to be ingrained in the human mind somehow, in a way that makes it easier for us to deal with basic level categories.
The fact that people name things more readily at basic level could lead us to hypothesize two things:
• When constructing a keyword search users are more likely to use basic level categories.
• Users are more likely to visit links labeled with basic level categories.
If this is the case then when selecting candidate Topic Pages we should look to select ones that sit at a basic level.
Basic level categories are good for SEO
Topic Pages are specifically designed to attract traffic from search engine queries and drive users to BBC content, ideally ranking highly for certain keywords in external engines. So they need to match users’ keywords as much as possible. The process of matching search keywords to pages is well documented in SEO and called keyword optimisation.
Selecting what level of granularity to create Topic Pages at is something we have been exploring in the range of Topics selected for the beta launch. A little research should help us identify the keywords that sit at the basic level of granularity. I attempted to do this by looking at the average UK keyword searches for a series of topics.
Taking a range of related concepts from different levels of a hierarchy, we might expect search volume to increase as we move up the tree, because each term is more general and incorporates more concepts. The theory of basic level categories challenges this and predicts that the middle, basic level concept will be the most used.
Look at this example taken from Google UK on 17 July 2008.
| Keyword | Average searches per month |
|---|---|
| animals | 1,830,000 |
| dogs | 3,350,000 |
| spaniel | 301,000 |
Here the keyword dogs holds the middle ground between abstraction and detail. I think this example illustrates that search logs could be used to assist in the identification of basic level categories.
Of course this is not always the case:
| Keyword | Average searches per month |
|---|---|
| furniture | 16,600,000 |
| chairs | 2,740,000 |
| chippendale | 12,100 |
In this case the more abstract keyword is far more popular. This does not necessarily make it a better candidate for a Topic Page; for example, it might be a difficult Topic to train because it is hard to define.
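The comparison in the two tables above could be run over a search log in a few lines of Python. This is a minimal sketch, not the method actually used for the Topic Pages: the function name `most_searched_level` is made up for this example, and it assumes each hierarchy is supplied as a list ordered from most general to most specific. The figures are the ones quoted in the tables.

```python
def most_searched_level(hierarchy):
    """Find the most searched keyword and where it sits in the hierarchy.

    `hierarchy` is a list of (keyword, average_monthly_searches) pairs
    ordered from most general to most specific. Returns the winning
    keyword and its position: "top", "middle" or "bottom".
    """
    index, (keyword, _) = max(enumerate(hierarchy), key=lambda pair: pair[1][1])
    if index == 0:
        position = "top"
    elif index == len(hierarchy) - 1:
        position = "bottom"
    else:
        position = "middle"
    return keyword, position


# Figures from the tables above (Google UK, 17 July 2008).
animals = [("animals", 1_830_000), ("dogs", 3_350_000), ("spaniel", 301_000)]
furniture = [("furniture", 16_600_000), ("chairs", 2_740_000), ("chippendale", 12_100)]

for hierarchy in (animals, furniture):
    print(most_searched_level(hierarchy))
```

A "middle" result suggests the peak sits at a basic level; a "top" result flags a popular super-category that needs the extra editorial consideration discussed here.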
As a rule popular search keywords indicate suitable Topics and in the majority of cases the more popular search keywords will sit at a basic level.
In the instances where super-categories are more popular, consideration must be given to the extra problems this might cause the Topic Page, for example editorial overhead and awkward navigation when the Topic is used as a link label.
Basic level categories are good for navigation
In addition to being good SEO practice, I would also suggest that selecting basic-level categories could improve user navigation. An example might be on contextual links where we present the name of a Topic as the link label. This assumption is based on information foraging theory.
Jared Spool has applied information foraging theory to web design with what he calls the “scent of information.” In order to efficiently and effectively forage for information on the web, seekers need to have a sense of where they are going. The design of a navigation system should provide users with an accurate “scent” that they can follow to their destination.
We have argued that basic level categories are more easily grasped than higher level categories. An example might be the categories cars (basic) as compared to vehicles (super). With the category vehicles you have a less clear sense, scent or image of what is included in this group. This could result in a lower click through rate of users from links to Topics.
Basic level categories are good for editorial staff
Having monitored a number of Topic Pages for the last couple of months we are beginning to get a sense of what makes an easy to manage topic.
An easy to maintain Topic is:
• specific
• easily defined
• discrete
An example of an awkward concept would be crime. A Topic like this makes it hard for editorial staff to judge whether a piece of content should or should not be included. This is because it contains a list of concepts that will vary depending on who you ask, and this list is liable to change over time. Just as conceptually awkward categories cause problems for users, they present a similar problem to the editorial staff maintaining the pages.
This is apparent if you think about selecting a set of representative documents for training a Topic Page about dogs as opposed to a Topic Page about crime. As a Topic dogs is easy to train and easy to review. Content is fairly clearly about dogs or not. This is again an example of the benefits of selecting basic-level categories.
Conclusion
On reflection Topics like elected assemblies do not look like such a good idea. The individual elected assemblies as Topics would have made much better pages.
| Keyword | Average searches per month |
|---|---|
| Elected assemblies | 73 |
| welsh assembly | 2,740,000 |
Not only are they unlikely to be searched for but they are also confusing to users when shown as contextual links. In addition they are difficult to maintain for editorial staff, again because of their conceptual awkwardness.
The following recommendations should be considered when selecting a suitable topic for an aggregation page.
Pick Topics that sit at a basic level between detail and abstraction. These concepts can be identified through the following criteria:
• frequently used in keyword searches
• represent a discrete concept
• are easily defined
Wittgenstein’s Philosophical Investigations
Posted on February 10, 2008
Filed Under philosophy | Comments Off
The title of this blog comes from Wittgenstein’s Philosophical Investigations, in which he discusses the nature of language. I thought this quote was interesting, particularly in the light of David Weinberger’s Everything Is Miscellaneous, which calls for a rethink regarding our obsession with classification and hierarchy.
It will be possible to say: In language we have different kinds of word. For the functions of the word “slab” and the word “block” are more alike than those of “slab” and “d”. But how we group words into kinds will depend on the aim of the classification, and on our own inclination.
The types of associations and the members included in a group will depend on the ‘aim’ of this grouping. The classical model would suggest that we discover categories; they are in some way natural, necessary and objective. Wittgenstein challenges this model and encourages us to look at the context of the language use to understand the meaning of a categorisation.
For me this reinforces the point that any categorisation scheme is only really meaningful with consideration of its intended use and this is essentially what this blog is about.