Wikipedia as controlled vocabulary

Posted on July 28, 2008
Filed Under Semantic Web, information architecture | 2 Comments

Chris Sizemore and I gave this presentation a while back at the Essentials of Metadata and Taxonomy event. The presentation looked at the use of Wikipedia as a source of controlled vocabulary.

Chris covers most of the issues we discussed in his post. But one thing we did not cover is the interesting way Wikipedia handles categories. I have tried to find discussion about this approach with no success. It will be an interesting issue to raise in the ISKO mail group.

Rather than making one entry a parent of another, as we might in a taxonomy, Wikipedia treats categories (groupings of concepts) as a completely different type of entry. This means we can have groups like Dog-related_professions_and_professionals without falling into the trap of treating them like concepts (entries) in their own right.

The two activities of defining a concept and grouping associated concepts are separated. As a result, Wikipedia entries remain topics of interest that are discrete and clearly defined. This makes them ideal for sanity checking that our choices of topic aggregation page are themselves discrete and clearly defined, as discussed in the previous ‘what makes a good topic aggregation page?’ post.
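
To make the separation concrete, here is a minimal sketch in Python. It is not how Wikipedia or MediaWiki stores its data; the class names, the helper function and the two member titles are all hypothetical, chosen only to illustrate the idea that a grouping is a different type of thing from a concept.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Entry:
    """A concept: one discrete, clearly defined topic (hypothetical model)."""
    title: str


@dataclass
class Category:
    """A grouping of entries. It is deliberately NOT an Entry."""
    name: str
    members: Set[Entry] = field(default_factory=set)
    subcategories: List["Category"] = field(default_factory=list)

    def all_entries(self) -> Set[Entry]:
        # Collect entries from this category and everything beneath it.
        # Assumes the category graph has no cycles; real data may need a guard.
        found = set(self.members)
        for sub in self.subcategories:
            found |= sub.all_entries()
        return found


def is_candidate_topic(title: str, entries: Set[Entry]) -> bool:
    """Sanity check: only entries (concepts) qualify as topic pages."""
    return any(entry.title == title for entry in entries)


# Illustrative data only; the member titles are made up for this sketch.
dog_jobs = Category(
    "Dog-related_professions_and_professionals",
    members={Entry("Dog trainer"), Entry("Dog groomer")},
)
topics = dog_jobs.all_entries()
```

With this model, `is_candidate_topic("Dog trainer", topics)` is true, while the category name itself never passes the check, because a category is never added to the set of entries. That is the trap the separate category type avoids.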

Update: Bob Bater has written a nice post on this topic on the ISKO UK blog.

Conclusion? The Wikipedia categorization system reflects but does not consistently apply the principles of KO as expounded in the formal literature. It is nevertheless interesting because it might well represent what results when folksonomy meets formal KO and agrees to a compromise.

http://iskouk.wordpress.com/2008/09/22/wikipedias-approach-to-categorization/

Comments

2 Responses to “Wikipedia as controlled vocabulary”

  1. Katharine on October 23rd, 2008 10:40 am

    This is fascinating – and engaging with information retrieval issues in a way that much academic work (at least as evidenced in peer reviewed journals) is not. However, I must say I am uncomfortable with the idea that wikipedia becomes a de facto standard controlled vocabulary on the web for a number of reasons:
    1. Should we be creating standards out of tools which were not intended to be used for that purpose?
    2. In my opinion, conceptual categories and the names we give to them are deeply connected to the world-views that underlie them. By even accepting that there can be a universal controlled vocabulary, we are imposing a structure created out of what is actually not a huge community.
    3. Which is my third point, there are a massive number of passive users on the web who are not wikipedia authors and do not link to their content (perhaps because they are not assured that social computing is a good means of ensuring accuracy and authority, a position I am in myself) – is it wise to accept a ‘universal controlled vocabulary’ which has been created by a small self-selecting group whose primary aim is not making information more easily available and better organised?

  2. Fran on November 4th, 2008 7:24 pm

    Being free to use means that people are going to use Wikipedia in any way they can, even when it only does the job averagely well. I agree that there is a real danger it becomes dominant and imposes its viewpoint in all sorts of ways. However, I am torn, because this sort of thing happens all the time (Microsoft, Google, the BBC itself) and if Wikipedia’s “freeness” helps people to be innovative, that is probably a good thing. I do worry that we may have reached “peak Wikipedia” – like “peak oil” – it may be a finite resource. If all the people willing to contribute charitably decide they have other more urgent things to attend to (like getting paid for their time now their pensions have been swallowed up in the credit crunch) it could grow less and less reliable while it becomes more and more embedded. The longer it keeps going, the harder it will be to replace it with anything bespoke or higher quality (i.e. more expensive), especially if it has prompted a total de-professionalisation of the reference/information provision industries.