12 thoughts on “Explicit and implicit metadata”

  1. “but rather ‘controlled vocabularies + social tagging = better cataloging’.”

    So another possible solution is “better controlled vocabulary”, which is a bit different than “better cataloging” because while both are done by ‘catalogers’, CV maintenance happens at a different point in the process.

    If you’re doing social tagging from a CV, but there’s no term in the CV for “RDF”, then… where are we?

    Perhaps there should be a term for RDF. And/or perhaps the concept “Semantic Web” should have a lead-in term “RDF”, or a related term, or something to say that items on “Semantic Web” are likely to be on RDF too.

    Perhaps the controlled vocabulary itself should be maintained by a more ‘social’ method (i.e., more open to non-certified volunteers), so that someone could add a term for RDF and/or add those relationships and lead-in terms. How do we keep the CV reasonably sensible while opening up its maintenance to more non-certified volunteers, or even to the general public at large?

    (Incidentally, I’m still interested in the idea of considering the list of Wikipedia article topics to BE a controlled vocabulary of concepts, something I thought of before. What if you could tag an item in the library not with free text, but with any concept that exists in Wikipedia?)

    And of course software should get better at using these controlled vocabularies. If “Semantic Web” DID have a lead-in term “RDF”, then when you enter RDF in the catalog, the catalog should expand your search to include “Semantic Web” as well (either automatically or by offering you the option).
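    A minimal sketch of that kind of search expansion, assuming a toy vocabulary in which each preferred term lists its lead-in terms and related terms. The dictionary layout and the function name `expand_query` are hypothetical illustrations, not any catalog system's actual API.

```python
# Hypothetical toy controlled vocabulary: preferred term -> lead-in and related terms.
VOCABULARY = {
    "Semantic Web": {"lead_in": ["RDF", "Linked Data"], "related": ["Ontologies"]},
    "Ontologies": {"lead_in": [], "related": ["Semantic Web"]},
}


def _add(terms: list[str], candidate: str) -> None:
    """Append candidate unless a case-insensitive duplicate is already present."""
    if candidate.casefold() not in (t.casefold() for t in terms):
        terms.append(candidate)


def expand_query(term: str, include_related: bool = False) -> list[str]:
    """Return the user's term plus any preferred terms it leads into.

    Matching is case-insensitive. With include_related=True, the related
    terms of each matched preferred term are added as well.
    """
    wanted = term.casefold()
    expanded = [term]
    for preferred, entry in VOCABULARY.items():
        lead_ins = [t.casefold() for t in entry["lead_in"]]
        if wanted == preferred.casefold() or wanted in lead_ins:
            _add(expanded, preferred)
            if include_related:
                for related in entry["related"]:
                    _add(expanded, related)
    return expanded


print(expand_query("RDF"))  # ['RDF', 'Semantic Web']
print(expand_query("rdf", include_related=True))  # ['rdf', 'Semantic Web', 'Ontologies']
```

    Whether the catalog runs the expanded search silently or only offers the extra terms as a suggestion is a user-interface choice; the lookup is the same either way.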

    1. Jonathan, you are right again, with “better controlled vocabulary”. But you could also consider this a part of “better cataloging”. But, yes, maybe the controlled vocabularies aren’t flexible enough?

      Your point about “tagging an item in the library not with free text, but with any concept that exists in Wikipedia”, well, that is Linked Data!

  2. I assume that like social tagging, you are thinking of automatic metadata generation as a complement to cataloging by human specialists? Especially if the goal is “applying intelligent analysis to objects, of all types.”

    1. Shannon, good question. Well, yes, that would be an option, especially as long as the analysing software is not intelligent enough. But suppose the software was perfect, or let’s say better than human catalogers, we wouldn’t need humans anymore, would we?

    1. Bryan, thanks! So there are actually differences in controlled vocabularies worldwide. Apparently the LoC Subject Headings Authority file is better in this respect than the Dutch National Authority file.
      Note that your link points to the recent Linked Data/RDF interface to the LoC Authority files!

  3. What about catalogue enrichment with tables of contents? We scan the ToCs of all new books and build search indexes over the OCR output. ToCs give more relevant information than a full-text search over the entire text of the book.

    1. Hi Anette, interesting idea! A kind of “concise” full-text index. Can we try that? What is the URL of your catalogue?
      It only works if you have ToCs, of course 😉
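    The ToC enrichment described above amounts to building an inverted index over a short, high-signal piece of text instead of the whole book. A minimal sketch, assuming the OCR step has already produced plain-text ToC lines per record; the record IDs and chapter titles are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical OCR output: record ID -> table-of-contents text.
TOCS = {
    "rec-001": "1 Introduction\n2 Linked Data basics\n3 Publishing RDF",
    "rec-002": "1 Cataloging rules\n2 Subject headings\n3 Authority files",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(tocs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of records whose ToC contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, toc in tocs.items():
        for word in tokenize(toc):
            index[word].add(record_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return records whose ToC contains every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(TOCS)
print(search(index, "linked data"))  # {'rec-001'}
print(search(index, "RDF headings"))  # set()
```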

  4. As a cataloger myself, I appreciate that you regard controlled vocabulary subject headings and social tagging as complementary. Too many want to replace one with the other — and I completely agree that social tagging could add to a record, but never truly replace controlled subject headings.

    I do think that there are greater problems with the use of social tagging than many people realize. The optimism is based on the assumption that people are interested in everything — yet what you need is a large enough group, interested in a specific topic, enough of whom are interested in tagging, to reach a large enough grouping of tags in order to actually add value. ONE person adding a tag to a record is not necessarily as valuable as ten persons adding the same or similar tags. Yet many items/records will have only that ONE tag.
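    The point about one tag versus ten can be made concrete by counting how many distinct users applied each normalized tag and only surfacing tags that pass a threshold. A minimal sketch; the threshold of 3, the normalization (case-folding and collapsing whitespace) and the sample data are illustrative assumptions, not a recommendation.

```python
from collections import defaultdict

# Hypothetical tagging events for one record: (user, tag as typed).
TAGGINGS = [
    ("alice", "Semantic Web"),
    ("bob", "semantic  web"),
    ("carol", "Semantic web"),
    ("bob", "semantic web"),  # same user again: counted only once
    ("dave", "RDF"),
]


def normalize(tag: str) -> str:
    """Case-fold the tag and collapse runs of whitespace."""
    return " ".join(tag.casefold().split())


def supported_tags(taggings, min_users: int = 3) -> dict[str, int]:
    """Return normalized tags applied by at least min_users distinct users."""
    users_per_tag: dict[str, set[str]] = defaultdict(set)
    for user, tag in taggings:
        users_per_tag[normalize(tag)].add(user)
    return {
        tag: len(users)
        for tag, users in users_per_tag.items()
        if len(users) >= min_users
    }


print(supported_tags(TAGGINGS))  # {'semantic web': 3}
```

    Setting `min_users=1` accepts every single-user tag; where to put that threshold is exactly the trade-off discussed in this thread.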

    I actually attended a presentation at LC a few years ago (dates blur) by a librarian from the World Bank. She was cataloging a lot of electronic documents for their collection, and as a result they had developed a piece of software that would search the text and suggest subject headings or keywords. That sounds a lot like your dream idea at the end of the post. It also takes care of the fulltext-is-not-keyword problem, as explained here: http://scienceblogs.com/bookoftrogool/2009/08/please_dont_do_this_a_word_abo.php

    I do know that LCSH is working toward the possibility of social tagging enriching LCSH subject headings. But I think the technicalities of it, as well as the level of human involvement, are still being discussed. My awareness of this is based on a recent chat, not thorough knowledge.
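    Suggesting headings from the text itself, rather than treating every full-text word as a keyword, can be approximated by counting occurrences of thesaurus phrases and mapping them to preferred headings. A minimal sketch under that assumption; it is not a description of the World Bank software mentioned above, and the thesaurus entries and sample text are made up.

```python
import re
from collections import Counter

# Hypothetical thesaurus: phrase found in text -> preferred subject heading.
THESAURUS = {
    "semantic web": "Semantic Web",
    "linked data": "Semantic Web",
    "rdf": "Semantic Web",
    "controlled vocabulary": "Subject headings",
    "subject heading": "Subject headings",
}


def suggest_headings(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Count thesaurus phrases in the text and rank their preferred headings."""
    lowered = text.lower()
    counts: Counter = Counter()
    for phrase, heading in THESAURUS.items():
        # Word boundaries keep "rdf" from matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[heading] += hits
    return counts.most_common(top_n)


sample = (
    "This report explains Linked Data and RDF. "
    "It also reviews the subject heading lists used in the catalog."
)
print(suggest_headings(sample))  # [('Semantic Web', 2), ('Subject headings', 1)]
```

    A cataloger would still review the ranked suggestions; the sketch only narrows down the candidates.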

    1. Thanks, mpol. I agree that, if social tagging is enabled in a catalog, you will probably see groups of experts (virtual communities, “tribes”) emerging who will take care of this within their specific subject area. This could solve the problem of “lagging” controlled vocabularies and of librarians unable to keep up with rapid developments.

      But I also think that one person’s contribution may be as valuable as a large group’s. In my example of “semantic web”, if you searched for the subject heading “semantic web” in a Dutch catalog, you would not find anything, because the official controlled vocabulary term is “Semantisch web”. If a single end user added the “semantic web” tag, this would benefit everybody.

      Also, if we want to attract more young people to our catalogs and libraries, then it is definitely time to allow social tagging on our systems. A lot of terms in the Dutch National Subject Headings controlled vocabulary seem very archaic, even to me! Next generations of end users would not think of using words like that to search for information, because they use completely different (modern) terminology. So it is either allow social tagging or completely revolutionise the way libraries use controlled vocabularies, if we do not want our catalogs to lose out to Google.

  5. Automatic metadata generation: great, but already very hard to do with text. It does enable context-relevant indexing, for example when you use a specialized thesaurus to do it.
    With other material this does not work very well, unless you are interested in any picture containing a lot of red. (Could be interesting if you are looking for horror movies.)
