CommonPlace.Net

Library2.0 and beyond
  • Linked Data for Libraries

    Posted on June 19th, 2009 Lukas Koster 17 comments
    Linked Data and bibliographic metadata models

    © PhOtOnQuAnTiQuE

    Some time after I wrote “UMR – Unified Metadata Resources”, I came across Chris Keene’s post “Linked data & RDF : draft notes for comment”, “just a list of links and notes” about Linked Data, RDF and the Semantic Web, put together to start collecting information about “a topic that will greatly impact on the Library / Information management world”.

    While reading this post and working my way through the links on that page, I started realising that Linked Data is exactly what I had tried to describe as “One single web page as the single identifier of every book, author or subject”. I did mention the Semantic Web, URIs and RDF, but the term “Linked Data” as a separate methodology had escaped me.

    The concept of Linked Data was described by Tim Berners-Lee, the inventor of the World Wide Web. Whereas the World Wide Web links documents (pages, files, images), which are basically resources about things (“Information Resources” in Semantic Web terms), Linked Data (or the Semantic Web) links raw data and real life things (“Non-Information Resources”).

    There are several definitions of Linked Data on the web, but here is my attempt to give a simple definition of it (loosely based on the definition in Structured Dynamics’ Linked Data FAQ):

    Linked Data is a methodology for providing relationships between things (data, concepts and documents) anywhere on the web, using URIs for identifying, RDF for describing and HTTP for publishing these things and relationships, in a way that they can be interpreted and used by humans and software.

    I will try to illustrate the different aspects using some examples from the library world. Because of the nature of the subject the article is rather long, although the individual sections are fairly short. I also supply a lot of links for further reading.

    Data is relationships
    The important thing is that “data is relationships”, as Tim Berners-Lee says in his recent TED presentation.
    Before going into relationships between things, I have to point out the important distinction between abstract concepts and real life things, which are “manifestations” of the concepts. In object modelling these are called “classes” (abstract concepts, types of things) and “objects” (real life things, or “instances” of “classes”).

    Examples:

    • the class book can have the instances/objects “Cloud Atlas”, “Moby Dick”, etc.
    • the class person can have the instances/objects “David Mitchell”, “Herman Melville”, etc.
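To make the distinction concrete, here is a minimal Python sketch. `Book` and `Person` are hypothetical classes for illustration only, not part of any RDF vocabulary.

```python
from dataclasses import dataclass


# Classes: abstract concepts, types of things (hypothetical, for illustration)
@dataclass(frozen=True)
class Book:
    title: str


@dataclass(frozen=True)
class Person:
    name: str


# Objects: real life things, each one an instance of a class
cloud_atlas = Book("Cloud Atlas")
moby_dick = Book("Moby Dick")
mitchell = Person("David Mitchell")
melville = Person("Herman Melville")

print(isinstance(cloud_atlas, Book))  # True
print(isinstance(mitchell, Book))     # False
```

None of these objects says anything yet about who wrote which book; in the triple model that is expressed as a separate relationship.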

    The Semantic Web/RDF model uses the concept of triples to describe a relationship between two things: subject – predicate – object. In the broadest sense this means that one thing has a relation to another thing:

    • a book (subject) is written by (predicate) a person (object)

    You can also reverse this relationship:

    • a person (subject) is the author of (predicate) a book (object)
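A triple can be sketched in Python as a plain 3-tuple. In this minimal sketch plain strings stand in for the URIs a real system would use, and the `INVERSE` table of predicate pairs is a hypothetical helper; plain RDF does not infer reversed relationships on its own.

```python
# A triple is (subject, predicate, object). Plain strings stand in for URIs.
Triple = tuple[str, str, str]

# Hypothetical table of predicate pairs that express the same link in reverse
INVERSE = {
    "is written by": "is the author of",
    "is the author of": "is written by",
}


def reverse(triple: Triple) -> Triple:
    """Swap subject and object and use the inverse predicate."""
    subject, predicate, obj = triple
    return (obj, INVERSE[predicate], subject)


written_by = ("Cloud Atlas", "is written by", "David Mitchell")
print(reverse(written_by))  # ('David Mitchell', 'is the author of', 'Cloud Atlas')
```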
    Triple

    The person in question is only an author because of his or her relationship to the book. The same person can also be a mother of three children, an employee of a library, and a speaker at a conference.
    Moreover, and this is important: there can be more than one relationship between the same two classes or types of things. A book (subject) can also be about (predicate) a person (object). In this case the person is a “subject” of the book, which can be described by a “keyword”, “subject heading”, or whatever term is used. A special case would be a book written by someone about himself or herself (an autobiography).

    The problem with most legacy systems, library catalogues among them, is that a record for, let’s say, a book contains one or more fields for the author (or at best a link to an entry in an authority file or thesaurus), and separately one or more fields for subjects. This way it is not possible to see books written by an author and books about the same author in one view without all kinds of workarounds, link resolvers or mash-ups.
    Using two different relationships that link to the same thing would provide a view that actually represents the real world situation.
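A minimal sketch of that single view, with hypothetical identifiers such as `book:1` and `person:melville`: because the author relationship and the subject relationship both point at the same person identifier, one pass over the triples finds books by and books about that person.

```python
# Hypothetical catalogue data: one person identifier, two kinds of links.
triples = [
    ("book:1", "written_by", "person:melville"),
    ("book:2", "written_by", "person:melville"),
    ("book:3", "about", "person:melville"),
    ("book:4", "written_by", "person:mitchell"),
]


def books_related_to(person: str, data: list) -> dict:
    """Group every book linked to `person` by the kind of relationship."""
    view: dict = {}
    for subject, predicate, obj in data:
        if obj == person:
            view.setdefault(predicate, []).append(subject)
    return view


print(books_related_to("person:melville", triples))
# {'written_by': ['book:1', 'book:2'], 'about': ['book:3']}
```

An autobiography would simply show up under both `written_by` and `about`.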

    Another important option of Linked Data/RDF: a certain thing can have as a property a link to a concept (or “class”), describing the nature of the thing: “object Cloud Atlas” has type “book”; “object David Mitchell” has type “person”; “object Cloud Atlas” is written by “object David Mitchell”.

    And of course, the property/relationship/predicate can also link to a concept describing the nature of the link.
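In a minimal sketch, the type of a thing is just one more triple. `rdf:type` and `rdf:Property` are real RDF terms (written here as prefixed strings); the `obj:`, `class:` and `rel:` identifiers are hypothetical. The last triple shows the predicate itself being described.

```python
# "rdf:type" and "rdf:Property" are real RDF terms, written as prefixed
# strings; the obj:, class: and rel: identifiers are hypothetical.
graph = {
    ("obj:CloudAtlas", "rdf:type", "class:Book"),
    ("obj:DavidMitchell", "rdf:type", "class:Person"),
    ("obj:CloudAtlas", "rel:writtenBy", "obj:DavidMitchell"),
    # The relationship itself is a thing that can be described too
    ("rel:writtenBy", "rdf:type", "rdf:Property"),
}


def types_of(thing: str, g: set) -> set:
    """All classes that `thing` is declared to be an instance of."""
    return {o for s, p, o in g if s == thing and p == "rdf:type"}


print(types_of("obj:CloudAtlas", graph))  # {'class:Book'}
print(types_of("rel:writtenBy", graph))   # {'rdf:Property'}
```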

    Anywhere on the web

    ERD

    So far so good. But you may argue that this relationship theory is not very new. Absolutely right, but up until now this data-relationship concept has mainly been applied inwardly, within the domain of one specific information system, because of the nature and the limitations of the available technology and infrastructure.

    The “triple” model is of course essentially the same as the long standing methodology of Entity Relationship Diagrams (ERD), which describe relationships between entities (= “classes”). An ERD is typically used to generate a database that contains the data of a specific information system. But ERDs could just as well be used to describe Linked Data on the web.

    Information systems, such as library catalogs, have been, and still are, for the most part closed containers of data, or “silos” without connections between them, as Tim Berners-Lee also mentions in his TED presentation.

    Lots of these silo systems are accessible through web interfaces, but this does not mean that items in these closed systems with dedicated web front ends can be linked to items in other databases or web pages. Of course these systems can have APIs that allow system developers to create scripts that get related information from other systems and incorporate that external information in the search results of the calling system. This is what is being done in web 2.0 with so-called mash-ups.
    But in this situation you need developers who know how to write scripts for all the different proprietary APIs supported by the individual systems.
    If Linked Data were a global standard and all open and closed systems and websites supported RDF, then all these links would be available automatically to RDF-enabled browsers and client software, using SPARQL, the RDF query language.

    • Linked Data/RDF can be regarded as a universal API.
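The “universal API” idea rests on one uniform query model. The toy matcher below is not SPARQL; it only illustrates the basic idea of matching a triple pattern with variables (marked with `?`, as in SPARQL) against a set of triples, using hypothetical data.

```python
def match(pattern, triples):
    """Yield one dict of variable bindings per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable already bound to a different value
            elif term != value:
                break
        else:
            yield binding


data = [
    ("book:CloudAtlas", "dc:creator", "person:DavidMitchell"),
    ("book:MobyDick", "dc:creator", "person:HermanMelville"),
]
print(list(match(("?book", "dc:creator", "person:DavidMitchell"), data)))
# [{'?book': 'book:CloudAtlas'}]
```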

    The good thing about Linked Data is that it is possible to use Linked Data mechanisms to link to legacy data in silo databases. You just need to provide an RDF wrapper for the legacy system, as has been done with the Library of Congress Subject Headings.

    Some examples of available tools for exposing legacy data as RDF:

    • Triplify – a web applications plugin that converts relational database structures into RDF triples
    • D2R Server – a tool for publishing relational databases on the Semantic Web
    • wp-RDFa – a WordPress plugin that adds some RDF information about Author and Title to WordPress blog posts

    Of course, RDF that is generated like this will very probably only expose objects to link TO, not links to RDF objects external to the system.
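A minimal sketch of what such a wrapper does in principle, assuming a hypothetical `books` table and a hypothetical base URI `http://example.org/`: each row becomes a subject URI and each column a predicate. This is not how Triplify or D2R Server are implemented; both are far more configurable.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for the legacy system


def rows_to_triples(conn: sqlite3.Connection, table: str, key: str):
    """Expose every row of `table` as triples: one subject URI per row,
    one predicate per column (the column name under BASE + 'vocab/')."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}vocab/{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO books VALUES ('0375507256', 'Cloud Atlas', 'David Mitchell')")
for triple in rows_to_triples(conn, "books", "isbn"):
    print(triple)
```

Note that `author` comes out as a plain string, not as a link to a resource outside the system, which is exactly the limitation mentioned above.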

    Also, Linked Data can be used within legacy systems, for mixing legacy and RDF data, open and closed access data, etc. In this case we have RDF triples that have a subject URI from one data source and an object URI from another data source. In a situation with interlinked systems it would for instance be possible to see that the author of a specific book (data from a library catalog) is also speaking at a specific conference (data from a conference website). Objects linked together on the web using RDF triples are also known as an “RDF graph”. With RDF-aware client software it is possible to navigate through all the links to retrieve additional information about an object.
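A minimal sketch of the catalog/conference example, with two hypothetical data sources. Short strings stand in for full URIs; what matters is that both sources use the same identifier for the person, so that after merging, the link from the book can be followed on to the conference.

```python
# Hypothetical data from two independent sources sharing one person identifier.
catalog = {
    ("book:CloudAtlas", "creator", "person:DavidMitchell"),
    ("book:CloudAtlas", "title", "Cloud Atlas"),
}
conference = {
    ("person:DavidMitchell", "speaksAt", "conf:Example2009"),
}

# Without blank nodes, merging two RDF graphs amounts to a set union.
graph = catalog | conference


def outgoing(node, g):
    """All (predicate, object) pairs leaving `node`, sorted for stable output."""
    return sorted((p, o) for s, p, o in g if s == node)


# Navigate: from the book to its author, then from the author onwards.
for predicate, target in outgoing("book:CloudAtlas", graph):
    if predicate == "creator":
        print(target, outgoing(target, graph))
# person:DavidMitchell [('speaksAt', 'conf:Example2009')]
```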

    Linked Data

    URIs
    URIs (“Uniform Resource Identifiers”) are necessary for uniquely identifying and linking to resources on the web. A URI is basically a string that identifies a thing or resource on the web. All “Information Resources”, or WWW pages, documents, etc. have a URI, which is commonly known as a URL (Uniform Resource Locator).

    With Linked Data we are looking at identifying “Non-information Resources” or “real world objects” (people, concepts, things, even imaginary things), not web pages that contain information about these real world objects. But it is a little more complicated than that. In order to honour the requirement that a thing and its relations can be interpreted and used by humans and software, we need at least 3 different representations of one resource (see: How to publish Linked Data on the web):

    • Resource identifier URI (identifies the real world object, the concept, as such)
    • RDF document URI (a document readable for semantic web applications, containing the real world object’s RDF data and relationships with other objects)
    • HTML document URI (a document readable for humans, with information about the real world object)

    Redirection

    For instance, there could be a Resource Identifier URI for a book called “Cloud Atlas“. The web resource at that URI can redirect an RDF enabled browser to the RDF document URI, which contains RDF data describing the book and its properties and relationships. A normal HTML web browser would be redirected to the HTML document URI, for instance a web page about the book at the publisher’s website.

    There are several methods of redirecting browsers and applications to the required representation of the resource. See Cool URIs for the Semantic Web for technical details.
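One of the methods described there is an HTTP 303 (See Other) redirect chosen by content negotiation on the `Accept` header. Below is a minimal sketch of just the decision step, with hypothetical example.org URIs; it ignores quality values and wildcards, which a real server should honour.

```python
# Hypothetical URIs for the representations of one resource identifier.
REPRESENTATIONS = {
    "http://example.org/id/cloud-atlas": {
        "application/rdf+xml": "http://example.org/data/cloud-atlas.rdf",
        "text/html": "http://example.org/page/cloud-atlas.html",
    },
}


def redirect_for(resource_uri: str, accept_header: str) -> tuple[int, str]:
    """Return (status, Location) for a request on a resource identifier URI.

    Deliberately naive: media types are tried in the order listed in the
    Accept header; q-values and wildcards are ignored.
    """
    targets = REPRESENTATIONS[resource_uri]
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in targets:
            return 303, targets[media_type]
    return 303, targets["text/html"]  # fall back to the human-readable page


uri = "http://example.org/id/cloud-atlas"
print(redirect_for(uri, "application/rdf+xml"))
# (303, 'http://example.org/data/cloud-atlas.rdf')
print(redirect_for(uri, "text/html,application/xhtml+xml;q=0.9"))
# (303, 'http://example.org/page/cloud-atlas.html')
```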

    There are also RDF-enabled browsers that transform RDF into web pages readable by humans, like the Firefox add-on “Tabulator”, or the web based Disco and Marbles browsers, both hosted at the Free University Berlin.

    RDF, vocabularies, ontologies
    RDF, or Resource Description Framework, is, as the name suggests, just a framework. It describes resources by means of relationships, and these descriptions can be written in XML (RDF/XML) or in a simpler non-XML notation, N3. RDF can be implemented in vocabularies or ontologies, which are sets of RDF classes and properties describing objects and relationships for a given field.
    Basically, anybody can create an RDF vocabulary by publishing an RDF document defining the classes and properties of the vocabulary, at a URI on the web. The vocabulary can then be used in a resource by referring to the namespace (the URI) and the classes in that RDF document.
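Using a vocabulary comes down to combining its namespace URI with the name of a class or property. A minimal sketch: the namespace URIs below are those of the RDF, RDFS, FOAF and SKOS vocabularies, while the `expand` helper is hypothetical.

```python
# Namespace URIs of some well-known vocabularies. The short prefixes are
# only conventions; what identifies a vocabulary is the URI itself.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


print(expand("foaf:Person"))   # http://xmlns.com/foaf/0.1/Person
print(expand("skos:subject"))  # http://www.w3.org/2004/02/skos/core#subject
```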

    A nice and useful feature of RDF is that more than one vocabulary can be mixed and used in one resource.
    Also, a vocabulary itself can reference other vocabularies and thereby inherit well established classes and properties from other RDF documents.
    Another very useful feature of RDF is that objects can be linked to similar object resources describing the same real world thing. This way confusion about which object we are talking about can be avoided.
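In practice such links are commonly made with the OWL property `owl:sameAs`. Here is a minimal sketch that groups identifiers declared to denote the same thing, using a union-find structure; the three example.org URIs for David Mitchell are hypothetical.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_groups(triples):
    """Return sets of URIs that, via owl:sameAs links, name the same thing."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)  # join the two groups

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


links = [
    ("http://example.org/catalog/mitchell", SAME_AS, "http://example.org/wiki/David_Mitchell"),
    ("http://example.org/wiki/David_Mitchell", SAME_AS, "http://example.org/authority/12345"),
]
print(same_as_groups(links))
```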

    A couple of existing and well used RDF vocabularies/ontologies:

    (By the way, the links in the first column (to the RDF files themselves) may act as an illustration of the redirection mechanism described before. Depending on the type of browser you use, RDF-aware or not, some of them may link either to the RDF file with the vocabulary definition itself or to a page about the vocabulary.)

    A special case is:

    • RDFa – a sort of microformat without a vocabulary of its own, which relies on other vocabularies for turning XHTML page attributes into RDF

    Example
    A shortened example for “Cloud Atlas” by David Mitchell from the RDF BookMashup at the Free University Berlin, which uses a number of different vocabularies:

    <?xml version="1.0" encoding="UTF-8" ?>
    <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <!-- shortened: the declarations for the rev, dc, scom, foaf and rdfs prefixes used below are left out -->

    <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/books/0375507256">
    <rev:hasReview rdf:resource="http://www4.wiwiss.fu-berlin.de/bookmashup/reviews/0375507256_EditorialReview1"/>
    <dc:creator rdf:resource="http://www4.wiwiss.fu-berlin.de/bookmashup/persons/David+Mitchell"/>
    <dc:format>Paperback</dc:format>
    <dc:identifier rdf:resource="urn:ISBN:0375507256"/>
    <dc:publisher>Random House Trade Paperbacks</dc:publisher>
    <dc:title>Cloud Atlas: A Novel</dc:title>
    </rdf:Description>

    <scom:Book rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/books/0375507256">
    <rdfs:label>Cloud Atlas: A Novel</rdfs:label>
    <skos:subject rdf:resource="http://www4.wiwiss.fu-berlin.de/bookmashup/subject/Fantasy+fiction"/>
    <skos:subject rdf:resource="http://www4.wiwiss.fu-berlin.de/bookmashup/subject/Fate+and+fatalism"/>

    <foaf:depiction rdf:resource="http://ecx.images-amazon.com/images/I/51MIVHgJP%2BL.jpg"/>
    <foaf:thumbnail rdf:resource="http://ecx.images-amazon.com/images/I/51MIVHgJP%2BL._SL75_.jpg"/>
    </scom:Book>

    <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/doc/books/0375507256">
    <dc:license rdf:resource="http://www.amazon.com/AWS-License-home-page-Money/b/ref=sc_fe_c_0_12738641_12/102-8791790-9885755?ie=UTF8&amp;node=3440661&amp;no=12738641&amp;me=A36L942TSJ2AJA"/>
    <dc:license rdf:resource="http://www.google.com/terms_of_service.html"/>
    </rdf:Description>

    <foaf:Document rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/doc/books/0375507256">
    <rdfs:label>RDF document about the book: Cloud Atlas: A Novel</rdfs:label>
    <foaf:maker rdf:resource="http://www4.wiwiss.fu-berlin.de/is-group/resource/projects/Project10"/>
    <foaf:primaryTopic rdf:resource="http://www4.wiwiss.fu-berlin.de/bookmashup/books/0375507256"/>
    </foaf:Document>

    <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/persons/David+Mitchell">
    <rdfs:label>David Mitchell</rdfs:label>
    </rdf:Description>

    <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/reviews/0375507256_EditorialReview1">
    <rdfs:label>Review number 1 about: Cloud Atlas: A Novel</rdfs:label>
    </rdf:Description>

    <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/is-group/resource/projects/Project10">
    <rdfs:label>RDF Book Mashup</rdfs:label>
    </rdf:Description>

    </rdf:RDF>
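Because the example above is shortened (several namespace declarations are left out), it cannot be parsed as it stands. The sketch below uses a small, complete fragment in the same style, with example.org URIs and the Dublin Core elements namespace as assumptions, and reads it with Python’s standard `xml.etree.ElementTree`. It handles only the simple RDF/XML subset used here (node elements with `rdf:about`, property elements with `rdf:resource` or a literal), not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/books/0375507256">
    <dc:title>Cloud Atlas: A Novel</dc:title>
    <dc:creator rdf:resource="http://example.org/persons/David+Mitchell"/>
  </rdf:Description>
  <foaf:Document rdf:about="http://example.org/doc/books/0375507256">
    <foaf:primaryTopic rdf:resource="http://example.org/books/0375507256"/>
  </foaf:Document>
  <rdf:Description rdf:about="http://example.org/persons/David+Mitchell">
    <rdfs:label>David Mitchell</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def uri(tag):
    """ElementTree writes names as '{namespace}local'; RDF joins them."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def to_triples(xml_text):
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # A typed node element such as <foaf:Document> also states rdf:type
            triples.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:  # no rdf:resource, so the value is a literal
                obj = (prop.text or "").strip()
            triples.append((subject, uri(prop.tag), obj))
    return triples


for triple in to_triples(DOC):
    print(triple)
```

Literals and URIs both come out as plain strings here; a real RDF library keeps them apart.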

    A partial view on this RDF file with the Marbles browser:

    RDF browser view

    See also the same example in the Disco RDF browser.

    Library implementations
    It seems obvious that Linked Data can be very useful in providing a generic infrastructure for linking data, metadata and objects, available in numerous types of data stores, in the online library world. With such a networked online data structure, it would be fairly easy to create all kinds of discovery interfaces for bibliographic data and objects. Moreover, it would also be possible to link to non-bibliographic data that might interest the users of these interfaces.

    A brief and incomplete list of some library related Linked Data projects, some of which have already been mentioned above:

    And what about MARC, AACR2 and RDA? Is there a role for them in the Linked Data environment? RDA is supposed to be the successor of AACR2 as a content standard that can be used with MARC, but also with other encoding standards like MODS or Dublin Core.
    The RDA Entity Relationship Diagram, which also incorporates FRBR, can of course easily be implemented as an RDF vocabulary, which could be used to create a universal Linked Data library network. It really does not matter what kind of internal data format the connected systems use.
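To illustrate, here is a minimal sketch of the FRBR Group 1 chain (work, expression, manifestation, item) as triples. The predicate names follow FRBR’s “realized through”, “embodied in” and “exemplified by” relationships, but the identifiers and the two library copies are hypothetical; this is not an official RDA or FRBR vocabulary.

```python
# Hypothetical identifiers; predicate names modelled on FRBR relationships.
TRIPLES = [
    ("work:cloud-atlas", "realizedThrough", "expr:cloud-atlas-english-text"),
    ("expr:cloud-atlas-english-text", "embodiedIn", "manif:isbn-0375507256"),
    ("manif:isbn-0375507256", "exemplifiedBy", "item:library-a-copy-1"),
    ("manif:isbn-0375507256", "exemplifiedBy", "item:library-b-copy-1"),
]
WORK_TO_ITEM = ["realizedThrough", "embodiedIn", "exemplifiedBy"]


def follow(start, predicates, triples):
    """Follow a chain of predicates from `start`, one hop per predicate."""
    nodes = [start]
    for predicate in predicates:
        nodes = [o for node in nodes for s, p, o in triples
                 if s == node and p == predicate]
    return nodes


print(follow("work:cloud-atlas", WORK_TO_ITEM, TRIPLES))
# ['item:library-a-copy-1', 'item:library-b-copy-1']
```

The two items could come from libraries with completely different internal formats; only the shared identifiers matter.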

  • No future for libraries?

    Posted on May 24th, 2009 Lukas Koster 13 comments

    Will library buildings and library catalogs survive the web?

    © Moqub

    Some weeks ago a couple of issues appeared in the Twitter/blogosphere (or at least MY Twitter/blogosphere) related to the future of the library in this digital era.

    • There was the Espresso Book Machine that prints books on demand on location, which led to questions like: “apart from influencing publishing and book shops, what does this mean for libraries?”.
    • There was a Twitter discussion about “will we still need library buildings?”.
    • There was another blog post about the future of library catalogs by Edwin Mijnsbergen (in Dutch) that questioned the value of library catalogs in relation to web 2.0 and the new emerging semantic web.

    This made me start thinking about a question that concerns us all: is there a future for the library as we know it?

    To begin with, what is a library anyway?

    For ages, since the beginning of history, up until some 15 years ago, a library was an institution characterised by:

    © Mihai Bojin

    • a physical collection of printed and handwritten material
    • a physical location, a building, to store the collection
    • a physical printed or handwritten on site catalog
    • on location searching and finding of information sources using the catalog
    • on site requesting, delivery, reading, lending and returning of material
    • a staff of trained librarians to catalog the collection and assist patrons

    The central concept here is of course the collection. That is the “raison d’être” of a library. The purpose of library building, catalog and librarians is to give people access to the collection, and provide them with the information they need.

    Clearly, because of the physical nature of the collection and the information transmission process the library needed to be a building with collection and catalog inside it. People had to go there to find and get the publications they needed.

    If collections and the transmission of information were completely digital, then the reason for a physical location to go to for finding and getting publications would not exist anymore. Currently one of these conditions has been met fully and the other one partly. The transmission of information can take place in a completely digital way. Most new scientific publications are born digital (e-Journals, e-Books), and a large number of digitisation projects are taking care of making digital copies of existing print material.
    Searching for items in a library’s collection is already taking place remotely through OPACs and other online tools almost everywhere. A large part of these collections can be accessed digitally. Only when a patron wants to read or borrow a printed book or journal does he or she have to go to the library building to fetch it.

    All this seems to lead to the conclusion that the library may be slowly moving away from a physical presence to a digital one.

    But there is something else to be considered here, that reaches beyond the limits of one library. In my view the crucial notion here is again the collection.
    In my post Collection 2.0 I argue that in this digital information age a library’s collection is everything a library has access to as opposed to the old concept of everything a library owns. This means in theory that every library could have access to the same digital objects of information available on the web, but also to each other’s print objects through ILL. There will be no physically limited collection only available in one library anymore, just one large global collection.

    In this case, there is not only no need for people to go to a specific library for an item in its collection, but also there is no need to search for items using a specific library’s catalog.

    Now you may say that people like going to a library building and browsing through the stacks. That may still be true for some, but in general, as I argue in my post “Open Stack 2.0”, the new Open Stack is the Web.

    © Nicole C. Engard

    In the future there will be collections, but not physical ones (except of course for the existing ones with items that are not allowed to leave the library location). We will see virtual subject collections, determined by classifications and keywords assigned both by professionals and non-professionals.

    On a parallel level there will be virtual catalogs, which are views on virtual collections defined by subjects on different levels and in different locations: global, local, subject-oriented, etc. These virtual collections and catalogs will be determined and maintained by a great number of different groups of people and institutions (commercial and non-commercial). One of these groups can still be a library. As Patrick Vanhoucke observed on Twitter (in Dutch): “We have to let go of the idea of the library as a building; the ‘library’ is the network of librarians”. These virtual groups of people may be identical to what is increasingly known as “tribes”.

    Having said all this, of course there will still be occurrences of libraries as buildings and as physical locations for collections. Institutions like the Library of Congress will not just vanish into thin air. Even if all print items have been digitised, print items will still be wanted for a number of reasons: research, art, among others. Libraries can have different functions, like archives, museums, etc. and still be named “libraries” too.
    Library buildings can transform into other types of locations: in universities they can become meeting places and study facilities, including free wifi and Starbucks coffee. Public libraries can shift focus to becoming centres of discovery and (educational) gaming. Anything is possible.

    It’s obvious that libraries obey the same laws of historical development as any other social institution or phenomenon. The way that information is found and processed is determined, or at least influenced, by the status of technological development. And I am not saying that all development is technology driven! This is not the place for a philosophy on history, economics and society.

    Some historical parallels to illustrate the situation that libraries are facing:

    • writing: inscribing clay tablets > scratching ink on paper > printing (multiplication, re-usability) > typewriter > computer/printer (digital multiplication and re-usability!) > digital only (computer files, blogs, e-journal, e-books)
    • consumption of music: attending live performance on location > listening to radio broadcast > playing purchased recordings (vinyl, cassettes, cd, dvd) > make home recordings > play digital music with mp3/personal audio > listen to digital music online

    From these examples it’s perfectly clear that new developments do not automatically make the old ways disappear! Prevailing practices can coexist with “outdated” ways of doing things. Libraries may still have a future.

    In the end it comes down to these questions:

    • Will libraries cease to exist, simply because they no longer serve the purpose of providing access to information?
    • Are libraries engaged in a rear guard fight?
    • Will libraries become tourist attractions?
    • Will libraries adapt to the changing world and shift focus to serve other, related purposes?
    • Are professional librarian skills useful in a digital information world?

    I do not know what will happen with libraries. What do you think?

  • Who needs MARC?

    Posted on May 15th, 2009 Lukas Koster 24 comments

    Why use a non-normalised metadata exchange format for suboptimal data storage?

    Catalog card

    © leah the librarian

    This week I had a nice chat with André Keyzer of Groningen University library and Peter van Boheemen of Wageningen University Library who attended OCLC’s Amsterdam Mashathon 2009. As can be expected from library technology geeks, we got talking about bibliographic metadata formats, very exciting of course. The question came up: what on earth could be the reason for storing bibliographic metadata in exchange formats like MARC?

    When he was once asked at an ELAG conference which bibliographic format Wageningen University was using in their home grown catalog system, Peter answered: “WDC”… “we don’t care”.

    Exactly my idea! As a matter of fact I think I may have used the same words a couple of times in recent years, probably even at ELAG2008. The thing is: it really does not matter how you store bibliographic metadata in your database, as long as you can present and exchange the data in any format requested, be it MARC or Dublin Core or anything else.

    Of course the importance of using internationally accepted standards is beyond doubt, but there clearly exists widespread misunderstanding of the functions of certain standards, like for instance MARC. MARC is NOT a data storage format. In my opinion MARC is not even an exchange format, but merely a presentation format.

    St. Marc Express

    With a background and experience in data modeling, database and systems design (among others), and without any librarian training at all, I was quite amazed by bibliographic metadata formats when I started working with library systems in libraries. Of course, MARC (“MAchine Readable Cataloging record”) was invented as a standard in order to facilitate exchange of library catalog records in a digital era.
    But I think MARC was invented by old school cataloguers who did not have a clue about data normalisation at all. A MARC record, especially if it corresponds to an official set of cataloging rules like AACR2, is nothing more than a digitised printed catalog card.

    In pre-computer times it made perfect sense to have a standardised uniform way of registering bibliographic metadata on a printed card in this way. The catalog card was simultaneously used as a medium for presenting AND storing metadata. This is where the confusion originates from!

    MARC record

    But when the Library of Congress says “If a library were to develop a ‘home-grown’ system that did not use MARC records, it would not be taking advantage of an industry-wide standard whose primary purpose is to foster communication of information”, it is talking plain nonsense.
    Actually it is better NOT to use something like MARC for purposes other than exchanging, or better, presenting data. To illustrate this I will give two examples of MARC tags that have been annoying me since my first day as a library employee:

    100 – Main Entry-Personal Name
    Besides storing an author’s name as a string in each individual bibliographic record instead of using a code linking to a central authority table (“foreign key” in relational database terms), it is also a mistake to use a person’s name as one complete string in one field. Examples on the Library of Congress MARC website use forms like “Adams, Henry”, “Fowler, T. M.” and “Blackbeard, Author of”. To take only the simple first example, this author could also be registered as “Henry Adams”, “Adams, H.” or “H. Adams”. And don’t say that these forms are not according to the rules! They are out there! There is no reliable way to match these variations as being actually one and the same.
    In a normalised relational database, this subfield $a would be stored something like this (simplified!):

    • Person
      • Surname=Adams
      • First name=Henry
      • Prefix=
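Storing the parts separately means every presentation form can be generated instead of matched afterwards. A minimal sketch, with a hypothetical `Person` class mirroring the fields above; placing a name prefix such as “van” after the first name in the inverted form is just one convention, assumed here.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Normalised name, stored as separate parts (hypothetical class)."""
    surname: str
    first_name: str
    prefix: str = ""  # e.g. "van"; empty for Henry Adams

    def inverted(self) -> str:
        """'Adams, Henry'; an assumed convention puts a prefix last."""
        text = f"{self.surname}, {self.first_name}"
        return f"{text} {self.prefix}" if self.prefix else text

    def direct(self) -> str:
        """'Henry Adams'."""
        return " ".join(part for part in (self.first_name, self.prefix, self.surname) if part)

    def inverted_initial(self) -> str:
        """'Adams, H.'."""
        return f"{self.surname}, {self.first_name[0]}."

    def initial_direct(self) -> str:
        """'H. Adams'."""
        return f"{self.first_name[0]}. {self.surname}"


henry = Person(surname="Adams", first_name="Henry")
for form in (henry.inverted(), henry.direct(), henry.inverted_initial(), henry.initial_direct()):
    print(form)
```

All four variants mentioned above come from one stored record, so there is nothing left to match.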

    773 – Host Item Entry
    Subfield $g of this MARC tag is used for storing citation information for a journal article (volume, issue, year, start page, end page), all in one string, like: “Vol. 2, no. 2 (Feb. 1976), p. 195-230”. Again I have seen this used in many different ways. In a normalised format this would look something like this, using only the actual values:

    • Journal
      • Volume=2
      • Issue=2
      • Year=1976
      • Month=2
      • Day=
      • Start page=195
      • End page=230

    In a presentation of this normalised data record extra text can be added like “Vol.” or “Volume”, “Issue” or “No.”, brackets, replacing codes by descriptions (Month 2 = Feb.), etc., according to the format required. So the stored values could be used to generate the text “Vol. 2, no. 2 (Feb. 1976), p. 195-230” on the fly, but also for instance “Volume 2, Issue 2, dated February 1976, pages 195-230”.
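A minimal sketch of generating both texts from one normalised record; the `Citation` class and the month abbreviation table are assumptions for illustration.

```python
from dataclasses import dataclass

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
# Assumed abbreviation table for the catalogue-card style
MONTH_ABBREVIATIONS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June", "July",
                       "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]


@dataclass
class Citation:
    volume: int
    issue: int
    year: int
    month: int  # 1-12
    start_page: int
    end_page: int

    def card_style(self) -> str:
        """E.g. 'Vol. 2, no. 2 (Feb. 1976), p. 195-230'."""
        month = MONTH_ABBREVIATIONS[self.month - 1]
        return (f"Vol. {self.volume}, no. {self.issue} ({month} {self.year}), "
                f"p. {self.start_page}-{self.end_page}")

    def long_style(self) -> str:
        """E.g. 'Volume 2, Issue 2, dated February 1976, pages 195-230'."""
        month = MONTH_NAMES[self.month - 1]
        return (f"Volume {self.volume}, Issue {self.issue}, dated {month} {self.year}, "
                f"pages {self.start_page}-{self.end_page}")


citation = Citation(volume=2, issue=2, year=1976, month=2, start_page=195, end_page=230)
print(citation.card_style())  # Vol. 2, no. 2 (Feb. 1976), p. 195-230
print(citation.long_style())  # Volume 2, Issue 2, dated February 1976, pages 195-230
```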

    The strange thing about this bibliographic format aimed at exchanging metadata is that it actually makes metadata exchange terribly complicated, especially with these two tags, Author and Host Item. I can illustrate this by describing the way this exchange is handled between two digital library tools we use at the Library of the University of Amsterdam, MetaLib and SFX, both from the same vendor, Ex Libris.

    The metasearch tool MetaLib is using the described and preferred mechanism of on the fly conversion of received external metadata from any format to MARC for the purpose of presentation.
    But if we want to use the retrieved record to link to, for instance, a full text article using the SFX link resolver, the generated MARC data is used as a source, and the non-normalised data in the 100 and 773 MARC tags has to be converted to the OpenURL format, which is actually normalised (example in simple OpenURL 0.1):

    isbn=;issn=0927-3255;date=1976;
    volume=2;issue=2;spage=195;epage=230;
    aulast=Adams;aufirst=Henry;auinit=;

    In order to do this all kinds of regular expressions and scripting functions are needed to extract the correct values from the MARC author and citation strings. Wouldn’t it be convenient, if the record in MetaLib would already have been in OpenURL or any other normalised format?
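A sketch of the kind of workaround meant here, assuming the 773 $g string uses exactly the layout of the example above and the 100 $a name is in inverted “Surname, First name” form; every other variant found in real records would need yet another pattern. The output joins the keys with `&`, as in an OpenURL query string. Function and parameter names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Matches only the layout 'Vol. 2, no. 2 (Feb. 1976), p. 195-230'.
CITATION_773G = re.compile(
    r"Vol\.\s*(?P<volume>\d+),\s*no\.\s*(?P<issue>\d+)\s*"
    r"\((?:\w+\.?\s+)?(?P<date>\d{4})\),\s*"
    r"p\.\s*(?P<spage>\d+)-(?P<epage>\d+)"
)


def marc_to_openurl(field_100a: str, field_773g: str, issn: str) -> str:
    """Pull normalised values back out of the MARC 100 $a and 773 $g strings."""
    match = CITATION_773G.search(field_773g)
    if match is None:
        raise ValueError(f"unrecognised citation layout: {field_773g!r}")
    # Assumes 'Surname, First name'; 'Henry Adams' would silently end up as aulast.
    surname, _, first_name = (part.strip() for part in field_100a.partition(","))
    return urlencode({
        "issn": issn,
        "date": match["date"],
        "volume": match["volume"],
        "issue": match["issue"],
        "spage": match["spage"],
        "epage": match["epage"],
        "aulast": surname,
        "aufirst": first_name,
    })


print(marc_to_openurl("Adams, Henry", "Vol. 2, no. 2 (Feb. 1976), p. 195-230", "0927-3255"))
# issn=0927-3255&date=1976&volume=2&issue=2&spage=195&epage=230&aulast=Adams&aufirst=Henry
```

From a normalised record, by contrast, the same string is a single `urlencode` call over fields that already exist.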

    The point I am trying to make is of course that it does not matter how metadata is stored, as long as it is possible to get the data out of the database in any format appropriate for the occasion. The SRU/SRW protocol is particularly aimed at precisely this: getting data out of a database in the required format, like MARC, Dublin Core, or anything else. An SRU server is a piece of middleware that receives requests, gets the requested data, converts the data and then returns the data in the requested format.
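A minimal sketch of that middleware idea, not of the SRU protocol itself: one stored, normalised record and a table of serialisers keyed by the requested schema name. The schema names, the record layout and the functions are hypothetical; in SRU the client names the desired format in the `recordSchema` parameter of a searchRetrieve request.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# One normalised record, stored once (hypothetical field names).
RECORD = {
    "title": "Cloud Atlas: A Novel",
    "aulast": "Mitchell",
    "aufirst": "David",
    "isbn": "0375507256",
}


def as_dublin_core(record: dict) -> str:
    """Simple Dublin Core XML; the <record> wrapper element is an assumption."""
    return (
        '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<dc:title>{escape(record['title'])}</dc:title>"
        f"<dc:creator>{escape(record['aulast'])}, {escape(record['aufirst'])}</dc:creator>"
        "</record>"
    )


def as_openurl(record: dict) -> str:
    """OpenURL-style key/value pairs for a link resolver."""
    return urlencode({"isbn": record["isbn"], "aulast": record["aulast"],
                      "aufirst": record["aufirst"]})


SERIALISERS = {"dc": as_dublin_core, "openurl": as_openurl}


def respond(record: dict, record_schema: str) -> str:
    """Convert the stored record to whichever format the client asked for."""
    serialise = SERIALISERS.get(record_schema)
    if serialise is None:
        raise ValueError(f"unsupported record schema: {record_schema!r}")
    return serialise(record)


print(respond(RECORD, "openurl"))  # isbn=0375507256&aulast=Mitchell&aufirst=David
print(respond(RECORD, "dc"))
```

Adding a MARC presentation would mean adding one more serialiser, without touching the stored data.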

    Currently at the Library of the University of Amsterdam we are migrating our ILS which also involves converting our data from one bibliographic metadata format (PICA+) to another (MARC). This is extremely complicated, especially because of the non-normalised structure of both formats. And I must say that in my opinion PICA+ is even the better one.
    Also all German and Austrian libraries are meant to migrate from the MAB format to MARC, which also seems to be a move away from a superior format.
    All because of the need to adhere to international standards, but with the wrong solution.

    Maybe the projected new standard for resource description and access RDA will be the solution, but that may take a while yet.

  • ReTweet @Reply – Twitter communities

    Posted on April 27th, 2009 Lukas Koster 2 comments

    In my post “Tweeting Libraries” among other things I described my Personal Twitter experience as opposed to Institutional Twitter use. Since then I have discovered some new developments in my own Twitter behaviour and some trends in Twitter at large: individual versus social.

    There have been some discussions on the web about the pros and cons and the benefits and dangers of social networking tools like Twitter, focusing on “noise” (uninteresting trivial announcements) versus “signal” (meaningful content), but also on the risk of web 2.0 being about digital feudalism, and being a possible vehicle for fascism (as argued by Andrew Keen).

    My kids say: “Twitter is for old people who think they’re cool”. According to them it’s nothing more than: “Just woke up; SEND”, “Having breakfast; SEND”, “Drinking coffee; SEND”, “Writing tweet; SEND”. For them Twitter is only about broadcasting trivialities, narcissistic exhibitionism, “noise”.
    For their own web communications they use chat (MSN/Messenger), SMS (mobile phone text messages), communities (Hyves, the Dutch counterpart of MySpace) and email. Basically I think young kids communicate online only within their groups of friends, with people they know.

    Just to get an overview: a tweet, or Twitter message, can basically be of three different types:

    • just plain messages, announcements
    • replies: reactions to tweets from others, characterised by the “@<twittername>” string
    • retweets: forwarding tweets from others, characterised by the letters “RT”
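These three types can be told apart by the string conventions just listed. Below is a simplified Python sketch, assuming a retweet starts with “RT” and a reply starts with “@<twittername>”. Real tweets are messier (retweets with a comment in front, mentions in the middle of a message), so this is a heuristic of my own, not how Twitter itself classifies anything.

```python
import re

MENTION = re.compile(r"@(\w+)")  # "@<twittername>"


def classify_tweet(text: str) -> str:
    """Classify a tweet as 'retweet', 'reply' or 'message' (heuristic)."""
    stripped = text.strip()
    if re.match(r"RT\b", stripped):  # "RT" as a separate word at the start
        return "retweet"
    if stripped.startswith("@"):     # addressed to another user
        return "reply"
    return "message"                 # plain announcement (may still mention someone)


def mentioned_users(text: str) -> list[str]:
    """Return all Twitter names referred to with the @ mechanism."""
    return MENTION.findall(text)


for tweet in ["Having breakfast",
              "@librarian_a thanks, good point",
              "RT @librarian_a: interesting post on linked data"]:
    print(classify_tweet(tweet), mentioned_users(tweet))
```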

    Although a lot of people use Twitter in the “exhibitionist” way, I don’t do that myself at all. If I look at my Twitter behaviour of the past weeks, I almost only see “retweets” and “replies”.

    Both “replies” and “retweets” obviously were not features of the original Twitter concept; they came into being because Twitter users needed conversation.
    A reply is becoming more and more a replacement for short emails or mobile phone text messages, at least for me. These Twitter replies are not “monologues”, but “dialogues”. If you don’t want everybody to read these, you can use a “Direct message” or “DM”.
    Retweets are used to forward interesting messages to the people who are following you, your “community” so to speak. No monologue, no dialogue, but sharing information with specific groups.
    The “@<twittername>” mechanism is also used to refer to another Twitter user in a tweet. In official Twitter terminology “replies” have been replaced by “mentions”.

    Retweets and replies are the building blocks of Twitter communities. My primary community consists of people and organisations related to libraries. I know only a small number of these people in person; most of them I have never met. The advantage of Twitter here is obvious: I get to know more people who are active in my professional area, I stay informed and up to date, and I can discuss topics. This is all about “signal”. If issues are too big for Twitter (more than 140 characters), we can use our blogs.
    But it’s not only retweets and replies that make Twitter communities work. Trivialities (“noise”) are equally important. They help you get to know people and in this way help create relationships built on trust.

    Last week I experienced another compelling example of a very positive social use of Twitter, when there were a number of very interesting Library 2.0 conferences, none of which I could attend in person because of our ILS project.

    All of these conferences were covered on Twitter by attendees using the hashtags #elag09, #csnr09 and #ugul09. This makes it possible for non-participants to follow all events and discussions at these conferences and even join in the discussions. Twitter at its best!

    Twitter is just a tool, a means to communicate in many different ways. It can be used for good and for bad, and of course what is “good” and what is “bad” is up to the individual to decide.

  • Replacing our ILS, business as usual

    Posted on April 24th, 2009 Lukas Koster 2 comments

    © Peter Morville

    As you may have noticed from some of my tweets, the Library of the University of Amsterdam, my place of work, is in the process of replacing its ILS (Integrated Library System). All in all this project, or rather these two projects (one selecting a new ILS, the other one implementing it), will have taken 18 months or more from the decision to go ahead until STP (Switch to Production), planned for August 15 this year. My colleague Bert Zeeman blogged about this (in Dutch) recently.

    One thing that has become absolutely clear to me is that replacing an ILS is not just about replacing one information system with another. It is about replacing an entire organisational structure of work processes, with a huge impact on all people involved. And in our case it affects two organisations: besides the Library of the University of Amsterdam also the Media Library of the Hogeschool van Amsterdam. We have been managing library systems for both organisations in a mini consortial structure for a couple of years now. So the Media Library is facing its second ILS replacement within two years.

    While the decision was made for pressing technical reasons, and also with an eye on preparing for future library 2.0 developments, it turned out to have substantial consequences for the organisation.
    This is the first time that I am participating in such a radical library system project. I have done a couple of projects implementing and upgrading metasearch and OpenURL link resolver tools in the last six years, but these are nothing compared to the current project. With these “add-on” tools, which started as a means of extending the library’s primary stream of information, only a relatively limited number of people were involved. But with an ILS you are talking about the core business of a library (still!) and about the day-to-day working life of everybody involved in acquisitions, cataloguing and circulation, as well as system administrators and system librarians.

    To make it even more complicated, the University Library is also switching from the old system’s proprietary bibliographic format to MARC21, because that is what the new system uses. Personally I think that the old system’s format is better (just like our German colleagues think about their move from MAB to MARC), but of course the advantages of using an internationally accepted and used standard outweigh this, as always. Maybe food for another blog post later…

    Last but not least, the Library is simultaneously doing a project for the implementation of RFID for self check machines. The initial idea was to implement RFID in the old system and then just migrate everything to the new one. However, for various reasons, recently it was decided to postpone RFID implementation to shortly after our ILS STP. Some initial tests have shown that this probably will work.

    And while all this is going on, all normal work needs to be taken care of too: “business as usual”.

    Now, looking at workflows: the way our individual departments have organised their workflows is partly dictated by the design of the old system. The new system obviously dictates workflows too, but in other areas. Although this new system is very flexible and highly configurable, there are still some local requirements that it cannot meet.
    Of course this is NOT the way it should be! Systems should enable us to do what we want, the way we want it! Hopefully new developments like Ex Libris’ URM and OCLC’s very recently announced new Web-based WorldCat ILS will serve users better.

    Talking about “very flexible and highly configurable”: although a very big advantage, this also makes it much more complicated and time consuming to implement the new system. Fortunately there are a lot of other libraries in The Netherlands and around the world using the new system that are willing to help us in every possible way. And this is highly appreciated!

    Other issues that make this project complicated:

    • unexpected issues, bottlenecks: these keep on coming
    • migration of data from old system: conversion of old to new format
    • implementing links with external systems like the student and staff databases, the financial system and the national union catalogue

    I think we will make STP on the planned date, but I also think we need to postpone a number of issues until after that. There will still be a lot of work to be done for my department after the project has finished.

    To end on a positive note: the new OPAC will be much nicer and more flexible than the old one. And in the end that is what we are doing this for: our patrons.

  • UMR – Unified Metadata Resources

    Posted on April 12th, 2009 Lukas Koster 13 comments

    One single web page as the single identifier of every book, author or subject


    I like the concept of “the web as common publication platform for libraries“, and “every book its own url“, as described by Owen Stephens in two blog posts:
    Its time to change library systems

    I’d suggest what we really need to think about is a common ‘publication’ platform – a way of all of our systems outputting records in a way that can then be easily accessed by a variety of search products – whether our own local ones, remote union ones, or even ones run by individual users. I’d go further and argue that platform already exists – it is the web!

    and “The Future is Analogue”

    If every book in your catalogue had it’s own URL – essentially it’s own address on your web, you would have, in a single step, enabled anyone in the world to add metadata to the book – without making any changes to the record in your catalogue.

    This concept of identifying objects by URL (Uniform Resource Locator), or maybe better URI (Uniform Resource Identifier), is central to the Semantic Web, which uses RDF (Resource Description Framework) as its metadata model.

    As a matter of fact, at ELAG 2008 I saw Jeroen Hoppenbrouwers (“Rethinking Subject Access”) explaining his idea of doing the same for Subject Headings using the Semantic Web concept of triples. Every subject its own URL or web page. He said: “It is very easy. You can start doing this right away”.
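An RDF triple is simply (subject, predicate, object), where each part is a URI or, for the object, possibly a literal value. The sketch below describes one subject heading with labels in three languages and a broader term, using the real SKOS vocabulary (prefLabel, broader), and prints the triples in N-Triples syntax. The subject URIs under subjects.example.org are hypothetical; a real subject scheme would publish its own.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs: "every subject its own URL".
CHESS = "http://subjects.example.org/chess"
BOARD_GAMES = "http://subjects.example.org/board-games"

# Each triple is (subject, predicate, object); a (text, language) tuple is a literal.
triples = [
    (CHESS, RDF_TYPE, SKOS + "Concept"),
    (CHESS, SKOS + "prefLabel", ("Chess", "en")),
    (CHESS, SKOS + "prefLabel", ("Schaken", "nl")),
    (CHESS, SKOS + "prefLabel", ("Schach", "de")),
    (CHESS, SKOS + "broader", BOARD_GAMES),
]


def term(value) -> str:
    """Render a URI as <...> or a language-tagged literal as "..."@lang."""
    if isinstance(value, tuple):
        text, lang = value
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'
    return f"<{value}>"


def to_ntriples(triples) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(triples))
```

Because every part is a URI, anyone else can publish triples about the same subject (a library's local keyword, a translation) without touching the original description.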


    © Jeroen Hoppenbrouwers

    To make the picture complete we only need the third essential component: every author his or her or its own URL!

    This ideal situation would have to conform to the Open Access guidelines of course. One single web page serving as the single identifier of every book, author or subject, available for everyone to link their own holdings, subscriptions, local keywords and circulation data to.

    In real life we see a number of current initiatives on the web by commercial organisations and non-commercial groups, mainly in the area of “books” (or rather “publications”) and “authors”. “Subjects” apparently is a less appealing area to start something like this, because obviously stand-alone “subjects” without anything to link them to are nothing at all, whereas you always have “publications” and “authors”, even without “subjects”. The only project I know of is MACS (Multilingual Access to Subjects), which is hosted on Jeroen Hoppenbrouwers’ domain.

    For publications we have OCLC’s WorldCat, LibraryThing and Open Library, to name just a few. And of course these global initiatives have had their regional and local counterparts for many years already (Union Catalogues, Consortia models). But this is again a typical example of multiple parallel data stores of the same type of entities. The idea apparently is that you want to store everything in one single database aiming to be complete, instead of the ideal situation of single individual URI’s floating around anywhere on the web.
    Although Ex Libris’ new Unified Resource Management development (URM, and yes: the title of this blog post is an ironic allusion to that acronym) promotes sharing of metadata, it does so within yet another separate system into which metadata from other systems can be copied.

    The same goes for authors. We have WorldCat Identities, VIAF, local authority schemes like DAI, etc. Again, we see parallel silos instead of free floating entities.

    Of course, the ideal picture sketched above is much too simple. We have to be sure which version of a publication, which author and which translation of a subject for instance we are dealing with. For publications this means that we need to implement FRBR (in short: an original publication/work and all of its manifestations), for authors we need author names thesauri, for subjects multilingual access.

    I have tried to illustrate this in this simplified and incomplete diagram:

    © Lukas Koster

    In this model libraries can use their local URI-objects representing holdings and copies for their acquisitions and circulation management, while the bibliographic metadata stay out there in the global, open area. Libraries (and individuals of course) can also attach local keywords to the global metadata, which in turn can become available globally (“social tagging”).
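The model in the diagram can be sketched as a handful of records that point to each other by URI. In this Python sketch the work and its manifestations form the global, open part, and the holdings form the local part: each holding points outward to a manifestation URI and carries local keywords. It is a simplification under stated assumptions: all URIs and library names are hypothetical, and the FRBR expression and item levels are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Work:                # global: one URI per work
    uri: str
    title: str
    creator_uri: str       # points to the author's own URI


@dataclass
class Manifestation:       # global: a specific edition or translation of a work
    uri: str
    work_uri: str
    title: str
    language: str


@dataclass
class Holding:             # local: owned by one library, linked to global metadata
    uri: str
    manifestation_uri: str
    library: str
    copies: int
    local_tags: list[str] = field(default_factory=list)


def holdings_for_work(work_uri, manifestations, holdings):
    """All local holdings of any manifestation of the given work."""
    ids = {m.uri for m in manifestations if m.work_uri == work_uri}
    return [h for h in holdings if h.manifestation_uri in ids]


def shared_tags(work_uri, manifestations, holdings):
    """Local keywords from all libraries, gathered at work level ("social tagging")."""
    tags = set()
    for h in holdings_for_work(work_uri, manifestations, holdings):
        tags.update(h.local_tags)
    return sorted(tags)


folly = Work("http://works.example.org/praise-of-folly", "Encomium Moriae",
             "http://authors.example.org/erasmus")
manifestations = [
    Manifestation(folly.uri + "/la", folly.uri, "Laus Stultitiae", "la"),
    Manifestation(folly.uri + "/nl", folly.uri, "Lof der Zotheid", "nl"),
    Manifestation(folly.uri + "/en", folly.uri, "Praise of Folly", "en"),
]
holdings = [
    Holding("http://library-a.example.org/holdings/1", folly.uri + "/nl",
            "Library A", 2, ["humanism"]),
    Holding("http://library-b.example.org/holdings/7", folly.uri + "/en",
            "Library B", 1, ["satire", "humanism"]),
]
print([h.library for h in holdings_for_work(folly.uri, manifestations, holdings)])
print(shared_tags(folly.uri, manifestations, holdings))
```

The bibliographic records are never copied into the libraries' systems; the local records only hold URIs pointing to them.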

    Clearly, the current initiatives have dealt with these issues with varying levels of success. Some examples to illustrate this:

    • Work: Desiderius Erasmus – Encomium Moriae (Greek), Laus Stultitiae (Latin), Lof der Zotheid (Dutch), Praise of Folly (English)
    • Author: David Mitchell

    Authors
    Good:

    Medium:

    Bad:

    Publications
    Good:

    Bad:

    These findings seem to indicate that some level of coordination (which the commercial initiatives apparently have implemented better than the non-commercial ones) is necessary in order to achieve the goal of “one URI for each object”.

    Who wants to start?

  • Tweeting libraries

    Posted on March 30th, 2009 Lukas Koster 8 comments

    Should libraries use Twitter? Some web2.0 librarians think so; other people say it’s just a childish hype. Alice de Jong of the Peace Palace Library in The Hague recently wrote an article in the Dutch magazine Informatieprofessional (in Dutch), saying libraries should use Twitter as a means of quick and direct communication with their patrons. The Peace Palace Library uses Twitter as an automatic newsfeed.

    Library of Congress Twitter

    An interesting question is: how can a social networking tool that is in essence individual and exhibitionist be used in an institutional way?

    What is Twitter anyway?
    Wikipedia says: “Twitter is a social networking and micro-blogging service that enables its users to send and read other users’ updates known as tweets. Tweets are text-based posts of up to 140 bytes in length.” Basically a Twitter user broadcasts short messages to the web. Everybody can read these through that user’s personal Twitter page, or via an RSS feed on that page. Twitter users can subscribe to other Twitter users’ tweets by “following” them. In that case all followed tweets appear in their own Twitter stream. Twitter users can also reply to other tweets; this way it becomes a social networking environment. Tweets and replies are public, but there is also the option of “Direct messages”, that are private.
    Twitter can be used via the Twitter website, or via applications on mobile phones (like Twitterfon), on PCs (like Tweetdeck), or through widgets in other websites, like TwitterGadget in iGoogle.

    Personal Twitter
    Exhibitionism: that’s what Twitter originally is of course. Twitter asks “What are you doing?”. You simply tell the whole world (or world wide web at least) what you’re up to. A symptom of the egocentricity of this decade.
    But somehow egocentric exhibitionism turned into professional cooperation and friendly conversation.

    I first heard of Twitter at ELAG2008, less than a year ago. Besides the tag to be used for blog posts about the conference, there was also an announcement for a Twitter hashtag to be used. (And this at a conference where social tagging was promoted against controlled vocabularies!). There were a number of library bloggers there, who were also on Twitter: digicmb, Wowter, PatrickD.

    After that I started tweeting too, but I did not use it much until September 2008. Somehow it really took off in February 2009, as can be seen in my TweetStats page.

    Most people I follow or who are following me are library people. Most of these also blog. So there is a kind of library2.0 community on Twitter, just as there are all kinds of other communities there. Some of my Twitter friends I know personally; I have met them and talked to them. Others I have only met on Twitter, but we do have agreeable talks, both professional and social. Remarkably, one of these Twitter friends whom I have never met “in the flesh” is a colleague at the University of Amsterdam, but she works in the Medical Library, a long way from where I work.

    My subjects on Twitter:

    • football (soccer)
    • what I am watching on TV – music
    • what happens to me
    • politics
    • news

    but also:

    • metadata issues
    • day to day work issues
    • my IGeLU stuff
    • interesting blog posts
    • my new blog posts (like this one!)
    • interesting websites
    • library 2.0 news

    Twitter is also the “largest virtual expert helpdesk”, as digicmb recently experienced.

    My personal Twitter experience is like having chats and discussions with colleagues at work, or with friends in a bar, but with a much larger group; or like attending a library systems conference, with professional discussions and also social events, but an ongoing, if intermittent, one, and without the travelling.

    Institutional Twitter
    Now, how can an organisation, and in particular a library, use a tool, or rather a community, like Twitter for its own benefit?

    Twitter has been around for three years, but it is growing incredibly fast. In The Netherlands politicians use Twitter, like our Foreign Secretary. Well known people in all areas are on Twitter, example: British writer Ben Okri started publishing his new poem “I Sing a New Freedom” on Twitter, one line a day. Newspapers write about it, popular TV shows talk about it.
    So clearly, there is an ever growing audience. Libraries, as other organisations, should contact their audience where their audience is, so Twitter is another channel for communication.

    Organisations can use Twitter as an alternative for news items on websites, RSS feeds, blogs, etc. But is there an advantage in using Twitter instead of other web2.0 channels? I am not sure. Just like surfing to websites and subscribing to RSS feeds, people have to actively start “following” an institutional Twitter account. Organisations, libraries need to actively promote their Twitter channel for it to be a success. But they also need to actively maintain their Twitter channel, just like all other web2.0 activities, otherwise it will just fade away, as Meredith Farkas notices in her blog post “It’s not all about the tech – why 2.0 tech fails“.

    One advantage of using Twitter in libraries is the fact that it is moving beyond the hype stage. It will be one of the main channels of communication on the web.

    Another one might be the interactive possibilities of Twitter. Institutional use of Twitter is mostly one way traffic, a broadcast to whoever wants to “follow”, as opposed to personal Twitter. See for instance the Library of Congress and the Peace Palace Library.
    But as Alice de Jong points out, a number of libraries are choosing the “personal approach”. The tweeting librarian really communicates with patrons in order to promote closer contact between libraries and patrons.
    This approach might also be a replacement for current library chat services.

    Personal institutional Twitter accounts could also be used as a means of representing the library as actual recognisable people, as has been promoted recently on a number of occasions. Patrons then will know library staff as experts in certain fields, instead of facing an anonymous organisation.

    Conclusion: yes, libraries should use Twitter, as long as they can get a reasonable “following”, and have an official policy and staff dedicated to maintaining it.

  • Collection 2.0

    Posted on February 15th, 2009 Lukas Koster No comments

    Henk Ellerman of Groningen University Library writes about the “Collection in the digital age”, reacting to Mary Frances Casserly’s article “Developing a Concept of Collection for the Digital Age”. I haven’t read this 2002 article yet, but Henk Ellerman goes into the problem of finding a metaphor describing collections that for a large part consist of resources available on the internet.
    Henk says:
    “…the collection (the one deemed relevant for… well whatever) is a subset that needs to be picked from the total set of available online resources.”
    “I find it quite remarkable that the new collection is seen as the result of a process of picking elements, a process similar to finding shells on a beach.”
    “What if we expand the notion of a collection in such a way that the sea becomes part of it?”
    “The main issue with any sensible collection is quality control. We don’t want ugly things in our collections.”
    “Then a collection is not a simple store of documents anymore, but a rather complex system of interrelated documents, controlled by a selected group of people.”
    “Librarians ‘just’ need to make the system searchable.”


    I have a couple of thoughts about collections myself that I would like to add to these.
    Originally, a collection is the total number of physical objects of a specific type that are in the possession of a person, or an organisation. Merriam-Webster says: “an accumulation of objects gathered for study, comparison, or exhibition or as a hobby“. People can collect Barbie dolls or miniature cars as a hobby, or rare books or monkey skulls for scientific reasons.
    (By the way, individual collectors of rare books are often portrayed in movies as rich, old, eccentric people with a small but very valuable collection of very old books about topics such as satanism, who end up being killed in a horrible way, with their collections destroyed by fire, like I saw some time ago in Polanski’s “The Ninth Gate”.)

    When organisations have collections, it is almost always for study or exhibition, but also for practical reasons. We are talking mainly about museums and libraries. In the case of libraries there is a rough distinction between public libraries and libraries belonging to scientific and/or educational institutions. Let’s focus on educational libraries, or “university libraries”, to keep the picture simple.
    University libraries have collected written and/or printed texts (books, journals, also containing images, maps, diagrams, etc.) in order to provide their staff and students with material to teach and study with. A library’s collection then describes all objects in the possession of the library. In the digital age, electronic journals and databases have been added to these collections, but in most cases this concerns only resources the library owns or pays to gain access to. The collection then becomes the totality of objects (physical or digital) that the library owns or is granted access to by means of a contract. Freely available resources are explicitly not counted here.

    Now, here we have to make an important distinction between a library’s total collection (“the collection”, meaning “all items the library owns or has access to”) and a collection on a specific topic or for a specific subject (“subject collections”, meaning “all items that have been selected by professionals to be part of the material that is necessary for studying a specific topic”), for instance “the University of Amsterdam library’s Chess collection”. In the past, people would have to go to a specific library to consult a specific collection on a specific subject.
    “The collection” is merely the sum of all the library’s “subject collections”, nothing more.

    Before we go to the collection in the digital age, an interesting intermediate question is: what is the position of interlibrary loan in the concept of collection? Are books from other libraries that are available to a specific library’s patrons to be considered part of that library’s collection? In the strict sense of the collection concept (“all items the library owns”), the answer is “no”. But if we expand the notion of collection to mean “everything a library has access to”, then the answer clearly would have to be “yes”.

    Now, in the digital age, the limitation that a collection’s objects should be physically available in a specific location disappears. This means that anything can be part of “the collection” of a specific library, including objects or texts that have not been judged as scientific before, like blog posts. This is the “sea” that Henk Ellerman is talking about. A subject collection is also no longer limited by physical borders. Subject collections can contain material, physical and digital, from anywhere. In this case there is obviously no reason why a subject collection should be a specific library’s subject collection. Key is “quality control”, or as Henk Ellerman puts it: “We don’t want ugly things in our collections”. Subject collections should be universal, global, virtual collections of physical and digital objects, “controlled by a selected group of people”.

    Now, the most important question: who decides who will be part of these selected groups of people? The answer to this question is still to be found. I guess we will see several types of “expert groups” emerge: coalitions between university libraries nationally or globally, but also between not-for-profit and commercial organisations, and of course also between individuals cooperating informally, like in the blogosphere, or in Wikipedia.
    The collections that will be controlled by these coalitions will not have fixed boundaries, but will have more “professional” cores with several “less professional” spheres around them or intersecting with other collections.

    It is time we start building.

  • Unique authors

    Posted on February 4th, 2009 Lukas Koster 1 comment

    Jonathan Rochkind, in his post “How do name authorities work anyway?”, wonders if catalogers will confuse him with another writer of the same name who has an LC authority record, whereas he does not have one.

    I guess the relevance of this problem depends entirely on the question: do you think it’s important to know that the author of a specific work is the same as the author of another work? A former colleague of mine whom I respect very much used to say that it does not matter, as long as the correct name appears with the work in question. This was only six years ago, before the emergence of web 2.0 and library 2.0 type services. It is just like looking at a printed book: you read the author’s name, and if there is no further information on the back cover, or a list of publications by the author inside, then that’s all there is to it. In normal life, if you read a book or an article for pleasure, or even for business, study or research, that is no problem. No need for author authority records at all.

    However, the picture is completely different from the point of view of the authors, especially in the case of professional scientific and research staff, where the exact number of publications and citations is crucial. For these authors it is vital that the correct authority record is used for their publications. Here we definitely need authority records with unique identifiers. But of course there are so many different systems in use (LC authority records, WorldCat Identities, national systems etc.), and they all use their own identifiers.
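Why identifiers matter more than name strings can be shown with a toy example. The records and author URIs below are hypothetical: two different people publish as “J. Smith”, and one of them also appears as “Smith, J.”. Counting publications per name string both merges the two people and splits one of them; counting per identifier does neither.

```python
from collections import defaultdict

# Hypothetical records: two different authors share the name string "J. Smith".
works = [
    {"title": "Article A", "author_name": "J. Smith",
     "author_id": "http://authors.example.org/1"},
    {"title": "Article B", "author_name": "J. Smith",
     "author_id": "http://authors.example.org/2"},
    {"title": "Article C", "author_name": "Smith, J.",
     "author_id": "http://authors.example.org/1"},
]


def group_by(works, key):
    """Group work titles by the value of the given key."""
    groups = defaultdict(list)
    for w in works:
        groups[w[key]].append(w["title"])
    return dict(groups)


by_name = group_by(works, "author_name")  # merges author 1 and 2, splits author 1
by_id = group_by(works, "author_id")      # one group per actual person
for author, titles in by_id.items():
    print(author, len(titles), titles)
```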

    There is the proposal to develop the UAI, Universal Author Identifier. This system depends on authors registering and maintaining their own personal information in a freely accessible web-based database. There was a pilot system for a while, but it is not clear whether any results were reached.

    In The Netherlands a similar project on a national scale has led to a live implementation: the DAI, Digital Author Identifier. The DAI is based on the identifier used for authors in the OCLC-PICA Dutch National Union Catalog/Common Catalog system “PPN”, and is assigned to every author who has been appointed to a position at a Dutch university or research institute or has some other relevant connection with one of these organisations. The DAI is used in the Dutch university repositories, the Dutch national Research Database and in the national integrated portal NARCIS.
    The difference with UAI is that DAI is assigned by catalogers in one of the participating organisations, whereas UAI depends on voluntary cooperation of the authors themselves.

    Of course a “universal author identifier” still does not solve Jonathan’s initial question: confusion is still possible if the authors do not have a clear interest in maintaining their information themselves.

    Another issue here, about which something more can be said in a future post, is that for a real universal system we should use URI’s, as for unique works (see Owen Stephens’ post “The Future is Analogue”) and subject headings.

  • Social networking high and low of the year

    Posted on December 16th, 2008 Lukas Koster 1 comment

    Last month the Dutch Advisory Committee on Library Innovation published its report “Innovation with Effect”. The report was commissioned by the Dutch Minister of Education, Culture and Science; the committee’s charge was to draw up a plan for library innovation for the period 2009-2012, including a number of required conditions. Priorities that had to be addressed were: provision of digital services, collection policy, marketing, HRM.
    The recommendations of the committee are classified in three main areas or “programmatic lines” under a more or less central direction and/or coordination:

    • Digital infrastructure (such as: one common information architecture, connection to nationwide and global information infrastructure, one national identity management system)
    • Innovation of digital services and products
    • Policy innovation

    Interesting report, but that is not what I want to point out here. What is very exciting: in the list of consulted sources, amidst official reports and publications, appears Bibliotheek 2.0, the social network for information professionals and the Dutch equivalent of http://library20.ning.com. This aroused much enthusiasm among the members of the Dutch library blogosphere.
    The Committee’s chairperson Josje Calff, deputy director of Leiden University Library, had started a discussion on the topic “One public library catalogue?” in this community, to which I am proud to say I also made a small contribution. The results of this discussion have been used by the committee in formulating their recommendations.

    In striking contrast to this success for web 2.0 social networking, there was a lot of outrage in the same Dutch library blogosphere last week about the ban on the Netherlands’ most popular social network Hyves and on YouTube at one of the country’s institutes for professional and adult education, reported by one of its employees (in Dutch). Because of all the protests, the school’s management is currently reconsidering its position and a new decision will be made at the beginning of 2009. YouTube will probably continue to be permitted, because it is heavily used as a source of information in the lessons.
