Data. The final frontier.
  • Local library data in the new global framework

    Posted on January 5th, 2012 Lukas Koster 33 comments

    2011 has in a sense been the year of library linked data. Not that libraries of all kinds are now publishing and consuming linked data in great numbers. No. But we have witnessed the publication of the final report of the W3C Library Linked Data Incubator Group, the Library of Congress announcement of the new Bibliographic Framework for the Digital Age based on Linked Data and RDF, the release by a number of large libraries and library consortia of their bibliographic metadata, and many publications, sessions and presentations on the subject.

    All these events focus mainly on publishing library bibliographic metadata as linked open data. Personally I am not convinced that this is the most interesting type of data that libraries can provide. Bibliographic metadata as such describe publications, in the broadest sense, providing information about title, authors, subjects, editions, dates, URLs, but also physical attributes like dimensions, number of pages, formats, etc. This type of information, in FRBR terms: Work, Expression and Manifestation metadata, is typically shared among a large number of libraries, publishers, booksellers, etc. ‘Shared’ in this case means ‘multiplied and redundantly stored in many different local systems’. It doesn’t really make sense if all libraries in the world publish identical metadata side by side, does it?

    In essence only really unique data is worth publishing. You link to the rest.

    Currently, library data that is really unique and interesting is administrative information about holdings and circulation. After having found metadata about a potentially relevant publication it is very useful for someone to know how and where to get access to it, if it’s not freely available online. Do you need to go to a specific library location to get the physical item, or to have access to the online article? Do you have to be affiliated to a specific institution to be entitled to borrow or access it?

    Usage data about publications, both print and digital, can be very useful in establishing relevance and impact. This way information seekers can be supported in finding the best possible publications for their specific circumstances. There are some interesting projects dealing with circulation data already, such as the research project by Magnus Pfeffer and Kai Eckert as presented at the SWIB 11 conference, and the JISC funded Library Impact Data project at the University of Huddersfield. The Ex Libris bX service presents article recommendations based on SFX usage log analysis.

    The consequence of this assertion is that if libraries want to publish linked open data, they should focus on holdings and circulation data, and for the rest link to available bibliographic metadata as much as possible. It is to be expected that the Library of Congress’ New Bibliographic Framework will take care of that part one way or another.

    In order to achieve this, libraries should join forces with each other and with publishers and aggregators to put their efforts into establishing shared global bibliographic metadata pools accessible through linked open data. We can think of already existing data sources like WorldCat, OpenLibrary, Summon, Primo Central and the like. We can only hope that commercial bibliographic metadata aggregators like OCLC, Serials Solutions and Ex Libris will come to realise that it’s in everybody’s interest to contribute to the realisation of the new Bibliographic Framework. The recent disagreement between OCLC and the Swedish National Library seems to indicate that this may take some time. For a detailed analysis of this see the blog post ‘Can linked library data disrupt OCLC? Part one’.


    An interesting initiative in this respect is LibraryCloud, an open, multi-library data service that aggregates and delivers library metadata. And there is the HBZ LOBID project, which is targeted at ‘the conversion of existing bibliographic data and associated data to Linked Open Data’.

    So what would the new bibliographic framework look like? If we take the FRBR model as a starting point, the new framework could look something like this. See also my slideshow “Linked Open Data for libraries”, slides 39-42.

    The basic metadata about a publication or a unit of content, on the FRBR Work level, would be an entry in a global datastore identified by a URI (Uniform Resource Identifier). This datastore could for instance be WorldCat, or OpenLibrary, or even a publisher’s datastore. It doesn’t really matter. We don’t even have to assume it’s only one central datastore that contains all Work entries.

    The thing identified by the URI would have a text string field associated with it containing the original title, let’s say “The Da Vinci Code” as an example of a book. Articles can and should be identified this way too. The basic information we need to know about the Work would be attached to it using URIs to other things in the linked data web. A statement that links two things through a relationship, which is itself identified by a URI, is called a ‘triple’. ‘Author’ could for instance be a link to the OCLC VIAF entry for Dan Brown, which would then constitute a triple. If there are more authors, you simply add a URI for every person or institution. Subjects could be links to DBPedia/Wikipedia, Freebase, the Library of Congress Authority files, etc. There could be some more basic information, maybe a year, or a URI to a source describing the background of the work.
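
    As a rough illustration of the Work level, the sketch below stores a few triples as plain Python tuples. Every URI in it is a made-up example.org placeholder (a real setup would use VIAF, DBPedia or similar identifiers), and the property names and the objects() helper are assumptions for this example only.

```python
# Hypothetical URIs: none of these identifiers come from a real datastore.
WORK = "http://example.org/work/da-vinci-code"
AUTHOR = "http://example.org/person/dan-brown"      # stand-in for a VIAF URI
SUBJECT = "http://example.org/subject/thrillers"    # stand-in for a subject URI

# Hypothetical property URIs naming the relationship in each triple.
HAS_TITLE = "http://example.org/prop/title"
HAS_AUTHOR = "http://example.org/prop/author"
HAS_SUBJECT = "http://example.org/prop/subject"

# Each triple is (subject, predicate, object).
triples = [
    (WORK, HAS_TITLE, "The Da Vinci Code"),  # object is a text string
    (WORK, HAS_AUTHOR, AUTHOR),              # object is another URI
    (WORK, HAS_SUBJECT, SUBJECT),
]


def objects(triples, subject, predicate):
    """Return every object attached to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(triples, WORK, HAS_TITLE))
print(objects(triples, WORK, HAS_AUTHOR))
```

    Adding a second author would simply mean appending one more (WORK, HAS_AUTHOR, ...) tuple.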

    At the Expression level, a Dutch translation would have its own URI, stored in the same or another datastore. I could imagine that the publisher who commissioned the translation would maintain a datastore with this information. Attached to the Expression there would be the URI of the original Work, a URI pointing to the language, a URI identifying the translator and a text string containing the Dutch title, among others.

    Every individual edition of the work could have its own Manifestation level URI, with a link to the Expression (in this case the Dutch translation), a publisher URI, a year, etc. For articles published according to the long-standing tradition of peer-reviewed journals, there would also be information about the journal. On this level there should also be URIs to the actual content when dealing with digital objects like articles, ebooks, etc., no matter if access is free or restricted.
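
    The chain from Manifestation via Expression back to Work can be sketched the same way. Again, all URIs, property names and the two helper functions are hypothetical, and the title strings are only illustrative values.

```python
# All URIs below are hypothetical placeholders, not real identifiers.
WORK = "http://example.org/work/da-vinci-code"
EXPR_NL = "http://example.org/expression/da-vinci-code-nl"
MANIF_NL = "http://example.org/manifestation/da-vinci-code-nl-ed1"
LANG_NL = "http://example.org/language/nl"

TITLE = "http://example.org/prop/title"
EXPRESSION_OF = "http://example.org/prop/expressionOf"
LANGUAGE = "http://example.org/prop/language"
MANIFESTATION_OF = "http://example.org/prop/manifestationOf"

# The three levels could live in three different datastores; one list is
# used here only to keep the sketch short.
triples = [
    (WORK, TITLE, "The Da Vinci Code"),
    (EXPR_NL, EXPRESSION_OF, WORK),
    (EXPR_NL, LANGUAGE, LANG_NL),
    (EXPR_NL, TITLE, "De Da Vinci Code"),
    (MANIF_NL, MANIFESTATION_OF, EXPR_NL),
]


def follow(triples, subject, predicate):
    """Return the first object linked from `subject` by `predicate`, or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def work_title_for(triples, manifestation):
    """Walk Manifestation -> Expression -> Work and return the Work title."""
    expression = follow(triples, manifestation, MANIFESTATION_OF)
    work = follow(triples, expression, EXPRESSION_OF)
    return follow(triples, work, TITLE)


print(work_title_for(triples, MANIF_NL))
```

    The edition itself stores no title of the original work; it only links upwards, which is the point of "you link to the rest".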

    So far we have everything we need to know about publications “in the cloud”, or better: in a number of datastores available on a number of servers connected to the world wide web. This is more or less the situation described by OCLC’s Lorcan Dempsey in his recent post ‘Linking not typing … knowledge organization at the network level’. The only thing we need now is software to present all linked information to the user.

    No libraries in sight yet. For accessing freely available digital content on the web you actually don’t need a library, unless you need professional assistance finding the correct and relevant information. Here we have identified a possible role of librarians in this new networked information model.

    Now we have reached the interesting part: how to link local library data to this global shared model? We immediately discover that the original FRBR model is inadequate in this networked environment, because it implies a specific local library situation. Individual copies of a work (the Items) are directly linked to the Manifestation, because FRBR refers to the old local catalogue which describes only the works/publications one library actually owns.

    In the global shared library linked data network we need an extra explicit level to link physical Items owned by the library or online subscriptions of the library to the appropriate shared network level. I suggest using the “Holding” level. A Holding would have its own URI and contain URIs of the Manifestation and of the Library. A specific Holding in this way would indicate that a specific library has one or more copies (Items) of a specific edition of a work (Manifestation), or offers access to an online digital article by way of a subscription.


    If a Holding refers to physical copies (print books or journal issues for instance) then we also need the Item level. An Item would have its own URI and the URI of the Holding. For each Item, extra information can be provided, for instance ‘availability’, ‘location’, etc. Local circulation administration data can be registered for all Holdings and Items. For online digital content we don’t need Items, only subscription information directly attached to the Holding.
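
    A minimal sketch of what Holding and Item records might look like, assuming simple Python dataclasses. The field names, the example.org URIs and the available_items() helper are invented for illustration; the model above only prescribes which URIs each level carries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Holding:
    uri: str
    manifestation_uri: str               # link up to the shared network level
    library_uri: str                     # the library that owns or licenses it
    subscription: Optional[str] = None   # only used for online content


@dataclass
class Item:
    uri: str
    holding_uri: str                     # an Item only points to its Holding
    location: str
    available: bool = True


# Hypothetical data: one print Holding with two copies, one online Holding.
print_holding = Holding(
    uri="http://example.org/lib-a/holding/1",
    manifestation_uri="http://example.org/manifestation/m1",
    library_uri="http://example.org/library/a",
)
online_holding = Holding(
    uri="http://example.org/lib-a/holding/2",
    manifestation_uri="http://example.org/manifestation/m2",
    library_uri="http://example.org/library/a",
    subscription="campus-wide licence",
)
items = [
    Item("http://example.org/lib-a/item/1", print_holding.uri, "Main building"),
    Item("http://example.org/lib-a/item/2", print_holding.uri, "Depot", available=False),
]


def available_items(items, holding):
    """Return the Items of `holding` that can currently be borrowed."""
    return [i for i in items if i.holding_uri == holding.uri and i.available]


print([i.location for i in available_items(items, print_holding)])
```

    The online Holding has no Items at all; its access conditions sit directly on the Holding record.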

    Local Holding and Item information can reside on local servers within the library’s domain or just as well on some external server ‘in the cloud’.

    It’s on the level of the Holding that usage statistics per library can be collected and aggregated, both for physical items and for digital material.
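
    To make the aggregation idea concrete, here is a small sketch that counts usage events per Holding and then rolls them up per Manifestation across libraries. The event list, the URIs and both function names are assumptions; real circulation or download logs would look different.

```python
from collections import Counter

# Hypothetical Holdings: holding URI -> (manifestation URI, library URI).
holdings = {
    "http://example.org/lib-a/holding/1": (
        "http://example.org/manifestation/m1", "http://example.org/library/a"),
    "http://example.org/lib-b/holding/7": (
        "http://example.org/manifestation/m1", "http://example.org/library/b"),
    "http://example.org/lib-b/holding/8": (
        "http://example.org/manifestation/m2", "http://example.org/library/b"),
}

# One entry per loan or download, recorded against the Holding that was used.
events = [
    "http://example.org/lib-a/holding/1",
    "http://example.org/lib-a/holding/1",
    "http://example.org/lib-b/holding/7",
    "http://example.org/lib-b/holding/8",
]


def usage_per_holding(events):
    """Count usage per Holding, i.e. per library and edition."""
    return Counter(events)


def usage_per_manifestation(events, holdings):
    """Aggregate usage across libraries for each Manifestation."""
    totals = Counter()
    for holding_uri in events:
        manifestation_uri, _library_uri = holdings[holding_uri]
        totals[manifestation_uri] += 1
    return totals


print(usage_per_holding(events))
print(usage_per_manifestation(events, holdings))
```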

    Now, this networked linked library data model still allows libraries to present a local traditional catalogue type interface, showing only information about the library’s own print and digital holdings. What’s needed is software to do this using the local Holdings as entry level.

    But the nice thing about the model is that there will also be a lot of other options. It will also be possible to start at the other end and search all bibliographic metadata available in the shared global network, and then find the most appropriate library to get access to a specific publication, much like WorldCat does, but on an even larger scale.
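
    Both entry points, the local catalogue view and the global search "from the other end", can be sketched over the same data. The dictionaries below stand in for separate datastores; all URIs and function names are hypothetical.

```python
# Hypothetical shared network data: manifestation -> expression -> work.
manifestation_to_expression = {
    "http://example.org/manifestation/m1": "http://example.org/expression/e1",
    "http://example.org/manifestation/m2": "http://example.org/expression/e2",
}
expression_to_work = {
    "http://example.org/expression/e1": "http://example.org/work/w1",
    "http://example.org/expression/e2": "http://example.org/work/w1",
}

# Hypothetical local data: each Holding links a Manifestation to a library.
holdings = [
    {"manifestation": "http://example.org/manifestation/m1",
     "library": "http://example.org/library/a"},
    {"manifestation": "http://example.org/manifestation/m2",
     "library": "http://example.org/library/b"},
]


def local_catalogue(holdings, library):
    """Traditional view: start from one library's Holdings."""
    return sorted(h["manifestation"] for h in holdings if h["library"] == library)


def libraries_for_work(holdings, work):
    """Global view: start from a Work and find every library holding any edition."""
    found = set()
    for h in holdings:
        expression = manifestation_to_expression[h["manifestation"]]
        if expression_to_work[expression] == work:
            found.add(h["library"])
    return sorted(found)


print(local_catalogue(holdings, "http://example.org/library/a"))
print(libraries_for_work(holdings, "http://example.org/work/w1"))
```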

    Another nice thing about using triples, URIs and linked data is that it allows for adding all kinds of other, non-traditional bibliographic links to the old inward-looking library world, making it into a flexible and open model, ready for future developments. It will for instance be possible for people to discover links to publications and library holdings from any other location on the web, for instance a Wikipedia page or a museum website. And the other way around, from an item in local library holdings to let’s say a recorded theatre performance on YouTube.

    When this new data and metadata framework is in place, there will be two important issues to be solved:

    • Getting new software, systems and tools for both back end administrative functions and front end information finding needs. For this we need efforts from traditional library systems vendors but also from developers in libraries.
    • Establishing future roles for libraries, librarians and information professionals in the new framework. This may turn out to be the most important issue.
  • User experience in public and academic libraries

    Posted on April 9th, 2010 Lukas Koster 5 comments

    User experience treasure map © Peter Morville

    A couple of recent events got me thinking about differences and similarities of public and academic libraries in the digital age. I used to think that current and future digital developments would in the end result in public and academic libraries moving closer to each other, but now I’m not so sure anymore. Let me explain how this happened.

    On April 1st I attended the UgameUlearn 2010 symposium organised by Delft University Library and DOK, “The library Concept Center”, a public library. Two of the speakers there were David Lee King, Digital Branch & Services Manager at Topeka & Shawnee County Public Library, and Michael Stephens, Assistant Professor Library and Information Science at Dominican University. Have a look at David’s and Michael’s slides.
    A week before that I had the opportunity to see Helene Blowers speak at DB Update 2010 about “Reality check 2.010 – 5 trends shaping libraries”. She is Digital Strategy Director for the Columbus Metropolitan Library and the inventor of 23 Things. John Blyberg was there too, Assistant Director for Innovation and User Experience of Darien Library, another well-known innovative public library. He presented SOPAC, “the Social OPAC”.

    © UgameUlearn - Geert van den Boogaard

    The central topic of the presentations of these four people, all working for, or mainly dealing with, public libraries, was and is “connecting with the public”, both in material and digital ways: by creating inviting, welcoming and collaborative spaces, both in physical library locations and online.
    What particularly struck me was the fact that David Lee King’s official title is “digital branch manager”, meaning that the library’s online activities, or web presence, constitute just another branch beside and equal to the physical branch locations, which has to be managed as one front office in a coherent way.

    Judging by all four people’s job descriptions and presentations, public libraries appear to be very much involved in improving user experience (UX) and web presence. How is this in academic libraries? I think it’s different. I started to think that university and public libraries are moving in different directions the day before UgameUlearn, when I participated in the final session of a brainstorming and discussion track about the future of the library, more particularly of the Library of the University of Amsterdam, the institution I work for. Some of our conclusions are:

    • 90% of the university library’s material will be digital
    • Physical library buildings will disappear
    • Library tasks and services will be more closely tied to the university’s core business, education and research.

    Only the Heritage and Special Collections departments will still have lots of physical books, journals and other objects, and become a separate museum-like entity. I think they should have a look at Michael Edson’s UgameUlearn slides about the efforts being made at the Smithsonian Institution to engage their audience.

    The “digital branch” concept was not identified as a separate development in the future of the library discussion, probably because effectively we will see separate branches developing per subject area. Currently, at the Library of the University of Amsterdam, managing the Library’s web presence is the shared responsibility of the three main central divisions: Acquisition and Metadata Services, Public Services and Electronic Services, and a number of Faculty/Departmental Libraries. Still, the digital branch idea is an interesting concept to investigate, at least for the short term. But at university libraries a digital branch should extend its tasks beyond user experience alone.

    Anyway, the day after UgameUlearn Helene Blowers, David Lee King and Michael Stephens were guests in the live stream show “This week in libraries”, organised by Jaap van de Geer and Erik Boekesteijn, both DOK, and also two of the four UgameUlearn organisers.

    I took my chance and asked them the question: “What should a digital branch in an academic library look like? Different from public library?” If you’re interested, you will find the question and answers towards the very end of the stream.
    Helene, David and Michael agreed more or less that there probably isn’t much difference, that you should ask your community what they want, and that, just like with public libraries, every community has different needs. A clear distinction is that in universities there are three groups of customers: students, faculty and staff, with somewhat different needs. University libraries should at least support the learning process, for instance by creating spaces for collaboration.

    Since then I have had the chance to give this some more thought, and I came to the conclusion that there are probably more differences than similarities between public and academic libraries. Not least because Aad Janson of the Peace Palace Library commented (in Dutch) that their library is an academic library too but without students and scientists. This made me realise that there are several different types of libraries, distinguished by a number of important characteristics (audience, subscription, collection type, funding) that influence their position in digital developments. I have tried to compose a definitely incomplete summary of these library types in a table. I guess the Peace Palace Library would qualify as a “research/scientific library” in this classification scheme.

    Library type                | Audience                       | Subscription         | Collection type                         | Funding
    Public                      | local community                | voluntary            | local, mostly physical                  | subscriptions, public (local)
    National                    | national, global               | voluntary            | national physical + digital             | public (national)
    Research/scientific         | specific professions, students | voluntary            | local physical, remote digital          | public, private
    Museums/archives            | global community               | voluntary            | local physical + digital                | public, private
    Special                     | explicitly defined             | voluntary, automatic | local physical + digital                | public, private
                                | staff                          | automatic            | local physical + digital, remote digital | private
    Governmental bodies         | staff                          | automatic            | local physical + digital, remote digital | public
    International organisations | staff                          | automatic            | local physical + digital, remote digital | public, private
    University/higher education | students, faculty, staff       | automatic            | local physical + digital, remote digital | public, private

    I presume that libraries that are dependent on voluntary subscriptions, like public and research/scientific libraries, will put more effort into improving “user experience”. This will be reinforced if the library’s collection is not a unique selling point, and funding is partly based on patron fees. Public libraries have to compete for customers (and not with their collections) and at the same time satisfy local city councils.

    On the other end of the scale we see university libraries that get their customers “into the bargain”, customers who need their affiliation to get access to restricted databases and e-journal articles. Contrary to public libraries, the collections of university/higher education libraries consist of more than the local catalogue: numerous local and remote repositories, databases, e-journals, etc. Consequently, these libraries will put relatively more effort into consolidating and linking all these databases, especially when they have the technical staff to do so. The contributions by academic libraries, and also some national and museum libraries, to linked data and mashup developments for instance seem to confirm this.

    Libraries between these two extremes will probably merge both approaches in various ways, depending on the actual mix of audience, subscription, collection and funding type.

    This is not to say that academic libraries are not interested in improving user experience at all. But it’s just different. Unlike public libraries, academic libraries don’t have to attract new customers with staff recommendations, themes of the month, etc., because students, teachers and researchers each have their own fairly well described subject areas. They just have to provide them with the right finding aids. And these finding aids do not necessarily have to be provided by the libraries themselves, as long as they do offer their patrons efficient delivery mechanisms.

    Of course all types of libraries can learn and benefit from each other’s work and even cooperate. After all, good data structures and relations are indispensable for an optimal user experience.

  • Just in time or just in case?

    Posted on October 16th, 2009 Lukas Koster 4 comments

    Metasearch vs. harvesting & indexing

    The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet”, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.

    A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal highly depends on the availability of the remote cataloging systems.

    I took the opportunity to also explain to my audience the “issues” inherent in the concept of metasearch (or “federated search”, “distributed search”, etc.), and compare that to the harvesting & indexing scenario.

    Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.

    The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad and H&I is good; that would be too easy. Some five years ago Metasearch was the best we had; it was tremendous progress beyond searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the open source tools VuFind, SUMMA, Meresco, to name a few.

    Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case”, used in logistics and inventory management.

    With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
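
    The trade-off can be shown with a toy simulation. The RemoteCatalogue class, the record titles and all function names below are invented for this sketch; real metasearch connectors and harvesters (and tools like MetaLib or Primo) are of course far more involved.

```python
class RemoteCatalogue:
    """Stand-in for a remote library system that may be unreachable."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)
        self.available = True

    def _check(self):
        if not self.available:
            raise ConnectionError(f"{self.name} is not responding")

    def search(self, term):
        self._check()
        return [r for r in self.records if term.lower() in r.lower()]

    def export_all(self):
        self._check()
        return list(self.records)


def metasearch(catalogues, term):
    """Just in time: query every remote system at request time."""
    hits, failed = [], []
    for cat in catalogues:
        try:
            hits.extend((cat.name, r) for r in cat.search(term))
        except ConnectionError:
            failed.append(cat.name)  # delivery problem: results are incomplete
    return hits, failed


def harvest(catalogues):
    """Just in case: copy all records into one local index ahead of time."""
    return [(cat.name, r) for cat in catalogues for r in cat.export_all()]


def search_index(index, term):
    """Search the harvested index; no remote system is contacted."""
    return [(name, r) for name, r in index if term.lower() in r.lower()]


a = RemoteCatalogue("A", ["Linked data basics"])
b = RemoteCatalogue("B", ["Linked data for libraries"])
index = harvest([a, b])                   # harvested while both were up

a.records.append("Linked data cookbook")  # added after the harvest
b.available = False                       # B goes down

print(metasearch([a, b], "linked data"))  # fresh record from A, nothing from B
print(search_index(index, "linked data")) # both sources, but missing the new record
```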

    Objectively of course, H&I can solve the problems inherent in Metasearch, and therefore is a superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested for technical, legal or commercial reasons.

    In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.

    But I also believe that the real power of H&I can only be taken advantage of if institutions cooperate and maintain shared central indexes, instead of each building their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.

    We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ Primo Central, Serials Solutions’ Summon and EBSCOhost Integrated Search.

    The funny thing is that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.

  • Social networking high and low of the year

    Posted on December 16th, 2008 Lukas Koster 1 comment

    Last month the Dutch Advisory Committee on Library Innovation published its report “Innovation with Effect”. The report was commissioned by the Dutch Minister of Education, Culture and Science; the charge was to draw up a plan for library innovation for the period 2009-2012 including a number of required conditions. Priorities that had to be addressed were: provision of digital services, collection policy, marketing, HRM.
    The recommendations of the committee are classified in three main areas or “programmatic lines” under a more or less central direction and/or coordination:

    • Digital infrastructure (such as: one common information architecture, connection to nationwide and global information infrastructure, one national identity management system)
    • Innovation of digital services and products
    • Policy innovation

    Interesting report, but that is not what I want to point out here. What is very exciting: in the list of consulted sources, amidst official reports and publications, appears the social information professionals network Bibliotheek 2.0, the Dutch equivalent of This aroused much enthusiasm among the members of the Dutch library blogosphere.
    The Committee’s chairperson Josje Calff, deputy director of Leiden University Library, had started a discussion on the topic “One public library catalogue?” in this community, to which I am proud to say I also made a small contribution. The results of this discussion have been used by the committee in formulating their recommendations.

    In striking contrast to this success for web 2.0 social networking, there was a lot of outrage in the same Dutch library blogosphere last week about the ban of the Netherlands’ most popular social network Hyves and YouTube from one of the country’s institutes for professional and adult education, reported on by one of its employees (in Dutch). Because of all the protests the school’s management is currently reconsidering its position and a new decision will be made at the beginning of 2009. Probably YouTube will continue to be permitted, because it is heavily used as a source of information in the lessons.