Library2.0 and beyond
  • The poor person’s linked open data workbench

    Posted on November 11th, 2013 Lukas Koster 4 comments

    Using discovery tools for presenting integrated information

    There has been a lot of discussion in recent years about library discovery tools. Basically, a library discovery tool provides a centrally maintained shared index of scholarly material metadata, a search system and an option for adding a local metadata index. Academic libraries use it to provide a unified access platform for subscribed and open access databases and e-journals, as well as for their own local print and digital holdings.

    [Image: workbench, © vlashton]

    I would like to put forward that, despite their shortcomings, library discovery tools can also be used for finding and presenting other scholarly information in the broadest sense. Libraries should look beyond the narrow focus on limitations and turn imperfection into benefits.

    The two main points of discussion regarding discovery tools are the coverage of the central shared index and relevance ranking. For a number of reasons of a practical, technical and competitive nature, none of the commercial central indexes covers all the content that academic libraries may subscribe to. Relevance ranking of search results depends on so many factors that it is a science in itself to satisfy each and every end user with their own specific background and context. Discovery tool vendors spend a lot of energy on improving coverage and relevance ranking.

    These two problems are the reason that not many academic libraries have been able to achieve the one-stop unified scholarly information portals for their staff and students that discovery tool providers promised them. In most cases the institutional discovery portal is just one of the solutions for finding scholarly publications that are offered by the library. A number of libraries are reconsidering their attitude towards discovery tools, or have even decided to renounce these tools altogether and focus on delivery instead, leaving discovery to external parties like Google Scholar.

    [Image: Lego workbench, © derletzteschrei]

    I fully support the idea that libraries should reconsider their attitude towards discovery tools, but I would like to stress that they should do so with a much broader perspective than just the traditional library responsibility of providing access to scholarly publications. Libraries must not throw the baby out with the bathwater. They should realise that a discovery tool can be used as a platform for presenting connected scholarly information, for instance publications with related research project information and research datasets, based on linked open data principles. You could call this the “poor person’s linked open data platform”, because the library has already paid the license fee for the discovery platform, and it does not have to spend a lot of extra money on additional linked open data tools and facilities.

    Of course this presupposes a number of things: the content to be connected should have identifiers, preferably in the form of URIs, and should be openly available for reuse, preferably via RDF. The discovery tools should be able to process URIs and RDF and present the resolved content in their user interfaces. We all know that this is not the case yet. Long term strategies are needed.
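
    To make these prerequisites concrete, here is a minimal sketch, in Turtle, of what such interlinked metadata could look like. All URIs are hypothetical; only the Dublin Core and FOAF vocabularies are real:

    ```turtle
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix foaf:    <http://xmlns.com/foaf/0.1/> .

    # A publication pointing, via URIs, to its author, the research project
    # it resulted from, and an underlying dataset (all identifiers made up)
    <http://repository.example.edu/publication/1234>
        dcterms:title    "A Study of Something" ;
        dcterms:creator  <http://repository.example.edu/person/jdoe> ;
        dcterms:relation <http://projects.example.edu/project/42> ,
                         <http://data.example.edu/dataset/99> .

    <http://repository.example.edu/person/jdoe>
        a foaf:Person ;
        foaf:name "J. Doe" .
    ```

    A discovery tool that could resolve these URIs would be able to pull in the project and dataset descriptions and present them next to the publication.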

    Content providers must be convinced of the added value of adding identifiers and URIs to their metadata and providing RDF entry points. In the case of publishers of scholarly publications this means identifiers/URIs for the publications themselves, but also for authors, contributors, organisations, related research projects and datasets. A number of international associations and initiatives are already active in lobbying for these developments: OpenAIRE, Research Data Alliance, DataCite, the W3C Research Object for Scholarly Communication Community Group, etc. Universities themselves can contribute by adding URIs and RDF to their own institutional repositories and research information systems. Some universities are implementing special tools for providing integrated views on research information based on linked data, such as VIVO.
    There are also many other interesting data sources that can be used to integrate information in discovery tools, for instance in the government and cultural heritage domain. Many institutions in these areas already provide linked open data entry points. And then there is Wikipedia with its linked open data interface DBpedia.

    On the other side of the scale, discovery tool providers must be convinced of the added value of providing procedures for resolving URIs and processing RDF in order to integrate information from internal and external data sources into new knowledge. I don’t know of any plans for implementing linked open data features in any of the main commercial or open source discovery tools, except for Ex Libris Primo. OCLC provides a linked data section for each WorldCat search result, but that is mainly focused on publishing their own bibliographic metadata in linked data format, using links to external subject and author authority files. This is a positive development, but it’s not consumption and reuse of external information in order to create new integrated knowledge beyond the bibliographic domain.

    With the joint IGeLU/ELUNA Linked Open Data Special Interest Working Group the independent Ex Libris user groups have been communicating with Ex Libris strategy and technology management on the best ways to implement much needed linked open data features in their products. The Primo discovery tool (with the Primo Central shared metadata index) is one of the main platforms in focus. Ex Libris is very keen on getting actual use cases and scenarios in order to identify priorities in going forward. We have been providing these for some time now through publications, presentations at user group conferences, monthly calls and face to face meetings. Ex Libris is also exploring best practices for the technical infrastructure to be used and is planning pilots with selected customers.

    While this may take some time to mature, in the meantime libraries that have access to their discovery tool’s back office and user interface HTML files can start experimenting with and implementing add-ons that integrate the tool’s metadata index with external information. This should be possible with open source discovery tools like VuFind, and with local or hosted installations of commercial products that provide back office access. The only commercial product that offers that option, as far as I know, is Primo. Creating local linked open data add-ons can be done by combining manipulation of local index metadata fields, JavaScript/jQuery in the front end HTML and any open APIs available for the tool (a minimal sketch follows below).
    The Austrian national library service OBVSG for instance has integrated Wikipedia/DBpedia information about authors in their Primo results.
    The Saxon State and University Library Dresden (SLUB) has implemented a multilingual semantic search tool for subjects based on DBpedia in their Primo installation.
    At the University of Amsterdam I have been experimenting myself with linking publications from our Institutional Repository (UvA DARE) in Primo with related research project information. This has for now resulted in adding extra external links to that information in the Dutch National Research portal NARCIS, because NARCIS doesn’t provide RDF yet. We are communicating with DANS, the NARCIS provider, about extending their linked open data features for this purpose.
    Of course all these local implementations can serve as use cases for discovery tool providers.
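
    To illustrate the kind of front-end add-on described above, here is a minimal sketch of Wikipedia/DBpedia author enrichment in JavaScript/jQuery. This is not any library’s actual implementation: the element selectors are hypothetical, and it assumes jQuery is available in the discovery interface and that DBpedia accepts cross-origin requests (otherwise a small local proxy is needed).

    ```javascript
    // Minimal sketch of a discovery tool add-on that enriches an author
    // display with an abstract from DBpedia's RDF/JSON data endpoint.
    function enrichAuthor(authorName, $target) {
      // DBpedia resource names use underscores instead of spaces;
      // this naive mapping only works for simple author names
      var resource = 'http://dbpedia.org/resource/' + authorName.replace(/ /g, '_');
      var dataUrl = resource.replace('/resource/', '/data/') + '.json';

      $.getJSON(dataUrl, function (data) {
        // The RDF/JSON response is keyed by resource and predicate URIs
        var props = data[resource] || {};
        var abstracts = props['http://dbpedia.org/ontology/abstract'] || [];
        // Pick the English abstract, if there is one, and append it
        for (var i = 0; i < abstracts.length; i++) {
          if (abstracts[i].lang === 'en') {
            $target.append($('<div class="dbpedia-abstract"/>').text(abstracts[i].value));
            break;
          }
        }
      });
    }

    // Hypothetical usage: enrich every author element in the brief results
    $('.result-author').each(function () {
      enrichAuthor($(this).text(), $(this).parent());
    });
    ```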

    I have only talked about the options of using discovery tools as a platform for consuming, reusing and presenting external linked open data, but I can imagine that a discovery tool can also be used as a platform for publishing linked open data. It shouldn’t be too hard to add extra RDF options besides the existing HTML and internal record format output formats. That way libraries could have a full linked open data consumption and publishing workbench at their disposal at minimal cost. Library discovery tools would from then on be known as information discovery tools.
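
    As a sketch of how such an extra RDF output could be exposed, HTTP content negotiation on a (hypothetical) record URI would let the same address serve both people and machines:

    ```
    # Hypothetical record URI; the Accept header selects the representation
    curl -H "Accept: text/html"   http://discovery.example.edu/record/12345
    curl -H "Accept: text/turtle" http://discovery.example.edu/record/12345
    ```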


  • PhD, the final frontier – Part two: Reconnaissance

    Posted on November 9th, 2013 Lukas Koster 2 comments

    Tools and methods for PhD research and writing

    After I had made the decision to attempt a scholarly career and aim for a PhD (see my first post in this series), I started writing a one page proposal. When I finished I had a working title, a working subtitle, a table of contents with six chapter titles, and a very general and broad text about some contradictions between the state of technology and the integration of information. I can’t really go into any details, because the first thing you learn when you aim for a scholarly publication is to keep your subject to yourself as much as possible, to prevent someone else from hijacking it and getting there first.

    [Image: Probe]

    I used Google Docs/Drive for writing this proposal. I have been using Google Docs for several years now for all my writing, both personal and professional, because that way I have access to my documents wherever I go on any device I want, I can easily share and collaborate, and it keeps previous versions.

    I shared my proposal with my advisor Frank Huysmans, using the Google Docs sharing options. We initially communicated about it through Twitter DMs, Google Docs comments and email. Frank’s first reaction was that it looked like a good starting point for further exploration, and he saw two related perspectives to explore: Friedrich Kittler’s media ontology and Niklas Luhmann’s system theory. I had studied a book by Luhmann on the sociology of legal trials during my years in university, but he only became one of the major theoretical sociologists after that. I hadn’t heard of Kittler at all. He is a controversial major media and communications philosopher. Both are (or rather were) Germans.

    Next step: I needed to get hold of literature, books and articles, both by and about Luhmann and Kittler, preferably in English, although I read German very well. Some background information about the scholars and their work would also be useful.
    An important decision I had to make was whether I was going to use print, digital or both formats for my literature collection. I didn’t have to think long. I decided to try and get everything in digital format, either as online material, or downloadable PDFs, EPUBs etc. The main reason is that this way I can store the publications in an online storage facility like Dropbox, and have access to all my literature wherever I am, at work, at home and on the road (either on an ereader or my smartphone).

    Frank, who is a Luhmann expert, gave me some pointers on publications by and about Luhmann. Kittler I had to find on my own. Of course, working for the Library of the University of Amsterdam and being responsible for our Primo discovery tool, I tried finding relevant Luhmann and Kittler publications there. I also tried Google and Google Scholar. I used my own new position as a library consumer to perform a basic comparison test between library discovery and Google. My initial conclusion was that I got better results using Google than our own Primo. This was in March 2013. But in the meantime both Ex Libris and the Library have made some important adjustments to the Primo indexing procedures and relevance ranking algorithm. Repeating similar searches in November 2013 provided much better results.
    Anyway, I mostly ignored print publications. As a staff member of the University of Amsterdam I have access to all subscription online content, whether I find it through our own discovery tools or via Google. What I can’t find in our library discovery tools are ‘unofficial’ digital versions of print books. Here Google can help. For instance I found a complete PDF version of Luhmann’s main work “Die Gesellschaft der Gesellschaft” (“The Society of Society”, in German). Frank was also so kind as to digitise or copy some chapters of relevant print books about Luhmann for me.

    I discovered that I needed some sort of mechanism to categorise or catalogue my literature in such a way that it is available to me wherever I am, other than just putting files in a Dropbox folder. After some research I decided on Mendeley, which was originally presented as a reference manager, but is now more of a collaboration, publication sharing and annotation environment. I am not using Mendeley as a reference manager for now, only for categorising digital local and online publications and making annotations attached to the publications.
    An important feature that I need in research is to have access to my notes everywhere. With Mendeley I can attach notes to PDFs in the Mendeley PDF reader, and these are synchronised with my other Mendeley installations on other workstations, provided the ‘synchronise files’ option is checked everywhere. Actually Mendeley distinguishes between “notes” and “annotations”. Notes are made and attached at the level of the publication; annotations are linked to specific text sections in PDFs. Annotations don’t work with EPUB files, because there is no embedded Mendeley EPUB reader, nor with online URLs. I can add notes to specific EPUB text sections in my ereader, and these are synchronised between my ereader software instances. So I still need an independent annotation tool that can link my notes to all my research objects and synchronise them between my devices, independent of format or software.

    A final note about digital formats. PDF is the norm in digital scholarly dissemination of publications. PDFs are fine for reading on a PC and for attaching notes in Mendeley, but they’re horrible to read on my ereader. There I prefer EPUB. I would really like to have more standard download options to choose from, at least PDF and EPUB.

    Summarising, the functions I need for my PhD research and writing in this stage, and the tools and methods I am currently using:

     

    • Find publications, information: library discovery tools, Google Scholar, the web
    • Acquire digital copies of publications: access/authorisation; PDF, EPUB
    • Store material independent of location and device: Dropbox, Mendeley
    • Save references: Mendeley
    • Notes/annotations: Mendeley, Kobo ereader software
    • Write and share documents independent of location and device: Google Docs
    • Communicate: Twitter, email, Google Docs comments

  • PhD, the final frontier – Part one: To boldly go

    Posted on August 18th, 2013 Lukas Koster 1 comment

    My final career? Facing the challenge

    This is the first in hopefully a series of posts about a new phase in my professional life, in which I will try to pursue a scholarly career with a PhD as my first goal. My intention with this series is to document in detail the steps and activities needed to reach that goal. In this introduction I will describe the circumstances and considerations that finally led to my decision to take the plunge.

    [Image: To boldly go]

    First some personal background. I graduated in Sociology at the University of Amsterdam in 1987. I received the traditional Dutch academic title of “Doctorandus” (“Drs.”) which in the current international higher education Bachelor/Masters system is equivalent to a Master of Science (MSc). I specialised in sociology of organisations, labour and industrial relations, with minors in economics and social science “informatics”. I wrote my thesis “Arbeid onder druk” (“Labour under pressure”) on automation, the quality of labour and workers’ influence in the changing printing industry in the 20th century.
    The job market for sociologists in the 1980s and 1990s was virtually non-existent, so I had to find an alternative career. One of the components of the social science informatics minor was computer programming. I learned to program in Pascal on a mainframe, using character based monochrome terminals. I actually liked doing that, and I decided to join the PION IT Retraining programme for unemployed (or underemployed) academics, organised in the 1980s by the Dutch State to overcome the growing shortage of IT professionals. After a test I was accepted, and in 1988 I successfully finished the course, during which I learned to program in COBOL. However, it took me another two years to finally find a job. From 1990 until 2002 I worked as a systems designer and developer (programming in PL/1, Fortran and Java, among others) for a number of institutions in the area of higher education and scholarly information. Then I was sent away with a year’s salary from the prematurely born Institute for Scholarly Information Services NIWI, which was terminated in 2005. From its ruins the now successful Data Archiving and Networked Services institute (DANS) emerged.
    It was then that I tried to take up a new academic career for the first time, and I enrolled in Cultural Studies at the Open University. I enjoyed it very much; I took a lot of courses and even managed to pass a number of exams.
    By the end of that year, after a chance meeting with a former colleague in the tram on my way to take an exam, I found myself working at the Dutch Royal Library in The Hague as a temporary project staff member, implementing the new federated search and OpenURL tools MetaLib and SFX from Ex Libris. This marked the start of my career in library technology. It was then that I first learned about bibliographic metadata formats, cataloguing rules and MARC madness.
    I soon discovered that it’s very hard to combine work and studying, and because I had started to like working for libraries, networking and exchanging knowledge, I silently dropped out of the Open University.
    After three years on temporary contracts I moved to the Library of the University of Amsterdam to do the same work as I had done at the Royal Library. I got involved in the International Ex Libris User Group IGeLU and started trying to make a difference in library systems, working together with an enthusiastic bunch of people all over the world. I made it to head of department for a while, until an internal reorganisation gave me the opportunity to say goodbye to that world of meetings, bureaucracy and conflicts. Now I am Library Systems Coordinator, which doesn’t mean that I am coordinating library systems, by the way. My main responsibility at the moment is our Ex Libris Primo discovery tool. The most important task there is coordinating data and metadata streams: a lot of time, money and effort is spent on streamlining the movement of metadata between a large number of internal and external systems. For the past couple of years I have been looking into, reading about, and writing and presenting on linked open data and metadata infrastructures, with a small pilot project now and then. But academic libraries are so slow to realise that they should stop thinking in new systems for new challenges and invest in metadata and data infrastructures instead, that I am not overly enthusiastic about working in libraries anymore. About a year ago I suddenly realised that with Primo I was still doing exactly the same detailed configuration work and vendor communications that I had been doing ten years earlier with MetaLib, and nothing had changed.
    It was then that I said to myself: I need to do something new and challenging. If this doesn’t happen at work soon, then I have to find something satisfying to do besides that.

    I can’t reproduce the actual moment anymore, but I think it happened when I was working on getting scholarly publications (dissertations among others) from our institutional repository into Primo. Somehow I thought: I can do that too! Write a dissertation, get a PhD! My topic could be something in the field of data, information and knowledge integration, from a sociological perspective.
    So I started looking into the official PhD rules and regulations at the University of Amsterdam, to find out what possibilities there are for people with a job to get a PhD. It turns out there are options, even with time/money compensation for university staff. But still not everything was clear to me. So I decided to ask Frank Huysmans, part-time Library Science professor at the University of Amsterdam, who is also active on Twitter and in the open data and public libraries movement, if he could help me and explain the options and the pros and cons of writing a dissertation. He agreed, and we met in a pub in Amsterdam to discuss my ideas over a couple of nice beers.

    The good thing was that Frank thought that I should be able to pull this off, looking at my background, work experience and writing. The bad thing was that he asked me: “Are you sure that you don’t just want to write a nice popular science book?”. Apparently scholarly writing is subject to a large number of rules and formats, and is not meant to be a pleasant read.
    An encouraging thing that Frank told me is that it is possible to compile a dissertation from a number of previously published, peer-reviewed scholarly articles. This really appealed to me, because it means that I can attempt to write a scholarly article and try to get it published first, getting a feel for the art of scholarly research, writing and publication in a relatively short period. This way I keep my options open: leave it at that, or continue with the PhD procedure later. I agreed to write a short dissertation proposal, send it to Frank and discuss it at a next meeting.

    My decision was made. Although in the meantime a couple of interesting perspectives at work appeared on the horizon involving research information and linked data, I was going to try and start a scholarly career.

    Next time: the first steps – reading, thinking, writing a draft proposal and how to keep track of everything.


  • A day between the stacks

    Posted on August 9th, 2013 Lukas Koster 2 comments

    Connecting real books, metadata and people

    I spent one day in the stacks of the off-site central storage facility of the Library of the University of Amsterdam, as one of the volunteers helping the library perform a huge stock control operation that will take years. The goal of this project is to get a complete overview of the discrepancy between what’s on the shelves and what is in the catalogue. We’re talking about 65 kilometres of shelving, approximately 2.5 million items, in this central storage facility alone.


    To be honest, I volunteered mainly for personal reasons. I wanted to see what is going on behind the scenes with all these physical objects that my department is providing logistics, discovery and circulation systems for. Is it really true that cataloguing the height of a book (MARC 300$c) is only used for determining which shelf to put it on?

    The practical details: I received a book cart with a pencil, a marker, a stack of orange sheets of paper with the text “Book absent on account of stock control” and a printed list of 1000 items from the catalogue that should be located in one stack. I was on my feet between 9 AM and 4 PM, with one hour in total of coffee and lunch breaks, in a space of around 2-3 metres in one aisle between two stacks, in a huge building in the late 20th century suburbs of Amsterdam without mobile phone or internet coverage. I must say, I don’t envy the people working there. I’m happy to sit behind my desk and PC in my own office in the city centre.

    Most of my books were indeed in a specific size range, approximately 18-22 cm, with a couple of shorter ones. I found approximately 25-30 books on the shelves that were not on my list, and therefore not in the catalogue. I put these on my cart, replacing each of them with one of the orange sheets, on which I pencilled the shelfmark of the book. There were approximately 5-10 books on my list missing from the shelves, which I marked on the list. One book had a shelfmark on the spine that was identical to that of the book next to it. Inside was a different code, which seemed to be the correct one (it was on the list, 10 places down). I put 10 books on the cart because I thought the title on the list didn’t correctly match the title on the book, but this is a tricky thing, as I will explain.

    Title metadata
    The title printed on the list was the “main title”, or MARC 245$a. It is very interesting to see how many differences there are between the ways that main and subtitles have been catalogued by different people through the ages. For instance, I had two editions on my list (1976 and 1980) of a German textbook on psychiatry, with almost identical titles and subtitles (title descriptions taken from the catalogue):

    - Psychiatrie, Psychosomatik, Psychotherapie : Einführung in die Praktika nach der neuen Approbationsordnung für Ärzte mit 193 Prüfungsfragen : Vorbereitungstexte zur klinischen und Sozialpsychiatrie, psychiatrischen Epidemiologie, Psychosomatik, Psychotherapie und Gruppenarbeit

    - Psychiatrie : Psychosomatik, Psychotherapie ; Einf. in d. Praktika nach d. neuen Approbationsordnung für Ärzte mit Schlüssel zum Gegenstandskatalog u.e. Sammlung von Fragen u. Antworten für systemat. lernende Leser ; Vorbereitungstexte zur klin. u. Sozialpsychiatrie, psychiatr. Epidemiologie, Psychosomatik, Psychotherapie u. Gruppenarbeit

    The first book, from 1976 (which actually has ‘196’ instead of ‘193’ on the cover), is on the list and in the catalogue with the main title (MARC 245$a) “Psychiatrie, Psychosomatik, Psychotherapie :”.
    The second book, from 1980, is on the list with the main title “Psychiatrie :”.
    Evidently it is not possible to determine beyond doubt what is to be catalogued as main title and subtitle just by looking at the cover and/or title page.
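
    In MARC terms the difference comes down to where the cataloguer drew the line between 245 $a (title proper) and $b (remainder of title). Roughly, with indicators omitted and the subfield contents abridged, the two records look like this:

    ```
    245 $a Psychiatrie, Psychosomatik, Psychotherapie : $b Einführung in die Praktika ...
    245 $a Psychiatrie : $b Psychosomatik, Psychotherapie ; Einf. in d. Praktika ...
    ```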

    I have seen a lot of these cases in my batch of 1000 books in which it is questionable what constitutes the main and subtitle. Sometimes the main title consists of only the initial part, sometimes it consists of what looks like main and subtitle taken together. At first I put all parts of a serial on my cart because in my view the printed titles were incorrect. They only contained the title of the specific part of the serial, whereas in my non-librarian view the title should consist of the serial title + part title. On the other hand I also found serials for which only the serial title was entered as main title (5 items “Gesammelte Werke”, which means “Collected Works” in German). No consistency at all.
    What became clear to me is that in a lot of cases it is impossible to identify a book by the catalogued main title alone.

    Another example of problematic interpretation I came across: a Spanish book, with main title “Teatro de Jacinto Benavente” on my list, and on the cover the author name “Jacinto Benavente” and title “Teatro: Rosas de Otoño – Al Natural – Los Intereses Creados”. On the title page: “Teatro de Jacinto Benavente”.


    In the catalogue there are two other books with plays by the same author, just titled “Teatro”. All three have “Jacinto Benavente” as author. All three are books containing a number of theatre plays by the author Jacinto Benavente. There were a lot of similar books, in a number of languages, with ‘Theatre’ recorded as the main title.

    A lot of older books on my shelves (pre 20th century mainly, but also more recent ones) have different titles and subtitles on their spine, front and title page. Different variations depending on the available print space I guess. It’s hard to determine what the actual title and subtitles are. The title page is obviously the main source, but even then it looks difficult to me. Now I understand cataloguers a little better.

    Works on the shelves
    So much for the metadata. What about the actual works? There were all kinds of different types mixed together, mostly in batches apparently from the same collection. In my 1000 items there were theatre related books, both theoretical works and texts of plays, Russian and Bulgarian books of a communist/marxist/leninist nature, Arabic language books whose titles I could not check, some Swedish books, a large number of 19th century German language tourist guides for Italian regions and cities, medical, psychological and physics textbooks, old art history works, and a whole bunch of social science textbooks from the eighties, of which we have at least half at home (my wife and I both studied at the University of Amsterdam during that period). I can honestly say that most of the textbooks in my section of the stacks are out of date and will never be used for teaching again. The rest was at least 100 years old. Most of these old books should be considered cultural heritage and part of the Special Collections. I am not entirely sure that a university library should keep most of these works in the stacks.

    Apart from this neutral economic perspective, there were also a number of very interesting discoveries from a book lover’s viewpoint, of which I will describe a few.

    A small book about Japanese colour prints containing one very nice Hokusai print of Mount Fuji.

     

    A handwritten, and therefore unique, item with the (Dutch) title “Bibliography of works about Michel Angelo, compiled by Mr. Jelle Hingst”, containing handwritten catalogue-type cards with one entry each.

     

    A case with one shelfmark containing two items: a printed description and what looks like a facsimile of an old illustrated manuscript.

     

    An Italian book with illustrations of ornaments from the cathedral of Siena, tied together with two cords.

     

    And my greatest discovery: an English catalogue of an exhibition at the Royal Academy from 1908: “Exhibition of works by the old masters and by deceased masters of the British school including a collection of water colours” (yes, this is one big main title).

    But the book itself is not the discovery. It’s what is hidden inside the book. A handwritten folded sheet of paper, with letterhead “Hewell Grange. Bromsgrove.” (which is a 19th century country house, seat of the Earls of Plymouth, now a prison), dated Nov. 23. 192. Yes, there seems to be a digit missing there. Or is it “/92”? Which would not be logical in an exhibition catalogue from 1908. It definitely looks like a fountain pen was used. It also has some kind of diagonal stamp in the upper left corner “TELEGRAPH OFFICE FINSTALL”. Finstall is a village 3 km from Hewell Grange.

    The paper also has a pencil sketch of a group of people, probably a copy of a painting. At first I thought it was a letter, but looking more closely it seems to be a personal impression and description of a painting. There are similar handwritings on the pages of the book itself.
    I left the handwritten note where I found it. It’s still there. You can request the book for consultation and see for yourself.

     

    Conclusions
    End users, patrons, customers or whatever you want to call them, can’t find books that the library owns if they are not catalogued. They can find bibliographic descriptions of the books elsewhere, but not the information needed to get a copy at their own institution. This confirms the assertion that holdings information is very important, especially in a library linked open data environment.

    The majority of books in an academic library are never requested, consulted or borrowed. Most outdated textbooks can be removed without any problem.

    There are a lot of cultural heritage treasures hidden in the stacks that should be made accessible to the general public and humanities researchers in a more convenient way.

    In the absence of open stacks and full text search for printed books and journals, it is crucial that the content of books, and articles too, is described in a concise yet complete way. Not only should formal cataloguing rules and classification schemes be used, but definitely also expert summaries and end-user-generated tags.

    Even with cataloguing rules it can be very hard for cataloguers to decide what the actual titles, subtitles and authors of a book are. The best source for correct title metadata is obviously the authors, editors and publishers themselves.

    Book storage staff can’t find requested books with incorrect shelfmarks on the spine.

    Storing, locating, fetching and transporting books does not require librarian skills.

    All in all, a very informative experience. 


  • Meeting people vs. meeting deadlines

    Posted on June 23rd, 2013 Lukas Koster 1 comment

    Lessons from Cycling for Libraries


    As I am writing this, more than 100 people working in and for libraries from all over the world are cycling from Amsterdam to Brussels in the Cycling for Libraries 2013 event, defying heat, cold, wind and rain. And other cyclists ;-). Cycling for Libraries is an independent unconference on wheels that aims to promote and defend the role of libraries, mainly public libraries, in and for society. This year marks the third time the trip has been organised. In 2011 (Copenhagen-Berlin) I was only able to attend the last two days in Berlin. In 2012 (Vilnius-Tallinn) I could not attend at all. This year I was honoured and pleased to be able to contribute to the organisation of, and to actively participate in, the first two days of the tour, in the area where I live (Haarlem) and work (Amsterdam).
    I really like the Cycling for Libraries concept and the people involved, and I will tell you why, because it is not so obvious in my case. You may know that I am rather critical of libraries and their slowness in adapting to rapidly changing circumstances. And also that I am more involved with academic and research libraries than with public libraries. Moreover I have become a bit “conference tired” lately. There are so many library and related conferences where there is a lot of talk which doesn’t lead to any practical consequences.

    The things I like about Cycling for Libraries are: the cycling, the passion, the open-mindedness, the camaraderie, the networking, the un-organisedness, the flexibility, the determination, and the art of achieving its goals and more.

    I like cycling trips very much. I have done a few long and far away ones in my time, and I know that this is the best way to visit places you would never see otherwise. While cycling around you have unexpected meetings and experiences, you get a clear mind, and in the end there is the overwhelming satisfaction of having overcome all obstacles and having reached your goal. It is fun, although sometimes you wonder why on earth you thought you were up to this.

    The organisers and participants of Cycling for Libraries are all passionate about and proud of their library and information profession, without being defensive and introverted. As you can see from their “homework assignments” they’re all working on innovative ways, on different levels and with varying scopes, to make sure the information profession stays relevant in the ever changing society where traditional libraries are more and more undervalued and threatened. So, the participants really try to make a difference and they’re willing to cycle around Europe in order to attract attention and spread the message.

    Open minds are a necessity if you embark on an adventure that involves hundreds of people from different backgrounds. For collaborating both in decentralised organisation teams and in the core group of 120 people on the move, you need to appreciate and respect everybody’s ideas and contributions. Working towards one big overall goal requires giving and taking. Especially in the group of 120 people on wheels doing the hard work, this leads to an intense form of camaraderie. These comrades on wheels depend on each other to get where they’re going. A refreshing alternative to the everyday practice of competition and struggle between vendors, publishers and libraries.

    [Image: Zandvoort Public Library Bar]

    The event offers an unparalleled opportunity for networking. At ordinary conferences the official programme offers a lot of interesting information and sometimes discussion, but let’s be honest, the most useful parts are the informal meetings during lunches, coffee breaks and, most importantly, in the pubs at night. It is then that relevant information is exchanged, new insights are born and valuable connections are made. Cycling for Libraries turns this model completely upside down and inside out. It is one long networking event with some official sessions in between.

    Cycling for Libraries is an un-conference. As I have learned, this specific type on wheels depends on un-organisation and flexibility. It is impossible to organise an event like this following a strict and centralised coordination model where everybody has to agree on everything. For us Dutch people this can be uncomfortable. Historically we have depended on talking and agreeing on details in order to win the struggle against the water. The cyclists have ridden along the visible physical results of this struggle between Delft and Brugge: the dykes, dams and bridges of the Delta Works. The need to agree led to what became known as the Polder Model. On the other hand we also have a history of decentralised administration. “The Netherlands” is not a plural name for nothing; the country was formed out of a loose coalition of autonomous counties, cities and social groups, united against a common enemy.

    [Image: OBA start event]

    Anyway, the organisation of Cycling for Libraries 2013 started in February with a meeting in The Hague with a number of representatives and volunteers from The Netherlands and Belgium (or Flanders I should say), when Jukka Pennanen, Mace Ojala and Tuomas Lipponen visited the area. After that the preparations were carried out by local volunteers and institutions without any official central coordination whatsoever. This worked quite well, with some challenges of course. I myself happened to end up coordinating events in the Amsterdam-Haarlem-Zandvoort area. I soon learned to take things as they came, delegate as much as possible and rely on time and chance. In the end almost everything worked out fine. In Amsterdam we planned the start event at OBA Central Public Library and the kick-off party at the University of Amsterdam Special Collections building. I only learned about the afternoon visit to KIT Royal Tropical Institute a couple of weeks before. I had nothing to do with that, but it turned out to be a successful part of day one. Actually, the day after KIT staff told the Cycling for Libraries participants that the museum and library faced closure, a solution was reached and both museum and library were saved. Coincidence?

    [Image: Haarlem Station Library]

    The next day, the first actual cycling day between Amsterdam and The Hague, I cycled along for part of the route, from my home town Haarlem to halfway to The Hague. The visit to the Haarlem Station Library started an hour later than planned, and during lunch in Zandvoort on the coast we received word that the local public library was waiting for us to visit them. This was a surprise for me and for Gert-Jan van Velzen (who had helped plot the route from Amsterdam to The Hague). But we decided to go there anyway, and we were welcomed with free drinks and presents by the friendly librarians in their brand new building. At 4 o’clock we were expected to arrive at Noordwijk Public Library, but we were still in Zandvoort. No problem for Jeanine Deckers (airport librarian and regional librarian), who was waiting in Noordwijk with stroopwafels. Unfortunately I didn’t make it to Noordwijk, because I had to go back home and work the next day. But it was great to experience one day of actual cycling for libraries.

    This loose, distributed and flexible organisation might be seen as an example of resilience, the concept that was introduced by Beate Rusch in her talk about the future of German regional library service centres at the recent ELAG 2013 conference in Ghent. Resilience means something like “the ability of someone or something to return to its original state after being subjected to a severe disturbance”, or simply put “something doesn’t break, but adapts under unexpected serious outside influences”. I completely agree that it would be better if organisations and infrastructures in the library and information profession were more loosely organised and connected. By the way, Beate was also involved in organising the Berlin part of the first Cycling for Libraries.

    One final thing I want to say is that I admire the way in which Cycling for Libraries manages to reach its goals and more by means of this loose, distributed and flexible organisation. Depending on local coordination teams, they succeeded in meeting Dutch members of parliament in The Hague and the European Parliament in Brussels to promote their cause. Which is a remarkable result for a small group of crazy Finns.


  • (Discover AND deliver) OR else

    Posted on January 7th, 2013 Lukas Koster 98 comments

    The future of the academic library as a data services hub

    [Image: © KaCey97007]

    Is there a future for libraries, or more specifically: is there a future for academic libraries? This has been the topic of lots of articles, blog posts, books and conferences. See for instance Aaron Tay’s recent post about his favourite “future of libraries” articles. But the question needs to be addressed over and over again, because libraries, and particularly academic libraries, continue to persevere in their belief that they will stay relevant in the future. I’m not so sure.

    I will focus here on academic libraries. I work for one, the Library of the University of Amsterdam. Academic libraries in my view are completely different from public libraries in audience, content, funding and mission. As far as I’m concerned, they only have the name in common. For a vision of the future of public libraries, see Ed Summers’ excellent post “The inside out library”. As for research and special libraries, some of what I am about to say will apply to these libraries as well.

    So, is there a future for academic libraries? Personally I think in the near future we will see the end of the academic library as we know it. Let’s start by looking at what are perceived to be the core functions of libraries: discovery and delivery, of books and articles.
    For a complete overview of the current library ecosystem you should read Lorcan Dempsey’s excellent article “Thirteen Ways of Looking at Libraries, Discovery, and the Catalog: Scale, Workflow, Attention”.

     

    Discovery

    “Discovery happens elsewhere”. Lorcan Dempsey said this as early as 2007. What this means is that the audience the library aims at primarily searches for and finds information via platforms other than the library’s website and search interfaces. Several studies (for instance OCLC’s “Perceptions of libraries, 2010”) show that the most popular platforms are general search engines like Google and Wikipedia, but also specific databases. And of course, if you’re looking for instant information, you don’t go to the library catalogue, because it only points you to items that you have to read in order to ascertain that they may or may not contain the information you need.

    [Image: © bibliovox]

    And if you are indeed looking for publications (books, articles, etc.) you could of course search your library’s catalogue and discovery interface. But you can find exactly the same and probably even more results elsewhere: in other libraries’ search interfaces, or aggregators that collect bibliographic metadata from all over the world. Moreover, academic libraries are doing their best to get their local holdings metadata in WorldCat and their journal holdings in Google Scholar. As I said in my EMTACL12 talk: you can’t find all you need with one local discovery tool.
    Also, the traditional way of discovery through browsing the shelves is disappearing rapidly. The physical copies at the University of Amsterdam Library for instance are all stored in a storage facility in a suburb. Apart from some reference works and latest journal issues there is nothing to find in the library buildings. There is no need for a university library building for discovery purposes anymore.

    Utrecht University Library has taken the logical next step: they decided not to acquire a new discovery tool, to discontinue their local homegrown article search index, and to focus on delivery. See the article “Thinking the unthinkable: a library without a catalogue”.

     

    Delivery

    So, if discovery is something that academic libraries should not invest in anymore, is delivery really the only core responsibility left? Let’s have a closer look.
    Delivery in the traditional academic library sense means: giving the customer access to the publications he or she selected, both in print and digital form. In the case of subscription based e-journal articles, delivery consists of taking a subscription and leading the customer to the appropriate provider website to obtain the online article. Taking subscriptions is an administrative and financial activity. For historical reasons the university library has been taking care of this task. Because they handled the print subscriptions, they also started taking care of the digital versions. But actually it’s not the library that holds the subscription, it’s the university. And it really does not require librarian skills to handle subscriptions. This could very well be taken care of by the central university administration. For free and open access journals you don’t even need that.
    The selection and procurement of journal packages from a large number of publishers and content providers is a different issue. Specific expertise is required for this. I will come to that later.
    The task of leading the customer to the appropriate online copy is only a technical procedure, involving setting up link resolvers. Again, no librarian skills needed. This task could be done by some central university agency, maybe even using an external global linking registry.
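
    For the record, such a link resolver request is just a URL. A minimal OpenURL 1.0 example in KEV format (as defined by Z39.88-2004), with a hypothetical resolver address and wrapped here for readability:

    ```
    http://resolver.example.edu/openurl?ctx_ver=Z39.88-2004
        &rft_val_fmt=info:ofi/fmt:kev:mtx:journal
        &rft.jtitle=Journal+of+Examples&rft.issn=1234-5678
        &rft.volume=12&rft.spage=34
    ```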


    As for the delivery of physical print copies, this is obviously nothing more than a logistics workflow, no different from delivery of furniture, tools, food, or any other physical business. The item is ordered, it is fetched from the shelf, sometimes by huge industrial robot installations, put in a van or cart, transported to the desired location and put in the customer’s locker or something similar. Again: no librarian skills whatsoever. Physical delivery only needs a separate internal or external logistics unit.

     

    What else?

    So, if discovery and delivery will cease to be core activities of the central university library organisation, what else is there?

    Selection
    Selection of print and digital material was already mentioned. It is evident that the selection of printed and digital books and journal subscriptions needs to be governed by expert knowledge and decisions in order to provide staff and students with the best possible material, because there is a lot of money involved. Typically this task is carried out by subject specialists (also called subject librarians), not by generalists. These ‘faculty liaisons’ usually have had an education in the disciplines they are responsible for, and they work closely together with their customers (academic staff and students). Many universities have semiautonomous discipline oriented sublibraries. The recent development of Patron Driven Acquisition (PDA) also fits into this model.
    The actual comparison, selection and procurement of journal packages from a large number of publishers and content providers requires specific expertise of a generic, discipline-independent nature. This is a task that could well continue to be the responsibility of some central organisational unit, which may or may not be called the university library.

    Cataloguing
    And what about cataloguing, a definite librarian skill? If discovery happens elsewhere, and libraries don’t need to maintain their own local catalogues, then it seems obvious that libraries don’t need to catalogue anything anymore. In fact, in the current situation most libraries don’t do that much cataloguing as it is. All the main bibliographical metadata for books (title, author, date, etc.) are already provided by publishers, by external central library service centres, or by other libraries in a shared cataloguing environment. And libraries have never catalogued journal articles anyway, only journals and issues. Article metadata are provided by the publishers or aggregators. Libraries pay for these services.
    It is usual for libraries to add their own subject headings and classification terms to the already existing ones. But as Karen Coyle said at EMTACL12: “Library classification is a knowledge prevention system“, because it offers only one specific object oriented view on the information world. So maybe libraries should stop doing this, which would be in line with the “discovery happens elsewhere” argument anyway.
    What remains of cataloguing is adding local holdings, items and subscription information. This is very useful information for library customers, but again this doesn’t seem to require very detailed librarian skills. As a matter of fact most of these metadata are already provided in the selection and acquisition process by acquisition staff and vendors.
    Recent developments in the Library of Congress BIBFRAME initiative in theory make it possible to replace all local cataloguing efforts by linking local holdings information to global metadata.
    There is still one area that may require the full local cataloguing range: the university’s own scientific output, as long as it is not published in journals or as books. The fulltext material is made available through institutional repositories, which obviously requires metadata to make the publications findable. However, the majority of the institutional publications are made available through other channels as well, as mentioned, so the need for local cataloguing in these cases is absent.

    Reading rooms
    More and more students are coming to the library buildings every day; that’s what you hear all the time. Large amounts of money are spent on creating new study centres and meeting places in existing library buildings, even on new buildings. But that’s exactly the point: students don’t come to the library for discovery anymore, because the building no longer provides that. They come for places to study, to use networked PCs or the university wifi, to meet fellow students, to pick up their print items on loan, or to view not-for-loan material. The physical locations are nothing more or less than study centres. There’s absolutely nothing wrong with that; they are very important, but they do not have to be associated with the university library and can be provided by the university, at any location.

    Reference desk

    [Image: © Ohio University Libraries]

    The reference desk, or its online counterpart, is a weird phenomenon. It seems to emphasise the fact that if you want instant information, books are of no use. On the other hand, it suggests that you should come to the library if you need specific information right now. In my view, although the reference desk partly embodies the actual original objective of a library, namely giving access to information, this could function very well outside the library context.
    The reference desk service is also somewhat ambiguous. In some cases subject specialist expertise is needed, other cases require a more general knowledge of how to search and find information.

    Usage statistics
    Statistics of the use of library holdings, both print and electronic, are an important source of information for making decisions on acquisitions and subscriptions. These statistics are provided by local and remote delivery systems and vendors. Usage statistics can also be used for other purposes, like identifying certain trends in scholarly processes, mapping of information sources to specific user groups, etc. Administering and providing statistics once again is not a librarian task, but can be done by internal or external service providers.

    Special collections
    Special Collections are a Special Case. Most university libraries have a Special Collections division, for historical reasons. But of course a Special Collections division is essentially a museum and archive division, with its own specific skills, expertise and procedures. Most of the time they are autonomous units within the university anyway.

     

    New services?

    Now, if the traditional library tasks of selection, cataloguing, discovery and delivery will increasingly be carried out by non-librarian staff and units inside and outside the university, is there still a valid reason for maintaining an autonomous central university library organisation? Should academic libraries shift focus? There are a number of possible new services and responsibilities for the library that are being discussed or already being implemented.

    [Image: © Joshua Kaufman]

    Content curation
    Content curation can be seen as the task of bringing together all kinds of information on a specific subject from different sources on the web, to be consumed by people in an easy way. This is something that can be done, and is already done, by all kinds of organisations and people. Libraries, academic, public and other types alike, can and should play a bigger role in this area. This involves looking at other units and sources of information than just the traditional library ones: books and journals. This new service type is evidently closely related to the traditional reference desk service.
    Obviously this can best be taken care of by subject specialists. To do this, they need tools and infrastructure. These tools and infrastructure are of a generic nature and can be provided by technical specialists inside or outside the libraries or universities.
    The techniques involved are often referred to as “mashups” or “linked data”, depending on the background of the people involved.

    Linked data
    Linked data deserves its own section here, because it has been an ever widening movement for a number of years now. In the last couple of years it finally reached the library world, with developments like the W3C Library Linked Data Incubator Group, the Library of Congress BIBFRAME initiative and the IFLA Semantic Web Special Interest Group. Linked data is a special type of data source mashup infrastructure. It requires the use of URIs for all separately usable data entities, and triples (subject-predicate-object) as the format for the actual linking, mostly using the RDF structure.
    There are two sides to linked data: the publishing of data in RDF and consequently the consumption of data elsewhere. A special case is the linked data based infrastructure, combining both publication and consumption in a specific way, as is the objective of the above mentioned BIBFRAME project.
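
    As a small illustration of the consumption side, here is a sketch of a SPARQL query against DBpedia’s public endpoint (http://dbpedia.org/sparql). The properties come from the DBpedia ontology; the subject is just an example:

    ```sparql
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    # Fetch the English abstract and the birth place of an example person
    SELECT ?abstract ?birthPlace WHERE {
      dbr:Niklas_Luhmann dbo:abstract ?abstract ;
                         dbo:birthPlace ?birthPlace .
      FILTER (lang(?abstract) = "en")
    }
    ```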
    Again, we need both subject specialists and generic technology experts to make this work in libraries, both academic and public ones.

    Research support
    University libraries are more and more expected to increase the level of support for researchers. It’s not only about providing access to scholarly publications anymore, but also about maintaining research information systems, virtual research environments, and long term preservation, availability and reusability of research data sets.
    Again, here we see the need for discipline specific support, because the needs of researchers for communication, collaboration and data vary greatly per discipline. And again, for the technical and organisational infrastructure we need internal or external generic technology experts and services. Apart from metadata expertise there are no traditional librarian skills required.

    Publishing
    The Final Frontier: the library turning 180 degrees and switching from consumption to production of publications. According to some people university libraries are very suitable and qualified to become scholarly publishers (see for instance Björn Brembs’ “Libraries Are Better Than Corporate Publishers Because…”). I am not sure that this is actually the case. Publishing as it currently exists requires a number of specific skills that have nothing to do with librarian expertise. A number of universities already have dedicated university press publishing agencies. But of course the publishing process can and probably will change. There is the open access movement, there is the rebellion against large scientific publishers, and last but not least, there is the slow rise of nanopublications, which could revolutionise the form that scholarly publishing will take. In the future publishing can originate at the source, making use of all kinds of new technologies for linking different types of data into new forms of non-static publications. Universities or university libraries could play a role here. Again we see here the need for both subject specialists and generic technology.


    Special and general

    So what is the overall picture? Of the current academic library tasks, only a few may still be around in the university of the future: selection, acquisition, cataloguing (if any), reference desk and usage statistics; and only a small part of these actually requires traditional librarian skills. Together with the new service areas of content curation, linked data, research support and publishing, this is a rather odd collection of very different fields of expertise. There does not seem to be a nicely matching set of tasks for one central university division, let alone a library.

    But what all these areas have in common is that they depend on linking and coordination of data from different sources.

    And another interesting conclusion is that virtually all of these areas have two distinct components:

    • Discipline or subject specific expertise
    • Generic technical and organisational data infrastructure

    I see a new duality in the realm of information management in universities. Selection, content curation, reference desk, linking data, cataloguing and research support will all be the domain of subject specialists directly connected to departments responsible for teaching and research in specific disciplines. These discipline related services will depend on generic technological and organisational infrastructures, available inside and outside the university, maintained by generic technical specialists.
    These generic infrastructures could function completely separately, or they could somehow be interlinked and coordinated by some central university organisational unit. This would make sense, because there is a lot of overlap in information between these areas. Some kind of central data coordination unit would make it possible to provide a lot more useful data services than can be imagined now. Also, usage statistics, acquisition and the potential new publishing framework, yes even the special collections, could benefit from a central data services unit.

    © HawkinsThiel


    Such a unit would be different from the existing university ICT department. The latter mainly provides generic hardware, network, storage and security, and is focused on the internal infrastructure, trying to keep out as much external traffic as possible.
    The new unit would be targeted at providing data services, possibly built on top of the internal technical infrastructure, but mainly using existing external ones. And it is obvious that there is added value in cooperation with similar bodies outside the university.

    “Data services” then stands for providing storage, use, reuse, creation and linking of internal and external metadata and datasets, by means of system administration, tool selection and implementation, and explicitly also programming when needed.
    Such a unit would, up to a point, resemble current library service providers like the German regional library consortia and service centres such as hbz, KOBV or GBV, or high level organisations like the Dutch National Library Catalogue project.

    Paraphrasing the conclusion of my own SWIB12 talk: it is time to stop thinking publications and start thinking data. This way the academic library could transform itself into a new central data services hub.

    (Subject expertise AND data infrastructure) OR else!


  • Mainframe to mobile

    Posted on February 16th, 2010 Lukas Koster 11 comments

    The connection between information technology and library information systems

    This is the first post in a series of three

    [1. Mainframe to mobile - 2. Mobile app or mobile web? - 3. Mobile library services]

    The functions, services and audience of library information systems, as is the case with all information systems, have always been dependent on and determined by the existing level of information technology. Mobile devices are the latest step in this development.

    © sainz

    In the beginning there was a computer, a mainframe. The only way to communicate with it was to feed it punchcards with holes that represented characters.

    © Mirandala

    If you made a typo (puncho?), you were not informed until a day later when you collected the printout, and you could start again. System and data files could be stored externally on large tape reels or small tape cassettes, identical to music tapes. Tapes were also used for sharing and copying data between systems by means of physical transportation.

    © ajmexico

    Suddenly there was a human operable terminal, consisting of a monitor and a keyboard, connected to the central computer. Now you could type in your code and save it as a file on the remote server (no local processing or storage at all). If you were lucky you had a full screen editor; if not, there was the line editor. No graphics. Output and errors were shown on screen almost immediately, depending on the capacity of the CPU (central processing unit) and the number of other batch jobs in the queue. The computer was a multi-user time sharing device, a bit like the “cloud”, but every computer was a little cloud of its own.
    There was no email. There were no end users other than systems administrators, programmers and some staff. Communication with customers was carried out by sending them printouts on paper by snail mail.

    I guess this was the first time that some libraries, probably mainly in academic and scientific institutions, started creating digital catalogs, for staff use only of course.

    © n.kahlua72

    © RaeA

    Then came the PC (Personal Computer). Monitor and keyboard were now connected to the computer (or system unit) on your desk. You had the thing entirely to yourself! Input and output consisted of lines of text only, in one colour (green or white on black), and still no graphics. Files could be stored on floppy disks, 5¼-inch magnetic things that you could twist and bend, but if you did that you lost your data. There was no internal storage. File sharing was accomplished by moving the floppy from one PC to another and/or copying files from one floppy to another (on the same floppy drive).

    © suburbanslice

    Later we got smaller disks, 3½-inch, in protective cases. The PC was mainly used for early word processing (WordStar, WordPerfect) and games. Finally there was a hard disk (as opposed to “floppy” disk) inside the PC system unit, which held the operating system (mainly MS-DOS), and on which you could store your files, which became larger. Time for stand-alone database applications (dBase).

    Client server GUI

    Then there was Windows, a mouse, and graphics. And of course the Internet! You could connect your PC to the Internet with a modem that occupied your telephone line and made phone calls impossible during your online session. At first there was Gopher, a kind of text based web.
    Then came the World Wide Web (web 0.0), consisting of static web pages with links to other static web pages that you could read on your PC. Not suitable for interactive systems. Libraries could publish addresses and opening hours.
    But fortunately we got client-server architecture, combining the best of both worlds. Powerful servers were good at processing, storing and sharing data; PCs were good at presenting and collecting data in a “user friendly” graphical user interface (GUI), making use of local programming and scripting languages. So you had to install an application on the local PC, which then connected to the remote server database engine. The only bad thing was that the application was tied to that specific PC, with its local Windows configuration settings. And it was not possible to move the thing around.

    Now we had multi-user digital catalogs with a shared central database and remote access points with the client application installed, available to staff and customers.

    Luckily dynamic creation of HTML pages came along, so we were able to move the client part of client-server applications to the web as well. With web applications we were able to use the same applications anywhere on any computer linked to the world wide web. You only needed a browser to display the server side pages on the local PC.

    Now everybody could browse through the library catalog any time, anywhere (where there was a computer with an internet connection and a web browser). The library OPAC (Online Public Access Catalog) was born.

    Web OPAC

    The only disadvantage was that every page change had to be generated by the server again, so performance was not optimal.
    But that changed with browser based scripting technologies like JavaScript, AJAX, Flash, etc. Bits of the application are sent to the local browser on the PC at runtime, to be executed there. So actually this is client-server “on the fly”, without the need to install a specific application locally.

    © nxtiak

    In the meantime the portable PC appeared: system unit, monitor and keyboard all in one. At first you needed some physical power to move the thing around, but later we got laptops, notebooks and netbooks, getting smaller, lighter and more powerful all the time. And wifi of course: no need to plug the device into the physical network anymore. And USB sticks.

    Access to OPAC and online databases became available anytime, anywhere (where you carried your computer).

    The latest development of course is the rise of mobile phones with wireless web access, or rather mobile web devices that can also be used for making phone calls. A mobile device is small and light enough to carry with you in your pocket all the time. It is a tiny PC.

    Finally you can access library related information literally any time, anywhere, even in your bedroom and bathroom.

    Mobile library app

    It’s getting boring, but yes, there is a drawback. Web applications are not really suited to mobile browsers: pages are too large, browser technology is not fully compatible, connections are too slow.

    Available options are:

    • creating a special “dumbed down” version of a website for use on mobile devices only: smaller text based pages with links
    • creating a new HTML5/CSS3 website, targeted at mobile devices and “traditional” PC’s alike
    • creating “apps”, to be installed on mobile devices, which connect to a database system in the cloud; basically this is the old client-server model all over again.

    A comparison of mobile apps and mobile web architecture is the topic of another post.


  • Who needs MARC?

    Posted on May 15th, 2009 Lukas Koster 22 comments

    Why use a non-normalised metadata exchange format for suboptimal data storage?

    Catalog card

    © leah the librarian

    This week I had a nice chat with André Keyzer of Groningen University library and Peter van Boheemen of Wageningen University Library who attended OCLC’s Amsterdam Mashathon 2009. As can be expected from library technology geeks, we got talking about bibliographic metadata formats, very exciting of course. The question came up: what on earth could be the reason for storing bibliographic metadata in exchange formats like MARC?

    Being asked once at an ELAG conference about the bibliographic format Wageningen University was using in their home grown catalog system, Peter answered: “WDC” ….”we don’t care“.

    Exactly my idea! As a matter of fact I think I may have used the same words a couple of times in recent years, probably even at ELAG2008. The thing is: it really does not matter how you store bibliographic metadata in your database, as long as you can present and exchange the data in any format requested, be it MARC or Dublin Core or anything else.

    Of course the importance of using internationally accepted standards is beyond doubt, but there clearly exists widespread misunderstanding of the functions of certain standards, like for instance MARC. MARC is NOT a data storage format. In my opinion MARC is not even an exchange format, but merely a presentation format.

    St. Marc Express


    With a background and experience in data modeling, database and systems design (among other things), I was quite amazed by bibliographic metadata formats when I started working with library systems in libraries, having no librarian training at all. Of course, MARC (“MAchine Readable Cataloging record“) was invented as a standard in order to facilitate the exchange of library catalog records in a digital era.
    But I think MARC was invented by old school cataloguers who did not have a clue about data normalisation. A MARC record, especially if it corresponds to an official set of cataloging rules like AACR2, is nothing more than a digitised printed catalog card.

    In pre-computer times it made perfect sense to have a standardised uniform way of registering bibliographic metadata on a printed card in this way. The catalog card was simultaneously used as a medium for presenting AND storing metadata. This is where the confusion originates from!

    MARC record


    But when the Library of Congress says “If a library were to develop a ‘home-grown’ system that did not use MARC records, it would not be taking advantage of an industry-wide standard whose primary purpose is to foster communication of information”, it is saying just plain nonsense.
    Actually it is better NOT to use something like MARC for purposes other than exchanging, or rather presenting, data. To illustrate this I will give two examples of MARC tags that have been annoying me since my first day as a library employee:

    100 – Main Entry-Personal Name
    Besides the mistake of storing an author’s name as a string in each individual bibliographic record, instead of using a code linking to a central authority table (a “foreign key” in relational database terms), it is also a mistake to store a person’s name as one complete string in one field. Examples on the Library of Congress MARC website use forms like “Adams, Henry”, “Fowler, T. M.” and “Blackbeard, Author of”. To take only the simple first example, this author could also be registered as “Henry Adams”, “Adams, H.” or “H. Adams”. And don’t say that these forms are not according to the rules: they are out there! There is no way to match these variations as being actually one and the same.
    In a normalised relational database, this subfield $a would be stored something like this (simplified!):

    • Person
      • Surname=Adams
      • First name=Henry
      • Prefix=

    773 – Host Item Entry
    Subfield $g of this MARC tag is used for storing the citation information for a journal article (volume, issue, year, start page, end page) all in one string, like: “Vol. 2, no. 2 (Feb. 1976), p. 195-230“. Again, I have seen this used in many different ways. In a normalised format this would look something like this, using only the actual values:

    • Journal
      • Volume=2
      • Issue=2
      • Year=1976
      • Month=2
      • Day=
      • Start page=195
      • End page=230

    In a presentation of this normalised data record, extra text can be added, like “Vol.” or “Volume“, “Issue” or “No.“ and brackets, and codes can be replaced by descriptions (month 2 = Feb.), according to the format required. So the stored values could be used to generate the text “Vol. 2, no. 2 (Feb. 1976), p. 195-230” on the fly, but also, for instance, “Volume 2, Issue 2, dated February 1976, pages 195-230“.
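    As an illustration, here is a small Python sketch (the field and function names are invented, and it is simplified just like the record above) that generates both presentation strings from the same stored values:

        import calendar
        from dataclasses import dataclass

        @dataclass
        class Citation:
            volume: int
            issue: int
            year: int
            month: int
            start_page: int
            end_page: int

        def render_short(c: Citation) -> str:
            # "Vol. 2, no. 2 (Feb. 1976), p. 195-230"
            m = calendar.month_abbr[c.month]  # 2 -> "Feb"
            return f"Vol. {c.volume}, no. {c.issue} ({m}. {c.year}), p. {c.start_page}-{c.end_page}"

        def render_long(c: Citation) -> str:
            # "Volume 2, Issue 2, dated February 1976, pages 195-230"
            m = calendar.month_name[c.month]  # 2 -> "February"
            return f"Volume {c.volume}, Issue {c.issue}, dated {m} {c.year}, pages {c.start_page}-{c.end_page}"

        c = Citation(volume=2, issue=2, year=1976, month=2, start_page=195, end_page=230)
        print(render_short(c))
        print(render_long(c))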

    The strange thing about this bibliographic format aimed at exchanging metadata is that it actually makes metadata exchange terribly complicated, especially with these two tags, Author and Host Item. I can illustrate this by describing the way this exchange is handled between two digital library tools we use at the Library of the University of Amsterdam, MetaLib and SFX, both from the same vendor, Ex Libris.

    The metasearch tool MetaLib uses the described and preferred mechanism of on the fly conversion of received external metadata from any format to MARC, for the purpose of presentation.
    But if we want to use the retrieved record to link to, for instance, a full text article using the SFX link resolver, the generated MARC data is used as a source, and the non-normalised data in the 100 and 773 MARC tags has to be converted to the OpenURL format, which actually is normalised (example in simple OpenURL 0.1):

    isbn=;issn=0927-3255;date=1976;
    volume=2;issue=2;spage=195;epage=230;
    aulast=Adams;aufirst=Henry;auinit=;

    In order to do this, all kinds of regular expressions and scripting functions are needed to extract the correct values from the MARC author and citation strings. Wouldn’t it be convenient if the record in MetaLib had already been in OpenURL or any other normalised format?
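    To give an impression of what that scraping looks like, here is a sketch in Python of the kind of regular expression needed to pull the values back out of one particular 773 $g variant; this is my own illustration, not the actual MetaLib/SFX code:

        import re

        citation = "Vol. 2, no. 2 (Feb. 1976), p. 195-230"
        pattern = re.compile(
            r"Vol\.\s*(?P<volume>\d+),\s*no\.\s*(?P<issue>\d+)"
            r"\s*\(\w+\.?\s*(?P<date>\d{4})\),\s*p\.\s*(?P<spage>\d+)-(?P<epage>\d+)"
        )
        m = pattern.search(citation)
        if m:
            print(";".join(f"{k}={v}" for k, v in m.groupdict().items()))
            # volume=2;issue=2;date=1976;spage=195;epage=230

    And this pattern covers only one of the many variant ways the string turns up in practice; each variant needs its own expression.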

    The point I am trying to make is of course that it does not matter how metadata is stored, as long as it is possible to get the data out of the database in any format appropriate for the occasion. The SRU/SRW protocol is aimed at precisely this: getting data out of a database in the required format, like MARC, Dublin Core or anything else. An SRU server is a piece of middleware that receives requests, gets the requested data, converts it, and returns it in the requested format.
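    In spirit, such a converting middleware layer looks like the following Python sketch. It is heavily simplified: the record structure and converter functions are invented for the illustration, and the actual SRU query handling is left out; only the recordSchema request parameter, on which a real SRU server selects the output format, is kept.

        # Internal, normalised storage; the format is our own choice ("WDC").
        record = {
            "surname": "Adams",
            "first_name": "Henry",
            "title": "The Education of Henry Adams",
        }

        def to_dublin_core(rec: dict) -> str:
            return (f"<dc:creator>{rec['surname']}, {rec['first_name']}</dc:creator>"
                    f"<dc:title>{rec['title']}</dc:title>")

        def to_marc(rec: dict) -> str:
            return (f"100 1# $a {rec['surname']}, {rec['first_name']}\n"
                    f"245 10 $a {rec['title']}")

        CONVERTERS = {"dc": to_dublin_core, "marc": to_marc}

        def sru_response(rec: dict, record_schema: str) -> str:
            # dispatch on the recordSchema parameter of the incoming request
            return CONVERTERS[record_schema](rec)

        print(sru_response(record, "dc"))
        print(sru_response(record, "marc"))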

    Currently at the Library of the University of Amsterdam we are migrating our ILS, which also involves converting our data from one bibliographic metadata format (PICA+) to another (MARC). This is extremely complicated, especially because of the non-normalised structure of both formats. And I must say that in my opinion PICA+ is the better one of the two.
    Also, all German and Austrian libraries are meant to migrate from the MAB format to MARC, which likewise seems to be a move away from a superior format.
    All because of the need to adhere to international standards, but with the wrong solution.

    Maybe the projected new standard for resource description and access, RDA, will be the solution, but that may take a while yet.


  • ReTweet @Reply – Twitter communities

    Posted on April 27th, 2009 Lukas Koster 1 comment

    twitterelag

    In my post “Tweeting Libraries” among other things I described my Personal Twitter experience as opposed to Institutional Twitter use. Since then I have discovered some new developments in my own Twitter behaviour and some trends in Twitter at large: individual versus social.

    There have been some discussions on the web about the pros and cons, and the benefits and dangers, of social networking tools like Twitter, focusing on “noise” (uninteresting trivial announcements) versus “signal” (meaningful content), but also on the risk of web 2.0 amounting to digital feudalism, or being a possible vehicle for fascism (as argued by Andrew Keen).

    My kids say: “Twitter is for old people who think they’re cool“. According to them it’s nothing more than: “Just woke up; SEND”, “Having breakfast; SEND”, “Drinking coffee; SEND”, “Writing tweet; SEND”. For them Twitter is only about broadcasting trivialities, narcissistic exhibitionism, “noise”.
    For their own web communications they use chat (MSN/Messenger), SMS (mobile phone text messages), communities (Hyves, the Dutch counterpart of MySpace) and email. Basically I think young kids communicate online only within their groups of friends, with people they know.

    Just to get an overview: a tweet, or Twitter message, can basically be of three different types:

    • just plain messages, announcements
    • replies: reactions to tweets from others, characterised by the “@<twittername>” string
    • retweets: forwarding tweets from others, characterised by the letters “RT”

    Although a lot of people use Twitter in the “exhibitionist” way, I don’t do that myself at all. If I look at my Twitter behaviour of the past weeks, I almost only see “retweets” and “replies”.

    Both “replies” and “retweets” obviously were not features of the original Twitter concept; they came into being because Twitter users needed conversation.
    A reply is becoming more and more a replacement for short emails or mobile phone text messages, at least for me. These Twitter replies are not “monologues” but “dialogues”. If you don’t want everybody to read them, you can use a “Direct message” or “DM“.
    Retweets are used to forward interesting messages to the people who are following you, your “community” so to speak. No monologue, no dialogue, but sharing information with specific groups.
    The “@<twittername>” mechanism is also used to refer to another Twitter user in a tweet. In official Twitter terminology “replies” have been replaced by “mentions“.

    Retweets and replies are the building blocks of Twitter communities. My primary community consists of people and organisations related to libraries. Only a small number of these people do I actually know in person; most of them I have never met. The advantage of Twitter here is obvious: I get to know more people who are active in my professional area, I stay informed and up to date, and I can discuss topics. This is all about “signal”. If issues are too big for Twitter (more than 140 characters), we can use our blogs.
    But it’s not only retweets and replies that make Twitter communities work. Trivialities (“noise”) are equally important. They make you get to know people and in this way help create relationships built on trust.

    I experienced another compelling example of a very positive social use of Twitter last week, when there were a number of very interesting Library 2.0 conferences, none of which I could attend in person because of our ILS project:

    All of these conferences were covered on Twitter by attendees using the hashtags #elag09, #csnr09 and #ugul09. This phenomenon makes it possible for non-participants to follow all events and discussions at these conferences, and even join in the discussions. Twitter at its best!

    Twitter is just a tool, a means to communicate in many different ways. It can be used for good and for bad, and of course what is “good” and what is “bad” is up to the individual to decide.


  • Replacing our ILS, business as usual

    Posted on April 24th, 2009 Lukas Koster 2 comments
    catalog

    © Peter Morville

    As you may have noticed from some of my tweets, the Library of the University of Amsterdam, my place of work, is in the process of replacing its ILS (Integrated Library System). All in all this project, or rather these two projects (one for selecting a new ILS, the other for implementing it), will have taken 18 months or more from the decision to go ahead until STP (Switch to Production), planned for August 15th this year. My colleague Bert Zeeman blogged about this (in Dutch) recently.

    One thing that has become absolutely clear to me is that replacing an ILS is not just about replacing one information system with another. It is about replacing an entire organisational structure of work processes, with a huge impact on all people involved. And in our case it affects two organisations: besides the Library of the University of Amsterdam, also the Media Library of the Hogeschool van Amsterdam. We have been managing library systems for both organisations in a mini consortial structure for a couple of years now. So the Media Library is facing its second ILS replacement within two years.

    While the decision was made for pressing technical reasons, and with an eye on preparing for future library 2.0 developments, it turned out to have substantial consequences for the organisation.
    This is the first time that I am participating in such a radical library system project. I have done a couple of projects implementing and upgrading metasearch and OpenURL link resolver tools in the last six years, but these are nothing compared to the current project. With these “add-on” tools, which started as a means of extending the library’s primary stream of information, only a relatively limited number of people were involved. But with an ILS you are talking about the core business of a library (still!) and about the day to day working life of everybody involved in acquisitions, cataloguing and circulation, as well as system administrators and system librarians.

    To make it even more complicated, the University Library is also switching from the old system’s proprietary bibliographic format to MARC21, because that is what the new system uses. Personally I think that the old system’s format is better (just like our German colleagues think about their move from MAB to MARC), but of course the advantage of using an internationally accepted and used standard outweighs this, as always. Maybe food for another blog post later…

    Last but not least, the Library is simultaneously doing a project for the implementation of RFID for self check machines. The initial idea was to implement RFID in the old system and then just migrate everything to the new one. However, for various reasons, it was recently decided to postpone the RFID implementation until shortly after our ILS STP. Some initial tests have shown that this will probably work.

    And while all this is going on, all normal work needs to be taken care of too: “business as usual”.

    Now, looking at workflows: the way our individual departments have organised their workflows is partly dictated by the way the old system is designed. The new system obviously dictates workflows too, but in other areas. Although this new system is very flexible and highly configurable, there are still some local requirements that cannot be met by it.
    Of course this is NOT the way it should be! Systems should enable us to do what we want, how we want it! Hopefully new developments like Ex Libris’ URM and the very recently announced new OCLC WorldCat Web based ILS will take better care of users.

    Talking about “very flexible and highly configurable”: although a very big advantage, this also makes it much more complicated and time consuming to implement the new system. Fortunately there are a lot of other libraries in the Netherlands and around the world using the new system that are willing to help us in every possible way. And this is highly appreciated!

    Other issues that make this project complicated:

    • unexpected issues and bottlenecks: these keep on coming
    • migration of data from the old system: conversion from the old to the new format
    • implementing links with external systems like the student and staff databases, the financial system, and the national union catalogue

    I think we will make STP on the planned date, but I also think we will need to postpone a number of issues until after that. There will still be a lot of work to be done by my department after the project has finished.

    To end on a positive note: the new OPAC will be much nicer and more flexible than the old one. And in the end, that is what we are doing this for: our patrons.
