Data. The final frontier.
  • A day between the stacks

    Posted on August 9th, 2013 Lukas Koster 2 comments

    Connecting real books, metadata and people

    I spent one day in the stacks of the off-site central storage facility of the Library of the University of Amsterdam as one of the volunteers helping the library perform a huge stock control operation which will take years. The goal of this project is to get a complete overview of the discrepancies between what’s on the shelves and what is in the catalogue. We’re talking about 65 kilometres of shelves, approximately 2.5 million items, in this central storage facility alone.

    To be honest, I volunteered mainly for personal reasons. I wanted to see what is going on behind the scenes with all these physical objects that my department is providing logistics, discovery and circulation systems for. Is it really true that cataloguing the height of a book (MARC 300$c) is only used for determining which shelf to put it on?

    The practical details: I received a book cart with a pencil, a marker, a stack of orange sheets of paper with the text “Book absent on account of stock control” and a printed list of 1000 items from the catalogue that should be located in one stack. I stood on my feet between 9 AM and 4 PM in a space of around 2-3 metres in one aisle between two stacks, with one hour in total of coffee and lunch breaks, in a huge building in the late 20th century suburbs of Amsterdam without mobile phone and internet coverage. I must say, I don’t envy the people working there. I’m happy to sit behind my desk and PC in my own office in the city centre.

    Most of my books were indeed in a specific size range, approximately 18-22 cm, with a couple of shorter ones. I found approximately 25-30 books on the shelves that were not on my list, and therefore not in the catalogue. I put these on my cart, replacing each with one of the orange sheets on which I pencilled the shelfmark of the book. There were approximately 5-10 books on my list missing from the shelves, which I marked on the list. One book had a shelfmark on the spine that was identical to that of the book next to it. Inside was a different code, which seemed to be the correct one (it was on the list, 10 places down). I also put 10 books on the cart because I thought the title on the list didn’t match the title on the book, but this is a tricky thing, as I will explain.
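
    The check itself boils down to comparing two sets of shelfmarks: those printed on the list and those actually found on the shelf. A minimal sketch in Python, assuming both can be represented as plain strings; the function name and the shelfmarks are made up for illustration and are not taken from the library's systems:

```python
def reconcile(catalogue_list, shelf_scan):
    """Compare shelfmarks on the printed list with shelfmarks found on the shelf."""
    listed = set(catalogue_list)
    found = set(shelf_scan)
    return {
        # on the shelf but not on the list: goes on the cart, orange sheet in its place
        "not_in_catalogue": sorted(found - listed),
        # on the list but not on the shelf: marked as missing on the list
        "missing_from_shelf": sorted(listed - found),
        # present in both: nothing to do
        "confirmed": sorted(listed & found),
    }


# Hypothetical shelfmarks, for illustration only
catalogue_list = ["A 101", "A 102", "A 103", "A 104"]
shelf_scan = ["A 101", "A 103", "A 104", "A 999"]

report = reconcile(catalogue_list, shelf_scan)
print(report["not_in_catalogue"])
print(report["missing_from_shelf"])
```

    Note that a book with the wrong shelfmark on its spine, like the one described above, would show up in both difference lists at once, which is why the code inside the book has to be checked by hand.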

    Title metadata
    The title printed on the list was the “main title”, or MARC 245$a. It is very interesting to see how many differences there are between the ways that main and subtitles have been catalogued by different people through the ages. For instance, I had two editions on my list (1976 and 1980) of a German textbook on psychiatry, with almost identical titles and subtitles (title descriptions taken from the catalogue):

    Psychiatrie, Psychosomatik, Psychotherapie : Einführung in die Praktika nach der neuen Approbationsordnung für Ärzte mit 193 Prüfungsfragen : Vorbereitungstexte zur klinischen und Sozialpsychiatrie, psychiatrischen Epidemiologie, Psychosomatik,Psychotherapie und Gruppenarbeit

    Psychiatrie : Psychosomatik, Psychotherapie ; Einf. in d. Praktika nach d. neuen Approbationsordnung für Ärzte mit Schlüssel zum Gegenstandskatalog u.e. Sammlung von Fragen u. Antworten für systemat. lernende Leser ; Vorbereitungstexte zur klin. u. Sozialpsychiatrie, psychiatr. Epidemiologie, Psychosomatik, Psychotherapie u. Gruppenarbeit

    The first book, from 1976 (which actually has ‘196’ instead of ‘193’ on the cover), is on the list and in the catalogue with the main title (MARC 245$a) “Psychiatrie, Psychosomatik, Psychotherapie :”.
    The second book, from 1980, is on the list with the main title “Psychiatrie :”.
    Evidently it is not always clear, just from looking at the cover and/or title page, what should be catalogued as main title and what as subtitle.

    I have seen a lot of these cases in my batch of 1000 books in which it is questionable what constitutes the main and subtitle. Sometimes the main title consists of only the initial part, sometimes it consists of what looks like main and subtitle taken together. At first I put all parts of a serial on my cart because in my view the printed titles were incorrect. They only contained the title of the specific part of the serial, whereas in my non-librarian view the title should consist of the serial title + part title. On the other hand I also found serials for which only the serial title was entered as main title (5 items “Gesammelte Werke”, which means “Collected Works” in German). No consistency at all.
    What became clear to me is that in a lot of cases it is impossible to identify a book by the catalogued main title alone.

    Another example of problematic interpretation I came across: a Spanish book, with main title “Teatro de Jacinto Benavente” on my list, and on the cover the author name “Jacinto Benavente” and title “Teatro: Rosas de Otoño – Al Natural – Los Intereses Creados”. On the title page: “Teatro de Jacinto Benavente”.

    In the catalogue there are two other books with plays by the same author, just titled “Teatro”. All three have “Jacinto Benavente” as author, and all three contain a number of his plays. There were many similar books whose recorded main title was simply “Theatre”, in a number of languages.
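
    To make the identification problem concrete: grouping records by their main title (MARC 245$a) shows which titles cannot identify a book on their own. This is only a sketch under assumptions; the record layout, the shelfmarks and the normalisation rule (dropping trailing punctuation such as “ :” and ignoring case) are mine, not the catalogue's:

```python
from collections import defaultdict


def ambiguous_titles(records):
    """Return main titles (MARC 245$a) that are shared by more than one record."""
    by_title = defaultdict(list)
    for shelfmark, main_title in records:
        # Assumed normalisation: drop trailing separators like " :" and ignore case
        key = main_title.rstrip(" :;/").casefold()
        by_title[key].append(shelfmark)
    return {title: marks for title, marks in by_title.items() if len(marks) > 1}


# Titles as described in the text; shelfmarks are hypothetical
records = [
    ("X 1", "Teatro"),
    ("X 2", "Teatro"),
    ("X 3", "Teatro de Jacinto Benavente"),
    ("X 4", "Psychiatrie :"),
    ("X 5", "Psychiatrie, Psychosomatik, Psychotherapie :"),
]

duplicates = ambiguous_titles(records)
print(duplicates)
```

    In this sample only “Teatro” is flagged. The two psychiatry editions are not, even though they are editions of the same textbook, which illustrates the opposite problem: inconsistent main titles hide the relation between records.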

    A lot of older books on my shelves (pre 20th century mainly, but also more recent ones) have different titles and subtitles on their spine, front and title page. Different variations depending on the available print space I guess. It’s hard to determine what the actual title and subtitles are. The title page is obviously the main source, but even then it looks difficult to me. Now I understand cataloguers a little better.

    Works on the shelves
    So much for the metadata. What about the actual works? There were all kinds of different types mixed with each other, mostly in batches apparently from the same collection. In my 1000 items there were theatre related books, both theoretical works and texts of plays, Russian and Bulgarian books of a communist/marxist/leninist nature, Arabic language books of which I could not check the titles, some Swedish books, a large number of 19th century German language tourist guides for Italian regions and cities, medical, psychological and physics textbooks, old art history works, and a whole bunch of social science textbooks from the eighties of which we have at least half at our home (my wife and I both studied at the University of Amsterdam during that period). I can honestly say that most of the textbooks in my section of the stacks are out of date and will never be used for teaching again. The rest was at least 100 years old. Most of these old books should be considered as cultural heritage and part of the Special Collections. I am not entirely sure that a university library should keep most of these works in the stacks.

    Apart from this neutral economic perspective, there were also a number of very interesting discoveries from a book lover’s viewpoint, of which I will describe a few.

    A small book about Japanese colour prints containing one very nice Hokusai print of Mount Fuji.


    A handwritten, and therefore unique, item with the (Dutch) title “Bibliography of works about Michel Angelo, compiled by Mr. Jelle Hingst”, containing handwritten catalogue type cards with one entry each.


    A case with one shelfmark containing two items: a printed description and what looks like a facsimile of an old illustrated manuscript.


    An Italian book with illustrations of ornaments from the cathedral of Siena, tied together with two cords.


    And my greatest discovery: an English catalogue of an exhibition at the Royal Academy from 1908: “Exhibition of works by the old masters and by deceased masters of the British school including a collection of water colours” (yes, this is one big main title).

    But the book itself is not the discovery. It’s what is hidden inside the book. A handwritten folded sheet of paper, with letterhead “Hewell Grange. Bromsgrove.” (which is a 19th century country house, seat of the Earls of Plymouth, now a prison), dated Nov. 23. 192. Yes, there seems to be a digit missing there. Or is it “/92”? Which would not be logical in an exhibition catalogue from 1908. It definitely looks like a fountain pen was used. It also has some kind of diagonal stamp in the upper left corner “TELEGRAPH OFFICE FINSTALL”. Finstall is a village 3 km from Hewell Grange.

    The paper also has a pencil sketch of a group of people, probably a copy of a painting. At first I thought it was a letter, but looking more closely it seems to be a personal impression and description of a painting. There are notes in similar handwriting on the pages of the book itself.
    I left the handwritten note where I found it. It’s still there. You can request the book for consultation and see for yourself.


    End users, patrons, customers or whatever you want to call them, can’t find books that the library owns if they are not catalogued. They can find bibliographic descriptions of the books elsewhere, but not the information needed to get a copy at their own institution. This confirms the assertion that holdings information is very important, especially in a library linked open data environment.

    The majority of books in an academic library are never requested, consulted or borrowed. Most outdated textbooks can be removed without any problem.

    There are a lot of cultural heritage treasures hidden in the stacks that should be made accessible to the general public and humanities researchers in a more convenient way.

    In the absence of open stacks and full text search for printed books and journals it is crucial that the content of books, and articles too, is described in a concise, yet complete way. Not only formal cataloguing rules and classification schemes should be used, but definitely also expert summaries and end user generated tags.

    Even with cataloguing rules it can be very hard for cataloguers to decide what the actual titles, subtitles and authors of a book are. The best sources of correct title metadata are obviously the authors, editors and publishers themselves.

    Book storage staff can’t find requested books with incorrect shelfmarks on the spine.

    Storing, locating, fetching and transporting books does not require librarian skills.

    All in all, a very informative experience. 

  • Old library, new library

    Posted on December 29th, 2009 Lukas Koster 5 comments

    Teylers museum library © Dymphie

    On Sunday December 27, 2009 I had the opportunity to visit the otherwise closed library of the Netherlands’ oldest museum, Teylers Museum, in my home town Haarlem, together with a small group of Dutch library twitter people. We were very kindly shown around by librarian Marijn van Hoorn, who explained to us the library’s history and collection.

    Now I’m not going to say anything about the pleasant real life consequences of getting to know people in the virtual world (that has been done by @PeterMEvers already, in Dutch), or about the guided tour (already described very well by @underdutchskies in English and by @festinaatje and @ecobibl in Dutch). Also, none of the photos I took with my G1 phone are presentable, but you can have a look at the photos taken by @Dymphie, @underdutchskies and @wbk500.

    Instead, I will try to make a comparison between the old library’s course of life and the developments that modern libraries are going through, because I see some parallels there.

    The museum was built in 1784 with money from the legacy of the wealthy banker and merchant Pieter Teyler van der Hulst, to preserve his collections and advance the arts and sciences. The museum’s library was established in 1826 to house a separate collection of books and journals in the field of natural history (botany, zoology, paleontology and geology).

    One of the objectives for the library was to have a complete collection of all journals in the area of natural history. In the beginning the library was only accessible by invitation, and the honoured guests were welcomed and assisted by the “caretaker” or “landlord” of the museum.
    By the middle of the 19th century the library opened up to a more general public, that is to say teaching and research staff members of the emerging universities.
    But from 1870 the importance of the Teylers library for university staff declined drastically, because the universities in the Netherlands started to organise academic libraries of their own. So the library closed its doors to regular visitors. The collection continued to be maintained and expanded until 1987, when it was no longer realistic to pursue completeness.
    During the 1970s the privately funded museum and library faced the threat of closing down because of the cost of preserving the historical buildings and collections. In the handwritten library catalog (created over time by a large number of volunteers and employees) all items were annotated with an estimated value in case of a forced sale of the collection.

    Teylers library catalog © Dymphie

    Fortunately the Dutch state decided to subsidise the historically and culturally valuable museum, and now Teylers is a very popular place, with a new wing with a large hall for temporary exhibitions, an educational section and a cafe.
    The museum library is only open for visitors on request and on special occasions. The collection is not expanded anymore, but it is a very complete and valuable historical natural history collection, which is, among other things, used to organise temporary thematic exhibitions in the museum. Besides the natural history items there are also old maps and atlases and travel journals, like the James Cook journals by Sydney Parkinson that @jaapvandegeer drew my attention to.
    The museum and the library are also looking to the future. Both museum objects and library items are being digitised, there is a European project for creating a website on ornithology that uses the library’s birds images, there is a new thematic website that combines documents, images, metadata from the museum, the library and external sources, the library catalog has been migrated to an Adlib system, and there is a Ning social network.

    So, what are the parallels with modern libraries? First of all, it is clear that the influence of external developments on libraries is not something that is limited to the modern digital web age of Google. Just like Teylers library, modern public, academic and special libraries were at first targeted at a limited, well defined audience, and only accessible on location at specific times, after which their target audience and accessibility widened substantially. Catalogs and varying parts of the collections are now available online to a global audience.
    The external influence from competing university libraries is currently mirrored by the world wide web itself, with Google as one of the main external threats. I have written about this in my post “No future for libraries?“.
    The two important issues here are: the effects on modern library collections and audience. Teylers library decided to stop building its own collection, but they keep using it in a number of ways: temporary physical thematic exhibitions, but also in new digital “mashed up” ways. This might be a good example for modern libraries to follow: make use of modern technologies to reuse existing collections to create virtual online thematic aggregations of data, texts, images, etc. See also my post “Collection 2.0“.
    As for modern libraries’ response to changing audiences: proceeding with new ways of using their collections will draw new customers anyway. But it is equally important to find other ways to “go where your users are”, like being on social networks like the Teylers Ning site. One of the most important moves in the near future will be mobile presence.

    Teylers library shows us that there may be a new life for old libraries.

  • No future for libraries?

    Posted on May 24th, 2009 Lukas Koster 11 comments

    Will library buildings and library catalogs survive the web?

    © Moqub

    Some weeks ago a couple of issues appeared in the twitter/blogosphere (or at least MY twitter/blogosphere) related to the future of the library in this digital era.

    • There was the Espresso book machine that prints books on demand on location, which led to questions like: “apart from influencing publishing and book shops, what does this mean for libraries?“.
    • There was a Twitter discussion about “will we still need library buildings?“.
    • There was another blog post about the future of library catalogs by Edwin Mijnsbergen (in Dutch) that asked the question of the value of library catalogs in relation to web2.0 and the new emerging semantic web.

    This made me start thinking about a question that concerns us all: is there a future for the library as we know it?

    To begin with, what is a library anyway?

    For ages, since the beginning of history, up until some 15 years ago, a library was an institution characterised by:

    © Mihai Bojin

    • a physical collection of printed and handwritten material
    • a physical location, a building, to store the collection
    • a physical printed or handwritten on site catalog
    • on location searching and finding of information sources using the catalog
    • on site requesting, delivery, reading, lending and returning of material
    • a staff of trained librarians to catalog the collection and assist patrons

    The central concept here is of course the collection. That is the “raison d’être” of a library. The purpose of library building, catalog and librarians is to give people access to the collection, and provide them with the information they need.

    Clearly, because of the physical nature of the collection and the information transmission process the library needed to be a building with collection and catalog inside it. People had to go there to find and get the publications they needed.

    If collections and the transmission of information were completely digital, then the reason for a physical location to go to for finding and getting publications would not exist anymore. Currently one of these conditions has been met fully and the other one partly. The transmission of information can take place in a completely digital way. Most new scientific publications are born digital (e-Journals, e-Books), and a large number of digitisation projects are taking care of making digital copies of existing print material.
    Searching for items in a library’s collection is already taking place remotely through OPACs and other online tools almost everywhere. A large part of these collections can be accessed digitally. Only when a patron wants to read or borrow a printed book or journal does he or she have to go to the library building to fetch it.

    All this seems to lead to the conclusion that the library may be slowly moving away from a physical presence to a digital one.

    But there is something else to be considered here, that reaches beyond the limits of one library. In my view the crucial notion here is again the collection.
    In my post Collection 2.0 I argue that in this digital information age a library’s collection is everything a library has access to as opposed to the old concept of everything a library owns. This means in theory that every library could have access to the same digital objects of information available on the web, but also to each other’s print objects through ILL. There will be no physically limited collection only available in one library anymore, just one large global collection.

    In this case, there is not only no need for people to go to a specific library for an item in its collection, but also there is no need to search for items using a specific library’s catalog.

    Now you may say that people like going to a library building and browsing through the stacks. That may still be true for some, but in general, as I argue in my post “Open Stack 2.0“, the new Open Stack is the Web.

    © Nicole C. Engard

    In the future there will be collections, but not physical ones (except of course for the existing ones with items that are not allowed to leave the library location). We will see virtual subject collections, determined by classifications and keywords assigned both by professionals and non-professionals.

    On a parallel level there will be virtual catalogs, which are views on virtual collections defined by subjects on different levels and in different locations: global, local, subject-oriented, etc. These virtual collections and catalogs will be determined and maintained by a great number of different groups of people and institutions (commercial and non-commercial). One of these groups can still be a library. As Patrick Vanhoucke observed on Twitter (in Dutch): “We have to let go of the idea of the library as a building; the ‘library’ is the network of librarians“. These virtual groups of people may be identical to what is getting known more and more as “tribes“.

    Having said all this, of course there will still be occurrences of libraries as buildings and as physical locations for collections. Institutions like the Library of Congress will not just vanish into thin air. Even if all print items have been digitised, print items will still be wanted for a number of reasons: research, art, among others. Libraries can have different functions, like archives, museums, etc. and still be named “libraries” too.
    Library buildings can transform into other types of locations: in universities they can become meeting places and study facilities, including free wifi and Starbucks coffee. Public libraries can shift focus to becoming centres of discovery and (educational) gaming. Anything is possible.

    It’s obvious that libraries obey the same laws of historical development as any other social institution or phenomenon. The way that information is found and processed is determined, or at least influenced, by the status of technological development. And I am not saying that all development is technology driven! This is not the place for a philosophy on history, economics and society.

    Some historical parallels to illustrate the situation that libraries are facing:

    • writing: inscribing clay tablets > scratching ink on paper > printing (multiplication, re-usability) > typewriter > computer/printer (digital multiplication and re-usability!) > digital only (computer files, blogs, e-journal, e-books)
    • consumption of music: attending live performance on location > listening to radio broadcast > playing purchased recordings (vinyl, cassettes, cd, dvd) > making home recordings > playing digital music with mp3/personal audio > listening to digital music online

    From these examples it’s perfectly clear that new developments do not automatically make the old ways disappear! Prevailing practices can coexist with “outdated” ways of doing things. Libraries may still have a future.

    In the end it comes down to these questions:

    • Will libraries cease to exist, simply because they no longer serve the purpose of providing access to information?
    • Are libraries engaged in a rear guard fight?
    • Will libraries become tourist attractions?
    • Will libraries adapt to the changing world and shift focus to serve other, related purposes?
    • Are professional librarian skills useful in a digital information world?

    I do not know what will happen with libraries. What do you think?

  • Collection 2.0

    Posted on February 15th, 2009 Lukas Koster No comments

    Henk Ellerman of Groningen University Library writes about the “Collection in the digital age” reacting to Mary Frances Casserly’s article “Developing a Concept of Collection for the Digital Age“. I haven’t read this 2002 article yet, but Henk Ellerman goes into the problem of finding a metaphor describing collections that for a large part consist of resources available on the internet.
    Henk says:
    “…the collection (the one deemed relevant for… well whatever) is a subset that needs to be picked from the total set of available online resources.”
    “I find it quite remarkable that the new collection is seen as the result of a process of picking elements, a process similar to finding shells on a beach.”
    “What if we expand the notion of a collection in such a way that the sea becomes part of it?”
    “The main issue with any sensible collection is quality control. We don’t want ugly things in our collections.”
    “Then a collection is not a simple store of documents anymore, but a rather complex system of interrelated documents, controlled by a selected group of people.”
    “Librarians ‘just’ need to make the system searchable.”

    I have a couple of thoughts about collections myself that I would like to add to these.
    Originally, a collection is the totality of physical objects of a specific type that are in the possession of a person or an organisation. Merriam-Webster says: “an accumulation of objects gathered for study, comparison, or exhibition or as a hobby“. People can collect Barbie dolls or miniature cars as a hobby, or rare books or monkey skulls for scientific reasons.
    (By the way, individual collectors of rare books are often portrayed in movies as rich, old, eccentric people with a small but very valuable collection of very old books about topics such as satanism, who end up being killed in a horrible way and having their collections destroyed by fire, like I saw some time ago in Polanski’s “The Ninth Gate”.)

    When organisations have collections then it is almost always for study or exhibition, but also for practical reasons. We are talking mainly about museums and libraries. In the case of libraries there is a rough distinction between public libraries and libraries belonging to scientific and/or educational institutions. Let’s focus on educational libraries, or “university libraries”, to make the picture a bit simpler.
    University libraries have collected written and/or printed texts (books, journals, also containing images, maps, diagrams, etc.) in order to provide their staff and students with material to be able to teach and study. A library’s collection then describes all objects in the possession of the library. In the digital age, electronic journals and databases have been added to these collections, but in most cases this concerns only resources the library owns or for which the library pays money to gain access to. The collection then becomes the totality of objects (physical or digital) that the library owns or is granted access to by means of a contract. Freely available resources are explicitly not counted here.

    Now, here we have to make an important distinction between a library’s total collection (“the collection”, meaning “all items the library owns or has access to”) and a collection on a specific topic or for a specific subject (“subject collections”, meaning “all items that have been selected by professionals to be part of the material that is necessary for studying a specific topic”), for instance “the University of Amsterdam library’s Chess collection”. In the past, people would have to go to a specific library to consult a specific collection on a specific subject.
    “The collection” is merely the sum of all the library’s “subject collections”, nothing more.

    Before we go to the collection in the digital age, an interesting intermediate question is: what is the position of interlibrary loan in the concept of collection? Are books from other libraries that are available to a specific library’s patrons to be considered part of that library’s own collection? In the strict sense of the collection concept (“all items the library owns”), the answer is “no”. But if we expand the notion of collection to mean “everything a library has access to”, then the answer clearly would have to be “yes”.

    Now, in the digital age, the limitation that a collection’s objects should be available physically in a specific location, disappears. This means that anything can be part of “the collection” of a specific library, also objects or texts that have not been judged as scientific before, like blog posts. This is the “sea” that Henk Ellerman is talking about. A subject collection is also not limited by physical borders anymore. Subject collections can contain material, physical and digital, from anywhere. In this case, there is no reason that a subject collection should be a specific library’s subject collection, obviously. Key is “quality control”, or as Henk Ellerman puts it: “We don’t want ugly things in our collections“. Subject collections should be universal, global, virtual collections of physical and digital objects, “controlled by a selected group of people“.

    Now, the most important question: who decides who will be part of these selected groups of people? The answer to this question is still to be found. I guess we will see several types of “expert groups” emerge: coalitions between university libraries nationally or globally, but also between not-for-profit and commercial organisations, and of course also between individuals cooperating informally, like in the blogosphere, or in Wikipedia.
    The collections that will be controlled by these coalitions will not have fixed boundaries, but will have more “professional” cores with several “less professional” spheres around them, some intersecting with other collections.

    It is time we start building.