Data. The final frontier.
  • PhD, the final frontier – Part two: Reconnaissance

    Posted on November 9th, 2013 Lukas Koster 2 comments

    Tools and methods for PhD research and writing

    After I had decided to attempt a scholarly career and aim for a PhD (see my first post in this series), I started writing a one-page proposal. When I finished I had a working title, a working subtitle, a table of contents with 6 chapter titles, and a very general and broad text about some contradictions between the state of technology and the integration of information. I can’t really go into any details, because the first thing you learn when you aim for a scholarly publication is to keep your subject to yourself as much as possible, in order to prevent someone else from hijacking it and getting there first.


    I used Google Docs/Drive for writing this proposal. I have been using Google Docs for several years now for all my writing, both personal and professional, because that way I have access to my documents wherever I go, on any device I want; I can easily share and collaborate; and it keeps previous versions.

    I shared my proposal with my advisor Frank Huysmans, using the Google Docs sharing options. We initially communicated about it through Twitter DMs, Google Docs comments and email. Frank’s first reaction was that it looked like a good starting point for further exploration, and he saw two related perspectives to explore: Friedrich Kittler’s media ontology and Niklas Luhmann’s systems theory. I had studied a book by Luhmann on the sociology of legal trials during my years at university, but he only became recognised as one of the major theoretical sociologists after that. I hadn’t heard of Kittler at all; he is a controversial major media and communications philosopher. Both are (or rather were) German.

    Next step: I needed to get hold of literature, books and articles, both by and about Luhmann and Kittler, preferably in English, although I read German very well. Some background information about the scholars and their work would also be useful.
    An important decision I had to make was whether I was going to use print, digital or both formats for my literature collection. I didn’t have to think long. I decided to try and get everything in digital format, either as online material, or downloadable PDFs, EPUBs etc. The main reason is that this way I can store the publications in an online storage facility like Dropbox, and have access to all my literature wherever I am, at work, at home and on the road (either on an ereader or my smartphone).

    Frank, who is a Luhmann expert, gave me some pointers on publications by and about Luhmann. Kittler I had to find on my own. Of course, working for the Library of the University of Amsterdam and being responsible for our Primo discovery tool, I tried finding relevant Luhmann and Kittler publications there. I also tried Google and Google Scholar. I used my own new position as a library consumer to perform a basic comparison test between library discovery and Google. My initial conclusion was that I got better results using Google than our own Primo. This was in March 2013. But in the meantime both Ex Libris and the Library have made some important adjustments to the Primo indexing procedures and relevance ranking algorithm. Repeating similar searches in November 2013 provided much better results.
    Anyway, I mostly ignored print publications. As a staff member of the University of Amsterdam I have access to all subscription online content, whether I find it through our own discovery tools or via Google. What I can’t find in our library discovery tools are ‘unofficial’ digital versions of print books. Here Google can help. For instance I found a complete PDF version of Luhmann’s main work “Die Gesellschaft der Gesellschaft” (“Society of society”, in German). Frank was also so kind as to digitise/copy some chapters of relevant print books about Luhmann for me.

    I discovered that I needed some sort of mechanism to categorise or catalogue my literature in such a way that it is available to me wherever I am, other than just putting files in a Dropbox folder. After some research I decided on Mendeley, which was originally presented as a reference manager, but is now more a collaboration, publication sharing and annotation environment. I am not using Mendeley as a reference manager for now, only for categorising digital local and online publications and making annotations attached to the publications.
    An important feature that I need in research is to have access to my notes everywhere. With Mendeley I can attach notes to PDFs in the Mendeley PDF reader, which are synchronised with my other Mendeley installations on other workstations, provided the ‘synchronise files’ option is checked everywhere. Actually Mendeley distinguishes between “notes” and “annotations”: notes are made and attached at the level of the publication, annotations are linked to specific text sections in PDFs. Annotations don’t work with EPUB files, because there is no embedded Mendeley EPUB reader, nor with online URLs. I can add notes to specific EPUB text sections in my ereader, which are synchronised between my ereader software instances. So I still need an independent annotation tool that can link my notes to all my research objects and synchronise them between my devices, independent of format or software.

    A final note about digital formats. PDFs are the norm in digital scholarly dissemination of publications. PDFs are fine for reading on a PC and for attaching notes in Mendeley, but they’re horrible to read on my ereader. There I prefer EPUB. I would really like to have more standard download options for publications to select from, at least PDF and EPUB.

    Summarising, the functions I need for my PhD research and writing in this stage, and the tools and methods I am currently using:


    • Find publications, information: library discovery tools, Google Scholar, the web
    • Acquire digital copies of publications: access/authorisation; PDF, EPUB
    • Store material independent of location and device: Dropbox, Mendeley
    • Save references: Mendeley
    • Notes/annotations: Mendeley, Kobo ereader software
    • Write and share documents independent of location and device: Google Docs
    • Communicate: Twitter, email, Google Docs comments
  • PhD, the final frontier – Part one: To boldly go

    Posted on August 18th, 2013 Lukas Koster 1 comment

    My final career? Facing the challenge

    This is the first in hopefully a series of posts about a new phase in my professional life, in which I will try to pursue a scholarly career with a PhD as my first goal. My intention with this series is to document in detail the steps and activities needed to reach that goal. In this introduction I will describe the circumstances and considerations that finally led to my decision to take the plunge.


    First some personal background. I graduated in Sociology at the University of Amsterdam in 1987. I received the traditional Dutch academic title of “Doctorandus” (“Drs.”) which in the current international higher education Bachelor/Masters system is equivalent to a Master of Science (MSc). I specialised in sociology of organisations, labour and industrial relations, with minors in economics and social science “informatics”. I wrote my thesis “Arbeid onder druk” (“Labour under pressure”) on automation, the quality of labour and workers’ influence in the changing printing industry in the 20th century.
    The job market for sociologists in the 1980s and 1990s was virtually non-existent, so I had to find an alternative career. One of the components of the social science informatics minor was computer programming. I learned to program in Pascal on a mainframe using character based monochrome terminals. I actually liked doing that, and I decided to join the PION IT Retraining programme for unemployed (or underemployed) academics organised in the 1980s by the Dutch State to overcome the growing shortage of IT professionals. After a test I was accepted, and in 1988 I successfully finished the course, during which I learned to program in COBOL. However, it took me another two years to finally find a job. From 1990 I worked as a systems designer and developer (programming in PL/1, Fortran and Java, among others) for a number of institutions in the area of higher education and scholarly information until 2002. Then I was sent away with a year’s salary from the prematurely born Institute for Scholarly Information Services NIWI, which was terminated in 2005. From its ruins the now successful Data Archiving and Networking Services institute (DANS) emerged.
    It was then that I tried to take up a new academic career for the first time, and I enrolled in Cultural Studies at the Open University. I enjoyed that very much, I took a lot of courses and even managed to pass a number of exams.
    By the end of that year, after a chance meeting with a former colleague in the tram on my way to take an exam, I found myself working at the Dutch Royal Library in The Hague as temporary project staff member for implementing the new federated search and OpenURL tools MetaLib and SFX from Ex Libris. This marked the start of my career in library technology. It was then that I first learned about bibliographic metadata formats, cataloguing rules and MARC madness.
    I soon discovered that it’s very hard to combine work and studying, and because I started to like working for libraries and networking and exchanging knowledge I silently dropped out of Open University.
    After three years on temporary contracts I moved to the Library of the University of Amsterdam to do the same work as I did at the Royal Library. I got involved in the International Ex Libris User Group IGeLU and started trying to make a difference in library systems, working together with an enthusiastic bunch of people all over the world. I made it to head of department for a while, until an internal reorganisation gave me the opportunity to say goodbye to that world of meetings, bureaucracy and conflicts. Now I am Library Systems Coordinator, which doesn’t mean that I am coordinating library systems, by the way. My main responsibility at the moment is our Ex Libris Primo discovery tool. The most important task there is coordinating data and metadata streams: a lot of time, money and effort is spent on streamlining the moving around of metadata between a large number of internal and external systems. For several years now I have been looking into, reading about, writing and presenting on linked open data and metadata infrastructures, with now and then a small pilot project. But academic libraries are so slow in realising that they should stop thinking in new systems for new challenges and invest in metadata and data infrastructures instead, that I am not overly enthusiastic about working in libraries anymore. It was about a year ago that I suddenly realised that with Primo I was still doing exactly the same detailed configuration work and vendor communications that I was doing ten years earlier with MetaLib, and nothing had changed.
    It was then that I said to myself: I need to do something new and challenging. If this doesn’t happen at work soon, then I have to find something satisfying to do besides that.

    I can’t reproduce the actual moment anymore, but I think it happened when I was working with getting scholarly publications (dissertations among others) from our institutional repository into Primo. Somehow I thought: I can do that too! Write a dissertation, get a PhD! My topic could be something in the field of data, information, knowledge integration, from a sociological perspective.
    So I started looking into the official PhD rules and regulations at the University of Amsterdam, to find out what possibilities there are for people with a job to get a PhD. It turns out there are options, even with time/money compensation for University staff. But still not everything was clear to me. So I decided to ask Frank Huysmans, part-time Library Science professor at the University of Amsterdam, and also active on Twitter and in the open data and public libraries movement, if he could help me and explain the options and the pros and cons of writing a dissertation. He agreed and we met in a pub in Amsterdam to discuss my ideas over a couple of nice beers.

    The good thing was that Frank thought that I should be able to pull this off, looking at my background, work experience and writing. The bad thing is that he asked me “Are you sure that you don’t just want to write a nice popular science book?”. Apparently scholarly writing is subject to a large number of rules and formats, and not meant to be a pleasant read.
    An encouraging thing that Frank told me is that it is possible to compile a dissertation from a number of earlier published scholarly peer reviewed articles. Now this really appealed to me, because it means that I can attempt to write a scholarly article and try to get it published first, and get a feel for the art of scholarly research, writing and publication, in a relatively short period. This way I can keep my options open: leave it at that, or continue with the PhD procedure later. I agreed to write a short dissertation proposal, send it to Frank and discuss it in a next meeting.

    My decision was made. Although in the meantime a couple of interesting perspectives at work appeared on the horizon involving research information and linked data, I was going to try and start a scholarly career.

    Next time: the first steps – reading, thinking, writing a draft proposal and how to keep track of everything.

  • A day between the stacks

    Posted on August 9th, 2013 Lukas Koster 2 comments

    Connecting real books, metadata and people

    I spent one day in the stacks of the off-site central storage facility of the Library of the University of Amsterdam as one of the volunteers helping the library perform a huge stock control operation which will take years. The goal of this project is to get a complete overview of the discrepancies between what’s on the shelves and what is in the catalogue. We’re talking about 65 kilometres of shelves, approximately 2.5 million items, in this central storage facility alone.

    To be honest, I volunteered mainly for personal reasons. I wanted to see what is going on behind the scenes with all these physical objects that my department is providing logistics, discovery and circulation systems for. Is it really true that cataloguing the height of a book (MARC 300$c) is only used for determining which shelf to put it on?

    The practical details: I received a book cart with a pencil, a marker, a stack of orange sheets of paper with the text “Book absent on account of stock control” and a printed list of 1000 items from the catalogue that should be located in one stack. I stood on my feet between 9 AM and 4 PM in a space of around 2-3 metres in one aisle between two stacks, with one hour in total of coffee and lunch breaks, in a huge building in the late 20th century suburbs of Amsterdam without mobile phone and internet coverage. I must say, I don’t envy the people working there. I’m happy to sit behind my desk and PC in my own office in the city centre.

    Most of my books were indeed in a specific size range, approximately 18-22 cm, with a couple of shorter ones. I found approximately 25-30 books on the shelves that were not on my list, and therefore not in the catalogue. I put these on my cart, replacing each with one of the orange sheets on which I pencilled the shelfmark of the book. There were approximately 5-10 books on my list missing from the shelves, which I marked on the list. One book had a shelfmark on the spine that was identical to that of the book next to it. Inside was a different code, which seemed to be the correct one (it was on the list, 10 places down). I put 10 books on the cart because I thought the title on the list didn’t match the title on the book, but this is a tricky thing, as I will explain.
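    In essence this shelf check is a reconciliation of two sets: the catalogue pick list and what is actually found on the shelf. A minimal sketch of that bookkeeping, with made-up shelfmarks (the real lists are of course much longer):

    ```python
    # Reconciling a catalogue pick list with a shelf scan.
    # The shelfmarks below are hypothetical, for illustration only.

    catalogue_list = {"UBM 123", "UBM 124", "UBM 125", "UBM 126"}  # printed list from the catalogue
    shelf_scan = {"UBM 123", "UBM 125", "UBM 126", "UBM 999"}      # items actually found on the shelf

    # On the shelf but not in the catalogue: goes on the cart for cataloguing,
    # leaving an orange sheet with the shelfmark in its place.
    not_in_catalogue = shelf_scan - catalogue_list

    # In the catalogue but missing from the shelf: marked on the printed list.
    missing_from_shelf = catalogue_list - shelf_scan

    print(sorted(not_in_catalogue))    # ['UBM 999']
    print(sorted(missing_from_shelf))  # ['UBM 124']
    ```

    The two set differences correspond exactly to the two actions with the cart and the list; items in both sets need no action unless the title looks wrong.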

    Title metadata
    The title printed on the list was the “main title”, or MARC 245$a. It is very interesting to see how many differences there are between the ways that main and subtitles have been catalogued by different people through the ages. For instance, I had two editions on my list (1976 and 1980) of a German textbook on psychiatry, with almost identical titles and subtitles (title descriptions taken from the catalogue):

    Psychiatrie, Psychosomatik, Psychotherapie : Einführung in die Praktika nach der neuen Approbationsordnung für Ärzte mit 193 Prüfungsfragen : Vorbereitungstexte zur klinischen und Sozialpsychiatrie, psychiatrischen Epidemiologie, Psychosomatik,Psychotherapie und Gruppenarbeit

    Psychiatrie : Psychosomatik, Psychotherapie ; Einf. in d. Praktika nach d. neuen Approbationsordnung für Ärzte mit Schlüssel zum Gegenstandskatalog u.e. Sammlung von Fragen u. Antworten für systemat. lernende Leser ; Vorbereitungstexte zur klin. u. Sozialpsychiatrie, psychiatr. Epidemiologie, Psychosomatik, Psychotherapie u. Gruppenarbeit

    The first book, from 1976 (which actually has ‘196’ instead of ‘193’ on the cover), is on the list and in the catalogue with the main title (MARC 245$a) “Psychiatrie, Psychosomatik, Psychotherapie :”.
    The second book, from 1980, is on the list with the main title “Psychiatrie :”.
    Evidently it is not clear without a doubt what is to be catalogued as main title and subtitle just by looking at the cover and/or title page.

    I have seen a lot of these cases in my batch of 1000 books in which it is questionable what constitutes the main and subtitle. Sometimes the main title consists of only the initial part, sometimes it consists of what looks like main and subtitle taken together. At first I put all parts of a serial on my cart because in my view the printed titles were incorrect. They only contained the title of the specific part of the serial, whereas in my non-librarian view the title should consist of the serial title + part title. On the other hand I also found serials for which only the serial title was entered as main title (5 items “Gesammelte Werke”, which means “Collected Works” in German). No consistency at all.
    What became clear to me is that in a lot of cases it is impossible to identify a book by the catalogued main title alone.

    Another example of problematic interpretation I came across: a Spanish book, with main title “Teatro de Jacinto Benavente” on my list, and on the cover the author name “Jacinto Benavente” and title “Teatro: Rosas de Otoño – Al Natural – Los Intereses Creados”. On the title page: “Teatro de Jacinto Benavente”.

    In the catalogue there are two other books with plays by the same author, just titled “Teatro”. All three have “Jacinto Benavente” as author, and all three contain a number of theatre plays by him. There were a lot of similar books, in a number of languages, with ‘Theatre’ recorded as the main title.

    A lot of older books on my shelves (pre 20th century mainly, but also more recent ones) have different titles and subtitles on their spine, front and title page. Different variations depending on the available print space I guess. It’s hard to determine what the actual title and subtitles are. The title page is obviously the main source, but even then it looks difficult to me. Now I understand cataloguers a little better.
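    The matching problem above can be made concrete. Two cataloguers can split the same title page differently between MARC 245$a (main title) and 245$b (remainder of title), so comparing $a alone fails even for near-identical books. A small sketch, using simplified records paraphrasing the psychiatry textbook example (the exact subfield split here is my own illustration, not taken from the catalogue):

    ```python
    # Why matching on the MARC 245$a "main title" alone is unreliable:
    # two cataloguers can split the same title page differently.
    # Records below are hypothetical simplifications of the 1976/1980 example.

    record_1976 = {"245a": "Psychiatrie, Psychosomatik, Psychotherapie :",
                   "245b": "Einführung in die Praktika nach der neuen Approbationsordnung"}
    record_1980 = {"245a": "Psychiatrie :",
                   "245b": "Psychosomatik, Psychotherapie ; Einf. in d. Praktika nach d. neuen Approbationsordnung"}

    def normalise(text):
        """Lowercase and drop ISBD punctuation, returning a list of words,
        so comparisons ignore cataloguing style."""
        return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).split()

    # Comparing $a alone: no match, although both describe near-identical works.
    print(normalise(record_1976["245a"]) == normalise(record_1980["245a"]))  # False

    # Comparing $a and $b taken together gets much closer: the first words agree.
    full_1976 = normalise(record_1976["245a"] + " " + record_1976["245b"])
    full_1980 = normalise(record_1980["245a"] + " " + record_1980["245b"])
    print(full_1976[:3] == full_1980[:3])  # True: psychiatrie psychosomatik psychotherapie
    ```

    Even this normalised full-title comparison breaks down on the abbreviated ISBD forms (“Einf. in d. Praktika”), which is exactly why identifying a book from the printed list alone was so tricky in the stacks.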

    Works on the shelves
    So much for the metadata. What about the actual works? There were all kinds of different types mixed with each other, mostly in batches apparently from the same collection. In my 1000 items there were theatre related books, both theoretical works and texts of plays, Russian and Bulgarian books of a communist/marxist/leninist nature, Arab language books of which I could not check the titles, some Swedish books, a large number of 19th century German language tourist guides for Italian regions and cities, medical, psychological and physics textbooks, old art history works, and a whole bunch of social science textbooks from the eighties of which we have at least half at our home (my wife and I both studied at the University of Amsterdam during that period ). I can honestly say that most of the textbooks in my section of the stacks are out of date and will never be used for teaching again. The rest was at least 100 years old. Most of these old books should be considered as cultural heritage and part of the Special Collections. I am not entirely sure that a university library should keep most of these works in the stacks.

    Apart from this neutral economic perspective, there were also a number of very interesting discoveries from a book lover’s viewpoint, of which I will describe a few.

    A small book about Japanese colour prints containing one very nice Hokusai print of Mount Fuji.


    A handwritten, and therefore unique, item with the title (in Dutch) “Bibliography of works about Michel Angelo, compiled by Mr. Jelle Hingst”, containing handwritten catalogue-type cards with one entry each.


    A case with one shelfmark containing two items: a printed description and what looks like a facsimile of an old illustrated manuscript.


    An Italian book with illustrations of ornaments from the cathedral of Siena, tied together with two cords.


    And my greatest discovery: an English catalogue of an exhibition at the Royal Academy from 1908: “Exhibition of works by the old masters and by deceased masters of the British school including a collection of water colours” (yes, this is one big main title).

    But the book itself is not the discovery. It’s what is hidden inside the book. A handwritten folded sheet of paper, with letterhead “Hewell Grange. Bromsgrove.” (which is a 19th century country house, seat of the Earls of Plymouth, now a prison), dated Nov. 23. 192. Yes, there seems to be a digit missing there. Or is it “/92”? Which would not be logical in an exhibition catalogue from 1908. It definitely looks like a fountain pen was used. It also has some kind of diagonal stamp in the upper left corner “TELEGRAPH OFFICE FINSTALL”. Finstall is a village 3 km from Hewell Grange.

    The paper also has a pencil sketch of a group of people, probably a copy of a painting. At first I thought it was a letter, but looking more closely it seems to be a personal impression and description of a painting. There are similar handwritings on the pages of the book itself.
    I left the handwritten note where I found it. It’s still there. You can request the book for consultation and see for yourself.


    End users, patrons, customers or whatever you want to call them, can’t find books that the library owns if they are not catalogued. They can find bibliographic descriptions of the books elsewhere, but not the information needed to get a copy at their own institution. This confirms the assertion that holdings information is very important, especially in a library linked open data environment.

    The majority of books in an academic library are never requested, consulted or borrowed. Most outdated textbooks can be removed without any problem.

    There are a lot of cultural heritage treasures hidden in the stacks that should be made accessible to the general public and humanities researchers in a more convenient way.

    In the absence of open stacks and full text search for printed books and journals it is crucial that the content of books, and articles too, is described in a concise, yet complete way. Not only formal cataloguing rules and classification schemes should be used, but definitely also expert summaries and end user generated tags.

    Even with cataloguing rules it can be very hard for cataloguers to decide what the actual titles, subtitles and authors of a book are. The best source for correct title metadata are obviously the authors, editors and publishers themselves.

    Book storage staff can’t find requested books with incorrect shelfmarks on the spine.

    Storing, locating, fetching and transporting books does not require librarian skills.

    All in all, a very informative experience. 

  • Meeting people vs. meeting deadlines

    Posted on June 23rd, 2013 Lukas Koster 1 comment

    Lessons from Cycling for Libraries


    As I am writing this, more than 100 people working in and for libraries from all over the world are cycling from Amsterdam to Brussels in the Cycling for Libraries 2013 event, defying heat, cold, wind and rain. And other cyclists ;-). Cycling for Libraries is an independent unconference on wheels that aims to promote and defend the role of libraries, mainly public libraries, in and for society. This year it’s the third time the trip is organised. In 2011 (Copenhagen-Berlin) I was only able to attend the last two days in Berlin. In 2012 (Vilnius-Tallinn) I could not attend at all. This year I was honoured and pleased to be able to contribute to the organisation of and to actively participate in the first two days of the tour, in the area where I live (Haarlem) and work (Amsterdam).
    I really like the Cycling for Libraries concept and the people involved, and I will tell you why, because it is not so obvious in my case. You may know that I am rather critical of libraries and their slowness in adapting to rapidly changing circumstances. And also that I am more involved with academic and research libraries than with public libraries. Moreover I have become a bit “conference tired” lately. There are so many library and related conferences where there is a lot of talk which doesn’t lead to any practical consequences.

    The things I like about Cycling for Libraries are: the cycling, the passion, the open-mindedness, the camaraderie, the networking, the un-organisedness, the flexibility, the determination, and the art of achieving its goals and then some.

    I like cycling trips very much. I have done a few long and far away ones in my time, and I know that this is the best way to visit places you would never see otherwise. While cycling around you have unexpected meetings and experiences, you get a clear mind, and in the end there is the overwhelming satisfaction of having overcome all obstacles and having reached your goal. It is fun, although sometimes you wonder why on earth you thought you were up to this.

    The organisers and participants of Cycling for Libraries are all passionate about and proud of their library and information profession, without being defensive and introverted. As you can see from their “homework assignments” they’re all working on innovative ways, on different levels and with varying scopes, to make sure the information profession stays relevant in the ever changing society where traditional libraries are more and more undervalued and threatened. So, the participants really try to make a difference and they’re willing to cycle around Europe in order to attract attention and spread the message.

    Open minds are a necessity if you embark on an adventure that involves hundreds of people from different backgrounds. For collaborating both in decentralised organisation teams and in the core group of 120 people on the move, you need to appreciate and respect everybody’s ideas and contributions. Working towards one big overall goal requires giving and taking. Especially in the group of 120 people on wheels doing the hard work, this leads to an intense form of camaraderie. These comrades on wheels depend on each other to get where they’re going. A refreshing alternative to the everyday practice of competition and struggle between vendors, publishers and libraries.

    Zandvoort Public Library Bar


    The event offers an unparalleled opportunity for networking. At ordinary conferences the official programme offers a lot of interesting information and sometimes discussion, but let’s be honest, the most useful parts are the informal meetings during lunches, coffee breaks and, most importantly, in the pubs at night. That is when relevant information is exchanged, new insights are born and valuable connections are made. Cycling for Libraries turns this model completely upside down and inside out. It is one long networking event with some official sessions in between.

    Cycling for Libraries is an un-conference. As I have learned, this specific type on wheels depends on un-organisation and flexibility. It is impossible to organise an event like this following a strict and centralised coordination model where everybody has to agree on everything. For us Dutch people this can be uncomfortable. Historically we have depended on talking and agreeing on details in order to win the struggle against the water. The cyclists have ridden along the visible physical results of this struggle between Delft and Brugge: the dykes, dams and bridges of the Delta Works. The need to agree led to what became known as the Polder Model. On the other hand we also have a history of decentralised administration. “The Netherlands” is not a plural name for nothing; the country was formed out of a loose coalition of autonomous counties, cities and social groups, united against a common enemy.

    OBA start event


    Anyway, the organisation of Cycling for Libraries 2013 started in February with a meeting in The Hague with a number of representatives and volunteers from The Netherlands and Belgium (or Flanders I should say), when Jukka Pennanen, Mace Ojala and Tuomas Lipponen visited the area. After that the preparations were carried out by local volunteers and institutions without any official central coordination whatsoever. This worked quite well, with some challenges of course. I myself happened to end up coordinating events in the Amsterdam-Haarlem-Zandvoort area. I soon learned to take things as they came, delegate as much as possible and rely on time and chance. In the end almost everything worked out fine. In Amsterdam we planned the start event at OBA Central Public Library and the kick-off party at the University of Amsterdam Special Collections building. I only learned about the afternoon visit to KIT Royal Tropical Institute a couple of weeks before. I had nothing to do with that, but it turned out to be a successful part of day one. Actually, the day after KIT staff told the Cycling for Libraries participants that the museum and library faced closure, a solution was reached and both were saved. Coincidence?

    Haarlem Station Library


    The next day, the first actual cycling day between Amsterdam and The Hague, I cycled along for part of the route, from my home town Haarlem to halfway to The Hague. The visit to the Haarlem Station Library started an hour later than planned, and during lunch in Zandvoort on the coast we received word that the local public library was waiting for us to visit them. This was a surprise for me and Gert-Jan van Velzen (who helped plot the route from Amsterdam to The Hague). But we decided to go there anyway, and we were welcomed with free drinks and presents by the friendly librarians in their brand new building. At 4 o’clock we were expected to arrive at Noordwijk Public Library, but we were still in Zandvoort. No problem for Jeanine Deckers (airport librarian and regional librarian), who was waiting in Noordwijk with stroopwafels. Unfortunately I didn’t make it to Noordwijk, because I had to go back home and work the next day. But it was great to experience one day of actual cycling for libraries.

    This loose, distributed and flexible organisation might be seen as an example of resilience, the concept that was introduced by Beate Rusch in her talk about the future of German regional library service centres at the recent ELAG 2013 conference in Ghent. Resilience means something like “the ability of someone or something to return to its original state after being subjected to a severe disturbance”, or simply put “something doesn’t break, but adapts under unexpected serious outside influences”. I completely agree that it would be better if organisations and infrastructures in the library and information profession were more loosely organised and connected. By the way, Beate was also involved in organising the Berlin part of the first Cycling for Libraries.

    One final thing I want to say is that I admire the way in which Cycling for Libraries manages to reach its goals and more by means of this loose, distributed and flexible organisation. Depending on local coordination teams, they succeeded in meeting Dutch members of parliament in The Hague and the European Parliament in Brussels to promote their cause, which is a remarkable result for a small group of crazy Finns.

  • Mainframe to mobile

    Posted on February 16th, 2010 Lukas Koster 11 comments

    The connection between information technology and library information systems

    This is the first post in a series of three

    [1. Mainframe to mobile – 2. Mobile app or mobile web? – 3. Mobile library services]

    The functions, services and audience of library information systems, as is the case with all information systems, have always been dependent on and determined by the existing level of information technology. Mobile devices are the latest step in this development.

    © sainz

    In the beginning there was a computer, a mainframe. The only way to communicate with it was to feed it punchcards with holes that represented characters.

    © Mirandala

    If you made a typo (puncho?), you were not informed until a day later when you collected the printout, and you could start again. System and data files could be stored externally on large tape reels or small tape cassettes, identical to music tapes. Tapes were also used for sharing and copying data between systems by means of physical transportation.

    © ajmexico

    Suddenly there was a human-operable terminal, consisting of a monitor and keyboard, connected to the central computer. Now you could type in your code and save it as a file on the remote server (no local processing or storage at all). If you were lucky you had a full screen editor, if not there was the line editor. No graphics. Output and errors were shown on screen almost immediately, depending on the capacity of the CPU (central processing unit) and the number of other batch jobs in the queue. The computer was a multi-user time sharing device, a bit like the “cloud”, but every computer was a little cloud of its own.
    There was no email. There were no end users other than systems administrators, programmers and some staff. Communication with customers was carried out by sending them printouts on paper by snail mail.

    I guess this was the first time that some libraries, probably mainly in academic and scientific institutions, started creating digital catalogs, for staff use only of course.

    © n.kahlua72

    © RaeA

    Then came the PC (Personal Computer). Terminal and keyboard were now connected to the computer (or system unit) on your desk. You had the thing entirely to yourself! Input and output consisted of lines of text only, one colour (green or white on black), and still no graphics. Files could be stored on floppy disks, 5¼-inch magnetic things that you could twist and bend, but if you did that you lost your data. There was no internal storage. File sharing was accomplished by moving the floppy from one PC to another and/or copying files from one floppy to another (on the same floppy drive).

    © suburbanslice

    Later we got smaller disks, 3½-inch, in protective cases. The PC was mainly used for early word processing (WordStar, WordPerfect) and games. Finally there was a hard disk (as opposed to “floppy” disk) inside the PC system unit, which held the operating system (mainly MS-DOS), and on which you could store your files, which became larger. Time for stand-alone database applications (dBase).

    Client server GUI

    Then there was Windows, a mouse, and graphics. And of course the Internet! You could connect your PC to the Internet with a modem that occupied your telephone line and made phone calls impossible during your online session. At first there was Gopher, a kind of text based web.
    Then came the World Wide Web (web 0.0), consisting of static web pages with links to other static web pages that you could read on your PC. Not suitable for interactive systems. Libraries could publish addresses and opening hours.
    But fortunately we got client-server architecture, combining the best of both worlds. Powerful servers were good at processing, storing and sharing data. PCs were good at presenting and collecting data in a “user friendly” graphical user interface (GUI), making use of local programming and scripting languages. So you had to install an application on the local PC which then connected to the remote server database engine. The only bad thing was that the application was tied to that specific PC, with local Windows configuration settings. And it was not possible to move the thing around.

    Now we had multi-user digital catalogs with a shared central database and remote access points with the client application installed, available to staff and customers.

    Luckily dynamic creation of HTML pages came along, so we were able to move the client part of client-server applications to the web as well. With web applications we were able to use the same applications anywhere on any computer linked to the world wide web. You only needed a browser to display the server side pages on the local PC.

    Now everybody could browse through the library catalog any time, anywhere (where there was a computer with an internet connection and a web browser). The library OPAC (Online Public Access Catalog) was born.

    Web OPAC

    The only disadvantage was that every page change had to be generated by the server again, so performance was not optimal.
    But that changed with browser based scripting technology like JavaScript, AJAX, Flash, etc. Application bits are sent to the local browser on the PC at runtime, to be executed there. So actually this is client server “on the fly”, without the need to install a specific application locally.

    © nxtiak

    In the meantime the portable PC appeared, system unit, monitor and keyboard all in one. At first you needed some physical power to move the thing around, but later we got laptops, notebooks, netbooks, getting smaller, lighter and more powerful all the time. And wifi of course, no need to plug the device in to the physical network anymore. And USB-sticks.

    Access to OPAC and online databases became available anytime, anywhere (where you carried your computer).

    The latest development of course is the rise of mobile phones with wireless web access, or rather mobile web devices which can also be used for making phone calls. Mobile devices are small and light enough to carry with you in your pocket all the time. It’s a tiny PC.

    Finally you can access library related information literally any time, anywhere, even in your bedroom and bathroom.

    Mobile library app

    It’s getting boring, but yes, there is a drawback. Web applications are not really suited to mobile browsers: pages are too large, browser technology is not fully compatible, connections are too slow.

    Available options are:

    • creating a special “dumbed down” version of a website for use on mobile devices only: smaller text based pages with links
    • creating a new HTML5/CSS3 website, targeted at mobile devices and “traditional” PC’s alike
    • creating “apps”, to be installed on mobile devices and connect to a database system in the cloud; basically this is the old client-server model all over again.
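
    The first option usually hinges on recognising mobile clients on the server. A minimal sketch of user-agent sniffing, in Python; the token list and the URLs are made-up illustrations, not a complete solution:

    ```python
    def is_mobile(user_agent: str) -> bool:
        """Very crude user-agent sniffing, just enough to route to a mobile site."""
        ua = user_agent.lower()
        return any(token in ua for token in ("iphone", "android", "blackberry", "opera mini"))

    def site_for(user_agent: str) -> str:
        # Serve the stripped-down mobile site to mobile clients, the full site otherwise.
        return "http://m.example.org/" if is_mobile(user_agent) else "http://www.example.org/"
    ```

    Real detection libraries keep far larger device databases; the point is only that the decision has to be made somewhere on the server side.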

    A comparison of mobile apps and mobile web architecture is the topic of another post.

  • Who needs MARC?

    Posted on May 15th, 2009 Lukas Koster 22 comments

    Why use a non-normalised metadata exchange format for suboptimal data storage?

    Catalog card

    © leah the librarian

    This week I had a nice chat with André Keyzer of Groningen University library and Peter van Boheemen of Wageningen University Library who attended OCLC’s Amsterdam Mashathon 2009. As can be expected from library technology geeks, we got talking about bibliographic metadata formats, very exciting of course. The question came up: what on earth could be the reason for storing bibliographic metadata in exchange formats like MARC?

    Being asked once at an ELAG conference about the bibliographic format Wageningen University was using in their home grown catalog system, Peter answered: “WDC” … “we don’t care”.

    Exactly my idea! As a matter of fact I think I may have used the same words a couple of times in recent years, probably even at ELAG2008. The thing is: it really does not matter how you store bibliographic metadata in your database, as long as you can present and exchange the data in any format requested, be it MARC or Dublin Core or anything else.

    Of course the importance of using internationally accepted standards is beyond doubt, but there clearly exists widespread misunderstanding of the functions of certain standards, like for instance MARC. MARC is NOT a data storage format. In my opinion MARC is not even an exchange format, but merely a presentation format.

    St. Marc Express

    With a background and experience in data modeling, database and systems design (among other things), I was quite amazed by bibliographic metadata formats when I started working with library systems in libraries, not having any librarian training at all. Of course, MARC (“MAchine Readable Cataloging record”) was invented as a standard in order to facilitate the exchange of library catalog records in a digital era.
    But I think MARC was invented by old school cataloguers who did not have a clue about data normalisation. A MARC record, especially one that follows an official set of cataloging rules like AACR2, is nothing more than a digitised printed catalog card.

    In pre-computer times it made perfect sense to have a standardised uniform way of registering bibliographic metadata on a printed card in this way. The catalog card was simultaneously used as a medium for presenting AND storing metadata. This is where the confusion originates from!

    MARC record

    But when the Library of Congress says “If a library were to develop a “home-grown” system that did not use MARC records, it would not be taking advantage of an industry-wide standard whose primary purpose is to foster communication of information”, it is saying just plain nonsense.
    Actually it is better NOT to use something like MARC for purposes other than exchanging, or rather presenting, data. To illustrate this I will give two examples of MARC tags that have been annoying me since my first day as a library employee:

    100 – Main Entry-Personal Name
    Besides storing an author’s name as a string in each individual bibliographic record instead of using a code, linking to a central authority table (“foreign key” in relational database terms), it is also a mistake to use a person’s name as one complete string in one field. Examples on the Library of Congress MARC website use forms like “Adams, Henry”, “Fowler, T. M.” and “Blackbeard, Author of”. To take only the simple first example, this author could also be registered as “Henry Adams”, “Adams, H.”, “H. Adams”. And don’t say that these forms are not according to the rules! They are out there! There is no way to match these variations as being actually one and the same.
    In a normalised relational database, this subfield $a would be stored something like this (simplified!):

    • Person
      • Surname=Adams
      • First name=Henry
      • Prefix=
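
    To illustrate why matching on strings is hopeless, here is a rough sketch that reduces the variant forms above to a comparable key. This is a heuristic of my own invention, not a cataloguing rule, and real authority control would of course use identifiers rather than string munging:

    ```python
    def name_key(name: str):
        """Reduce variant forms like 'Adams, Henry', 'Henry Adams' and 'Adams, H.'
        to a comparable (surname, first initial) key."""
        name = name.strip()
        if "," in name:
            surname, _, rest = name.partition(",")
        else:
            rest, _, surname = name.rpartition(" ")  # assume the last word is the surname
        first = rest.strip()
        initial = first[0].upper() if first else ""
        return surname.strip().lower(), initial
    ```

    Even this only matches on initials, so “Henry Adams” and “Hilda Adams” collide, which is exactly why a string field can never replace a link to an authority record.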

    773 – Host Item Entry
    Subfield $g of this MARC tag is used for storing citation information for a journal article, volume, issue, year, start page, end page, all in one string, like: “Vol. 2, no. 2 (Feb. 1976), p. 195-230“. Again I have seen this used in many different ways. In a normalised format this would look something like this, using only the actual values:

    • Journal
      • Volume=2
      • Issue=2
      • Year=1976
      • Month=2
      • Day=
      • Start page=195
      • End page=230

    In a presentation of this normalised data record extra text can be added like “Vol.” or “Volume“, “Issue” or “No.“, brackets, replacing codes by descriptions (Month 2 = Feb.)  etc., according to the format required. So the stored values could be used to generate the text “Vol. 2, no. 2 (Feb. 1976), p. 195-230” on the fly, but also for instance “Volume 2, Issue 2, dated February 1976, pages 195-230“.
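
    As a sketch of this idea, here is a parser for the citation string above and two formatters that regenerate either presentation on the fly. The regular expression and the month tables are my own illustration and cover only this one citation pattern, which is precisely the problem with free-text subfields:

    ```python
    import re

    MONTH_ABBR = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
    MONTH_FULL = ["January", "February", "March", "April", "May", "June", "July",
                  "August", "September", "October", "November", "December"]

    def parse_773g(s: str):
        """Parse 'Vol. 2, no. 2 (Feb. 1976), p. 195-230' into normalised values."""
        m = re.match(r"Vol\. (\d+), no\. (\d+) \((\w+)\.? (\d{4})\), p\. (\d+)-(\d+)", s)
        if m is None:
            return None  # yet another variant the pattern does not cover...
        vol, issue, mon, year, spage, epage = m.groups()
        return {"volume": int(vol), "issue": int(issue),
                "month": MONTH_ABBR.index(mon) + 1, "year": int(year),
                "spage": int(spage), "epage": int(epage)}

    def cite_short(r) -> str:
        return "Vol. %d, no. %d (%s. %d), p. %d-%d" % (
            r["volume"], r["issue"], MONTH_ABBR[r["month"] - 1],
            r["year"], r["spage"], r["epage"])

    def cite_long(r) -> str:
        return "Volume %d, Issue %d, dated %s %d, pages %d-%d" % (
            r["volume"], r["issue"], MONTH_FULL[r["month"] - 1],
            r["year"], r["spage"], r["epage"])
    ```

    Stored the other way round, as separate values, both presentations are trivial to generate and no parsing is ever needed.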

    The strange thing about this bibliographic format aimed at exchanging metadata is that it actually makes metadata exchange terribly complicated, especially with these two tags, Author and Host Item. I can illustrate this by describing the way this exchange is handled between two digital library tools we use at the Library of the University of Amsterdam, MetaLib and SFX, both from the same vendor, Ex Libris.

    The metasearch tool MetaLib uses the described and preferred mechanism of on-the-fly conversion of received external metadata from any format to MARC for the purpose of presentation.
    But if we want to use the retrieved record to link to, for instance, a full text article using the SFX link resolver, the generated MARC data is used as a source, and the non-normalised data in the 100 and 773 MARC tags has to be converted to the OpenURL format, which is actually normalised (example in simple OpenURL 0.1):


    In order to do this, all kinds of regular expressions and scripting functions are needed to extract the correct values from the MARC author and citation strings. Wouldn’t it be convenient if the record in MetaLib were already in OpenURL or any other normalised format?
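
    Generating the OpenURL itself from normalised values is the trivial part; it is getting those values out of the MARC strings that needs the fragile regular expressions. A sketch using common OpenURL 0.1 keys (the resolver address and the record are made-up examples):

    ```python
    from urllib.parse import urlencode

    def to_openurl(resolver_base: str, rec: dict) -> str:
        """Serialise an already-normalised citation as a simple OpenURL 0.1 query."""
        params = {"genre": "article",
                  "aulast": rec["aulast"], "aufirst": rec["aufirst"],
                  "title": rec["jtitle"], "volume": rec["volume"], "issue": rec["issue"],
                  "date": rec["date"], "spage": rec["spage"], "epage": rec["epage"]}
        return resolver_base + "?" + urlencode(params)

    url = to_openurl("http://sfx.example.org/local",
                     {"aulast": "Adams", "aufirst": "Henry", "jtitle": "Some Journal",
                      "volume": 2, "issue": 2, "date": 1976, "spage": 195, "epage": 230})
    ```

    With the citation stored as separate fields, this conversion is a plain key-value mapping and no string dissection is involved.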

    The point I am trying to make is of course that it does not matter how metadata is stored, as long as it is possible to get the data out of the database in any format appropriate for the occasion. The SRU/SRW protocol is particularly aimed at precisely this: getting data out of a database in the required format, like MARC, Dublin Core, or anything else. An SRU server is a piece of middleware that receives requests, gets the requested data, converts the data and then returns the data in the requested format.
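
    Conceptually, the conversion layer of such a middleware component boils down to a dispatch on the requested record schema. A much simplified sketch; the internal record and the two renderings are illustrative assumptions, not actual SRU server code:

    ```python
    def as_marc(rec: dict) -> dict:
        # Simplified MARC-flavoured rendering of the internal, normalised record.
        return {"100": {"a": "%s, %s" % (rec["surname"], rec["firstname"])},
                "245": {"a": rec["title"]}}

    def as_dublin_core(rec: dict) -> dict:
        # Simplified Dublin Core rendering of the same internal record.
        return {"dc:creator": "%s %s" % (rec["firstname"], rec["surname"]),
                "dc:title": rec["title"]}

    CONVERTERS = {"marcxml": as_marc, "dc": as_dublin_core}

    def sru_record(rec: dict, record_schema: str = "dc") -> dict:
        """Return the stored record in whatever schema the client asked for."""
        return CONVERTERS[record_schema](rec)
    ```

    The storage format never leaks out: adding support for another exchange format is just one more converter in the table.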

    Currently at the Library of the University of Amsterdam we are migrating our ILS, which also involves converting our data from one bibliographic metadata format (PICA+) to another (MARC). This is extremely complicated, especially because of the non-normalised structure of both formats. And I must say that in my opinion PICA+ is even the better one.
    Also all German and Austrian libraries are meant to migrate from the MAB format to MARC, which also seems to be a move away from a superior format.
    All because of the need to adhere to international standards, but with the wrong solution.

    Maybe the projected new standard for resource description and access RDA will be the solution, but that may take a while yet.

  • ReTweet @Reply – Twitter communities

    Posted on April 27th, 2009 Lukas Koster 1 comment


    In my post “Tweeting Libraries” among other things I described my Personal Twitter experience as opposed to Institutional Twitter use. Since then I have discovered some new developments in my own Twitter behaviour and some trends in Twitter at large: individual versus social.

    There have been some discussions on the web about the pros and cons and the benefits and dangers of social networking tools like Twitter, focusing on “noise” (uninteresting trivial announcements) versus “signal” (meaningful content), but also on the risk of web 2.0 being about digital feudalism, and being a possible vehicle for fascism (as argued by Andrew Keen).

    My kids say: “Twitter is for old people who think they’re cool”. According to them it’s nothing more than: “Just woke up; SEND”, “Having breakfast; SEND”, “Drinking coffee; SEND”, “Writing tweet; SEND”. For them Twitter is only about broadcasting trivialities, narcissistic exhibitionism, “noise”.
    For their own web communications they use chat (MSN/Messenger), SMS (mobile phone text messages), communities (Hyves, the Dutch counterpart of MySpace) and email. Basically I think young kids communicate online only within their groups of friends, with people they know.

    Just to get an overview: a tweet, or Twitter message, can basically be of three different types:

    • just plain messages, announcements
    • replies: reactions to tweets from others, characterised by the “@<twittername>” string
    • retweets: forwarding tweets from others, characterised by the letters “RT
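
    These conventions are simple enough to detect mechanically; a minimal sketch:

    ```python
    def tweet_type(text: str) -> str:
        """Classify a tweet as 'retweet', 'reply' or 'plain' by its conventions."""
        if text.startswith("RT "):
            return "retweet"
        if text.startswith("@"):
            return "reply"
        return "plain"
    ```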

    Although a lot of people use Twitter in the “exhibitionist” way, I don’t do that myself at all. If I look at my Twitter behaviour of the past weeks, I almost only see “retweets” and “replies”.

    Both “replies” and “retweets” obviously were not features of the original Twitter concept, they came into being because Twitter users needed conversation.
    A reply is becoming more and more a replacement for short emails or mobile phone text messages, at least for me. These Twitter replies are not “monologues”, but “dialogues”. If you don’t want everybody to read these, you can use a “Direct message” or “DM“.
    Retweets are used to forward interesting messages to the people who are following you, your “community” so to speak. No monologue, no dialogue, but sharing information with specific groups.
    The “@<twittername>” mechanism is also used to refer to another Twitter user in a tweet. In official Twitter terminology “replies” have been replaced by “mentions“.

    Retweets and replies are the building blocks of Twitter communities. My primary community consists of people and organisations related to libraries. I actually know only a small number of these people in person. Most of them I have never met. The advantage of Twitter here is obvious: I get to know more people who are active in my professional area, I stay informed and up to date, and I can discuss topics. This is all about “signal”. If issues are too big for Twitter (more than 140 characters) we can use our blogs.
    But it’s not only retweets and replies that make Twitter communities work. Trivialities (“noise”) are equally important. They make you get to know people and in this way help create relationships built on trust.

    Another compelling example of a very positive social use of Twitter came last week, when there were a number of very interesting Library 2.0 conferences, none of which I could attend in person because of our ILS project:

    All of these conferences were covered on Twitter by attendees using the hashtags #elag09, #csnr09 and #ugul09 . This phenomenon makes it possible for non-participants to follow all events and discussions at these conferences and even join in the discussions. Twitter at its best!

    Twitter is just a tool, a means to communicate in many different ways. It can be used for good and for bad, and of course what is “good” and what is “bad” is up to the individual to decide.

  • Replacing our ILS, business as usual

    Posted on April 24th, 2009 Lukas Koster 2 comments

    © Peter Morville

    As you may have noticed from some of my tweets, the Library of the University of Amsterdam, my place of work, is in the process of replacing its ILS (Integrated Library System). All in all this project, or rather these two projects (one selecting a new ILS, the other implementing it), will have taken 18 months or more from the decision to go ahead until STP (Switch to Production), planned for August 15 this year. My colleague Bert Zeeman blogged about this (in Dutch) recently.

    One thing that has become absolutely clear to me is that replacing an ILS is not just about replacing one information system with another. It is about replacing an entire organisational structure of work processes, with a huge impact on all people involved. And in our case it affects two organisations: besides the Library of the University of Amsterdam, also the Media Library of the Hogeschool van Amsterdam. We have been managing library systems for both organisations in a mini-consortial structure for a couple of years. So the Media Library is facing a second ILS replacement within two years.

    While the decision was made because of pressing technical reasons, also with an eye on preparing for future library 2.0 developments, it turned out to be of substantial consequence for the organisation.
    This is the first time that I am participating in such a radical library system project. I have done a couple of projects implementing and upgrading metasearching and OpenURL link resolver tools in the last six years, but these are nothing compared to the current project. With these “add-on” tools, that started as a means of extending the library’s primary stream of information, only a relatively limited number of people were involved. But with an ILS you are talking about the core business of a library (still!) and about day to day working life of everybody involved in acquisitions, cataloguing, circulation as well as system administrators and system librarians.

    To make it even more complicated, the University Library is also switching from the old system’s proprietary bibliographic format to MARC21, because that is what the new system is using. Personally I think that the old system’s format is better (just like our German colleagues think about their move from MAB to MARC), but of course the advantages of using an internationally accepted and used standard outweigh this, as always. Maybe food for another blog post later…

    Last but not least, the Library is simultaneously doing a project for the implementation of RFID for self check machines. The initial idea was to implement RFID in the old system and then just migrate everything to the new one. However, for various reasons, recently it was decided to postpone RFID implementation to shortly after our ILS STP. Some initial tests have shown that this probably will work.

    And while all this is going on, all normal work needs to be taken care of too: “business as usual”.

    Now, looking at workflows: the way that our individual departments have organised their workflows, is partly dictated by the way the old system is designed. The new system obviously dictates workflows too, but in other areas. Although this new system is very flexible and highly configurable, there are still some local requirements that cannot be met by the new system.
    Of course this is NOT the way it should be! Systems should enable us to do what we want and how we want it! Hopefully new developments like Ex Libris’ URM and the very recently announced new OCLC WorldCat web-based ILS will take better care of users.

    Talking about “very flexible and highly configurable”: although a very big advantage, this also makes it much more complicated and time consuming to implement the new system. Fortunately there are a lot of other libraries in The Netherlands and around the world using the new system that are willing to help us in every possible way. And this is highly appreciated!

    Other issues that make this project complicated:

    • unexpected issues, bottlenecks: these keep on coming
    • migration of data from old system: conversion of old to new format
    • implementing links with external systems like the student and staff databases, the financial system and the national union catalogue

    I think we will make STP on the planned date, but I also think we need to postpone a number of issues until after that. There will still be a lot of work to be done for my department after the project has finished.

    To end with a positive note: the new OPAC will be much nicer and more flexible than the old one. And in the end that is what we are doing this for: our patrons.

  • Developers meet developers, people meet people

    Posted on December 14th, 2008 Lukas Koster No comments

    Jerusalem map

    Last month I had the opportunity to participate in the first official Ex Libris “Developers meet developers” meeting in Jerusalem, November 12-13, 2008. The meeting was dedicated to the new Open Platform strategy that Ex Libris has adopted. I already mentioned this development in my post How open are open systems?. Together with one of the other attendees, Mark Dehmlow of Notre Dame University Library, I wrote a short report on this meeting in the IGeLU newsletter issue 2, 2008, pages 21-22.

    The intention of this event was to give representatives of Ex Libris customer institutions who are actively involved in developing plug-ins, add-ons and extensions to the Digital Library tools Aleph, SFX, MetaLib and Primo, and the Ex Libris staff involved in developing these tools, the chance to meet face to face and to talk, discuss and exchange ideas from both sides.

    The political, cultural and social circumstances of the location of the event (about which I blogged some personal thoughts here) are such that I can’t resist the temptation of using them as a metaphor, although I am fully aware that the actual situation in Jerusalem is of course much more complicated. I apologise in advance if I unintentionally offend anyone by using the serious real world situation in an inappropriate way.

    So, let’s give it a try: in Jerusalem there are a number of separate areas for different population groups. In general there are the Jewish western part and the Arab eastern part. But there is also the old city right in the middle, with Jewish, Arab, Christian and Armenian quarters. Besides that you can also see separate neighbourhoods within the Jewish part with different Jewish groups. And last but not least, right in the middle of the Christian quarter there is the Church of the Holy Sepulchre, with corners for almost all christian religious groups. Very fascinating and intriguing.

    Although there are no physical borders between these areas, the complicated political, social and cultural circumstances prevent most people from visiting their neighbours in their own areas. Now here comes the metaphor! In the world of information systems you normally have a similar situation of “us and them”. Customers and users often think that providers of systems do not take them seriously and give them tools they can’t work with, and the other way around system developers often see end users as nagging bores, never satisfied and complaining about everything.
    Customers and providers inhabit the same space, like Jerusalem, but do not cross the imaginary border to really meet.

    This is why it is so remarkable that the “crossing of the border” between Ex Libris customers and developers actually happened in Jerusalem. Of course I immediately must add that Ex Libris has always favoured open systems for customers to use in their own way, and supports the international user groups, but an actual face-to-face meeting on the level of developers is something different.

    From personal experience I know that it is very easy for situations to get out of hand if there is no real communication and no willingness for mutual understanding. That is why I think that it is absolutely vital that meetings like this can continue to take place. From the customers’ side the user groups IGeLU and ELUNA are fully dedicated to this goal, and I really hope that Ex Libris is also serious about it.

    In this month of Christmas, Chanuka and Eid Al-Adha, let me end with the wish for better understanding on the personal, professional and global level!

  • Antisocial Networking

    Posted on November 2nd, 2008 Lukas Koster 2 comments

    In his post “Twitter me this” Owen Stephens writes about differences in use and audience of Social Networking Sites. (Apparently at Imperial College London they had a similar kind of Web 2.0 Learning programme as we had at the Library of the University of Amsterdam.)
    Owen distinguishes audiences on several, intermixed levels (my interpretation): “young” (e.g. MySpace) vs. “old(er)” (e.g. Facebook); “business/networking” (e.g. LinkedIn) vs. “family and friends” (also Facebook); “professional” (e.g. Ning).

    And Owen mentions the risk involved here:

    I do find that Facebook raises the issue of how I mix my professional and personal life – whereas on LinkedIn everyone is on there as a ‘professional contact’ (even those people who are also friends), in Facebook I have some professional contacts, and some personal contacts. Although it hasn’t happened yet, there is clearly a risk that in the future there could be a conflict between how I want to present myself professionally, and how I do personally – I’m not sure I’d want my boss (not singling out my current boss) to be my ‘Friend’ on Facebook.

    I recognise these differences and risks as well. In The Netherlands the most popular social networking site is Hyves, which can be compared to MySpace (according to my interpretation of Owen’s classification), but without the music angle. I have an account there, with only 13 “friends”, but my kids have 100 or more.

    On LinkedIn however, I have 80 connections (a term used to stress that these contacts are to be regarded as serious business relations), of which 99% I have met face-to-face at least once, by the way. Owen says about LinkedIn:

    “I’ve got a LinkedIn account but I don’t tend to use it for ‘social networking’, and more really as a ‘contacts’ list – while some people clearly use LinkedIn to ‘work’ their business contacts, I can’t say that I’ve ever been terribly good at this.”

    I guess I am using LinkedIn the same way as Owen does. Last week I had a discussion with a colleague/friend (!) about the use of these business networking sites like LinkedIn. We concluded that a number of people obviously use LinkedIn to show off: “Look, I have more than 300 connections on my list; mine is bigger than yours“. I must confess that I have thoughts like that myself sometimes: “I hope that this colleague has noticed that I know that famous person“….

    Now these “serious” business networks are starting to offer more social features. LinkedIn has groups, forums and “LinkedIn Applications”: integrating web 2.0 stuff like Amazon reading list, Slideshare, WordPress. In fact, this very blog post will show up on my LinkedIn Profile.

    I guess there is a lot of competition, for instance with Plaxo. Besides “connections”, which can be marked “business”, “friends” or “family”, Plaxo offers the options of “hooking up feeds” from web 2.0 services that you use, like flickr, delicious, twitter, blogs, youtube, lastfm, etc. I find this a very useful feature, because it gives me an integrated overview of all my web2.0 streams, much like SecondBrain does, which has a slightly different “connection” implementation, more like Twitter, with “followers”.
    Plaxo lets you also synchronise connections with LinkedIn, but this is a “Premium service”, meaning it costs money.

    Now, to come back to Owen’s risk assessment: in my Plaxo profile I show my professional blog (this one, that you are reading right now) to “Everyone”, but my twitter, personal blog, flickr, delicious, picasa and lastfm streams only to “Friends” and “Family”, because I think I should not draw unnecessary attention to my twitter “trivia” (as Owen calls it), holiday snapshots and non-professional bookmarks. These streams are publicly available of course, but I do not want to actually push them in the faces of my “serious” connections.

    You might argue that this kind of behaviour is not “social“, but rather “antisocial“: certain groups of contacts are excluded from information that privileged groups do have access to. And this term could also be applied to the “showing off” behaviour that I mentioned above.

    The funny thing is that the “killer application” that won me over to Plaxo, and that I use the most, is not social at all: it’s something that I have been looking for since playing around with web 1.0 “Personal Information Managers”: the option of integrating and synchronising the Plaxo Calendar with my Outlook work calendar and my private Google calendar. For me this is a huge advantage over having to consult several calendars when planning an appointment.
    But I do not share my Plaxo Calendar at all. Would you call this antisocial behaviour too?