Posted on May 24th, 2009 11 comments
Will library buildings and library catalogs survive the web?
Some weeks ago a couple of issues appeared in the twitter/blogosphere (or at least MY twitter/blogosphere) related to the future of the library in this digital era.
- There was the Espresso book machine that prints books on demand on location, which led to questions like: “apart from influencing publishing and book shops, what does this mean for libraries?”.
- There was a Twitter discussion about “will we still need library buildings?”.
- There was another blog post about the future of library catalogs by Edwin Mijnsbergen (in Dutch) that asked the question of the value of library catalogs in relation to web2.0 and the new emerging semantic web.
This made me start thinking about a question that concerns us all: is there a future for the library as we know it?
To begin with, what is a library anyway?
For ages, since the beginning of history, up until some 15 years ago, a library was an institution characterised by:
- a physical collection of printed and handwritten material
- a physical location, a building, to store the collection
- a physical printed or handwritten on site catalog
- on location searching and finding of information sources using the catalog
- on site requesting, delivery, reading, lending and returning of material
- a staff of trained librarians to catalog the collection and assist patrons
The central concept here is of course the collection. That is the “raison d’être” of a library. The purpose of library building, catalog and librarians is to give people access to the collection, and provide them with the information they need.
Clearly, because of the physical nature of the collection and the information transmission process the library needed to be a building with collection and catalog inside it. People had to go there to find and get the publications they needed.
If collections and the transmission of information were completely digital, then the reason for a physical location to go to for finding and getting publications would not exist anymore. Currently one of these conditions has been met fully and the other one partly. The transmission of information can take place in a completely digital way. Most new scientific publications are born digital (e-Journals, e-Books), and a large number of digitisation projects are taking care of making digital copies of existing print material.
Searching for items in a library’s collection is already taking place remotely through OPACs and other online tools almost everywhere. A large part of these collections can be accessed digitally. Only when a patron wants to read or borrow a printed book or journal does he or she have to go to the library building to fetch it.
All this seems to lead to the conclusion that the library may be slowly moving away from a physical presence to a digital one.
But there is something else to be considered here, that reaches beyond the limits of one library. In my view the crucial notion here is again the collection.
In my post Collection 2.0 I argue that in this digital information age a library’s collection is everything a library has access to as opposed to the old concept of everything a library owns. This means in theory that every library could have access to the same digital objects of information available on the web, but also to each other’s print objects through ILL. There will be no physically limited collection only available in one library anymore, just one large global collection.
In this case, there is not only no need for people to go to a specific library for an item in its collection, but also there is no need to search for items using a specific library’s catalog.
Now you may say that people like going to a library building and browsing through the stacks. That may still be true for some, but in general, as I argue in my post “Open Stack 2.0”, the new Open Stack is the Web.
In the future there will be collections, but not physical ones (except of course for the existing ones with items that are not allowed to leave the library location). We will see virtual subject collections, determined by classifications and keywords assigned both by professionals and non-professionals.
On a parallel level there will be virtual catalogs, which are views on virtual collections defined by subjects on different levels and in different locations: global, local, subject-oriented, etc. These virtual collections and catalogs will be determined and maintained by a great number of different groups of people and institutions (commercial and non-commercial). One of these groups can still be a library. As Patrick Vanhoucke observed on Twitter (in Dutch): “We have to let go of the idea of the library as a building; the ‘library’ is the network of librarians”. These virtual groups of people may be identical to what are increasingly becoming known as “tribes”.
Having said all this, of course there will still be occurrences of libraries as buildings and as physical locations for collections. Institutions like the Library of Congress will not just vanish into thin air. Even if all print items have been digitised, print items will still be wanted for a number of reasons: research, art, among others. Libraries can have different functions, like archives, museums, etc. and still be named “libraries” too.
Library buildings can transform into other types of locations: in universities they can become meeting places and study facilities, including free wifi and Starbucks coffee. Public libraries can shift focus to becoming centres of discovery and (educational) gaming. Anything is possible.
It’s obvious that libraries obey the same laws of historical development as any other social institution or phenomenon. The way that information is found and processed is determined, or at least influenced, by the status of technological development. And I am not saying that all development is technology driven! This is not the place for a philosophy on history, economics and society.
Some historical parallels to illustrate the situation that libraries are facing:
- writing: inscribing clay tablets > scratching ink on paper > printing (multiplication, re-usability) > typewriter > computer/printer (digital multiplication and re-usability!) > digital only (computer files, blogs, e-journal, e-books)
- consumption of music: attending live performance on location > listening to radio broadcast > playing purchased recordings (vinyl, cassettes, cd, dvd) > make home recordings > play digital music with mp3/personal audio > listen to digital music online
From these examples it’s perfectly clear that new developments do not automatically make the old ways disappear! Prevailing practices can coexist with “outdated” ways of doing things. Libraries may still have a future.
In the end it comes down to these questions:
- Will libraries cease to exist, simply because they no longer serve the purpose of providing access to information?
- Are libraries engaged in a rear guard fight?
- Will libraries become tourist attractions?
- Will libraries adapt to the changing world and shift focus to serve other, related purposes?
- Are professional librarian skills useful in a digital information world?
I do not know what will happen with libraries. What do you think?
Posted on May 15th, 2009 22 comments
Why use a non-normalised metadata exchange format for suboptimal data storage?
This week I had a nice chat with André Keyzer of Groningen University Library and Peter van Boheemen of Wageningen University Library, who attended OCLC’s Amsterdam Mashathon 2009. As can be expected from library technology geeks, we got talking about bibliographic metadata formats, very exciting of course. The question came up: what on earth could be the reason for storing bibliographic metadata in exchange formats like MARC?
Exactly my idea! As a matter of fact I think I may have used the same words a couple of times in recent years, probably even at ELAG2008. The thing is: it really does not matter how you store bibliographic metadata in your database, as long as you can present and exchange the data in any format requested, be it MARC or Dublin Core or anything else.
Of course the importance of using internationally accepted standards is beyond doubt, but there clearly exists widespread misunderstanding of the functions of certain standards, like for instance MARC. MARC is NOT a data storage format. In my opinion MARC is not even an exchange format, but merely a presentation format.
With a background and experience in data modeling, database and systems design (among others), I was quite amazed by bibliographic metadata formats when I started working with library systems in libraries, not having had any librarian training at all. Of course, MARC (“MAchine Readable Cataloging record”) was invented as a standard in order to facilitate exchange of library catalog records in a digital era.
But I think MARC was invented by old school cataloguers who did not have a clue about data normalisation at all. A MARC record, especially if it corresponds to an official set of cataloging rules like AACR2, is nothing more than a digitised printed catalog card.
In pre-computer times it made perfect sense to have a standardised uniform way of registering bibliographic metadata on a printed card in this way. The catalog card was simultaneously used as a medium for presenting AND storing metadata. This is where the confusion originates from!
But when the Library of Congress says “If a library were to develop a “home-grown” system that did not use MARC records, it would not be taking advantage of an industry-wide standard whose primary purpose is to foster communication of information” it is saying just plain nonsense.
Actually it is better NOT to use something like MARC for other purposes than exchanging, or better, presenting data. To illustrate this I will give two examples of MARC tags that have been annoying me since my first day as a library employee:
- 100 – Main Entry-Personal Name (NR) – subfield $a – Personal name (NR)
- 773 – Host Item Entry (R) – subfield $g – Relationship information (R)
100 – Main Entry-Personal Name
Besides storing an author’s name as a string in each individual bibliographic record instead of using a code, linking to a central authority table (“foreign key” in relational database terms), it is also a mistake to use a person’s name as one complete string in one field. Examples on the Library of Congress MARC website use forms like “Adams, Henry”, “Fowler, T. M.” and “Blackbeard, Author of”. To take only the simple first example, this author could also be registered as “Henry Adams”, “Adams, H.”, “H. Adams”. And don’t say that these forms are not according to the rules! They are out there! There is no way to match these variations as being actually one and the same.
In a normalised relational database, this subfield $a would be stored something like this (simplified!):
- Surname=Adams
- First name=Henry
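The idea of a central authority table referenced by a key can be sketched roughly as follows. This is only an illustration using SQLite from Python: the table names, column names and record titles are hypothetical and not taken from any library system. The name parts are stored once and the various presentation forms are generated from them.

```python
import sqlite3

# Hypothetical minimal schema: each person is stored once, and every
# bibliographic record refers to that person by key instead of by name string.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        surname    TEXT NOT NULL,
        first_name TEXT
    );
    CREATE TABLE bib_record (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER REFERENCES person(id)
    );
""")
conn.execute("INSERT INTO person (id, surname, first_name) VALUES (1, 'Adams', 'Henry')")
conn.executemany(
    "INSERT INTO bib_record (title, author_id) VALUES (?, ?)",
    [("First example title", 1), ("Second example title", 1)],  # placeholder titles
)


def display_name(surname, first_name, style="inverted"):
    """Build a presentation form of the name from the stored parts."""
    if style == "direct":
        return f"{first_name} {surname}"
    if style == "initial":
        return f"{surname}, {first_name[0]}."
    return f"{surname}, {first_name}"


# Both records resolve to the same person row through the foreign key.
rows = conn.execute(
    "SELECT p.surname, p.first_name "
    "FROM bib_record b JOIN person p ON p.id = b.author_id"
).fetchall()

# All display variants are derived from one stored person, not stored themselves.
author_forms = sorted(
    {display_name(s, f, style) for s, f in rows for style in ("inverted", "direct", "initial")}
)
print(author_forms)
```

With this kind of structure, “Adams, Henry”, “Henry Adams” and “Adams, H.” are just different views on one stored person, so there is nothing to match afterwards.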
773 – Host Item Entry
Subfield $g of this MARC tag is used for storing citation information for a journal article, volume, issue, year, start page, end page, all in one string, like: “Vol. 2, no. 2 (Feb. 1976), p. 195-230”. Again I have seen this used in many different ways. In a normalised format this would look something like this, using only the actual values:
- Volume=2
- Issue=2
- Year=1976
- Month=2
- Start page=195
- End page=230
In a presentation of this normalised data record extra text can be added like “Vol.” or “Volume”, “Issue” or “No.”, brackets, replacing codes by descriptions (Month 2 = Feb.) etc., according to the format required. So the stored values could be used to generate the text “Vol. 2, no. 2 (Feb. 1976), p. 195-230” on the fly, but also for instance “Volume 2, Issue 2, dated February 1976, pages 195-230”.
The strange thing with this bibliographic format aimed at exchanging metadata is that it actually makes metadata exchange terribly complicated, especially with these two tags Author and Host Item. I can illustrate this by describing the way this exchange is handled between two digital library tools we use at the Library of the University of Amsterdam, MetaLib and SFX, both from the same vendor, Ex Libris.
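A minimal sketch of this kind of on the fly presentation, in Python. The class and function names are made up for the illustration; the point is only that the stored record holds bare values and each presentation format adds its own labels and punctuation.

```python
from dataclasses import dataclass

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


@dataclass
class Citation:
    """Normalised citation: only the actual values, no labels or punctuation."""
    volume: int
    issue: int
    year: int
    month: int  # stored as a code, 1-12
    start_page: int
    end_page: int


def format_short(c):
    """Abbreviated form, e.g. for a catalog-card style display."""
    month = MONTH_NAMES[c.month - 1][:3] + "."
    return f"Vol. {c.volume}, no. {c.issue} ({month} {c.year}), p. {c.start_page}-{c.end_page}"


def format_long(c):
    """Spelled-out form generated from the same stored values."""
    month = MONTH_NAMES[c.month - 1]
    return (
        f"Volume {c.volume}, Issue {c.issue}, dated {month} {c.year}, "
        f"pages {c.start_page}-{c.end_page}"
    )


citation = Citation(volume=2, issue=2, year=1976, month=2, start_page=195, end_page=230)
print(format_short(citation))  # Vol. 2, no. 2 (Feb. 1976), p. 195-230
print(format_long(citation))   # Volume 2, Issue 2, dated February 1976, pages 195-230
```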
The metasearch tool MetaLib is using the described and preferred mechanism of on the fly conversion of received external metadata from any format to MARC for the purpose of presentation.
But if we want to use the retrieved record to link to for instance a full text article using the SFX link resolver, the generated MARC data is used as a source and the non-normalised data in the 100 and 773 MARC tags has to be converted to the OpenURL format, which is actually normalised (example in simple OpenURL 0.1):
isbn=;issn=0927-3255;date=1976; volume=2;issue=2;spage=195;epage=230; aulast=Adams;aufirst=Henry;auinit=;
In order to do this all kinds of regular expressions and scripting functions are needed to extract the correct values from the MARC author and citation strings. Wouldn’t it be convenient if the record in MetaLib had already been in OpenURL or any other normalised format?
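To give an impression of what such extraction involves, here is a rough Python sketch. It is not the actual MetaLib or SFX code; the pattern and function names are hypothetical, and the regular expression only covers the one citation form used as an example above.

```python
import re

# Matches only the form "Vol. 2, no. 2 (Feb. 1976), p. 195-230".
# Every other way of writing a citation would need its own pattern.
CITATION_RE = re.compile(
    r"Vol\.\s*(?P<volume>\d+),\s*no\.\s*(?P<issue>\d+)\s*"
    r"\((?:(?P<month>[A-Za-z]+)\.?\s+)?(?P<date>\d{4})\),\s*"
    r"p\.\s*(?P<spage>\d+)-(?P<epage>\d+)"
)


def parse_citation(text):
    """Extract OpenURL-style values from a 773 $g string, or {} if no match."""
    m = CITATION_RE.search(text)
    if m is None:
        return {}
    return {k: m.group(k) for k in ("date", "volume", "issue", "spage", "epage")}


def parse_author(text):
    """Split a 100 $a string of the form 'Surname, First name'."""
    last, _, first = text.partition(",")
    return {"aulast": last.strip(), "aufirst": first.strip()}


def to_openurl(author_100a, citation_773g, issn=""):
    """Combine the extracted values into a key=value string."""
    fields = {"issn": issn, **parse_citation(citation_773g), **parse_author(author_100a)}
    return ";".join(f"{k}={v}" for k, v in fields.items())


print(to_openurl("Adams, Henry", "Vol. 2, no. 2 (Feb. 1976), p. 195-230", issn="0927-3255"))
print(parse_citation("2(2), 195-230 (1976)"))  # a different notation: nothing is recognised
print(parse_author("Henry Adams"))             # no comma: the whole name ends up in aulast
```

The last two calls show the weakness: as soon as the string deviates from the expected form, the values are lost or end up in the wrong field.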
The point I am trying to make is of course that it does not matter how metadata is stored, as long as it is possible to get the data out of the database in any format appropriate for the occasion. The SRU/SRW protocol is particularly aimed at precisely this: getting data out of a database in the required format, like MARC, Dublin Core, or anything else. An SRU server is a piece of middleware that receives requests, gets the requested data, converts the data and then returns the data in the requested format.
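As an illustration of the client side of this, the following Python sketch builds two SRU searchRetrieve requests that differ only in the requested record schema. The base URL is a made-up placeholder, and the schema short names “marcxml” and “dc” are assumptions: which schemas and index names a given server supports depends on that server.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint, for illustration only.
SRU_BASE_URL = "https://sru.example.org/catalog"


def sru_search_url(cql_query, record_schema, max_records=10):
    """Build an SRU searchRetrieve URL asking for records in a given schema."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,             # CQL query string
        "recordSchema": record_schema,  # the format the client wants back
        "maximumRecords": max_records,
    }
    return f"{SRU_BASE_URL}?{urlencode(params)}"


# Same query, same database, two different output formats.
marc_url = sru_search_url('dc.creator = "Adams"', "marcxml")
dc_url = sru_search_url('dc.creator = "Adams"', "dc")
print(marc_url)
print(dc_url)
```

How the records are stored behind the server does not show up in the request at all; only the requested output format does.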
Currently at the Library of the University of Amsterdam we are migrating our ILS which also involves converting our data from one bibliographic metadata format (PICA+) to another (MARC). This is extremely complicated, especially because of the non-normalised structure of both formats. And I must say that in my opinion PICA+ is even the better one.
Also all German and Austrian libraries are meant to migrate from the MAB format to MARC, which also seems to be a move away from a superior format.
All because of the need to adhere to international standards, but with the wrong solution.
Maybe the projected new standard for resource description and access RDA will be the solution, but that may take a while yet.