Posted on October 16th, 2009 3 comments
Metasearch vs. harvesting & indexing
The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet“, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.
A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal highly depends on the availability of the remote cataloging systems.
I also took the opportunity to explain to my audience the “issues” inherent in the concept of metasearch (or “federated search”, “distributed search”, etc.), and to compare that to the harvesting & indexing scenario.
Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.
The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad and H&I is good; that would be too easy. Some five years ago Metasearch was the best we had: it was tremendous progress beyond searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the open source tools VuFind, SUMMA and Meresco, to name a few.
Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case“, used in logistics and inventory management.
With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
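The contrast can be sketched in a few lines of Python. This is only a toy illustration, not how MetaLib or any discovery tool actually works: the catalogues are hypothetical in-memory lists, and `RemoteCatalogue`, `metasearch` and `HarvestedIndex` are names invented for this example.

```python
class RemoteCatalogue:
    """Hypothetical remote library catalogue (stands in for a real target)."""

    def __init__(self, name, records, available=True):
        self.name = name
        self.records = list(records)
        self.available = available

    def search(self, term):
        # A real target would be queried over the network and may time out.
        if not self.available:
            raise ConnectionError(f"{self.name} is not responding")
        return [r for r in self.records if term.lower() in r.lower()]


def metasearch(catalogues, term):
    """Just in time: query every remote catalogue at request time."""
    hits, failed = [], []
    for cat in catalogues:
        try:
            hits.extend(cat.search(term))
        except ConnectionError:
            failed.append(cat.name)  # results from this target are missing
    return hits, failed


class HarvestedIndex:
    """Just in case: copy all records into one local index beforehand."""

    def __init__(self):
        self.records = []

    def harvest(self, catalogues):
        self.records = []
        for cat in catalogues:
            if cat.available:
                self.records.extend(cat.records)

    def search(self, term):
        # No remote calls here, but records added after the harvest are missed.
        return [r for r in self.records if term.lower() in r.lower()]


a = RemoteCatalogue("Library A", ["Cloud Atlas", "Moby Dick"])
b = RemoteCatalogue("Library B", ["Atlas of Amsterdam"])

index = HarvestedIndex()
index.harvest([a, b])

b.available = False                 # a remote system goes down
a.records.append("Atlas Shrugged")  # a new record arrives after the harvest

live_hits, failed = metasearch([a, b], "atlas")
indexed_hits = index.search("atlas")
```

In this sketch the metasearch finds the newest record but loses everything from the catalogue that is down, while the harvested index still answers for both catalogues but does not know the newest record yet.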
Objectively of course, H&I can solve the problems inherent in Metasearch, and therefore is a superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested because of technical, legal or commercial reasons.
In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.
But I also believe that the real power of H&I can only be taken advantage of if institutions cooperate and maintain shared central indexes, instead of each building their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.
We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ PrimoCentral, SerialSolutions’ Summon and EBSCOhost Integrated Search.
The funny thing is, that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.
Posted on October 6th, 2009 1 comment
What will library staff do 5 years from now?
I attended the IGeLU 2009 annual conference in Helsinki September 6-9. IGeLU is the International Group of Ex Libris Users, an independent organisation that represents Ex Libris customers. Just to state my position clearly I would like to add that I am a member of the IGeLU Steering Committee.
These annual user group meetings typically have three types of sessions: internal organisational sessions (product working groups and steering committee business meetings, elections), Ex Libris sessions (product updates, Q&A, strategic visions), and customer sessions (presentations of local solutions, addons, developments).
Not surprisingly, the main overall theme of this conference was the future of library systems and libraries. The word that characterises the conference best in my mind (besides “next generation” and “metaphor”) is “roadmap”. All Ex Libris products but also all attending libraries are on their way to something new, which strangely enough is still largely uncertain.
Ex Libris presented the latest state of design and development of their URM (Unified Resource Management) project, ‘A New Model for Next-generation Library Services’. In the final URM environment all back end functionality of all current Ex Libris products will be integrated into one big modular system, implemented in a SaaS (“Software as a Service“) architecture. In the Ex Libris vision the front end to this model will be their Primo Indexing and Discovery interface, but all URM modules will have open API’s to enable using them with other tools.
The goal of this roadmap apparently is efficiency in the areas of technical and functional system administration for libraries.
In the meantime, development of existing products is geared towards final inclusion in URM. All future upgrades will result in what I would like to call “intermediate” instead of “next generation” products. MetaLib, the metasearch or federated search tool, will be replaced by MetaLib Next Generation, with a re-designed metasearch engine and a Primo front end. The digital collection management tool DigiTool will be merged into its new and bigger nephew Rosetta, the digital preservation system. The database of the OpenURL resolver SFX will be restructured to accommodate the URM data model. The next version of Verde (electronic resource management) will effectively be URM version 1, which will also be usable as an alternative for both ILSs, Voyager and Aleph.
Here we see a kind of “intermediate” roadmap to different “base camps” from where the travelers can try to reach their final destination.
From the perspective of library staff we see another panorama appearing.
In one of the customer presentations Janet Lute of Princeton University Library, one of the three (now four) URM development partners, mentioned a couple of “holy cows”, or library tasks they might consider dropping while on their way to the new horizon:
- managing prediction patterns for journal issues
- checking in print serials
- maintaining lots of circulation matrices and policies
- collecting fines
- cataloging over 80% of bibliographic records
I would like to add my own holy cow, MARC, to this list; I have written about it in a previous post, “Who needs MARC?”. (Some other developments in this area are self service, approval plans, shared cataloging, digitisation, etc.)
This roadmap is supposed to lead to more efficient work and less pressure for acquisitions, cataloging and circulation staff.
Eldorado or Brave New World?
To summarise: we see a sketchy roadmap leading us via all kinds of optional intermediate stations to an as yet still vague and unclear Eldorado of scholarly information disclosure and discovery.
The majority of public and professional attention is focused on discovery: modern web 2.0 front ends to library collections, and the benefits for the libraries’ end users. But it is probably even more important to look at the other side, disclosure: the library back end, and the consequences of all these developments for library staff, both technically oriented system administrators and professionally oriented librarians.
Future efficient integrated and modular library systems will no doubt eliminate a lot of tasks performed by library staff, but does this mean there will be no more library jobs?
Will the university library of the future be “sparsely staffed, highly decentralized, and have a physical plant consisting of little more than special collections and study areas“, as was stated recently in an article in “Inside Higher Education”? I mentioned similar options in “No future for libraries?“.
Personally I expect that the two far ends of the library jobs spectrum will merge into a single generic job type which we can truly call “system librarian“, as I stated in my post “System librarians 2.0“. But what will these professionals do? Will they catalog? Will they configure systems? Will they serve the public? Will they develop system add-ons?
This largely depends on how the new integrated systems will be designed and implemented, how systems and databases from different vendors and providers will be able to interact, how much libraries/information management organisations will outsource and crowdsource, how much library staff is prepared to rethink existing workflows, how much libraries want to distinguish themselves from other organisations, how much end users are interested in differences between information management organisations; in brief: how much these new platforms will allow us to do ourselves.
We have to come up with a realistic image of ourselves for the next couple of decades soon; otherwise our publishers and system vendors will be doing it for us.
Posted on June 19th, 2009 8 comments
Linked Data and bibliographic metadata models
Some time after I wrote “UMR – Unified Metadata Resources“, I came across Chris Keene’s post “Linked data & RDF : draft notes for comment“, “just a list of links and notes” about Linked Data, RDF and the Semantic Web, put together to start collecting information about “a topic that will greatly impact on the Library / Information management world“.
While reading this post and working my way through the links on that page, I started realising that Linked Data is exactly what I tried to describe as One single web page as the single identifier of every book, author or subject. I did mention Semantic Web, URI’s and RDF, but the term “Linked Data” as a separate protocol had escaped me.
The concept of Linked Data was described by Tim Berners-Lee, the inventor of the World Wide Web. Whereas the World Wide Web links documents (pages, files, images), which are basically resources about things (“Information Resources” in Semantic Web terms), Linked Data (or the Semantic Web) links raw data and real life things (“Non-Information Resources”).
There are several definitions of Linked Data on the web, but here is my attempt to give a simple definition of it (loosely based on the definition in Structured Dynamics’ Linked Data FAQ):
Linked Data is a methodology for providing relationships between things (data, concepts and documents) anywhere on the web, using URI’s for identifying, RDF for describing and HTTP for publishing these things and relationships, in a way that they can be interpreted and used by humans and software.
I will try to illustrate the different aspects using some examples from the library world. The article is rather long because of the nature of the subject, although the individual sections are a bit short. I do supply a lot of links for further reading.
Data is relationships
The important thing is that “data is relationships”, as Tim Berners-Lee says in his recent presentation for TED.
Before going into relationships between things, I have to point out the important distinction between abstract concepts and real life things, which are “manifestations” of the concepts. In Object modeling these are called “classes” (abstract concepts, types of things) and “objects” (real life things, or “instances” of “classes“).
- the class book can have the instances/objects “Cloud Atlas“, “Moby Dick“, etc.
- the class person can have the instances/objects “David Mitchell“, “Herman Melville“, etc.
In the Semantic Web/RDF model the concept of triples is used to describe a relationship between two things: subject – predicate – object, meaning: a thing has a relation to another thing, in the broadest sense:
- a book (subject) is written by (predicate) a person (object)
You can also reverse this relationship:
- a person (subject) is the author of (predicate) a book (object)
The person in question is only an author because of his or her relationship to the book. The same person can also be a mother of three children, an employee of a library, and a speaker at a conference.
Moreover, and this is important: there can be more than one relationship between the same two classes or types of things. A book (subject) can also be about (predicate) a person (object). In this case the person is a “subject” of the book, that can be described by a “keyword”, “subject heading”, or whatever term is used. A special case would be a book, written by someone about himself (an autobiography).
The problem with most legacy systems, and library catalogues as an example of these, is that a record for let’s say a book contains one or more fields for the author (or at best a link to an entry in an authority file or thesaurus), and separately one or more fields for subjects. This way it is not possible to see books written by an author and books about the same author in one view, without using all kinds of workarounds, link resolvers or mash-ups.
Using two different relationships that link to the same thing would provide for an actual view or representation of the real world situation.
Another important option of Linked Data/RDF: a certain thing can have as a property a link to a concept (or “class”), describing the nature of the thing: “object Cloud Atlas” has type “book”; “object David Mitchell” has type “person”; “object Cloud Atlas” is written by “object David Mitchell”.
And of course, the property/relationship/predicate can also link to a concept describing the nature of the link.
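To make the triple idea a bit more tangible, here is a minimal Python sketch. It uses plain tuples instead of real RDF, the predicate names are my own labels rather than terms from any vocabulary, and the second book title is made up for the example.

```python
# Each triple is (subject, predicate, object); all names are illustrative.
triples = [
    ("Cloud Atlas", "type", "book"),
    ("David Mitchell", "type", "person"),
    ("Cloud Atlas", "written by", "David Mitchell"),
    ("A Life of David Mitchell", "type", "book"),  # hypothetical title
    ("A Life of David Mitchell", "is about", "David Mitchell"),
]


def subjects(predicate, obj):
    """Return all subjects that have the given relationship to obj."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def related_to(thing):
    """Everything that points at `thing`, grouped by relationship."""
    view = {}
    for s, p, o in triples:
        if o == thing:
            view.setdefault(p, []).append(s)
    return view


books_by = subjects("written by", "David Mitchell")
books_about = subjects("is about", "David Mitchell")
one_view = related_to("David Mitchell")
```

Because both relationships point at the same thing, `one_view` shows books by and books about the author together, which is the single view that separate author and subject fields cannot easily give.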
Anywhere on the web
So far so good. But you may argue that this relationship theory is not very new. Absolutely right, but up until now this data-relationship concept has mainly been used with a view to the inside, focused on the area of the specific information system in question, because of the nature and the limitations of the available technology and infrastructure.
The “triple” model is of course exactly the same as the long standing methodology of Entity Relationship Diagrams (ERD), with which relationships between entities (= “classes”) are described. An ERD is typically used to generate a database that contains data in a specific information system. But ERD’s could just as well be used to describe Linked Data on the web.
Information systems, such as library catalogs, have been, and still are, for the greatest part closed containers of data, or “silos” without connections between them, as Tim Berners-Lee also mentions in his TED presentation.
Lots of these silo systems are accessible with web interfaces, but this does not mean that items in these closed systems with dedicated web front ends can be linked to items in other databases or web pages. Of course these systems can have API‘s that allow system developers to create scripts to get related information from other systems and incorporate that external information in the search results of the calling system. This is what is being done in web 2.0 with so-called mash-ups.
But in this situation you need developers who know how to make scripts using specific scripting languages for all the different proprietary API’s that are being supported for all the individual systems.
If Linked Data were a global standard and all open and closed systems and websites supported RDF, then all these links would be available automatically to RDF enabled browsers and client software, using SPARQL, the RDF Query Language.
- Linked Data/RDF can be regarded as a universal API.
The good thing about Linked Data is that it is possible to use Linked Data mechanisms to link to legacy data in silo databases. You just need to provide an RDF wrapper for the legacy system, as has been done with the Library of Congress Subject Headings.
Some examples of available tools for exposing legacy data as RDF:
- Triplify – a web applications plugin that converts relational database structures into RDF triples
- D2R Server – a tool for publishing relational databases on the Semantic Web
- wp-RDFa – a wordpress plugin that adds some RDF information about Author and Title to WordPress blog posts
Of course, RDF that is generated like this will very probably only expose objects to link TO, not links to RDF objects external to the system.
Also, Linked Data can be used within legacy systems, for mixing legacy and RDF data, open and closed access data, etc. In this case we have RDF triples that have a subject URI from one data source and an object URI from another data source. In a situation with interlinked systems it would for instance be possible to see that the author of a specific book (data from a library catalog) is also speaking at a specific conference (data from a conference website). Objects linked together on the web using RDF triples are also known as an “RDF graph”. With RDF-aware client software it is possible to navigate through all the links to retrieve additional information about an object.
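The following Python sketch shows the idea of navigating such a graph. It is only an assumption-laden illustration: the URIs on example.org and example.net, the predicate names and the `reachable` helper are all invented, and a real client would fetch and parse RDF instead of using hard-coded tuples.

```python
from collections import deque

# Hypothetical triples from two separate data sources.
catalog = [
    ("http://catalog.example.org/book/cloud-atlas", "writtenBy",
     "http://catalog.example.org/person/david-mitchell"),
]
conference = [
    ("http://catalog.example.org/person/david-mitchell", "speaksAt",
     "http://conference.example.net/event/2009"),
]

# Merging sources is just combining their triples into one graph.
graph = catalog + conference


def reachable(start):
    """Follow links outward from `start` and return every triple visited."""
    seen, found, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node:
                found.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return found


path = reachable("http://catalog.example.org/book/cloud-atlas")
predicates = [p for _, p, _ in path]
```

Starting from the book, the walk first reaches the author (catalog data) and then the conference (conference data), because both sources use the same URI for the person.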
URI’s (“Uniform Resource Identifiers”) are necessary for uniquely identifying and linking to resources on the web. A URI is basically a string that identifies a thing or resource on the web. All “Information Resources”, or WWW pages, documents, etc. have a URI, which is commonly known as a URL (Uniform Resource Locator).
With Linked Data we are looking at identifying “Non-information Resources” or “real world objects” (people, concepts, things, even imaginary things), not web pages that contain information about these real world objects. But it is a little more complicated than that. In order to honour the requirement that a thing and its relations can be interpreted and used by humans and software, we need at least 3 different representations of one resource (see: How to publish Linked Data on the web):
- Resource identifier URI (identifies the real world object, the concept, as such)
- RDF document URI (a document readable for semantic web applications, containing the real world object’s RDF data and relationships with other objects)
- HTML document URI (a document readable for humans, with information about the real world object)
For instance, there could be a Resource Identifier URI for a book called “Cloud Atlas“. The web resource at that URI can redirect an RDF enabled browser to the RDF document URI, which contains RDF data describing the book and its properties and relationships. A normal HTML web browser would be redirected to the HTML document URI, for instance a web page about the book at the publisher’s website.
There are several methods of redirecting browsers and applications to the required representation of the resource. See Cool URIs for the Semantic Web for technical details.
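One of these methods combines an HTTP 303 (“See Other”) redirect with content negotiation on the Accept header. The Python sketch below shows only the decision logic, under loud assumptions: the example.org URIs and the `redirect_for` function are hypothetical, and the header parsing is deliberately simplified (no q-values, no wildcards).

```python
# Hypothetical URIs for one resource; the URI layout is an assumption.
RESOURCE = "http://example.org/resource/cloud-atlas"
REPRESENTATIONS = {
    "application/rdf+xml": "http://example.org/data/cloud-atlas.rdf",
    "text/html": "http://example.org/page/cloud-atlas.html",
}


def redirect_for(accept_header):
    """Return (status, location) for a request on the resource URI."""
    # Simplified parsing: take the first media type we can serve.
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type]
    # Fall back to the human-readable page.
    return 303, REPRESENTATIONS["text/html"]


rdf_client = redirect_for("application/rdf+xml")
web_browser = redirect_for("text/html,application/xhtml+xml;q=0.9")
```

An RDF-aware client ends up at the RDF document and a normal browser at the HTML page, while both started from the same resource identifier URI.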
There are also RDF enabled browsers that transform RDF into web pages readable by humans, like the FireFox addon “Tabulator“, or the web based Disco and Marbles browsers, both hosted at the Free University Berlin.
RDF, vocabularies, ontologies
RDF or Resource Description Framework, is, like the name suggests, just a framework. It uses XML (or a simpler non-XML method N3) to describe resources by means of relationships. RDF can be implemented in vocabularies or ontologies, which are sets of RDF classes describing objects and relationships for a given field.
Basically, anybody can create an RDF vocabulary by publishing an RDF document defining the classes and properties of the vocabulary, at a URI on the web. The vocabulary can then be used in a resource by referring to the namespace (the URI) and the classes in that RDF document.
A nice and useful feature of RDF is that more than one vocabulary can be mixed and used in one resource.
Also, a vocabulary itself can reference other vocabularies and thereby inherit well established classes and properties from other RDF documents.
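Mixing vocabularies works because every property name is really a URI: a short prefix stands for the namespace of a vocabulary. The Python sketch below expands prefixed names to full URIs. The three namespace URIs are the published ones for Dublin Core, RDFS and FOAF; the `expand` helper and the example description are my own illustration.

```python
# Prefix -> namespace URI of the vocabulary.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(qname):
    """Turn a prefixed name such as 'dc:title' into a full property URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


# One resource description that uses terms from two vocabularies.
description = {
    expand("dc:title"): "Cloud Atlas: A Novel",
    expand("rdfs:label"): "Cloud Atlas: A Novel",
}
```

Since the expanded names are globally unique, terms from different vocabularies cannot clash even when their short names are the same.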
Another very useful feature of RDF is that objects can be linked to similar object resources describing the same real world thing. This way confusion about which object we are talking about, can be avoided.
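In practice this kind of link is often expressed with the `owl:sameAs` property from the OWL vocabulary. The sketch below, with invented URIs and a hypothetical `aliases` helper, shows how a client could collect all identifiers that refer to the same real world thing by following such links in both directions.

```python
# Hypothetical sameAs links between URIs for the same author.
same_as = [
    ("http://catalog.example.org/person/david-mitchell",
     "http://other.example.net/author/mitchell-d"),
    ("http://other.example.net/author/mitchell-d",
     "http://third.example.com/people/42"),
]


def aliases(uri):
    """All URIs connected to `uri` through sameAs links, in either direction."""
    group, todo = {uri}, [uri]
    while todo:
        current = todo.pop()
        for a, b in same_as:
            for x, y in ((a, b), (b, a)):
                if x == current and y not in group:
                    group.add(y)
                    todo.append(y)
    return group


all_ids = aliases("http://third.example.com/people/42")
```

Here all three URIs end up in one group, so data published under any of them can be attached to the same person.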
A couple of existing and well used RDF vocabularies/ontologies:
- RDF – the base RDF vocabulary
- RDFS (for RDF Schema)
- DC (for Dublin Core)
- FOAF (for Friend of a Friend) – online identities and social networks
- SKOS (for Simple Knowledge Organization System) – thesauri, classification schemes, subject heading systems and taxonomies
- OWL (for Web Ontology Language)
(By the way, the links in the first column (to the RDF files themselves) may act as an illustration of the redirection mechanism described before. Some of them may link to either the RDF file with the vocabulary definition itself, or to a page about the vocabulary, depending on the type of browser you use: rdf-aware or not.)
A special case is:
- RDFa – a sort of microformat without a vocabulary of its own, which relies on other vocabularies for turning XHTML page attributes into RDF
Fragments of an RDF document about the book “Cloud Atlas” (from the RDF Book Mashup):
<?xml version="1.0" encoding="UTF-8" ?>
<dc:publisher>Random House Trade Paperbacks</dc:publisher>
<dc:title>Cloud Atlas: A Novel</dc:title>
<rdfs:label>Cloud Atlas: A Novel</rdfs:label>
<rdfs:label>RDF document about the book: Cloud Atlas: A Novel</rdfs:label>
<rdfs:label>Review number 1 about: Cloud Atlas: A Novel</rdfs:label>
<rdfs:label>RDF Book Mashup</rdfs:label>
A partial view on this RDF file with the Marbles browser:
It seems obvious that Linked Data can be very useful in providing a generic infrastructure for linking data, metadata and objects, available in numerous types of data stores, in the online library world. With such a networked online data structure, it would be fairly easy to create all kinds of discovery interfaces for bibliographic data and objects. Moreover, it would also be possible to link to non-bibliographic data that might interest the users of these interfaces.
A brief and incomplete list of some library related Linked Data projects, some of which already mentioned above:
- RDF BookMashup – Integration of Web 2.0 data sources like Amazon, Google or Yahoo into the Semantic Web.
- Library of Congress Authorities – Exposing LoC Authorities and Vocabularies to the web using URI’s
- DBPedia – Exposing structured data from WikiPedia to the web
- LIBRIS – Linked Data interface to Swedish LIBRIS Union catalog
- Scriblio+Wordpress+Triplify – “A social, semantic OPAC Union Catalogue”
And what about MARC, AACR2 and RDA? Is there a role for them in the Linked Data environment? RDA is supposed to be the successor of AACR2 as a content standard that can be used with MARC, but also with other encoding standards like MODS or Dublin Core.
The RDA Entity Relationship Diagram, that incorporates FRBR as well, can of course easily be implemented as an RDF vocabulary, that could be used to create a universal Linked Data library network. It really does not matter what kind of internal data format the connected systems use.
Posted on May 24th, 2009 11 comments
Will library buildings and library catalogs survive the web?
Some weeks ago a couple of issues appeared in the twitter/blogosphere (or at least MY twitter/blogosphere) related to the future of the library in this digital era.
- There was the Espresso book machine that prints books on demand on location, which led to questions like: “apart from influencing publishing and book shops, what does this mean for libraries?“.
- There was a Twitter discussion about “will we still need library buildings?“.
- There was another blog post about the future of library catalogs by Edwin Mijnsbergen (in Dutch) that asked the question of the value of library catalogs in relation to web2.0 and the new emerging semantic web.
This made me start thinking about a question that concerns us all: is there a future for the library as we know it?
To begin with, what is a library anyway?
For ages, since the beginning of history, up until some 15 years ago, a library was an institution characterised by:
- a physical collection of printed and handwritten material
- a physical location, a building, to store the collection
- a physical printed or handwritten on site catalog
- on location searching and finding of information sources using the catalog
- on site requesting, delivery, reading, lending and returning of material
- a staff of trained librarians to catalog the collection and assist patrons
The central concept here is of course the collection. That is the “raison d’être” of a library. The purpose of library building, catalog and librarians is to give people access to the collection, and provide them with the information they need.
Clearly, because of the physical nature of the collection and the information transmission process the library needed to be a building with collection and catalog inside it. People had to go there to find and get the publications they needed.
If collections and the transmission of information were completely digital, then the reason for a physical location to go to for finding and getting publications would not exist anymore. Currently one of these conditions has been met fully and the other one partly. The transmission of information can take place in a completely digital way. Most new scientific publications are born digital (e-Journals, e-Books), and a large number of digitisation projects are taking care of making digital copies of existing print material.
Searching for items in a library’s collection is already taking place remotely through OPACs and other online tools almost everywhere. A large part of these collections can be accessed digitally. Only when a patron wants to read or borrow a printed book or journal does he or she have to go to the library building to fetch it.
All this seems to lead to the conclusion that the library may be slowly moving away from a physical presence to a digital one.
But there is something else to be considered here, that reaches beyond the limits of one library. In my view the crucial notion here is again the collection.
In my post Collection 2.0 I argue that in this digital information age a library’s collection is everything a library has access to as opposed to the old concept of everything a library owns. This means in theory that every library could have access to the same digital objects of information available on the web, but also to each other’s print objects through ILL. There will be no physically limited collection only available in one library anymore, just one large global collection.
In this case, there is not only no need for people to go to a specific library for an item in its collection, but also there is no need to search for items using a specific library’s catalog.
Now you may say that people like going to a library building and browse through the stacks. That may still be true for some, but in general, as I argue in my post “Open Stack 2.0“, the new Open Stack is the Web.
In the future there will be collections, but not physical ones (except of course for the existing ones with items that are not allowed to leave the library location). We will see virtual subject collections, determined by classifications and keywords assigned both by professionals and non-professionals.
On a parallel level there will be virtual catalogs, which are views on virtual collections defined by subjects on different levels and in different locations: global, local, subject-oriented, etc. These virtual collections and catalogs will be determined and maintained by a great number of different groups of people and institutions (commercial and non-commercial). One of these groups can still be a library. As Patrick Vanhoucke observed on Twitter (in Dutch): “We have to let go of the idea of the library as a building; the ‘library’ is the network of librarians“. These virtual groups of people may be identical to what is getting known more and more as “tribes“.
Having said all this, of course there will still be occurrences of libraries as buildings and as physical locations for collections. Institutions like the Library of Congress will not just vanish into thin air. Even if all print items have been digitised, print items will still be wanted for a number of reasons: research, art, among others. Libraries can have different functions, like archives, museums, etc. and still be named “libraries” too.
Library buildings can transform into other types of locations: in universities they can become meeting places and study facilities, including free wifi and Starbucks coffee. Public libraries can shift focus to becoming centres of discovery and (educational) gaming. Anything is possible.
It’s obvious that libraries obey the same laws of historical development as any other social institution or phenomenon. The way that information is found and processed is determined, or at least influenced, by the status of technological development. And I am not saying that all development is technology driven! This is not the place for a philosophy on history, economics and society.
Some historical parallels to illustrate the situation that libraries are facing:
- writing: inscribing clay tablets > scratching ink on paper > printing (multiplication, re-usability) > typewriter > computer/printer (digital multiplication and re-usability!) > digital only (computer files, blogs, e-journal, e-books)
- consumption of music: attending live performance on location > listening to radio broadcast > playing purchased recordings (vinyl, cassettes, cd, dvd) > make home recordings > play digital music with mp3/personal audio > listen to digital music online
From these examples it’s perfectly clear that new developments do not automatically make the old ways disappear! Prevailing practices can coexist with “outdated” ways of doing things. Libraries may still have a future.
In the end it comes down to these questions:
- Will libraries cease to exist, simply because they no longer serve the purpose of providing access to information?
- Are libraries engaged in a rear guard fight?
- Will libraries become tourist attractions?
- Will libraries adapt to the changing world and shift focus to serve other, related purposes?
- Are professional librarian skills useful in a digital information world?
I do not know what will happen with libraries. What do you think?
Posted on February 15th, 2009 No comments
Henk Ellerman of Groningen University Library writes about the “Collection in the digital age”, reacting to Mary Frances Casserly’s article “Developing a Concept of Collection for the Digital Age”. I haven’t read this 2002 article yet, but Henk Ellerman goes into the problem of finding a metaphor describing collections that for a large part consist of resources available on the internet.
“…the collection (the one deemed relevant for… well whatever) is a subset that needs to be picked from the total set of available online resources.”
“I find it quite remarkable that the new collection is seen as the result of a process of picking elements, a process similar to finding shells on a beach.”
“What if we expand the notion of a collection in such a way that the sea becomes part of it?”
“The main issue with any sensible collection is quality control. We don’t want ugly things in our collections.”
“Then a collection is not a simple store of documents anymore, but a rather complex system of interrelated documents, controlled by a selected group of people.”
“Librarians ‘just’ need to make the system searchable.”
I have a couple of thoughts about collections myself that I would like to add to these.
Originally, a collection is the total number of physical objects of a specific type that are in the possession of a person, or an organisation. Merriam-Webster says: “an accumulation of objects gathered for study, comparison, or exhibition or as a hobby“. People can collect Barbie dolls or miniature cars as a hobby, or rare books or monkey skulls for scientific reasons.
(By the way, individual collectors of rare books are often described in movies as rich, old, eccentric people with a small but very valuable collection of very old books about topics such as satanism, who end up being killed in a horrible way, and having their collections destroyed by fire, like I saw some time ago in Polanski’s “The Ninth Gate”.)
When organisations have collections then it is almost always for study or exhibition, but also for practical reasons. We are talking mainly about museums and libraries. In the case of libraries there is a rough distinction between public libraries and libraries belonging to scientific and/or educational institutions. Let’s focus on educational libraries, or “university libraries”, to make the picture a bit simpler.
University libraries have collected written and/or printed texts (books, journals, also containing images, maps, diagrams, etc.) in order to provide their staff and students with material to be able to teach and study. A library’s collection then describes all objects in the possession of the library. In the digital age, electronic journals and databases have been added to these collections, but in most cases this concerns only resources the library owns or for which the library pays money to gain access to. The collection then becomes the totality of objects (physical or digital) that the library owns or is granted access to by means of a contract. Freely available resources are explicitly not counted here.
Now, here we have to make an important distinction between a library’s total collection (“the collection”, meaning “all items the library owns or has access to”) and a collection on a specific topic or for a specific subject (“subject collections”, meaning “all items that have been selected by professionals to be part of the material that is necessary for studying a specific topic”), for instance “the University of Amsterdam library’s Chess collection”. In the past, people would have to go to a specific library to consult a specific collection on a specific subject.
“The collection” is merely the sum of all the library’s “subject collections”, nothing more.
Before we go to the collection in the digital age, an interesting intermediate question is: what is the position of interlibrary loan in the concept of collection? Are books from other libraries that are available to a specific library’s patrons to be considered as part of that second library’s collection? In the strict sense of the collection concept (“all items the library owns”), the answer is “no”. But if we expand the notion of collection to mean: “everything a library has access to”, then the answer clearly would have to be “yes”.
Now, in the digital age, the limitation that a collection’s objects should be available physically in a specific location, disappears. This means that anything can be part of “the collection” of a specific library, also objects or texts that have not been judged as scientific before, like blog posts. This is the “sea” that Henk Ellerman is talking about. A subject collection is also not limited by physical borders anymore. Subject collections can contain material, physical and digital, from anywhere. In this case, there is no reason that a subject collection should be a specific library’s subject collection, obviously. Key is “quality control”, or as Henk Ellerman puts it: “We don’t want ugly things in our collections“. Subject collections should be universal, global, virtual collections of physical and digital objects, “controlled by a selected group of people“.
Now, the most important question: who decides who will be part of these selected groups of people? The answer to this question is still to be found. I guess we will see several types of “expert groups” emerge: coalitions between university libraries nationally or globally, but also between not-for-profit and commercial organisations, and of course also between individuals cooperating informally, like in the blogosphere, or in Wikipedia.
The collections that will be controlled by these coalitions will not have fixed boundaries, but will have more “professional” cores with several “less professional” spheres around them or intersecting with other collections.
It is time we start building.
Posted on October 17th, 2008 2 comments
It strikes me that training for and documentation about our new Aleph ILS are aimed at three types of staff: system administrators, system librarians and staff (expert) users. Basically system administrators are supposed to take care of “technical stuff” like installing, upgrading, monitoring, backups, general system configuration etc., while staff users are dealing with the “real stuff”, like cataloging, acquisition, circulation, etc. System librarians appear to be a kind of hybrid species, both technicians and librarians: information specialists with UNIX and vi experience.
At the Library of the University of Amsterdam we do not have these three staff types; we only have what we call system administrators and staff users. We as system administrators do both system administrator and system librarian tasks as defined in the Aleph documentation. Only hardware, operating system, network, server monitoring and system backups are taken care of by the University’s central ICT department.
There is no such job title as “system librarian”; in fact I would not even know how to translate this term into Dutch. However, we do have terms for three different types of tasks: technical system administration, application administration and functional administration, which may be equivalent to the above-mentioned staff types, although the terms are used in different ways and boundaries between them are unclear. In The Netherlands we even have system administrators, application administrators and functional administrators, but these are all general terms, not limited to the library world.
Anyway, the need for three types of library system administration tasks and staff is typically related to the legacy systems of Library 1.0.
Library 0.0 (the catalog card era) had only one type: the expert staff user, also known as “librarian“.
Library 2.0 (also known as “next generation” library systems) will probably also have only one type of staff user that is needed in the libraries themselves, and I guess we will call these library staff users “system librarians“. These future system librarians will have knowledge of and experience in library and information issues, and will take care of configuration of the integrated library information systems at their disposal through sophisticated, intuitive and user-friendly web admin interfaces.
The systems themselves will be hosted and monitored on remote servers, according to the SaaS model (Software as a Service), either by system vendors or by user consortia or in cooperation between both. Technical system administration will no longer be necessary at the local libraries.
Cataloging, tagging, indexing etc. will not be necessary at the local library level either, because metadata will be provided by publishers, or dynamically generated by harvesting and indexing systems, and enriched by our end users/clients via tagging. These metadata stores will also be hosted and administered on remote servers, either by publishers or again by cooperative user organisations.
Of course this will have a lot of consequences for the current organisation and staffing of our libraries, but there will be plenty of time to adapt.
System librarians of the world: unite!
Posted on October 5th, 2008 No comments
At the end of August I attended the Technological Developments: Threats and Opportunities for Libraries module of TICER – Digital Libraries à la Carte 2008 at the University of Tilburg, The Netherlands.
One of the speakers was Marshall Breeding. His presentation “Library Automation Challenges for the next generation” consisted of three topics, one of which was “Moving toward new generation of library automation”.
He discussed “rethinking the ILS”. The old I(ntegrated) L(ibrary) S(ystem) was about integration of acquisition, serials, cataloguing, circulation, OPAC and reporting of print material. Now we are moving towards a completely electronic information universe, so new means of integration (and also dis-integration!) are necessary.
Developments until now have been targeted at the front ends: new integrated web 2.0 user interfaces that can also be used in a “dis-integrated” way (by means of APIs that allow embedding portions of the user interface in other environments), such as Primo, Encore, WorldCat Local, AquaBrowser, VuFind, eXtensible Catalog, etc.
The keyword here is “decoupling” of the front end from the back end. But with these products that is not really the case: there is always a harvesting, indexing and enriching component integrated in them, which moves at least part of the content and also the processing to this front end environment.
A new direction here is what Marshall Breeding calls “Comprehensive Resource Management”: the integration of all types of administration (acquisition, cataloging, OPAC, metasearching, linking, etc.) of all types of library resources (print and electronic, text and objects).
One and a half years ago (February 2007) I wrote an article “My Ideal Knowledge Base” about this in “SMUG 4 EU – Newsletter for MetaLib and SFX users” Issue 4 (page 14), targeted at the Ex Libris tools Aleph, MetaLib, SFX and DigiTool. I ended this vision of an ideal situation with: “Is this ideal image only a dream, or will it come true some day?“.
According to Marshall Breeding it will take 2-3 years more to see library automation systems that follow this approach and 5-7 years for wider adoption. He also said that traditional ILS vendors were working on this, but that no public announcements had been made yet.
Exactly two weeks later, at IGeLU2008 in Madrid, Ex Libris announced and presented their plans for URM (Unified Resources Management) and URD2 (Unified Resource Discovery and Delivery, meaning Primo). Eventually all of their existing products will be integrated in this new next generation environment. The first release will focus on ERM (Electronic Resource Management).
Short term plans for existing tools are focused on preparing them for the new URM/URD2 environment. For instance SFX 4.0 will have a re-designed database ready for integration with URM 1.0.
MetaLib will see its final official version with minor release 4.3 in spring 2009. After that a “next generation metasearch tool” will be developed with a completely re-designed back end and metasearch engine, and Primo as front end. Existing customers will be able to upgrade to this NextGen MetaSearch without paying a license fee for Primo (remote search option only).
Interesting times ahead….