Posted on October 16th, 2009 4 comments
Metasearch vs. harvesting & indexing
The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet“, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.
A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal highly depends on the availability of the remote cataloging systems.
I took the opportunity to explain to my audience also the “issues” inherent in the concept of metasearch (or “federated search“, “distributed search“, etc.), and compare that to the harvesting & indexing scenario.
Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.
The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad, and H&I is good, that would be too easy. Some five years ago Metasearch was the best we had, it was a tremendous progress beyond searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the OpenSource tools VuFind, SUMMA, Meresco, to name a few.
Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case“, used in logistics and inventory management.
With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
Objectively of course, H&I can solve the problems inherent in Metasearch, and therefore is a superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested because of technical, legal or commercial reasons.
In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.
But I also believe that the real power of H&I can only be taken advantage of, if institutions cooperate and maintain shared central indexes, instead of building each their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.
We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ PrimoCentral, SerialSolutions’ Summon and EBSCOhost Integrated Search.
The funny thing is, that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.
Posted on October 6th, 2009 1 comment
What will library staff do 5 years from now?
I attended the IGeLU 2009 annual conference in Helsinki September 6-9. IGeLU is the International Group of Ex Libris Users, an independent organisation that represents Ex Libris customers. Just to state my position clearly I would like to add that I am a member of the IGeLU Steering Committee.
These annual user group meetings typically have three types of sessions: internal organisational sessions (product working groups and steering committee business meetings, elections), Ex Libris sessions (product updates, Q&A, strategic visions), and customer sessions (presentations of local solutions, addons, developments).
Not surprisingly, the main overall theme of this conference was the future of library systems and libraries. The word that characterises the conference best in my mind (besides “next generation“and “metaphor“) is “roadmap“. All Ex Libris products but also all attending libraries are on their way to something new, which strangely enough is still largely uncertain.
Ex Libris presented the latest state of design and development of their URM (Unified Resource Management) project, ‘A New Model for Next-generation Library Services’. In the final URM environment all back end functionality of all current Ex Libris products will be integrated into one big modular system, implemented in a SaaS (“Software as a Service“) architecture. In the Ex Libris vision the front end to this model will be their Primo Indexing and Discovery interface, but all URM modules will have open API’s to enable using them with other tools.
The goal of this roadmap apparently is efficiency in the areas of technical and functional system administration for libraries.
In the mean time development of existing products is geared towards final inclusion in URM. All future upgrades will result in what I would like to call “intermediate” instead of “next generation” products . MetaLib, the metasearch or federated search tool, will be replaced by MetaLib Next Generation, with a re-designed metasearch engine and a Primo front end. The digital collection management tool DigiTool will be merged into its new and bigger nephew Rosetta, the digital preservation system. The database of the OpenUrl resolver SFX will be restructured to accommodate the URM datamodel. The next version of Verde (electronic resource management) will effectively be URM version 1, which will also be usable as an alternative for both ILS’es Voyager and Aleph.
Here we see a kind of “intermediate” roadmap to different “base camps” from where the travelers can try to reach their final destination.
From the perspective of library staff we see another panorama appearing.
In one of the customer presentations Janet Lute of Princeton University Library, one of the three (now four) URM development partners, mentioned a couple of “holy cows” or library tasks they might consider stopping doing while on their way to the new horizon:
- managing prediction patterns for journal issues
- checking in print serials
- maintaining lots of circulation matrices and policies
- collecting fines
- cataloging over 80% of bibliographic records
I would like to add my own holy cow MARC to this list, about which I have written a previous post Who needs MARC?. (Some other developments in this area are self service, approval plans, shared cataloging, digitisation, etc.)
This roadmap is supposed to lead to more efficient work and less pressure for acquisitions, cataloging and circulation staff.
Eldorado or Brave New World?
To summarise: we see a sketchy roadmap leading us via all kinds of optional intermediate stations to an as yet still vague and unclear Eldorado of scholarly information disclosure and discovery.
The majority of public and professional attention is focused on discovery: modern web 2.0 front ends to library collections, and the benefits for the libraries’ end users. But it is probably even more important to look at the other side, disclosure: the library back end, and the consequences of all these developments for library staff, both technically oriented system administrators and professionally oriented librarians.
Future efficient integrated and modular library systems will no doubt eliminate a lot of tasks performed by library staff, but does this mean there will be no more library jobs?
Will the university library of the future be “sparsely staffed, highly decentralized, and have a physical plant consisting of little more than special collections and study areas“, as was stated recently in an article in “Inside Higher Education”? I mentioned similar options in “No future for libraries?“.
Personally I expect that the two far ends of the library jobs spectrum will merge into a single generic job type which we can truly call “system librarian“, as I stated in my post “System librarians 2.0“. But what will these professionals do? Will they catalog? Will they configure systems? Will they serve the public? Will they develop system add-ons?
This largely depends on how the new integrated systems will be designed and implemented, how systems and databases from different vendors and providers will be able to interact, how much libraries/information management organisations will outsource and crowdsource, how much library staff is prepared to rethink existing workflows, how much libraries want to distinguish themselves from other organisations, how much end users are interested in differences between information management organisations; in brief: how much these new platforms will allow us to do ourselves.
We have come up with a realistic image of ourselves for the next couple of decades soon, otherwise our publishers and system vendors will be doing it for us.