Permalink: https://purl.org/cpl/825
Metasearch vs. harvesting & indexing
The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet“, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.
A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal depends heavily on the availability of the remote cataloging systems.
I also took the opportunity to explain to my audience the “issues” inherent in the concept of metasearch (or “federated search“, “distributed search“, etc.), and to compare it to the harvesting & indexing scenario.
Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.
The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad and H&I is good; that would be too easy. Some five years ago Metasearch was the best we had, and it was tremendous progress compared with searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the open source tools VuFind, SUMMA and Meresco, to name a few.
Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case“, used in logistics and inventory management.
With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
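To make the contrast concrete, here is a minimal sketch in Python. It is not how MetaLib or any of the tools mentioned here actually work; the names `RemoteCatalogue`, `metasearch` and `CentralIndex` are made up for illustration, and the "catalogues" are just in-memory lists with an availability flag.

```python
from dataclasses import dataclass


@dataclass
class RemoteCatalogue:
    """Hypothetical stand-in for a remote library catalogue."""
    name: str
    records: list
    available: bool = True

    def search(self, term):
        # A remote system can be down at the moment the user searches.
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        return [r for r in self.records if term.lower() in r.lower()]


def metasearch(catalogues, term):
    """Just in time: query every remote catalogue at request time."""
    hits, failed = [], []
    for catalogue in catalogues:
        try:
            hits.extend(catalogue.search(term))
        except ConnectionError:
            failed.append(catalogue.name)
    return hits, failed


class CentralIndex:
    """Just in case: copy all records in advance, then search locally."""

    def __init__(self):
        self.records = []

    def harvest(self, catalogues):
        # Runs periodically, not at search time (assumed schedule).
        harvested = []
        for catalogue in catalogues:
            harvested.extend(catalogue.records)
        self.records = harvested

    def search(self, term):
        return [r for r in self.records if term.lower() in r.lower()]


if __name__ == "__main__":
    a = RemoteCatalogue("Library A", ["Amsterdam canals"])
    b = RemoteCatalogue("Library B", ["Amsterdam history"])

    index = CentralIndex()
    index.harvest([a, b])                # harvest while both are up

    a.available = False                  # Library A goes down afterwards
    b.records.append("Amsterdam maps")   # Library B adds a new record

    print(metasearch([a, b], "amsterdam"))  # misses Library A, sees the new record
    print(index.search("amsterdam"))        # has Library A, misses the new record
```

In this toy setup the metasearch loses everything from the catalogue that happens to be down but picks up the newest record, while the central index still answers for both libraries but only knows what was there at the last harvest.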
Objectively, of course, H&I can solve the problems inherent in Metasearch, and therefore is a superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested for technical, legal or commercial reasons.
In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.
But I also believe that institutions can only take full advantage of the real power of H&I if they cooperate and maintain shared central indexes, instead of each building their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.
We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ PrimoCentral, SerialSolutions’ Summon and EBSCOhost Integrated Search.
The funny thing is that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.
4 thoughts on “Just in time or just in case?”
Thanks for sharing that presentation. It’s a good summary of the problems with metasearch. I think I’ll nail it on my forehead whenever I meet customers using the metasearch products w/o support 🙂
Though, what’s somewhat missing is the user’s perspective (or at least the presentation doesn’t point that perspective out clearly). Metasearch is hard for end users to understand and use; I have only anecdotal evidence for that, no evaluation results of my own. The user experience often starts with hard decisions (which databases to select?). The presentation of result lists is often unusual (at best), and handling these lists is often complicated: there is often a somewhat mysteriously sorted “single list” of (parts of) merged results, separate lists for each searched database, sometimes supplemented by automatic reloads of result numbers, lists or rankings, and the like…
Of course, we will never have all data for local indexing, even Serials Solutions won’t get it all (although they have already more than half a billion records). And even with all data at your hands, you still may offer horrible user experience…
So I totally agree with your conclusions, but nevertheless am looking forward to the day metasearch finally is obsolete…
Hi Till, thanks for your thoughts. Yes, the end user perspective is a little bit out of the picture here, but implicitly present at the same time.
Selecting databases and interpreting results are the most problematic issues for end users. Statistics show, though, that the vast majority of users (more than 85%, I think) don’t even try to find the most appropriate databases: they just enter a search term and hit “Search”, thereby using the preselected default set of databases.
As someone who teaches Lukas’ end users at the UvA, I was very surprised how much the students seem to like metasearch. During my instruction last week, the metasearch task generated the fewest questions (maybe because it was the first task and they were still fresh, but nevertheless).
Btw, I would still like to maintain that my inner librarian doesn’t like metasearch (and H&I for that matter); she cannot handle such a lack of control ;-). But as someone who likes a happy customer, I am very interested to see what you and your colleagues come up with next.