CommonPlace.Net
Library2.0 and beyond-
Mobile library services
Posted on March 4th, 2010 2 comments
Location aware services in a digital library world
This is the third post in a series of three
[1. Mainframe to mobile - 2. Mobile app or mobile web? - 3. Mobile library services]
While library systems technology and mobile apps architecture make up the technical and functional infrastructure of mobile web access, mobile library services are what it’s all about. What type of mobile services should libraries offer to their customers?
As stated before, the two main features that distinguish mobile, handheld devices from other devices are:
- web access any time anywhere
- location awareness
It seems obvious that libraries should take these two conditions into account when providing mobile services, not least the first one. I don’t think that mobile devices will completely replace other devices like PCs and netbooks, as Google seems to think, but they will definitely be an important tool for lots of people, simply because they always carry a mobile phone with them. So in order to offer something extra, mobile applications should focus on the situation of potential access to information any time anywhere, and make use of the location awareness of the device as much as possible. But does this also apply to services for library customers? That partly depends on the type of library (public, academic, special) and the physical and geographical structure of the library (one central location, branch locations).
As a starting point we can say that mobile library services should cover the total range of online library services already offered through traditional web interfaces. However, mobile users may not want to use certain library services on their mobile devices. For instance, from an analysis of usage statistics of EBSCO Mobile at the Library of Texas A&M University, generously provided by Bennett Ponsford, it appears that although the number of searches in EBSCO Mobile is increasing, only 1% of mobile searches lead to a fulltext download, against 77% of regular EBSCO searches. These findings suggest that library customers, at least academic ones, are willing to search for books and articles on their mobile devices, but will postpone actually using them until they are in a more convenient environment. Apparently small screens and/or mobile PDF readers are not very reader friendly in academic settings. This may be different for public library customers and e-books.
So, libraries should concentrate on offering those mobile services that are wanted and will actually be used. In the beginning this may involve analysis of usage statistics and customer feedback to be able to determine the perfect mobile services suite for your library. Libraries should be prepared for “perpetual beta” and “agile development”.
There are two main areas of information in which libraries can offer mobile services:
- practical information
- bibliographical information
This is no different from other library information channels, like normal websites and printed guides and catalogues.
Practical information may consist of contact address, email and telephone information, opening hours, staff information, rules and regulations of any kind, etc. In most cases this is information that does not change very often, so static information pages will be sufficient. However, especially with mobile devices whose owners are on the move, providing dynamic, up-to-date information will give an advantage. For instance: today’s and tomorrow’s opening hours, number of currently available public workstations per location, etc.
The information provided will be even more precisely aimed at the user’s personal situation, if the “location awareness” feature is added to the “any time anywhere” feature, and up to date static and dynamic information for the locations in the immediate vicinity of the customer is shown first, using the device’s automatic geolocation properties. And all this gets better still if the library’s own information is mashed up with available online tools, like showing a location on Google Maps when selecting an address, and with the device’s tools, like making a phone call when clicking on a phone number.
Bibliographical information should be handled somewhat differently. Searching library catalogues or online databases is in essence not location dependent. Online digital bibliographical metadata is available “in the cloud” any time anywhere. It’s not the discovery but the delivery that makes the difference. We have already seen that mobile academic library customers do not download fulltext articles to their mobile devices. But mobile customers will definitely be interested in the possibility of requesting a print item to be delivered to them in the nearest location. WorldCat Mobile, like “normal” WorldCat, for instance offers the option to select a library manually from a list in order to find the nearest location to obtain an item from. It would of course be nice if the delivery location would be automatically determined by the mobile request service, using the device’s location awareness and the current opening hours of the library branches.
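To make the idea of an automatically determined delivery location a bit more concrete, here is a minimal sketch in Python. It is not how WorldCat Mobile or any existing request service works. The `Branch` structure, the function names and the simple same-day opening hours are all assumptions made for illustration. It filters the branches that are open at the given time and picks the one closest to the coordinates reported by the device, using the haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Branch:
    """Hypothetical library branch record (not taken from any real system)."""
    name: str
    lat: float    # latitude in decimal degrees
    lon: float    # longitude in decimal degrees
    opens: time   # opening time today
    closes: time  # closing time today (assumed to be later on the same day)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points on Earth."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_open_branch(branches: List[Branch], lat: float, lon: float,
                        now: time) -> Optional[Branch]:
    """Return the closest branch that is open at `now`, or None if all are closed."""
    open_now = [b for b in branches if b.opens <= now < b.closes]
    if not open_now:
        return None
    return min(open_now, key=lambda b: haversine_km(lat, lon, b.lat, b.lon))
```

A real service would also have to deal with opening hours that differ per weekday, with holidays, and with whether the requested item can actually be delivered to a given branch. The sketch leaves all of that out.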
The funny thing here is that we have the paradoxical situation of state-of-the-art technology in a world of global online digital information being used to obtain “old fashioned” physical carriers of information (books) from the nearest physical location.
Augmented reality, as a link between the physical and virtual world, may be a valuable extension of mobile services. A frequently mentioned example is scanning a book cover or a barcode with the camera of a mobile phone and locating the item on Amazon. It would be helpful if your phone could automatically find and request the item in the nearest library branch. Personally I am not convinced that this is very valuable. Typing in ISBN or book title will do the job just as fast. Moreover, bookshop staff may not appreciate this behaviour.
A more common use of augmented reality would be to point the camera of your mobile device at a library building, after which a variety of information about the building is shown. The best known augmented reality app at the moment is Layar. This tool allows you to add a number of “layers”, with which you can for instance find the nearest ATMs or museums, or Wikipedia information about physical objects or locations around you.

Layar - LibraryThing Local
There is also a LibraryThing Local layer for Layar, with which you can find information about all libraries, bookshops and book related events in the neighbourhood. It may even be possible to find a specific book in an open stack using this technology.
All these extended mobile applications suggest that users of apps may not just be a specific group of people (like library customers), but that mobile users will be interested in all kinds of useful information about their current location. Library information may be only a part of that. Maybe mobile apps should be targeted at a more general audience and include related information from other sources, making use of the linked data concept.
A search in a library catalog in this case may result in a list of books with links to related objects in a museum nearby or a historic location related to the subject of the book. Alternatively, an item in a museum website might have links to related literature in catalogs of nearby libraries. Anything is possible.
The question that remains is: should libraries take care of providing these generic location based services, or will others do that?
-
Mobile app or mobile web?
Posted on February 21st, 2010 11 comments
Technology, users and business models
This is the second post in a series of three
[1. Mainframe to mobile - 2. Mobile app or mobile web? - 3. Mobile library services]
Mobile access to information on the internet is the latest step in the development of information systems technology, as described in the previous post in this series. The two main features that distinguish mobile devices from other devices are:
- Access to the web literally any time, anywhere
- Location awareness using GPS or the mobile network
Let’s focus on web access first. There are two main ways in which information providers can provide access to their data: by a mobile web browser or by apps.
The easiest way to provide mobile access is: do nothing. Users of mobile internet devices can simply visit all existing websites with their mobile browser. However, in doing so they will experience a number of problems: performance is slow, pages are too large, navigation is difficult, certain parts of websites don’t work. These problems are caused by the very physical characteristics of mobile technology that make mobile internet access possible: the small size of devices and displays, the wireless network, the limited features of dedicated mobile operating systems and browsers.
Fortunately, technological development is an interactive, reciprocal, cyclic process. Technology continuously needs to find solutions to problems that were caused by new uses of existing technology.
Many organisations have solved this problem by creating separate “dumbed down” mobile versions of their websites, containing mainly text only pages and textual links to their most important services and information. In the case of libraries for instance “locations and addresses”, “opening hours”, etc. See this list of examples (with thanks to Aaron Tay). Another example is LibraryThing Mobile, which also has a catalog search option. In these cases you have to manually point your browser to the dedicated mobile URL, unless the webserver is configured to automatically recognise mobile browsers and redirect them to the mobile site.
Of course this is not the optimal solution, for two reasons:
- On the front end: as an information provider you are completely ignoring all graphical, dynamic, interactive and web 2.0 functionality on the end user side. This means actually going back to the early days of the world wide web of static text pages.
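Automatic recognition of mobile browsers usually comes down to inspecting the User-Agent header that the browser sends with each request. The following Python sketch shows the principle only. The list of markers is a small, incomplete assumption, the function names are made up for this example, and the two example.org URLs are placeholders. Real servers and plugins tend to rely on much longer, maintained lists.

```python
from typing import Optional

# Substrings that commonly appear in mobile browser User-Agent headers.
# This list is an illustrative assumption, not a complete or authoritative one.
MOBILE_MARKERS = ("iphone", "android", "blackberry", "opera mini",
                  "windows ce", "symbian")


def is_mobile_user_agent(user_agent: Optional[str]) -> bool:
    """Guess whether a User-Agent header belongs to a mobile browser."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in MOBILE_MARKERS)


def choose_site(user_agent: Optional[str],
                desktop_url: str = "http://www.example.org/",
                mobile_url: str = "http://m.example.org/") -> str:
    """Return the URL a visitor should be redirected to (placeholder URLs)."""
    return mobile_url if is_mobile_user_agent(user_agent) else desktop_url
```

The obvious weakness of this approach is that the marker list has to be kept up to date for every new device, which is one more maintenance task on top of the duplicate site itself.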
- On the back end: duplicating system and content administration. In most cases it will come down to manually creating and editing HTML pages, because most website content management systems may not offer manual or automatic editing of pages for mobile access. Some systems offer automatic recognition of mobile browsers and display content in the appropriate format, like the WordPress plugin “WordPress Mobile Edition” that automatically shows a list of posts if mobile browsers are detected. This is what happens on this blog.
Because of this situation we are witnessing a re-enactment of the client-server alternative to static HTML that I described previously: mobile apps! “Apps” is short for “applications”; apparently everything needs to be short in the mobile online web 2.0 age. Apps are installed on mobile devices, they run locally making use of the hardware, operating system and user friendly interface of the device, and they only connect to the internet for retrieving data from a database system in the cloud (on a remote server).
A disadvantage of this solution obviously is that you have to multiply development and maintenance in order to support all mobile platforms that your customers are using, or just support the most used platform (iPhone) and ignore the rest of your end users. Alternatively you can support one mobile platform with an app, and the rest with a mobile web site. Organisations have the choice of developing apps themselves from scratch, or using one of the commercial parties that offer library apps, such as Boopsie, Blackboard or the recently announced LibraryThing Anywhere, which is meant to offer both mobile web and apps for iPhone, Blackberry and Android.
Some examples:
- TU Delft mobile app for iPhone (powered by Blackboard). University wide, including library. Haven’t been able to test this because I have an Android phone. An Android version will be developed. For other devices they offer a mobile website.
- Duke University mobile app for iPhone. University wide, including library. For other devices they offer a mobile website.
- Santa Clara County Library mobile apps for iPhone and Android (powered by Boopsie).
- WorldCat Mobile (powered by Boopsie).
An alternative solution to the client-server and “dumbed down” models would be to use the new HTML5 and CSS3 options to create websites that can easily be handled by all PC and mobile web browsers alike. HTML5 has geolocation options, and browsers are made location aware this way too. The iWebKit Framework is a free and easy package to create web apps compatible with all mobile platforms. See this demo on PC, iPhone, Android, etc.
Some say that HTML5/CSS3 will make apps disappear, but I suspect performance may still be a problem, due to slow connections. But it’s not only a technology issue. It’s also a matter of business models, as Owen Stephens and Till Kinstler pointed out.
Apps can be distributed for free by organisations that want to draw traffic to their own data, ignoring the open web. This method fits their classic business model, as Till remarked, mentioning the newspaper business as an example.
But there is also another side to this: apps can be created by anybody, making use of APIs to online systems and databases, and be shared with others for free or for a small fee, as is the case with the iPhone App Store, the Android Market, the Nokia Ovi Store, or the newly announced Wholesale Applications Community (WAC). This model will never be possible with web based apps (like HTML5), because nobody has access to a system’s web server other than the system administrators. It is also much too complicated for developers and consumers of apps to host web apps on a server that mobile device users can connect to.
And there is more: independent developers are more likely to look beyond the boundaries of the classic model of giving access to your own data only. Third party apps have the opportunity to connect data from a number of data sources in the cloud in order to satisfy mobile user needs better. To take the newspaper business example, I mentioned this in my post “Mobile reading”: general news apps vs dedicated newspaper apps. The rise of the open linked data movement will only boost the development and use of the mobile client server model.
In my view there will be a hybrid situation: HTML5/CSS3 based web apps and local mobile apps will coexist, depending on developer, audience, and objectives.
What services library mobile apps should offer, including location awareness and linking data, is the topic of another post.
-
Mainframe to mobile
Posted on February 16th, 2010 9 comments
The connection between information technology and library information systems
This is the first post in a series of three
[1. Mainframe to mobile - 2. Mobile app or mobile web? - 3. Mobile library services]
The functions, services and audience of library information systems, as is the case with all information systems, have always been dependent on and determined by the existing level of information technology. Mobile devices are the latest step in this development.
In the beginning there was a computer, a mainframe. The only way to communicate with it was to feed it punchcards with holes that represented characters.
If you made a typo (puncho?), you were not informed until a day later when you collected the printout, and you could start again. System and data files could be stored externally on large tape reels or small tape cassettes, identical to music tapes. Tapes were also used for sharing and copying data between systems by means of physical transportation.
Suddenly there was a human operable terminal, consisting of a monitor and keyboard, connected to the central computer. Now you could type in your code and save it as a file on the remote server (no local processing or storage at all). If you were lucky you had a full screen editor, if not there was the line editor. No graphics. Output and errors were shown on screen almost immediately, depending on the capacity of the CPU (central processing unit) and the number of other batch jobs in the queue. The computer was a multi-user time sharing device, a bit like the “cloud”, but every computer was a little cloud of its own.
There was no email. There were no end users other than systems administrators, programmers and some staff. Communication with customers was carried out by sending them printouts on paper by snail mail.
I guess this was the first time that some libraries, probably mainly in academic and scientific institutions, started creating digital catalogs, for staff use only of course.
Then came the PC (Personal Computer). Terminal and keyboard were now connected to the computer (or system unit) on your desk. You had the thing entirely to yourself! Input and output consisted of lines of text only, one colour (green or white on black), and still no graphics. Files could be stored on floppy disks, 5¼-inch magnetic things that you could twist and bend, but if you did that you lost your data. There was no internal storage. File sharing was accomplished by moving the floppy from one PC to another and/or copying files from one floppy to another (on the same floppy drive).
Later we got smaller disks, 3½-inch, in protective cases. The PC was mainly used for early word processing (WordStar, WordPerfect) and games. Finally there was a hard disk (as opposed to “floppy” disk) inside the PC system unit, which held the operating system (mainly MS-DOS), and on which you could store your files, which became larger. Time for stand-alone database applications (dBase).
Then there was Windows, a mouse, and graphics. And of course the Internet! You could connect your PC to the Internet with a modem that occupied your telephone line and made phone calls impossible during your online session. At first there was Gopher, a kind of text based web.
Then came the World Wide Web (web 0.0), consisting of static web pages with links to other static web pages that you could read on your PC. Not suitable for interactive systems. Libraries could publish addresses and opening hours.
But fortunately we got client-server architecture, combining the best of both worlds. Powerful servers were good at processing, storing and sharing data. PCs were good at presenting and collecting data in a “user friendly” graphical user interface (GUI), making use of local programming and scripting languages. So you had to install an application on the local PC which then connected to the remote server database engine. The only bad thing was that the application was tied to the specific PC, with local Windows configuration settings. And it was not possible to move the thing around.
Now we had multi-user digital catalogs with a shared central database and remote access points with the client application installed, available to staff and customers.
Luckily dynamic creation of HTML pages came along, so we were able to move the client part of client-server applications to the web as well. With web applications we were able to use the same applications anywhere on any computer linked to the world wide web. You only needed a browser to display the server side pages on the local PC.
Now everybody could browse through the library catalog any time, anywhere (where there was a computer with an internet connection and a web browser). The library OPAC (Online Public Access Catalog) was born.
The only disadvantage was that every page change had to be generated by the server again, so performance was not optimal.
But that changed with browser based scripting technology like JavaScript, AJAX, Flash, etc. Application bits are sent to the local browser on the PC at runtime, to be executed there. So actually this is client-server “on the fly”, without the need to install a specific application locally.
In the meantime the portable PC appeared: system unit, monitor and keyboard all in one. At first you needed some physical power to move the thing around, but later we got laptops, notebooks, netbooks, getting smaller, lighter and more powerful all the time. And wifi of course, no need to plug the device into the physical network anymore. And USB sticks.
Access to OPAC and online databases became available anytime, anywhere (where you carried your computer).
The latest development of course is the rise of mobile phones with wireless web access, or rather mobile web devices which can also be used for making phone calls. Mobile devices are small and light enough to carry with you in your pocket all the time. It’s a tiny PC.
Finally you can access library related information literally any time, anywhere, even in your bedroom and bathroom.
It’s getting boring, but yes, there is a drawback. Web applications are not really suited to mobile browsers: pages are too large, browser technology is not really compatible, connections are too slow.
Available options are:
- creating a special “dumbed down” version of a website for use on mobile devices only: smaller text based pages with links
- creating a new HTML5/CSS3 website, targeted at mobile devices and “traditional” PCs alike
- creating “apps”, to be installed on mobile devices and connect to a database system in the cloud; basically this is the old client-server model all over again.
A comparison of mobile apps and mobile web architecture is the topic of another post.
-
Mobile reading
Posted on January 22nd, 2010 7 comments
New models, new formats

© Lukas Koster
Recently I have been experimenting a bit with reading newspapers on my mobile phone (a G1 Android device), or maybe I should say “reading news on my mobile”. I looked at two Dutch newspapers that adopt two completely different approaches.
“NRC Handelsblad” publishes its daily print newspaper as a daily “e-paper” in PDF, Mobi and ePub format, to be downloaded every day to the platform of your choice. In order to read the e-paper you need a physical device plus software (mobile phone, PC, e-reader, etc.) that can handle one of the available formats. On my G1 I use the Aldiko e-reader app for Android with the ePub format. The e-paper is treated as an e-book file, with touch screen operation for browsing tables of contents, paging through chapters or articles, zooming, etc. Access to the e-paper files is on a subscription basis.
“Het Parool” on the other hand offers a free app to be downloaded from the Android Market that serves as a front end to all recent articles available from their news server on the web. There is no need for a daily download of a file in a specific format that has to be supported by the physical platform of your choice. There is also an iPhone app. The app and access to the news articles are free of charge.

© Lukas Koster
Besides the difference in access (free vs paid), the most important contrast between these two mobile newspapers is the form in which the printed news is transformed to the digital and mobile environment. “NRC Handelsblad” takes the physical form the newspaper has had since its origin in the 17th century, dictated by physical, logistical and economical conditions, and transforms this 1 to 1 to the digital world: the e-paper still is one big monolithic bundle of articles that can’t be retrieved individually, completely ignoring the fact that the centuries old limitations don’t apply anymore. It is basically exactly the same as most manifestations of e-books.
“Het Parool” does completely the opposite. It treats individual news articles as units of content in their own right, “stories” as I call them in my post “Is an e-book a book?“. And this is how it should be in the digital mobile world. This is similar to the way that e-journals offer direct access to individual articles already.
Readers should be able to apply their own selection of “stories” to read in a specific, virtual, on the fly bundle, using the front end of their choice.
However, the “Parool” app functions as a predefined filter: it presents the reader with the most recent (24 hour max) articles from its own source of news. Of course this is fine as long as the readers choose to use the “Parool” app, but they may also choose to read news stories from different sources. This could be achieved with a different mobile, PC or web application that gathers content from a variety of sources.
Another drawback of the “Parool” implementation is that it does not offer a “save” option. There is no way to read old articles, other than to go to the official newspaper website, either through mobile browsing or by using a PC web browser. The “NRC Handelsblad” implementation on the other hand does offer this option, because it is based on a download model to begin with.
This brings me to the matter of mobile web browsing. Reading and navigating a web page designed for the PC screen on a mobile device is annoying at the very least, not to mention the time it takes to load complete web pages into the mobile browser. Common practice is to create a simplified version of full-fledged web pages for mobile use only. Of course this means doubling the website maintenance effort.
An alternative could be the adoption of HTML 5 and CSS 3, as was stated at a Top Tech Trends Panel session at ALA Midwinter 2010, where a university library official said: “2010 is the year that the app dies”, because “developers can leverage a single well-designed service to serve both browser-based and mobile users”. But this view completely misses the point: “Apps are not about technology, they are about a business model” as Owen Stephens pointed out. This business model implies the separation of content and presentation in a much broader sense than that of database back end – website front end only. This was an innovative concept until a couple of years ago.
As I briefly described above, we need units of content being accessible by all kinds of platforms and applications through universal APIs. This model not only applies to reading texts, but also to finding these texts. Especially libraries should be aware of that.
Although the ALA Top Trends Panel stated that libraries’ focus should be on content rather than hardware, they did not touch upon the changing concept of what books are in the e-book era, as again Owen Stephens pointed out. New models and formats will have all kinds of consequences for the way we handle information. For instance: pages. A PDF file, which is a 1 to 1 translation of the print unit to a digital unit, as I explained, still has fixed pages and page numbers. An ePub file however has a flexible format that allows “pages” to be automatically adapted to the size of the device’s screen (thanks to @rsnijders and @Wowter for discussing this). There are no fixed pages or page numbers anymore. HTML pages containing full articles don’t have page numbers either, by the way. This will change the way we refer to texts online, without page numbers, which is one of the subjects of the Telstar project, again with Owen Stephens involved (watch that guy).
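Why a flexible format cannot have stable page numbers is easy to show with a toy example. The Python sketch below is not how ePub readers actually lay out text. It only wraps plain text to a given number of characters per line and lines per page, both of which are made-up stand-ins for screen size. The function name `paginate` is an assumption for this illustration.

```python
import textwrap
from typing import List


def paginate(text: str, chars_per_line: int, lines_per_page: int) -> List[List[str]]:
    """Split text into 'pages' for a screen of the given (hypothetical) size."""
    # Re-wrap the running text to the width of the screen.
    lines = textwrap.wrap(text, width=chars_per_line)
    # Cut the wrapped lines into pages of a fixed number of lines.
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


if __name__ == "__main__":
    article = " ".join(["word"] * 200)  # placeholder text, 200 short words
    for width, height in ((40, 10), (20, 5)):
        pages = paginate(article, width, height)
        print(f"{width}x{height} screen: {len(pages)} pages")
```

The same text ends up on a different number of pages depending on the screen, so a reference like “p. 3” only means something for one particular device and font setting.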
The flexible page is another reason to have a critical look at MARC. There is no use anymore for tags like 300 $a, “Extent (Number of physical pages, etc.)”, or 773 $g (“Vol. 2, no. 2 (Feb. 1976), p. 195-230”).
The inevitable conclusion of all this is that all innovative developments on the end user interface presentation front end need to be supported by corresponding developments on the content back end, and vice versa.
-
Old library, new library
Posted on December 29th, 2009 9 comments
On Sunday December 27, 2009 I had the opportunity to visit the otherwise closed library of the Netherlands’ oldest museum, Teylers Museum, in my home town Haarlem, together with a small group of Dutch library twitter people. We were very kindly shown around by librarian Marijn van Hoorn, who explained to us the library’s history and collection.
Now I’m not going to say anything about the pleasant real life consequences of getting to know people in the virtual world (that has been done by @PeterMEvers already, in Dutch), or about the guided tour (already described very well by @underdutchskies in English and by @festinaatje and @ecobibl in Dutch). Also, none of the photos I took with my G1 phone are presentable, but you can have a look at the photos taken by @Dymphie, @underdutchskies and @wbk500.
Instead, I will try to make a comparison between the old library’s course of life and the developments that modern libraries are going through, because I see some parallels there.
The museum was built in 1784 with money from the legacy of the wealthy banker and merchant Pieter Teyler van der Hulst, to preserve his collections and advance the arts and sciences. The museum’s library was established in 1826 to house a separate collection of books and journals in the field of natural history (botany, zoology, paleontology and geology).
One of the objectives for the library was to have a complete collection of all journals in the area of natural history. In the beginning the library was only accessible by invitation, and the honoured guests were welcomed and assisted by the “caretaker” or “landlord” of the museum.
By the middle of the 19th century the library opened up to a more general public, that is to say teaching and research staff members of the emerging universities.
But from 1870 the importance of the Teylers library for university staff declined drastically, because the universities in The Netherlands started to organise academic libraries of their own. So the library closed its doors to regular visitors. The collection continued to be maintained and expanded until 1987, when it was no longer realistic to pursue completeness.
During the 1970s the privately funded museum and library faced the threat of closing down because of the cost of preserving the historical buildings and collections. In the written library catalog (created over time by a large number of volunteers and employees) all items were annotated with an estimated value in case of forced sale of the collection.
Fortunately the Dutch state decided to subsidise the historically and culturally valuable museum, and now Teylers is a very popular place, with a new wing with a large hall for temporary exhibitions, an educational section and a cafe.
The museum library is only open for visitors on request and on special occasions. The collection is not expanded anymore, but it is a very complete and valuable historical natural history collection, which is, among other things, used to organise temporary thematic exhibitions in the museum. Besides the natural history items there are also old maps and atlases and travel journals, like the James Cook journals by Sydney Parkinson that @jaapvandegeer drew my attention to.
The museum and the library are also looking to the future. Both museum objects and library items are being digitised, there is a European project for creating a website on ornithology that uses the library’s bird images, there is a new thematic website that combines documents, images and metadata from the museum, the library and external sources, the library catalog has been migrated to an Adlib system, and there is a Ning social network.
So, what are the parallels with modern libraries? First of all, it is clear that the influence of external developments on libraries is not something that is limited to the modern digital web age of Google. Just like Teylers library, modern public, academic and special libraries were at first targeted at a limited, well defined audience, and only accessible on location at specific times, after which their target audience and accessibility widened substantially. Catalogs and varying parts of the collection are available online to a global audience.
The external influence from competing university libraries is currently mirrored by the world wide web itself, with Google as one of the main external threats. I have written about this in my post “No future for libraries?“.
The two important issues here are: the effects on modern library collections and audience. Teylers library decided to stop building its own collection, but they keep using it in a number of ways: temporary physical thematic exhibitions, but also in new digital “mashed up” ways. This might be a good example for modern libraries to follow: make use of modern technologies to reuse existing collections to create virtual online thematic aggregations of data, texts, images, etc. See also my post “Collection 2.0“.
As for modern libraries’ response to changing audiences: proceeding with new ways of using their collections will draw new customers anyway. But it is equally important to find other ways to “go where your users are”, like being on social networks like the Teylers Ning site. One of the most important moves in the near future will be mobile presence.
Teylers library shows us that there may be a new life for old libraries.
-
Is an e-book a book?
Posted on November 19th, 2009 28 comments
About cataloging physical items or units of content
2009 is the year of the e-book, or perhaps better: of the e-book reader. This is an important distinction that I will explain below. E-books are becoming more popular because of the increasing availability of various cheap e-book readers.
But what is an e-book? Is it the same as a book? Some people say yes, some people say no. This question shouldn’t be so hard to answer, should it? We just have to define what a book is first. So, what is a book?
When people think of a book, they picture something like the archetypal book: printed, medium sized, hardcover, no illustrations on the front. The thing that you can actually hold in your hands and read.
But if they say: “This book was written by that author”, they don’t think that the author actually wrote that particular item they are holding in their hands. Now we already have two different meanings of the concept “book”: one is a tangible object, the other is the content that is made available in this tangible object by means of printed text.
Besides these conceptual levels, there are more ways by which books can be described, as shown by this incomplete list of examples:
Physical form: Historically there have been clay tablets, inscribed stones, handwritten scrolls, handwritten bound pages, printed pages. We also know different formats targeted at specific uses or audiences: audio books, braille books, pop up books.
Content: A book can contain text only, or images only (for instance a children’s picture book, or a book of photographs), or a combination of both.
Units: A book can consist of one “story” ( for instance a novel), optionally subdivided in chapters, or be made up of several stories, or articles (like a text book about a certain subject). Chapters and stories can be written by the same or by several authors. A book can also contain two or more other books by the same author (“collected works”), etc.
Content type: A book can contain fiction, aimed at entertaining readers. Books can be purely administrative, like accounting books. There are religious books to be used in religious ceremonies (sometimes these are referred to as “THE book“). Some books are for studying and learning (“text books”, which may also contain images by the way). There are scientific books and instructional books (travel guides, cook books, manuals).
First, we need to see how all this fits together before we can answer the question “Is an e-book a book?” or more precisely: “In which sense is an e-book a book?”. Fortunately there is already a conceptual model for bibliographic entities and the relationships between them that describes this: FRBR (Functional Requirements for Bibliographic Records), published by IFLA. The IFLA Final Report (2009 version) says it all, but there are also a couple of short summaries: Barbara Tillett’s (LoC) “What is FRBR?”, Jenn Riley’s “FRBR” blog post, and there is William Denton’s FRBR Blog for more information.
The FRBR model is targeted at libraries, maybe even at publishers and booksellers too. I will not go into the FRBR “Group 2” (persons and corporate bodies) and “Group 3” (subjects) entities here, but focus on the “Group 1” entities.
The FRBR “Group 1 entities” consist of Work, Expression, Manifestation and Item (also referred to as WEMI). FRBR entities not only apply to books or textual works, but also to movies, theater plays, music, etc.
- Work - a distinct intellectual or artistic creation
- Expression - the intellectual or artistic realization of a work
- Manifestation - the physical embodiment of an expression of a work
- Item - a single exemplar (or copy) of a manifestation
There are hierarchical relationships between the entities:
- A work (for instance a book) can have (“is realized through”) one or more expressions (for instance the original English text and the Dutch translation).
- Each expression can have (“is embodied in”) one or more manifestations (for instance a specific edition with an ISBN, or one or more works/expressions in a “collected works” edition).
- Each manifestation has (“is exemplified by“) one or more items, the things you can actually hold in your hands.
- A manifestation can also consist of several expressions, as in the “collected works” example.
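The hierarchy in the list above can be sketched as a small data structure. This is only a minimal illustration, not an implementation of the FRBR report: the class and field names, the `items_of` helper and the sample data are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar (copy) of a manifestation."""
    identifier: str  # hypothetical, e.g. a barcode or shelf mark


@dataclass
class Manifestation:
    """The physical embodiment of an expression of a work."""
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """The intellectual or artistic realization of a work."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """A distinct intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def items_of(work: Work) -> List[Item]:
    """Walk Work -> Expression -> Manifestation -> Item and collect all copies."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Illustrative data only: one work, two expressions, one edition each.
english_edition = Manifestation("English hardcover edition", [Item("copy-1"), Item("copy-2")])
dutch_edition = Manifestation("Dutch paperback edition", [Item("copy-3")])
work = Work(
    "Example novel",
    [
        Expression("original English text", [english_edition]),
        Expression("Dutch translation", [dutch_edition]),
    ],
)

print([item.identifier for item in items_of(work)])
```

Because a manifestation object can be referenced from more than one expression, the “collected works” case fits the same structure, although `items_of` would then need to de-duplicate the copies it collects.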
Besides these hierarchical relationships between different entity types there are also recursive relationships between entities of the same type: hierarchical and other. Some examples:
- A work is part of another work (hierarchical), as in a series like Harry Potter
- A work is an adaptation of another work
- An expression is a sequel to another expression
- A manifestation is a facsimile of another manifestation
So far so good. The FRBR conceptual model describes (or aims to describe) real world things and relationships on an abstract level. The model can be implemented in actual systems (both computerised and manual!). In these systems you are free to refer to the conceptual model entities (“work”, “expression”, “manifestation”, “item”) by names that are actually used in daily life. This is what Rob Styles is trying to do when he talks about “stories” and “editions” in his recent blog post “Bringing FRBR Down to Earth…” I think. I will define the “story” concept in a different way below.
Until now, catalogers and library systems have been targeted at describing the thing they have in their hands (or better the items that make up the library’s collection). In FRBR terms this means that catalogs describe manifestations and items, not works and expressions (or implicitly at best). In short, a bottom up approach. This is understandable, because in the past there was nothing else to go by than the explicit manifestation information available on the physical item (author, title, ISBN, edition, publisher, etc.).
Of course, MARC21 provides some options to describe relationships with expressions and works and other manifestations, like the 250 – Edition Statement, the 490 – Series Statement and the 76X-78X – Linking Entries-General Information. But these fields can only be used if the information is known to the cataloger.
Also, in traditional catalogs, works that are distinct expressions in one manifestation (like articles, chapters, stories, poems) are not described separately, because of the same reason: you only catalog the item you have before you. In the ideal world, or better in the new digital world, the unit to be cataloged or described should always be the work, which we may call “story”. In other words: we should catalog units of content (“stories”) instead of, or supplementary to, physical items.
Current library practice is that we catalog books and journals in the catalog and offer article descriptions through subscribed article metadata databases separately.
So, back to the e-book. Where does that fit in? An e-book could be considered nothing more than a manifestation and/or an item belonging to a certain work/expression, because an e-book can be everything a printed book is. As such it is equivalent to a braille or audio book. Some libraries treat e-books as something different, as works/expressions as such. They catalog e-books separately, just like all other items/manifestations are treated as separate works. There are even separate e-book overviews.
But there is more to it than that. The big difference with books until now is that an e-book is not inseparably linked to the physical carrier. A printed book can only be read if the reader has a physical copy (a FRBR item) consisting of bound paper pages containing the text printed on them with ink. The same applies to handwritten texts, scrolls, clay tablets, etc.
Even more so, the physical form, together with economical conditions and possibilities for distribution, often determines the actual manifestation of a book and a journal. A book (or volume) can only contain a certain number of pages in order to be manageable. There is also a cost consideration in the size and distribution of the items.
What we call an e-book is actually only a digital, abstract manifestation of a work/expression. In order to be able to read it you have to download it in a specific format (PDF, epub, etc.) onto a physical carrier (USB-stick, computer disk, etc.), and then you need a physical reading device with dedicated software (dedicated e-book readers like Kindle, a computer, a mobile phone, etc.).
Libraries do not have e-books as items, only as manifestations. These e-book manifestations can be available on an online server somewhere in whatever form, and can be made into an item on-the-fly, using a specific format on-the-fly, choosing a physical carrier on-the-fly. What’s more, the content of e-books can also be selected out of several works/expressions on-the-fly, this way creating manifestations or even expressions on demand.
Now, is the FRBR conceptual model suited for describing e-books? If we treat e-books as manifestations without items (like we handle e-journals in our catalogs), how do we proceed? The FRBR Manifestation entity has, among others, these attributes:
- form of carrier
- extent of the carrier
- physical medium
- system requirements (electronic resource)
- file characteristics (electronic resource)
- mode of access (remote access electronic resource)
- access address (remote access electronic resource)
But we have just seen that in the case of e-books these are features of the items generated on-the-fly, which are not known before. Does this mean that we have to describe as manifestations all possible physical forms that one e-book can take? This would also mean that an e-book as such should be described on the level of a FRBR Expression. This may be correct in some cases (the creation of aggregated content on-the-fly), but not in all: where an e-book is similar to manifestations like braille, audio book, etc.
Does FRBR need an extra level? I am not sure. Let’s look briefly at how e-journals are handled. As far as I can see, journal and e-journal issues are described as separate manifestations of journals and e-journals (with a “part-of” relationship to the higher level). These issue manifestations are treated as aggregates that contain articles, that are also described as manifestations with a “part-of” relationship to the issue. In MARC21 this is handled by the 773 Host Item Entry tag.
I am not sure if and how different physical formats (PDF, HTML) for articles in e-journals are handled. The obvious difference with e-books is that the described unit is the article (or “story” as definition of unit of content), which can be downloaded as separate items. The e-journal articles are ideally also identified by unique identifiers (DOI’s).
What does this mean for e-books? I think we can treat an e-book as either an expression or a manifestation, depending on the nature of the specific e-book in question. For the e-book manifestation we would only need to register the mode of access, access address and manifestation identifier attributes, preferably in the form of a URI.
I also think we should use the possibilities of the FRBR model to start describing, cataloging and identifying the “stories” (chapters, articles, etc.) that make up books and e-books separately, as units of content in their own right. People are interested in the content, the “stories”, not the physical items or artificial digital aggregate units like e-books or e-journals.
In this sense, the “e-journal” is an archaic concept, where the limitations of the physical journal are translated as such to the digital world. There is no real need to bundle articles in electronic form into one electronic issue of an e-journal that is published at regular intervals in time. Electronic articles can be published individually immediately after peer review and approval. Published articles can be aggregated in one or more virtual online serials.
Like ISBN’s and ISSN’s we need an identifier for the units of content other than journal articles. As a matter of fact, there already is one, the DOI:
“A DOI name can be used to identify any resource involved in an intellectual property transaction. Intellectual property includes both physical and digital manifestations, performances and abstract works. An entity can be identified at any arbitrary level of granularity.” (see http://www.doi.org/faq.html#2). Thanks to Owen Stephens for pointing this out to me in a twitter discussion with Inga Overkamp.
I may be wrong about all this. I am open for comments and suggestions.
-
Just in time or just in case?
Posted on October 16th, 2009 7 comments
Metasearch vs. harvesting & indexing
The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet“, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.
A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal highly depends on the availability of the remote cataloging systems.
I took the opportunity to explain to my audience also the “issues” inherent in the concept of metasearch (or “federated search“, “distributed search“, etc.), and compare that to the harvesting & indexing scenario.
Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.
The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad, and H&I is good, that would be too easy. Some five years ago Metasearch was the best we had; it was tremendous progress beyond searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the OpenSource tools VuFind, SUMMA, Meresco, to name a few.
Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case“, used in logistics and inventory management.
With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
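The trade-off can be shown with a toy sketch. It is a simplified illustration under stated assumptions, not how MetaLib or any real harvesting tool works: the catalogues are plain lists of titles, and the function names (`metasearch`, `harvest`, `search_index`, `make_source`, `unavailable`) are invented for this example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote catalogue: a function from query to titles.
Source = Callable[[str], List[str]]


def make_source(records: List[str]) -> Source:
    """Wrap a list of titles as a searchable remote catalogue."""
    def search(query: str) -> List[str]:
        return [r for r in records if query.lower() in r.lower()]
    return search


def unavailable(query: str) -> List[str]:
    """A remote catalogue that cannot be reached at query time."""
    raise ConnectionError("remote catalogue is down")


def metasearch(query: str, sources: Dict[str, Source]) -> List[str]:
    """Just in time: ask every remote source at the moment of the search."""
    results: List[str] = []
    for search in sources.values():
        try:
            results.extend(search(query))
        except ConnectionError:
            continue  # a source that is down contributes nothing
    return results


def harvest(catalogues: Dict[str, List[str]]) -> List[str]:
    """Just in case: copy all records into one local index ahead of time."""
    return [record for records in catalogues.values() for record in records]


def search_index(query: str, index: List[str]) -> List[str]:
    """Search the local, previously harvested index."""
    return [r for r in index if query.lower() in r.lower()]


catalogue_a = ["Semantic Web primer", "History of the VOC"]
catalogue_b = ["Semantic Web for libraries"]

index = harvest({"a": catalogue_a, "b": catalogue_b})          # harvested earlier
catalogue_a_now = catalogue_a + ["Semantic Web in practice"]   # added after the harvest

live = metasearch("semantic web", {"a": make_source(catalogue_a_now), "b": unavailable})
local = search_index("semantic web", index)

print(live)   # includes the new record from A, misses everything from B
print(local)  # includes B, misses the record added after the harvest
```

In this sketch the live search loses catalogue B because it is down, while the harvested index still has B’s records but lacks the title added to A after the last harvest.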
Objectively of course, H&I can solve the problems inherent in Metasearch, and therefore is a superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested because of technical, legal or commercial reasons.
In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.
But I also believe that the real power of H&I can only be taken advantage of, if institutions cooperate and maintain shared central indexes, instead of building each their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.
We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ PrimoCentral, SerialSolutions’ Summon and EBSCOhost Integrated Search.
The funny thing is, that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.
-
Roadmaps to uncertainty
Posted on October 6th, 2009 4 comments
What will library staff do 5 years from now?

© Lukas Koster
I attended the IGeLU 2009 annual conference in Helsinki September 6-9. IGeLU is the International Group of Ex Libris Users, an independent organisation that represents Ex Libris customers. Just to state my position clearly I would like to add that I am a member of the IGeLU Steering Committee.
These annual user group meetings typically have three types of sessions: internal organisational sessions (product working groups and steering committee business meetings, elections), Ex Libris sessions (product updates, Q&A, strategic visions), and customer sessions (presentations of local solutions, addons, developments).
Not surprisingly, the main overall theme of this conference was the future of library systems and libraries. The word that characterises the conference best in my mind (besides “next generation” and “metaphor”) is “roadmap”. All Ex Libris products but also all attending libraries are on their way to something new, which strangely enough is still largely uncertain.

© Lukas Koster
Library paradise?
Ex Libris presented the latest state of design and development of their URM (Unified Resource Management) project, ‘A New Model for Next-generation Library Services’. In the final URM environment all back end functionality of all current Ex Libris products will be integrated into one big modular system, implemented in a SaaS (“Software as a Service“) architecture. In the Ex Libris vision the front end to this model will be their Primo Indexing and Discovery interface, but all URM modules will have open API’s to enable using them with other tools.
The goal of this roadmap apparently is efficiency in the areas of technical and functional system administration for libraries.
Intermediate generation
In the meantime, development of existing products is geared towards final inclusion in URM. All future upgrades will result in what I would like to call “intermediate” instead of “next generation” products. MetaLib, the metasearch or federated search tool, will be replaced by MetaLib Next Generation, with a re-designed metasearch engine and a Primo front end. The digital collection management tool DigiTool will be merged into its new and bigger nephew Rosetta, the digital preservation system. The database of the OpenUrl resolver SFX will be restructured to accommodate the URM datamodel. The next version of Verde (electronic resource management) will effectively be URM version 1, which will also be usable as an alternative for both ILS’es Voyager and Aleph.
Here we see a kind of “intermediate” roadmap to different “base camps” from where the travelers can try to reach their final destination.
© Lukas Koster
Holy cows!
From the perspective of library staff we see another panorama appearing.
In one of the customer presentations Janet Lute of Princeton University Library, one of the three (now four) URM development partners, mentioned a couple of “holy cows” or library tasks they might consider stopping doing while on their way to the new horizon:
- managing prediction patterns for journal issues
- checking in print serials
- maintaining lots of circulation matrices and policies
- collecting fines
- cataloging over 80% of bibliographic records
I would like to add my own holy cow MARC to this list, about which I have written a previous post Who needs MARC?. (Some other developments in this area are self service, approval plans, shared cataloging, digitisation, etc.)
This roadmap is supposed to lead to more efficient work and less pressure for acquisitions, cataloging and circulation staff.
Eldorado or Brave New World?
To summarise: we see a sketchy roadmap leading us via all kinds of optional intermediate stations to an as yet still vague and unclear Eldorado of scholarly information disclosure and discovery.
The majority of public and professional attention is focused on discovery: modern web 2.0 front ends to library collections, and the benefits for the libraries’ end users. But it is probably even more important to look at the other side, disclosure: the library back end, and the consequences of all these developments for library staff, both technically oriented system administrators and professionally oriented librarians.
Future efficient integrated and modular library systems will no doubt eliminate a lot of tasks performed by library staff, but does this mean there will be no more library jobs?
Will the university library of the future be “sparsely staffed, highly decentralized, and have a physical plant consisting of little more than special collections and study areas”, as was stated recently in an article in “Inside Higher Education”? I mentioned similar options in “No future for libraries?”.
Personally I expect that the two far ends of the library jobs spectrum will merge into a single generic job type which we can truly call “system librarian”, as I stated in my post “System librarians 2.0”. But what will these professionals do? Will they catalog? Will they configure systems? Will they serve the public? Will they develop system add-ons?
This largely depends on how the new integrated systems will be designed and implemented, how systems and databases from different vendors and providers will be able to interact, how much libraries/information management organisations will outsource and crowdsource, how much library staff is prepared to rethink existing workflows, how much libraries want to distinguish themselves from other organisations, how much end users are interested in differences between information management organisations; in brief: how much these new platforms will allow us to do ourselves.
We have to come up with a realistic image of ourselves for the next couple of decades soon, otherwise our publishers and system vendors will be doing it for us.
-
Explicit and implicit metadata
Posted on August 20th, 2009 15 comments
On August 17, after I tested a search in our new Aleph OPAC and mentioned my surprise on Twitter, the following discussion unfolded between me (lukask), Ed Summers of the Library of Congress and Till Kinstler of GBV (German Union Library Network):
- lukask: Just found out we only have one item about RDF in our catalogue: http://tinyurl.com/lz75c4
- edsu: @lukask broaden that search http://is.gd/2l6vB
- lukask: @edsu Ha! Thanks! But I’m sure that RDF will be mentioned in these 29 titles! A case for social tagging!
- edsu: @lukask or better cataloging
- edsu: @lukask i guess they both amount to the same thing eh?
- lukask: @edsu That’s an interesting position…”social tagging=better cataloging”. I will ask my cataloguing co-workers about this specific example
- edsu: @lukask make sure to wear body-armor
- lukask: @edsu Yes I know! I will bring it up at tomorrow’s party for the celebration of our ALEPH STP (after some drinks…)
- tillk: @edsu @lukask or fulltext search… SCNR…
- edsu: @tillk yeah, totally -- with projects like @googlebooks and @hathitrust we may look back on the age of cataloging with different eyes …
- lukask: @tillk @edsu Fulltext search yes, or “implicit automatic metadata generation”?
What happened here was:
- A problem with findability of specific bibliographic items was observed: although it is highly unlikely that books about the Semantic Web will not cover RDF-Resource Description Framework, none of the 29 titles found with “Semantic Web” could be found with the search term “Resource Description Framework“; on the other side, the only item found with “Resource Description Framework” was NOT found with “Semantic Web“. I must add that the “Semantic web” search was an “All words” search. Only 20 of the results were indexed with the Dutch subject heading “Semantisch web” (which term is never used in real life as far as I know; the English term is an international concept). Some results were off topic, they just happened to have “semantic” and “web” somewhere in their metadata. A better search would have been a phrase search (adjacent) with “semantic web” in actual quotes, which gives 26 items. But of these, a small number were not indexed with subject heading “Semantisch web“. Another note: searching with “RDF” gets you all kinds of results. Read more on the issue of searching and relevance in my post Relevance in context.
- Four possible solutions were suggested:
- social tagging
- better cataloging
- fulltext searching
- automatic metadata generation
Social tagging
Clearly, the 26 items found with the search “Semantic web” are not indexed by the “Resource description framework” or “RDF” subject heading. There is not even a subject heading for “Resource description framework” or “RDF“. In my personal view, from my personal context, this is an omission. Mind you, this is not only an issue in the catalogue of the Library of the University of Amsterdam, it is quite common. I tried it in the British Library Integrated Catalogue with similar results. Try it in your own OPAC!
I presume that our professional cataloging colleagues can’t know everything about all subjects. That is completely understandable. I would not know how to catalog a book about a medical subject myself either! But this is exactly the point. If you allow end users to add their own tags to your bibliographic records, you enhance the findability of these records for specific groups of end users.
I am not saying that cataloguing and indexing by library specialists using controlled vocabularies should be replaced by social tagging! No, not at all. I am just saying that both types of tagging/indexing are complementary. Sure, some of the tags added by end users may not follow cataloging standards, but who cares? Very often the end users adding tags of their own will be professional experts in their field. In any case, items with social tags will be found more often because specific end user groups can find them searching with their own terms.
Better cataloging
I suppose Ed Summers was trying to say the same thing as I just did above, when he commented “or better cataloging, I guess they both amount to the same thing eh?“, which I summarised as “social tagging=better cataloging“, but he can correct me if I’m wrong.
Anyway, I hope I made it clear that I would not say “social tagging=better cataloging“, but rather “controlled vocabularies+social tagging=better cataloging“.
Or alternatively, could we improve cataloging by professional library catalogers? I must admit I do not know enough about library training and practice to say something about that. I am not a trained librarian. Don’t hesitate to comment!
Fulltext searching
Is fulltext searching the miracle cure for findability problems, as Till Kinstler seems to suggest? Maybe.
Suppose all our print material was completely digitised and available for fulltext search, I have no doubt that all 26 items mentioned above (the results of the “semantic web” all words search) would be found with the “resource description framework” or “rdf” search as well. But because fulltext search is by its very nature an “all words” search, the “rdf” fulltext search would also give a lot of “noise”, or items not having any relation to “semantic web” at all (author’s initials “R.D.F”, other acronyms “R.D.F.”, just see RDF in the BL catalogue). Again, see my post Relevance in context for an explanation of searching without context.
Also, there will be books or articles about a subject that will not contain the actual subject term at all. With fulltext search these items will not be found.
Moreover, fulltext searching actually limits the findable items to text, excluding other types, like images, maps, video, audio etc.
This brings me to the “final solution”:
Automatic metadata generation
Of course this is mostly still wishful thinking. But there are a number of interesting implementations already.
What I mean when I say “(implicit) automatic metadata generation” is: metadata that is NOT created deliberately by humans, but either generated and assigned as static metadata, or generated on the fly, by software, applying intelligent analysis to objects, of all types (text, images, audio, video, etc.).
In the case of our “rdf” example, such a tool would analyse a text and assign “rdf” as a subject heading based on the content and context of this text, even if the term “rdf” does not appear in the text at all. It would also discard texts containing the string “rdf” that refer to something completely different. Of course for this to succeed there should be some kind of contextual environment with links to other records or even other systems to be able to determine if certain terminology is linked to frequently used terms not mentioned in the text itself (here the Linked Data developments could play a major role).
The same principle should also apply to non-textual objects, so that images, audio, video etc. about the same subject can be found in one go. Google has some interesting implementations in this field already: image search by colour and content type: see for example the search for “rdf” in Google Images with colour “red” and content type “clip art”.
But of course there is still a lot to be done.
-
Relevance in context
Posted on August 11th, 2009 9 comments
If you do a search in a bibliographic database, you should find what you need, not just what you are looking for, or what the database “thinks” you are looking for. If you find what you are looking for, then you will not be surprised and you will not discover anything new. And that’s not what you want, is it? But if you find things you did not look for but also do not need, you’re not just surprised, you are confused! And that’s not what you want either.
You want the results that are the most relevant for your search, with your specific objectives, at that specific point in time, for your specific circumstances, and you want them immediately.
So, how should search systems behave to make you find what you need? There are two conditions that need to be met:
- The search terms must be interpreted correctly
- The most relevant search results must be presented
The Problem
First of all, let’s take a look at current practice.
Search systems cannot cope with ambiguous search terms. My favorite example and test search term is “voc”. This can stand for a number of things in various disciplines: V.O.C. (Dutch: “Verenigde Oostindische Compagnie” or “Dutch United East Indies Company”) in historical databases; “vocals” in musical databases; “volatile organic compounds” in physics databases. So if you do a search for “voc” in a standard library catalogue, you get all kinds of results. Even more so if you use a metasearch or federated search tool for searching several databases simultaneously.
You are confused. You would like the system to “understand” which one of these concepts you are referring to instead of just using the literal string. You would like the system to take into account your context.
In most databases search results can be sorted or filtered by a number of fields, most commonly by year, title, author, and also by more specific fields in dedicated databases. But unless you are interested in a specific year, author or title, this will not do. Recently many systems have implemented “faceted” and “clustered” browsing of results, enabling “drilling down” on specific terms or subjects. This basically comes down to setting the context after the fact.
But after the system has interpreted your search terms, the results should also be ordered in a specific way: the ones you need most should be on top. This is where “relevance ranking” of search results comes in. Most catalogues and databases use a system-specific default relevance ranking algorithm. Search results are assigned a rank based on a number of criteria that can differ between databases, depending on the nature of the database.
Some databases just present the most recent results on top. For medical and physical sciences this may be right, but for history and literature databases this may just be wrong.
Sometimes the search terms are taken into account: the number of times the given search terms are present in the result fields is important, but also the specific fields in which search terms appear. The appearance of search terms in “Title” and “Subject” may rank higher than in “Abstract” or “Publisher”. Moreover, the search indexes used can have a major influence on rank: if you search for “Subject” = “flu”, then results with “flu” as subject will be ranked higher than results with “flu” in the title only.
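As a rough illustration of this kind of field-based ranking, here is a minimal Python sketch. The weights, the boost factor and the sample records are assumptions made up for the example; no particular catalogue system is known to use these numbers.

```python
# Hypothetical field weights: "Title" and "Subject" count for more than
# "Abstract" or "Publisher". The actual values are invented for this sketch.
FIELD_WEIGHTS = {"title": 3.0, "subject": 3.0, "abstract": 1.0, "publisher": 0.5}


def score(record, term, searched_field=None, boost=2.0):
    """Sum the weighted number of times `term` occurs in each field.

    If the user searched a specific index (e.g. "subject"), hits in that
    field are multiplied by `boost` (an assumed factor).
    """
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        count = record.get(field, "").lower().split().count(term)
        if field == searched_field:
            weight *= boost
        total += weight * count
    return total


def rank(records, term, searched_field=None):
    """Return the records ordered by descending score."""
    return sorted(records, key=lambda r: score(r, term, searched_field), reverse=True)


records = [
    {"id": "a", "title": "Flu season", "subject": "epidemiology", "abstract": "..."},
    {"id": "b", "title": "Winter illnesses", "subject": "flu", "abstract": "flu and colds"},
]

# Searching the "subject" index puts the record with "flu" as subject on top.
ranked = rank(records, "flu", searched_field="subject")
```

Note that nothing in this score knows which meaning of a term the user intends, which is exactly the limitation discussed next.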
To come back to my example, with ambiguous search terms like “voc” this type of relevance ranking will definitely not be enough, because results from the three different conceptual areas will be completely mixed up.
When searching with a metasearch or federated search tool things get even more complicated. Each of the remote databases that you search in has its own default way of ranking. Usually the metasearch tool fetches the first 30 or so results from each remote database (one set sorted by date, the other by internal rank, the next by title), merges these into one list and then applies its own local ranking mechanism to this subset only. Confusion! And I did not even mention searching databases with metadata in multiple languages. Moreover, databases containing only metadata will produce different results and relevance than databases with full text articles. There is absolutely no way of telling if you actually have the most relevant results for your situation.
Again, with relevance ranking search systems do not take into account the context either. You could say it is an introverted, internally focused way of ranking, the confusing results of which are multiplied in case of metasearching.
Most metasearch tools give users the option of searching in sets of pre-selected databases, based on subject or type. This way you can limit your search to those databases that are known to have data about that specific subject. You more or less set the context in advance. But this mechanism only eliminates results from databases that probably do not have data on your subject at all, so they would not have shown up in the results anyway. Moreover, the same issues that were discussed above apply to this limited set of databases.
The metasearch tool that I know best (MetaLib) offers the option of setting a relative rank per database, so results from databases with a higher rank will have a higher relevance in merged result sets. But this is a system wide option, set by system administration, so it is not taking into account any context at all. It would be better if you could make the relative database rank dependent on the set or subject the search is done from (for instance: if a history database is searched in the context of a “History” set, the results get a higher rank than in a search from a “Music” set).
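The idea of a set-dependent database rank could look something like the following Python sketch. It is not how MetaLib works; the database names, the boost table and the assumption that local scores are already normalised to a comparable scale are all hypothetical.

```python
# Hypothetical boost per (search set, database). In this sketch a database
# counts double when it is searched from "its own" subject set.
DB_BOOST = {
    "History": {"history_db": 2.0, "music_db": 1.0},
    "Music": {"history_db": 1.0, "music_db": 2.0},
}


def merge(results, search_set):
    """Merge results from several databases into one ranked list.

    `results` is a list of (database, record_id, local_score) tuples, with
    local scores assumed to be on a comparable scale. Returns the record
    ids, best first.
    """
    boosts = DB_BOOST.get(search_set, {})
    weighted = [(local * boosts.get(db, 1.0), rec) for db, rec, local in results]
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in weighted]


results = [
    ("history_db", "voc-trade", 0.6),
    ("music_db", "voc-vocals", 0.7),
]

from_history = merge(results, "History")  # the history record moves to the top
from_music = merge(results, "Music")      # the music record stays on top
```

The same two results come back in a different order depending on the set the search was started from, which is the context dependence the system-wide setting lacks.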
The best solution for this “internal” relevance problem regarding distributed databases is a central database of harvested indexes. In this case all harvested metadata is normalised and ranked in a uniform way, and users do not have to select databases in advance. But these systems still do not take into account “external” relevance: there is no context!
A very interesting and intelligent solution for the problem of pre-selecting databases is provided in PurpleSearch, the integrated front end to MetaLib (among other things), developed by the Library of the University of Groningen. The system records which databases actually produce results for specific search terms. As soon as the user enters search terms in the single search box, the system knows which databases will have results, and the search is automatically carried out in these databases, without asking the user to select the databases or subject area he or she wants to search in. Simultaneously a background search in all other databases is performed in order to check additional new results, and the information about results in databases is updated.
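To make the mechanism concrete, here is a small Python sketch of the general idea of remembering which databases returned hits for a term. It is only an illustration based on the description above, not PurpleSearch's actual code; the class, method and database names are invented.

```python
from collections import defaultdict


class HitLog:
    """Remembers which databases produced results for which search term."""

    def __init__(self):
        self._hits = defaultdict(set)  # term -> databases that had results

    def record(self, term, database, result_count):
        """Update the log after a search in one database has finished."""
        key = term.lower()
        if result_count > 0:
            self._hits[key].add(database)
        else:
            self._hits[key].discard(database)

    def plan(self, term, all_databases):
        """Split the databases into (search now, check in the background)."""
        known = self._hits.get(term.lower(), set())
        foreground = [db for db in all_databases if db in known]
        background = [db for db in all_databases if db not in known]
        if not foreground:
            # Nothing is known about this term yet: search everything up front.
            return list(all_databases), []
        return foreground, background


log = HitLog()
databases = ["history", "music", "chemistry"]

first_plan = log.plan("voc", databases)   # unknown term: search all databases
log.record("VOC", "history", 12)
log.record("voc", "music", 0)
second_plan = log.plan("voc", databases)  # now only "history" is searched up front
```

The background list is what keeps the log up to date: a database that starts returning results for a term will move to the foreground on a later search.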
Of course, all other usual options are available as well, like pre-selecting databases (setting context in advance) and faceted results drilling down (setting context after the fact). But again, no external contextual settings.
Search "voc" in PurpleSearch
Conclusion: the only way to find what you need is to make search systems take into account the context in which the search is done, both for searching and for relevance ranking.
Solutions
Now, let’s have a look at a couple of conditions that would make contextual searches possible.
Personal context: a system should “know” about your personal interests, field of study, job situation, age, etc. so it can “decide” which databases to search in and which results are the most relevant for you. Some systems, like university systems, have access to information about their users. Once you log in, the system potentially knows which subjects you are studying or teaching and could use this information for setting the context for searching and ranking.
But what if you are a student in Law AND Social Sciences: which subject area should the system choose? Or: if you are a History teacher, and you have a personal interest in Ecology, which the system does not know about, what then? Somehow you still need to set context yourself.
Some systems also offer the opportunity of setting personal preferences, like: area of interest, specific databases, type of material (only digital or print), only recent material, etc. Again: you must be able to deviate from these preferences, depending on your situation, which means setting context manually.
Different search systems will have different user profiles (user data and preferences). It would be nice if search systems could take advantage of universal external personal profiles (like Google Profiles for instance) using some kind of universal API.
Situational context: a system should also “know” about the situation you are in, both in a functional sense and in a physical sense.
Functional context means: which role are you playing? Are you in your law student role or in your social sciences student role? Are you in your professional role or in your private role? But also: to which resources do you have access?
An interesting idea: if you work Monday to Friday during office hours, study in the evenings and spend time on your personal interests on the weekends, it would be nice if you could link times of day and days of the week to your different roles, so search systems could use the correct context for your searches depending on time and date: “if it’s Tuesday evening then use study profile and search in ‘History’; if it’s Sunday, use private profile and search in ‘Ecology’”.
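A minimal Python sketch of such a schedule could look like this. The exact hours, the profile names and the choice to return nothing outside the listed periods are assumptions for the example; a real system would let the user define and override them.

```python
from datetime import datetime


def active_profile(when):
    """Return (profile, subject set) for a moment in time, or (None, None).

    Assumed schedule: weekends are private time, weekdays 9-17 are work,
    weekday evenings from 18:00 are study time.
    """
    is_weekend = when.weekday() >= 5  # Monday is 0, Sunday is 6
    if is_weekend:
        return ("private", "Ecology")
    if 9 <= when.hour < 17:
        return ("work", None)  # no default subject set assumed for work
    if when.hour >= 18:
        return ("study", "History")
    return (None, None)  # no rule matches: let the user set the context


tuesday_evening = active_profile(datetime(2009, 8, 11, 20, 0))
sunday_morning = active_profile(datetime(2009, 8, 16, 11, 0))
```

The fallback matters as much as the rules: whenever no rule applies, or the user is doing something out of pattern, the context still has to be set by hand.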
This temporal context was also referred to by Till Kinstler in a (German) blog post about the new “Suchkiste” search system prototype of the German Union Library Network (GBV): ‘the search for “Charlie Brown” in October should result in “It’s the Great Pumpkin, Charlie Brown” at number 1, and in December in “A Charlie Brown Christmas”’.
Physical context means: where are you? It would be nice if a library catalogue search system would take into account your actual location, so it could show you the records of the copies of the FRBR-ized results available in the library locations nearest to you (this idea came up in a recent Twitter discussion between @librarianbe and @gbierens). This is what Worldcat does when you supply it with your location manually. In Worldcat this is a static preference. But it would be nice if it would respond to your actual location, for instance by using the GPS coordinates of your mobile phone. Alternatively, search systems could derive your location from the IP address you are sending your search from.
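Ordering copies by distance from the user is straightforward once coordinates are available. The Python sketch below uses the haversine great-circle formula; the branch names and approximate coordinates are illustrative only, and how the user's position is obtained (GPS, IP lookup) is left out.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_first(holdings, user_position):
    """Sort the copies of a work by distance from the user."""
    return sorted(holdings, key=lambda copy: distance_km(user_position, copy["position"]))


# Hypothetical holdings of one work in two library locations.
holdings = [
    {"branch": "Groningen", "position": (53.22, 6.57)},
    {"branch": "Amsterdam", "position": (52.37, 4.90)},
]
user_position = (52.09, 5.12)  # roughly Utrecht

ordered = nearest_first(holdings, user_position)  # Amsterdam comes first
```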
This information could also be used to determine if records for digital or physical copies should be ranked the most relevant in this case. If you are inside the library building and you have a preference for physical books and journals, then records for available print copies should be on top of the results list. If you are at home, then records for digital copies that you have access to should come first.
Contextual searching and ranking should always be a combination of all possible conditions, personal, situational and internal system ones.
Of course it goes without saying that it would be great if metasearch tools were able to convey the search context to the remote databases and get contextual results back, using some kind of universal search context API!
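The print-versus-digital rule can be sketched in a few lines of Python. The record layout (a `format` field and a `usable` flag meaning "print copy available" or "digital copy you have access to") is an assumption for the example.

```python
def order_by_location(records, inside_library, prefers_print=True):
    """Move usable copies in the preferred format to the top of the list.

    The sort is stable, so the existing relevance order is kept within the
    preferred group and within the rest.
    """
    preferred = "print" if (inside_library and prefers_print) else "digital"

    def key(record):
        return 0 if (record["format"] == preferred and record["usable"]) else 1

    return sorted(records, key=key)


# Hypothetical result list, already in relevance order.
records = [
    {"id": "d1", "format": "digital", "usable": True},
    {"id": "p1", "format": "print", "usable": True},
    {"id": "p2", "format": "print", "usable": False},
]

in_library = order_by_location(records, inside_library=True)
at_home = order_by_location(records, inside_library=False)
```

Because the location rule only reorders an already ranked list, it combines naturally with the other conditions: personal and internal ranking come first, and the situational rule is applied on top.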
Last but not least, each search system should show the context of the search, and explain how it got to the results in the presented order. Something like: based on your personal preferences, the time of day and day of the week, and your location, the search was done in these databases, with this subject area, and the physical copies of the nearest location are shown on top.
This context area on the results screen could then be used as a kind of inverted faceted search, drilling “up” to a broader level or “sideways” to another context.