Posted on January 5th, 2012
2011 has in a sense been the year of library linked data. Not that libraries of all kinds are now publishing and consuming linked data in great numbers. No. But we have witnessed the publication of the final report of the W3C Library Linked Data Incubator Group, the Library of Congress announcement of the new Bibliographic Framework for the Digital Age based on Linked Data and RDF, the release by a number of large libraries and library consortia of their bibliographic metadata, and many publications, sessions and presentations on the subject.
All these events focus mainly on publishing library bibliographic metadata as linked open data. Personally I am not convinced that this is the most interesting type of data that libraries can provide. Bibliographic metadata as such describe publications, in the broadest sense, providing information about title, authors, subjects, editions, dates, urls, but also physical attributes like dimensions, number of pages, formats, etc. This type of information, in FRBR terms: Work, Expression and Manifestation metadata, is typically shared among a large number of libraries, publishers, booksellers, etc. ‘Shared’ in this case means ‘multiplied and redundantly stored in many different local systems’. It doesn’t really make sense if all libraries in the world publish identical metadata side by side, does it?
In essence only really unique data is worth publishing. You link to the rest.
Currently, library data that is really unique and interesting is administrative information about holdings and circulation. After finding metadata about a potentially relevant publication, it is very useful to know how and where to get access to it, if it’s not freely available online. Do you need to go to a specific library location to get the physical item, or to have access to the online article? Do you have to be affiliated to a specific institution to be entitled to borrow or access it?
Usage data about publications, both print and digital, can be very useful in establishing relevance and impact. This way information seekers can be supported in finding the best possible publications for their specific circumstances. There are some interesting projects dealing with circulation data already, such as the research project by Magnus Pfeffer and Kai Eckert as presented at the SWIB 11 conference, and the JISC funded Library Impact Data project at the University of Huddersfield. The Ex Libris bX service presents article recommendations based on SFX usage log analysis.
The consequence of this assertion is that if libraries want to publish linked open data, they should focus on holdings and circulation data, and for the rest link to available bibliographic metadata as much as possible. It is to be expected that the Library of Congress’ New Bibliographic Framework will take care of that part one way or another.
In order to achieve this, libraries should join forces with each other and with publishers and aggregators to put their efforts into establishing shared global bibliographic metadata pools accessible through linked open data. We can think of already existing data sources like WorldCat, OpenLibrary, Summon, Primo Central and the like. We can only hope that commercial bibliographic metadata aggregators like OCLC, SerialsSolutions and Ex Libris will come to realise that it’s in everybody’s interest to contribute to the realisation of the new Bibliographic Framework. The recent disagreement between OCLC and the Swedish National Library seems to indicate that this may take some time. For a detailed analysis of this see the blog post ‘Can linked library data disrupt OCLC? Part one’.
An interesting initiative in this respect is LibraryCloud, an open, multi-library data service that aggregates and delivers library metadata. And there is the HBZ LOBID project, which is targeted at ‘the conversion of existing bibliographic data and associated data to Linked Open Data’.
So what would the new bibliographic framework look like? If we take the FRBR model as a starting point, the new framework could look something like this. See also my slideshow “Linked Open Data for libraries”, slides 39-42.
The basic metadata about a publication or a unit of content, on the FRBR Work level, would be an entry in a global datastore identified by a URI (Uniform Resource Identifier). This datastore could for instance be WorldCat, or OpenLibrary, or even a publisher’s datastore. It doesn’t really matter. We don’t even have to assume it’s only one central datastore that contains all Work entries.
The thing identified by the URI would have a text string field associated with it containing the original title, let’s say “The Da Vinci Code” as an example of a book. Articles can and should be identified this way too. The basic information we need to know about the Work would be attached to it using URIs to other things in the linked data web. Two things linked by a property form a ‘triple’: subject, predicate and object. ‘Author’ could for instance be a link to OCLC’s VIAF (http://viaf.org/viaf/102403515 = Dan Brown), which together with the Work URI would constitute a triple. If there are more authors, you simply add a URI for every person or institution. Subjects could be links to DBPedia/Wikipedia, Freebase, the Library of Congress Authority files, etc. There could be some more basic information, maybe a year, or a URI to a source describing the background of the work.
At the Expression level, a Dutch translation would have its own URI, stored in the same or another datastore. I could imagine that the publisher who commissioned the translation would maintain a datastore with this information. Attached to the Expression there would be the URI of the original Work, a URI pointing to the language, a URI identifying the translator and a text string containing the Dutch title, among others.
Every individual edition of the work could have its own Manifestation level URI, with a link to the Expression (in this case the Dutch translation), a publisher URI, a year, etc. For articles published according to the long-standing tradition of peer reviewed journals, there would also be information about the journal. On this level there should also be URIs to the actual content when dealing with digital objects like articles, ebooks, etc., no matter if access is free or restricted.
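The three levels above can be sketched as a small set of triples. This is a minimal illustration, not an implementation: the triples are plain Python tuples rather than RDF, all URIs except the VIAF one are hypothetical placeholders, and the predicate names are made up for the example.

```python
# Work, Expression and Manifestation as (subject, predicate, object) tuples.
# Only the VIAF URI is real; everything else is a hypothetical placeholder.
WORK = "http://example.org/work/davinci-code"
EXPRESSION = "http://example.org/expr/davinci-code-nl"   # Dutch translation
MANIFESTATION = "http://example.org/manif/davinci-code-nl-2004"

triples = [
    (WORK, "title", "The Da Vinci Code"),
    (WORK, "author", "http://viaf.org/viaf/102403515"),  # Dan Brown in VIAF
    (EXPRESSION, "realizationOf", WORK),
    (EXPRESSION, "language", "dut"),
    (EXPRESSION, "title", "De Da Vinci Code"),
    (MANIFESTATION, "embodimentOf", EXPRESSION),
]

def work_of(manifestation, triples):
    """Follow the links from a Manifestation back to the original Work."""
    expr = next(o for s, p, o in triples
                if s == manifestation and p == "embodimentOf")
    return next(o for s, p, o in triples
                if s == expr and p == "realizationOf")

print(work_of(MANIFESTATION, triples))
```

The point of the sketch is that each level only needs to store a link to the level above it; any consumer can walk the chain from an edition to the Work, wherever the three datastores happen to live.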
So far we have everything we need to know about publications “in the cloud”, or better: in a number of datastores available on a number of servers connected to the world wide web. This is more or less the situation described by OCLC’s Lorcan Dempsey in his recent post ‘Linking not typing … knowledge organization at the network level’. The only thing we need now is software to present all linked information to the user.
No libraries in sight yet. For accessing freely available digital content on the web you actually don’t need a library, unless you need professional assistance finding the correct and relevant information. Here we have identified a possible role of librarians in this new networked information model.
Now we have reached the interesting part: how to link local library data to this global shared model? We immediately discover that the original FRBR model is inadequate in this networked environment, because it implies a specific local library situation. Individual copies of a work (the Items) are directly linked to the Manifestation, because FRBR refers to the old local catalogue which describes only the works/publications one library actually owns.
In the global shared library linked data network we need an extra explicit level to link physical Items owned by the library, or the library’s online subscriptions, to the appropriate shared network level. I suggest using a “Holding” level. A Holding would have its own URI and contain the URIs of the Manifestation and of the Library. A specific Holding would in this way indicate that a specific library has one or more copies (Items) of a specific edition of a work (Manifestation), or offers access to an online digital article by way of a subscription.
If a Holding refers to physical copies (print books or journal issues for instance) then we also need the Item level. An Item would have its own URI and the URI of the Holding. For each Item, extra information can be provided, for instance ‘availability’, ‘location’, etc. Local circulation administration data can be registered for all Holdings and Items. For online digital content we don’t need Items, only subscription information directly attached to the Holding.
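The proposed Holding and Item levels could look something like the records below. This is a sketch only: the field names, URIs and the idea of a `subscription` flag are my assumptions, not an established vocabulary.

```python
from dataclasses import dataclass

# Illustrative sketch of the proposed Holding and Item levels.
# All URIs and field names here are invented for the example.

@dataclass
class Holding:
    uri: str
    manifestation_uri: str      # link up to the shared network level
    library_uri: str
    subscription: bool = False  # True for licensed online content (no Items)

@dataclass
class Item:
    uri: str
    holding_uri: str            # link up to the local Holding
    location: str
    available: bool = True

holding = Holding(
    uri="http://library.example.org/holding/1",
    manifestation_uri="http://example.org/manif/123",
    library_uri="http://library.example.org/",
)
items = [
    Item("http://library.example.org/item/1", holding.uri, "Main stacks"),
    Item("http://library.example.org/item/2", holding.uri, "Branch A",
         available=False),
]

# A catalogue front end would list the currently available copies:
available = [i.location for i in items
             if i.holding_uri == holding.uri and i.available]
print(available)  # → ['Main stacks']
```

Note that the Holding carries no bibliographic data at all: everything descriptive lives behind the `manifestation_uri`, which is exactly the division of labour the model argues for.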
Local Holding and Item information can reside on local servers within the library’s domain or just as well on some external server ‘in the cloud’.
It’s on the level of the Holding that usage statistics per library can be collected and aggregated, both for physical items and for digital material.
Now, this networked linked library data model still allows libraries to present a local traditional catalogue type interface, showing only information about the library’s own print and digital holdings. What’s needed is software to do this using the local Holdings as entry level.
But the nice thing about the model is that there will also be a lot of other options. It will also be possible to start at the other end and search all bibliographic metadata available in the shared global network, and then find the most appropriate library to get access to a specific publication, much like WorldCat does, but on an even larger scale.
Another nice thing of using triples, URIs and linked data, is that it allows for adding all kinds of other, non-traditional bibliographic links to the old inward looking library world, making it into a flexible and open model, ready for future developments. It will for instance be possible for people to discover links to publications and library holdings from any other location on the web, for instance a Wikipedia page or a museum website. And the other way around, from an item in local library holdings to let’s say a recorded theatre performance on YouTube.
When this new data and metadata framework is in place, there will be two important issues to solve:
- Getting new software, systems and tools for both back end administrative functions and front end information finding needs. For this we need efforts from traditional library systems vendors but also from developers in libraries.
- Establishing future roles for libraries, librarians and information professionals in the new framework. This may turn out to be the most important issue.
Posted on September 2nd, 2011
Shifting focus from information carriers back to information
Library catalogues have traditionally been used to describe and register books and journals and other physical objects that together constitute the holdings of a library. In an integrated library system (ILS), the public catalogue is combined with acquisition and circulation modules to administer the purchases of book copies and journal subscriptions on one side, and the loans to customers on the other side. The “I” for “Integrated” in ILS stands for an internal integration of traditional library workflows. Integration from a back end view, not from a customer perspective.
Because of the very nature of such a catalogue, namely the description of physical objects and the administration of processing them, there are no explicit relations between the different editions and translations of the same book, nor are there descriptions of individual journal articles. If you do a search on a specific person’s name, you may end up with a large number of result records, written by that person or someone with a similar name, or about that person, even with identical titles, without knowing if there is a relationship between them, and what that relationship might be. What’s certain is that you will not find journal articles written by or about that person. The same applies to a search on title. There is no way of telling if there is any relation between identical titles. A library catalogue user would have to look at specific metadata in the records (like MARC 76X-78X – Linking Entries, 534 – Original Version Note or 580 – Linking Entry Complexity Note), if available, to reach their own conclusions.
Most libraries nowadays also purchase electronic versions of books and journals (ebooks and ejournals) and have free or paid subscriptions to online databases. Sometimes these digital items (ebooks, ejournals and databases) are also entered into the traditional library catalogues, but they are sometimes also made available through other library systems, like federated search tools, integrated discovery tools, A-Z lists, etc. All kinds of combinations occur.
In traditional library catalogues digital items are treated exactly the same as their physical counterparts. They are all isolated individual items without relations. As Karen Coyle put it November 2010 at the SWIB10 conference: “The main goal of cataloguing today is to keep things apart”.
Basically, integrated library systems and traditional catalogues are nothing more than inventory and logistics systems for physical objects, mainly focused on internal workflows. Unfortunately in newer end user interfaces like federated search and integrated discovery tools the user experience in this respect has in general been similar to that of traditional public catalogues.
At some point in time during the rise of electronic online catalogues, apparently the lack of relations between different versions of the same original work became a problem. I’m not sure if it was library customers or librarians who started feeling the need to see these implicit connections made explicit. The fact is that IFLA (International Federation of Library Associations) started developing FRBR in 1998.
FRBR (Functional Requirements for Bibliographic Records) is an attempt to provide a model for describing the relations between physical publications, editions, copies and their common denominator, the Work.
FRBR Group 1 describes publications in terms of the entities Work, Expression, Manifestation and Item (WEMI).
FRAD (Functional Requirements for Authority Data – ‘authors’) and FRSAD (Functional Requirements for Subject Authority Data – ‘subjects’) have been developed later on as alternatives for the FRBR Group 2 and 3 entities.
As an example let’s have a look at The Diary of Anne Frank. The original handwritten diary may be regarded as the Work. There are numerous adaptations and translations (Expressions) of the original unfinished and unedited Work. Each of these Expressions can be published in the form of one or more prints, editions, etc. These are the Manifestations, especially if they have different ISBNs. Finally a library can have one or more physical copies of a Manifestation, the Items.
Some might even say the actual physical diary is the only existing Item embodying one specific (the first) Expression of the Work (Anne’s thoughts) and/or the only Manifestation of that Expression.
Of course, this model, if implemented, would be an enormous improvement over the old public catalogue situation. It makes it possible for library customers to get an automatic overview of all editions, translations and adaptations of one specific original work through the mechanism of Expressions and Manifestations. RDA (Resource Description and Access) does exactly this.
However there are some significant drawbacks, because the FRBR model is an old model, based on the traditional way of library cataloguing of physical items (books, journals, CDs, DVDs, etc.) (Karen Coyle at SWIB10).
- In the first place the FRBR model only shows the Works and related Manifestations and Expressions of physical copies (Items) that the library in question owns. Editions not in the possession of the library are ignored. This would be a bit different in a union catalogue of course, but then the model still only describes the holdings of the participating libraries.
- Secondly, the focus on physical copies is also the reason that the original FRBR model does not have a place for journal titles as such, only for journal issues. So there will be as many entries for one journal as the library has issues of it.
- Thirdly, it’s a hierarchical model, which incorporates only relations from the Work top down. There is no room for relations like: ‘similar works’, ‘other material on the same subject’, ‘influenced by’, etc.
- In the fourth place, FRBR still does not look at content. It is document centric, instead of information centric. It does however have the option for describing parts of a Work, if they are considered separate entities/works, like journal articles or volumes of a trilogy.
- Finally, the FRBR Item entity is only interesting in a storage and logistics environment for physical copies, such as the Circulation function in libraries, or the Sales function in bookstores. It has no relation to content whatsoever.
FRBR definitely is a positive and necessary development, but it is just not good enough. Basically it still focuses on information carriers instead of information (it’s a set of rules for managing Bibliographic Records, not for describing Information). It is an introverted view of the world. This was OK as long as it was dictated by the prevailing technological, economical and social conditions.
In a new networked digital information world libraries should shift their focus back to their original objective: being gateways to information as such. This entails replacing an introverted hierarchical model with an extroverted networked one, and moving away from describing static information aggregates in favour of units of content as primary objects.
The linked data concept provides the framework of such a networked model. In this model anything can be related to anything, with explicit declarations of the nature of the relationship. In the example of the Diary of Anne Frank one could identify relations with movies and theater plays that are based on the diary, with people connected to the diary or with the background of World War 2, antisemitism, Amsterdam, etc.
In traditional library catalogues defining relations with movies or theater plays is not possible from the description of the book. They could however be entered as a textual reference in the description of a movie, if for instance a DVD of that movie is catalogued. Relations to people, World War 2, antisemitism and Amsterdam would be described as textual or coded references to a short concept description, which in turn could provide lists of other catalogue items indexed with these subjects.
In a networked linked data model these links could connect to information entities in their own right outside the local catalogue, containing descriptions and other material about the subject, and providing links to other related information entities.
FRBR would still be a valuable part of such a universal networked model, as a subset for a specific purpose. In the context of physical information carriers it is a useful model, although with some missing features, as described above. It could be used in isolation, as originally designed, but if it’s an open model, it would also provide the missing links and options to describe and find related information.
Also, the FRBR model is essential as a minimal condition for enabling links from library catalogue items to other entity types through the Work common denominator.
In a completely digital information environment, the model could be simplified by getting rid of the Item entity. Nobody needs to keep track of available copies of online digital information, unless publishers want to enforce the old business models they have been using in order to keep making a profit. Ebooks for instance are essentially Expressions or Manifestations, depending on their nature, as I stated in my post ‘Is an e-book a book?’.
The FRBR model can be used and is used also in other subject areas, like music, theater performances, etc. The Work – Expression – Manifestation – Item hierarchy is applicable to a number of creative professions.
The networked model provides the option of describing all traditional library objects, but also other and new ones and even objects that currently don’t exist, because it is an open and adaptable model.
In the traditional library models it is for instance impossible, or at least very hard, to describe a story that continues through all volumes of a trilogy as a central thread, apart from and related to the descriptions of the three separate physical books and their own stories. In the Millennium trilogy by Stieg Larsson, Lisbeth Salander’s life story is the central thread, but it can’t be described as a separate “Work” in MARC/FRBR/RDA because it is not the main subject of one physical content carrier (unless we are dealing with an edition in one physical multi part volume). The three volumes will be described with the subjects ‘Missing girl mystery’, ‘Sex trafficking’ and ‘Illegal secret service unit’ respectively.
In an open networked information model on the contrary it would be entirely possible to describe such a ‘roaming story’.
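Such a ‘roaming story’ could be modelled as a node in its own right, linked to each volume it runs through. Again a sketch only: the URIs and the `runsThrough` predicate are invented for illustration.

```python
# The cross-volume story as its own node, linked to each volume.
# All URIs and predicate names here are hypothetical.
story = "http://example.org/story/lisbeth-salander"
volumes = [
    "http://example.org/work/the-girl-with-the-dragon-tattoo",
    "http://example.org/work/the-girl-who-played-with-fire",
    "http://example.org/work/the-girl-who-kicked-the-hornets-nest",
]

triples = [(story, "label", "Lisbeth Salander's life story")]
triples += [(story, "runsThrough", v) for v in volumes]

# Starting from any single volume, the story node is reachable,
# and through it the other two volumes:
stories = [s for s, p, o in triples
           if p == "runsThrough" and o == volumes[1]]
print(stories)
```

Because the story is just another node with its own URI, nothing stops anyone else (a fan site, Wikipedia, a film database) from linking to it as well, which is precisely what the hierarchical record-per-carrier model cannot express.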
New forms of information objects could appear in the form of new types of aggregates, other than books or journal articles, for instance consisting of text, images, statistics and video, optionally of a flexible nature (dynamic instead of static information objects).
Existing library systems (ILS’s and Integrated Discovery tools alike), using bibliographic metadata formats and frameworks like MARC, FRBR and RDA, can’t easily deal with new developments without some sort of workaround. Obviously this means that if libraries want to continue playing a role in the information gateway world, they need completely different systems and technology. Library system vendors should take note of this.
Finally, instead of only describing information objects, libraries could take up a new role in creating new objects, in the form of subject based virtual information aggregates, like for instance the Anne Frank Timeline, or Qwiki. This would put libraries back in the center of the information access business.
Posted on December 21st, 2010
Mobile services have to fulfill information needs here and now
Like many other libraries, the Library of the University of Amsterdam released a mobile web app this year. For background information about why and how we did it, have a look at the slideshow my colleague Roxana Popistasu and I gave at the IGeLU 2010 conference.
For now I want to have a closer look at the actual reception and use of our mobile library services and draw some conclusions for the future. I have expressed some expectations earlier about mobile library services in my post “Mobile library services”. In summary, I expected that the most valued mobile library services would be of a practical nature, directly tied to the circumstances of internet access ‘any time, anywhere’, and would not include reading and processing of electronic texts.
Let me emphasise that I define mobile devices as smart phones and similar small devices that can be carried around literally any time anywhere, and that need dedicated apps to be used on a small touchscreen. So I am not talking about tablets like the iPad, which are large enough to be used with standard applications and websites, just like netbooks.
Most, if not all, of the services in the Library of the University of Amsterdam mobile app are of a practical nature: opening hours, locations, contact information, news. And of course there is a mobile catalogue. This is the general situation in mobile library land, as has been described by Aaron Tay in his blog post “What are mobile friendly library sites offering? A survey”.
In my view these practical services are not really library services. They are learning or study centre services at best. There is no difference with practical services offered by other organisations like municipal authorities or supermarkets. Nothing wrong with that of course, they are very useful, but I don’t consider these services to be core library services, which would involve enabling access to content.
Real mobile devices are simply too small to be used for reading and processing large bodies of scholarly text. This might be different for public libraries. Their customers may appreciate being able to read fiction on their smart phones, provided that publishers allow them to read ebooks via libraries at all.
Even a mobile library catalogue can be considered a practical service intended to fulfill practical needs of a physical nature, like finding and requesting print books and journals to be delivered to a specific location and renewing loans to avoid paying fines. Let’s face it: an Integrated Library System is basically nothing more than an inventory and logistics management system for physical objects.
Usage statistics of the Library of the University of Amsterdam mobile web app show that between the launch in April and November 2010 the number of unique visits hovered around 30 per day on average, with a couple of peaks (350) on two specific days in October. The full website shows around 6000 visits per day on normal weekdays.
For the mobile catalogue this is between 30 and 50 visits per day. The full OPAC shows around 3000 visits on normal weekdays.
In November we see a huge increase in usage. Our killer mobile app was introduced: an overview of currently available workstations per location. The number of unique visits rises to between 300 and 400 a day. The number of pageviews rises from under 100 per day to around 1000 on weekdays in November. The ‘available workstations’ service accounts for 80% of these. In December 2010, an exam period, these figures rise to around 2000 pageviews per day, with 90% for the ‘available workstations’ service.
We can safely conclude that our students are mainly using our mobile library app on their smart phones to locate the nearest available desktop PC.
Mobile users expect services that are useful to them here and now.
What does this mean for core library services, aimed at giving access to content, on small mobile devices? I think that there is no future for providing mobile access on smart phones to traditional library content in digital form: electronic articles and ebooks. I agree with Aaron Tay when he says “I don’t believe there is any reason to think that it will necessarily lead to high demand for library mobile services” in his post “A few heretical thoughts about library tech trends“.
Rather, mobile services should provide information about specific subjects useful to people here and now.
In the near future anybody interested in a specific physical object or location will have access via their location aware smart phones and augmented reality to information of all kinds (text, images, sound, video, maps, statistics, etc.) from a number of sources: museums, archives, government agencies, maybe even libraries. To make this possible it is essential that all these organisations publish their information as linked open data. This means: under an open license, using a generic linked data standard like RDF.
I expect that consumers of this new type of mobile location based augmented linked information would appreciate some guidance in the possibly overwhelming information landscape, in the form of specific views, with preselection of information sources and their context taken into account.
There may be an opportunity here for libraries, especially public libraries, taking on a new coordinating role as information brokers on the intersection of a large number of different information providers. Of course if libraries want to achieve that, they need to look beyond their traditional scope and invest more in new information technologies, services and expertise.
The future of mobile information services lies in the combination of location awareness, augmented reality and linked open data. Maybe libraries can help.
Posted on March 4th, 2010
Location aware services in a digital library world
This is the third post in a series of three
While library systems technology and mobile apps architecture make up the technical and functional infrastructure of mobile web access, mobile library services are what it’s all about. What type of mobile services should libraries offer to their customers?
As stated before, the two main features that distinguish mobile, handheld devices from other devices are:
- web access any time anywhere
- location awareness
It seems obvious that libraries should take these two conditions into account when providing mobile services, not least the first one. I don’t think that mobile devices will completely replace other devices like PCs and netbooks, as Google seems to think, but they will definitely be an important tool for lots of people, simply because they always carry a mobile phone with them. So in order to offer something extra, mobile applications should be focused on the situational circumstance of potential access to information any time anywhere, and make use of the location awareness of the device as much as possible. But does this also apply to services for library customers? That partly depends on the type of library (public, academic, special) and the physical and geographical structure of the library (one central location, branch locations).
As a starting point we can say that mobile library services should cover the total range of online library services already offered through traditional web interfaces. However, mobile users may not want to use certain library services on their mobile devices. For instance, from an analysis of usage statistics of EBSCO Mobile at the Library of Texas A&M University, generously provided by Bennett Ponsford, it appears that although the number of searches in EBSCO mobile is increasing, only 1% of mobile searches leads to a fulltext download, against 77% of regular EBSCO searches. These findings suggest that library customers, at least academic ones, are willing to search for books and articles on their mobile devices, but will postpone actually using them until they are in a more convenient environment. Apparently small screens and/or mobile PDF readers are not very reader friendly in academic settings. This may be different for public library customers and e-books.
So, libraries should concentrate on offering those mobile services that are wanted and will actually be used. In the beginning this may involve analysis of usage statistics and customer feedback to be able to determine the perfect mobile services suite for your library. Libraries should be prepared for “perpetual beta” and “agile development”.
There are two main areas of information in which libraries can offer mobile services:
- practical information
- bibliographical information
This is no different from other library information channels, like normal websites and printed guides and catalogues.
Practical information may consist of contact address, email and telephone information, opening hours, staff information, rules and regulations of any kind, etc. In most cases this is information that does not change very often, so static information pages will be sufficient. However, especially with mobile devices whose owners are on the move, providing dynamic up to date information will give an advantage. For instance: today’s and tomorrow’s opening hours, number of currently available public workstations per location, etc.
The information provided will be even more precisely aimed at the user’s personal situation, if the “location awareness” feature is added to the “any time anywhere” feature, and up to date static and dynamic information for the locations in the immediate vicinity of the customer is shown first, using the device’s automatic geolocation properties. And all this gets better still if the library’s own information is mashed up with available online tools, like showing a location on Google Maps when selecting an address, and with the device’s tools, like making a phone call when clicking on a phone number.
Bibliographical information should be handled somewhat differently. Searching library catalogues or online databases is in essence not location dependent. Online digital bibliographical metadata is available “in the cloud” any time, anywhere. It’s not the discovery but the delivery that makes the difference. We have already seen that mobile academic library customers do not download fulltext articles to their mobile devices. But mobile customers will definitely be interested in the possibility of requesting a print item to be delivered to them at the nearest location. WorldCat Mobile, like “normal” WorldCat, for instance offers the option to select a library manually from a list in order to find the nearest location to obtain an item from. It would of course be nice if the delivery location were determined automatically by the mobile request service, using the device’s location awareness and the current opening hours of the library branches.
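The “nearest open branch” logic for such a pickup request could be sketched as follows. The branch names and opening hours are invented, and a real service would look hours up per date (and consider tomorrow’s hours when everything is closed today):

```python
from datetime import time

# Hypothetical opening hours per branch for "today".
OPENING_HOURS = {
    "Central Library": (time(9, 0), time(21, 0)),
    "East Branch": (time(10, 0), time(17, 0)),
}

def pick_delivery_branch(branches_nearest_first, now):
    """Pick the nearest branch that is currently open; fall back to the nearest one."""
    for name in branches_nearest_first:
        opens, closes = OPENING_HOURS.get(name, (None, None))
        if opens is not None and opens <= now < closes:
            return name
    return branches_nearest_first[0]
```

For example, a request placed at 18:00 would skip a branch that closed at 17:00 and route the item to the next nearest branch that is still open.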
The funny thing here is that we have the paradoxical situation of state-of-the-art technology in a world of global online digital information being used to obtain “old fashioned” physical carriers of information (books) from the nearest physical location.
Augmented reality, as a link between the physical and virtual world, may be a valuable extension of mobile services. A frequently mentioned example is scanning a book cover or a barcode with the camera of a mobile phone and locating the item on Amazon. It would be helpful if your phone could automatically find and request the item in the nearest library branch. Personally I am not convinced that this is very valuable. Typing in the ISBN or book title will do the job just as fast. Moreover, bookshop staff may not appreciate this behaviour.
A more common use of augmented reality would be to point the camera of your mobile device at a library building, after which a variety of information about the building is shown. The best known augmented reality app at the moment is Layar. This tool allows you to add a number of “layers”, with which you can for instance find the nearest ATMs or museums, or Wikipedia information about physical objects or locations around you.
There is also a LibraryThing Local layer for Layar, with which you can find information about all libraries, bookshops and book related events in the neighbourhood. It may even be possible to find a specific book in an open stack using this technology.
All these extended mobile applications suggest that users of apps may not just be a specific group of people (like library customers), but that mobile users will be interested in all kinds of useful information about their current location. Library information may be only a part of that. Maybe mobile apps should be targeted at a more general audience and include related information from other sources, making use of the linked data concept.
A search in a library catalog in this case may result in a list of books with links to related objects in a museum nearby or a historic location related to the subject of the book. Alternatively, an item in a museum website might have links to related literature in catalogs of nearby libraries. Anything is possible.
The question that remains is: should libraries take care of providing these generic location based services, or will others do that?
Posted on February 21st, 2010 11 comments
Technology, users and business models
This is the second post in a series of three
Mobile access to information on the internet is the latest step in the development of information systems technology, as described in the previous post in this series. The two main features that distinguish mobile devices from other devices are:
- Access to the web literally any time, anywhere
- Location awareness using GPS or the mobile network
Let’s focus on web access first. There are two main ways in which information providers can provide mobile access to their data: through a mobile web browser or through apps.
The easiest way to provide mobile access is: do nothing. Users of mobile internet devices can simply visit all existing websites with their mobile browser. However, in doing so they will experience a number of problems: performance is slow, pages are too large, navigation is difficult, certain parts of websites don’t work. These problems are caused by the very physical characteristics of mobile technology that make mobile internet access possible: the small size of devices and displays, the wireless network, the limited features of dedicated mobile operating systems and browsers.
Fortunately, technological development is an interactive, reciprocal, cyclic process. Technology continuously needs to find solutions to problems that were caused by new uses of existing technology.
Many organisations have solved this problem by creating separate “dumbed down” mobile versions of their websites, containing mainly text only pages and textual links to their most important services and information. In the case of libraries for instance “locations and addresses“, “opening hours“, etc. See this list of examples (with thanks to Aaron Tay). Another example is LibraryThing Mobile, which also has a catalog search option. In these cases you have to manually point your browser to the dedicated mobile URL, unless the webserver is configured to automatically recognise mobile browsers and redirect them to the mobile site.
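The automatic-recognition approach mentioned above usually boils down to User-Agent sniffing on the web server. A minimal Python sketch; the token list and hostnames are illustrative only, real mobile detection is considerably messier:

```python
# Crude User-Agent tokens of the kind servers match on to spot mobile browsers.
MOBILE_TOKENS = (
    "iphone", "ipod", "android", "blackberry", "windows ce", "symbian", "opera mini",
)

def mobile_redirect(user_agent, path):
    """Return the site a request should be served from, based on its User-Agent header."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in MOBILE_TOKENS):
        return "http://m.example-library.org" + path
    return "http://www.example-library.org" + path
```

A desktop browser would keep getting the full site, while an iPhone or Android browser would be sent to the dedicated mobile pages.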
Of course this is not the optimal solution, for two reasons:
- On the front end: as an information provider you are completely ignoring all graphical, dynamic, interactive and web 2.0 functionality on the end user side. This means effectively going back to the early days of the world wide web of static text pages.
- On the back end: duplicating system and content administration. In most cases it will come down to manually creating and editing HTML pages, because many website content management systems do not offer manual or automatic editing of pages for mobile access. Some systems offer automatic recognition of mobile browsers and display content in the appropriate format, like the WordPress plugin “WordPress Mobile Edition” that automatically shows a list of posts if a mobile browser is detected. This is what happens on this blog.
Because of this situation we are witnessing a re-enactment of the client-server alternative to static HTML that I described previously: mobile apps! “Apps” is short for “applications”; apparently everything needs to be short in the mobile online web 2.0 age. Apps are installed on mobile devices; they run locally, making use of the hardware, operating system and user friendly interface of the device, and they only connect to the internet to retrieve data from a database system in the cloud (on a remote server).
A disadvantage of this solution obviously is that you have to multiply development and maintenance efforts in order to support all mobile platforms that your customers are using, or just support the most used platform (iPhone) and ignore the rest of your end users. Alternatively you can support one mobile platform with an app and the rest with a mobile website. Organisations have the choice of developing apps themselves from scratch, or using one of the commercial parties that offer library apps, such as Boopsie, Blackboard or the recently announced LibraryThing Anywhere, which is meant to offer both mobile web and apps for iPhone, Blackberry and Android.
- TU Delft mobile app for iPhone (powered by Blackboard). University wide, including library. Haven’t been able to test this because I have an Android phone. An Android version will be developed. For other devices they offer a mobile website.
- Duke University mobile app for iPhone. University wide, including library. For other devices they offer a mobile website.
- Santa Clara County Library mobile apps for iPhone and Android (powered by Boopsie).
- WorldCat Mobile (powered by Boopsie).
An alternative to both the client-server and the “dumbed down” model would be to use the new HTML5 and CSS3 options to create websites that can easily be handled by PC and mobile web browsers alike. HTML5 has geolocation options, and browsers are made location aware this way too. The iWebKit framework is a free and easy package for creating web apps compatible with all mobile platforms. See this demo on PC, iPhone, Android, etc.
Some say that HTML5/CSS3 will make apps disappear, but I suspect performance may still be a problem, due to slow connections. But it’s not only a technology issue. It’s also a matter of business models, as Owen Stephens and Till Kinstler pointed out.
Apps can be distributed for free by organisations that want to draw traffic to their own data, ignoring the open web. This method fits their classic business model, as Till remarked, mentioning the newspaper business as an example.
But there is also another side to this: apps can be created by anybody, making use of APIs to online systems and databases, and be shared with others for free or for a small fee, as is the case with the iPhone App Store, the Android Market, the Nokia Ovi Store, or the newly announced Wholesale Applications Community (WAC). This model will never be possible with web based apps (like HTML5), because nobody has access to a system’s web server other than the system administrators. It is also much too complicated for developers and consumers of apps to host web apps on a server that mobile device users can connect to.
And there is more: independent developers are more likely to look beyond the boundaries of the classic model of giving access to your own data only. Third party apps have the opportunity to connect data from a number of data sources in the cloud in order to satisfy mobile user needs better. To take the newspaper business example, I mentioned this in my post “Mobile reading“: general news apps vs dedicated newspaper apps. The rise of the open linked data movement will only boost the development and use of the mobile client server model.
In my view there will be a hybrid situation: HTML5/CSS3 based web apps and local mobile apps will coexist, depending on developer, audience, and objectives.
What services library mobile apps should offer, including location awareness and linking data, is the topic of another post.
Posted on December 29th, 2009 5 comments
On Sunday December 27, 2009 I had the opportunity to visit the otherwise closed library of The Netherlands’ oldest museum, Teylers Museum, in my home town Haarlem, together with a small group of Dutch library twitter people. We were very kindly shown around by librarian Marijn van Hoorn, who explained to us the library’s history and collection.
Now I’m not going to say anything about the pleasant real life consequences of getting to know people in the virtual world (that has been done by @PeterMEvers already, in Dutch), or about the guided tour (already described very well by @underdutchskies in English and by @festinaatje and @ecobibl in Dutch). Also, none of the photos I took with my G1 phone are presentable, but you can have a look at the photos made by @Dymphie, @underdutchskies and @wbk500.
Instead, I will try to make a comparison between the old library’s course of life and the developments that modern libraries are going through, because I see some parallels there.
The museum was built in 1784 with money from the legacy of the wealthy banker and merchant Pieter Teyler van der Hulst, to preserve his collections and advance the arts and sciences. The museum’s library was established in 1826 to house a separate collection of books and journals in the field of natural history (botany, zoology, paleontology and geology).
One of the objectives for the library was to have a complete collection of all journals in the area of natural history. In the beginning the library was only accessible by invitation, and the honoured guests were welcomed and assisted by the “caretaker” or “landlord” of the museum.
By the middle of the 19th century the library opened up to a more general public, that is to say teaching and research staff members of the emerging universities.
But from 1870 the importance of the Teylers library for university staff declined drastically, because the universities in The Netherlands started to organise academic libraries of their own. So the library closed its doors for regular visitors. The collection continued to be maintained and expanded until 1987, when it was no longer realistic to pursue completeness.
During the 1970s the privately funded museum and library faced the threat of closing down because of the cost of preserving the historical buildings and collections. In the written library catalog (created over time by a large number of volunteers and employees) all items were annotated with an estimated value in case of forced sale of the collection.
Fortunately the Dutch state decided to subsidise the historically and culturally valuable museum, and now Teylers is a very popular place, with a new wing with a large hall for temporary exhibitions, an educational section and a cafe.
The museum library is only open for visitors on request and on special occasions. The collection is not expanded anymore, but it is a very complete and valuable historical natural history collection, which is, among other things, used to organise temporary thematic exhibitions in the museum. Besides the natural history items there are also old maps and atlases and travel journals, like the James Cook journals by Sydney Parkinson that @jaapvandegeer drew my attention to.
The museum and the library are also looking to the future. Both museum objects and library items are being digitised, there is a European project for creating a website on ornithology that uses the library’s birds images, there is a new thematic website that combines documents, images, metadata from the museum, the library and external sources, the library catalog has been migrated to an Adlib system, and there is a Ning social network.
So, what are the parallels with modern libraries? First of all, it is clear that the influence of external developments on libraries is not something limited to the modern digital web age of Google. Just like Teylers library, modern public, academic and special libraries were at first targeted at a limited, well defined audience, and only accessible on location at specific times, after which their target audience and accessibility widened substantially. Catalogs and varying parts of the collection are available online to a global audience.
The external influence from competing university libraries is currently mirrored by the world wide web itself, with Google as one of the main external threats. I have written about this in my post “No future for libraries?“.
The two important issues here are the effects on modern library collections and on their audiences. Teylers library decided to stop building its own collection, but they keep using it in a number of ways: temporary physical thematic exhibitions, but also new digital “mashed up” ways. This might be a good example for modern libraries to follow: make use of modern technologies to reuse existing collections to create virtual online thematic aggregations of data, texts, images, etc. See also my post “Collection 2.0“.
As for modern libraries’ response to changing audiences: proceeding with new ways of using their collections will draw new customers anyway. But it is equally important to find other ways to “go where your users are”, like being on social networks like the Teylers Ning site. One of the most important moves in the near future will be mobile presence.
Teylers library shows us that there may be a new life for old libraries.
Posted on October 16th, 2009 3 comments
Metasearch vs. harvesting & indexing
The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet“, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.
A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal highly depends on the availability of the remote cataloging systems.
I took the opportunity to explain to my audience also the “issues” inherent in the concept of metasearch (or “federated search“, “distributed search“, etc.), and compare that to the harvesting & indexing scenario.
Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.
The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad, and H&I is good, that would be too easy. Some five years ago Metasearch was the best we had, it was a tremendous progress beyond searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the OpenSource tools VuFind, SUMMA, Meresco, to name a few.
Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case“, used in logistics and inventory management.
With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
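The “just in time” versus “just in case” contrast can be illustrated with two toy search strategies over the same stand-in “remote sources” (all records are invented). Metasearch pays the network cost at query time; H&I pays it up front at harvest time:

```python
import time

def remote_source(records, delay=0.0):
    """Stand-in for a remote catalogue: returns matching records after a delay."""
    def search(query):
        time.sleep(delay)  # network latency, remote load, time-outs...
        return [r for r in records if query in r.lower()]
    return search

def metasearch(sources, query):
    """Just in time: query every remote source at request time and merge the results."""
    results = []
    for search in sources:  # real tools do this in parallel, with time-outs
        results.extend(search(query))
    return results

def build_index(sources):
    """Just in case: harvest all records up front into a local index."""
    index = []
    for search in sources:
        index.extend(search(""))  # the empty query matches everything in this toy model
    return index

def search_index(index, query):
    """Search the local index; no remote systems are involved at query time."""
    return [r for r in index if query in r.lower()]
```

Both strategies return the same hits for the same query; the difference is when the remote systems are contacted, and therefore which failure modes (slow or unavailable sources versus a stale index) you are exposed to.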
Objectively speaking, H&I solves the problems inherent in Metasearch and is therefore the superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested for technical, legal or commercial reasons.
In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.
But I also believe that the real power of H&I can only be taken advantage of, if institutions cooperate and maintain shared central indexes, instead of building each their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.
We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ PrimoCentral, SerialSolutions’ Summon and EBSCOhost Integrated Search.
The funny thing is, that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.
Posted on August 20th, 2009 12 comments
On August 17, after I tested a search in our new Aleph OPAC and mentioned my surprise on Twitter, the following discussion unfolded between me (lukask), Ed Summers of the Library of Congress and Till Kinstler of GBV (German Union Library Network):
- lukask: Just found out we only have one item about RDF in our catalogue: http://tinyurl.com/lz75c4
- edsu: @lukask broaden that search http://is.gd/2l6vB
- lukask: @edsu Ha! Thanks! But I’m sure that RDF will be mentioned in these 29 titles! A case for social tagging!
- edsu: @lukask or better cataloging
- edsu: @lukask i guess they both amount to the same thing eh?
- lukask: @edsu That’s an interesting position…”social tagging=better cataloging”. I will ask my cataloguing co-workers about this specific example
- edsu: @lukask make sure to wear body-armor
- lukask: @edsu Yes I know! I will bring it up at tomorrow’s party for the celebration of our ALEPH STP (after some drinks…)
- tillk: @edsu @lukask or fulltext search… SCNR…
- edsu: @tillk yeah, totally — with projects like @googlebooks and @hathitrust we may look back on the age of cataloging with different eyes …
- lukask: @tillk @edsu Fulltext search yes, or “implicit automatic metadata generation”?
What happened here was:
- A problem with the findability of specific bibliographic items was observed: although it is highly unlikely that books about the Semantic Web will not cover RDF (Resource Description Framework), none of the 29 titles found with “Semantic Web” could be found with the search term “Resource Description Framework“; on the other hand, the only item found with “Resource Description Framework” was NOT found with “Semantic Web“. I must add that the “Semantic web” search was an “all words” search. Only 20 of the results were indexed with the Dutch subject heading “Semantisch web” (a term which is never used in real life as far as I know; the English term is an international concept). Some results were off topic: they just happened to have “semantic” and “web” somewhere in their metadata. A better search would have been a phrase search (adjacent words) for “semantic web” in actual quotes, which gives 26 items. But of these, a small number were not indexed with the subject heading “Semantisch web“. Another note: searching with “RDF” gets you all kinds of results. Read more on the issue of searching and relevance in my post Relevance in context.
- Four possible solutions were suggested:
- social tagging
- better cataloging
- fulltext searching
- automatic metadata generation
Clearly, the 26 items found with the search “Semantic web” are not indexed by the “Resource description framework” or “RDF” subject heading. There is not even a subject heading for “Resource description framework” or “RDF“. In my personal view, from my personal context, this is an omission. Mind you, this is not only an issue in the catalogue of the Library of the University of Amsterdam, it is quite common. I tried it in the British Library Integrated Catalogue with similar results. Try it in your own OPAC!
I presume that our professional cataloging colleagues can’t know everything about all subjects. That is completely understandable. I would not know how to catalog a book about a medical subject myself either! But this is exactly the point. If you allow end users to add their own tags to your bibliographic records, you enhance the findability of these records for specific groups of end users.
I am not saying that cataloguing and indexing by library specialists using controlled vocabularies should be replaced by social tagging! No, not at all. I am just saying that both types of tagging/indexing are complementary. Sure, some of the tags added by end users may not follow cataloging standards, but who cares? Very often the end users adding tags of their own will be professional experts in their field. In any case, items with social tags will be found more often because specific end user groups can find them searching with their own terms.
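To make the complementarity concrete, here is a toy Python index that searches controlled subject headings alone versus subject headings plus user tags. The records, headings and tags are all made up:

```python
# Toy records: controlled subject headings assigned by cataloguers plus
# free-form tags added by end users.
RECORDS = [
    {"title": "A Semantic Web Primer",
     "subjects": ["Semantisch web"],
     "tags": ["semantic web", "rdf"]},
    {"title": "Practical RDF",
     "subjects": ["Resource description framework"],
     "tags": ["rdf", "semantic web"]},
]

def find(term, use_tags=True):
    """Match a search term exactly against subject headings, optionally also tags."""
    term = term.lower()
    hits = []
    for r in RECORDS:
        fields = r["subjects"] + (r["tags"] if use_tags else [])
        if any(term == f.lower() for f in fields):
            hits.append(r["title"])
    return hits
```

With tags switched off, a search for “rdf” finds nothing, because no controlled heading uses that exact term; with tags included, both records become findable under the terms end users actually search with, while the controlled headings keep working as before.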
I suppose Ed Summers was trying to say the same thing as I just did above, when he commented “or better cataloging, I guess they both amount to the same thing eh?“, which I summarised as “social tagging=better cataloging“, but he can correct me if I’m wrong.
Anyway, I hope I made it clear that I would not say “social tagging=better cataloging“, but rather “controlled vocabularies+social tagging=better cataloging“.
Or alternatively, could we improve cataloging by professional library catalogers? I must admit I do not know enough about library training and practice to say something about that. I am not a trained librarian. Don’t hesitate to comment!
Is fulltext searching the miracle cure for findability problems, as Till Kinstler seems to suggest? Maybe.
Suppose all our print material were completely digitised and available for fulltext search: I have no doubt that all 26 items mentioned above (the results of the “semantic web” all words search) would be found with the “resource description framework” or “rdf” search as well. But because fulltext search is by its very nature an “all words” search, the “rdf” fulltext search would also give a lot of “noise”, or items not having any relation to “semantic web” at all (author’s initials “R.D.F”, other acronyms “R.D.F.”, just see RDF in the BL catalogue). Again, see my post Relevance in context for an explanation of searching without context.
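The difference between an “all words” search and an adjacent phrase search can be shown in a few lines of Python, with a deliberately naive tokenisation just to illustrate the noise problem:

```python
def all_words_match(text, query):
    """'All words' search: every query word appears somewhere in the text."""
    words = text.lower().split()
    return all(w in words for w in query.lower().split())

def phrase_match(text, query):
    """Phrase search: the query words appear adjacent and in order."""
    words = text.lower().split()
    q = query.lower().split()
    return any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1))
```

A text that merely mentions “web” and “semantic” in unrelated places matches the all-words search but not the phrase search, which is exactly the off-topic noise described above.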
Also, there will be books or articles about a subject that will not contain the actual subject term at all. With fulltext search these items will not be found.
Moreover, fulltext searching actually limits the findable items to text, excluding other types, like images, maps, video, audio etc.
This brings me to the last of the suggested solutions:
Automatic metadata generation
Of course this is mostly still wishful thinking. But there are a number of interesting implementations already.
What I mean when I say “(implicit) automatic metadata generation” is: metadata that is NOT created deliberately by humans, but generated by software applying intelligent analysis to objects of all types (text, images, audio, video, etc.), either assigned as static metadata or generated on the fly.
In the case of our “rdf” example, such a tool would analyse a text and assign “rdf” as a subject heading based on the content and context of this text, even if the term “rdf” does not appear in the text at all. It would also discard texts containing the string “rdf” that refer to something completely different. Of course for this to succeed there should be some kind of contextual environment with links to other records or even other systems to be able to determine if certain terminology is linked to frequently used terms not mentioned in the text itself (here the Linked Data developments could play a major role).
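A very naive sketch of this idea in Python: count which subject terms co-occur across an already-indexed corpus, and suggest strongly associated terms that a new record lacks. The corpus and threshold are invented; a real implementation would use proper statistical association measures and external linked data sources rather than raw pair counts:

```python
from collections import Counter
from itertools import combinations

# Hypothetical already-indexed corpus: the subject terms assigned to each record.
CORPUS = [
    {"semantic web", "rdf"},
    {"semantic web", "rdf", "linked data"},
    {"semantic web", "ontologies"},
    {"databases", "sql"},
]

def cooccurrence(corpus):
    """Count how often each pair of subject terms is assigned together."""
    pairs = Counter()
    for terms in corpus:
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs

def suggest(known_terms, corpus, min_count=2):
    """Suggest extra subject terms strongly associated with the known ones."""
    pairs = cooccurrence(corpus)
    suggestions = set()
    for (a, b), n in pairs.items():
        if n < min_count:
            continue
        if a in known_terms and b not in known_terms:
            suggestions.add(b)
        if b in known_terms and a not in known_terms:
            suggestions.add(a)
    return suggestions
```

In this toy corpus, a record indexed only with “semantic web” would get “rdf” suggested, because the two terms frequently co-occur, even if the string “rdf” never appears in the text itself.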
The same principle should also apply to non-textual objects, so that images, audio, video etc. about the same subject can be found in one go. Google has some interesting implementations in this field already: image search by colour and content type; see for example the search for “rdf” in Google Images with colour “red” and content type “clip art”.
But of course a lot still needs to be done.
Posted on May 24th, 2009 11 comments
Will library buildings and library catalogs survive the web?
Some weeks ago a couple of issues appeared in the twitter/blogosphere (or at least MY twitter/blogosphere) related to the future of the library in this digital era.
- There was the Espresso book machine that prints books on demand on location, which led to questions like: “apart from influencing publishing and book shops, what does this mean for libraries?“.
- There was a Twitter discussion about “will we still need library buildings?“.
- There was another blog post about the future of library catalogs by Edwin Mijnsbergen (in Dutch) that asked the question of the value of library catalogs in relation to web2.0 and the new emerging semantic web.
This made me start thinking about a question that concerns us all: is there a future for the library as we know it?
To begin with, what is a library anyway?
For ages, since the beginning of history, up until some 15 years ago, a library was an institution characterised by:
- a physical collection of printed and handwritten material
- a physical location, a building, to store the collection
- a physical printed or handwritten on site catalog
- on location searching and finding of information sources using the catalog
- on site requesting, delivery, reading, lending and returning of material
- a staff of trained librarians to catalog the collection and assist patrons
The central concept here is of course the collection. That is the “raison d’être” of a library. The purpose of the library building, the catalog and the librarians is to give people access to the collection, and to provide them with the information they need.
Clearly, because of the physical nature of the collection and the information transmission process the library needed to be a building with collection and catalog inside it. People had to go there to find and get the publications they needed.
If collections and the transmission of information were completely digital, then the reason for a physical location to go to for finding and getting publications would not exist anymore. Currently one of these conditions has been met fully and the other one partly. The transmission of information can take place in a completely digital way. Most new scientific publications are born digital (e-Journals, e-Books), and a large number of digitisation projects are taking care of making digital copies of existing print material.
Searching for items in a library’s collection is already taking place remotely through OPACs and other online tools almost everywhere. A large part of these collections can be accessed digitally. Only when a patron wants to read or borrow a printed book or journal does he or she have to go to the library building to fetch it.
All this seems to lead to the conclusion that the library may be slowly moving away from a physical presence to a digital one.
But there is something else to be considered here, that reaches beyond the limits of one library. In my view the crucial notion here is again the collection.
In my post Collection 2.0 I argue that in this digital information age a library’s collection is everything a library has access to as opposed to the old concept of everything a library owns. This means in theory that every library could have access to the same digital objects of information available on the web, but also to each other’s print objects through ILL. There will be no physically limited collection only available in one library anymore, just one large global collection.
In this case, there is not only no need for people to go to a specific library for an item in its collection, but also there is no need to search for items using a specific library’s catalog.
Now you may say that people like going to a library building and browse through the stacks. That may still be true for some, but in general, as I argue in my post “Open Stack 2.0“, the new Open Stack is the Web.
In the future there will be collections, but not physical ones (except of course for the existing ones with items that are not allowed to leave the library location). We will see virtual subject collections, determined by classifications and keywords assigned both by professionals and non-professionals.
On a parallel level there will be virtual catalogs, which are views on virtual collections defined by subjects on different levels and in different locations: global, local, subject-oriented, etc. These virtual collections and catalogs will be determined and maintained by a great number of different groups of people and institutions (commercial and non-commercial). One of these groups can still be a library. As Patrick Vanhoucke observed on Twitter (in Dutch): “We have to let go of the idea of the library as a building; the ‘library’ is the network of librarians“. These virtual groups of people may be identical to what is getting known more and more as “tribes“.
Having said all this, of course there will still be occurrences of libraries as buildings and as physical locations for collections. Institutions like the Library of Congress will not just vanish into thin air. Even if all print items have been digitised, print items will still be wanted for a number of reasons: research, art, among others. Libraries can have different functions, like archives, museums, etc. and still be named “libraries” too.
Library buildings can transform into other types of locations: in universities they can become meeting places and study facilities, including free wifi and Starbucks coffee. Public libraries can shift focus to becoming centres of discovery and (educational) gaming. Anything is possible.
It’s obvious that libraries obey the same laws of historical development as any other social institution or phenomenon. The way that information is found and processed is determined, or at least influenced, by the status of technological development. And I am not saying that all development is technology driven! This is not the place for a philosophy on history, economics and society.
Some historical parallels to illustrate the situation that libraries are facing:
- writing: inscribing clay tablets > scratching ink on paper > printing (multiplication, re-usability) > typewriter > computer/printer (digital multiplication and re-usability!) > digital only (computer files, blogs, e-journal, e-books)
- consumption of music: attending live performance on location > listening to radio broadcast > playing purchased recordings (vinyl, cassettes, cd, dvd) > make home recordings > play digital music with mp3/personal audio > listen to digital music online
From these examples it’s perfectly clear that new developments do not automatically make the old ways disappear! Prevailing practices can coexist with “outdated” ways of doing things. Libraries may still have a future.
In the end it comes down to these questions:
- Will libraries cease to exist, simply because they no longer serve the purpose of providing access to information?
- Are libraries engaged in a rear guard fight?
- Will libraries become tourist attractions?
- Will libraries adapt to the changing world and shift focus to serve other, related purposes?
- Are professional librarian skills useful in a digital information world?
I do not know what will happen with libraries. What do you think?