Posted on January 5th, 2012 33 comments
2011 has in a sense been the year of library linked data. Not that libraries of all kinds are now publishing and consuming linked data in great numbers. No. But we have witnessed the publication of the final report of the W3C Library Linked Data Incubator Group, the Library of Congress announcement of the new Bibliographic Framework for the Digital Age based on Linked Data and RDF, the release by a number of large libraries and library consortia of their bibliographic metadata, many publications, sessions and presentations on the subject.
All these events focus mainly on publishing library bibliographic metadata as linked open data. Personally I am not convinced that this is the most interesting type of data that libraries can provide. Bibliographic metadata as such describe publications, in the broadest sense, providing information about title, authors, subjects, editions, dates, urls, but also physical attributes like dimensions, number of pages, formats, etc. This type of information, in FRBR terms: Work, Expression and Manifestation metadata, is typically shared among a large number of libraries, publishers, booksellers, etc. ‘Shared’ in this case means ‘multiplied and redundantly stored in many different local systems‘. It doesn’t really make sense if all libraries in the world publish identical metadata side by side, does it?
In essence only really unique data is worth publishing. You link to the rest.
Currently, library data that is really unique and interesting is administrative information about holdings and circulation. After having found metadata about a potentially relevant publication it is very useful for someone to know how and where to get access to it, if it’s not freely available online. Do you need to go to a specific library location to get the physical item, or to have access to the online article? Do you have to be affiliated to a specific institution to be entitled to borrow or access it?
Usage data about publications, both print and digital, can be very useful in establishing relevance and impact. This way information seekers can be supported in finding the best possible publications for their specific circumstances. There are some interesting projects dealing with circulation data already, such as the research project by Magnus Pfeffer and Kai Eckert as presented at the SWIB 11 conference, and the JISC funded Library Impact Data project at the University of Huddersfield. The Ex Libris bX service presents article recommendations based on SFX usage log analysis.
The consequence of this assertion is that if libraries want to publish linked open data, they should focus on holdings and circulation data, and for the rest link to available bibliographic metadata as much as possible. It is to be expected that the Library of Congress’ New Bibliographic Framework will take care of that part one way or another.
In order to achieve this libraries should join forces with each other and with publishers and aggregators to put their efforts into establishing shared global bibliographic metadata pools accessible through linked open data. We can think of already existing data sources like WorldCat, OpenLibrary, Summon, Primo Central and the like. We can only hope that commercial bibliographic metadata aggregators like OCLC, SerialsSolutions and Ex Libris will come to realise that it’s in everybody’s interest to contribute to the realisation of the new Bibliographic Framework. The recent disagreement between OCLC and the Swedish National Library seems to indicate that this may take some time. For a detailed analysis of this see the blog post ‘Can linked library data disrupt OCLC? Part one’.
An interesting initiative in this respect is LibraryCloud, an open, multi-library data service that aggregates and delivers library metadata. And there is the HBZ LOBID project, which is targeted at ‘the conversion of existing bibliographic data and associated data to Linked Open Data‘.
So what would the new bibliographic framework look like? If we take the FRBR model as a starting point, the new framework could look something like this. See also my slideshow “Linked Open Data for libraries”, slides 39-42.
The basic metadata about a publication or a unit of content, on the FRBR Work level, would be an entry in a global datastore identified by a URI ( Uniform Resource Identifier). This datastore could for instance be WorldCat, or OpenLibrary, or even a publisher’s datastore. It doesn’t really matter. We don’t even have to assume it’s only one central datastore that contains all Work entries.
The thing identified by the URI would have a text string field associated with it containing the original title, let’s say “The Da Vinci Code” as an example of a book. But also articles can and should be identified this way. The basic information we need to know about the Work would be attached to it using URIs to other things in the linked data web. A set of two things linked by a URI is called a ‘triple’. ‘Author’ could for instance be a link to OCLC’s VIAF (http://viaf.org/viaf/102403515 = Dan Brown), which would then constitute a triple. If there are more authors, you simply add a URI for every person or institution. Subjects could be links to DBPedia/Wikipedia, Freebase, the Library of Congress Authority files, etc. There could be some more basic information, maybe a year, or a URI to a source describing the background of the work.
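As a rough illustration of the paragraph above, the Work entry can be written down as a handful of subject-predicate-object triples. This is only a minimal Python sketch using plain tuples: the VIAF URI comes from the text, while the `example.org` URIs and the predicate names (`title`, `author`, `subject`) are made up for the example and do not belong to any real vocabulary.

```python
# A triple is (subject, predicate, object).
# All example.org URIs and the predicate names are hypothetical placeholders;
# only the VIAF URI is taken from the text.
WORK = "http://example.org/work/da-vinci-code"

triples = [
    (WORK, "title", "The Da Vinci Code"),                # plain text string
    (WORK, "author", "http://viaf.org/viaf/102403515"),  # Dan Brown in VIAF
    (WORK, "subject", "http://example.org/subject/example-subject"),
]


def objects(store, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in store if s == subject and p == predicate]


# A second author would simply be one more triple with the same predicate.
print(objects(triples, WORK, "author"))
```

A real implementation would more likely use an RDF library and an agreed vocabulary; the tuples are only meant to show how little structure a triple needs.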
At the Expression level, a Dutch translation would have its own URI, stored in the same or another datastore. I could imagine that the publisher who commissioned the translation would maintain a datastore with this information. Attached to the Expression there would be the URI of the original Work, a URI pointing to the language, a URI identifying the translator and a text string containing the Dutch title, among others.
Every individual edition of the work could have its own Manifestation level URI, with a link to the Expression (in this case the Dutch translation), a publisher URI, a year, etc. For articles published according to the long standing tradition of peer reviewed journals, there would also be information about the journal. On this level there should also be URIs to the actual content when dealing with digital objects like articles, ebooks, etc., no matter if access is free or restricted.
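The chain from Manifestation to Expression to Work described above could be sketched as follows. The `ex:` identifiers and the predicate names `expressionOf` and `manifestationOf` are assumptions made for this example, not terms from FRBR or any published vocabulary.

```python
# Hypothetical URIs and predicate names, for illustration only.
triples = [
    ("ex:work/1", "title", "The Da Vinci Code"),
    ("ex:expression/nl", "expressionOf", "ex:work/1"),
    ("ex:expression/nl", "language", "ex:language/dutch"),
    ("ex:manifestation/nl-ed1", "manifestationOf", "ex:expression/nl"),
    ("ex:manifestation/nl-ed1", "publisher", "ex:publisher/1"),
]


def follow(store, subject, predicate):
    """Return the first object for (subject, predicate), or None."""
    for s, p, o in store:
        if s == subject and p == predicate:
            return o
    return None


def work_of(store, manifestation):
    """Climb Manifestation -> Expression -> Work by following two links."""
    expression = follow(store, manifestation, "manifestationOf")
    return follow(store, expression, "expressionOf")


print(work_of(triples, "ex:manifestation/nl-ed1"))  # ex:work/1
```

Because every level only stores a link to the level above it, the three kinds of entries can live in three different datastores without any of them needing a copy of the others.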
So far we have everything we need to know about publications “in the cloud”, or better: in a number of datastores available on a number of servers connected to the world wide web. This is more or less the situation described by OCLC’s Lorcan Dempsey in his recent post ‘Linking not typing … knowledge organization at the network level’. The only thing we need now is software to present all linked information to the user.
No libraries in sight yet. For accessing freely available digital content on the web you actually don’t need a library, unless you need professional assistance finding the correct and relevant information. Here we have identified a possible role of librarians in this new networked information model.
Now we have reached the interesting part: how to link local library data to this global shared model? We immediately discover that the original FRBR model is inadequate in this networked environment, because it implies a specific local library situation. Individual copies of a work (the Items) are directly linked to the Manifestation, because FRBR refers to the old local catalogue which describes only the works/publications one library actually owns.
In the global shared library linked data network we need an extra explicit level to link physical Items owned by the library or online subscriptions of the library to the appropriate shared network level. I suggest using the “Holding” level. A Holding would have its own URI and contain URIs of the Manifestation and of the Library. A specific Holding in this way would indicate that a specific library has one or more copies (Items) of a specific edition of a work (Manifestation), or offers access to an online digital article by way of a subscription.
If a Holding refers to physical copies (print books or journal issues for instance) then we also need the Item level. An Item would have its own URI and the URI of the Holding. For each Item, extra information can be provided, for instance ‘availability’, ‘location’, etc. Local circulation administration data can be registered for all Holdings and Items. For online digital content we don’t need Items, only subscription information directly attached to the Holding.
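One possible shape for the local Holding and Item records is sketched below. The field names and the `ex:` URIs are assumptions for the sake of the example; the text only prescribes that a Holding points to a Manifestation and a Library, and that an Item points to its Holding.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    uri: str
    holding_uri: str        # link back to the Holding
    location: str
    available: bool = True


@dataclass
class Holding:
    uri: str
    manifestation_uri: str  # link to the shared, global Manifestation
    library_uri: str        # link to the owning library
    items: list = field(default_factory=list)  # stays empty for online content
    subscription: str = ""  # only used for online digital content

    def available_items(self):
        """Return the physical copies that are currently not on loan."""
        return [item for item in self.items if item.available]


# A print Holding with two copies, one of them on loan (hypothetical data).
print_holding = Holding("ex:holding/1", "ex:manifestation/nl-ed1", "ex:library/a")
print_holding.items.append(Item("ex:item/1", print_holding.uri, "Main building"))
print_holding.items.append(
    Item("ex:item/2", print_holding.uri, "Depot", available=False)
)

# An online Holding has no Items, only subscription information.
online_holding = Holding(
    "ex:holding/2", "ex:manifestation/article-1", "ex:library/a",
    subscription="campus-wide licence",
)

print(len(print_holding.available_items()))  # 1
```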
Local Holding and Item information can reside on local servers within the library’s domain or just as well on some external server ‘in the cloud’.
It’s on the level of the Holding that usage statistics per library can be collected and aggregated, both for physical items and for digital material.
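A small sketch of how usage could be counted per Holding and then rolled up across libraries to the shared Manifestation. The event list and all identifiers are invented for the example; real circulation or download logs would of course look different.

```python
from collections import Counter

# holding URI -> (manifestation URI, library URI); hypothetical data
holdings = {
    "ex:holding/a1": ("ex:manifestation/1", "ex:library/a"),
    "ex:holding/b1": ("ex:manifestation/1", "ex:library/b"),
    "ex:holding/a2": ("ex:manifestation/2", "ex:library/a"),
}

# One entry per loan or download, recorded against the Holding.
usage_events = [
    "ex:holding/a1", "ex:holding/a1", "ex:holding/b1", "ex:holding/a2",
]

# Usage per library and edition: simply count events per Holding.
per_holding = Counter(usage_events)

# Usage across all libraries: roll Holding counts up to the Manifestation.
per_manifestation = Counter()
for holding_uri, count in per_holding.items():
    manifestation_uri, _library_uri = holdings[holding_uri]
    per_manifestation[manifestation_uri] += count

print(per_holding["ex:holding/a1"])             # 2
print(per_manifestation["ex:manifestation/1"])  # 3
```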
Now, this networked linked library data model still allows libraries to present a local traditional catalogue type interface, showing only information about the library’s own print and digital holdings. What’s needed is software to do this using the local Holdings as entry level.
But the nice thing about the model is that there will also be a lot of other options. It will also be possible to start at the other end and search all bibliographic metadata available in the shared global network, and then find the most appropriate library to get access to a specific publication, much like WorldCat does, but on an even larger scale.
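The two entry points described above, starting from local Holdings or starting from the shared bibliographic data, can be sketched as two queries over the same links. All identifiers are hypothetical, and the lookup tables stand in for what would really be remote datastores.

```python
# Hypothetical URIs throughout.
manifestation_to_work = {
    "ex:manifestation/en": "ex:work/1",
    "ex:manifestation/nl": "ex:work/1",
    "ex:manifestation/other": "ex:work/2",
}

# (library, manifestation) pairs, one per Holding.
holdings = [
    ("ex:library/a", "ex:manifestation/en"),
    ("ex:library/b", "ex:manifestation/nl"),
    ("ex:library/b", "ex:manifestation/other"),
]


def local_catalogue(library):
    """Start from one library's Holdings and list the Works behind them."""
    return sorted(
        {manifestation_to_work[m] for lib, m in holdings if lib == library}
    )


def libraries_offering(work):
    """Start from a Work and find every library holding any edition of it."""
    return sorted(
        {lib for lib, m in holdings if manifestation_to_work[m] == work}
    )


print(local_catalogue("ex:library/b"))
print(libraries_offering("ex:work/1"))
```

Both functions read the same data; the only difference is which end of the chain they start from.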
Another nice thing about using triples, URIs and linked data is that it allows for adding all kinds of other, non-traditional bibliographic links to the old inward looking library world, making it into a flexible and open model, ready for future developments. It will for instance be possible for people to discover links to publications and library holdings from any other location on the web, for instance a Wikipedia page or a museum website. And the other way around, from an item in local library holdings to let’s say a recorded theatre performance on YouTube.
When this new data and metadata framework is in place, there will be two important issues to be solved:
- Getting new software, systems and tools for both back end administrative functions and front end information finding needs. For this we need efforts from traditional library systems vendors but also from developers in libraries.
- Establishing future roles for libraries, librarians and information professionals in the new framework. This may turn out to be the most important issue.
Posted on September 2nd, 2011 10 comments
Shifting focus from information carriers back to information
Library catalogues have traditionally been used to describe and register books and journals and other physical objects that together constitute the holdings of a library. In an integrated library system (ILS), the public catalogue is combined with acquisition and circulation modules to administer the purchases of book copies and journal subscriptions on one side, and the loans to customers on the other side. The “I” for “Integrated” in ILS stands for an internal integration of traditional library workflows. Integration from a back end view, not from a customer perspective.
Because of the very nature of such a catalogue, namely the description of physical objects and the administration of processing them, there are no explicit relations between the different editions and translations of the same book, nor are there descriptions of individual journal articles. If you do a search on a specific person’s name, you may end up with a large number of result records, written by that person or someone with a similar name, or about that person, even with identical titles, without knowing if there is a relationship between them, and what that relationship might be. What’s certain is that you will not find journal articles written by or about that person. The same applies to a search on title. There is no way of telling if there is any relation between identical titles. A library catalogue user would have to look at specific metadata in the records (like MARC 76X-78X – Linking Entries, 534 – Original Version Note or 580 – Linking Entry Complexity Note), if available, to reach their own conclusions.
Most libraries nowadays also purchase electronic versions of books and journals (ebooks and ejournals) and have free or paid subscriptions to online databases. Sometimes these digital items (ebooks, ejournals and databases) are also entered into the traditional library catalogues, but they are sometimes also made available through other library systems, like federated search tools, integrated discovery tools, A-Z lists, etc. All kinds of combinations occur.
In traditional library catalogues digital items are treated exactly the same as their physical counterparts. They are all isolated individual items without relations. As Karen Coyle put it in November 2010 at the SWIB10 conference: “The main goal of cataloguing today is to keep things apart”.
Basically, integrated library systems and traditional catalogues are nothing more than inventory and logistics systems for physical objects, mainly focused on internal workflows. Unfortunately in newer end user interfaces like federated search and integrated discovery tools the user experience in this respect has in general been similar to that of traditional public catalogues.
At some point in time during the rise of electronic online catalogues, apparently the lack of relations between different versions of the same original work became a problem. I’m not sure if it was library customers or librarians who started feeling the need to see these implicit connections made explicit. The fact is that IFLA (International Federation of Library Associations) started developing FRBR in 1998.
FRBR (Functional Requirements for Bibliographic Records) is an attempt to provide a model for describing the relations between physical publications, editions, copies and their common denominator, the Work.
FRBR Group 1 describes publications in terms of the entities Work, Expression, Manifestation and Item (WEMI).
FRAD (Functional Requirements for Authority Data – ‘authors’) and FRSAD (Functional Requirements for Subject Authority Data – ‘subjects’) were developed later on as alternatives for the FRBR Group 2 and 3 entities.
As an example let’s have a look at The Diary of Anne Frank. The original handwritten diary may be regarded as the Work. There are numerous adaptations and translations (Expressions) of the original unfinished and unedited Work. Each of these Expressions can be published in the form of one or more prints, editions, etc. These are the Manifestations, especially if they have different ISBN’s. Finally a library can have one or more physical copies of a Manifestation, the Items.
Some might even say the actual physical diary is the only existing Item embodying one specific (the first) Expression of the Work (Anne’s thoughts) and/or the only Manifestation of that Expression.
Of course, this model, if implemented, would be an enormous improvement to the old public catalogue situation. It makes it possible for library customers to have an automatic overview of all editions, translations, adaptations of one specific original work through the mechanism of Expressions and Manifestations. RDA (Resource Description and Access) does exactly this.
However, there are some significant drawbacks, because the FRBR model is an old model, based on the traditional way of library cataloguing of physical items (books, journals, CDs, DVDs, etc.) (Karen Coyle at SWIB10).
- In the first place the FRBR model only shows the Works and related Manifestations and Expressions of physical copies (Items) that the library in question owns. Editions not in the possession of the library are ignored. This would be a bit different in a union catalogue of course, but then the model still only describes the holdings of the participating libraries.
- Secondly, the focus on physical copies is also the reason that the original FRBR model does not have a place for journal titles as such, only for journal issues. So there will be as many entries for one journal as the library has issues of it.
- Thirdly, it’s a hierarchical model, which incorporates only relations from the Work top down. There is no room for relations like: ‘similar works’, ‘other material on the same subject’, ‘influenced by’, etc.
- In the fourth place, FRBR still does not look at content. It is document centric, instead of information centric. It does however have the option for describing parts of a Work, if they are considered separate entities/works, like journal articles or volumes of a trilogy.
- Finally, the FRBR Item entity is only interesting in a storage and logistics environment for physical copies, such as the Circulation function in libraries, or the Sales function in bookstores. It has no relation to content whatsoever.
FRBR definitely is a positive and necessary development, but it is just not good enough. Basically it still focuses on information carriers instead of information (it’s a set of rules for managing Bibliographic Records, not for describing Information). It is an introverted view of the world. This was OK as long as it was dictated by the prevailing technological, economic and social conditions.
In a new networked digital information world libraries should shift their focus back to their original objective: being gateways to information as such. This entails replacing an introverted hierarchical model with an extroverted networked one, and moving away from describing static information aggregates in favour of units of content as primary objects.
The linked data concept provides the framework of such a networked model. In this model anything can be related to anything, with explicit declarations of the nature of the relationship. In the example of the Diary of Anne Frank one could identify relations with movies and theater plays that are based on the diary, with people connected to the diary or with the background of World War 2, antisemitism, Amsterdam, etc.
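To make “anything can be related to anything, with explicit declarations of the nature of the relationship” a little more concrete, here is a minimal sketch of the Anne Frank example as a set of typed links. The identifiers and relation names are invented for the illustration, and `ex:other-book` is a hypothetical second publication, added only to show that a link can be followed from either end.

```python
# Hypothetical identifiers and relation names.
links = [
    ("ex:diary-of-anne-frank", "adaptedAs", "ex:movie"),
    ("ex:diary-of-anne-frank", "adaptedAs", "ex:theatre-play"),
    ("ex:diary-of-anne-frank", "writtenBy", "ex:anne-frank"),
    ("ex:diary-of-anne-frank", "hasBackground", "ex:world-war-2"),
    ("ex:diary-of-anne-frank", "setIn", "ex:amsterdam"),
    ("ex:other-book", "hasBackground", "ex:world-war-2"),
]


def related(node):
    """Return (relation, other node) pairs, following links in both directions."""
    outgoing = [(p, o) for s, p, o in links if s == node]
    incoming = [("inverse of " + p, s) for s, p, o in links if o == node]
    return outgoing + incoming


# Starting from the subject instead of the book leads to both publications.
for relation, other in related("ex:world-war-2"):
    print(relation, other)
```

Unlike a top-down hierarchy, nothing here privileges the book as the starting point: the same list of links answers questions asked from the movie, the person or the subject.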
In traditional library catalogues defining relations with movies or theater plays is not possible from the description of the book. They could however be entered as a textual reference in the description of a movie, if for instance a DVD of that movie is catalogued. Relations to people, World War 2, antisemitism and Amsterdam would be described as textual or coded references to a short concept description, which in turn could provide lists of other catalogue items indexed with these subjects.
In a networked linked data model these links could connect to information entities in their own right outside the local catalogue, containing descriptions and other material about the subject, and providing links to other related information entities.
FRBR would still be a valuable part of such a universal networked model, as a subset for a specific purpose. In the context of physical information carriers it is a useful model, although with some missing features, as described above. It could be used in isolation, as originally designed, but if it’s an open model, it would also provide the missing links and options to describe and find related information.
Also, the FRBR model is essential as a minimal condition for enabling links from library catalogue items to other entity types through the Work common denominator.
In a completely digital information environment, the model could be simplified by getting rid of the Item entity. Nobody needs to keep track of available copies of online digital information, unless publishers want to enforce the old business models they have been using in order to keep making a profit. Ebooks for instance are essentially Expressions or Manifestations, depending on their nature, as I stated in my post ’Is an e-book a book?’.
The FRBR model can be used and is used also in other subject areas, like music, theater performances, etc. The Work – Expression – Manifestation – Item hierarchy is applicable to a number of creative professions.
The networked model provides the option of describing all traditional library objects, but also other and new ones and even objects that currently don’t exist, because it is an open and adaptable model.
In the traditional library models it is for instance impossible, or at least very hard, to describe a story that continues through all volumes of a trilogy as a central thread, apart from and related to the descriptions of the three separate physical books and their own stories. In the Millennium trilogy by Stieg Larsson, Lisbeth Salander’s life story is the central thread, but it can’t be described as a separate “Work” in MARC/FRBR/RDA because it is not the main subject of one physical content carrier (unless we are dealing with an edition in one physical multi part volume). The three volumes will be described with the subjects ‘Missing girl mystery‘, ‘Sex trafficking‘ and ‘Illegal secret service unit‘ respectively.
In an open networked information model on the contrary it would be entirely possible to describe such a ‘roaming story’.
New forms of information objects could appear in the form of new types of aggregates, other than books or journal articles, for instance consisting of text, images, statistics and video, optionally of a flexible nature (dynamic instead of static information objects).
Existing library systems (ILS’s and Integrated Discovery tools alike), using bibliographic metadata formats and frameworks like MARC, FRBR and RDA, can’t easily deal with new developments without some sort of workaround. Obviously this means that if libraries want to continue playing a role in the information gateway world, they need completely different systems and technology. Library system vendors should take note of this.
Finally, instead of only describing information objects, libraries could take up a new role in creating new objects, in the form of subject based virtual information aggregates, like for instance the Anne Frank Timeline, or Qwiki. This would put libraries back in the center of the information access business.
Posted on December 21st, 2010 43 comments
Mobile services have to fulfill information needs here and now
Like many other libraries, the Library of the University of Amsterdam released a mobile web app this year. For background information about why and how we did it, have a look at the slideshow my colleague Roxana Popistasu and I gave at the IGeLU 2010 conference.
For now I want to have a closer look at the actual reception and use of our mobile library services and draw some conclusions for the future. I have expressed some expectations earlier about mobile library services in my post “Mobile library services”. In summary, I expected that the most valued mobile library services would be of a practical nature, directly tied to the circumstances of internet access ‘any time, anywhere’, and would not include reading and processing of electronic texts.
Let me emphasise that I define mobile devices as smart phones and similar small devices that can be carried around literally any time anywhere, and that need dedicated apps to be used on a small touchscreen. So I am not talking about tablets like the iPad, which are large enough to be used with standard applications and websites, just like netbooks.
As you can see, most, if not all of the services in the Library of the University of Amsterdam mobile app are of a practical nature: opening hours, locations, contact information, news. And of course there is a mobile catalogue. This is the general situation in mobile library land, as has been described by Aaron Tay in his blog post “What are mobile friendly library sites offering? A survey”.
In my view these practical services are not really library services. They are learning or study centre services at best. There is no difference with practical services offered by other organisations like municipal authorities or supermarkets. Nothing wrong with that of course, they are very useful, but I don’t consider these services to be core library services, which would involve enabling access to content.
Real mobile devices are simply too small to be used for reading and processing large bodies of scholarly text. This might be different for public libraries. Their customers may appreciate being able to read fiction on their smart phones, provided that publishers allow them to read ebooks via libraries at all.
Even a mobile library catalogue can be considered a practical service intended to fulfill practical needs of a physical nature, like finding and requesting print books and journals to be delivered to a specific location and renewing loans to avoid paying fines. Let’s face it: an Integrated Library System is basically nothing more than an inventory and logistics management system for physical objects.
Usage statistics of the Library of the University of Amsterdam mobile web app show that between the launch in April and November 2010 the number of unique visits hovers around 30 per day on average, with a couple of peaks (350) on two specific days in October. The full website shows around 6000 visits per day on normal weekdays.
For the mobile catalogue this is between 30 and 50 visits per day. The full OPAC shows around 3000 visits on normal weekdays.
In November we see a huge increase in usage. Our killer mobile app was introduced: an overview of currently available workstations per location. The number of unique visits rises to between 300 and 400 a day. The number of pageviews rises from under 100 per day to around 1000 on weekdays in November. The ‘available workstations’ service accounts for 80% of these. In December 2010, an exam period, these figures rise to around 2000 pageviews per day, with 90% for the ‘available workstations’ service.
We can safely conclude that our students are mainly using our mobile library app on their smart phones to locate the nearest available desktop PC.
Mobile users expect services that are useful to them here and now.
What does this mean for core library services, aimed at giving access to content, on small mobile devices? I think that there is no future for providing mobile access on smart phones to traditional library content in digital form: electronic articles and ebooks. I agree with Aaron Tay when he says “I don’t believe there is any reason to think that it will necessarily lead to high demand for library mobile services” in his post “A few heretical thoughts about library tech trends“.
Rather, mobile services should provide information about specific subjects useful to people here and now.
In the near future anybody interested in a specific physical object or location will have access via their location aware smart phones and augmented reality to information of all kinds (text, images, sound, video, maps, statistics, etc.) from a number of sources: museums, archives, government agencies, maybe even libraries. To make this possible it is essential that all these organisations publish their information as linked open data. This means: under an open license using a generic linked data model like RDF.
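As a rough idea of what publishing such information in RDF can look like, the sketch below writes a few triples in N-Triples, one of the plain-text RDF serialisations. The `example.org` URIs and the data are made up; the two predicates are real terms from RDF Schema and Dublin Core, but choosing them here is my assumption, not a recommendation from any of the organisations mentioned. Treating every value that starts with `http` as a URI is a simplification for the example.

```python
def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if o.startswith("http://") or o.startswith("https://"):
            obj = f"<{o}>"  # the object is another resource
        else:
            # The object is a text literal: escape backslash, quote, newline.
            escaped = (
                o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical data about a physical location and a related museum object.
place = "http://example.org/place/1"
data = [
    (place, "http://www.w3.org/2000/01/rdf-schema#label", "University Library"),
    (place, "http://purl.org/dc/terms/relation", "http://example.org/museum/object/1"),
]

print(to_ntriples(data))
```

Any organisation that publishes lines like these under an open license makes its data linkable by everyone else, which is all the scenario above really requires.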
I expect that consumers of this new type of mobile location based augmented linked information would appreciate some guidance in the possibly overwhelming information landscape, in the form of specific views, with preselection of information sources and their context taken into account.
There may be an opportunity here for libraries, especially public libraries, taking on a new coordinating role as information brokers on the intersection of a large number of different information providers. Of course if libraries want to achieve that, they need to look beyond their traditional scope and invest more in new information technologies, services and expertise.
The future of mobile information services lies in the combination of location awareness, augmented reality and linked open data. Maybe libraries can help.
Posted on July 8th, 2010 2 comments
Meeting new user expectations at ELAG 2010
In the near future libraries and librarians will be very different from what they are now. That’s the overall impression I took away from the ELAG 2010 conference in Helsinki, June 8-11, 2010. ELAG stands for “European Library Automation Group”, which is an indication of its age (34 years): “automation” was then what is now “ICT”. The meetings are characterised by a combination of plenary presentations and parallel workshops.
This year’s theme was “Meeting new users’ expectations”, where the term “users” refers to “end users”, “customers” or “patrons”, as library customers are also called. When you hear the phrase “end user expectations” in relation to library technology you first of all think of front end functionality (user interfaces and services) and the changing experiences there. A number of presentations and workshops were indeed focused on user experience and user studies.
Keywords: discovery, guidance, knowing/engaging users, relevance ranking, context.
But a considerable number of sessions, maybe even the majority, were dedicated to backend technology and systems development.
Keywords: webservices, API, REST, JSON, XML, Xpath, SOLR, data wells, aggregation, identifiers, FRBR, linked data, RDF.
It is becoming ever more obvious that improving libraries’ digital user experience cannot be accomplished without proper data infrastructures and information systems and services. This is directly related to the shift of existing library traditions to the new web experience, which was the leading topic of the presentation given by Rosemie Callewaert and myself: “Discovering the library collections”. We are experiencing a move from closed local physical collections to open networked digital information.
First of all, library collections will be digital. If you don’t believe that, look at the music industry. The recording of stories started 5000 years ago already. The first music recordings only date from the 19th century.
Next, collections will be networked, interlinked and virtual. Data, metadata, and digital objects will be fetched from all kinds of databases on the web, not only traditional bibliographic metadata from library catalogues, and mixed into new result sets, using mashup or linked data techniques.
In this open digital environment, existing and new library systems and discovery tools simply cannot incorporate all possible data services available now and in the future. That is why libraries (or maybe we should start saying ‘information brokers’) MUST have ‘developer skills’ in one form or another. This can range from building your own data wells and discovery tools on one end to using existing online service builders for enriching third party frontends on the other, and everything in between, with different levels of skills required.
Another inevitable development in this open information environment is “cooperation” in all kinds of areas with all kinds of partners in all kinds of forms. Cooperation in development, procurement, hosting and sharing of software (systems, services) and aggregation of data, with libraries, museums, archives, educational institutions, commercial partners, etc.
Last but not least there is the question of the value of the physical library building in the digital age. A number of people stress the importance of libraries as places where students like to come to study. But being a learning center in my view is not part of the core business of a library, which is providing access to information. In pre-digital times it was obviously a natural and necessary thing to study information at the location of the physical collection. But this direct physical link between access to and processing of information does not exist anymore in an open digital information environment.
Back to the ELAG 2010 theme “Meeting new users’ expectations”. In the last slide of our presentation we asked the question “Can LIBRARIES meet new user expectations?” Because we did not have time to discuss it then and there, I will answer it here: “No, not libraries as they are now!”.
New users don’t expect libraries, they expect information services. Libraries were once the best way of providing access to information. Instead of taking the defensive position of trying to secure their survival as organisation (as is the natural aspiration of organisations) libraries should focus on finding new ways of achieving their original mission. This may even lead to the disappearance of libraries, or rather the replacement of the library organisation by other organisational structures. This may of course vary between types of libraries (public, academic, special, etc.).
We may need to redefine the concept of library from “the location of a physical collection” to “a set of information services administered by a group of specialists”.
To summarise: the new digital and networked nature of collections of information leads to a focus on new information services, supported by library staff with information and technology skills, in new organisational structures and in cooperation with other organisations.
Posted on March 4th, 2010 3 comments
Location aware services in a digital library world
This is the third post in a series of three
While library systems technology and mobile apps architecture make up the technical and functional infrastructure of mobile web access, mobile library services are what it’s all about. What type of mobile services should libraries offer to their customers?
As stated before, the two main features that distinguish mobile, handheld devices from other devices are:
- web access any time anywhere
- location awareness
It seems obvious that libraries should take these two conditions into account when providing mobile services, not in the least the first one. I don’t think that mobile devices will completely replace other devices like pc’s and netbooks, like Google seems to think, but they will definitely be an important tool for lots of people, simply because they always carry a mobile phone with them. So in order to offer something extra, mobile applications should be focused on the situational circumstance of potential access to information any time anywhere, and make use of the location awareness of the device as much as possible. But does this also apply to services for library customers? That partly depends on the type of library (public, academic, special) and the physical and geographical structure of the library (one central location, branch locations).
As a starting point we can say that mobile library services should cover the total range of online library services already offered through traditional web interfaces. However, mobile users may not want to use certain library services on their mobile devices. For instance, from an analysis of usage statistics of EBSCO Mobile at the Library of Texas A&M University, generously provided by Bennett Ponsford, it appears that although the number of searches in EBSCO mobile is increasing, only 1% of mobile searches leads to a fulltext download, against 77% of regular EBSCO searches. These findings suggest that library customers, at least academic ones, are willing to search for books and articles on their mobile devices, but will postpone actually using them until they are in a more convenient environment. Apparently small screens and/or mobile PDF readers are not very reader friendly in academic settings. This may be different for public library customers and e-books.
So, libraries should concentrate on offering those mobile services that are wanted and will actually be used. In the beginning this may involve analysis of usage statistics and customer feedback to be able to determine the perfect mobile services suite for your library. Libraries should be prepared for “perpetual beta” and “agile development”.
There are two main areas of information in which libraries can offer mobile services:
- practical information
- bibliographical information
This is no different from other library information channels, like normal websites and printed guides and catalogues.
Practical information may consist of contact address, email and telephone information, opening hours, staff information, rules and regulations of any kind, etc. In most cases this is information that does not change very often, so static information pages will be sufficient. However, especially with mobile devices whose owners are on the move, providing dynamic up to date information will give an advantage. For instance: today’s and tomorrow’s opening hours, number of currently available public workstations per location, etc.
The information provided will be even more precisely aimed at the user’s personal situation, if the “location awareness” feature is added to the “any time anywhere” feature, and up to date static and dynamic information for the locations in the immediate vicinity of the customer is shown first, using the device’s automatic geolocation properties. And all this gets better still if the library’s own information is mashed up with available online tools, like showing a location on Google Maps when selecting an address, and with the device’s tools, like making a phone call when clicking on a phone number.
Bibliographical information should be handled somewhat differently. Searching library catalogues or online databases is in essence not location dependent. Online digital bibliographical metadata is available “in the cloud” any time anywhere. It’s not the discovery but the delivery that makes the difference. We have already seen that mobile academic library customers do not download fulltext articles to their mobile devices. But mobile customers will definitely be interested in the possibility of requesting a print item to be delivered to them in the nearest location. WorldCat Mobile, like “normal” WorldCat, for instance offers the option to select a library manually from a list in order to find the nearest location to obtain an item from. It would of course be nice if the delivery location would be automatically determined by the mobile request service, using the device’s location awareness and the current opening hours of the library branches.
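To make the idea of an automatically determined delivery location a bit more concrete, here is a minimal sketch in Python. It assumes the device has already reported its coordinates, and it picks the closest branch that is open at the given time. The `Branch` class, the branch names, the coordinates and the opening hours are all invented for illustration; a real request service would take them from the library's own systems.

```python
from dataclasses import dataclass
from datetime import time
from math import asin, cos, radians, sin, sqrt


@dataclass
class Branch:
    name: str
    lat: float
    lon: float
    opens: time
    closes: time


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_open_branch(branches, lat, lon, now):
    """Return the closest branch that is open at `now`, or None if all are closed."""
    open_now = [b for b in branches if b.opens <= now < b.closes]
    if not open_now:
        return None
    return min(open_now, key=lambda b: distance_km(lat, lon, b.lat, b.lon))


# Hypothetical branches with made-up coordinates and opening hours.
BRANCHES = [
    Branch("Central", 52.370, 4.890, time(9, 0), time(17, 0)),
    Branch("Science Park", 52.355, 4.955, time(8, 30), time(22, 0)),
    Branch("Medical", 52.295, 4.958, time(9, 0), time(20, 0)),
]
```

With these sample values, a customer standing next to the Central branch at noon is sent there, but at six in the evening the request goes to the nearest branch that is still open.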
The funny thing here is that we have the paradoxical situation of state-of-the-art technology in a world of global online digital information being used to obtain “old fashioned” physical carriers of information (books) from the nearest physical location.
Augmented reality, as a link between the physical and virtual world, may be a valuable extension of mobile services. A frequently mentioned example is scanning a book cover or a barcode with the camera of a mobile phone and locating the item on Amazon. It would be helpful if your phone could automatically find and request the item in the nearest library branch. Personally I am not convinced that this is very valuable. Typing in ISBN or book title will do the job just as fast. Moreover, bookshop staff may not appreciate this behaviour.
A more common use of augmented reality would be to point the camera of your mobile device to a library building, after which a variety of information about the building is shown. The best known augmented reality app at the moment is Layar. This tool allows you to add a number of “layers”, with which you can for instance find the nearest ATM’s or museums, or Wikipedia information about physical objects or locations around you.
There is also a LibraryThing Local layer for Layar, with which you can find information about all libraries, bookshops and book related events in the neighbourhood. It may even be possible to find a specific book in an open stack using this technology.
All these extended mobile applications suggest that users of apps may not just be a specific group of people (like library customers), but that mobile users will be interested in all kinds of useful information about their current location. Library information may be only a part of that. Maybe mobile apps should be targeted at a more general audience and include related information from other sources, making use of the linked data concept.
A search in a library catalog in this case may result in a list of books with links to related objects in a museum nearby or a historic location related to the subject of the book. Alternatively, an item in a museum website might have links to related literature in catalogs of nearby libraries. Anything is possible.
The question that remains is: should libraries take care of providing these generic location based services, or will others do that?
Posted on February 21st, 2010 11 comments
Technology, users and business models
This is the second post in a series of three
Mobile access to information on the internet is the latest step in the development of information systems technology, as described in the previous post in this series. The two main features that distinguish mobile devices from other devices are:
- Access to the web literally any time, anywhere
- Location awareness using GPS or the mobile network
Let’s focus on web access first. There are two main ways in which information providers can provide access to their data: by a mobile web browser or by apps.
The easiest way to provide mobile access is: do nothing. Users of mobile internet devices can simply visit all existing websites with their mobile browser. However, in doing so they will experience a number of problems: performance is slow, pages are too large, navigation is difficult, certain parts of websites don’t work. These problems are caused by the very physical characteristics of mobile technology that make mobile internet access possible: the small size of devices and displays, the wireless network, the limited features of dedicated mobile operating systems and browsers.
Fortunately, technological development is an interactive, reciprocal, cyclic process. Technology continuously needs to find solutions to problems that were caused by new uses of existing technology.
Many organisations have solved this problem by creating separate “dumbed down” mobile versions of their websites, containing mainly text only pages and textual links to their most important services and information. In the case of libraries for instance “locations and addresses“, “opening hours“, etc. See this list of examples (with thanks to Aaron Tay). Another example is LibraryThing Mobile, which also has a catalog search option. In these cases you have to manually point your browser to the dedicated mobile URL, unless the webserver is configured to automatically recognise mobile browsers and redirect them to the mobile site.
Of course this is not the optimal solution, for two reasons:
- On the front end: as an information provider you are completely ignoring all graphical, dynamic, interactive and web 2.0 functionality on the end user side. This means actually going back to the early days of the world wide web of static text pages.
- On the back end: duplicating system and content administration. In most cases it will come down to manually creating and editing HTML pages, because most website content management systems may not offer manual or automatic editing of pages for mobile access. Some systems offer automatic recognition of mobile browsers and display content in the appropriate format, like the WordPress plugin “WordPress Mobile Edition” that automatically shows a list of posts if mobile browsers are detected. This is what happens on this blog.
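The automatic recognition of mobile browsers mentioned above usually comes down to inspecting the User-Agent header of the request. A minimal sketch of that idea in Python follows. The pattern is deliberately short and the mobile URL is hypothetical; this is not how any particular web server or WordPress plugin does it.

```python
import re

# Short, illustrative pattern; real User-Agent strings vary widely.
MOBILE_PATTERN = re.compile(
    r"iphone|ipod|android|blackberry|symbian|opera mini", re.IGNORECASE
)

MOBILE_BASE = "http://m.library.example.org"  # hypothetical mobile site


def is_mobile(user_agent):
    """Guess from the User-Agent header whether the browser is a mobile one."""
    return bool(MOBILE_PATTERN.search(user_agent or ""))


def redirect_target(user_agent, path):
    """Return the mobile URL to redirect to, or None to serve the normal page."""
    if is_mobile(user_agent):
        return MOBILE_BASE + path
    return None
```

A list like this has to be maintained as new devices appear, which is one more piece of back end administration to keep up with.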
Because of this situation we are witnessing a re-enactment of the client-server alternative to static HTML that I described previously: mobile apps! “Apps” is short for “applications“, apparently everything needs to be short in the mobile online web 2.0 age. Apps are installed on mobile devices, they run locally making use of the hardware, operating system and user friendly interface of the device, and they only connect to the internet for retrieving data from a database system in the cloud (on a remote server).
A disadvantage of this solution obviously is that you have to multiply development and maintenance in order to support all mobile platforms that your customers are using, or just support the most used platform (iPhone) and ignore the rest of your end users. Alternatively you can support one mobile platform with an app, and the rest with a mobile web site. Organisations have the choice of developing apps themselves from scratch, or using one of the commercial parties that offer library apps, such as Boopsie, Blackboard or the recently announced LibraryThing Anywhere, that is meant to offer both mobile web and apps for iPhone, Blackberry and Android.
- TU Delft mobile app for iPhone (powered by Blackboard). University wide, including library. Haven’t been able to test this because I have an Android phone. An Android version will be developed. For other devices they offer a mobile website.
- Duke University mobile app for iPhone. University wide, including library. For other devices they offer a mobile website.
- Santa Clara County Library mobile apps for iPhone and Android (powered by Boopsie).
- WorldCat Mobile (powered by Boopsie).
An alternative solution to the client-server and “dumbed down” models would be to use the new HTML5 and CSS3 options to create websites that can easily be handled by all PC and mobile web browsers alike. HTML5 has geolocation options, and browsers are made location aware this way too. The iWebKit Framework is a free and easy package to create web apps compatible with all mobile platforms. See this demo on PC, iPhone, Android, etc.
Some say that HTML5/CSS3 will make apps disappear, but I suspect performance may still be a problem, due to slow connections. But it’s not only a technology issue. It’s also a matter of business models, as Owen Stephens and Till Kinstler pointed out.
Apps can be distributed for free by organisations that want to draw traffic to their own data, ignoring the open web. This method fits their classic business model, as Till remarked, mentioning the newspaper business as an example.
But there is also another side to this: apps can be created by anybody, making use of APIs to online systems and databases, and be shared with others for free or for a small fee, as is the case with the iPhone Apps Store, the Android Market, the Nokia Ovi Store, or the newly announced Wholesale Applications Community (WAC). This model will never be possible with web based apps (like HTML5), because nobody has access to a system’s web server other than the system administrators. It is also much too complicated for developers and consumers of apps to host web apps on a server that mobile device users can connect to.
And there is more: independent developers are more likely to look beyond the boundaries of the classic model of giving access to your own data only. Third party apps have the opportunity to connect data from a number of data sources in the cloud in order to satisfy mobile user needs better. To take the newspaper business example, I mentioned this in my post “Mobile reading“: general news apps vs dedicated newspaper apps. The rise of the open linked data movement will only boost the development and use of the mobile client server model.
In my view there will be a hybrid situation: HTML5/CSS3 based web apps and local mobile apps will coexist, depending on developer, audience, and objectives.
What services library mobile apps should offer, including location awareness and linking data, is the topic of another post.
Posted on February 16th, 2010 11 comments
The connection between information technology and library information systems
This is the first post in a series of three
The functions, services and audience of library information systems, as is the case with all information systems, have always been dependent on and determined by the existing level of information technology. Mobile devices are the latest step in this development.
If you made a typo (puncho?), you were not informed until a day later when you collected the printout, and you could start again. System and data files could be stored externally on large tape reels or small tape cassettes, identical to music tapes. Tapes were also used for sharing and copying data between systems by means of physical transportation.
Suddenly there was a human operable terminal, consisting of a monitor and keyboard, connected to the central computer. Now you could type in your code and save it as a file on the remote server (no local processing or storage at all). If you were lucky you had a full screen editor, if not there was the line editor. No graphics. Output and errors were shown on screen almost immediately, depending on the capacity of the CPU (central processing unit) and the number of other batch jobs in the queue. The computer was a multi-user time sharing device, a bit like the “cloud”, but every computer was a little cloud of its own.
There was no email. There were no end users other than systems administrators, programmers and some staff. Communication with customers was carried out by sending them printouts on paper by snail mail.
I guess this was the first time that some libraries, probably mainly in academic and scientific institutions, started creating digital catalogs, for staff use only of course.
Then came the PC (Personal Computer). Terminal and keyboard were now connected to the computer (or system unit) on your desk. You had the thing entirely to yourself! Input and output consisted of lines of text only, one colour (green or white on black), and still no graphics. Files could be stored on floppy disks, 5¼-inch magnetic things that you could twist and bend, but if you did that you lost your data. There was no internal storage. File sharing was accomplished by moving the floppy from one PC to another and/or copying files from one floppy to another (on the same floppy drive).
Later we got smaller disks, 3½-inch, in protective cases. The PC was mainly used for early word processing (WordStar, WordPerfect) and games. Finally there was a hard disk (as opposed to “floppy” disk) inside the PC system unit, which held the operating system (mainly MS-DOS), and on which you could store your files, which became larger. Time for stand-alone database applications (dBase).
Then there was Windows, a mouse, and graphics. And of course the Internet! You could connect your PC to the Internet with a modem that occupied your telephone line and made phone calls impossible during your online session. At first there was Gopher, a kind of text based web.
Then came the World Wide Web (web 0.0), consisting of static web pages with links to other static web pages that you could read on your PC. Not suitable for interactive systems. Libraries could publish addresses and opening hours.
But fortunately we got client-server architecture, combining the best of both worlds. Powerful servers were good at processing, storing and sharing data. PC’s were good at presenting and collecting data in a “user friendly” graphical user interface (GUI), making use of local programming and scripting languages. So you had to install an application on the local PC which then connected to the remote server database engine. The only bad thing was that the application was tied to the specific PC, with local Windows configuration settings. And it was not possible to move the thing around.
Now we had multi-user digital catalogs with a shared central database and remote access points with the client application installed, available to staff and customers.
Luckily dynamic creation of HTML pages came along, so we were able to move the client part of client-server applications to the web as well. With web applications we were able to use the same applications anywhere on any computer linked to the world wide web. You only needed a browser to display the server side pages on the local PC.
Now everybody could browse through the library catalog any time, anywhere (where there was a computer with an internet connection and a web browser). The library OPAC (Online Public Access Catalog) was born.
The only disadvantage was that every page change had to be generated by the server again, so performance was not optimal.
In the meantime the portable PC appeared, system unit, monitor and keyboard all in one. At first you needed some physical power to move the thing around, but later we got laptops, notebooks, netbooks, getting smaller, lighter and more powerful all the time. And wifi of course, no need to plug the device in to the physical network anymore. And USB-sticks.
Access to OPAC and online databases became available anytime, anywhere (where you carried your computer).
The latest development of course is the rise of mobile phones with wireless web access, or rather mobile web devices which can also be used for making phone calls. Mobile devices are small and light enough to carry with you in your pocket all the time. It’s a tiny PC.
Finally you can access library related information literally any time, anywhere, even in your bedroom and bathroom.
It’s getting boring, but yes, there is a drawback. Web applications are not really accommodated for use in mobile browsers: pages are too large, browser technology is not really compatible, connections are too slow.
Available options are:
- creating a special “dumbed down” version of a website for use on mobile devices only: smaller text based pages with links
- creating a new HTML5/CSS3 website, targeted at mobile devices and “traditional” PC’s alike
- creating “apps”, to be installed on mobile devices and connect to a database system in the cloud; basically this is the old client-server model all over again.
A comparison of mobile apps and mobile web architecture is the topic of another post.
Posted on October 16th, 2009 3 comments
Metasearch vs. harvesting & indexing
The other day I gave a presentation for the Assembly of members of the local Amsterdam Libraries Association “Adamnet“, about the Amsterdam Digital Library search portal that we host at the Library of the University of Amsterdam. This portal is built with our MetaLib metasearch tool and offers simultaneous access to, at the moment, 20 local library catalogues.
A large part of this presentation was dedicated to all possible (and very real) technical bottlenecks of this set-up, with the objective of improving coordination and communication between the remote system administrators at the participating libraries and the central portal administration. All MetaLib database connectors/configurations are “home-made”, and the portal highly depends on the availability of the remote cataloging systems.
I took the opportunity to explain to my audience also the “issues” inherent in the concept of metasearch (or “federated search“, “distributed search“, etc.), and compare that to the harvesting & indexing scenario.
Because it was not the first (nor last) time that I had to explain the peculiarities of metasearch, I decided to take the Metasearch vs. Harvesting & Indexing part of the presentation and extend it to a dedicated slideshow. You can see it here, and you are free to use it. Examples/screenshots are taken from our MetaLib Amsterdam Digital Library portal. But everything said applies to other metasearch tools as well, like Webfeat, Muse Global, 360-Search, etc.
The slideshow is meant to be an objective comparison of the two search concepts. I am not saying that Metasearch is bad, and H&I is good, that would be too easy. Some five years ago Metasearch was the best we had, it was a tremendous progress beyond searching numerous individual databases separately. Since then we have seen the emergence of harvesting & indexing tools, combined with “uniform discovery interfaces”, such as Aquabrowser, Primo, Encore, and the OpenSource tools VuFind, SUMMA, Meresco, to name a few.
Anyway, we can compare the main difference between Metasearch and H&I to the concepts “Just in time” and “Just in case“, used in logistics and inventory management.
With Metasearch, records are fetched on request (Just in time), with the risk of running into logistics and delivery problems. With H&I, all available records are already there (Just in case), but maybe not the most recent ones.
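The difference can be sketched in a few lines of Python. This is a toy model under stated assumptions: `RemoteCatalogue` is an invented in-memory stand-in for a remote system, and the functions below are not taken from MetaLib or any harvesting tool. The point is only where the work happens: at search time, or ahead of it.

```python
from collections import defaultdict


class RemoteCatalogue:
    """Stand-in for a remote library catalogue (hypothetical, in-memory)."""

    def __init__(self, name, records, available=True):
        self.name = name
        self.records = list(records)
        self.available = available

    def search(self, term):
        if not self.available:
            raise ConnectionError(f"{self.name} is not responding")
        return [r for r in self.records if term.lower() in r.lower()]


def metasearch(catalogues, term):
    """Just in time: query every remote source when the user searches."""
    hits, failed = [], []
    for cat in catalogues:
        try:
            hits.extend(cat.search(term))
        except ConnectionError:
            failed.append(cat.name)  # this source is missing from the result set
    return hits, failed


def harvest(catalogues):
    """Just in case: copy all records into one local index ahead of time."""
    index = defaultdict(set)
    for cat in catalogues:
        if not cat.available:
            continue  # skipped this round; picked up at the next harvest
        for record in cat.records:
            for word in record.lower().split():
                index[word].add(record)
    return index


def index_search(index, term):
    """Look up a single word in the harvested index; no remote calls involved."""
    return sorted(index.get(term.lower(), set()))
```

If one catalogue goes down after the harvest, `metasearch` returns an incomplete result set for that search, while `index_search` still answers from its local copy. On the other hand, a record added to a remote catalogue after the harvest is found by `metasearch` straight away, but not by `index_search` until the next harvest.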
Objectively of course, H&I can solve the problems inherent in Metasearch, and therefore is a superior solution. However, a number of institutions, mainly general academic libraries, will for some time depend on databases that can’t be harvested because of technical, legal or commercial reasons.
In other cases, H&I is the best option, for instance in the case of cooperating local or regional libraries, such as Adamnet, or dedicated academic or research libraries that only depend on a limited number of important databases and catalogs.
But I also believe that the real power of H&I can only be taken advantage of, if institutions cooperate and maintain shared central indexes, instead of building each their own redundant metadata stores. This already happens, for instance in Denmark, where the Royal Library uses Primo to access the national DADS database.
We also see commercial hosted H&I initiatives implemented as SaaS (Software as a Service) by both tool vendors and database suppliers, like Ex Libris’ PrimoCentral, SerialSolutions’ Summon and EBSCOhost Integrated Search.
The funny thing is, that if you want to take advantage of all these hosted harvested indexes, you are likely to end up with a hybrid kind of metasearch situation where you distribute searches to a number of remote H&I databases.
Posted on October 6th, 2009 1 comment
What will library staff do 5 years from now?
I attended the IGeLU 2009 annual conference in Helsinki September 6-9. IGeLU is the International Group of Ex Libris Users, an independent organisation that represents Ex Libris customers. Just to state my position clearly I would like to add that I am a member of the IGeLU Steering Committee.
These annual user group meetings typically have three types of sessions: internal organisational sessions (product working groups and steering committee business meetings, elections), Ex Libris sessions (product updates, Q&A, strategic visions), and customer sessions (presentations of local solutions, addons, developments).
Not surprisingly, the main overall theme of this conference was the future of library systems and libraries. The word that characterises the conference best in my mind (besides “next generation“ and “metaphor“) is “roadmap“. All Ex Libris products but also all attending libraries are on their way to something new, which strangely enough is still largely uncertain.
Ex Libris presented the latest state of design and development of their URM (Unified Resource Management) project, ‘A New Model for Next-generation Library Services’. In the final URM environment all back end functionality of all current Ex Libris products will be integrated into one big modular system, implemented in a SaaS (“Software as a Service“) architecture. In the Ex Libris vision the front end to this model will be their Primo Indexing and Discovery interface, but all URM modules will have open API’s to enable using them with other tools.
The goal of this roadmap apparently is efficiency in the areas of technical and functional system administration for libraries.
In the meantime development of existing products is geared towards final inclusion in URM. All future upgrades will result in what I would like to call “intermediate” instead of “next generation” products. MetaLib, the metasearch or federated search tool, will be replaced by MetaLib Next Generation, with a re-designed metasearch engine and a Primo front end. The digital collection management tool DigiTool will be merged into its new and bigger nephew Rosetta, the digital preservation system. The database of the OpenUrl resolver SFX will be restructured to accommodate the URM datamodel. The next version of Verde (electronic resource management) will effectively be URM version 1, which will also be usable as an alternative for both ILS’es Voyager and Aleph.
Here we see a kind of “intermediate” roadmap to different “base camps” from where the travelers can try to reach their final destination.
From the perspective of library staff we see another panorama appearing.
In one of the customer presentations Janet Lute of Princeton University Library, one of the three (now four) URM development partners, mentioned a couple of “holy cows” or library tasks they might consider stopping doing while on their way to the new horizon:
- managing prediction patterns for journal issues
- checking in print serials
- maintaining lots of circulation matrices and policies
- collecting fines
- cataloging over 80% of bibliographic records
I would like to add my own holy cow MARC to this list, about which I have written a previous post Who needs MARC?. (Some other developments in this area are self service, approval plans, shared cataloging, digitisation, etc.)
This roadmap is supposed to lead to more efficient work and less pressure for acquisitions, cataloging and circulation staff.
Eldorado or Brave New World?
To summarise: we see a sketchy roadmap leading us via all kinds of optional intermediate stations to an as yet still vague and unclear Eldorado of scholarly information disclosure and discovery.
The majority of public and professional attention is focused on discovery: modern web 2.0 front ends to library collections, and the benefits for the libraries’ end users. But it is probably even more important to look at the other side, disclosure: the library back end, and the consequences of all these developments for library staff, both technically oriented system administrators and professionally oriented librarians.
Future efficient integrated and modular library systems will no doubt eliminate a lot of tasks performed by library staff, but does this mean there will be no more library jobs?
Will the university library of the future be “sparsely staffed, highly decentralized, and have a physical plant consisting of little more than special collections and study areas“, as was stated recently in an article in “Inside Higher Education”? I mentioned similar options in “No future for libraries?“.
Personally I expect that the two far ends of the library jobs spectrum will merge into a single generic job type which we can truly call “system librarian“, as I stated in my post “System librarians 2.0“. But what will these professionals do? Will they catalog? Will they configure systems? Will they serve the public? Will they develop system add-ons?
This largely depends on how the new integrated systems will be designed and implemented, how systems and databases from different vendors and providers will be able to interact, how much libraries/information management organisations will outsource and crowdsource, how much library staff is prepared to rethink existing workflows, how much libraries want to distinguish themselves from other organisations, how much end users are interested in differences between information management organisations; in brief: how much these new platforms will allow us to do ourselves.
We have to come up with a realistic image of ourselves for the next couple of decades soon, otherwise our publishers and system vendors will be doing it for us.
Posted on August 20th, 2009 12 comments
On August 17, after I tested a search in our new Aleph OPAC and mentioned my surprise on Twitter, the following discussion unfolded between me (lukask), Ed Summers of the Library of Congress and Till Kinstler of GBV (German Union Library Network):
- lukask: Just found out we only have one item about RDF in our catalogue: http://tinyurl.com/lz75c4
- edsu: @lukask broaden that search http://is.gd/2l6vB
- lukask: @edsu Ha! Thanks! But I’m sure that RDF will be mentioned in these 29 titles! A case for social tagging!
- edsu: @lukask or better cataloging
- edsu: @lukask i guess they both amount to the same thing eh?
- lukask: @edsu That’s an interesting position…”social tagging=better cataloging”. I will ask my cataloguing co-workers about this specific example
- edsu: @lukask make sure to wear body-armor
- lukask: @edsu Yes I know! I will bring it up at tomorrow’s party for the celebration of our ALEPH STP (after some drinks…)
- tillk: @edsu @lukask or fulltext search… SCNR…
- edsu: @tillk yeah, totally — with projects like @googlebooks and @hathitrust we may look back on the age of cataloging with different eyes …
- lukask: @tillk @edsu Fulltext search yes, or “implicit automatic metadata generation”?
What happened here was:
- A problem with findability of specific bibliographic items was observed: although it is highly unlikely that books about the Semantic Web will not cover RDF (Resource Description Framework), none of the 29 titles found with “Semantic Web” could be found with the search term “Resource Description Framework“; on the other hand, the only item found with “Resource Description Framework” was NOT found with “Semantic Web“. I must add that the “Semantic web” search was an “All words” search. Only 20 of the results were indexed with the Dutch subject heading “Semantisch web” (a term that is never used in real life as far as I know; the English term is an international concept). Some results were off topic: they just happened to have “semantic” and “web” somewhere in their metadata. A better search would have been a phrase search (adjacent) with “semantic web” in actual quotes, which gives 26 items. But of these, a small number were not indexed with subject heading “Semantisch web“. Another note: searching with “RDF” gets you all kinds of results. Read more on the issue of searching and relevance in my post Relevance in context.
- Four possible solutions were suggested:
- social tagging
- better cataloging
- fulltext searching
- automatic metadata generation
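The difference between the “All words” search and the phrase (adjacent) search mentioned above can be made concrete with a minimal Python sketch. It is not how Aleph or any other OPAC implements searching; the records, titles and function names are hypothetical and only serve to show why an “All words” search returns off-topic items.

```python
import re

# Hypothetical records; the titles are invented for illustration only.
RECORDS = [
    {"id": 1, "title": "A Semantic Web primer"},
    {"id": 2, "title": "Semantic analysis of web server logs"},
    {"id": 3, "title": "Resource Description Framework explained"},
]

def tokens(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def all_words_search(query, records):
    """Match records containing every query word, in any order or position."""
    wanted = set(tokens(query))
    return [r["id"] for r in records if wanted <= set(tokens(r["title"]))]

def phrase_search(query, records):
    """Match records in which the query words appear adjacent and in order."""
    q = tokens(query)
    hits = []
    for r in records:
        t = tokens(r["title"])
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            hits.append(r["id"])
    return hits
```

In this toy data set the “All words” search for “semantic web” also returns record 2, which merely contains both words, while the phrase search returns only record 1. Neither search connects records 1 and 3, which is exactly the findability gap described above.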
Clearly, the 26 items found with the search “Semantic web” are not indexed by the “Resource description framework” or “RDF” subject heading. There is not even a subject heading for “Resource description framework” or “RDF“. In my personal view, from my personal context, this is an omission. Mind you, this is not only an issue in the catalogue of the Library of the University of Amsterdam, it is quite common. I tried it in the British Library Integrated Catalogue with similar results. Try it in your own OPAC!
I presume that our professional cataloging colleagues can’t know everything about all subjects. That is completely understandable. I would not know how to catalog a book about a medical subject myself either! But this is exactly the point. If you allow end users to add their own tags to your bibliographic records, you enhance the findability of these records for specific groups of end users.
I am not saying that cataloguing and indexing by library specialists using controlled vocabularies should be replaced by social tagging! No, not at all. I am just saying that both types of tagging/indexing are complementary. Sure, some of the tags added by end users may not follow cataloging standards, but who cares? Very often the end users adding tags of their own will be professional experts in their field. In any case, items with social tags will be found more often because specific end user groups can find them searching with their own terms.
I suppose Ed Summers was trying to say the same thing as I just did above, when he commented “or better cataloging, I guess they both amount to the same thing eh?“, which I summarised as “social tagging=better cataloging“, but he can correct me if I’m wrong.
Anyway, I hope I made it clear that I would not say “social tagging=better cataloging“, but rather “controlled vocabularies+social tagging=better cataloging“.
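As a rough illustration of “controlled vocabularies+social tagging”, here is a minimal Python sketch in which a record can be found through either its subject headings or its user tags. The record structure, the field names `subjects` and `tags`, and all values are assumptions made up for this example; real catalogue records look quite different.

```python
# Hypothetical records: "subjects" are assigned by cataloguers from a
# controlled vocabulary, "tags" are added by end users. All values invented.
RECORDS = [
    {"id": 1, "subjects": {"semantisch web"}, "tags": {"rdf", "semantic web"}},
    {"id": 2, "subjects": {"semantisch web"}, "tags": set()},
    {"id": 3, "subjects": {"resource description framework"}, "tags": {"rdf"}},
]

def search(term, records, use_tags=True):
    """Return ids of records indexed with the term.

    With use_tags=False only the controlled subject headings are searched;
    with use_tags=True the user tags are searched as well.
    """
    term = term.lower()
    hits = []
    for r in records:
        index_terms = (r["subjects"] | r["tags"]) if use_tags else r["subjects"]
        if term in index_terms:
            hits.append(r["id"])
    return hits
```

Searching for “rdf” on subject headings alone finds nothing here, while the combined index finds records 1 and 3. The controlled headings keep working as before, so the tags only add access points.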
Alternatively, could we improve cataloging by professional library catalogers? I must admit I do not know enough about library training and practice to say anything about that. I am not a trained librarian. Don’t hesitate to comment!
Is fulltext searching the miracle cure for findability problems, as Till Kinstler seems to suggest? Maybe.
If all our print material were completely digitised and available for fulltext search, I have no doubt that all 26 items mentioned above (the results of the “semantic web” all words search) would be found with the “resource description framework” or “rdf” search as well. But because fulltext search is by its very nature an “all words” search, the “rdf” fulltext search would also give a lot of “noise”, or items not having any relation to “semantic web” at all (author’s initials “R.D.F”, other acronyms “R.D.F.”, just see RDF in the BL catalogue). Again, see my post Relevance in context for an explanation of searching without context.
Also, there will be books or articles about a subject that will not contain the actual subject term at all. With fulltext search these items will not be found.
Moreover, fulltext searching actually limits the findable items to text, excluding other types, like images, maps, video, audio etc.
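Both fulltext effects, noise and missed items, can be shown with a small Python sketch. The documents are invented, and the normalisation that turns “R.D.F.” into “rdf” is an assumption about how a fulltext index might treat punctuation, not a description of any particular search engine.

```python
import re

# Hypothetical documents, invented for illustration only.
DOCS = {
    "a": "An introduction to RDF triples and the Semantic Web.",
    "b": "Memoirs, by R.D.F. Jansen.",  # made-up author initials
    "c": "Describing web resources with subject, predicate and object statements.",
}

def fulltext_search(term, docs):
    """Return ids of documents containing the term as a word.

    Case is ignored and dots are dropped before tokenising, so that
    "R.D.F." is indexed as "rdf" (an assumed normalisation).
    """
    needle = term.lower()
    hits = []
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower().replace(".", ""))
        if needle in words:
            hits.append(doc_id)
    return hits
```

A search for “rdf” returns documents “a” and “b”: the memoirs are noise, and document “c”, which is about the subject but never uses the term, is missed.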
This brings me to the “final solution”:
Automatic metadata generation
Of course this is mostly still wishful thinking. But there are a number of interesting implementations already.
What I mean when I say “(implicit) automatic metadata generation” is: metadata that is NOT created deliberately by humans, but either generated and assigned as static metadata, or generated on the fly, by software, applying intelligent analysis to objects, of all types (text, images, audio, video, etc.).
In the case of our “rdf” example, such a tool would analyse a text and assign “rdf” as a subject heading based on the content and context of this text, even if the term “rdf” does not appear in the text at all. It would also discard texts containing the string “rdf” that refer to something completely different. Of course for this to succeed there should be some kind of contextual environment with links to other records or even other systems to be able to determine if certain terminology is linked to frequently used terms not mentioned in the text itself (here the Linked Data developments could play a major role).
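A very crude Python sketch of this idea follows. It assumes a hand-made table of context terms per subject; the table `CONTEXT`, the threshold `min_hits` and the function name are all hypothetical. A real tool would derive such context from links to other records or systems and would use far more intelligent analysis than counting words.

```python
import re

# Hypothetical context table: terms assumed to occur frequently in texts
# about a subject. In practice this could come from linked records.
CONTEXT = {
    "rdf": {"triples", "sparql", "ontology", "semantic", "uri", "predicate"},
}

def suggest_subjects(text, context=CONTEXT, min_hits=2):
    """Suggest subjects for which at least min_hits distinct context terms occur.

    The subject term itself is deliberately not counted, so a text can get
    the heading without containing it, and a text that merely contains the
    string gets no heading without supporting context.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(
        subject
        for subject, related in context.items()
        if len(words & related) >= min_hits
    )
```

With this table, `suggest_subjects("Querying triples with SPARQL over a shared ontology")` suggests “rdf” although the term is absent from the text, while `suggest_subjects("Collected letters of R.D.F. Jansen")` suggests nothing.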
The same principle should also apply to non-textual objects, so that images, audio, video etc. about the same subject can be found in one go. Google has some interesting implementations in this field already: image search by colour and content type: see for example the search for “rdf” in Google Images with colour “red” and content type “clip art”.
But of course a lot still needs to be done.