One single web page as the single identifier of every book, author or subject
I like the concept of “the web as a common publication platform for libraries” and “every book its own URL”, as described by Owen Stephens in two blog posts:
“It’s time to change library systems”
I’d suggest what we really need to think about is a common ‘publication’ platform – a way of all of our systems outputting records in a way that can then be easily accessed by a variety of search products – whether our own local ones, remote union ones, or even ones run by individual users. I’d go further and argue that platform already exists – it is the web!
and “The Future is Analogue”
If every book in your catalogue had its own URL – essentially its own address on your web, you would have, in a single step, enabled anyone in the world to add metadata to the book – without making any changes to the record in your catalogue.
This concept of identifying objects by URL (Uniform Resource Locator), or better still URI (Uniform Resource Identifier), is central to the Semantic Web, which uses RDF (Resource Description Framework) as its metadata model.
In fact, at ELAG 2008 I saw Jeroen Hoppenbrouwers (“Rethinking Subject Access”) explain his idea of doing the same for subject headings, using the Semantic Web concept of triples: every subject its own URL or web page. He said: “It is very easy. You can start doing this right away.”
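A minimal sketch of what such triples look like, in plain Python rather than a real RDF toolkit: each statement is a (subject, predicate, object) tuple, and the subject heading is identified by nothing more than its own URI. All `example.org` URIs and the predicate names `label`, `title` and `hasSubject` are hypothetical placeholders, not the vocabulary Hoppenbrouwers presented.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" would shadow the built-in name


# Hypothetical URIs: the subject heading has its own address on the web.
SATIRE = "http://example.org/subject/satire"
FOLLY = "http://example.org/work/praise-of-folly"

store: List[Triple] = [
    Triple(SATIRE, "label", "Satire"),
    Triple(FOLLY, "title", "Praise of Folly"),
    # Anyone can link a book to the subject URI without editing the subject.
    Triple(FOLLY, "hasSubject", SATIRE),
]


def objects(triples, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def linking_to(triples, uri):
    """All subjects of statements that point at the given URI."""
    return [t.subject for t in triples if t.obj == uri]


print(objects(store, FOLLY, "hasSubject"))  # ['http://example.org/subject/satire']
print(linking_to(store, SATIRE))            # ['http://example.org/work/praise-of-folly']
```

Because the book points at the subject’s URI, such a statement can be added by anyone without changing the subject record itself, just as Owen Stephens describes for books.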
To make the picture complete we only need the third essential component: every author his or her or its own URL!
Of course, this ideal situation would have to conform to Open Access guidelines: one single web page serving as the single identifier of every book, author or subject, available for everyone to link their own holdings, subscriptions, local keywords and circulation data to.
In real life we see a number of current initiatives on the web by commercial organisations and non-commercial groups, mainly in the area of “books” (or rather “publications”) and “authors”. “Subjects” apparently are a less appealing area in which to start something like this, because stand-alone “subjects” with nothing to link them to are nothing at all, whereas you always have “publications” and “authors”, even without “subjects”. The only subject project I know of is MACS (Multilingual Access to Subjects), which is hosted on Jeroen Hoppenbrouwers’ domain.
For publications we have OCLC’s WorldCat, LibraryThing and Open Library, to name just a few. And of course these global initiatives have had their regional and local counterparts for many years (union catalogues, consortium models). But this is again a typical example of multiple parallel data stores for the same type of entity. The idea apparently is to store everything in one single database that aims to be complete, instead of the ideal situation of single individual URIs floating around anywhere on the web.
Ex Libris’ new Unified Resource Management development (URM; and yes, the title of this blog post is an ironic allusion to that acronym) promotes the sharing of metadata, but it does so within yet another separate system into which metadata from other systems can be copied.
The same goes for authors. We have WorldCat Identities, VIAF, local authority schemes like DAI, etc. Again, we see parallel silos instead of free-floating entities.
Of course, the ideal picture sketched above is much too simple. We have to be sure which version of a publication, which author and which translation of a subject we are dealing with. For publications this means that we need to implement FRBR (in short: an original publication/work and all of its manifestations); for authors we need author name thesauri; for subjects, multilingual access.
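As an illustration of the FRBR idea, here is a sketch with Python dataclasses, assuming a simplified Work → Expression → Manifestation hierarchy (the fourth FRBR level, the Item or physical copy, is what a library would keep locally). The URIs, the edition label and the function `work_for_title` are hypothetical; the titles are those of the Erasmus example below. The point is that every title variant resolves to the same work URI.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Manifestation:
    uri: str
    label: str  # e.g. a particular edition


@dataclass
class Expression:
    uri: str
    language: str  # MARC / ISO 639-2 language code
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    uri: str
    creator_uri: str
    expressions: List[Expression] = field(default_factory=list)


def work_for_title(works, title) -> Optional[str]:
    """Resolve any title variant to the URI of the one work it belongs to."""
    wanted = title.casefold()
    for work in works:
        if any(e.title.casefold() == wanted for e in work.expressions):
            return work.uri
    return None


# Hypothetical URIs; titles taken from the Erasmus example in this post.
folly = Work("http://example.org/work/praise-of-folly",
             creator_uri="http://example.org/author/erasmus")
folly.expressions += [
    Expression("http://example.org/expr/folly-lat", "lat", "Laus Stultitiae"),
    Expression("http://example.org/expr/folly-dut", "dut", "Lof der Zotheid"),
    Expression("http://example.org/expr/folly-eng", "eng", "Praise of Folly",
               [Manifestation("http://example.org/manif/folly-eng-1",
                              "hypothetical English edition")]),
]

print(work_for_title([folly], "lof der zotheid"))  # http://example.org/work/praise-of-folly
```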
I have tried to illustrate this in this simplified and incomplete diagram:
In this model libraries can use their local URI-objects representing holdings and copies for their acquisitions and circulation management, while the bibliographic metadata stay out there in the global, open area. Libraries (and individuals of course) can also attach local keywords to the global metadata, which in turn can become available globally (“social tagging”).
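The division of labour in this model could be sketched as follows, assuming each local copy record stores only a pointer to a global manifestation URI plus local data such as loan status and keywords. The class, field and function names are hypothetical. `global_tags` shows one possible way local keywords might be pooled into global “social tagging”: each library counts once per tag per URI, however many copies it holds.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class LocalCopy:
    """A library's own record: local data plus a pointer to global metadata."""
    local_uri: str            # the library's URI for this physical copy
    manifestation_uri: str    # bibliographic metadata stays behind this global URI
    on_loan: bool = False     # circulation data stays local
    tags: Set[str] = field(default_factory=set)  # local keywords


def global_tags(libraries: Iterable[List[LocalCopy]]) -> Dict[str, Counter]:
    """Pool local keywords per global URI, one vote per library per tag."""
    pooled: Dict[str, Counter] = {}
    for copies in libraries:
        per_library: Dict[str, Set[str]] = {}
        for record in copies:
            per_library.setdefault(record.manifestation_uri, set()).update(record.tags)
        for uri, tags in per_library.items():
            pooled.setdefault(uri, Counter()).update(tags)
    return pooled


# Hypothetical data for two libraries holding the same edition.
EDITION = "http://example.org/manif/folly-eng-1"
library_a = [
    LocalCopy("http://library-a.example.org/copy/1", EDITION, tags={"satire"}),
    LocalCopy("http://library-a.example.org/copy/2", EDITION, on_loan=True,
              tags={"satire", "renaissance"}),
]
library_b = [LocalCopy("http://library-b.example.org/copy/1", EDITION, tags={"satire"})]

print(global_tags([library_a, library_b])[EDITION])
```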
The current initiatives have dealt with these issues with varying degrees of success. Some examples to illustrate this:
- Work: Desiderius Erasmus – Encomium Moriae (Greek), Laus Stultitiae (Latin), Lof der Zotheid (Dutch), Praise of Folly (English)
- Author: David Mitchell
- Erasmus in WorldCat Identities (one ID, many forms)
- David Mitchell in WorldCat Identities (one ID per author)
- David Mitchell in VIAF (one ID per author)
- Erasmus in Open Library (one ID, one incomplete form)
- Erasmus in VIAF (one ID; the preferred forms are Swedish, French and German, although Erasmus was from the Netherlands)
- Erasmus in LibraryThing (no identifier, numerous forms and occurrences)
- David Mitchell in LibraryThing (one form, “David Mitchell is composed of at least 12 distinct authors”, no way to distinguish them)
- Erasmus, “Praise of folly” in LibraryThing (numerous entries for all the different title variations)
- Erasmus, “Praise of folly” in Open Library (numerous entries for all the different title variations)
These findings seem to indicate that some level of coordination (which the commercial initiatives apparently have implemented better than the non-commercial ones) is necessary in order to achieve the goal of “one URI for each object”.
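One form such coordination could take is an authority file that maps every known name form to the URI(s) carrying it, roughly the “one ID, many forms” behaviour noted above for WorldCat Identities. This is a hypothetical sketch, not how WorldCat Identities or VIAF actually work: the match key simply strips accents, punctuation and case, and all URIs are placeholders. A lookup returning more than one URI signals exactly the David Mitchell problem: the name form alone cannot tell the authors apart.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Set


def match_key(name: str) -> str:
    """Crude matching key: drop accents, punctuation and case, squeeze spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = "".join(ch for ch in decomposed if ch.isalnum() or ch.isspace())
    return " ".join(kept.casefold().split())


class AuthorityFile:
    """Hypothetical authority file: many name forms, one URI per author."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uri: str, forms: Iterable[str]) -> None:
        for form in forms:
            self._index[match_key(form)].add(uri)

    def lookup(self, form: str) -> Set[str]:
        """All URIs carrying this form; more than one means it is ambiguous."""
        return set(self._index.get(match_key(form), set()))


ERASMUS = "http://example.org/author/erasmus"
authorities = AuthorityFile()
authorities.add(ERASMUS, ["Erasmus, Desiderius", "Desiderius Erasmus", "Érasme"])
# Two different authors who share exactly the same name form.
authorities.add("http://example.org/author/mitchell-1", ["Mitchell, David"])
authorities.add("http://example.org/author/mitchell-2", ["Mitchell, David"])

print(authorities.lookup("erasme"))                # {'http://example.org/author/erasmus'}
print(len(authorities.lookup("Mitchell, David")))  # 2
```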
Who wants to start?
7 thoughts on “UMR – Unified Metadata Resources”
It couldn’t have been said and presented better than what I read and see above! You might as well add Web3.0 as a tag to your article, too.
And if I may, I will use your diagram (with credit, of course) in something I want to present at ELAG this year.
Not long ago I wrote another blog post specifically about MACS: http://www.hoppie.nl/pub/node/89
I intend to create non-login permalinks on the LMI site shortly, allowing external web sites (or browsers) to fetch the relevant linking information for any authority number directly. As soon as the actual authorities (RAMEAU, LCSH, SWD…) formally publish static URLs for all their subjects (and some already do), these will be added as well. The result should be a linking resource that can easily be integrated into nearly anything.
The format of the XML or HTML behind the URL still needs to be decided. Simple RDF sounds okay, but SKOS is another possibility. Plus, of course, some human-readable stuff… plenty of options here.
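To give an idea of what the SKOS option might look like, here is a hypothetical sketch that serialises one MACS-style subject as SKOS in N-Triples: the concept gets a preferred label per language and links to the corresponding headings in other vocabularies. The choice of `skos:closeMatch` (rather than `skos:exactMatch`) for such links, the `example.org` URIs and the labels are assumptions for illustration; the SKOS and RDF namespace URIs are the standard ones.

```python
from typing import Dict, Iterable

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str, lang: str) -> str:
    """Language-tagged N-Triples literal; escapes only backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_ntriples(concept: str, labels: Dict[str, str], matches: Iterable[str]) -> str:
    """Serialise one subject as a SKOS concept in N-Triples."""
    lines = [f"<{concept}> <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, label in sorted(labels.items()):
        lines.append(f"<{concept}> <{SKOS}prefLabel> {literal(label, lang)} .")
    for uri in sorted(matches):
        lines.append(f"<{concept}> <{SKOS}closeMatch> <{uri}> .")
    return "\n".join(lines)


# Hypothetical concept, labels and links to three other vocabularies.
print(skos_ntriples(
    "http://example.org/macs/humanism",
    {"en": "Humanism", "fr": "Humanisme", "de": "Humanismus"},
    ["http://example.org/lcsh/humanism",
     "http://example.org/rameau/humanisme",
     "http://example.org/swd/humanismus"],
))
```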
Sorry if I waffle…red wine can do that!
When I heard some time ago that Ex Libris were to start using Oracle 10g, I did wonder whether this would be the catalyst for some kind of ‘grid’ initiative. By this I mean developing a system whereby institutions share data in a grid model rather than replicating it over and over again. Unfortunately, this hasn’t been the case (as yet), and we are still replicating data in every organization, with all the idiosyncrasies and erroneous entries this entails.
So the concept of a single authoritative source of bib info for every publication is very interesting and seems very logical. This system seems to have all the benefits of the grid model above, but also incorporates the concepts embodied by the semantic web. So what I believe you are saying with your diagram is that you are separating the bib record part of the LMS from the circulation and holdings part. The institution would control circulation and holdings info, but get its bib info from the cloud. It does seem like a logical model, but I have a couple of questions:
Who would be the author of the single web pages? The publisher? The vendor? The author? A consortium? A private enterprise? And who would ultimately be responsible for the integrity of the data? At one of the Q & A sessions at the JISC ‘Libraries of the Future’ conference (LOTF09) there was a discussion not too dissimilar to this. One of the presenters said that he would be extremely wary of handing over control of library data to an organisation (such as Google), as its agenda was different from the library’s agenda. My fear would be that whoever controlled the data would end up manipulating it for their own purposes.
The other issue is how the link is made between the holdings and circulation data, which must be held locally, and the bib data in the cloud. Is the idea that the local system would search for the bib information in the authoritative database out in the cloud whenever it is needed, or would this information be harvested on a regular basis, as in the Primo model? If the former, what would happen when the internet connection was down or an authoritative source was unknown or unreachable? And if the latter, could you see applications like Primo being developed to incorporate a system like this?
One last question…do you think that current (or even future…URM?) LMS systems could cope with this model? Or would libraries need to purchase/develop new systems?
Great post by the way
Andy, yes this idea is about separating bibliographic data from local transaction data.
It is still a very conceptual idea; your good points touch upon the practical implementation.
– Who would be the author of the web pages: well this could be anyone! Of course there would need to be some kind of authoritative control on different levels, but I can’t tell how this will turn out. I could think of international library cooperation, together with individual authors, publishers, etc.
– Link to global data: again, I guess this can be done in various ways. The whole idea is, of course, that global data (in the “cloud”) would save everyone from duplicating these data in local systems. An “Internet connection down” is already a risk for lots of systems that we currently use.
– Which systems: I have no idea. Libraries or vendors should enable their systems to link to URLs in order to use and present data from these URLs to their own staff and end users.
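One of those various ways of linking to global data can be sketched under assumptions: the local system fetches the global record live, keeps the last copy it received, and falls back to that copy (flagged as stale when older than `max_age`) when the source cannot be reached. `CachedResolver` and its parameters are hypothetical names; `fetch` is any callable that raises `OSError` when the source is unreachable (urllib’s `URLError` and the built-in `ConnectionError` are both subclasses of `OSError`).

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedResolver:
    """Fetch global records live; fall back to the last copy when offline."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = 24 * 3600):
        self.fetch = fetch        # raises OSError when the source is unreachable
        self.max_age = max_age    # seconds after which a cached copy is stale
        self.cache: Dict[str, Tuple[float, str]] = {}

    def get(self, uri: str, now: Optional[float] = None) -> Tuple[str, bool]:
        """Return (record, is_stale)."""
        now = time.time() if now is None else now
        try:
            record = self.fetch(uri)
        except OSError:
            if uri not in self.cache:
                raise  # nothing harvested earlier: pass the error on
            fetched_at, record = self.cache[uri]
            return record, now - fetched_at > self.max_age
        self.cache[uri] = (now, record)
        return record, False


# Demo with a fake source instead of a real network call.
state = {"online": True}


def fake_fetch(uri: str) -> str:
    if not state["online"]:
        raise ConnectionError("source unreachable")
    return "record for " + uri


resolver = CachedResolver(fake_fetch, max_age=60)
print(resolver.get("http://example.org/work/1", now=0))     # live copy, not stale
state["online"] = False
print(resolver.get("http://example.org/work/1", now=3600))  # cached copy, stale
```

The same class covers both of Andy’s options to some degree: with a reachable source it behaves like live lookup, and the cache acts as a minimal harvest.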
Nice post 🙂
One of the ideas I haven’t yet managed to get into a blog post is that I don’t really believe in a single unique web page per book/author/subject. I’m not clear from this post whether you are arguing we should be trying for this or not. I haven’t managed to get my own thoughts organised enough to do my own blog post – but this seems like a good opportunity to try out some of my thinking…
What I mean is that having several different URIs for David Mitchell is OK – what a library would have to do is decide which one(s) it wanted to use in its local representation. If VIAF presents David Mitchell well, then point to that. However, if there are better representations of other authors elsewhere, you can use alternative sources to link to for those authors. We can’t (and wouldn’t want to) stop anyone publishing a web page representing ‘David Mitchell’ in some way – what we need to do is start embracing this. Although this sounds like I’m promoting a chaotic approach (and to some extent I am!), the truth is that we would quickly see key URIs emerging – most libraries would choose to link to the same sources of information for a specific entity (work/author/etc.), giving them lots of inbound links, and so impacting on relevance ranking in Google etc.
Also remember that the web is a network of links – so there is nothing to stop LibraryThing linking to VIAF and VIAF linking to the Wikipedia entry (incidentally, I’d suggest that for many well-known authors the Wikipedia entry has more useful information than ‘library’-focussed pages). The type of analysis you have done here is interesting, as it starts to show the kind of thinking you might do when deciding which to link to – but my contention is that you don’t need it to be the same place each time.
Also, if you link to one URI for an author, and another library links to a different one, but you both link to the same URI for the related Works, then it would be possible to start inferring some kind of equivalence for the author URIs. You could even make an explicit link to say ‘these are the same entity’ if it was valuable (and again, the more people who did this, the more you could believe it).
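How key URIs might emerge can be made concrete with a small sketch that simply counts, per entity, how many libraries link to each candidate URI. It assumes, for simplicity, that libraries label the entity the same way locally (in practice establishing that is part of the problem). All URIs and names are hypothetical, and counting libraries is only a crude stand-in for the inbound-link signals a search engine would use.

```python
from collections import Counter
from typing import Dict, List, Tuple


def key_uris(choices: Dict[str, Dict[str, str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Rank candidate URIs per entity by the number of libraries linking to them.

    choices maps library -> {local entity label -> URI that library links to}.
    """
    votes: Dict[str, Counter] = {}
    for links in choices.values():
        for entity, uri in links.items():
            votes.setdefault(entity, Counter())[uri] += 1
    return {entity: counter.most_common() for entity, counter in votes.items()}


# Hypothetical choices made by three libraries.
VIAF_LIKE = "http://example.org/viaf/david-mitchell"
WIKI_LIKE = "http://example.org/wiki/David_Mitchell"
choices = {
    "library A": {"David Mitchell (novelist)": VIAF_LIKE},
    "library B": {"David Mitchell (novelist)": VIAF_LIKE},
    "library C": {"David Mitchell (novelist)": WIKI_LIKE},
}
print(key_uris(choices)["David Mitchell (novelist)"])
```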
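A naive sketch of that inference: collect (author URI, work URI) links from many libraries’ records and count, for each pair of different author URIs, how many works they are attached to in common. This is an assumption-laden illustration with hypothetical URIs and a hypothetical function name; it would also pair up genuine co-authors, so its output is a list of candidates to confirm (for instance with an explicit ‘same entity’ link), not a conclusion.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def equivalence_candidates(
    links: Iterable[Tuple[str, str]], min_shared: int = 2
) -> Dict[Tuple[str, str], int]:
    """Pairs of author URIs attached to at least `min_shared` common works.

    links: (author_uri, work_uri) pairs gathered from many libraries' records.
    Genuine co-authors also show up, so results are candidates only.
    """
    authors_per_work: Dict[str, set] = {}
    for author_uri, work_uri in links:
        authors_per_work.setdefault(work_uri, set()).add(author_uri)
    shared: Counter = Counter()
    for authors in authors_per_work.values():
        for pair in combinations(sorted(authors), 2):
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical: library A uses one author URI, library B another,
# but both link to the same two work URIs.
AUTHOR_A = "http://example.org/source-a/author/17"
AUTHOR_B = "http://example.org/source-b/author/42"
links = [
    (AUTHOR_A, "http://example.org/work/1"), (AUTHOR_A, "http://example.org/work/2"),
    (AUTHOR_B, "http://example.org/work/1"), (AUTHOR_B, "http://example.org/work/2"),
]
print(equivalence_candidates(links))
```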
In terms of putting together a searchable index – if we start using the web properly we can start using crawling techniques to build our indexes. Your local information will seed the crawler – i.e. tell it where to start crawling, and you can tell it how ‘deep’ to go on the web – if you are just interested in the URIs to link to directly from your local information, then you can tell it to ignore any further links.
You could also decide how far you go in terms of caching what you crawl. If you want resilience against Internet connectivity failure (as Andy suggests you might) you could cache everything and keep local copies (I’m not convinced you’d want to, but it is a possible approach).
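The crawling approach might be sketched as a breadth-first crawl seeded with the URIs from local records, with a depth limit and an optional cache. Here `get_links` is a hypothetical stand-in for fetching a page and extracting its links, and the mini-web is made up. `max_depth=0` corresponds to fetching only the URIs linked directly from local information; a filled cache lets a later crawl proceed without network access for pages already fetched (this sketch never refreshes cached pages).

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],
    max_depth: int = 1,
    cache: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Breadth-first crawl from the URIs found in local records.

    max_depth=0 visits only the seed URIs; each extra level follows one more
    hop of links. With a cache, pages already fetched are not fetched again.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque((uri, 0) for uri in seeds)
    visited = []
    while queue:
        uri, depth = queue.popleft()
        if cache is not None and uri in cache:
            links = cache[uri]
        else:
            links = get_links(uri)
            if cache is not None:
                cache[uri] = links
        visited.append(uri)
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited


# Hypothetical mini-web: authority page -> encyclopedia page -> elsewhere.
WEB = {
    "http://example.org/viaf/mitchell": ["http://example.org/wiki/mitchell"],
    "http://example.org/wiki/mitchell": ["http://example.org/elsewhere"],
}
seeds = ["http://example.org/viaf/mitchell"]  # URIs taken from local records
print(crawl(seeds, lambda uri: WEB.get(uri, []), max_depth=0))
print(crawl(seeds, lambda uri: WEB.get(uri, []), max_depth=1))
```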
Something you would need to accept is that the information you crawl may change and be updated – and that you don’t have control. This is probably the most difficult thing for libraries to deal with – and as you suggest perhaps makes the decision of who you link to for various bits of metadata a key question.
There are issues as well – what if a URI you linked to disappears? How would you know? What would you do? These are issues that need some further thought, but I’m convinced they are surmountable (although I’d say we have to be careful not to invent new library-specific things when tackling these issues – that feels a bit like OAI-PMH, which has not been well adopted outside the library/repository world). There are probably other issues that I haven’t mentioned/thought of – but at heart my argument is: the web works well, let’s start using it properly!
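On the question of how you would know that a linked URI has disappeared, one straightforward (and by no means the only) approach is periodic link checking with ordinary web tools. A sketch using only the Python standard library: `http_status` sends an HTTP HEAD request and `check_links` sorts URIs into buckets. Both function names and the bucket names are hypothetical; note that some servers reject HEAD requests, so a real checker might retry with GET.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional


def http_status(uri: str, timeout: float = 10) -> Optional[int]:
    """HTTP status code for a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as errors
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, ...
        return None


def check_links(
    uris: Iterable[str],
    status: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, List[str]]:
    """Sort linked URIs into ok / gone / other / unreachable buckets."""
    report: Dict[str, List[str]] = {"ok": [], "gone": [], "other": [], "unreachable": []}
    for uri in uris:
        code = status(uri)
        if code is None:
            report["unreachable"].append(uri)
        elif 200 <= code < 400:
            report["ok"].append(uri)
        elif code in (404, 410):  # Not Found / Gone
            report["gone"].append(uri)
        else:  # e.g. 405 from servers that refuse HEAD: check by hand
            report["other"].append(uri)
    return report
```

In practice `check_links` would be run on a schedule over a list of the URIs stored in local records, and records pointing at URIs in the `gone` bucket flagged for attention.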
I think all your comments have to do with one single issue: who has control over quality? My initial “ideal” picture was: one single point of definition for each object. That’s my “normalised datamodel designer’s” hangup, maybe.
This would require some kind of authority control, as I suggest. But the nature of the web is completely different, as you observe: chaotic. Still, the idea of emerging key URIs is not unrealistic. Would this constitute some kind of “authority of the masses”?
Would there be a role for international consortia (commercial and non-commercial organisations) in monitoring quality?
Anyway, I like your motto “the web works well, let’s start using it properly!”
As you say, emerging key URIs would be about “authority of the masses” – the more libraries that link to a record, the more you would accept that this was an ‘authoritative record’.
I argue in my ‘Future is Analogue’ post http://www.meanboyfriend.com/overdue_ideas/2009/02/the-future-is-analog.html that we need to think more about spectrums of ‘aboutness’, and I would say the same applies to the authority of a ‘quality’ cataloguing record: across libraries you won’t find a single answer to what the ‘right’ catalogue record is, but some will be used more than others.
If we did have linked bibliographic records in the way I describe I think we would actually find that libraries tended to use ‘authoritative’ sources to link to anyway – just in the same way that most libraries do copy cataloguing from a few, well known, sources. So, we might expect (for example) National Library catalogues to become a focus for large numbers of incoming links. We might well see consortia taking this role as well – or at least the idea of ‘trusted partners’ within consortia (i.e. we know that x catalogues to standards we are happy with and meet our needs)