Linking library and cultural heritage data
“Interested to publishing a test collection as linked open data to help @StichtingDEN with practical guide for heritage institutions?” That’s what Marco Streefkerk, my former colleague at the Library of the University of Amsterdam and now project manager at DEN (Digital Heritage Foundation The Netherlands), asked me in April 2010.
Was I interested? Of course I was. I had written a blog post “Linked data for libraries” almost a year before, and I had been very interested in the subject ever since. Unfortunately, in my day job at the Library of the University of Amsterdam (UBA) there was, until very recently, no opportunity to put my theoretical knowledge into practice. However, in the Library’s “Action plans 2010-2011” (January 2010), the Semantic Web is mentioned in the Innovation chapter as one of the areas with room for a small pilot involving linked data and RDF. I like to think it was me who managed to get it in there 😉
To come back to Marco’s question, I was at the time actually trying to think of a linked data/RDF test, and it so happened that I had talked to Ad Aerts of the Theater Institute of The Netherlands (TIN) about organising such a test the day before! So that’s what I told Marco. And things started from there.
The first idea was to publish a small test set of records from one of the University Library’s own heritage collections. From DEN’s point of view, the goal was to publish a short practical guide, targeted at heritage institutions, on how to publish heritage collections as linked data.
But after some email discussions and meetings we decided to incorporate TIN in this test and apply both sides of the linked data concept: publish linked data and use linked data.
Apart from a library catalogue, TIN also has a large database containing metadata on theater performances and a large collection of audiovisual material related to these performances. The plan was to publish the performance metadata and related digital material as linked data.
The UBA would then use this TIN linked data in its traditional MARC-based OPAC to enrich the plain bibliographic metadata whenever OPAC search results relate to theater plays.
We decided to name our little proof of concept project “Dutch Culture Link”. The people involved for DEN are Marco Streefkerk, Annelies van Nispen and Monika Lechner. For TIN it’s Ad Aerts. For UBA: Roxana Popistasu and myself. Of the other five, I already knew four face to face and one (Monika) through Twitter. I think this helps.
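To make the OPAC side a little more concrete: one first step would be turning the author heading of a bibliographic record into a lookup URI for the TIN service. The sketch below is a minimal illustration under stated assumptions: it takes an inverted heading such as might appear in a MARC 100 $a field, assumes the `<baseurl>/person/<name>` pattern with the name in direct, lowercase order, and uses a hypothetical base URL and hypothetical helper names that are not part of the UBA OPAC.

```python
from urllib.parse import quote

# ASSUMPTION: hypothetical placeholder; the project only refers to <baseurl>.
BASE_URL = "http://example.org/tin"


def heading_to_direct_order(heading: str) -> str:
    """Turn an inverted heading ("Vondel, Joost van den") into
    direct, lowercase order ("joost van den vondel")."""
    # Drop trailing MARC punctuation such as a final comma or period.
    cleaned = heading.strip().rstrip(",.")
    surname, _, forenames = cleaned.partition(",")
    parts = [forenames.strip(), surname.strip()]
    return " ".join(p for p in parts if p).lower()


def person_uri(heading: str) -> str:
    """Build a (hypothetical) TIN person URI from an OPAC author heading."""
    # Percent-encode spaces so the name is valid inside a URI path.
    return f"{BASE_URL}/person/{quote(heading_to_direct_order(heading))}"


print(person_uri("Vondel, Joost van den"))
# http://example.org/tin/person/joost%20van%20den%20vondel
```

Matching on names alone is fragile (namesakes, spelling variants), so a real implementation would probably prefer shared identifiers once they exist.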
To start with, we described the data model of the TIN Productions and Performances database (in terms of relationships or triples) as follows:
- a Play is written by one or more Persons (as author)
- a Play can be ‘effectuated’ in one or more Productions
- a Production can be ‘staged’ in one or more Performances
- a Performance takes place in one Venue on a specific date and time
- a Person can be producer of a Production
- a Person can be director of a Production
- a Person can play a character in a Production, or even in an individual Performance
Besides the metadata, TIN also has links from the database to digital collections (sound and video recordings, photographs, reviews). The model is strikingly similar to the bibliographic FRBR model: the Play is a FRBR Work, the Production is a FRBR Expression and/or Manifestation, and the Performance is a FRBR Item.
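The relationships listed above translate almost directly into triples. As a minimal sketch, the model can be written down as class-level (subject, predicate, object) triples, together with the rough FRBR correspondence, and used to check instance data. The predicate names and instance identifiers below are hypothetical placeholders; no vocabulary had been chosen yet, and the performance date and time (a literal value, not an entity) are left out.

```python
# ASSUMPTION: predicate names and instance identifiers are hypothetical.

# Class-level triples: (subject class, predicate, object class).
SCHEMA = {
    ("Play", "writtenBy", "Person"),
    ("Play", "effectuatedIn", "Production"),
    ("Production", "stagedIn", "Performance"),
    ("Performance", "takesPlaceIn", "Venue"),
    ("Production", "producedBy", "Person"),
    ("Production", "directedBy", "Person"),
    ("Person", "playsCharacterIn", "Production"),
    ("Person", "playsCharacterIn", "Performance"),
}

# Rough FRBR correspondence described in the text.
FRBR = {
    "Play": ("Work",),
    "Production": ("Expression", "Manifestation"),
    "Performance": ("Item",),
}


def conforms(triple, types):
    """True if an instance triple matches a relationship allowed by SCHEMA."""
    subject, predicate, obj = triple
    return (types.get(subject), predicate, types.get(obj)) in SCHEMA


def frbr_levels(entity, types):
    """FRBR entity type(s) for an instance, or () if there is no mapping."""
    return FRBR.get(types.get(entity), ())


# Hypothetical placeholder instances and their classes.
types = {
    "play/p1": "Play",
    "person/a1": "Person",
    "production/pr1": "Production",
    "performance/pf1": "Performance",
}
triples = [
    ("play/p1", "writtenBy", "person/a1"),
    ("play/p1", "effectuatedIn", "production/pr1"),
    ("production/pr1", "stagedIn", "performance/pf1"),
]

print(all(conforms(t, types) for t in triples))                 # True
print(conforms(("person/a1", "writtenBy", "play/p1"), types))   # False: wrong direction
print(frbr_levels("production/pr1", types))                     # ('Expression', 'Manifestation')
```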
Now we knew who and what, but not yet how. We needed to know how to actually apply the theoretical concepts of linked data to our subject area. Questions we had were:
- which ontology/vocabulary (‘data model’) do we need for publishing the production data?
- how do we format URIs (the unique identifiers of linked data)?
- how do we implement RDF?
- which publication techniques and platforms do we use?
- which scripting languages can we use?
- how do we find and get the published linked data?
- how do we process and present the retrieved linked data?
We definitely needed some practical hands-on tutorials or training. We could not find an institution in The Netherlands organising practical linked data training courses at short notice. Via Twitter, Ian Davis referred us to the Talis training options. Unfortunately, because we are only an informal proof of concept pilot project without any funding, we were unable to pursue this track.
However, through a contact at The European Library we managed to enter two members of our project team as participants in the free Linked Data Workshop at DANS in The Hague, with Herbert Van de Sompel, Ivan Herman and Antoine Isaac as trainers. This workshop proved to be very useful. Unfortunately I could not attend myself.
After the workshop we decided to adopt an “agile” approach: just start and proceed in small steps. In the short term, this meant the following on the TIN side: implementing a script that accesses the XML gateway of the Adlib system underlying the Theater Production Database and produces results in JSON format. The script accepts as input URIs of the form <baseurl>/person/<name>, <baseurl>/play/<person>/<title>, etc. For now only the <baseurl>/person/<name> pattern works, but more are to come.
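The script described above does two things: it maps an incoming URI path onto an entity type and its parameters, and it flattens an XML record into JSON. The sketch below illustrates both steps with the Python standard library. It is not the TIN script: the XML record layout is an assumption (the real Adlib gateway response may be structured quite differently), and the sample record simply reuses the field names and values from the example result shown further on.

```python
import json
import re
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# URI patterns relative to <baseurl>; only /person/<name> works so far.
ROUTES = {
    "person": re.compile(r"^/person/(?P<name>[^/]+)$"),
    "play": re.compile(r"^/play/(?P<person>[^/]+)/(?P<title>[^/]+)$"),
}


def route(path):
    """Return (entity type, decoded parameters) for a request path, or None."""
    for entity, pattern in ROUTES.items():
        match = pattern.match(path)
        if match:
            return entity, {k: unquote(v) for k, v in match.groupdict().items()}
    return None


def record_to_json(xml_text):
    """Flatten one XML record into a JSON object.
    ASSUMPTION: the gateway returns <record><field>value</field>...</record>."""
    record = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in record}
    return json.dumps(fields, ensure_ascii=False)


# Hypothetical stand-in for a gateway response.
SAMPLE = """<record>
  <key>vondel, joost van den</key>
  <name>Vondel, Joost van den</name>
  <birth.date>17 november 1587*</birth.date>
  <death.date>5 februari 1679*</death.date>
</record>"""

print(route("/person/joost%20van%20den%20vondel"))
# ('person', {'name': 'joost van den vondel'})
print(record_to_json(SAMPLE))
```

Unknown paths such as `/venue/...` return `None`, which a web front end would turn into a 404.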
An example: the request <baseurl>/person/joost van den vondel gives the JSON result:
"key": "vondel, joost van den",
"name": "Vondel, Joost van den",
"birth.date": "17 november 1587*",
"death.date": "5 februari 1679*",
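A JSON record like this is still plain data, not RDF. As an illustration of where the project could go, the sketch below turns the example record into N-Triples, using FOAF for the person class and name. Everything beyond `foaf:Person` and `foaf:name` is an assumption: the subject URI and the `TIN` vocabulary namespace for the dates are hypothetical, and the dates are kept as plain strings because the values (Dutch month names, trailing asterisk) are not in `xsd:date` form.

```python
import json

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TIN = "http://example.org/tin/vocab#"  # ASSUMPTION: hypothetical vocabulary namespace


def literal(value):
    """Serialize a string as an N-Triples plain literal (basic escaping)."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def person_to_ntriples(subject_uri, record):
    """Emit one N-Triples line per known field of a person record."""
    s = f"<{subject_uri}>"
    lines = [f"{s} <{RDF_TYPE}> <{FOAF}Person> ."]
    if "name" in record:
        lines.append(f"{s} <{FOAF}name> {literal(record['name'])} .")
    for field, prop in (("birth.date", "birthDate"), ("death.date", "deathDate")):
        if field in record:
            lines.append(f"{s} <{TIN}{prop}> {literal(record[field])} .")
    return "\n".join(lines)


record = json.loads(
    '{"key": "vondel, joost van den", "name": "Vondel, Joost van den", '
    '"birth.date": "17 november 1587*", "death.date": "5 februari 1679*"}'
)
# IRIs in N-Triples may not contain spaces, hence the percent-encoded name.
print(person_to_ntriples("http://example.org/tin/person/joost%20van%20den%20vondel", record))
```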
Next steps in this project:
- select/adapt/create a vocabulary for the Production/Performance subject area
- select/adapt/create vocabularies for Persons (FOAF?) and Subjects (SKOS?)
- add internal relationships with the other entities (Play, Production, etc.) in the JSON structure (implement RDF in JSON)
- add RDF/XML as an output option, besides JSON
- add external relationships (to other linked data sources like DBpedia, etc.)
- extend the number of possible URI formats (for Play, Production, etc.)
- add content negotiation to redirect requests to either human-readable or machine-readable representations
- extend the options on the OPAC side
- publish UBA bibliographic data as linked open data (probably an entirely new project)
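The content negotiation step could follow the common linked data practice of answering a request for a resource URI with a 303 See Other redirect to a format-specific document, chosen from the client’s `Accept` header. A minimal sketch, assuming three supported formats and a hypothetical `.html`/`.rdf`/`.json` suffix scheme for the document URIs; the q-value parsing is simplified (no media type parameters other than `q`).

```python
# Supported media types mapped to a hypothetical document suffix.
SUPPORTED = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/json": "json",
}


def parse_accept(header):
    """Return (media type, q) pairs sorted by descending q (stable for ties)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header, default="text/html"):
    """Pick the best supported media type; fall back to HTML for browsers."""
    for media, q in parse_accept(header or ""):
        if q <= 0:
            continue
        if media in SUPPORTED:
            return media
        if media == "*/*":
            return default
        if media.endswith("/*"):
            prefix = media[:-1]  # e.g. "application/"
            for candidate in SUPPORTED:
                if candidate.startswith(prefix):
                    return candidate
    return default


def see_other(resource_uri, accept):
    """Return (status, Location) for a 303 redirect to the chosen document."""
    media = negotiate(accept)
    return 303, f"{resource_uri}.{SUPPORTED[media]}"


print(see_other("http://example.org/tin/person/x", "application/rdf+xml"))
# (303, 'http://example.org/tin/person/x.rdf')
```

A browser sending `text/html` would be redirected to the human-readable page, while an RDF client asking for `application/rdf+xml` gets the machine-readable document for the same resource URI.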
To be continued…