Posted on June 19th, 2014
Lingering gold at ELAG 2014
Libraries tend to see themselves as intermediaries between information and the public, between creators and consumers of information. Looking back at the ELAG 2014 conference at the University of Bath, however, I can’t get the image out of my head of libraries standing in the way between information and its consumers. We’ve been talking about “inside out libraries”, “libraries everywhere”, “rethinking the library” and similar soundbites for some years now, but it looks like it has been only talk and nothing more. A number of speakers at ELAG 2014 reported that researchers, students and other potential library visitors wanted the library to get out of their way and give them direct access to all data, files and objects. A couple of quotes:
- “We hide great objects behind search forms” (Peter Mayr, “EuropeanaBot”)
- “Give us everything” (Ben O’Steen, “The Mechanical Curator”).
[Lingering gold: data, objects]
In a cynical way this observation fits this year’s conference theme “Lingering Gold” rather well. The theme refers to the valuable information and objects hidden and locked away somewhere in physical and virtual local stores, waiting to be dug up and put to use. In her keynote talk, Stella Wisdom, digital curator at the British Library, gave an extensive overview of the digital content available there, and of the tools and services employed to present it to the public. However, besides opportunities for success, there are all kinds of pitfalls in attempting to bring local content to the world. In our performance “The Lord of the Strings”, Karen Coyle, Rurik Greenall, Martin Malmsten, Anders Söderbäck and I tried to illustrate that in an allegorical way, resulting in a ROADMAP containing guidelines for bringing local gold to the world.
In recent years it has become quite clear that data, dispersed and locked away in countless systems and silos, can be a very valuable source of new information once liberated and connected. This was very pertinently demonstrated by Stina Johansson in her presentation on the visualization of research and related networks at Chalmers University, using available data from a number of their information systems. Similar network visualizations are available in the VIVO open source linked data based research information tool, which was the topic of a preconference bootcamp which I helped organize (many thanks especially to Violeta Ilik, Gabriel Birke and Ted Lawless, who did most of the work).
[Systems, apis, technology trap]
The point made here also implies that information systems actually function as roadblocks to full data access instead of as finding aids. I came to realize this some time ago, and my perception was definitely confirmed during ELAG 2014. In his lightning talk Rurik Greenall emphasized that what we do in libraries and other institutions is actually technology driven: systems define the way we work and what we publish, while it should be the other way around. Even APIs, intended to give access to data in systems without having to use end user system functions, are actually sub-systems, offering non-transparent views on the data. When Steve Meyer said in his talk “Building useful and usable web services” that “data is the API”, he was right in theory, yet in practice the reverse is not necessarily true. Also, APIs are meant to be used by developers in new systems. Non-technical end users have no use for them, as illustrated by one of the main general reactions from researchers to the British Library Labs surveys, as reported by Ben O’Steen: “API? What’s that? I don’t care. Just give me the files.”.
[Commercial vs open source]
This technology critique applies to commercial/proprietary and open source systems alike. However, it could be that open source environments are more favorable to open and findable data than proprietary ones. Felix Ostrowski talked about the reasons for and outcomes of the Regal project, which moved the electronic objects repository of the State Library of Rheinland-Pfalz from an environment based on commercial software to one based on open source tools and linked data concepts. One of the side effects of this move was that complaints were received from researchers about their output being publicly available on the web. This shows that the new approach worked, that the old approach was effectively hiding information, and that certain stakeholders were completely satisfied with that.
On the side: one of the open source components of the new Regal environment is Fedora, used only for digital objects, not for any metadata, which is exactly what is currently happening in the new repository project at the Library of the University of Amsterdam. A legitimate question asked by Felix: why use Fedora in this case, and not just the file system?
All these observations also imply that, if libraries really want to disseminate and share their lingering gold with the world, alternative ways of exposing content are needed, instead of or besides the existing ones. Fortunately some libraries and individuals have been working on providing better direct access and even unguided and unsolicited publication of data and objects that might be available but not really findable with traditional library search tools. The above mentioned EuropeanaBot (and other twitter bots) and the British Library Labs’ Mechanical Curator are a case in point. Every hour EuropeanaBot sends a tweet about a random digital object, enriching it with extra information from Wikipedia and other sources.
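The bot pattern itself is quite simple. A minimal sketch of the idea (the real EuropeanaBot talks to the Europeana, Wikipedia and Twitter APIs; the records, field names and function names below are invented purely for illustration):

```python
import random

# Stand-ins for records fetched from a digital collection API
# (titles and URLs are invented for illustration).
OBJECTS = [
    {"title": "Map of Bath, 1810", "url": "http://example.org/obj/1"},
    {"title": "Portrait of a scholar", "url": "http://example.org/obj/2"},
]

def compose_tweet(obj, extra=""):
    """Build a short announcement for one collection object."""
    text = f"Found in the collection: {obj['title']} {obj['url']} {extra}"
    return text.strip()[:140]  # keep within the classic tweet limit

def random_tweet():
    # The real bot runs a step like this once an hour,
    # enriching the object with Wikipedia data before posting.
    return compose_tweet(random.choice(OBJECTS))

print(random_tweet())
```

The point of the pattern is that no search form is involved: the collection pushes a random object to where people already are.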
In the case of the British Library Labs, Ben O’Steen described an experiment with free access to large amounts of data that by chance led to the observation that randomly excavated images from that vast amount of content drew people’s attention. As all content was in the public domain anyway, they asked themselves “what’s the harm in making it a bit more accessible?”. So the Mechanical Curator was born, with channels on tumblr, twitter and flickr.
Another alternative way to expose and share library content, a game, was presented by Ciaran Talbot and Kay Munro: LibraryGame. In brief, students are encouraged to use and visit the library and share library content with others by awarding them points and badges as members of an online community. The only two things students didn’t like about the name LibraryGame were “library” and “game”, so the name was changed to “BookedIn”.
Whether you like bots and games or not, the important message here is that it is worthwhile to explore alternative ways by which people can find the content that libraries consider so valuable.
In the end, it’s people that libraries work for. At Utrecht University Library they realised that they needed simpler ways to make it possible for people to use their content, not only APIs. Marina Muilwijk described how they are experimenting with the Lean Startup method. In a continuous cycle of building, measuring and learning, simple applications are released to end users in order to test if they use them and how they react to them.
“Focus on the user” was also the theme of the workshop given by Ken Chad around the Jobs-to-be-done methodology.
Interestingly, “How people find” rather than “How people search” was one of the perspectives of the Jisc “Spotlight on the Digital” project, presented by Owen Stephens in his lightning talk.
[Collections and findability]
Another perspective of that Jisc project was how to make collections discoverable. It turns out that collections as such are represented on the web quite well, whereas items in these collections aren’t.
Valentine Charles of The European Library demonstrated the benefits of collection level metadata for the discoverability of hidden content, using the CENDARI project as example.
What’s a library technology conference without linked data? Implicitly and explicitly, the instrument of connecting data from different sources relates quite well to most of the topics presented around the theme of lingering gold, with or without the application of the official linked data rules. Since I have already mentioned most cases, I will only go into a couple of specific sessions here.
Niklas Lindström and Lina Westerling presented the developments with the new linked data based cataloguing system for the Swedish LIBRIS union catalogue. This approach is not simply a matter of exposing and consuming linked data, but in essence the reconstruction of existing workflows using a completely new architecture.
The data management and integration platform d:swarm, a joint open source project of SLUB State and University Library Dresden and the commercial company AvantgardeLabs was presented in a lightning talk by Jan Polowinski. This tool aims at harvesting and normalising data from various existing systems and datastores into an intermediate platform that in turn can be used for all kinds of existing and new front end systems and services. The concept looks very useful for library environments with a multitude of legacy systems. Some time ago I visited the d:swarm team in Dresden together with a group of developers from the KOBV library consortium in Berlin, two of whom (Julia Goltz and Viktoria Schubert) presented their own new K2 portal solution for the data integration challenge in a lightning talk.
Linked data is all about unique identifiers on the web. The recently popular global identifier for researchers, ORCiD, the topic of one of the workshops at last year’s ELAG, was explained by Tom Demeranville. As it happened, right after the conference it became clear that ORCiD had implemented the Turtle linked data format.
The problem of matching string based personal names from various data sources without matching identifiers was tackled in the workshop “Linking Data with sameAs” which I attended. Jane and Adrian Stevenson of the ArchivesHub UK showed us hands-on how to use tools like LOD-Refine and Silk for reconciling string value data fields and producing “sameAs” relationships/triples to be used in your local triple store. They have had substantial experience with this challenge in their Linking Lives project. I found the workshop very useful. One of the take-aways was that matching string data is hard work.
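The core of such a reconciliation step can be sketched in a few lines. This is only a toy version of what tools like LOD-Refine and Silk do; the normalisation rules, names and URIs below are invented for illustration:

```python
import unicodedata

def normalise(name):
    """Crude name normalisation: strip accents, lowercase,
    and turn 'Last, First' into 'first last'."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    if "," in name:
        last, first = [p.strip() for p in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.lower().split())

def same_as_links(source_a, source_b):
    """Emit owl:sameAs triples for names that match after normalisation."""
    index = {normalise(n): uri for n, uri in source_b}
    triples = []
    for name, uri in source_a:
        match = index.get(normalise(name))
        if match:
            triples.append((uri, "owl:sameAs", match))
    return triples

a = [("Müller, Hans", "http://example.org/a/1")]
b = [("Hans Muller", "http://example.org/b/9")]
print(same_as_links(a, b))
# -> [('http://example.org/a/1', 'owl:sameAs', 'http://example.org/b/9')]
```

Real-world matching needs far more than this (fuzzy matching, dates, contextual evidence), which is exactly why it is hard work.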
Hard work also goes on in the caves and basements of the library world, as was demonstrated by Toke Eskildsen in his war stories of the Danish State Library with scanning companies, and by Eva Dahlbäck and Theodor Tolstoy in their account of using smartphones and RFID technology in fetching books from the stacks.
Once again I have to say that a number of unofficial sessions, at breakfast, dinner, in pubs and hotel bars, were much more informative than the official presentations. These open discussions in small groups, fostering free exchange of ideas without fear of embarrassment, while being triggered by the talks in the official programme, simply cannot take place within a tight conference schedule. Nevertheless, ELAG is a conference small and informal enough to attract people inclined to these extracurricular activities. I thank everybody who engaged in this. You know who you are. Or check Rurik Greenall’s conference report, which is a very structured yet personal account of the event.
Lots of thanks to the dedicated and very helpful local organisation team of the Library of the University of Bath, who have done a wonderful job doing something completely new to them: organising an international conference.
Posted on December 1st, 2013
Struggling towards usable linked data services at SWIB13
Paraphrasing some of the challenges proposed by keynote speaker Dorothea Salo, the unofficial theme of the SWIB13 conference in Hamburg might be described as “No more ontologies, we want out of the box linked data tools!”. This sounds like we are dealing with some serious confrontations in the linked open data world. Judging by Martin Malmsten’s LIBRIS battle cry “Linked data or die!” you might even think there’s an actual war going on.
Looking at the whole range of this year’s SWIB pre-conference workshops, plenary presentations and lightning talks, you may conclude that “linked data is a technology that is maturing”, as Rurik Greenall rightly states in his conference report. “But it has quite a way to go before we can say this stuff is ready to roll out in libraries”, as he continues. I completely agree with this. Personally I got the impression that we are in a paradoxical situation where, on the one hand, people speak of “we” and “community”, and, on the other, they take fundamentalist positions, unconditionally defending their own beliefs while slandering and ridiculing other options. In my view there are multiple, sometimes overlapping, sometimes irreconcilable “we’s” and “communities”. Sticking to your own point of view without willingness to reason with the other party really does not get “us” any further.
This all sounds a bit grim, but I again agree with Rurik Greenall when he says that he “enjoyed this conference immensely because of the people involved”. And of course on the whole the individual workshops and presentations were of a high quality.
Before proceeding to the positive aspects of the conference, let me first elaborate a bit on the opposing positions I observed during the conference, which I think we should try to overcome.
Developers disagree on a multitude of issues:
- Formats: developers hate MARC. Everybody seems to hate RDF/XML; JSON-LD seems to be the thing for RDF, but some say only Turtle should be used, or just JSON.
- Tools and languages: Perl users hate Java, Java users hate PHP, and there’s Python and Ruby bashing.
- Ontologies: create your own or reuse existing ones; upper ontologies, yes or no; no ontologies but usable tools.
- Operating systems: Windows/UNIX/Linux/Apple… it’s either/or.
- Open source vs. commercial software: need I say more?
- Beer: Belgians hate German beer, or any foreign beer for that matter.
(Not to mention PDF.)
OK, I hope I made myself clear. The point is that I have no problem at all with having diverse opinions, but I dislike it when people are convinced that their own opinion is the only right one and refuse to have a conversation with those who think otherwise, or even respect their choices in silence. The developer “community” definitely has quite a way to go.
Apart from these internal developer disagreements I noticed, there is the more fundamental gap between developers and users of linked open data. By “users” I do not mean “end users” in this case, but the intermediary deployers of systems. Let’s call them “libraries”.
Linked Data developers talk about tools and programming languages, metadata formats, open source, ontologies, technology stacks. Librarians want to offer useful services to their end users, right now. They may not always agree on what kind of services and what kind of end users, and they may have an opinion on metadata formats in systems, but their outlook is slightly different from the developers’ horizon. It’s all about expectations and expectation management. That is basically the point of Dorothea Salo’s keynote. Of course theoretical, scientific and technical papers and projects are needed to take linked data further, but libraries need linked data tools, focused on providing new services to their end users/customers in the world of the web, that can easily be implemented and maintained.
In this respect OCLC’s efforts to add linked data features to WorldCat are praiseworthy. OCLC’s Technology Evangelist Richard Wallis presented his view on the benefits of linked open data for libraries, using Google’s Knowledge Graph as an example. His talk was mainly aimed at a librarian audience. At SWIB, where the majority of attendees are developers or technology staff, this seemed somewhat misplaced. By chance I had been present at Richard’s talk at the Dutch National Information Professional annual meeting two weeks earlier, where he delivered almost the same presentation to a large room full of librarians. There and then it was completely on target. For the SWIB audience this may all have been old news, except for the heads-up about OCLC’s work on BIBFRAME-type linked data for FRBR “Works”, which will result in published URIs for Works in WorldCat.
An important point here is that OCLC is a company with many library customers worldwide, so developments like this benefit all of these libraries. The same applies to customers of one of the other big library system vendors, Ex Libris. They have been working on developing linked data features for their so called “next generation” tools for some time now, in close cooperation with the international user groups’ Linked Open Data Special Interest Working Group, as I explained in the lightning talk I gave. Open source library systems like Koha are also working on adding linked open data features to their tools. It’s with tools like these, which reach a large number of libraries, that linked open data for libraries can spread relatively quickly.
In contrast to this linked data broadcasting, the majority of the SWIB presentations showed local proprietary development or research projects, though mostly of high quality. In the cases where systems or tools were built, all the code and ontologies are available on GitHub, making them open source. However commendable, open source on GitHub doesn’t mean that these potentially ground-breaking systems and ontologies can and will be adopted as de facto standards in the wider library community. Most libraries, both public and academic, depend on commercial system and content providers and can’t afford large scale local system development. This also applies, up to a point, to libraries that deploy large open source tools like Koha, I presume.
It would be great if some of these many great open source projects could evolve into commonly used standard tools, like Koha, Fedora and Drupal, just to name a few. Vivo is another example of an open source project rapidly moving towards an accepted standard. It is a framework for connecting and publishing research information of different nature and origin, based on linked data concepts. At SWIB there was a pre-conference “VivoCamp”, organised by Lambert Heller, Valeria Pesce and myself. Research information is an area rapidly gaining importance in the academic world. The Library of the University of Amsterdam, where I work, is in the process of starting a Vivo pilot, in which I am involved. (Yes, the Library of the University of Amsterdam uses both commercial providers like OCLC and Ex Libris, and many open source tools.) The VivoCamp was a good opportunity for a practical introduction to and discussion about the framework, not least because of the presence of John Fereira of Cornell University, one of the driving forces behind Vivo. All 26 attendees expressed their interest in a follow-up.
Vivo, although it may be imperfect, represents the type of infrastructure that may be needed for large scale adoption of linked open data in libraries. PUB, the repository based linked data research information project at Bielefeld University presented by Vitali Peil, is aimed at exactly the same domain as Vivo, but it again is a locally developed system, using another smaller scale open source framework (LibreCat/Catmandu of Bielefeld, Ghent and Lund universities) and a number of different ontologies, of which Vivo is just one. My guess is that, although PUB/LibreCat might be superior, Vivo will become the de facto standard in linked data based research information systems.
Instead of focusing on systems, maybe the library linked data world would be better served by a common user-friendly metadata+services infrastructure. Of course, the web and the semantic web are supposed to be that infrastructure, but in reality we all move around and process metadata all the time, from one system and database to another, in order to be able to offer new legacy and linked data services. At SWIB there was mention of a number of tools for ETL, which is developer jargon for Extract, Transform, Load. By the way, jargon is a very good way to widen the gap between developers and libraries.
There were pre-conference workshops for the ETL tools Catmandu and Metafacture, and in a lightning talk SLUB Dresden, in collaboration with Avantgarde Labs, presented a new project focused on using ETL for a separate multi-purpose data management platform, serving as a unified layer between external data sources and services. This looks like a very interesting concept, similar to the ideas of a data services hub I described in an earlier post “(Discover AND deliver) OR else”. The ResourceSync project, presented by Simeon Warner, is trying to address the same issue by a different method, distributed synchronisation of web resources.
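In essence such a platform is a pipeline: records are extracted from heterogeneous sources, transformed into one internal model, and loaded into a store that front ends can query. A minimal sketch of the idea (the field names, mappings and target model below are invented for illustration; tools like Catmandu, Metafacture and d:swarm are of course far more sophisticated):

```python
# Extract: two "legacy" sources with different field conventions (invented).
marc_like = [{"245a": "Die Gesellschaft der Gesellschaft", "100a": "Luhmann, Niklas"}]
dc_like = [{"title": "Optical Media", "creator": "Kittler, Friedrich"}]

def transform(record, mapping):
    """Map source field names onto one internal model."""
    return {target: record[source] for source, target in mapping.items() if source in record}

def load(store, records):
    """Append normalised records to the central store."""
    store.extend(records)
    return store

store = []
load(store, [transform(r, {"245a": "title", "100a": "author"}) for r in marc_like])
load(store, [transform(r, {"title": "title", "creator": "author"}) for r in dc_like])
print(store)
```

The pay-off is that every new front end service only has to understand the one internal model, not every legacy format.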
One can say that the BIBFRAME project is also focused on data infrastructure, albeit at the moment limited to the internal library cataloguing workflow, aimed at replacing MARC. An overview of the current state of the project was presented by Lars Svensson of the German National Library.
The same can be said for the National Library of Sweden’s new LIBRIS linked data based cataloguing system, presented by Martin Malmsten (Decentralisation, Distribution, Disintegration – towards Linked Data as a First Class Citizen in Libraryland). The big difference is that they’re actually doing what BIBFRAME is trying to plan. The war cry “Linked data or die!” refers to the fact that it is better to start from scratch with a domain and format independent data infrastructure, like linked data, than to try and build linking around existing rigid formats like MARC. Martin Malmsten rightly stated that we should keep formats outside our systems, as is also the core statement of the MARC-MUST-DIE movement. Proprietary formats can be dynamically imported and exported at will, as was demonstrated by the “MARC” button in the LIBRIS user interface. New library linked data developments will have to coexist with the existing wider library metadata and systems environment for some time.
Like all other local projects, the LIBRIS source code and ontology descriptions are available on GitHub. In this case the mere scope of the National Library of Sweden and of the project makes it a bit more plausible that this may actually be reused on a larger scale. At least the library cataloguing ontology in JSON-LD there is worth having a look at.
To return to our starting point, the LIBRIS project acknowledges the fact that we need actual tools besides the ontologies. As Martin Malmsten quoted: “Trying to sell the idea of linked data without interfaces is like trying to sell a fax without the invention of paper”.
The central question in all this: what is the role of libraries in linked data? Developers or implementers, individually or in a community? There is obviously not one answer. Maybe we will know more at SWIB14. Paraphrasing Fabian Steeg and Pascal Christoph of hbz and Dorothea Salo, next year’s theme might be “Out of the box data knitting for great justice”.
Posted on November 11th, 2013
Using discovery tools for presenting integrated information
There has been a lot of discussion in recent years about library discovery tools. Basically, a library discovery tool provides a centrally maintained shared metadata index of scholarly material, a system for searching it, and an option for adding a local metadata index. Academic libraries use it to provide a unified access platform to subscribed and open access databases and ejournals as well as their own local print and digital holdings.
I would like to put forward that, despite their shortcomings, library discovery tools can also be used for finding and presenting other scholarly information in the broadest sense. Libraries should look beyond the narrow focus on limitations and turn imperfection into benefits.
The two main points of discussion regarding discovery tools are the coverage of the central shared index and relevance ranking. For a number of reasons of a practical, technical and competitive nature, none of the commercial central indexes cover all the content that academic libraries may subscribe to. Relevance ranking of search results depends on so many factors that it is a science in itself to satisfy each and every end user with their own specific background and context. Discovery tool vendors spend a lot of energy in improving coverage and relevance ranking.
These two problems are the reason that not many academic libraries have been able to achieve the one-stop unified scholarly information portals for their staff and students that discovery tool providers promised them. In most cases the institutional discovery portal is just one of the solutions for finding scholarly publications that are offered by the library. A number of libraries are reconsidering their attitude towards discovery tools, or have even decided to renounce these tools altogether and focus on delivery instead, leaving discovery to external parties like Google Scholar.
I fully support the idea that libraries should reconsider their attitude towards discovery tools, but I would like to stress that they should do so with a much broader perspective than just the traditional library responsibility of providing access to scholarly publications. Libraries must not throw the baby out with the bathwater. They should realise that a discovery tool can be used as a platform for presenting connected scholarly information, for instance publications with related research project information and research datasets, based on linked open data principles. You could call this the “poor person’s linked open data platform”, because the library has already paid the license fee for the discovery platform and does not have to spend a lot of extra money on additional linked open data tools and facilities.
Of course this presupposes a number of things: the content to be connected should have identifiers, preferably in the form of URIs, and should be openly available for reuse, preferably via RDF. The discovery tools should be able to process URIs and RDF and present the resolved content in their user interfaces. We all know that this is not the case yet. Long term strategies are needed.
Content providers must be convinced of the added value of adding identifiers and URIs to their metadata and providing RDF entry points. In the case of publishers of scholarly publications this means identifiers/URIs for the publications themselves, but also for authors, contributors, organisations, related research projects and datasets. A number of international associations and initiatives are already active in lobbying for these developments: OpenAIRE, Research Data Alliance, DataCite, the W3C Research Object for Scholarly Communication Community Group, etc. Universities themselves can contribute by adding URIs and RDF to their own institutional repositories and research information systems. Some universities are implementing special tools for providing integrated views on research information based on linked data, such as VIVO.
There are also many other interesting data sources that can be used to integrate information in discovery tools, for instance in the government and cultural heritage domain. Many institutions in these areas already provide linked open data entry points. And then there is WikiPedia with its linked open data interface DBpedia.
On the other side of the scale discovery tool providers must be convinced of the added value of providing procedures for resolving URIs and processing RDF in order to integrate information from internal and external data sources into new knowledge. I don’t know of any plans for implementing linked open data features in any of the main commercial or open source discovery tools, except for Ex Libris’ Primo. OCLC provides a linked data section for each WorldCat search result, but that is mainly focused on publishing their own bibliographic metadata in linked data format, using links to external subject and author authority files. This is a positive development, but it’s not consumption and reuse of external information in order to create new integrated knowledge beyond the bibliographic domain.
With the joint IGeLU/ELUNA Linked Open Data Special Interest Working Group the independent Ex Libris user groups have been communicating with Ex Libris strategy and technology management on the best ways to implement much needed linked open data features in their products. The Primo discovery tool (with the Primo Central shared metadata index) is one of the main platforms in focus. Ex Libris is very keen on getting actual use cases and scenarios in order to identify priorities in going forward. We have been providing these for some time now through publications, presentations at user group conferences, monthly calls and face to face meetings. Ex Libris is also exploring best practices for the technical infrastructure to be used and is planning pilots with selected customers.
The Austrian national library service OBVSG for instance has integrated WikiPedia/DBpedia information about authors in their Primo results.
The Saxon State and University Library Dresden (SLUB) has implemented a multilingual semantic search tool for subjects based on DBpedia in their Primo installation.
At the University of Amsterdam I have been experimenting myself with linking publications from our Institutional Repository (UvA DARE) in Primo with related research project information. This has for now resulted in adding extra external links to that information in the Dutch National Research portal NARCIS, because NARCIS doesn’t provide RDF yet. We are communicating with DANS, the NARCIS provider, about extending their linked open data features for this purpose.
Of course all these local implementations can serve as use cases for discovery tool providers.
So far I have only talked about the options for using discovery tools as a platform for consuming, reusing and presenting external linked open data, but I can imagine that a discovery tool could also be used as a platform for publishing linked open data. It shouldn’t be too hard to add extra RDF options besides the existing HTML and internal record output formats. That way libraries could have a full linked open data consumption and publishing workbench at their disposal at minimal cost. Library discovery tools would from then on be known as information discovery tools.
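Such an extra output format could start out as nothing more than serialising the fields a discovery record already has. A minimal sketch of the idea (the flat record structure and the use of Dublin Core terms are my assumptions for illustration, not any vendor’s actual API):

```python
def record_to_turtle(uri, record):
    """Serialise a flat metadata record as Turtle, using Dublin Core terms
    as predicates (one triple per field)."""
    prefix = "@prefix dct: <http://purl.org/dc/terms/> .\n"
    lines = [f'<{uri}> dct:{field} "{value}" .' for field, value in sorted(record.items())]
    return prefix + "\n".join(lines)

rec = {"title": "Lingering Gold", "creator": "Example, A."}
print(record_to_turtle("http://example.org/rec/42", rec))
```

A production version would of course need proper literal escaping, language tags and links (URIs rather than strings) for authors and subjects, but the step from “HTML output” to “RDF output” is conceptually this small.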
Posted on November 9th, 2013
Tools and methods for PhD research and writing
After I had made the decision to attempt a scholarly career and aim for a PhD (see my first post in this series), I started writing a one page proposal. When I finished I had a working title, a working subtitle, a table of contents with 6 chapter titles, and a very general and broad text about some contradictions between the state of technology and the integration of information. I can’t really go into any details, because the first thing you learn when you aim for a scholarly publication is to keep your subject to yourself as much as possible, in order to prevent someone else from hijacking it and getting there first.
I used Google Docs/Drive for writing this proposal. I have been using Google Docs for a couple of years now for all my writing, both personal and professional, because that way I have access to my documents wherever I go on any device I want, I can easily share and collaborate, and it keeps previous versions.
I shared my proposal with my advisor Frank Huysmans, using the Google Docs sharing options. We initially communicated about it through Twitter DMs, Google Docs comments and email. Frank’s first reaction was that it looked like a good starting point for further exploration, and he saw two related perspectives to explore: Friedrich Kittler’s media ontology and Niklas Luhmann’s systems theory. I had studied a book by Luhmann on the sociology of legal trials during my years at university, but he only became one of the major theoretical sociologists after that. I hadn’t heard of Kittler at all. He is a controversial major media and communications philosopher. Both are (or rather were) German.
Next step: I needed to get hold of literature, books and articles, both by and about Luhmann and Kittler, preferably in English, although I read German very well. Some background information about the scholars and their work would also be useful.
An important decision I had to make was whether I was going to use print, digital or both formats for my literature collection. I didn’t have to think long. I decided to try and get everything in digital format, either as online material, or downloadable PDFs, EPUBs etc. The main reason is that this way I can store the publications in an online storage facility like Dropbox, and have access to all my literature wherever I am, at work, at home and on the road (either on an ereader or my smartphone).
Frank, who is a Luhmann expert, gave me some pointers on publications by and about Luhmann. Kittler I had to find on my own. Of course, working for the Library of the University of Amsterdam and being responsible for our Primo discovery tool, I tried finding relevant Luhmann and Kittler publications there. I also tried Google and Google Scholar. I used my own new position as a library consumer to perform a basic comparison test between library discovery and Google. My initial conclusion was that I got better results using Google than our own Primo. This was in March 2013. But in the meantime both Ex Libris and the Library have made some important adjustments to the Primo indexing procedures and relevance ranking algorithm. Repeating similar searches in November 2013 produced much better results.
Anyway, I mostly ignored print publications. As a staff member of the University of Amsterdam I have access to all subscription online content, whether I find it through our own discovery tools or via Google. What I can’t find in our library discovery tools are ‘unofficial’ digital versions of print books. Here Google can help. For instance, I found a complete PDF version of Luhmann’s main work “Die Gesellschaft der Gesellschaft” (“Society of society”, in German). Frank was also kind enough to digitise or copy some chapters of relevant print books about Luhmann for me.
I discovered that I needed some sort of mechanism to categorise or catalogue my literature in such a way that it is available to me wherever I am, other than just putting files in a Dropbox folder. After some research I decided on Mendeley, which was originally presented as a reference manager, but is now more a collaboration, publication sharing and annotation environment. I am not using Mendeley as a reference manager for now, only for categorising digital local and online publications and making annotations attached to the publications.
An important feature that I need in research is access to my notes everywhere. With Mendeley I can attach notes to PDFs in the Mendeley PDF reader, and these are synchronised with my Mendeley installations on other workstations, provided the ‘synchronise files’ option is checked everywhere. Actually, Mendeley distinguishes between “notes” and “annotations”: notes are made and attached at the level of the publication, while annotations are linked to specific text sections in PDFs. Annotations don’t work with EPUB files, because there is no embedded Mendeley EPUB reader, nor with online URLs. I can add notes to specific EPUB text sections in my ereader, and these are synchronised between my ereader software instances. So I still need an independent annotation tool that can link my notes to all my research objects and synchronise them between my devices, independent of format or software.
A final note about digital formats. PDF is the norm in the digital dissemination of scholarly publications. PDFs are fine for reading on a PC and for attaching notes in Mendeley, but they’re horrible to read on my ereader. There I prefer EPUB. I would really like to have more standard download options to select from, at least PDF and EPUB.
Summarising, the functions I need for my PhD research and writing in this stage, and the tools and methods I am currently using:
- Find publications, information: library discovery tools, Google Scholar, the web
- Acquire digital copies of publications: access/authorisation; PDF, EPUB
- Store material independent of location and device: Dropbox, Mendeley
- Save references: Mendeley
- Notes/annotations: Mendeley, Kobo ereader software
- Write and share documents independent of location and device: Google Docs
- Communicate: Twitter, email, Google Docs comments
Posted on August 18th, 2013
My final career? Facing the challenge
This is the first in hopefully a series of posts about a new phase in my professional life, in which I will try to pursue a scholarly career with a PhD as my first goal. My intention with this series is to document in detail the steps and activities needed to reach that goal. In this introduction I will describe the circumstances and considerations that finally led to my decision to take the plunge.
First some personal background. I graduated in Sociology at the University of Amsterdam in 1987. I received the traditional Dutch academic title of “Doctorandus” (“Drs.”) which in the current international higher education Bachelor/Masters system is equivalent to a Master of Science (MSc). I specialised in sociology of organisations, labour and industrial relations, with minors in economics and social science “informatics”. I wrote my thesis “Arbeid onder druk” (“Labour under pressure”) on automation, the quality of labour and workers’ influence in the changing printing industry in the 20th century.
The job market for sociologists in the 1980s and 1990s was virtually non-existent, so I had to find an alternative career. One of the components of the social science informatics minor was computer programming. I learned to program in Pascal on a mainframe using character based monochrome terminals. I actually liked doing that, and I decided to join the PION IT Retraining programme for unemployed (or underemployed) academics organised in the 1980s by the Dutch State to overcome the growing shortage of IT professionals. After a test I was accepted, and in 1988 I successfully finished the course, during which I learned to program in COBOL. However, it took me another two years to finally find a job. From 1990 I worked as a systems designer and developer (programming in PL/1, Fortran and Java, among others) for a number of institutions in the area of higher education and scholarly information until 2002. Then I was sent away with a year’s salary from the prematurely born Institute for Scholarly Information Services NIWI, which was terminated in 2005. From its ruins the now successful Data Archiving and Networking Services institute (DANS) emerged.
It was then that I tried to take up a new academic career for the first time, and I enrolled in Cultural Studies at the Open University. I enjoyed that very much, I took a lot of courses and even managed to pass a number of exams.
By the end of that year, after a chance meeting with a former colleague in the tram on my way to take an exam, I found myself working at the Dutch Royal Library in The Hague as temporary project staff member for implementing the new federated search and OpenURL tools MetaLib and SFX from Ex Libris. This marked the start of my career in library technology. It was then that I first learned about bibliographic metadata formats, cataloguing rules and MARC madness.
I soon discovered that it’s very hard to combine work and studying, and because I had come to enjoy working for libraries, networking and exchanging knowledge, I quietly dropped out of the Open University.
After three years on temporary contracts I moved to the Library of the University of Amsterdam to do the same work as I did at the Royal Library. I got involved in the International Ex Libris User Group IGeLU and started trying to make a difference in library systems, working together with an enthusiastic bunch of people all over the world. I made it to head of department for a while, until an internal reorganisation gave me the opportunity to say goodbye to that world of meetings, bureaucracy and conflicts. Now I am Library Systems Coordinator, which doesn’t mean that I am coordinating library systems, by the way. My main responsibility at the moment is our Ex Libris Primo discovery tool. The most important task there is coordinating data and metadata streams. A lot of time, money and effort is spent on streamlining the movement of metadata between a large number of internal and external systems. For some years now I have been looking into, reading about, and writing and presenting on linked open data and metadata infrastructures, with a small pilot project now and then. But academic libraries are so slow in realising that they should stop thinking in terms of new systems for new challenges and invest in metadata and data infrastructures instead, that I am not overly enthusiastic about working in libraries anymore. About a year ago I suddenly realised that with Primo I was still doing exactly the same detailed configuration work and vendor communications that I was doing ten years earlier with MetaLib, and nothing had changed.
It was then that I said to myself: I need to do something new and challenging. If this doesn’t happen at work soon, then I have to find something satisfying to do besides that.
I can’t recall the exact moment anymore, but I think it happened when I was working on getting scholarly publications (dissertations among others) from our institutional repository into Primo. Somehow I thought: I can do that too! Write a dissertation, get a PhD! My topic could be something in the field of data, information and knowledge integration, from a sociological perspective.
So I started looking into the official PhD rules and regulations at the University of Amsterdam, to find out what options there are for people with a job to get a PhD. It turns out there are options, even with time/money compensation for University staff. But still not everything was clear to me. So I decided to ask Frank Huysmans, part-time Library Science professor at the University of Amsterdam, who is also active on Twitter and in the open data and public libraries movements, if he could help me and explain the options and the pros and cons of writing a dissertation. He agreed, and we met in a pub in Amsterdam to discuss my ideas over a couple of nice beers.
The good thing was that Frank thought I should be able to pull this off, looking at my background, work experience and writing. The bad thing was that he asked me: “Are you sure you don’t just want to write a nice popular science book?” Apparently scholarly writing is subject to a large number of rules and formats, and not meant to be a pleasant read.
An encouraging thing that Frank told me is that it is possible to compile a dissertation from a number of earlier published scholarly peer reviewed articles. Now this really appealed to me, because this means that I can attempt to write a scholarly article and try to get it published first, and get the feel of the art of scholarly research, writing and publication, in a relatively short period. This way I can leave the options open to leave it at that or to continue with the PhD procedure later. I agreed to write a short dissertation proposal and send it to Frank and to discuss that in a next meeting.
My decision was made. Although in the meantime a couple of interesting perspectives at work appeared on the horizon involving research information and linked data, I was going to try and start a scholarly career.
Next time: the first steps – reading, thinking, writing a draft proposal and how to keep track of everything.
Posted on August 9th, 2013
Connecting real books, metadata and people
I spent one day in the stacks of the off-site central storage facility of the Library of the University of Amsterdam as one of the volunteers helping the library perform a huge stock control operation that will take years. The goal of this project is to get a complete overview of the discrepancies between what’s on the shelves and what’s in the catalogue. We’re talking about 65 kilometres of shelves, approximately 2.5 million items, in this central storage facility alone.
To be honest, I volunteered mainly for personal reasons. I wanted to see what is going on behind the scenes with all these physical objects that my department is providing logistics, discovery and circulation systems for. Is it really true that cataloguing the height of a book (MARC 300$c) is only used for determining which shelf to put it on?
The practical details: I received a book cart with a pencil, a marker, a stack of orange sheets of paper with the text “Book absent on account of stock control” and a printed list of 1000 items from the catalogue that should be located in one stack. I was on my feet between 9 AM and 4 PM in a space of around 2–3 metres in one aisle between two stacks, with one hour in total of coffee and lunch breaks, in a huge building in the late 20th-century suburbs of Amsterdam without mobile phone or internet coverage. I must say, I don’t envy the people working there. I’m happy to sit behind my desk and PC in my own office in the city centre.
Most of my books were indeed in a specific size range, 18–22 cm approximately, with a couple of shorter ones. I found approximately 25-30 books on the shelves that were not on my list, and therefore not in the catalogue. I put these on my cart, replacing each with one of the orange sheets on which I pencilled the shelfmark of the book. There were approximately 5-10 books on my list missing from the shelves, which I marked on the list. One book had a shelfmark on the spine identical to that of the book next to it. Inside was a different code, which seemed to be the correct one (it was on the list, 10 places down). I also put 10 books on the cart because I thought the title on the list didn’t correctly match the title on the book, but this is a tricky call, as I will explain.
The title printed on the list was the “main title”, or MARC 245$a. It is very interesting to see how many differences there are between the ways that main and subtitles have been catalogued by different people through the ages. For instance, I had two editions on my list (1976 and 1980) of a German textbook on psychiatry, with almost identical titles and subtitles (title descriptions taken from the catalogue):
- Psychiatrie, Psychosomatik, Psychotherapie : Einführung in die Praktika nach der neuen Approbationsordnung für Ärzte mit 193 Prüfungsfragen : Vorbereitungstexte zur klinischen und Sozialpsychiatrie, psychiatrischen Epidemiologie, Psychosomatik,Psychotherapie und Gruppenarbeit
- Psychiatrie : Psychosomatik, Psychotherapie ; Einf. in d. Praktika nach d. neuen Approbationsordnung für Ärzte mit Schlüssel zum Gegenstandskatalog u.e. Sammlung von Fragen u. Antworten für systemat. lernende Leser ; Vorbereitungstexte zur klin. u. Sozialpsychiatrie, psychiatr. Epidemiologie, Psychosomatik, Psychotherapie u. Gruppenarbeit
The first book, from 1976 (which actually has ‘196’ instead of ‘193’ on the cover), is on the list and in the catalogue with the main title (MARC 245$a) “Psychiatrie, Psychosomatik, Psychotherapie :”.
The second book, from 1980, is on the list with the main title “Psychiatrie :”.
Evidently, it is not beyond doubt what should be catalogued as main title and subtitle just by looking at the cover and/or title page.
I have seen a lot of these cases in my batch of 1000 books in which it is questionable what constitutes the main and subtitle. Sometimes the main title consists of only the initial part, sometimes it consists of what looks like main and subtitle taken together. At first I put all parts of a serial on my cart because in my view the printed titles were incorrect. They only contained the title of the specific part of the serial, whereas in my non-librarian view the title should consist of the serial title + part title. On the other hand I also found serials for which only the serial title was entered as main title (5 items “Gesammelte Werke”, which means “Collected Works” in German). No consistency at all.
What became clear to me is that in a lot of cases it is impossible to identify a book by the catalogued main title alone.
Another example of problematic interpretation I came across: a Spanish book, with main title “Teatro de Jacinto Benavente” on my list, and on the cover the author name “Jacinto Benavente” and title “Teatro: Rosas de Otoño – Al Natural – Los Intereses Creados”. On the title page: “Teatro de Jacinto Benavente”.
In the catalogue there are two other books with plays by the same author, just titled “Teatro”. All three have “Jacinto Benavente” as author, and all three contain a number of theatre plays by him. There were a lot of similar books with “Theatre” (in a number of languages) recorded as the main title.
A lot of older books on my shelves (pre 20th century mainly, but also more recent ones) have different titles and subtitles on their spine, front and title page. Different variations depending on the available print space I guess. It’s hard to determine what the actual title and subtitles are. The title page is obviously the main source, but even then it looks difficult to me. Now I understand cataloguers a little better.
Works on the shelves
So much for the metadata. What about the actual works? There were all kinds of different types mixed with each other, mostly in batches apparently from the same collection. In my 1000 items there were theatre related books, both theoretical works and texts of plays, Russian and Bulgarian books of a communist/marxist/leninist nature, Arab language books of which I could not check the titles, some Swedish books, a large number of 19th century German language tourist guides for Italian regions and cities, medical, psychological and physics textbooks, old art history works, and a whole bunch of social science textbooks from the eighties of which we have at least half at our home (my wife and I both studied at the University of Amsterdam during that period ). I can honestly say that most of the textbooks in my section of the stacks are out of date and will never be used for teaching again. The rest was at least 100 years old. Most of these old books should be considered as cultural heritage and part of the Special Collections. I am not entirely sure that a university library should keep most of these works in the stacks.
Apart from this neutral economic perspective, there were also a number of very interesting discoveries from a book lover’s viewpoint, of which I will describe a few.
A small book about Japanese colour prints containing one very nice Hokusai print of Mount Fuji.
A handwritten, and therefore unique, item with the title (in Dutch) a “Bibliography of works about Michel Angelo, compiled by Mr. Jelle Hingst”, containing handwritten catalogue type cards with one entry each.
A case with one shelfmark containing two items: a printed description and what looks like a facsimile of an old illustrated manuscript.
An Italian book with illustrations of ornaments from the cathedral of Siena, tied together with two cords.
And my greatest discovery: an English catalogue of an exhibition at the Royal Academy from 1908: “Exhibition of works by the old masters and by deceased masters of the British school including a collection of water colours” (yes, this is one big main title).
But the book itself is not the discovery. It’s what is hidden inside the book. A handwritten folded sheet of paper, with letterhead “Hewell Grange. Bromsgrove.” (which is a 19th century country house, seat of the Earls of Plymouth, now a prison), dated Nov. 23. 192. Yes, there seems to be a digit missing there. Or is it “/92”? Which would not be logical in an exhibition catalogue from 1908. It definitely looks like a fountain pen was used. It also has some kind of diagonal stamp in the upper left corner “TELEGRAPH OFFICE FINSTALL”. Finstall is a village 3 km from Hewell Grange.
The paper also has a pencil sketch of a group of people, probably a copy of a painting. At first I thought it was a letter, but looking more closely it seems to be a personal impression and description of a painting. There are similar handwritings on the pages of the book itself.
I left the handwritten note where I found it. It’s still there. You can request the book for consultation and see for yourself.
Some lessons I took away:
- End users, patrons, customers or whatever you want to call them, can’t find books that the library owns if they are not catalogued. They can find bibliographic descriptions of the books elsewhere, but not the information needed to get a copy at their own institution. This confirms the assertion that holdings information is very important, especially in a library linked open data environment.
- The majority of books in an academic library are never requested, consulted or borrowed. Most outdated textbooks can be removed without any problem.
- There are a lot of cultural heritage treasures hidden in the stacks that should be made accessible to the general public and humanities researchers in a more convenient way.
- In the absence of open stacks and full text search for printed books and journals, it is crucial that the content of books, and articles too, is described in a concise yet complete way. Not only formal cataloguing rules and classification schemes should be used, but definitely also expert summaries and end user generated tags.
- Even with cataloguing rules it can be very hard for cataloguers to decide what the actual titles, subtitles and authors of a book are. The best sources for correct title metadata are obviously the authors, editors and publishers themselves.
- Book storage staff can’t find requested books with incorrect shelfmarks on the spine.
- Storing, locating, fetching and transporting books does not require librarian skills.
All in all, a very informative experience.
Posted on June 23rd, 2013
Lessons from Cycling for Libraries
As I am writing this, more than 100 people working in and for libraries from all over the world are cycling from Amsterdam to Brussels in the Cycling for Libraries 2013 event, defying heat, cold, wind and rain. And other cyclists ;-). Cycling for Libraries is an independent unconference on wheels that aims to promote and defend the role of libraries, mainly public libraries, in and for society. This year is the third time the trip has been organised. In 2011 (Copenhagen-Berlin) I was only able to attend the last two days in Berlin. In 2012 (Vilnius-Tallinn) I could not attend at all. This year I was honoured and pleased to be able to contribute to the organisation of, and to actively participate in, the first two days of the tour, in the area where I live (Haarlem) and work (Amsterdam).
I really like the Cycling for Libraries concept and the people involved, and I will tell you why, because it is not so obvious in my case. You may know that I am rather critical of libraries and their slowness in adapting to rapidly changing circumstances. And also that I am more involved with academic and research libraries than with public libraries. Moreover I have become a bit “conference tired” lately. There are so many library and related conferences where there is a lot of talk which doesn’t lead to any practical consequences.
The things I like about Cycling for Libraries are: the cycling, the passion, the open-mindedness, the camaraderie, the networking, the un-organisedness, the flexibility, and the determination and art of achieving its goals, and more.
I like cycling trips very much. I have done a few long and far away ones in my time, and I know that this is the best way to visit places you would never see otherwise. While cycling around you have unexpected meetings and experiences, you get a clear mind, and in the end there is the overwhelming satisfaction of having overcome all obstacles and having reached your goal. It is fun, although sometimes you wonder why on earth you thought you were up to this.
The organisers and participants of Cycling for Libraries are all passionate about and proud of their library and information profession, without being defensive and introverted. As you can see from their “homework assignments” they’re all working on innovative ways, on different levels and with varying scopes, to make sure the information profession stays relevant in the ever changing society where traditional libraries are more and more undervalued and threatened. So, the participants really try to make a difference and they’re willing to cycle around Europe in order to attract attention and spread the message.
Open minds are a necessity if you embark on an adventure that involves hundreds of people from different backgrounds. For collaborating both in decentralised organisation teams and in the core group of 120 people on the move, you need to appreciate and respect everybody’s ideas and contributions. Working towards one big overall goal requires giving and taking. Especially in the group of 120 people on wheels doing the hard work this leads to an intense form of camaraderie. These comrades on wheels are depending on each other to get where they’re going. A refreshing alternative for the everyday practice of competition and struggle between vendors, publishers and libraries.
The event offers an unparalleled opportunity for networking. With ordinary conferences the official programme offers a lot of interesting information and sometimes discussion, but let’s be honest: the most useful parts are the informal meetings during lunches, coffee breaks and, most importantly, in the pubs at night. It is then that relevant information is exchanged, new insights are born and valuable connections are made. Cycling for Libraries turns this model completely upside down and inside out. It is one long networking event with some official sessions in between.
Cycling for Libraries is an un-conference. As I have learned, this specific type on wheels depends on un-organisation and flexibility. It is impossible to organise an event like this following a strict and centralised coordination model where everybody has to agree on everything. For us Dutch people this can be uneasy. Historically we have depended on talking and agreeing on details in order to win the struggle against the water. The cyclists have ridden along the visible physical results of this struggle between Delft and Brugge, the dykes, dams and bridges of the Delta Works. The need to agree led to what became known as the Polder Model. On the other hand we also have a history of decentralised administration. “The Netherlands” is not a plural name for nothing, the country was formed out of a loose coalition of autonomous counties, cities and social groups, united against a common enemy.
Anyway, the organisation of Cycling for Libraries 2013 started in February with a meeting in The Hague with a number of representatives and volunteers from The Netherlands and Belgium (or Flanders I should say), when Jukka Pennanen, Mace Ojala and Tuomas Lipponen visited the area. After that the preparations were carried out by local volunteers and institutions without any official central coordination whatsoever. This worked quite well, with some challenges of course. I myself happened to end up coordinating events in the Amsterdam-Haarlem-Zandvoort area. I soon learned to take things as they came, delegate as much as possible and rely on time and chance. In the end almost everything worked out fine. In Amsterdam we planned the start event at OBA Central Public Library and the kick-off party at the University of Amsterdam Special Collections building. I only learned about the afternoon visit to KIT Royal Tropical Institute a couple of weeks before. I had nothing to do with that, but it turned out to be a successful part of day one. Actually, the day after KIT staff told the Cycling for Libraries participants that the museum and library faced closure, a solution was reached and both were saved. Coincidence?
The next day, the first actual cycling day between Amsterdam and The Hague, I cycled along for part of the route, from my home town Haarlem to halfway to The Hague. The visit to the Haarlem Station Library started an hour later than planned, and during lunch in Zandvoort on the coast we received word that the local public library was waiting for us to visit them. This was a surprise for me and Gert-Jan van Velzen (who helped plot the route from Amsterdam to The Hague). But we decided to go there anyway, and we were welcomed with free drinks and presents by the friendly librarians in their brand new building. At 4 o’clock we were expected to arrive at Noordwijk Public Library, but we were still in Zandvoort. No problem for Jeanine Deckers (airport librarian and regional librarian), who was waiting in Noordwijk with stroopwafels. Unfortunately I didn’t make it to Noordwijk, because I had to go back home and work the next day. But it was great to experience one day of actual cycling for libraries.
This loose, distributed and flexible organisation might be seen as an example of resilience, the concept that was introduced by Beate Rusch in her talk about the future of German regional library service centres at the recent ELAG 2013 conference in Ghent. Resilience means something like “the ability of someone or something to return to its original state after being subjected to a severe disturbance”, or simply put “something doesn’t break, but adapts under unexpected serious outside influences”. I completely agree that it would be better if organisations and infrastructures in the library and information profession were more loosely organised and connected. By the way, Beate was also involved in organising the Berlin part of the first Cycling for Libraries.
One final thing I want to say is that I admire the way in which Cycling for Libraries manages to reach its goals and more by means of this loose, distributed and flexible organisation. Depending on local coordination teams, they succeeded in meeting Dutch members of parliament in The Hague and the European Parliament in Brussels to promote their cause. Which is a remarkable result for a small group of crazy Finns.
Posted on June 10th, 2013
The inside-out library at ELAG 2013
This year marked my fifth ELAG conference since 2008 (I skipped 2009), which is not much if you take into account that ELAG2013 was the 37th one. I really enjoyed the 2013 conference, not in the least because of the wonderful people of the local organising committee at the Ghent University Library, who made ELAG2013 a very pleasant event. This year’s theme was “the inside-out library”, a concept coined by Lorcan Dempsey, which in brief emphasises the need for libraries to shift their focus 180 degrees.
In my personal overall conference experience the major emphasis was on research support in libraries. This was partly due to my attendance at the pre-conference Joint OpenAIRE/LIBER Workshop ‘Dealing with Data – what’s the role for the library?’ on May 28. It was good to have sessions focusing on different perspectives: data management, data publication, the researchers’ needs, library support and training. I was honoured to be invited to participate in the closing round table panel discussion together with two library directors, Wilma van Wezenbeek (TU Delft Library) and Wolfram Horstmann (Bodleian Library), under the excellent supervision of Kevin Ashley (DCC). An important central concept in the workshop was the research life cycle, which consists of many different tasks of a very diverse nature. Academic and research libraries should focus on those tasks for which they are, or can easily become, qualified.
Looking from another angle we can distinguish two main perspectives in integrating research: the research ecosystem itself, which can be seen as the main topic of the OpenAIRE/LIBER workshop, and the research content, the actual focus of researchers and research projects. I will try to address both perspectives here.
On the first day of the actual conference Herbert Van de Sompel gave the keynote speech with the title “A clean slate”. Rurik Greenall aptly describes the scope and meaning of Herbert’s argument. Herbert has been involved in a number of important and relevant projects in the domain of scholarly communication. My impression this time was: now he’s bringing it all together around the fairly new concept of the “research object”, integrating a number of projects and protocols, like ORE, Memento, OpenAnnotation, Provenance, ResourceSync. It’s all about connections between all components related to research on the web in all dimensions.
This linking of input, output, procedures and actors of research projects in various temporal and contextual dimensions in a machine readable way is extremely important in order to be able to process all relevant information by means of computer systems and present it to the human consumer. In this respect I think it is essential that data citations in scholarly articles should not only be made available in the article text, but also as machine readable metadata that can be indexed by external aggregators.
Moreover, it would be even better if it was possible to provide links to research projects that would serve as central hubs for linking to all associated entities, not only datasets. This is the role that the research object can fulfill. During the OpenAIRE/LIBER workshop I tried to address this issue a number of times, because I am a bit surprised that both researchers and publishers appear to be satisfied with text-only clickable dataset citations. The same is true in the other direction, for links to articles in dataset repositories like Dryad. I think there is a role here for information professionals and metadata experts in libraries. This is exactly the point that Peter van Boheemen made in his talk about producing better metadata for research output. Similarly Jing Wang stressed the importance of investigating the role of metadata specialists and data librarians for interoperability and authority control in her presentation on the open source linked data based research discovery tool Vivo.
Again there are two perspectives here. Even if we have machine readable metadata on research projects and datasets, most systems are not adequately equipped with functionality to process or present this information. It is not so easy to update complex systems with new functionality. Planned update cycles, including extensive testing, are necessary in order to adhere to the system’s design and architecture and to avoid breaking things. This equally applies to commercial, open source and home grown systems. Joachim Neubert’s presentation of the use of the open source CMS Drupal for linked data enhanced publishing for special collections illustrated this. Some very specialist custom extensions to the essentially quite flexible system were needed to make this a success. (On a different note, it was nice to see that Joachim used a simple triple diagram from my first library linked data blog post to illustrate the use of different types of predicates between similar subjects and objects.)
Anyway, a similar point can be made about systems and identifiers for people (authors, researchers, etc.). I participated in the workshop on ISNI, ORCID and VIAF: Examining the fundamentals and application of contributor identifiers led by Anila Angjeli and Thom Hickey, one of six ELAG workshops this year. Thom and Anila presented a very complete and detailed overview of the similarities and differences of these three identifier schemes. One of the discussion topics was the difference in adoption of these schemes by the community on the one hand and as machine readable metadata and their application in library systems on the other.
Here “resilience” comes into play, a concept introduced by Beate Rusch in her talk on the changing roles of the German regional library consortia and service centres in the world of cloud computing and SaaS. Rurik Greenall captures the essence of her talk when he says “… homogenous, generic solutions will not work in practice because they are at odds with how things are done …” and that “messy, imperfect systems… are smart and long lived”. Since Beate’s presentation the term “resilience” has popped up in a number of discussions with colleagues, during and after the conference, mainly in the sense that most systems, communities and infrastructures are NOT resilient. Resilience is a concept mainly used in psychology and physics, meaning the ability of someone or something to return to its original state after being subjected to a severe disturbance. Beate’s point is that we can adapt better to changing circumstances and needs in the world around us if we are less perfect and rigid than we usually are. In this sense I think resilience can also mean that a structure permanently changes instead of returning to its original state.
In the library world resilience can be applied to librarians, libraries, library infrastructure and library systems alike. In my view “resilience” might apply to the alternative architecture I have described in a recent blog post, where I argue that we should stop thinking systems and start thinking data. In order to be resilient we need an open, connected infrastructure, that is of the web (not on the web). The SCAPE infrastructure for processing large datasets for long term preservation, presented by Sven Schlarb, might fit this description.
A number of presentations focused on infrastructure and architecture. The new version of the Swedish union catalogue LIBRIS could be described as a resilient system. Martin Malmsten, Markus Sköld and Niklas Lindström showed their new linked open data based integrated library framework which was built from the ground up, from “a clean slate” so to speak. I can only echo Rurik’s verdict “With this, Libris really are showing the world how things are done”. This contrasts with the Library of Congress BibFrame development, which started out very promisingly but now seems to be evolving into an inward-looking, rigid New Marc. This was illustrated by Martin Malmsten when he revealed to us that Marc is undead, and by Becky Yoose, who wrote a very pertinent parable telling the tale of the resurrection of Marc.
Rurik Greenall described the direction taken at his own institution NTNU Library: getting rid of old legacy library and webpage formats and moving towards being part of the web, providing information for the web, being data driven. It’s a slow and uphill struggle, but better than the alternative. A clean slate again!
Dave Pattern presented a different approach in connecting data from a number of existing systems and databases by means of APIs, and combining these into a new and well received reading list service at the University of Huddersfield.
Back to research. In our presentation, or rather performance, Jane Stevenson and I tried to present the conflicting perspectives of collection managers and researchers in a theatrical way, showing parallel developments in the music industry. Afterwards we tried to analyse the different perspectives, argued that researchers need connected information of all types and from all sources and concluded that information professionals should try and learn to take the researcher’s perspective in order to avoid becoming irrelevant in that area.
The relationship between libraries and researchers was also the subject of the talk “Partners in research. Outside the library, inside the infrastructure“, by Sally Chambers and Saskia Scheltjens. Here the focus was on providing comprehensive infrastructures for research support, especially in the digital humanities. Central question: large top-down institutionalised structures, or bottom-up connected networks? Bottom line is: the researcher’s needs have to be met in the best possible way.
A very interesting example of an actual digital humanities research and teaching project in collaboration between researchers and the library is the Annotated Books Online project that was presented by Utrecht University staff. The collection of rare books is made available online in order to crowdsource the interpretation of handwritten annotations present in these books.
Besides research support there were presentations on other “inside out library” topics: publishing, teaching, data analysis and GLAM.
Anders Söderbäck presented the Stockholm University Press, a new publishing house for open access digital and print on demand books. I was pleasantly surprised that Anders included two quotes from my aforementioned blog post in his talk: “...in the near future we will see the end of the academic library as we know it” and “According to some people university libraries are very suitable and qualified to become scholarly publishers … I am not sure that this is actually the case. Publishing as it currently exists requires a number of specific skills that have nothing to do with librarian expertise“. But of course Anders’ most important achievement was winning the Library Automation Bingo by including all required terms in one slide in a coherent and meaningful way.
Merrilee Proffitt presented an overview of MOOCs and libraries, while Sarah Brown described the way that learning materials at the Open University in the UK are successfully connected and integrated in the linked data based STELLAR project. Looking at these developments, the question arises whether there are already efforts to arrive at a Teaching Object model, similar to the Research Object.
Andrew Nagy described the importance of analysing huge amounts of usage data in order to improve the usability and end user front end of the Summon discovery tool. Dan Chudnov presented the Social Media Manager prototype, used for collecting data from Twitter to be used in social science research.
Valentine Charles described the activities carried out by Europeana to contribute large amounts of digitised library heritage resources to Wikimedia Commons by means of the GLAMwiki toolset in order to improve visibility of these resources the Open Access way. The GLAMwiki toolset currently appears to offer a number of challenges for the interoperability and integration of metadata standards between the library and the Wikimedia world. Another plea for resilience.
Then there were the workshops. The combination of these parallel hands-on and engaging group activities and the plenary sessions makes ELAG a unique experience. Although I only participated in one, obviously, I have heard good reports from all other workshops. I would like to give a special mention to Ade and Jane Stevenson’s “Very Gentle Linked Data” workshop, where they managed to teach even non-tech people not only the basic principles of linked data, but also how to create their own triple store and query it with SPARQL.
Summarising: looking at the ELAG2013 presentations, are we ready for the inside out library? Sometimes we can start with a clean slate, but that is not always possible. Resilience seems to be a requirement if we want to cope with the dramatic changes we are facing. But you can’t simply decide to be resilient, either something is resilient or it isn’t. A clean slate might be the only option. In any case it seems obvious that connections are key. The information profession needs to invest in new connections on every level, creating new forms of knowledge, in order to stay relevant.
Posted on March 22nd, 2013 46 comments
The BeyondThePDF2 conference, organised by FORCE11, was held in Amsterdam, March 19-20. From the website: “...we aim to bring about a change in modern scholarly communications through the effective use of information technology”. Basically the conference participants discussed new models of content creation, content dissemination, content consumption, funding and research evaluation.
Because I work for an academic library in Amsterdam, dealing with online scholarly information systems and currently trying to connect traditional library information to related research information, I decided to attend.
Academic libraries are supposed to support university students, teaching and research staff by providing access to scholarly information. They should be somewhere in the middle between researchers, authors, publishers, content providers, students and teachers. Consequently, any big changes in the way that scholarly communication is carried out in the near and far future will definitely affect the role of academic libraries. For instance, if the scholarly publication model would change overnight from the current static document centered model to a dynamic linked data model, the academic library discovery and delivery systems infrastructure would grind to a halt.
So I was surprised to see that the library representation at the conference was so low compared to researchers, publishers, students and tech/tools people (thanks to Paul Groth for the opening slides). No Dutch university library directors were present. Maybe that’s because they all attended the Research Data Alliance launch in Gothenburg, which was held at the same time. I know of at least one Dutch university library director who was there. Maybe an official international association is more appealing to managers than an informal hands-on bunch like FORCE11.
A number of questions arise from this observation:
Are academic libraries talking to researchers?
Probably (or maybe even apparently) not enough. Besides traditional library services like providing access to publications and collections, academic libraries are more and more asked to provide support for the research process as such, research data management, preservation and reuse, scholarly output repositories and research information systems. In order to perform these new tasks in an efficient way for both the library and the researcher, they need to communicate about needs and solutions.
I took the opportunity and talked to a couple of scholars/researchers at BeyondThePDF2, asking among other things: “When looking for information relevant to your research topic, do you use (our) library search tools?” Answer: “No. Google.” or similar. Which brings me to the next question.
Do researchers know what academic libraries are doing?
Probably (or maybe even apparently) not enough. Same answer indeed. It struck me that of the few times libraries were mentioned in talks and presentations, it was almost always in the form of the old stereotype of the stack of books. Books? I always say: “Forget books, it’s about information!”. One of the presenters whose visionary talk I liked very much even told me that they hoped the new Amsterdam University Library Director would know something about books. That really left me speechless.
Fortunately the keynote speaker on the second day, Carol Tenopir, had lots of positive things to say about libraries. One remark was made (not sure who said it) that has been made before: “if libraries do their work properly, they are invisible”. This specifically referred to academic libraries’ role in selecting, acquiring, paying for and providing technical access to scholarly publications from publishers and other content providers.
Another illustration of this invisibility is the in itself great initiative that was started during the conference: “An open alternative to Google Scholar”, which could just as well have been called “An open alternative to Google Scholar, Primo Central, WorldCatLocal, Summon, EDS”. These last four are the best known commercial global scholarly metadata indexes that lots of academic libraries offer.
Anyway, my impression that academic libraries need to pay attention to their changing role in a changing environment was once again confirmed.
Publishers and researchers talk to each other!
(Yes I know that’s not a question). In the light of the recent war between open access advocates and commercial publishers it was good to see so many representatives of Elsevier, Springer etc. actively engaged in discussions with representatives of the scholarly community about new forms of content creation and dissemination. Some of the commercial content providers/aggregators are also vendors of the above mentioned Google Scholar alternatives (OCLC-WorldCatLocal, Proquest/SerialsSolutions-Summon, EBSCO-EDS). All of these are very reluctant to contribute their own metadata to their competitors’ indexes. Academic libraries are caught in the middle here. They pay lots of money for content that apparently they can only access through the provider’s own channels. And in this case the publishers/providers do not listen to the libraries.
Why so many tools/tech people?
Frankly I don’t know. However, I talked to a tools/tech person who worked for one of the publishers. So there obviously is some overlap in the attendee provenance information. Speaking about myself, working for a library, I am not a librarian, but rather a tools/tech person (with an academic degree even). Tools/tech people work for publishers, universities, university libraries and other types of organisations.
There is a lot of interesting innovative technical work being done in libraries by tools/tech people. We even have our own conferences and unconferences that have the same spirit as BeyondThePDF. If you want to talk to us, come for instance to ELAG2013 in Ghent in May, where the conference theme will be “The inside-out library”. Or have a look at Code4Lib, or the Library Linked Data movement.
Besides the good presentations, discussions and sessions, the most striking result of BeyondThePDF2 was the start of no less than three bottom-up revolutionary initiatives that draw immediate attention on the web:
- The Scholarly Revolution – Peter Murray-Rust
- The Open Alternative to Google Scholar – Stian Håklev
- The Amsterdam Manifesto on Data Citation Principles - Merce Crosas, Todd Carpenter, Jody Schneider
We can make it work.
Posted on January 7th, 2013 98 comments
The future of the academic library as a data services hub
Is there a future for libraries, or more specifically: is there a future for academic libraries? This has been the topic of lots of articles, blog posts, books and conferences. See for instance Aaron Tay’s recent post about his favourite “future of libraries” articles. But the question needs to be addressed over and over again, because libraries, and particularly academic libraries, continue to persevere in their belief that they will stay relevant in the future. I’m not so sure.
I will focus here on academic libraries. I work for one, the Library of the University of Amsterdam. Academic libraries in my view are completely different from public libraries in audience, content, funding and mission. As far as I’m concerned, they only have the name in common. For a vision on the future of public libraries, see Ed Summer’s excellent post “The inside out library”. As for research and special libraries, some of what I am about to say will apply to these libraries as well.
So, is there a future for academic libraries? Personally I think in the near future we will see the end of the academic library as we know it. Let’s start with looking at what are perceived to be the core functions of libraries: discovery and delivery, of books and articles.
For a complete overview of the current library ecosystem you should read Lorcan Dempsey’s excellent article “Thirteen Ways of Looking at Libraries, Discovery, and the Catalog: Scale, Workflow, Attention”.
“Discovery happens elsewhere”. Lorcan Dempsey said this as early as 2007. What this means is that the audience the library aims at primarily searches for and finds information via platforms other than the library’s website and search interfaces. Several studies (for instance OCLC’s “Perceptions of libraries, 2010“) show that the most popular platforms are general search engines like Google and Wikipedia, but also specific databases. And of course, if you’re looking for instant information, you don’t go to the library catalogue, because it only points you to items that you have to read in order to ascertain that they may or may not contain the information you need.
And if you are indeed looking for publications (books, articles, etc.) you could of course search your library’s catalogue and discovery interface. But you can find exactly the same and probably even more results elsewhere: in other libraries’ search interfaces, or aggregators that collect bibliographic metadata from all over the world. Moreover, academic libraries are doing their best to get their local holdings metadata in WorldCat and their journal holdings in Google Scholar. As I said in my EMTACL12 talk: you can’t find all you need with one local discovery tool.
Also, the traditional way of discovery through browsing the shelves is disappearing rapidly. The physical copies at the University of Amsterdam Library for instance are all stored in a storage facility in a suburb. Apart from some reference works and latest journal issues there is nothing to find in the library buildings. There is no need for a university library building for discovery purposes anymore.
Utrecht University Library has taken the logical next step: they decided not to acquire a new discovery tool, discontinue their local homegrown article search index and focus on delivery. See the article “Thinking the unthinkable: a library without a catalogue”.
So, if discovery is something that academic libraries should not invest in anymore, is delivery really the only core responsibility left? Let’s have a closer look.
Delivery in the traditional academic library sense means: giving the customer access to the publications he or she selected, both in print and digital form. In the case of subscription based e-journal articles, delivery consists of taking a subscription and leading the customer to the appropriate provider website to obtain the online article. Taking subscriptions is an administrative and financial activity. For historical reasons the university library has been taking care of this task. Because they handled the print subscriptions, they also started taking care of the digital versions. But actually it’s not the library that holds the subscription, it’s the university. And it really does not require librarian skills to handle subscriptions. This could very well be taken care of by the central university administration. For free and open access journals you don’t even need that.
The selection and procurement of journal packages from a large number of publishers and content providers is a different issue. Specific expertise is required for this. I will come to that later.
The task of leading the customer to the appropriate online copy is only a technical procedure, involving setting up link resolvers. Again, no librarian skills needed. This task could be done by some central university agency, maybe even using an external global linking registry.
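To illustrate how routine that technical procedure is: the heart of a link resolver request is just URL construction. The sketch below builds a minimal OpenURL (the NISO Z39.88-2004 standard, KEV format) query string; the base resolver address is a made-up placeholder, as every institution has its own.

```python
from urllib.parse import urlencode

# Hypothetical institutional link resolver base URL (placeholder)
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def build_openurl(doi: str, article_title: str) -> str:
    """Build a minimal OpenURL 1.0 (KEV) request for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # metadata format: journal
        "rft_id": f"info:doi/{doi}",                    # identifier of the article
        "rft.atitle": article_title,                    # article title
    }
    return RESOLVER_BASE + "?" + urlencode(params)

print(build_openurl("10.1000/example123", "A sample article"))
```

The resolver at the other end of such a URL looks up the institution’s subscriptions and redirects the user to the appropriate copy; none of this requires librarian skills.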
As for the delivery of physical print copies, this is obviously nothing more than a logistics workflow, no different from the delivery of furniture, tools, food, or any other physical goods. The item is ordered, it is fetched from the shelf, sometimes by huge industrial robot installations, put in a van or cart, transported to the desired location and put in the customer’s locker or something similar. Again: no librarian skills whatsoever. Physical delivery only needs a separate internal or external logistics unit.
So, if discovery and delivery will cease to be core activities of the central university library organisation, what else is there?
Selection of print and digital material was already mentioned. It is evident that the selection of printed and digital books and journal subscriptions needs to be governed by expert knowledge and decisions in order to provide staff and students with the best possible material, because there is a lot of money involved. Typically this task is carried out by subject specialists (also called subject librarians), not by generalists. These ‘faculty liaisons’ usually have had an education in the disciplines they are responsible for, and they work closely together with their customers (academic staff and students). Many universities have semiautonomous discipline oriented sublibraries. The recent development of Patron Driven Acquisition (PDA) also fits into this construction.
The actual comparison, selection and procurement of journal packages from a large number of publishers and content providers requires a certain generic expertise which is not discipline dependent. This is a task that could well continue to be the responsibility of some central organisational unit, which may or may not be called the university library.
And what about cataloguing, a definite librarian skill? If discovery happens elsewhere, and libraries don’t need to maintain their own local catalogues, then it seems obvious that libraries don’t need to catalogue anything anymore. In fact, in the current situation most libraries don’t catalogue that much already. All the main bibliographical metadata for books (title, author, date, etc.) are already provided by publishers, by external central library service centres, or by other libraries in a shared cataloguing environment. And libraries have never catalogued journal articles anyway, only journals and issues. Article metadata are provided by the publishers or aggregators. Libraries pay for these services.
It is usual for libraries to add their own subject headings and classification terms to the already existing ones. But as Karen Coyle said at EMTACL12: “Library classification is a knowledge prevention system“, because it offers only one specific object oriented view on the information world. So maybe libraries should stop doing this, which would be in line with the “discovery happens elsewhere” argument anyway.
What remains of cataloguing is adding local holdings, items and subscription information. This is very useful information for library customers, but again this doesn’t seem to require very detailed librarian skills. As a matter of fact most of these metadata are already provided in the selection and acquisition process by acquisition staff and vendors.
The recent Library of Congress BIBFRAME initiative developments in theory make it possible to replace all local cataloguing efforts by linking local holdings information to global metadata.
There is still one area that may require the full local cataloguing range: the university’s own scientific output, as long as it is not published in journals or as books. The fulltext material is made available through institutional repositories, which obviously requires metadata to make the publications findable. However, the majority of the institutional publications are made available through other channels as well, as mentioned, so the need for local cataloguing in these cases is absent.
More and more students are coming to the library buildings every day; that’s what you hear all the time. Large amounts of money are spent on creating new study centres and meeting places in existing library buildings, even on new buildings. But that’s exactly the point: students don’t come to the library for discovery anymore, because the building no longer provides that. They come for places to study, use networked PCs or the university wifi, meet with fellow students, pick up their print items on loan, or view not-for-loan material. The physical locations are nothing more or less than study centres. There’s absolutely nothing wrong with that, they are very important, but they do not have to be associated with the university library, and can be provided by the university, at any location.
The reference desk, or its online counterpart, is a weird phenomenon. It seems to emphasise the fact that if you want instant information, books are of no use. On the other hand, it suggests that you should come to the library if you need specific information right now. In my view, although the reference desk partly embodies the actual original objective of a library, namely giving access to information, this could function very well outside the library context.
The reference desk service is also somewhat ambiguous. In some cases subject specialist expertise is needed, other cases require a more general knowledge of how to search and find information.
Statistics of the use of library holdings, both print and electronic, are an important source of information for making decisions on acquisitions and subscriptions. These statistics are provided by local and remote delivery systems and vendors. Usage statistics can also be used for other purposes, like identifying certain trends in scholarly processes, mapping of information sources to specific user groups, etc. Administering and providing statistics once again is not a librarian task, but can be done by internal or external service providers.
Special Collections are a Special Case. Most university libraries have a Special Collections division, for historical reasons. But of course a Special Collections division is essentially a museum and archive division with specific skills, expertise and procedures. Most of the time they are autonomous units within the university anyway.
Now, if the traditional library tasks of selection, cataloguing, discovery and delivery will increasingly be carried out by non-librarian staff and units inside and outside the university, is there still a valid reason for maintaining an autonomous central university library organisation? Should academic libraries shift focus? There are a number of possible new services and responsibilities for the library that are being discussed or already being implemented.
Content curation can be seen as the task of bringing together information on a specific subject, of all kinds, from different sources on the web to be consumed by people in an easy way. This is something that can be done and is already done by all kinds of organisations and people. Libraries, academic, public and other types, can and should play a bigger role in this area. This involves looking at other units and sources of information than just the traditional library ones: books and journals. This new service type evidently is closely related to the traditional reference desk service.
Obviously this can best be taken care of by subject specialists. To do this, they need tools and infrastructure. These tools and infrastructure are of a generic nature and can be provided by technical specialists inside or outside the libraries or universities.
Techniques are often referred to as “mashups” or “linked data”, depending on the background of the people involved.
Linked data deserves its own section here, because it has been an ever-widening movement for a number of years. It finally reached the library world over the last couple of years with developments like the W3C Library Linked Data Incubator Group, the Library of Congress BIBFRAME initiative and the IFLA Semantic Web Special Interest Group. Linked data is a special type of data source mashup infrastructure. It requires the use of URIs for all separately usable data entities, and triples as the format for the actual linking (subject-predicate-object), mostly using the RDF structure.
There are two sides to linked data: the publishing of data in RDF and consequently the consumption of data elsewhere. A special case is the linked data based infrastructure, combining both publication and consumption in a specific way, as is the objective of the above mentioned BIBFRAME project.
Again, we need both subject specialists and generic technology experts to make this work in libraries, both academic and public ones.
University libraries are more and more expected to increase the level of support for researchers. It’s not only about providing access to scholarly publications anymore, but also about maintaining research information systems, virtual research environments, and long term preservation, availability and reusability of research data sets.
Again, here we see the need for discipline-specific support, because researchers’ needs for communication, collaboration and data vary greatly per discipline. And again, for the technical and organisational infrastructure we need internal or external generic technology experts and services. Apart from metadata expertise, no traditional librarian skills are required.
The Final Frontier: the library turning 180 degrees and switching from consumption to production of publications. According to some people, university libraries are very suitable and qualified to become scholarly publishers (see for instance Björn Brembs’ “Libraries Are Better Than Corporate Publishers Because…”). I am not sure that this is actually the case. Publishing as it currently exists requires a number of specific skills that have nothing to do with librarian expertise, and a number of universities already have dedicated university press publishing agencies. But of course the publishing process can and probably will change. There is the open access movement, there is the rebellion against large scientific publishers, and last but not least, there is the slow rise of nanopublications, which could revolutionise the form that scholarly publishing will take. In the future, publishing could originate at the source, making use of all kinds of new technologies for linking different types of data into new forms of non-static publications. Universities or university libraries could play a role here. Again we see the need for both subject specialists and generic technology.
Special and general
So what is the overall picture? Of the current academic library tasks, only a few may still be around in the university of the future: selection, acquisition, cataloguing (if any), the reference desk and usage statistics, and only a small part of these actually requires traditional librarian skills. Together with the new service areas of content curation, linked data, research support and publishing, this is a rather odd collection of very different fields of expertise. There does not seem to be a nicely matching set of tasks for one central university division, let alone a library.
But what all these areas have in common is that they depend on linking and coordination of data from different sources.
And another interesting conclusion is that virtually all of these areas have two distinct components:
- Discipline or subject specific expertise
- Generic technical and organisational data infrastructure
I see a new duality in the realm of information management in universities. Selection, content curation, reference desk, linking data, cataloguing and research support will all be the domain of subject specialists directly connected to departments responsible for teaching and research in specific disciplines. These discipline related services will depend on generic technological and organisational infrastructures, available inside and outside the university, maintained by generic technical specialists.
These generic infrastructures could function completely separately, or they could somehow be interlinked and coordinated by some central university organisational unit. This would make sense, because there is a lot of overlap in information between these areas. Some kind of central data coordination unit would make it possible to provide a lot more useful data services than can be imagined now. Also, usage statistics, acquisition and the potential new publishing framework, yes even the special collections, could benefit from a central data services unit.
Such a unit would be different from the existing university ICT department. The latter mainly provides generic hardware, network, storage and security, and is focused on the internal infrastructure, trying to keep out as much external traffic as possible.
The new unit would be targeted at providing data services, possibly built on top of the internal technical infrastructure, but mainly using existing external ones. And it is obvious that there is added value in cooperation with similar bodies outside the university.
“Data services” then stands for providing storage, use, reuse, creation and linking of internal and external metadata and datasets, by means of system administration, tool selection and implementation, and explicitly also programming when needed.
Such a unit would up to a point resemble current library service providers like the German regional library consortia and service centres such as hbz, KOBV or GBV, or high level organisations like the Dutch National Library Catalogue project.
Paraphrasing the conclusion of my own SWIB12 talk: it is time to stop thinking publications and start thinking data. This way the academic library could transform itself into a new central data services hub.
(Subject expertise AND data infrastructure) OR else!