During my vacation I saw this tweet by LIBER about topics to address, as suggested by the participants of the LIBER 2019 conference in Dublin:
It shows a word cloud (yes, a word cloud) containing a large number of terms. I list the ones I can read without zooming in (so the most suggested ones, I guess), more or less grouped thematically:
Linked open data
Fair (probably meaning FAIR)
The last topic (“Better food”) probably says something about the conference organisation, not about global health or climate change.
Although I was on holiday, I could not resist reacting on Twitter with “And for all that you need interoperability and a sound digital infrastructure.” My tweet got a small number of likes, retweets and replies, mostly from the more library-technology-inclined.
The reason for my tweet was that LIBER2019 was another example of a library conference focusing on global ideals and objectives without paying any attention to the means that are needed to actually achieve those. Let’s have a look at the grouped suggested topics. We see “Openness”, “Scholarship/research”, “Data/digital objects curation”, “Skills” and more general “Management” buzzwords. Ignoring “Better food”, we have two acronyms left, of which “AI” (Artificial Intelligence, I presume) could be grouped with Scholarship/research. That leaves only “IIIF” as a specific technical/infrastructural topic that could serve as a means to achieve the objectives outlined in the other topic groups.
Now, it is understandable that LIBER conference participants mention these topics, because LIBER is the Association of European Research Libraries. But what to my mind is not understandable is that these Research Libraries conference participants do not talk about the practical issues involved in achieving those goals. Even more so because the first line in LIBER’s mission is “Provide an information infrastructure to enable research in LIBER Institutions to be world class”.
To be clear: by “digital infrastructure” I don’t mean the hardware layer underlying all digital communication (servers, workstations, cables, routers, etc.), but the layers on top of that (systems, databases, data and record formats, digital object formats, identifiers, communication protocols, data flows, APIs, export and import tools, etc.).
I have never attended a LIBER conference myself, so I can’t say anything about the nature of the event from personal experience, but people who have attended tell me that the conference has been mainly targeted at library management. Looking at the LIBER2019 programme however, there are a small number of presentations that look like they may have been of a more practical or technical nature.
Anyway, having been to many library and library related conferences and events over the years I think I can safely say that most “general” library conferences focus more on missions and objectives, ignoring the practical and technical conditions and requirements that are essential to achieve just those. And of course the more “technical” library conferences tend to do the opposite, ignoring organisational, social and financial conditions. We really need conferences that take into account both sides.
The fact remains, however, that a sound digital infrastructure, both internally within the individual institutions and externally between institutions, is essential. And I prefer going to the more practical events, because I’m a bit allergic to events where people say “We should do [fill in whatever you think we should be doing as libraries]”, “We are so great”, “We are so inspired”.
As a follow-up to my original tweet, in a reply to Christina Harlow and Rurik Greenall, I said: “Let’s propose a presentation for liber2020 about teaching the librarians how infrastructure is essential for their word cloud ideas to get real.” Christina replied “I’m on board”, to which Saskia Scheltjens, of Rijksmuseum Amsterdam, reacted “I’ll hold you all to that, you know 😉”. But someone else (Timo Borst) warned me: “LIBER is not the right place to address those challenges. It is rather a club of feelgood-librarians, reinforcing themselves in what they are doing and always have done. There are many skilled people engaged at #LIBER, but no common ground and agenda for tackling infra issues.” This more or less confirms my experiences with library conferences.
I am not sure what will happen next year at LIBER2020, but let’s take this subject a bit further and move away from conferences to the actual institutions (libraries, archives, museums). After working in libraries for 16 years now (of which the last 13 years for the Library of the University of Amsterdam) it is my experience that “digital infrastructure” is not a topic that has been the subject of much attention from the people who decide about funding, resources and policies in libraries and other heritage institutions.
Since I started working for the Library of the University of Amsterdam in 2006, in the Digital Services/Systems department, I have been trying to get the library to focus more on the underlying infrastructure instead of only on end user services, individual dedicated systems and data formats, without success. Time and again decisions were made to either replace one proprietary system with another, solve a problem with a new system, or create a new database with metadata copied from another one, thereby expanding an already huge, unmanageable landscape of data formats, systems and user interfaces without possibilities to actually innovate. Even in 2018, in a meeting about establishing the new strategic policy plan for the Library, one of the attending management team members said “Infrastructure is a difficult word”.
That changed only recently, when my colleague Saskia Woutersen-Windhouwer (who now works for Leiden University Library) and I managed to get a memo about “Open Collections” accepted, in which we argued that the Library should adopt “FAIR principles for collections”. An adapted English version of that memo is available online as an article in Code4Lib Journal, Issue 40, 2018. “FAIR” stands for Findable, Accessible, Interoperable, Reusable, and the original FAIR principles are targeted at scholarly output, in particular research data sets. We adapted these original principles to apply to heritage collections, distinguishing FAIR principles for Objects, Metadata and Metadata Records.
In the official advice following the original memo we also distinguished three additional aspects of FAIR principles, making clear that infrastructure is not only technical: Licensing, Infrastructure, Data Quality (LID). Obviously there is a certain overlap between these three aspects: for instance a licence must also be entered in the data and must be machine-readable. Besides that we also stressed the need for organisational change. The way that workflows are organised is part of the infrastructure. Departments have always been focused on traditional activities that were separated, such as metadata, systems, user services. A more integrated approach is needed.
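To make the overlap between Licensing and Data Quality a bit more concrete, here is a minimal Python sketch of what “machine-readable” could mean in practice: the licence is stored in the record as a resolvable URI rather than as free text. The record layout, the field name `license` and the function name are illustrative assumptions, not the Library’s actual metadata schema or tooling.

```python
from urllib.parse import urlparse

# Hypothetical record layout: the field names are illustrative assumptions,
# not an actual metadata schema used by the Library.
record = {
    "id": "map-0001",
    "title": "Example map",
    # Public Domain Mark 1.0, expressed as a URI instead of free text
    "license": "https://creativecommons.org/publicdomain/mark/1.0/",
}


def has_machine_readable_license(rec: dict) -> bool:
    """Return True if the record's licence is an absolute http(s) URI."""
    value = rec.get("license", "")
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    # A free-text statement has no scheme or host, so it fails this check.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    print(has_machine_readable_license(record))
    print(has_machine_readable_license({"license": "Free to use, ask the library"}))
```

A check like this only tests the form of the value, not whether the URI points to a known licence; a real data quality rule would probably compare it against an agreed list of licence URIs.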
To make a longer story short, in the Library’s new Strategy Plan for 2019-2022 a “coherent and future proof digital infrastructure” is presented as an essential precondition for all other strategic objectives (Open Collections, Open Science and Education, Open Campus, Open Knowledge). And from this year on I will be coordinating the planning and projects to realise this new streamlined digital infrastructure, together with a specially assembled core team of representative library employees with required expertise from various departments.
Given my earlier remarks about heritage institutions and infrastructure, I have the impression that the challenges we are facing are not unique to our situation. Maybe other institutions can benefit from the approach described here, while at the same time I hope we can benefit from other institutions’ experiences.
In our planning we distinguish between ongoing, structural activities that can already be executed now, and short term projects that will implement clearly described goals and also lead to ongoing, structural workflows.
The currently ongoing activities are:
- Monitoring and advising on infrastructural aspects of new projects
- Maintaining a structured dynamic overview of the current systems and dataflow environment
- Communication about the principles, objectives and status of the programme
For the short-term projects we determined dependencies, drew up a schedule and assigned core project teams that can be extended with internal and external experts as needed. We also chose a defined and limited use case as a core pilot to focus on and to use as a test bed before wider implementation of the results. This pilot consists of a set of over 300 old maps in a unique 19th-century “collector’s atlas” in the possession of the Library. A set of high-resolution digitised images of the maps is available; they are catalogued but not yet presented directly on a library website.
The project topics (with very brief descriptions) are:
- Establish and implement default and dedicated licences for objects and metadata
- Object PIDs
  - Decide on and implement PID schemes to be used for physical and digital objects
- Controlled Vocabularies
  - Decide on and implement authority schemes using PIDs for people, subjects etc.
- Metadata Set
  - Decide on and implement the standard minimum required metadata for various use cases, based on data quality guidelines
- ETL Hub
  - Implement a uniform central Extract Transform Load platform for streamlining data flows and data conversions
- Object Platform
  - Decide on and implement a platform for storing, distributing and preserving digital objects
- Digital Objects/IIIF
  - Decide on and implement formats, types, resolution for digital objects, focusing on IIIF
- Digitisation Workflow
  - Implement workflow for digitising physical objects
- Implement methods, protocols and platforms for accessing and reusing objects and metadata
- Data Enrichment
  - Investigate and implement methods of enriching metadata through text and data mining etc.
- Linked Data
  - Investigate and implement methods of publishing linked data
- Investigate options of Alma (which is our new main back-office platform) as central data and object hub
- Especially for digital maps: investigate and implement georeferencing options and use cases
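To give an idea of what “focusing on IIIF” in the Digital Objects/IIIF topic means at the URL level, here is a small Python sketch that builds request URLs following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server address `https://iiif.example.org/image` and the identifier `atlas/map-001` are made-up placeholders, not an existing service of the Library, and the default parameter values assume Image API 3.0.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    The identifier is percent-encoded in full, so a slash inside it
    cannot be mistaken for a path separator.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://iiif.example.org/image"  # hypothetical server
    # Full image at the maximum size the server offers
    print(iiif_image_url(base, "atlas/map-001"))
    # Same image scaled to a width of 300 pixels, e.g. for a thumbnail
    print(iiif_image_url(base, "atlas/map-001", size="300,"))
    print(iiif_info_url(base, "atlas/map-001"))
```

The point is that any IIIF-compliant viewer can request the same map image in different regions and sizes from one source file, which is the kind of reuse the Object Platform and access topics depend on.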
For some of these topics individual pilots and projects are already planned or have been carried out. The idea is to connect and integrate these existing plans and projects in order to avoid redundant work and conflicting results.
There is a natural dependency scheme between the project topics. For instance, licensing, PIDs, protocols, controlled vocabularies and a good metadata set are required before you can actually publish your data for access and reuse. The same applies to Linked Open Data. To publish objects for reuse you need to have the formats, platform, protocols and licensing sorted out.

We can’t find out everything by ourselves, obviously. We will gladly use experiences from other institutions. We will contact you soon. And if you have any valuable advice to give, don’t hesitate to contact me.
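As a rough illustration of the dependency scheme between the project topics mentioned above, the following Python sketch derives one possible order of work with a topological sort. The graph is my simplified reading of the examples given in the text, not the actual project plan. The labels “Licensing”, “Access/Reuse (data)” and “Access/Reuse (objects)” are names I made up for this sketch, and “protocols” is left out because it is treated here as part of the access topic itself. It uses `graphlib` from the standard library, so it assumes Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each topic maps to the set of topics that must be in place first.
# Simplified, assumed graph based on the examples in the text.
DATA_PREREQUISITES = {
    "Licensing",
    "Object PIDs",
    "Controlled Vocabularies",
    "Metadata Set",
}

dependencies = {
    "Access/Reuse (data)": set(DATA_PREREQUISITES),
    "Linked Data": set(DATA_PREREQUISITES),
    "Access/Reuse (objects)": {
        "Licensing",
        "Digital Objects/IIIF",
        "Object Platform",
    },
}


def work_order(deps):
    """Return the topics in an order where prerequisites always come first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, topic in enumerate(work_order(dependencies), start=1):
        print(step, topic)
```

Topics without mutual dependencies can come out in any relative order, which matches practice: they can be worked on in parallel by different project teams. If the graph contained a circular dependency, `static_order()` would raise a `CycleError`.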
2 thoughts on “Infrastructure for heritage institutions”
Since you mention ETL & storage together with “implement”: Please have a look at GitLab’s Meltano, as well as Git-Large-File-Storage. Since all Git-products use SHA, something close to PIDs is already built-in, and with GitLab or one of the other Git hosting platforms, “methods, protocols and platforms for accessing and reusing” are also just there.
The picture you paint looks all too familiar to me. Very curious to learn about your progress and to share ideas about putting this topic higher on the agenda in our sector.