Infrastructure for heritage institutions – change of course

Permalink: http://purl.org/cpl/3069


In July 2019 I published the first post about our planning to realise a “coherent and future proof digital infrastructure” for the Library of the University of Amsterdam. In February I reported on the first results. As frequently happens, since then the conditions have changed, and naturally we had to adapt the direction we are following to achieve our goals. In other words: a change of course, of course.

 Projects 

I will leave aside the ongoing activities that I mentioned, and focus on the thirteen short term projects, which were originally planned like this:

  • Licensing
  • Object PIDs
  • Controlled Vocabularies
  • Metadata Set
  • ETL Hub
  • Object Platform
  • Digital Objects/IIIF
  • Digitisation Workflow
  • Access/Reuse
  • Data Enrichment
  • Linked Data
  • Alma
  • Georef

In my first results post these were already grouped together based on status and dependencies:

  • Object PIDs
  • Object Platform/Digital Objects/IIIF
  • Licensing
  • Metadata Set/Controlled Vocabularies
  • Data Enrichment/Georeference
  • Other projects (dependent on the results in the main projects):
    • ETL Hub
    • Digitisation Workflow
    • Access/Reuse
    • Linked Data

Investigating the options of Alma as a separate project was abandoned, because it became very clear that Alma fulfils a central role in almost all other aspects of the digital infrastructure.

 Developments 

In the meantime the exploratory study into the options for a digital object platform has resulted in a recommendation to procure a long term digital preservation (DP) solution, in compliance with the OAIS reference model, which takes descriptive metadata from Alma and other systems and also serves as the source for publication of digital objects through various channels (Digital Asset Management – DAM). Given the expected procurement and implementation time for such a system, a working digital object platform will not be available until the end of 2021 at the earliest. Since the digital object focused projects are all closely interlinked with the availability of a digital object platform, and also because of a number of experiences in the other projects, we have decided to restructure the original planning completely.

 Adapted planning 

Firstly we have defined two separate main project clusters, a data cluster and a digital object cluster. This involved joining and splitting some of the existing project ideas. Secondly, we have separated both clusters in time. We will implement the data cluster first, as far as possible in 2020, and after that the digital objects cluster starting in 2021.

Two projects have a bit of both; they have been grouped together and will be assessed separately. Finally, a new project was defined, focusing on streamlining the full digital infrastructure system and database landscape, with the objective of eliminating redundancies in both systems and data.

  • Data Cluster (2020)
    • Data Licences
    • Data Quality sub cluster
      • Object PIDs
      • Controlled Vocabularies
      • Metadata Set
    • Data Publication sub cluster
      • Data Access and Reuse
      • ETL
      • Linked Data
  • Digital Objects Cluster (2021-2022)
    • Object Licences
    • Digital Objects Platform
    • Digital Object Representations
    • Digitisation Workflow
    • Digital Objects Access and Reuse
  • Data + Digital Objects (2020-2022?)
    • Data Enrichment
    • Georeferencing
  • Digital Infrastructure Streamlining (2020-2022)

 Dependencies 

In the Data Cluster, results of the Data Licences and Data Quality projects must be available for implementing Data Publication options. Linked Data can only be implemented if there is already a data publication facility available, including ETL procedures.

In the Digital Objects Cluster the Digital Objects Platform (DP/DAM) must be available in order to implement a full-blown Digitisation Workflow. Access and reuse of digital objects depend on the availability of the platform with relevant object representations and licences.

The Data Enrichment and Georeferencing projects are both aimed at generating additional metadata for digitised maps based on the digital objects themselves. For a full and serious implementation, high quality digital object representations in relevant formats should be available on a fully functioning digital object platform, and this will not be available before the end of 2021. In the meantime a pilot could be executed with currently available offline digital maps. Planning this will be considered independently of the main project clusters.

Streamlining the digital infrastructure is obviously targeted at existing and future systems and data, and dependent on developments in the digital infrastructure program. The project will start as soon as possible nonetheless, with an exploratory and definition phase.
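The dependencies described above can be written down as a small graph and sorted into a workable order. The following is only an illustrative sketch in Python: the project names come from the adapted planning, but the individual edges are my reading of the text (for example, that Linked Data needs both ETL and a data access facility, and that object representations live on the platform), not an official project plan.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each project maps to the set of projects whose results it needs first.
# The edges are an interpretation of the dependencies described in the text.
DEPENDS_ON = {
    "Data Licences": set(),
    "Data Quality": set(),
    "ETL": {"Data Licences", "Data Quality"},
    "Data Access and Reuse": {"Data Licences", "Data Quality"},
    "Linked Data": {"ETL", "Data Access and Reuse"},
    "Object Licences": set(),
    "Digital Objects Platform": set(),
    "Digital Object Representations": {"Digital Objects Platform"},
    "Digitisation Workflow": {"Digital Objects Platform"},
    "Digital Objects Access and Reuse": {
        "Digital Objects Platform",
        "Digital Object Representations",
        "Object Licences",
    },
}


def project_order(depends_on):
    """Return one valid order in which the projects can be carried out."""
    return list(TopologicalSorter(depends_on).static_order())


order = project_order(DEPENDS_ON)
for position, project in enumerate(order, start=1):
    print(f"{position:2d}. {project}")
```

Any order produced this way places a project after everything it depends on. It says nothing about calendar time, which in our case is dictated mainly by the procurement of the platform.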

 Current status 

In the Data Cluster we are ready to start implementing persistent identifiers for collection objects in the broadest sense. This PID project will be the subject of another more detailed post. In brief: we will adopt a pragmatic approach and maintain a hybrid environment, keeping our existing handles and DOIs and implementing ARK as the new default PID system, using rule based PID assignment based on identifiers available in the target systems. To keep the PIDs persistent, the system identifiers they are built on will have to be copied to the new systems in any future migration.
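To give an idea of what rule based assignment in a hybrid environment could look like, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration: the NAAN `99999` is the value commonly used for test ARKs and not our real one, and the system names, the per-system prefixes in `SHOULDERS` and the `CollectionObject` class are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Placeholder NAAN (Name Assigning Authority Number) for test ARKs.
NAAN = "99999"

# Hypothetical rule table: one short prefix per source system.
SHOULDERS = {
    "alma": "a1",
    "image-repository": "i1",
}

# Only plain letters and digits are accepted, so the local identifier
# can be embedded in the ARK unchanged.
_SAFE_ID = re.compile(r"[A-Za-z0-9]+")


@dataclass(frozen=True)
class CollectionObject:
    system: str                         # key into SHOULDERS
    local_id: str                       # identifier already used in that system
    existing_pid: Optional[str] = None  # handle or DOI assigned earlier, if any


def assign_pid(obj: CollectionObject) -> str:
    """Keep an existing handle or DOI; otherwise derive an ARK by rule."""
    if obj.existing_pid:
        return obj.existing_pid
    if obj.system not in SHOULDERS:
        raise ValueError(f"no PID rule defined for system {obj.system!r}")
    if not _SAFE_ID.fullmatch(obj.local_id):
        raise ValueError(f"identifier {obj.local_id!r} cannot be used as is")
    return f"ark:/{NAAN}/{SHOULDERS[obj.system]}{obj.local_id}"


print(assign_pid(CollectionObject("alma", "9900123")))
print(assign_pid(CollectionObject("alma", "9900456", existing_pid="doi:10.0000/example")))
```

Because the ARK is computed from the local identifier rather than stored in a separate registry, it stays valid only as long as that local identifier survives, which is exactly why identifiers have to travel along in a migration.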

For Data Licences we are inclined to use a public domain ODC PDDL licence as the default licence for data. An exception will have to be made for data originating from OCLC WorldCat, which applies to the bulk of our data in Alma and derivatives thereof. For WorldCat data an ODC-BY licence must be used, acknowledging the OCLC WorldCat origin. Applying both licences simultaneously to our Alma instance will be a bit of a challenge, since part of the Alma data does not derive from WorldCat.
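In its simplest form the licence rule is a per record decision. The Python sketch below assumes that each record carries a flag telling whether it derives from WorldCat. That flag (`derived_from_worldcat`) is hypothetical, and establishing it reliably for existing Alma data is precisely the difficult part.

```python
from enum import Enum


class Licence(Enum):
    PDDL = "ODC PDDL"   # default: public domain dedication
    ODC_BY = "ODC-BY"   # attribution required


def licence_for(record: dict) -> Licence:
    """Pick the data licence for one record based on its origin."""
    # "derived_from_worldcat" is a hypothetical field, not an Alma field name.
    if record.get("derived_from_worldcat", False):
        return Licence.ODC_BY
    return Licence.PDDL


records = [
    {"id": "1", "derived_from_worldcat": True},
    {"id": "2", "derived_from_worldcat": False},
]
for record in records:
    print(record["id"], licence_for(record).value)
```

Note that this sketch treats a record without the flag as our own data, which is an assumption and not a decision we have taken.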

The results of both the Data Licences and the Data Quality projects (Object PIDs, Controlled Vocabularies, Metadata Set) will go into the new Data Publication project, which will be undertaken in the second half of 2020. This project is aimed at publishing our collection data as open and linked data in various formats via various channels. A more detailed post will be published separately.

As mentioned before, the Digital Objects Platform and related projects will take some time. In the meantime a IIIF pilot has already been completed successfully. IIIF is available for the current online image repository.

Last but not least, the exploratory phase of the Infrastructure Streamlining project will start in the second half of 2020.
