Data. The final frontier.
RSS icon Home icon
  • Looking for data tricks in Libraryland

    Posted on September 5th, 2014 Lukas Koster No comments

    IFLA 2014 Annual World Library and Information Congress Lyon – Libraries, Citizens, Societies: Confluence for Knowledge

    DSC01405

    After attending the IFLA 2014 Library Linked Data Satellite Meeting in Paris I travelled to Lyon for the first three days (August 17-19) of the IFLA 2014 Annual World Library and Information Congress. This year’s theme “Libraries, Citizens, Societies: Confluence for Knowledge” was named after the confluence or convergence of the rivers Rhône and Saône where the city of Lyon was built.

    This was the first time I attended an IFLA annual meeting and it was very much unlike all conferences I have ever attended. Most of them are small and focused. The IFLA annual meeting is very big (but not as big as ALA) and covers a lot of domains and interests. The main conference lasts a week, including all kinds of committee meetings, and has more than 4000 participants and a lot of parallel tracks and very specialized Special Interest Group sessions. Separate Satellite Meetings are organized before the actual conference in different locations. This year there were more than 20 of them. These Satellite Meetings actually resemble the smaller and more focused conferences that I am used to.

    A conference like this requires a lot of preparation and organization. Many people are involved, but I especially want to mention the hundreds of volunteers who were present not only in the conference centre but also at the airport, the railway stations, on the road to the location of the cultural evening, etc. They were all very friendly and helpful.

    Another feature of such a large global conference is that presentations are held in a number of official languages, not only English. A team of translators is available for simultaneous translations. I attended a couple of talks in French, without translation headset, but I managed to understand most of what was presented, mainly because the presenters provided their slides in English.

    It is clear that you have to prepare for the IFLA annual meeting and select in advance a number of sessions and tracks that you want to attend. With a large multi-track conference like this it is not always possible to attend all interesting sessions. In the light of a new data infrastructure project I recently started at the Library of the University of Amsterdam I decided to focus on tracks and sessions related to aspects of data in libraries in the broadest sense: “Cloud services for libraries – safety, security and flexibility” on Sunday afternoon, the all day track Universal Bibliographic Control in the Digital Age: Golden Opportunity or Paradise Lost?” on Monday and “Research in the big data era: legal, social and technical approaches to large text and data sets” on Tuesday morning.

    Cloud Services for Libraries

    It is clear that the term “cloud” is a very ambiguous term and consequently a rather unclear concept. Which is good, because clouds are elusive objects anyway.

    In the Cloud Services for Libraries session there were five talks in total. Kee Siang Lee of the National Library Board of Singapore (NLB) described the cloud based NLB IT infrastructure consisting of three parts; a private, public and hybrid cloud. The private (restricted access) cloud is used for virtualization, an extensive service layer for discovery, content, personalization, and “Analytics as a service”, which is used for pushing and recommending related content from different sources and of various formats to end users. This “contextual discovery” is based on text analytics technologies across multiple sources, using a Hadoop cluster on virtual servers. The public cloud is used for the Web Archive Singapore project which is aimed at archiving a large number of Singapore websites. The hybrid cloud is used for what is called the Enquiry Management System (EMS), where “sensitive data is processed in-house while the non-sensitive data resides in the cloud”. It seems that in Singapore “cloud” is just another word for a group of real or virtual servers.

    In the talk given by Beate Rusch of the German Library Network Service Centre for Berlin and Brandenburg KOBV the term “cloud” meant: the shared management of data on servers located somewhere in Germany. KOBV is one of the German regional Library Networks involved in the CIB project targeted at developing a unified national library data infrastructure. This infrastructure may consist of a number of individual clouds. Beate Rusch described three possible outcomes: one cloud serving as a master for the others, a data roundabout linking the other clouds, and a cross cloud dataspace where there is an overlapping shared environment between the individual clouds. An interesting aspect of the CIB project is that cooperation with two large commercial library system vendors, OCLC and Ex Libris, is part of the official agreement. This is of interest for other countries that have vested interests in these two companies, like The Netherlands.

    Universal Bibliographic Control in the Digital Age

    The Universal Bibliographic Control (UBC) session was an all day track with twelve very diverse presentations. Ted Fons of OCLC gave a good talk explaining the importance of the transition from the description of records to the modeling of entities. My personal impression lately is that OCLC all in all has been doing a good job with linked data PR, explaining the importance and the inevitability of the semantic web for libraries to a librarian audience without using technical jargon like URI, ontology, dereferencing and the like. Richard Wallis of OCLC, who was at the IFLA 2014 Linked Data Satellite Meeting and in Lyon, is spreading the word all over the globe.

    Of the rest of the talks the most interesting ones were given in the afternoon. Anila Angjeli of the National Library of France (BnF) and Andrew MacEwan of the British Library explained the importance, similarities and differences of ISNI and VIAF, both authority files with identifiers used for people (both real and virtual). Gildas Illien (also one of the organizers of the Linked Data Satellite Meeting in Paris) and Françoise Bourdon, both BnF, described the future of Universal Bibliographic Control in the web of data, which is a development closely related to the topic of the talks by Ted Fons, Anila Angjeli and Andrew MacEwan.

    The ONKI project, presented by the National Library of Finland, is a very good example of how bibliographic control can be moved into the digital age. The project entails the transfer of the general national library thesaurus YSA to the new YSO ontology, from libraries to the whole public sector and from closed to open data. The new ontology is based on concepts (identified by URIs) instead of monolingual text strings, with multilingual labels and machine readable relationships. Moreover the management and development of the ontology is now a distributed process. On top of the ontology the new public online Finto service has been made available.

    The final talk of the day “The local in the global: universal bibliographic control from the bottom up” by Gordon Dunsire applied the “Think globally, act locally” aphorism to the Universal Bibliographic Control in the semantic web era. The universal top down control should make place for local bottom up control. There are so many old and new formats for describing information that we are facing a new biblical confusion of tongues: RDA, FRBR, MARC, BIBO, BIBFRAME, DC, ISBD, etc. What is needed are a number of translators between local and global data structures. On a logical level: Schema Translator, Term Translator, Statement Maker, Statement Breaker, Record Maker, Record Breaker. These black boxes are a challenge to developers. Indeed, mapping and matching of data of various types, formats and origins are vital in the new web of information age.

    DSC01385

    Research in the big data era

    The Research in the big data era session had five presentations on essentially two different topics: data and text mining (four talks) and research data management (one talk). Peter Leonard of Yale University Library started the day with a very interesting presentation of how advanced text mining techniques can be used for digital humanities research. Using the digitized archive of Vogue magazine he demonstrated how the long term analysis of statistical distribution of related terms, like “pants”, “skirts”, “frocks”, or “women”, “girls”, can help visualise social trends and identify research questions. To do this there are a number of free tools available, like Google Books N-Gram Search and Bookworm. To make this type of analysis possible, researchers need full access to all data and text. However, rights issues come into play here, as Christoph Bruch of the Helmholtz Association, Germany, explained. What is needed is “intelligent openness” as defined by the Royal Society: data must be accessible, assessable, intelligible and usable. Unfortunately European copyright law stands in the way of the idea of fair use. Many European researchers are forced to perform their data analysis projects outside Europe, in the USA. The plea for openness was also supported by LIBER’s Susan Reilly. Data and text mining should be regarded as just another form of reading, that doesn’t need additional licenses

    IdeasBox

    IdeasBox

    IdeasBox packed

    A very impressive and sympathetic library project that deserves everybody’s support was not an official programme item, but a bunch of crates, seats, tables and cushions spread across the central conference venue square. The whole set of furniture and equipment, that comes on two industrial pallets, constitutes a self supporting mobile library/information centre to be deployed in emergency areas, refugee camps etc. It is called IdeasBox, provided by Libraries without Borders. It contains mobile internet, servers, power supplies, ereaders, laptops, board games, books, etc., based on the circumstances, culture and needs of the target users and regions. The first IdeasBoxes are now used in Burundi in camps for refugees from Congo. Others will soon go to Lebanon for Syrian refugees. If librarians can make a difference, it’s here. You can support Libraries without Borders and IdeadBox in all kinds of ways: http://www.ideas-box.org/en/support-us.html.

    IdeasBox

    IdeasBox unpacked

    Conclusion

    The questions about data management in libraries that I brought with me to the conference were only partly addressed, and actual practical answers and solutions were very rare. The management and mapping of heterogeneous and redundant types of data from all types of sources across all domains that libraries cover, in a flexible, efficient and system independent way apparently is not a mainstream topic yet. For things like that you have to attend Satellite Meetings. Legal issues, privacy, copyright, text and data mining, cloud based data sharing and management on the other hand are topics that were discussed. It turns out that attending an IFLA meeting is a good way to find out what is discussed, and more importantly what is NOT discussed, among librarians, library managers and vendors.

    The quality and content of the talks vary a lot. As always the value of informal contacts and meetings cannot be overrated. All in all, looking back I can say that my first IFLA has been a positive experience, not in the least because of the positive spirit and enthusiasm of all organizers, volunteers and delegates.

    (Special thanks to Beate Rusch for sharing IFLA experiences)

    Leave a reply