Are you a librarian with responsibility for data management? If so, you may gain valuable insights from a recent SSHOC investigation into research data and metadata interoperability problems among project partners.
The report, Mapping (meta)data Interoperability Problems, Building the SSHOC Interoperability Hub, is based on both desk research and individual interviews with representatives of four domains (social sciences, arts and humanities, language science, heritage science) as well as with research infrastructures.
It aims to inform the building of an Interoperability Hub for the SSH, which will consist at least partly of a portal offering usable conversion services, or advice on and links to such services. Findings from the report with relevance for libraries include the following:
Dublin Core As A Favoured Metadata Standard
- Of the 19 metadata standards mentioned by interviewees, Dublin Core was the only standard mentioned by representatives of all domains. It was also named as one of the two (in some cases three) most important standards by seven informants. Its simplicity was highly praised. “While the expressiveness of most standards is viewed as necessary and useful, their high complexity, difficulty of use or a high workload involved is clearly perceived as the downside.”
- Two general metadata standards, Dublin Core and DataCite, were recommended for use by all domains. “Dublin Core and/or DataCite are especially important because they can provide a bridge between the diverse domains, supporting metadata interoperability as well as data discovery - the F and I of the FAIR Principles. The domain-specific standards provide the refinement that is necessary to satisfy the needs within each domain (the R of the FAIR Principles).”
- Specific metadata standards were also named for the domains covered by the report (p. 25):
  - Social Sciences / CESSDA, ESS, SHARE: DDI Codebook, DDI Lifecycle
  - Heritage Sciences / E-RIHS: CIDOC-CRM (and its extensions, especially PEM)
  - Language Sciences / CLARIN: CMDI
  - Arts and Humanities / DARIAH: CIDOC-CRM (and its extensions), EDM, TEI (teiHeader)
The pros and cons of all standards, as judged by interviewees, can be found on p. 15 of the report.
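To illustrate how a general standard can act as a bridge between domains, here is a minimal Python sketch of a crosswalk from a domain-specific record to the 15 Dublin Core elements. It is an illustration only, not a method described in the report: the source field names (`study_title`, `principal_investigator` and so on), the `CROSSWALK` table and the function `to_dublin_core` are all hypothetical and do not correspond to real DDI, CMDI or CIDOC-CRM element names.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical crosswalk: the keys are made-up source field names,
# NOT real element names from any domain-specific standard.
CROSSWALK = {
    "study_title": "title",
    "principal_investigator": "creator",
    "collection_date": "date",
    "keywords": "subject",
    "doi": "identifier",
}


def to_dublin_core(record, crosswalk):
    """Map a flat source record to Dublin Core elements.

    Returns a pair: a dict of Dublin Core element -> list of values,
    and a list of source fields that had no mapping.
    """
    dc = {}
    unmapped = []
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None:
            # Keep track of what the simple standard cannot express.
            unmapped.append(field)
            continue
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc, unmapped


# Example record with one field the crosswalk does not cover.
example = {
    "study_title": "Example Survey",
    "keywords": ["attitudes", "values"],
    "sampling_procedure": "Probability sample",
}
dc_record, left_over = to_dublin_core(example, CROSSWALK)
```

The list of unmapped fields shows the trade-off the interviewees describe: the simple record supports discovery across domains, while the detail that does not fit stays in the domain-specific standard.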
Problems With Metadata Mapping
- Mapping very rich library-based metadata formats, which have very complex structures and recursion in their metadata elements, to CMDI can be problematic because it can force users into a fixed way of describing things, the report says.
- Several interviewees highlighted the problems that can occur with older metadata records not using modern standards. Examples include entries from old archives in legacy formats which need to be converted or mapped to newer formats, and heterogeneous date formats which need to be unified.
- Locally made assumptions may not be obvious to a global user community. For example, the language of an item is often not stated explicitly when it is the only language expected in a given collection.
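Two of the problems above, heterogeneous date formats and an unstated collection language, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a procedure from the report: the list of known formats, the day-first reading of ambiguous dates, the `language` field name and both function names are hypothetical, and a real archive would need its own rules.

```python
from datetime import datetime
from typing import Optional

# Hypothetical formats found in legacy records. Order matters:
# "05/03/1987" is read day-first (5 March) here, which is an assumption.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y"]


def normalise_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if fmt == "%Y":
            # Keep year-only precision instead of inventing a month and day.
            return f"{parsed.year:04d}"
        return parsed.date().isoformat()
    # Leave unparseable values for manual review rather than guessing.
    return None


def with_explicit_language(record: dict, collection_language: str) -> dict:
    """Return a copy of the record with the collection's language made explicit.

    A language already stated on the record is left unchanged.
    """
    completed = dict(record)
    completed.setdefault("language", collection_language)
    return completed
```

Returning `None` for unrecognised dates, rather than a best guess, keeps the conversion from silently introducing errors into the unified records.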
Recommended Data Formats
- Open formats should be used for media such as images, videos and audio, rather than proprietary formats. This lowers the risk of a format only being supported by a limited number of applications. A specific list of data formats per media type is given in Table 7 (p. 26).
- The use of the SKOS specification for expressing vocabularies is recommended. It is a model and an RDF vocabulary for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists and other types of controlled vocabularies (SKOS 2009).
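As a rough illustration of what a SKOS concept scheme looks like, the following Python sketch writes a tiny two-concept vocabulary as Turtle, one of the RDF serialisations. The SKOS terms used (`skos:ConceptScheme`, `skos:Concept`, `skos:prefLabel`, `skos:broader`, `skos:inScheme`, `skos:topConceptOf`) come from the SKOS vocabulary; everything else, including the `http://example.org/vocab/` namespace, the concept identifiers and the function `to_turtle`, is a placeholder invented for this example.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # placeholder namespace, not a real vocabulary

# Hypothetical thesaurus: concept id -> (English label, broader concept id or None)
CONCEPTS = {
    "metadata": ("Metadata", None),
    "descriptive-metadata": ("Descriptive metadata", "metadata"),
}


def to_turtle(concepts):
    """Serialise a small concept hierarchy as SKOS in Turtle syntax."""
    lines = [
        f"@prefix skos: <{SKOS_NS}> .",
        f"@prefix ex: <{BASE}> .",
        "",
        "ex:scheme a skos:ConceptScheme .",
    ]
    for concept_id, (label, broader) in concepts.items():
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        parts = [
            "a skos:Concept",
            "skos:inScheme ex:scheme",
            f'skos:prefLabel "{escaped}"@en',
        ]
        if broader is None:
            # Concepts without a broader concept sit at the top of the scheme.
            parts.append("skos:topConceptOf ex:scheme")
        else:
            parts.append(f"skos:broader ex:{broader}")
        lines.append(f"ex:{concept_id} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines) + "\n"


print(to_turtle(CONCEPTS))
```

In practice an RDF library would usually handle serialisation and validation; plain string building is used here only to keep the sketch free of dependencies.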
Wider Interoperability Challenges
Beyond outlining interoperability challenges as they relate to SSHOC, the report also pointed out that the need to make systems interoperable is increasing all the time, driven in part by trends such as Open Science and the desire of researchers to have direct access to data.
“As we seek to defragment our scientific practices and embrace common research infrastructures across multiple disciplines, the range of actors and systems that need to interoperate expands rapidly. This is particularly applicable at a time when archives and other data repositories are moving from mediating access to data (from discovery to download) to mediating use of the data: providing specialist analytical environments that can support the size, complexity and sensitivity of the data.” (p. 20)
Legal issues, such as copyright, may also make it difficult for data to be made interoperable across national boundaries, the report said.
Overall, it was noted that fewer interoperability problems were reported than had been expected — perhaps because organisations are already acting on such issues.
Investigation into this topic will continue, with the next objective being to chart solutions to metadata and data interoperability problems.
To follow the work of SSHOC in this area, please sign up to our newsletter and training network.