01 July 2020


Applying FAIR principles to research data is crucial for assuring effective and efficient data re-use, which is becoming a prerequisite in contemporary research endeavors. However, despite the catchy acronym, ensuring FAIRness of research data can be tricky and tedious, discouraging researchers from publishing their data under these constraints. This represents a great loss for the scientific community, since data that has already been acquired will not be available for replication and reuse in the future.

With the goal of ensuring data FAIRness as the new standard, the webinar Tools and Resources for FAIR Data, organised by SSHOC and delivered by Anca Vlad (UK Data Service) on May 18, 2020, offered an overview of FAIR principles and available tools and resources that can help researchers easily prepare their data in adherence with FAIR guidelines. The webinar also showcased the QAMyData tool which enables researchers across scientific domains to check and improve the quality of their datasets.

The four principles of FAIR

F: To ensure that data is FINDABLE, check whether it has a globally unique and persistent digital identifier (such as a DOI in the citation). This video explains why citing data appropriately using such identifiers is so important. If data is published with an accredited archive or data repository, it will be assigned this unique identifier before publishing. Check whether the archive or repository of your choice is CoreTrustSeal accredited, since this guarantees their trustworthiness. If you are unsure where you can publish your research outputs (data, code or other), check out the platform which contains over 2000 data repositories you can choose from.
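
As a rough illustration of the identifier check described above, the following Python sketch tests whether a citation string has the general shape of a DOI. The pattern and the function names (`normalize_doi`, `looks_like_doi`) are assumptions made for this example and were not part of the webinar; the check is purely syntactic and does not confirm that the DOI is registered or resolves to a dataset.

```python
import re

# Assumed, simplified DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is prefixed in citations.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Strip surrounding whitespace and a leading resolver URL or 'doi:' label."""
    value = value.strip()
    for prefix in PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def looks_like_doi(value: str) -> bool:
    """Return True if the string is syntactically DOI-shaped (no network lookup)."""
    return bool(DOI_PATTERN.match(normalize_doi(value)))


# Hypothetical identifiers, used only to exercise the function.
print(looks_like_doi("https://doi.org/10.1234/example-dataset"))
print(looks_like_doi("example.org/data/42"))
```

A string that passes this check still needs to be verified with the archive or repository that issued it.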

A: To be ACCESSIBLE, (meta)data should be retrievable using an open, universally applicable and standardized protocol. Data does not need to be open to ensure its accessibility; what is essential is that the process of obtaining data is made clear and described in plain language for each dataset. Depending on the content of the data, this protocol would provide information about the access level and the applicable terms and conditions. There is not much variation in terms of access restrictions across data providers (generally the options are open, safeguarded, or restricted); however, the process and conditions for obtaining the data can differ depending on the archive/repository/data publisher.

I: To be INTEROPERABLE, (meta)data needs to be in specific formats, and use community agreed standards and vocabularies/ontologies, such as the DDI Schema. Specific formats should be respected because digital data is very software dependent and as such prone to corruption/loss due to the obsolescence of software/hardware. The EMM Survey Registry is a good example here as it uses a multitude of machine readable metadata fields to describe its datasets (such as scope, region(s), start date, end date of survey, target population, etc.).
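
To make the idea of machine-readable metadata fields more concrete, here is a minimal Python sketch of a dataset description that can be checked for completeness and exported as JSON. The field names loosely follow the examples listed above (scope, regions, dates, target population), but the exact names, the record values and the `missing_fields` helper are all hypothetical; this is not the EMM Survey Registry's schema, nor a DDI-compliant record.

```python
import json

# Assumed set of required fields, modelled on the examples in the text.
REQUIRED_FIELDS = ("scope", "regions", "start_date", "end_date", "target_population")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [])]


# Illustrative record with made-up values.
record = {
    "scope": "national",
    "regions": ["Example Region"],
    "start_date": "2019-01-01",
    "end_date": "2019-06-30",
    "target_population": "adults aged 18 and over",
}

print(missing_fields(record))
# Serialising with sorted keys gives a stable, software-independent text form.
print(json.dumps(record, indent=2, sort_keys=True))
```

In practice the field names and allowed values would come from the community standard or controlled vocabulary used by the chosen archive.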

R: And finally, to be REUSABLE, the quality of supporting documentation, metadata and of course data itself, as well as the applicable license, are important. The license should allow the data to be available to the widest possible audience with the widest possible range of uses. This webinar looked in particular at the QAMyData tool that can help researchers check and increase the quality of quantitative data, and, therefore, its reusability.

FAIR-boosting tools and resources

The FAIR concept pertains to the whole data lifecycle. In order to simplify editing at the end, it is important to keep the requirements of FAIR in mind from study design onwards. In this process of ensuring FAIRness of data, certain tools and resources can help overcome the challenges.

  1. SSH Open Marketplace can be used: (i) to find relevant software and services, (ii) to check whether your project aligns with standards and open science principles, (iii) to find links to tutorials and other training materials and (iv) as a forum where peers can comment on tools/software.
  2. The CESSDA Data Management Guide was designed by European experts to help social science researchers make their research data FAIR. It supports the entire research data lifecycle from planning, organizing, documenting, processing, storing and protecting your data to sharing and publishing.
  3. DMP Online is provided by the Digital Curation Centre (DCC). It helps with creation, review, and sharing of data management plans that meet institutional and funder requirements. Writing a data management plan (or DMP) at the beginning of your project is an important step that can help throughout the project with various aspects: collection, storage, documentation, formatting, ethics, copyright, transfer, de-identification/anonymization and sharing. A DMP should be a living document, updated when needed and consulted throughout the project.
  4. Go FAIR starter kit is a set of resources addressing research data management and ways to offer an open and inclusive ecosystem for individuals, institutions and organizations working together. The kit can be used to assess the ‘FAIRness’ of a dataset as well as determine how to enhance FAIRness, if needed. The tool has been designed predominantly for data librarians and IT staff, but could be used by software engineers developing FAIR data tools and services, and researchers provided they have assistance from research support staff.
  5. UK Data Service provides data management guidance and training events. Our experts have recently published the second edition of the handbook Managing and Sharing Research Data: A Guide to Good Practice, which offers a comprehensive introduction to key topics in research data management applicable to the whole research lifecycle.
  6. QAMyData is a free open source tool that provides a 'health check' for numeric data. The tool was designed to be easy to use by employing automated methods to detect and report some of the most common problems in survey data: missingness, duplication, outliers and direct identifiers (information that can point to one person in particular, such as name, address etc.). Outliers and direct identifiers are of particular concern when sharing data as researchers need to uphold confidentiality agreements. The tool offers a number of configurable tests, categorized by type: file, metadata, data integrity, and identifiers. It can read popular file formats, including SPSS, Stata, SAS and CSV.
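
The four kinds of problem listed for QAMyData (missingness, duplication, outliers and direct identifiers) can be illustrated with a small, standard-library Python sketch for CSV data. This is not QAMyData's code or interface: the `health_check` function, its parameters, the z-score outlier rule, the name-based identifier flagging and the sample data are all assumptions chosen for illustration.

```python
import csv
import io
import statistics


def health_check(csv_text, id_column, numeric_column,
                 identifier_columns=("name", "address", "email"),
                 z_threshold=3.0):
    """Run four simple checks on CSV text and return the findings as a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []

    # Missingness: number of empty cells per column.
    missing = {c: sum(1 for r in rows if not (r[c] or "").strip())
               for c in fieldnames}

    # Duplication: values of the ID column that occur more than once.
    seen, duplicates = set(), set()
    for r in rows:
        key = r[id_column]
        if key in seen:
            duplicates.add(key)
        seen.add(key)

    # Outliers: values further than z_threshold sample standard deviations
    # from the mean (an assumed rule; other definitions are possible).
    values = [float(r[numeric_column]) for r in rows
              if (r[numeric_column] or "").strip()]
    outliers = []
    if len(values) >= 2:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd > 0:
            outliers = [v for v in values if abs(v - mean) / sd > z_threshold]

    # Direct identifiers: flagged by column name only, a deliberately crude test.
    flagged = [c for c in fieldnames if c.lower() in identifier_columns]

    return {
        "missing": missing,
        "duplicate_ids": sorted(duplicates),
        "outliers": outliers,
        "identifier_columns": flagged,
    }


# Made-up sample data for demonstration.
SAMPLE = """id,name,age
1,Ann,34
2,Bob,36
2,Bob,36
3,Cy,
4,Di,35
5,Ed,33
6,Flo,120
"""

# A low threshold is used because z-scores are bounded in very small samples.
report = health_check(SAMPLE, id_column="id", numeric_column="age", z_threshold=1.5)
for check, result in report.items():
    print(check, result)
```

Making the thresholds and identifier names parameters mirrors the idea of configurable tests, although a real quality-assurance tool would cover many more cases and file formats.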

View the webinar slide presentation

Watch a recording of the webinar

Contact Anca Vlad

If you’re part of the Archaeology or Heritage Science community, you might also be interested in our webinar on Use and Re-use of Scientific Data in Archaeology and Heritage.