27 July 2021

On May 20th 2021, SSHOC hosted a Roundtable of Experts for Data Citation, to stimulate discussion on data citation in the Social Sciences and Humanities (SSH). The session was led by CNRS and had around 30 participants, including invited speakers from UGOE, CLARIN, CNR/ISTI, the Turing Institute, the Observatory of Paris - PSL Research University, Vienna - RDA data citation WG, OpenAIRE and CODATA. 

This event followed the discussions that began during a joint event between SSHOC, FREYA, and EOSC-hub - “Realising the European Open Science Cloud” - in November. The session was focused on data citation, and different approaches and experiences related to data citation were discussed by speakers from SSHOC and beyond.

After an inventory of SSH citation practices, we began developing a prototype to implement what we called “FAIR SSH Data Citation”. Based on that work, we drafted a first set of recommendations about data citation, adapted to the specific needs of SSH. For these discussions we invited experts from well-known organizations (e.g. RDA, OpenAIRE, CODATA, Turing Institute, Observatory of Paris) to hear their thoughts and feedback on the work of our task and on data citation in the SSH in general.


“Data Citation in Context” 


The session began with a brief contextual presentation by Nicolas Larrousse (CNRS), who also presented the Recommendations for FAIR Data Citation in the SSH. He pointed out that, while there have been many initiatives to standardize data citation practices, many communities of practice have not yet harmonized theirs.

Following this introduction, Carlo Maria Zwolf (Observatory of Paris) presented the VAMDC, a world-renowned e-infrastructure for Astrophysics data. Even though VAMDC provides a reliable mechanism to generate a citation (and therefore give credit to the author), it lacks the context of a citation or, put another way, the intention behind a citation. For instance, it cannot answer the question: “was the citation made in a positive or negative way?”

Understanding the intention behind a citation is crucial for scientific reasons. There is therefore a need to give the community the capacity to define the ‘role’ of cited data in a machine-actionable way. This topic was presented at a Birds of a Feather (BoF) session during the 17th RDA plenary conference. The BoF then discussed what kind of annotation would be needed to express, in a machine-actionable way, the reason why A cites B.

The expected outcome is the creation of an RDA Interest Group to discuss this matter in more detail (e.g. granularity and curation of annotations).
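
To make the idea of a machine-actionable “role” more concrete, here is a minimal Python sketch of what an annotated citation could look like. It is only an illustration: the role vocabulary, the field names and the identifiers are assumptions made for this example, not something agreed during the BoF or defined by the RDA.

```python
from dataclasses import dataclass, asdict
from enum import Enum
import json


class CitationRole(str, Enum):
    """Hypothetical vocabulary for why A cites B (not an agreed standard)."""
    USES_DATA_FROM = "uses_data_from"
    SUPPORTS = "supports"
    DISPUTES = "disputes"
    REPLICATES = "replicates"


@dataclass(frozen=True)
class CitationAnnotation:
    citing_pid: str      # PID of the citing work (A)
    cited_pid: str       # PID of the cited dataset (B)
    role: CitationRole   # the intention behind the citation
    note: str = ""       # optional free-text remark from a human curator


def to_json(annotation: CitationAnnotation) -> str:
    """Serialise the annotation so that software can read the citation's role."""
    record = asdict(annotation)
    record["role"] = annotation.role.value
    return json.dumps(record, sort_keys=True)


example = CitationAnnotation(
    citing_pid="doi:10.0000/example-article",   # placeholder identifier
    cited_pid="doi:10.0000/example-dataset",    # placeholder identifier
    role=CitationRole.DISPUTES,
)
print(to_json(example))
```

With a record of this kind, the question “was the citation made in a positive or negative way?” becomes a simple lookup on the role field. How fine-grained the vocabulary should be, and who curates it, remain the open questions mentioned above.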


“Citation Service Prototype”


During the second part of the session, Cesare Concordia presented the Citation Service Prototype developed in the framework of task 3.4. The main goals are to:

  • make SSH datasets citable; 
  • provide facilities for curation and semantic annotation of citations; 
  • visualise and exploit citations.

Particular attention was paid to the “Citation Metadata Viewer”, which demonstrates the diversity of existing information about datasets: metadata retrieved from APIs, metadata embedded in landing pages, and citations extracted from the abstracts of the DH conference (organized by ADHO), amongst others.

This diversity shows why we need to standardize and curate information to make it machine-actionable, which can be done via the API component of the Citation Service Prototype.

The prototype was well-received by the round table, with several helpful suggestions being made for its future development.
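
The need to standardize and curate information coming from different sources can be illustrated with a short Python sketch. This is not the prototype's code or its API: the record shape, the two input formats and all function names are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationRecord:
    """Common target shape; the field names are illustrative assumptions."""
    identifier: Optional[str]
    title: Optional[str]
    creators: list[str]
    source: str  # where the metadata came from


def from_api(payload: dict) -> CitationRecord:
    # Assumed API shape: {"doi": ..., "title": ..., "authors": [{"name": ...}]}
    return CitationRecord(
        identifier=payload.get("doi"),
        title=payload.get("title"),
        creators=[a["name"] for a in payload.get("authors", [])],
        source="api",
    )


def from_landing_page(meta: dict) -> CitationRecord:
    # Assumed shape: flat name/content pairs read from embedded <meta> tags,
    # with several creators separated by ";".
    raw = meta.get("creator", "")
    return CitationRecord(
        identifier=meta.get("identifier"),
        title=meta.get("title"),
        creators=[c.strip() for c in raw.split(";") if c.strip()],
        source="landing_page",
    )


def needs_curation(record: CitationRecord) -> list[str]:
    """List the fields a human curator would still have to fill in."""
    missing = []
    if not record.identifier:
        missing.append("identifier")
    if not record.title:
        missing.append("title")
    if not record.creators:
        missing.append("creators")
    return missing


api_record = from_api(
    {"doi": "10.0000/example", "title": "Survey data", "authors": [{"name": "A. Researcher"}]}
)
page_record = from_landing_page(
    {"title": "Survey data", "creator": "A. Researcher; B. Curator"}
)
print(needs_curation(api_record))
print(needs_curation(page_record))
```

Mapping every source onto one record makes the gaps visible: in this example the landing page carries no identifier, so the record is flagged for human curation instead of being silently accepted.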


Questions & Answers and General Discussions


Following each presentation, the experts asked questions of the presenters and of one another, and shared their experiences with data citation and their hopes for the future. These rich discussions gave rise to the following points:

  • Nowadays, identifying datasets with a PID is quite common, but descriptive metadata is still a problem.
  • Data citation can happen at several levels, and that needs to be agreed upon before proceeding to avoid complications. 
  • Why are we citing? One of the most important reasons is for transparency, scrutiny, and the validation of one's scientific contribution. Doing so provides credit to the parties responsible for that data. A secondary effect is to enable data reuse; however, access should not be confused with citation.
  • What is the goal of the citation? Is it quality assessment, recognition, or reference to the sources/reproducibility/interpretation analysis? These are not the same things, nor the same needs.
  • Are we citing in a positive or a negative way? This information is not generally available.
  • There is a need for a clear definition of what data in SSH is, as anything could be considered as data. 
  • Citations represent a particular moment in time, a snapshot.
  • Semantic relationships are important. 
  • Quality has nothing to do with findability. 
  • The citation viewer could be useful to help researchers cite a data set properly. 
  • Assessing the quality of data by machine is difficult, so there is a need for human curation.
  • The social role of a citation may be considered. For instance, “recognition with attribution” (for funders, colleagues etc.). 
  • Regarding standard recommendations for citations, a perfect standard doesn’t exist. 
  • A lot of recommendations exist, as many working groups have focused on Data Citation, yet we still find ourselves in a situation with communities of practice but no concrete standards.
  • Granularity of the citation is an open question; a given level of granularity suits some and doesn’t suit others.
  • The citation of dynamic data (e.g. from social networks) is a new challenge. 
  • There is a need for trusted infrastructures which provide different levels of services (e.g. OpenAIRE with
  • Citation can be a good base for building data papers.
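
The points above about snapshots and dynamic data can be illustrated with a small Python sketch. One possible approach, assumed here for illustration only, is to record the query, the time it was executed and a checksum of the result, so that a later reader can check whether they are looking at the same data. The function name, field names and sample rows are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot_citation(query: str, rows: list[dict], executed_at: datetime) -> dict:
    """Describe one retrieval of a changing dataset so it can be checked later."""
    # Sort rows by their canonical JSON form so the checksum ignores row order.
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
    return {
        "query": query,
        "executed_at": executed_at.astimezone(timezone.utc).isoformat(),
        "result_count": len(rows),
        "result_sha256": digest,
    }


# Made-up rows standing in for posts retrieved from a social network.
rows = [{"id": 2, "text": "second post"}, {"id": 1, "text": "first post"}]
when = datetime(2021, 5, 20, 12, 0, tzinfo=timezone.utc)
citation = snapshot_citation("hashtag = 'opendata'", rows, when)
print(citation["executed_at"], citation["result_count"])
```

If the same query is run again later and the checksum differs, the data has changed since it was cited, which is exactly the “moment in time” aspect raised in the discussion.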

Conclusion and Expected Outcomes


Coming away from these presentations and rich discussions, we as a community need to reflect on why we cite: is it to provide evidence, to foster reuse, to give access, to give credit, or all these things combined? 

It seems that the citation prototype developed in the context of the task is going in the right direction. In particular, the citation viewer was met with strong interest. 

All these remarks, suggestions and references will be used to:

  • improve the citation prototype;
  • prepare the webinar planned for December;
  • feed into deliverable 3.5 and more generally into task 3.4 activities.

A possible output of this round table is the creation of an RDA Interest Group following the Birds of a Feather session, “Rich Metadata for annotation of citations contexts and data-citations contexts”, held during the 17th RDA plenary.