01 September 2021

The findability of data and other resources depends crucially on good citation practices. But what are good citation practices, and how well does the scientific community know and apply them? The SSHOC Workshop on Data Citation, organised in June by the SSHOC partners from WP3, WP7 and WP6, addressed these questions. The workshop also presented the work done in SSHOC Task 3.4, which focuses on making data citations actionable.




Crucial, yet often given little attention

Data citation is a crucial skill for ensuring FAIR data and advancing Open Science, but the current situation in the scientific community shows that efficient data citation is still a complex and sometimes poorly understood process. Easy-to-use recommendations for data citation in the SSH domains are therefore needed if we are to make progress. There are, of course, different needs and practices concerning citation across the broad range of SSH domains. This is why the workshop presented solutions for efficient data citation from three different perspectives: the CLARIN Virtual Collection Registry, the Czech LINDAT Center and the French CoCoon Center. The speakers addressed general considerations with regard to citation practices and touched upon the challenges of creating citations and possible solutions.


CLARIN Virtual Collection Registry

Dieter Van Uytvanck (CLARIN ERIC) introduced the CLARIN Virtual Collection: a coherent set of links to digital objects (e.g., annotated text or video) that can be easily created, accessed and cited. Virtual collections are accessible through a service, the CLARIN Virtual Collection Registry, where they can be registered and published. Every collection gets a Persistent Identifier (DOI or handle) and is provided with metadata that allows citation. There are several ways to cite a virtual collection in this CLARIN service: (1) using the cite button, which provides a BibTeX snippet that can easily be copied and pasted, (2) via the Zotero browser plugin, or (3) using the DOI look-up function in reference managers that offer one.
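To give an idea of what a copy-and-paste BibTeX snippet for a collection might look like, here is a minimal Python sketch. It is not the output of the Virtual Collection Registry cite button: the helper `to_bibtex`, the citation key, the field choice and the DOI `10.1234/placeholder` are all hypothetical and used for illustration only.

```python
def to_bibtex(key, fields):
    """Render a BibTeX @misc entry from a mapping of field names to values."""
    lines = [f"@misc{{{key},"]
    for name, value in fields.items():
        # Wrap each value in braces so BibTeX keeps its capitalisation.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical metadata; the DOI below is a placeholder, not a real identifier.
entry = to_bibtex(
    "example_collection",
    {
        "title": "Example virtual collection",
        "author": "Doe, Jane",
        "year": "2021",
        "doi": "10.1234/placeholder",
        "url": "https://doi.org/10.1234/placeholder",
    },
)
print(entry)
```

The point of including the persistent identifier in the entry is that a reader of the citing paper can follow it back to the exact collection.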


LINDAT Center Repository

Pavel Stranak (LINDAT) addressed the issues with citing data sets in the Czech LINDAT Center. The repository follows the FORCE11 data citation principles and provides a formatted citation in plain text (ready to copy and paste) as well as the option to create BibTeX snippets. Despite these user-friendly citing methods, certain issues remain. One problem is that the handle system used in the LINDAT Repository, in contrast with the DOI system, is not supported by altmetrics, Crossref and other services, which are widely used because they make citations easier to find and format. This issue could be resolved by adding and improving support for the Citeproc JSON format, which would allow the repository to work properly with reference management software (e.g. Zotero or Mendeley) and would also offer more citation styles.
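As a rough illustration of what a Citeproc (CSL) JSON record for a data set could contain, the following Python sketch builds one item and serialises it. It does not show how the LINDAT repository produces such records: the function `csl_dataset_item`, the sample metadata and the handle URL are hypothetical placeholders. The field names (`type`, `author`, `issued`, `URL`) follow the CSL JSON data model.

```python
import json


def csl_dataset_item(item_id, title, authors, year, publisher, url):
    """Build one CSL JSON item describing a data set.

    `authors` is a list of (family, given) name pairs.
    """
    return {
        "id": item_id,
        "type": "dataset",
        "title": title,
        "author": [{"family": family, "given": given} for family, given in authors],
        "issued": {"date-parts": [[year]]},
        "publisher": publisher,
        # The handle is carried as a resolvable URL (placeholder value).
        "URL": url,
    }


item = csl_dataset_item(
    item_id="example-dataset",
    title="Example corpus",
    authors=[("Doe", "Jane")],
    year=2021,
    publisher="Example repository",
    url="https://hdl.handle.net/00000/placeholder",
)

# CSL JSON is exchanged as an array of items.
csl_json = json.dumps([item], indent=2)
print(csl_json)
```

A record in this shape can be handed to a CSL processor, which then renders it in whichever citation style the user selects.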


CoCoon Center Repository

Not all resources are created equal, and citing oral resources has its own particularities. These were presented by Nicolas Larrousse (Huma-Num/CNRS), who introduced the French CoCoon Center and the different types of data in its repository, such as recordings (audio and video) and associated annotations (transcriptions, translations and measures). The CoCoon repository uses different types of PIDs (DOI, ARK and PURL) for different purposes, such as OAI-PMH identifiers and long-term preservation. The user can cite the data in a variety of citation styles and formats (APA, MLA, BibTeX and others). A particularly useful method offered in the repository is citing a specific part of a recording. This is done using the Media Fragments URI specification, a recommendation of the World Wide Web Consortium (W3C). In practice, the researcher citing part of a recording simply appends the time range (e.g., #t=4,9) directly after the DOI, and the link leads to that exact part of the recording (in this example, the segment from second 4 to second 9).
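The temporal fragment syntax can be sketched in a few lines of Python. This is only an illustration of the `#t=start,end` form from the W3C Media Fragments URI recommendation, not code used by the CoCoon repository: the function name `with_time_fragment` and the DOI `10.1234/placeholder` are hypothetical, and whether playback actually starts at the given time depends on the page or player the link resolves to.

```python
from urllib.parse import urldefrag


def _seconds(value):
    """Format seconds as plain digits with up to three decimals (4 -> '4', 4.5 -> '4.5')."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def with_time_fragment(url, start, end=None):
    """Return `url` with a temporal media fragment (#t=start or #t=start,end) in seconds."""
    if start < 0 or (end is not None and end <= start):
        raise ValueError("need 0 <= start < end")
    # Drop any fragment already present so the result has exactly one.
    base, _old_fragment = urldefrag(url)
    fragment = f"t={_seconds(start)}"
    if end is not None:
        fragment += f",{_seconds(end)}"
    return f"{base}#{fragment}"


# Placeholder DOI, for illustration only.
link = with_time_fragment("https://doi.org/10.1234/placeholder", 4, 9)
print(link)
```

Leaving out `end` produces an open-ended fragment such as `#t=4.5`, which means "from 4.5 seconds to the end of the recording".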

The second part of the workshop was dedicated to hands-on exercises in which participants observed and discussed some consequences of good and bad citation practices, using examples from Zenodo, LINDAT/CLARIAH-CZ and


Eager for the details?

Watch the webinar recording or scroll through the presentation slides: CLARIN Virtual Collection Registry, LINDAT Center Repository, CoCoon Center Repository. And if you are interested in more, watch out for our follow-up webinar on the topic of data citation in November.

