22 December 2021

What does the General Data Protection Regulation (GDPR) mean in practice for Social Sciences and Humanities researchers? How can we ensure that appropriate measures are taken when dealing with personal data in day-to-day research activities? These were some of the main questions discussed during the hybrid workshop “Data Protection in research practice: The GDPR and the ELDAH Consent Form Wizard”, organised by SSHOC and the DARIAH ELDAH Working Group on 13 October 2021.



Koraljka Kuzman Šlogar (Institute of Ethnology and Folklore Research / DARIAH-HR) introduced the workshop with a presentation of the Ethics and Legality in the Digital Arts and Humanities (ELDAH) Working Group. Created in 2017 to work on ethical and legal issues, this DARIAH Working Group gathers 40+ members from 18 countries. Most of them are researchers and cultural heritage experts, with a few legal experts completing the team. In addition, ELDAH works in close collaboration with other groups such as the CLARIN Legal and Ethical Issues Committee or CESSDA. ELDAH's expertise covers intellectual property rights and licensing, data protection, privacy, research ethics and scholarly conduct. Besides running regular workshops for scholars, ELDAH also produces recommendations, training and information materials, and has created the Consent Form Wizard that was demonstrated and tested during the workshop.


ELDAH is open to new members: you can join this DARIAH Working Group.



Introduction to data protection and the GDPR

The second part of the workshop consisted of an introduction to data protection and the General Data Protection Regulation (GDPR) given by Walter Scholger (University of Graz / CLARIAH-AT / CLIC) and Pawel Kamocki (IDS Mannheim / CLARIN-D / CLIC). Following a short formal and legal introduction, Walter and Pawel highlighted some conceptual distinctions as well as basic concepts used in the GDPR.

  • Data protection vs. privacy: GDPR is about personal data protection, and even if the broader notion of ‘privacy’ is closely related, these concepts should be distinguished. While privacy encompasses a number of legal, ethical and technical concerns, the GDPR focuses specifically on the processing of personal data.
  • Personal data can be defined as any information about an identified or identifiable natural person, whatever its form (digital or analogue, audio or text, etc.). For instance, even without revealing a name, describing someone as a translator living in Berlin who works with rare languages could be sufficient to identify this person.
  • Data processing: anything that can be done with data - storage, erasure, anonymization… - is a processing act.
  • The 3 main roles in the data workflow are:
    • Data subject: the identified or identifiable natural person whose data is being processed.
    • Data controller: the natural or legal person who determines the purposes and means of the data processing.
    • Data processor: processes data on behalf of the controller. This role is not always filled.
  • To prevent possible identification, data can be anonymised or pseudonymised:
    • Anonymisation is the result of breaking the links between person and data. Once this is done, data are no longer personal data and can be processed freely.
    • Pseudonymisation consists in replacing identifying elements in a dataset with artificial identifiers (pseudonyms). Pseudonymised data are still considered personal data, because the link between person and data still exists (in a secure environment), but this procedure is an appropriate safeguard in research contexts.
  • Data processing in research and the need for appropriate safeguards: certain privileges and exceptions to the GDPR are possible, and Article 89 is the flagship of academic exceptions. Even though processing personal data for scientific or historical research purposes is allowed, it still requires appropriate safeguards. The appropriateness of safeguards depends on the specific research setting and can sometimes be challenging to determine.
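
The difference between pseudonymisation and anonymisation described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a method presented at the workshop: the function name `pseudonymise`, the field names and the `P-` prefix are all hypothetical. Identifying fields are replaced by random pseudonyms, and the link back to the person is kept in a separate key table that would have to be stored in a secure environment.

```python
import secrets


def pseudonymise(records, identifying_fields):
    """Replace identifying fields with random pseudonyms.

    Returns the pseudonymised records and a key table that maps each
    pseudonym back to the original identifying values. As long as the
    key table exists, the data remain personal data.
    """
    key_table = {}        # pseudonym -> original identifying values (keep separate and secure)
    known_people = {}     # identifying values -> pseudonym, so one person keeps one pseudonym
    pseudonymised = []
    for record in records:
        identity = tuple(record[field] for field in identifying_fields)
        if identity not in known_people:
            pseudonym = "P-" + secrets.token_hex(8)  # hypothetical pseudonym format
            known_people[identity] = pseudonym
            key_table[pseudonym] = dict(zip(identifying_fields, identity))
        # Copy only the non-identifying fields and attach the pseudonym.
        cleaned = {k: v for k, v in record.items() if k not in identifying_fields}
        cleaned["pseudonym"] = known_people[identity]
        pseudonymised.append(cleaned)
    return pseudonymised, key_table
```

Destroying the key table is a step towards anonymisation but does not guarantee it: as the translator example above shows, the remaining fields may still be enough to identify someone.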

Data Protection principles in the GDPR

After the introduction, Pawel and Walter went into the details of the principles that apply to data processing as they are described in Article 5 of the GDPR. The first of these principles is Lawfulness: in order to comply with the GDPR, data processing must have a legal basis (the possible bases are listed in Art. 6). The legal bases most commonly used in research are consent, legitimate interest and public interest.

  • Consent is a declaration by which the data subject agrees to the processing of their personal data. It must be freely given, specific, informed, and unambiguous. This is where the ELDAH Consent Form Wizard can help researchers.
  • Legitimate interest requires balancing the interests of the research and its community against those of the data subject. As public interest is not really used for SSH research (at least in the domains of the people present during the workshop), it was not covered in detail.
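
The four conditions for valid consent listed above can be read as a simple checklist. The following Python sketch is only a hypothetical illustration of that checklist: the `ConsentRecord` class and the `missing_conditions` function are assumptions made for this example, they are not part of the Consent Form Wizard, and they do not replace a legal assessment.

```python
from dataclasses import dataclass


@dataclass
class ConsentRecord:
    """Hypothetical record of how one data subject's consent was obtained."""
    freely_given: bool
    specific: bool
    informed: bool
    unambiguous: bool


def missing_conditions(consent):
    """Return the names of the consent conditions that are not met."""
    return [name for name, met in vars(consent).items() if not met]
```

For example, `missing_conditions(ConsentRecord(True, True, False, True))` returns `["informed"]`, which flags that the data subject did not receive the necessary information before agreeing.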


Fairness and Transparency are other important data protection principles. They require that data be processed in good faith and that information about the processing be freely accessible and easy to understand.

Another principle is Purpose limitation, which states that personal data can only be processed for a clearly defined purpose. There is an exception for research here: with appropriate safeguards, data that were legitimately collected for other purposes can be reused for research and/or archiving purposes.

Data minimisation comes next: data collection and processing must be limited to what is necessary for the purposes for which the data are processed.
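
In a data pipeline, data minimisation can be as simple as keeping only the fields a given purpose needs. The sketch below is a minimal illustration under that assumption: the function name `minimise` and the field names in the usage example are hypothetical.

```python
def minimise(record, needed_fields):
    """Keep only the fields that are needed for the stated purpose."""
    return {field: record[field] for field in needed_fields if field in record}
```

For a survey analysis that only needs an age group and an answer, `minimise({"age_group": "30-39", "email": "x@example.org", "answer": "yes"}, ["age_group", "answer"])` drops the e-mail address and returns `{"age_group": "30-39", "answer": "yes"}`.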

According to the Accuracy principle (or data quality), if data are not accurate, the data subject must be able to have them rectified.

Storage limitation is an important principle in the sense that personal data can only be stored for a limited period of time. However, in the context of research and archiving, with appropriate safeguards, personal data can be stored for a longer period of time, if the purpose (for example long-term archiving of historical records or accountability of research data) justifies it. Finally, Integrity and Confidentiality (or security requirements for data storing), as well as Accountability (record of data processing activities) were presented by Walter and Pawel to close this presentation of the data protection principles.

Pawel then highlighted the rights of data subjects in the GDPR, detailing which rights have to be safeguarded by the data controller and which exceptions could be considered for archiving, research and statistical purposes.


To know more about the rights of data subjects, you can have a look at the CLARIN Café on the Rights of Data Subjects in Language Resources organised in March 2021, by the CLARIN legal & ethical issues committee.


Finally, before closing the first part of the workshop, Pawel and Walter summarised the mandatory information to be provided to data subjects, in order to better explain the background of the Consent Form Wizard.

Information to be provided to data subjects - slide presented during the workshop


Group Discussion

The discussion, organised in breakout rooms online and with one group of participants onsite before gathering again in plenary, was an opportunity to highlight the following topics.  

  • Obtaining large-scale consent from social media data subjects is in practice almost impossible. Some pointers for participants interested in this topic were shared, especially the CLARIN Café on Text and Data Mining Exceptions in the Directive on Copyright in the Digital Single Market.
  • It is not always easy to understand where personal data starts - as an institutional address might for example be personal - and how to apply data minimization while working on anonymization or pseudonymisation.  
  • Participants discussed the limits of the many different scenarios in which data subjects have to give consent, and how the overflow of GDPR consent forms sometimes undermines data protection efforts.
  • Participants also shared their experience with obtaining consent from data subjects (for example participants in a survey) “in the field”, which is not always easy. This may be due to individuals' fear of giving their signature, or to the complexity of identifying and contacting the heirs of deceased persons.
  • Specificities of oral history and audio recordings were also discussed. As voice is deeply personal and can be easily recognised, researchers and archivists need to pay particular attention to data subjects’ consent.
  • The interplay and overlap between data protection and other legal frameworks was also discussed: intellectual property sometimes comes into play, and ethical dimensions are always present (for example, it is ethical to re-inform people if data are used for a research purpose other than the initial one), even if ethics committees are rare in SSH compared to the medical sector.
  • Finally, an interesting distinction between local flavours of GDPR implementations in Europe was highlighted by the participants: while Northern-European countries seem to be more focused on public or legitimate interest as a basis for exceptions to the GDPR, Central-European countries concentrate more on consent.

Consent Form Wizard

Because defining and collecting consent is a central part of applying the GDPR for researchers, ELDAH developed a Consent Form Wizard based on the most common scenarios encountered by SSH researchers, to make it easier to obtain valid consent for data processing. Vanessa Hannesschläger (Austrian National Library / CLARIAH-AT / CLIC), who developed the tool together with Pawel Kamocki and Norbert Czirjak (OEAW), presented the tool to the participants.

DARIAH ELDAH Consent Form Wizard homepage - screenshot

While presenting the Consent Form Wizard (CFW), Vanessa highlighted the following important points.

  • The CFW has been designed and is provided by ELDAH to help researchers but does not replace a lawyer, especially when researchers are dealing with complex consent situations.
  • Three scenarios are currently covered by the CFW: 1) gathering data about living people for research purposes, 2) communicating through mailing lists or other digital communication media, and 3) gathering data/consent from participants as a host of an academic event.
  • Multilingualism is important, especially for research subjects. The CFW is currently available in English, German, Italian and Croatian. ELDAH is happy to accept help with other translations, and programmers have implemented an easy-to-use interface for interested translators. Do not hesitate to get in touch.

After Vanessa’s presentation, participants were able to test the tool, in small groups, based on their own use-cases.


Closing session & take-away

Thanks to the testing session of the Consent Form Wizard, a couple of bugs were identified, and some suggestions to improve the website were made. For example, participants suggested changing the ‘scientific research’ wording to ‘research’ to be more inclusive towards GLAM in-house researchers. In addition, the recurring question of how to legally locate global institutions like UNESCO was brought up: because international organisations are treated like third countries in the GDPR, this option should be added to the CFW. Beyond these specific questions, the last plenary part of the workshop was also an opportunity to sum up the main take-aways and share feedback between participants and presenters.


If you would like to know more, we invite you to watch the workshop recording and check the presentation slides.

Written by Laure Barbot, with contributions from Edward Gray, Erzsébet Tóth-Czifra, Walter Scholger and Kristina Pahor de Maiti.