23 March 2021
Do you work with social media data? Are you interested in research about online communication or about users’ behavior and opinions?

While the use of social media data for research is growing across academic disciplines, many questions are arising about research methods and potential standards, e.g., representativeness and reproducibility.

Furthermore, researchers are concerned about ethics and data protection. These concerns, questions, and possible solutions were discussed during an online bootcamp session that took place on 8 and 11 February 2021.


Train-the-Trainer RDM Bootcamp 

About 90 people attended the virtual SSHOC Train-the-Trainer bootcamp on Research Data Management (RDM), organized by the project partners DANS, LIBER, GESIS, and DARIAH-EU.

After three short lectures, participants received targeted training on topics such as the costs of managing research data, the GDPR and ethics, and third-party data. This was followed by homework assignments and an interactive session on didactics.


Key Challenges for Data Protection and Ethics 

Among the sessions, the one on “General Data Protection Regulation and ethical issues with social media data” generated significant interest. It was held by two GESIS researchers, Katrin Weller and Oliver Watteler, who presented a general overview of the concept of personal data and introduced some of the key challenges for research ethics, focusing on data protection concerns when social media is used as research data.

According to the presenters, the key challenges for data protection and ethics in social media research involving sensitive data (e.g., political orientation, protest movements, online dating, personal health information, and “sharenting”, i.e., parents sharing photos of or information about their children) are:

  1. Use of personal data and of sensitive data.
  2. Missing consent from social media users, who are not aware that their data is being used for research.
  3. Missing concept of what public data is and how it may be used.
  4. Novel challenges added by algorithms and computational research methods, e.g., the use of the same usernames, IDs, and passwords.
  5. Missing guidance, standards, and best practices.

Ethical Perspective on the Research Process

In the following break-out room session, the trainers gave attendees insights into the critical questions that arise when working with this new type of data, linking the GDPR, information security, and data protection as part of a person’s right to privacy.

The presenters introduced the legal framework in the European Union, namely the Charter of Fundamental Rights of the EU (Art. 8) and the GDPR, and touched upon national and sub-national data protection acts and specialized laws. They also gave organizational and practical advice for the different steps of the research process, including guiding questions from an ethical perspective such as:

  • Will the project collect personal data?
  • What is the legal basis for data processing?
  • Who is responsible for data processing in the research project?
  • Who has access to the research data?
  • What type of personal data is processed?
  • What are special categories of personal data?
  • Has informed consent been obtained from the research participants, i.e., the data subjects?
  • Have you tried to get in touch with the research participants, i.e., the data subjects?
  • Can the data be anonymized?

For study design and data collection, the presenters introduced a project lifecycle, with questions to consider at each step and examples of proactive measures. The project phases covered were ‘plan and design’, ‘collection of data’, ‘interpretation and analyses’, ‘management and preservation’, ‘release and publishing’, ‘discovery’, and ‘reuse’.


Assignments on Recent Social Media Research

After the theoretical part of the bootcamp, participants received homework assignments to complete in working groups of two to three people. Each team selected one of the suggested case studies drawn from recent social media research or chose a scenario connected to their own work. The case studies covered the following topics:

  1. Collecting data from vulnerable groups - Sensitive information and interacting with user groups.
  2. Automated analyses and inferences - Ethical responsibilities in algorithmic inferences.
  3. Data releases or “The data is already public” - The “Tastes, Ties, and Time” Dataset and the “OK Cupid Dataset”.

All groups received a reading list with required and suggested literature for each case study, as well as a general reading list with recommendations on research ethics, data protection, and RDM, plus additional useful references.

To prepare the homework assignments, participants were given access to a Slack workspace set up before the bootcamp, where they could connect and communicate with each other in channels. Additional information was provided to each group both in Slack and via the CryptShare Web App. A coordinator from the bootcamp’s organizational team supported the participants during the two days of homework assignments.


Case Studies

On the second bootcamp day, each group briefly presented its case study. The participants working on the first case study focused mainly on study design and data collection. They reflected on the role of social media users in the research process, with a specific focus on vulnerable groups, and chose examples from the medical domain to illustrate the challenges of working with these groups.

The second case study focused on data analysis, where algorithmic approaches are often trained for automated analyses. The participants reflected on gender detection algorithms, which are often trained on image data, and criticized the fact that these algorithms do not perform equally well for all cases.

The participants working on the third case study focused on data sharing. They analysed two example datasets: the Tastes, Ties, and Time dataset, which contains Facebook data from university students and was released as anonymized data in 2008, and a dataset collected from the dating platform OK Cupid and publicly released in 2016.

At the end of the second bootcamp session, attendees and trainers held an open discussion about the problems identified in the presented studies and concluded that researchers using social media data need guidance, standards, and a code of conduct.




This blog post was written by Veronika Keck (GESIS - Leibniz Institute for the Social Sciences) and edited by Ellen Leenarts (DANS) and Tanya Yankelevich (LIBER).

SSH GDPR Code of Conduct