10 May 2021

The idea of collaboration between research infrastructures and disciplines is central to EOSC. The recent Speech-to-text workshop showcased an example of such collaboration within the SSHOC project. It presented ongoing work by a partnership of two social science infrastructures (European Values Study and Generations and Gender Programme) and a linguistic infrastructure (CLARIN). The workshop was led by Judith Koops (Generations and Gender Programme), who gave an overview of the project. It also included presentations by Joris Mulder (LISS panel), who demonstrated the tools used to collect audio data through existing survey software in online interviews, and Henk van den Heuvel (CLARIN ERIC), who described tools that can be used to analyse survey interviews. Giovanni Borghesan (European Values Study) moderated the event. 


Speech-to-text in the LISS Panel

In this innovative project, which started this month, the social science infrastructures collect the data while the humanities infrastructure is used to automate its processing and analysis. This approach increases the value of the data and provides a wealth of new information for both sides. 


Benefits of collaboration

To achieve this, the project collected audio recordings in a social survey conducted in the LISS panel (Longitudinal Internet studies for the Social Sciences), a probability-based online panel in the Netherlands that is representative of the Dutch population. Speech-to-text technology, implemented through external software, allows respondents to answer verbally rather than typing their answers. The audio files are then manually checked, anonymised, and stored securely. 

When collecting audio data, one must consider technical issues, the privacy of respondents, and methodological implications. The approach has pros and cons: for example, more answers can be gathered, but some selective non-response is inevitable. The collected data will be available in the LISS repository later this year. 


Secure workflow in the LISS Panel


Linguistic analysis


The CLARIN infrastructure offers a range of tools for processing and analysing the collected audio recordings, such as emotion analysis. Audio recordings can be transcribed automatically using an open-source transcription chain that is free to use within CLARIN. The project is testing and improving this feature for the Dutch language. The resulting text can then be analysed further as qualitative data, for example by applying tools for topic modelling, annotation, measuring statistical relations, producing word clouds, and even automatic summarisation. 


Transcription chain


The discussion then continued in smaller groups in breakout rooms moderated by the speakers. Participants approached the topic from the perspectives of survey developers, users, and information extraction, and discussed methodological and analytical insights. For the more than 45 participants who joined the workshop, most of whom are researchers or members of research institutions, the breakout rooms were a valuable opportunity to discuss their experiences and needs.


Want to know more?

Watch the video recording of the presentations and take a look at the slides.


To stay updated on SSHOC's latest activities, sign up for our newsletter, follow us on Twitter @SSHopenCloud or get in touch at