SSHOC Speech-to-text Workshop - Linking Social Survey and Linguistic Infrastructures through speech interviews

Date:

16 April 2021 - 11:00 to 12:30

Location:

Online

Survey infrastructures systematically interview tens of thousands of individuals across Europe each year. Respondents are selected at random from all walks of life, and the hour-long interviews provide a range of data that is valuable to researchers and, subsequently, to policy makers.

While complex life histories or events may be coded into the structured taxonomies required for cutting-edge sociological research, a large proportion of the information conveyed in an interview is lost. A respondent's tone of voice, linguistic fluidity, and depth of vocabulary, for example, can provide insights into cognitive function, socio-economic status or verbal reasoning skills.

Making use of this lost data requires the integration of social survey and linguistic infrastructures. Such integration underpins the EOSC vision. The work within SSHOC on analysing voice-recorded interviews therefore seeks to provide both a proof of concept and a framework for future research that explores this approach.

Agenda

Judith Koops from the Generations and Gender Programme will provide an overview of the project. She will focus on the advantages of collaboration between the different infrastructures and on the new insights generated over the course of the project.
Joris Mulder from the LISS panel will demonstrate the tools used for collecting audio data through existing survey software in online interviews. He will evaluate the challenges encountered in this project and explain how they were solved.
Henk van den Heuvel from the Centre for Language and Speech Technology (CLST), which participates in SSHOC as a linked third party of CLARIN ERIC, will then describe the tools used for the analysis of Oral History data, which could be adapted for the analysis of survey interviews. In particular, he will address the so-called Transcription Chain, which is based on automatic speech-to-text conversion. After manual correction, the resulting text can be processed by NLP tools to gain more insight into its linguistic structure, or for tasks such as topic detection and text summarisation.
Giovanni Borghesan from the European Values Study will lead the interactive session where participants will discuss potential applications for the tools, the use of the data for new avenues of scientific research, as well as ways to improve the collection, processing and archiving of audio data.

After the presentations, participants will have time for discussion with the presenters in separate breakout rooms.

Registration closed

Event presentation

Speakers

Giovanni Borghesan is a Junior Researcher employed at Tilburg University. He works for the European Values Study and cooperates with SSHOC in WP3 and WP4. His field of expertise is Survey Research.

Dr Judith Koops is a researcher at NIDI. Her research focuses on inequalities in family demography. She works for the Generations and Gender Programme (GGP). GGP is a social science research infrastructure that provides open access micro- and macro-level data with the aim of improving knowledge about family life and life course trajectories in low-fertility countries. In SSHOC she leads tasks 4.4 (Voice recorded interviews and audio analysis) and 4.5 (Social policy APIs for social surveys).

Joris Mulder is a researcher at CentERdata, Tilburg University. He is the Coordinator of the Dutch LISS panel and is responsible for the quality of the panel (e.g. representativeness, data quality and survey response) and for promoting the use and awareness of the LISS panel. With a background in both computer science and social sciences, Joris focuses on managing innovative research projects, such as combining online survey research methods with sensor-based data collection and data science techniques.

Dr Henk van den Heuvel is director of the Centre for Language and Speech Technology (CLST) at Radboud University. CLST is participating in SSHOC as a linked third party of CLARIN ERIC. He has been involved in the collection, compilation and validation of many spoken and written language resources at the national and international level. In various projects, automatic speech recognisers were built for Dutch and Frisian in the domains of audio mining, language learning, and healthcare. Currently he is involved in CLARIAH PLUS tasks for improving speech recognition of dialect speech and data curation. In SSHOC he works on tasks related to integrating language and speech technology into survey tools and to sharing sensitive data through remote access.
