24 March 2020 - 10:00 to 12:00
Utrecht, The Netherlands


Because of the escalating health concerns relating to the spread of the Coronavirus (COVID-19) and the travel restrictions that have been announced across Europe, we have decided to cancel this SSHOC workshop. We regret the circumstances, but feel that a cancellation at this point in time is the only sensible option. We will re-evaluate the possibility of holding the workshop at a later date, but cannot, of course, give any timelines yet.


Organised by SSHOC, the workshop "Linking Social Survey and Linguistic Infrastructures through EOSC" will be held in Utrecht on 24 March 2020 (10am-12pm CEST, registration from 9.30am), in co-location with the 3rd SSHOC Consortium meeting.

Workshop Description and Objectives

Survey Infrastructures systematically interview tens of thousands of individuals across Europe each year. Respondents are selected at random from all walks of life, and the hour-long interviews provide a range of data which has value for researchers and subsequently policy makers.

While complex life histories or events may be coded into the structured taxonomies required for cutting-edge sociological research, a large proportion of the information conveyed in an interview is lost. A respondent's tone of voice, linguistic fluidity, and depth of vocabulary, for example, can provide insights into cognitive function, socio-economic status or verbal reasoning skills.

Making use of this lost data requires the integration of social survey and linguistic infrastructures. Such integration underpins the EOSC vision. The work within SSHOC on analysing voice-recorded interviews therefore seeks to provide both a proof of concept and a framework for future research that explores this approach.

  • Tom Emery will present work conducted by the GGP on capturing audio data through existing survey software in online interviews, and will provide initial evaluations of data quality.
  • Henk van den Heuvel from the Oral History team will then describe the tools used for analysis of Oral History data which could be adapted for analysis of survey interviews. In particular, he will address the so-called Transcription Chain, which is based on automatic speech-to-text conversion. The resulting text can, after manual correction, be processed by NLP tools to obtain more insights into its linguistic structure, or for topic detection or text summarisation, amongst others.
  • In an interactive session, participants will discuss potential applications for the tools as well as new avenues of scientific enquiry to be integrated into the next phase of work on voice-recorded interviews and audio analysis.



The full agenda will be published soon.



Utrecht University
Room: Sweelinckzaal 0.05
Drift 21, 3512 BR Utrecht
The Netherlands