The compilation of the [MCSQ]: Multilingual Corpus of Survey Questionnaires

Date:

09 July 2021 - 15:00 to 16:30

Location:

Online

The Work Package 4 team preparing tools for Computer Assisted Translation (UPF, UVT-EVS, CentERdata, CLARIN/CUNI) is presenting a poster at the European Survey Research Association (ESRA) conference 2021. The poster is part of a session on innovative methods for different stages of multinational survey research, from invitations to participate, through the design and translation of questionnaires, to the analysis of survey results.

Description

This presentation describes the design and compilation of the Multilingual Corpus of Survey Questionnaires (MCSQ), the first publicly available corpus of international survey questionnaires, developed under the SSHOC project. The corpus was compiled from questionnaires from the European Social Survey (ESS), the European Values Study (EVS), and the Survey of Health, Ageing and Retirement in Europe (SHARE) in the source language, (British) English, and their translations into eight languages (Catalan, Czech, French, German, Norwegian Bokmål, Portuguese, Spanish and Russian), as well as 29 language varieties (e.g. Swiss French). As a case study, the MCSQ was used to extract information and exemplify two types of problematic translations in survey questionnaires. The first type relates to choices of terms in the source document that have resulted in poor translations; specifically, these choices involve idioms and fixed expressions. The second type relates to cases where the semantic variation among translation choices exceeds what is permissible if psychometric properties are to be maintained across languages, specifically in the intensity attached to the verbal labels of response scales. The presentation aims to demonstrate how corpus methodology can be used to analyse past translation outcomes and to improve questionnaire translation methodology.

How to register

The 9th ESRA conference (2021) will be held online. Additional information and registration are available on the ESRA webpage.

Speakers

Lidun Hareide is a researcher at the Moreforsking Research Institute (Ålesund, Norway) working on the H2020 SSHOC project. She holds a doctorate in translation, corpus linguistics, and comparative grammar (English, Spanish and Norwegian) from the University of Bergen. Lidun compiled the Norwegian-Spanish Parallel Corpus in cooperation with Knut Hofland. She lectures on and conducts research in translation studies, language didactics, and corpus linguistics.

Diana Zavala-Rojas is a Senior Research Fellow at the Faculty of Political and Social Sciences at the Universitat Pompeu Fabra (Barcelona, Spain). She is a specialist in multinational, multiregional, and multilingual comparative surveys and holds a doctorate in comparative survey methodology. She is a member of the Core Scientific Team (CST) of the European Social Survey (ESS), collaborating on questionnaire design, translation, measurement quality, and cross-national measurement equivalence. In SSHOC, she is the leader of Work Package 4 (WP4): Innovations in data production.

Danielly Sorato holds a master's degree in computer science, with a specialization in Natural Language Processing. She is a researcher in the Social Sciences & Humanities Open Cloud (SSHOC) project at the Research and Expertise Centre for Survey Methodology (RECSM) of the Universitat Pompeu Fabra (Barcelona, Spain).

Knut Hofland is a former Senior Computational Linguist at the University of Bergen.

Poster:

SSHOC Poster MCSQ

