Date: 14 April 2020

Researchers often collect hours of valuable spoken audio data in the course of their work but have little time to process and analyse it. To address this challenge, the Oral History & Technology research group has developed a user-friendly transcription portal that helps scientists handle spoken audio data efficiently.

In the “CLARIN Hands-on Tutorial on Transcribing Interview Data” webinar, organized by SSHOC and CLARIN ERIC on 3 March 2020, Henk van den Heuvel (CLARIN K-centre ACE at Radboud University, Nijmegen) and Christoph Draxler (BAS CLARIN-D centre at Ludwig Maximilian University, Munich) discussed the challenges of transcribing spoken data and demonstrated the workflow of a machine-assisted interview transcription process. Henk van den Heuvel opened the discussion by outlining the cross-disciplinary nature of working with interview data, after which Christoph Draxler showcased the transcription portal itself. The overview set the context well, and the demonstration was just detailed enough for anyone interested in the portal to start trying it out immediately with their own data. The number of questions raised during the demonstration and the discussion afterwards made the need for such a tool abundantly clear.

THE BEGINNINGS OF ‘THE-ONE-PLACE-TO-DO-ALMOST-ALL’ WITH INTERVIEW DATA

The initiative for a transcription portal, supported by CLARIN ERIC, stems from the needs of oral historians, who use interviews as their central research instrument. They would therefore benefit greatly from a manageable, easy-to-use workflow that lets them process large amounts of spoken audio data faster. Many other disciplines that work with this type of data (social economists, law scholars, pharma-medical scientists, etc.) use similar processing methods, so the value of a transcription chain extends beyond a single scientific domain. The challenge was taken up by a group of experts from CLARIN and several other organizations, who developed a transcription chain: a defined sequence of steps that takes researchers from the original audio recordings to processed and annotated digital data. Although the work originally focused on Automatic Speech Recognition (ASR), the taskforce quickly recognized that running additional NLP tools over interview data would add important value to the final output. The OH-Portal currently provides both access to external ASR services and a built-in transcription editor, which together allow the transcription chain to run smoothly and give the researcher a valuable final output.

AUTOMATIC SPEECH TRANSCRIPTION AND BEYOND

A high-quality orthographic transcript is the basis for all types of analyses of spoken language data. However, transcribing speech manually is a time-consuming and frustrating task. This is where ASR comes in: the OH-Portal gives researchers access to external ASR providers for a number of languages (English, German, Dutch, etc.). The portal also offers an editor for manually correcting transcripts, a tool for automatic word-time alignment and an interface for examining phonetic details. Files can be exported in various formats, which is useful when carrying out further analyses with different NLP tools.

During the webinar, Christoph Draxler stressed the importance of understanding researchers’ needs in terms of transcription quality and explained the pros and cons of different types of transcripts; for example, why producing a very detailed transcript, or discarding the audio signal once it has been transcribed, is not always the best decision. He also walked through the process of creating a transcript in the OH-Portal and the functions of its different steps and options, followed by tips and tricks for using the portal effectively and obtaining good-quality output. In his interactive demonstration, Christoph explained what to keep in mind when recording material to ensure good input and output even when professional recording equipment is unavailable, and how to work around some of the portal’s technical limitations. Another asset of the portal is its attention to privacy: users can check the privacy policy of each ASR provider and choose a service accordingly (some services reserve the right to analyse the audio files further or to retain them, others do not). Files uploaded to the portal are stored for a maximum of 24 hours by Ludwig Maximilian University of Munich, where the portal is hosted, so that researchers can resume their work if interrupted midway.

Overall, the webinar offered a condensed but accessible presentation of the theoretical and practical aspects of working with interview data. The presentations made clear that the OH-Portal offers easy-to-use services that can deliver high-quality output with limited human intervention. Manual work remains crucial, but once researchers learn to use the OH-Portal, transcription can become much quicker and considerably less tedious.

If you would like to know more, we invite you to: