Submission date: 
19 August 2021

This report reviews potential mechanisms for integrating a Translation Memory (TM) solution into the Translation Management Tool (TMT), so that large international surveys can improve their translation processes and deliver faster, higher-quality translations. The work includes further research into integration with external computer-assisted translation (CAT) tools and the development of a stand-alone TM tool to which the TMT will be connected, followed by the evaluation and implementation of various TM matching algorithms and the sharing of TMT data with partners via the TM solution.

 

Task 4.3 of the SSHOC project applies computer-assisted translation (CAT) tools in the social sciences. Because several large international social science survey projects use the Translation Management Tool (TMT) to support their translation processes, an extended review was carried out of how various open-source CAT tools can be linked to the TMT. This report describes the possibilities for integrating open-source CAT tools into the TMT.

A function commonly used in CAT tools is Translation Memory (TM), which refers to a variety of methods for processing and storing translated text in the form of ‘segments’ and presenting them again later, when similar texts need to be translated. Segments are text strings that are sentences or sentence-like units, such as headers and labels. Segments are connected in source-translation pairs, where the source is the original text and the translation is the translated version of that text in a particular language. Storing many such pairs creates a repository of historical translation data that can be used to aid translation processes.
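The source-translation pairs described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `SegmentPair` and `TranslationMemory`, the use of a language code, and the exact-match lookup are hypothetical and are not taken from the TMT or from any CAT tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPair:
    """One source-translation pair (hypothetical structure)."""
    source: str        # original text of the segment
    translation: str   # translated version of the source
    target_lang: str   # language of the translation, e.g. "de"


class TranslationMemory:
    """Minimal in-memory repository of segment pairs (illustrative only)."""

    def __init__(self):
        # Keyed by (source text, target language) for exact-match retrieval.
        self._pairs = {}

    def add(self, pair):
        self._pairs[(pair.source, pair.target_lang)] = pair

    def lookup(self, source, target_lang):
        # Returns the stored pair, or None if this exact segment is unknown.
        return self._pairs.get((source, target_lang))


tm = TranslationMemory()
tm.add(SegmentPair("How old are you?", "Wie alt sind Sie?", "de"))
hit = tm.lookup("How old are you?", "de")
miss = tm.lookup("How old are you?", "fr")
```

Exact lookup covers only segments that repeat verbatim; retrieving translations for similar but not identical texts needs a similarity-based search on top of such a store.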

However, given the highly specialized set of translations and processes, integrating these external CAT tools and their underlying TMs into the TMT is not enough. Developing a new open-source TM solution tailored to the field, and integrating it into the TMT translation environment, seems a better approach. On this basis, the team has defined the following steps:

Evaluation of possible integration of external CAT tools in the TMT; the team is exploring further possibilities for integrating external open-source CAT tools into the TMT, using as a guideline the connection between the TMT and the LINDAT translation service and MyMemory that was developed under SSHOC deliverable 4.7, Code for data exchange between TMT and open-source CAT software.

Development of a Translation Memory (TM) solution; initial creation of a stand-alone TM solution that allows users to query it for translation suggestions based on input text and source and target languages, and that allows external CAT tools to query it for translation suggestions and add new data via an API.

Connecting the TMT to the new TM solution and (possibly) other CAT tools; extending the existing translation functionality within the TMT to work with the newly created TM solution via its API, and possibly with other CAT tools, with the aim of providing translation memories to translators.

Research into pair matching and search methods; the team has set up criteria to evaluate methods and algorithms for pair matching and search, and will implement and test several such methods, which will then be made available in the TM solution.

Sharing of TMT data with partners; using the developed TM solution and the TMT connection to export TMT data for sharing with survey partners.
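To make the idea of pair matching in the steps above more concrete, the following is a minimal sketch of one possible fuzzy-matching approach, using the character-based similarity ratio from Python's standard `difflib` module. It is an assumption chosen for illustration and not one of the algorithms the team evaluated; the function name `suggest`, the threshold of 0.7, and the sample segments are hypothetical.

```python
from difflib import SequenceMatcher


def suggest(query, memory, threshold=0.7):
    """Return (score, source, translation) tuples at or above threshold, best first.

    `memory` is a list of (source, translation) pairs for one language pair.
    """
    scored = []
    for source, translation in memory:
        # Similarity in [0, 1]; case is ignored for the comparison.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, translation))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical stored pairs (English source, German translation).
memory = [
    ("How satisfied are you with your life?",
     "Wie zufrieden sind Sie mit Ihrem Leben?"),
    ("What is your year of birth?",
     "In welchem Jahr sind Sie geboren?"),
]

suggestions = suggest("How satisfied are you with your job?", memory)
for score, source, translation in suggestions:
    print(f"{score:.2f}  {source}  ->  {translation}")
```

A translator would receive the stored translation of the closest source segment as a suggestion to adapt. Other similarity measures, such as word-level edit distance, could be substituted in the same loop when comparing methods against evaluation criteria.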

Publication type: 
Milestone