IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1264-1277. doi: 10.1109/TPAMI.2020.3021361. Epub 2022 Feb 3.
Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TUs), each made up of a pair of source- and target-language segments. To translate a new source segment s, these tools search the TM and retrieve the TUs (s′,t) whose source segments s′ are most similar to s. The translator then chooses a TU and edits its target segment t to turn it into an adequate translation of s. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that first uses machine translation to generate, given s and (s′,t), a set of candidate fuzzy-match repaired segments, and then chooses the best one by estimating their quality. An evaluation on three different language pairs shows that the selected candidate is a good approximation to the best (oracle) candidate produced, and is closer to reference translations than both machine-translated segments and unrepaired fuzzy matches (t). In addition, a single quality estimation model trained on a mix of data from all the languages performs well on any of the languages used.
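The retrieve-then-select workflow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The fuzzy-match score is assumed to be 1 minus the word-level edit distance divided by the length of the longer segment, a common convention in TM tools. The MT-based generation of repaired candidates is not reproduced; candidates are simply passed in. The names `TranslationUnit`, `retrieve`, `select_best` and the quality-estimation callable `qe` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class TranslationUnit:
    """A TM entry (s', t): a source segment paired with its target segment."""
    source: str
    target: str


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wa
                curr[j - 1] + 1,           # insert wb
                prev[j - 1] + (wa != wb),  # substitute (free if words match)
            )
        prev = curr
    return prev[len(b)]


def fuzzy_match_score(s: str, s_prime: str) -> float:
    """Assumed FMS: 1 - word edit distance / length of the longer segment."""
    a, b = s.split(), s_prime.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


def retrieve(tm: List[TranslationUnit], s: str,
             threshold: float = 0.6) -> List[Tuple[float, TranslationUnit]]:
    """Return TUs whose source s' scores >= threshold against s, best first."""
    scored = [(fuzzy_match_score(s, tu.source), tu) for tu in tm]
    hits = [(score, tu) for score, tu in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


def select_best(s: str, candidates: List[str],
                qe: Callable[[str, str], float]) -> str:
    """Pick the repaired candidate with the highest estimated quality.

    `qe(source, candidate)` is a hypothetical stand-in for a trained QE model.
    """
    return max(candidates, key=lambda c: qe(s, c))


# Usage: retrieve fuzzy matches for a new source segment.
tm = [
    TranslationUnit("the red car is fast", "el coche rojo es veloz"),
    TranslationUnit("the weather is nice today", "hoy hace buen tiempo"),
]
for score, tu in retrieve(tm, "the blue car is fast"):
    print(f"{score:.2f}  {tu.source!r} -> {tu.target!r}")
# prints: 0.80  'the red car is fast' -> 'el coche rojo es veloz'
```

The default threshold of 0.6 is an arbitrary illustrative value. In the method summarized above, the candidates handed to `select_best` would come from MT-based repair of t, and `qe` would be the learned quality-estimation model.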