Miñarro-Giménez Jose Antonio, Hellrich Johannes, Schulz Stefan
Institute of Medical Informatics, Statistics, and Documentation, Medical University of Graz, Austria.
Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
Stud Health Technol Inform. 2015;210:597-601.
Translating huge medical terminologies like SNOMED CT is costly and time-consuming. We present a methodology that acquires substring substitution rules for single words, based on the known similarity between medical words and their translations due to their common Latin/Greek origin. Character translation rules are automatically acquired from pairs of English words and their automated translations into German. Using a training set of single words extracted from SNOMED CT as input, we obtained a list of 268 translation rules. In the evaluation, these rules improved the translation of 60% of words compared with Google Translate, and 55% of the translated words exactly matched the correct translations. On a subset of words where machine translation had failed, our method improved the translation in 56% of cases, with 27% exactly matching the gold standard.
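The abstract describes rule acquisition only at a high level, so the following Python (3.9+) sketch illustrates the general idea under stated assumptions rather than reproducing the authors' procedure. It aligns each English/German word pair with the standard library's `difflib.SequenceMatcher` (a stand-in alignment, not necessarily the one used in the study), turns every non-matching span plus one shared character of right-hand context into a candidate substitution rule, keeps rules seen at least `min_count` times, and applies them greedily to new words, trying longer source substrings first. The word-end marker `$`, the functions `candidate_rules`, `acquire_rules` and `apply_rules`, and the toy training pairs are all hypothetical; they are not the 268 rules or the SNOMED CT data from the paper.

```python
from collections import Counter
from difflib import SequenceMatcher

END = "$"  # hypothetical word-end marker so rules can be tied to the end of a word

# Hypothetical toy training pairs (English word, German translation). They stand in
# for the English SNOMED CT words and their automated German translations.
TRAINING_PAIRS = [
    ("cardiology", "Kardiologie"),
    ("neurology", "Neurologie"),
    ("dermatology", "Dermatologie"),
    ("pathology", "Pathologie"),
    ("carcinoma", "Karzinom"),
    ("cardiomyopathy", "Kardiomyopathie"),
    ("hepatitis", "Hepatitis"),
]


def candidate_rules(english: str, german: str) -> list[tuple[str, str]]:
    """Align one word pair and return (English substring, German substring) rules."""
    src, tgt = english.lower() + END, german.lower() + END
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i2 < len(src) and j2 < len(tgt):
            # The next opcode is an "equal" block, so src[i2] == tgt[j2];
            # include that shared character as right-hand context.
            rule = (src[i1:i2 + 1], tgt[j1:j2 + 1])
        else:
            rule = (src[i1:i2], tgt[j1:j2])
        if rule[0]:  # a rule with an empty English side cannot be applied
            rules.append(rule)
    return rules


def acquire_rules(pairs, min_count: int = 2) -> dict[str, str]:
    """Count candidate rules over all pairs; keep the most frequent target per source."""
    counts = Counter()
    for english, german in pairs:
        counts.update(candidate_rules(english, german))
    rules: dict[str, str] = {}
    for (source, target), n in counts.most_common():  # most frequent first
        if n >= min_count and source not in rules:
            rules[source] = target
    return rules


def apply_rules(word: str, rules: dict[str, str]) -> str:
    """Rewrite a word left to right, trying longer source substrings first."""
    ordered = sorted(rules.items(), key=lambda kv: (-len(kv[0]), kv[0]))
    text = word.lower() + END
    out: list[str] = []
    i = 0
    while i < len(text):
        for source, target in ordered:
            if text.startswith(source, i):
                out.append(target)
                i += len(source)
                break
        else:  # no rule matched at this position: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out).removesuffix(END)


if __name__ == "__main__":
    rules = acquire_rules(TRAINING_PAIRS)
    print(rules)                               # {'y$': 'ie$', 'ca': 'ka'}
    print(apply_rules("radiology", rules))     # radiologie
    print(apply_rules("cardiography", rules))  # kardiographie
```

On this toy data only two rules survive the frequency threshold, `y$` → `ie$` (word-final "-y" becomes "-ie") and `ca` → `ka`. Attaching one character of context, or the word end, keeps a rule from rewriting every `y` or `c` in a word; a realistic rule set would still need richer context and conflict handling, and German noun capitalization is ignored here.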