Deléger Louise, Merkel Magnus, Zweigenbaum Pierre
INSERM, UMR_S 872, Eq. 20, Centre des cordeliers, Paris F-75006, France.
J Biomed Inform. 2009 Aug;42(4):692-701. doi: 10.1016/j.jbi.2009.03.002. Epub 2009 Mar 9.
Developing international multilingual terminologies is a time-consuming process. We present a methodology that aims to ease this process by automatically acquiring new translations of medical terms through word alignment in parallel text corpora, and we test it on English and French. After collecting a parallel English-French corpus, we detected French translations of English terms from three terminologies: MeSH, SNOMED CT and the MedlinePlus Health Topics. The proportion of linguistically correct new translations was 74.8% for MeSH, 77.8% for SNOMED CT and 76.3% for the MedlinePlus Health Topics. A sample of the MeSH translations was submitted to expert review, and 61.5% were deemed desirable additions to the French MeSH. In conclusion, we obtained good-quality new translations, which underlines the suitability of alignment in text corpora as an aid to translating terminologies. Our method may be applied to other European languages and provides a methodological framework that may be used with different processing tools.
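The core idea, reading candidate translations of a term off word-alignment links in a parallel corpus, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the alignment tool or the data format, so the word-alignment links are assumed to come from some external aligner, and the function names (`find_term`, `candidate_translations`) and the toy corpus are hypothetical. Later steps of the method, such as checking candidates against the existing terminology and submitting them to expert review, are not shown.

```python
from collections import Counter


def find_term(tokens, term_tokens):
    """Yield each start index where term_tokens occurs in tokens."""
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            yield i


def candidate_translations(corpus, term):
    """Count the French spans aligned to occurrences of an English term.

    corpus: iterable of (en_tokens, fr_tokens, links) triples, where links
    is a set of (en_index, fr_index) pairs. The links are assumed to be
    produced by an external word aligner (hypothetical input format).
    """
    term_tokens = term.lower().split()
    counts = Counter()
    for en, fr, links in corpus:
        en_lower = [t.lower() for t in en]
        for start in find_term(en_lower, term_tokens):
            span = range(start, start + len(term_tokens))
            fr_idx = sorted({j for i, j in links if i in span})
            if not fr_idx:
                continue  # term occurrence has no aligned French word
            # Keep the contiguous French span covering all aligned words.
            candidate = " ".join(fr[fr_idx[0]:fr_idx[-1] + 1]).lower()
            counts[candidate] += 1
    return counts


# Toy aligned sentence pairs, invented for illustration only.
corpus = [
    (["Heart", "failure", "is", "common"],
     ["L'", "insuffisance", "cardiaque", "est", "fréquente"],
     {(0, 2), (1, 1), (2, 3), (3, 4)}),
    (["Chronic", "heart", "failure"],
     ["Insuffisance", "cardiaque", "chronique"],
     {(0, 2), (1, 1), (2, 0)}),
]

result = candidate_translations(corpus, "heart failure")
print(result.most_common(1))
```

Counting candidates across many sentence pairs is one simple way to rank them; a candidate that recurs often is more likely to be a genuine translation than one produced by a single noisy alignment.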