Nyström Mikael, Merkel Magnus, Ahrenberg Lars, Zweigenbaum Pierre, Petersson Håkan, Ahlfeldt Hans
Department of Biomedical Engineering, Linköpings universitet, SE-58185 Linköping, Sweden.
BMC Med Inform Decis Mak. 2006 Oct 12;6:35. doi: 10.1186/1472-6947-6-35.
This paper reports on a parallel collection of rubrics from the medical terminology systems ICD-10, ICF, MeSH, NCSP and KSH97-P and its use for semi-automatic creation of an English-Swedish dictionary of medical terminology. The methods presented are relevant for many West European language pairs other than English-Swedish.
The medical terminology systems were collected in electronic format in both English and Swedish and the rubrics were extracted in parallel language pairs. Initially, interactive word alignment was used to create training data from a sample. Then the training data were utilised in automatic word alignment in order to generate candidate term pairs. The last step was manual verification of the term pair candidates.
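The candidate-generation step can be illustrated with a minimal sketch. It is not the alignment tooling used in this work: it substitutes a plain co-occurrence score (the Dice coefficient) for trained word alignment, and the function name `candidate_pairs`, the `threshold` parameter and the four sample rubrics are assumptions made up for illustration, not entries taken from the terminology systems.

```python
from collections import Counter
from itertools import product

# Hypothetical parallel rubrics (English, Swedish); illustrative only.
SAMPLE = [
    ("acute bronchitis", "akut bronkit"),
    ("chronic bronchitis", "kronisk bronkit"),
    ("acute sinusitis", "akut sinuit"),
    ("chronic sinusitis", "kronisk sinuit"),
]

def candidate_pairs(rubric_pairs, threshold=0.8):
    """Return (english, swedish, dice) word pairs scoring at least `threshold`."""
    en_count, sv_count, joint = Counter(), Counter(), Counter()
    for en, sv in rubric_pairs:
        en_words = set(en.lower().split())
        sv_words = set(sv.lower().split())
        en_count.update(en_words)   # rubrics containing each English word
        sv_count.update(sv_words)   # rubrics containing each Swedish word
        joint.update(product(en_words, sv_words))  # co-occurrence per rubric pair
    scored = []
    for (e, s), n in joint.items():
        dice = 2 * n / (en_count[e] + sv_count[s])
        if dice >= threshold:
            scored.append((e, s, dice))
    # Best candidates first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda t: (-t[2], t[0], t[1]))

for en, sv, score in candidate_pairs(SAMPLE):
    print(f"{en}\t{sv}\t{score:.2f}")
```

In a workflow like the one described, a ranked list of this kind would only be the input to the final step: each candidate still has to be accepted or rejected by a human reviewer.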
A dictionary of 31,000 verified entries was created in less than three man weeks, considerably less time and effort than a manual approach would need, and without compromising quality. As a side effect of our work, we found 40 different translation problems in the terminology systems, which indicates the power of the method for finding inconsistencies in terminology translations. We also report on some factors that may make dictionary creation with similar tools even more expedient. Finally, the contribution is discussed in relation to other ongoing efforts to construct medical lexicons for non-English languages.
In three man weeks we were able to produce a medical English-Swedish dictionary consisting of 31,000 entries, and we also found hidden translation errors in the medical terminology systems used.