Schulz Stefan, Bernhardt-Melischnig Johannes, Kreuzthaler Markus, Daumke Philipp, Boeker Martin
Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
Stud Health Technol Inform. 2013;192:581-4.
In the context of past and current SNOMED CT translation projects, we compare three kinds of SNOMED CT translations from English into German, produced by (t1) professional medical translators, (t2) a free Web-based machine translation service, and (t3) medical students.
500 SNOMED CT fully specified names were randomly selected from the (English) International release. From these, the German translations t1, t2, and t3 were generated. A German and an Austrian physician rated each translation for linguistic correctness and content fidelity.
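The abstract does not describe how the random sample was drawn; the following is a minimal sketch, not the authors' code, assuming a standard SNOMED CT RF2 description file as input (file name and column layout are assumptions).

    # Sketch: sample 500 active fully specified names (FSNs) from an RF2
    # description file; typeId 900000000000003001 marks an FSN in RF2.
    import csv
    import random

    def sample_fsns(description_file, n=500, seed=42):
        """Return n randomly chosen active FSNs as (conceptId, term) pairs."""
        fsns = []
        with open(description_file, encoding="utf-8") as f:
            reader = csv.DictReader(f, delimiter="\t")
            for row in reader:
                if row["active"] == "1" and row["typeId"] == "900000000000003001":
                    fsns.append((row["conceptId"], row["term"]))
        random.seed(seed)
        return random.sample(fsns, n)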
Inter-rater reliability (kappa) was 0.4 for linguistic correctness and 0.23 for content fidelity. Average ratings of linguistic correctness did not differ significantly between the two human translation scenarios. Content fidelity was rated slightly better for the student translators than for the professional translators. Comparing machine with human translation, the human translations were rated about 0.5 scale units higher for linguistic correctness and about 0.25 scale units higher for content fidelity.
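For readers unfamiliar with the agreement statistic, the sketch below shows how a kappa value such as those reported here can be computed from two raters' scores; the rating vectors are invented for illustration, not taken from the study.

    # Sketch: Cohen's kappa for two raters' ordinal scores (illustrative data).
    from sklearn.metrics import cohen_kappa_score

    rater_de = [1, 2, 2, 3, 1, 2]   # hypothetical scores, German physician
    rater_at = [1, 2, 3, 3, 1, 1]   # hypothetical scores, Austrian physician

    kappa = cohen_kappa_score(rater_de, rater_at)
    print(f"Cohen's kappa: {kappa:.2f}")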
The results demonstrate that low-cost translation of medical terms can produce surprisingly good results. Although we would not recommend low-cost translation for producing standardized preferred terms, the approach can be useful for creating additional language-specific entry terms, which would serve several important use cases. We also recommend testing this method to bootstrap a crowdsourcing process in which term translations are gathered, improved, maintained, and rated by the user community.