Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany.
University Hospital Frankfurt, Goethe University, Executive Department for medical IT-Systems and digitalization, Frankfurt, Germany.
N Biotechnol. 2023 Nov 25;77:120-129. doi: 10.1016/j.nbt.2023.08.004. Epub 2023 Aug 29.
Standardised medical terminologies are used to ensure accurate and consistent communication of information and to facilitate data exchange. Currently, many terminologies are available only in English, which hinders international research and automated processing of medical data. Natural language processing (NLP) and machine translation (MT) methods can be used to translate these terms automatically. This scoping review examines the research on automated translation of standardised medical terminology. A search was performed in PubMed and Web of Science, and the results were screened for eligibility first by title and abstract and then by full text. In addition to bibliographic data, the following data items were considered: 'terminology considered', 'terms considered', 'source language', 'target language', 'translation type', 'NLP technique', 'NLP system', 'machine translation system', 'data source' and 'translation quality'. The results showed that the most frequently translated terminology is SNOMED CT (39.1%), followed by MeSH (13%), ICD (13%) and UMLS (8.7%). The most common source language is English (55.9%), and the most common target language is German (41.2%). Translation methods are often based on statistical machine translation (SMT) (41.7%) and, more recently, neural machine translation (NMT) (30.6%), sometimes in combination with other MT methods. Commercial translators such as Google Translate (36.4%) are frequently used for translation, and automatic metrics such as BLEU (22.2%) for subsequent validation.
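To illustrate what automatic validation with a metric such as BLEU involves, the following is a minimal sketch of a sentence-level BLEU score in plain Python. It is not the procedure used by any of the reviewed studies. The function names `ngrams` and `bleu`, the whitespace tokenisation, the single reference translation, the default `max_n=2` (chosen here because terminology entries are often only a few words long) and the German example terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU against one reference, without smoothing.

    Assumptions (illustrative only): lowercase whitespace tokenisation,
    uniform n-gram weights, and a score of 0.0 whenever any n-gram order
    has no overlap with the reference.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        if total == 0:
            # Candidate is shorter than n tokens.
            return 0.0
        # Modified precision: clip each n-gram count by its reference count.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalise candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "akuter Myokardinfarkt"  # hypothetical reference translation
    for candidate in ("akuter Myokardinfarkt", "akuter Infarkt", "Herzinfarkt"):
        print(candidate, bleu(candidate, reference, max_n=1))
```

Because this sketch scores exact token overlap only, a valid synonym such as the hypothetical "Herzinfarkt" receives a score of zero against the reference, which is one reason automatic metrics tend to be complemented by expert review when terminology translations are validated.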