Deléger Louise, Namer Fiammetta, Zweigenbaum Pierre
INSERM U872, Eq. 20, 15 rue de l'Ecole de Médecine, Paris F-75006, France.
Int J Med Inform. 2009 Apr;78 Suppl 1:S48-55. doi: 10.1016/j.ijmedinf.2008.07.016. Epub 2008 Sep 17.
Medical language, like many technical languages, is rich in morphologically complex words, many of which have their roots in Greek and Latin; such words are called neoclassical compounds. Morphosemantic analysis can help generate definitions of these words. The structure of these compounds has also been observed to be similar across several European languages, which suggests that the same linguistic analysis could be applied, with minor modifications, to neoclassical compounds from different languages.
This paper reports on the adaptation of DériF, a morphosemantic analyzer designed for French, to the analysis of English medical neoclassical compounds. It presents the principles of this transposition and its current performance.
The analyzer was tested on a set of 1299 compounds extracted from the WHO-ART terminology. Of these, 859 could be decomposed and defined, 675 of them successfully.
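The counts above can be turned into rates with a few lines of arithmetic. This is only a sketch: the labels `coverage`, `precision`, and `overall` are our own names for these ratios and do not come from the abstract.

```python
# Counts as reported in the abstract.
total_compounds = 1299   # compounds extracted from WHO-ART
analyzed = 859           # compounds that could be decomposed and defined
successful = 675         # analyzed compounds whose analysis was successful

# Ratio names below are assumptions chosen for illustration.
coverage = analyzed / total_compounds    # share of the test set that was analyzed
precision = successful / analyzed        # share of analyses that were successful
overall = successful / total_compounds   # share of the test set analyzed successfully

print(f"coverage:  {coverage:.1%}")
print(f"precision: {precision:.1%}")
print(f"overall:   {overall:.1%}")
```

Under these definitions, roughly two thirds of the test set received an analysis, and a little over half of the test set was analyzed successfully.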
Complex linguistic analyses designed for French were successfully transposed to the analysis of English medical neoclassical compounds, which confirms our hypothesis of transferability. That the method could be applied to a Germanic language such as English suggests that performance would be at least as high for Romance languages such as Spanish. Finally, the resulting system can produce more complete analyses of English medical compounds than existing systems, including a hierarchical decomposition and a semantic gloss of each word.
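To give a rough idea of what decomposing and glossing a neoclassical compound involves, here is a toy sketch. It is not DériF's method: the mini-lexicon, the linking-vowel rule, and the functions `segment` and `gloss` are hypothetical and invented for illustration, and the gloss is a flat paraphrase rather than a hierarchical analysis.

```python
# Hypothetical mini-lexicon of combining forms (illustrative, not from DériF).
LEXICON = {
    "gastr": "stomach",
    "enter": "intestine",
    "hepat": "liver",
    "itis": "inflammation",
    "algia": "pain",
    "megaly": "enlargement",
}
LINKING_VOWELS = ("o",)  # assumed linking element between components


def segment(word):
    """Split a word into known combining forms, or return None if impossible."""
    if word == "":
        return []
    # Try longer forms first, backtracking if the remainder cannot be segmented.
    for form in sorted(LEXICON, key=len, reverse=True):
        if not word.startswith(form):
            continue
        rest = word[len(form):]
        candidates = [rest]
        if rest[:1] in LINKING_VOWELS:
            candidates.append(rest[1:])  # also try skipping the linking vowel
        for candidate in candidates:
            tail = segment(candidate)
            if tail is not None:
                return [form] + tail
    return None


def gloss(word):
    """Build a flat gloss, treating the last component as the head."""
    parts = segment(word)
    if not parts:
        return None
    meanings = [LEXICON[part] for part in parts]
    if len(meanings) == 1:
        return meanings[0]
    return f"{meanings[-1]} of {' and '.join(meanings[:-1])}"


for term in ("gastroenteritis", "hepatomegaly", "gastralgia"):
    print(term, segment(term), gloss(term))
```

The sketch treats the rightmost component as the head of the compound and the preceding components as its modifiers. Words containing a component missing from the lexicon are left unanalyzed (`segment` returns `None`).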