Baud R, Lovis C, Rassinoux A M, Michel P A, Scherrer J R
Division d'Informatique Médicale, University Hospital of Geneva, Switzerland.
Stud Health Technol Inform. 1998;52 Pt 1:581-5.
Automatic extraction of knowledge from large text corpora is an essential step toward linguistic knowledge acquisition in the medical domain. At present, large computer-readable medical lexicons are lacking, with a partial exception for the English language. Moreover, multilingual lexicons versatile enough for multi-language applications will remain out of reach as long as only manual extraction is considered. Computer-assisted linguistic knowledge acquisition is a must. A multilingual lexicon differs from a monolingual one in that it must bridge the words of different languages. A kind of interlingua has to be built in the form of concepts to which the language-specific entries are attached. In the present approach, the authors have developed an intelligent rule-based tool that focuses on a multilingual source of medical knowledge such as the International Classification of Diseases (ICD), which contains a vocabulary of some 20,000 words and has been translated into numerous languages.
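The interlingua idea described in the abstract, concepts to which language-specific entries are attached, can be illustrated with a small data structure. The following is only a minimal sketch under stated assumptions, not the authors' tool: the class names `Concept` and `MultilingualLexicon`, the methods `attach` and `translate`, the concept identifier `C_KIDNEY`, and the sample words are all hypothetical and do not come from the paper or from the ICD.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent node; words of each language attach to it."""
    concept_id: str  # hypothetical identifier, not an ICD code
    # Maps a language tag (e.g. "en") to the set of words attached in that language.
    entries: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))


class MultilingualLexicon:
    """Words in different languages are bridged only through shared concepts."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}
        # Reverse index: (language, lowercased word) -> ids of attached concepts.
        self._index: dict[tuple[str, str], set[str]] = defaultdict(set)

    def attach(self, concept_id: str, lang: str, word: str) -> None:
        """Attach a language-specific entry to a concept, creating it if needed."""
        concept = self._concepts.setdefault(concept_id, Concept(concept_id))
        concept.entries[lang].add(word)
        self._index[(lang, word.lower())].add(concept_id)

    def translate(self, word: str, source: str, target: str) -> set[str]:
        """Return target-language words sharing at least one concept with `word`."""
        result: set[str] = set()
        for concept_id in self._index.get((source, word.lower()), ()):
            result |= self._concepts[concept_id].entries.get(target, set())
        return result


# Illustrative entries only (assumed for the example, not taken from the ICD).
lexicon = MultilingualLexicon()
lexicon.attach("C_KIDNEY", "en", "kidney")
lexicon.attach("C_KIDNEY", "fr", "rein")
lexicon.attach("C_KIDNEY", "de", "Niere")

french = lexicon.translate("kidney", source="en", target="fr")
```

In this sketch no word is linked directly to a word in another language. Adding a further language therefore means attaching its entries to existing concepts rather than maintaining a separate bilingual table for every language pair.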