Mary Vincent, Pouliquen Bruno, Le Duff Franck, Darmoni Stefan J, Segui Alain, Le Beux Pierre
Laboratoire d'informatique médicale, Faculté de Médecine, Rennes, France.
Stud Health Technol Inform. 2002;90:388-92.
French pharmaceutical theses are rarely cited. Although the main obstacles are language and access barriers, inadequate indexing may also be to blame: manually extracted key-words do not necessarily come from a structured thesaurus. In this work, the manual indexing method is compared to an automated one, "Nomindex", which is based on the UMLS. The automated method is improved by the addition of a relevance scoring system. The first step consists of downloading, adapting and indexing theses in electronic format. The results are then analyzed and sorted by relevance, and the two methods are compared using classic statistical indices (noise, silence and relevance). It was assumed that the manually obtained key-words were always relevant. The silence of manual indexing is nevertheless high: Nomindex proposes seven new key-words. The results of Nomindex itself are mixed (10% silence, but 50% noise). These results are promising for a first experiment on pharmaceutical documents carried out without any lexicon improvement. Although the indexing is currently insufficient for real-life use, it could easily be improved by specific updates of the lexicon.
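The abstract does not define the noise and silence indices it reports. A minimal sketch, assuming the conventional information-retrieval definitions (noise is the share of proposed key-words that are not relevant, silence is the share of relevant key-words that were not proposed), might look like the following. The function name and the example key-words are hypothetical and are not data from the study.

```python
def noise_and_silence(proposed, relevant):
    """Return (noise, silence) for one indexed document.

    Assumed definitions (not stated in the abstract):
      noise   = proposed key-words that are not relevant / all proposed key-words
      silence = relevant key-words that were not proposed / all relevant key-words
    """
    proposed, relevant = set(proposed), set(relevant)
    if not proposed or not relevant:
        raise ValueError("both key-word sets must be non-empty")
    noise = len(proposed - relevant) / len(proposed)
    silence = len(relevant - proposed) / len(relevant)
    return noise, silence


# Hypothetical example: key-words judged relevant vs. key-words an indexer proposed.
relevant_terms = {"aspirin", "ibuprofen", "fever", "pain"}
proposed_terms = {"aspirin", "fever", "pain", "tablet", "France"}
noise, silence = noise_and_silence(proposed_terms, relevant_terms)
print(f"noise = {noise:.0%}, silence = {silence:.0%}")
```

Under these definitions, noise is the complement of precision and silence is the complement of recall, so an indexer can have low silence and high noise at the same time, as reported for Nomindex.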