Department of Computer Science, Sheffield University, Sheffield, UK.
J Am Med Inform Assoc. 2012 Mar-Apr;19(2):235-40. doi: 10.1136/amiajnl-2011-000415. Epub 2011 Sep 7.
Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears.
The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context.
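As a rough illustration of how topic key terms might be applied alone or alongside local context, the sketch below scores each candidate sense by simple term overlap. It is a minimal sketch under stated assumptions, not the authors' system: the function names, the toy sense descriptions, and the overlap score are all hypothetical, and a real knowledge-based system would draw its sense descriptions from the Unified Medical Language System Metathesaurus.

```python
from collections import Counter

def overlap_score(sense_terms, context_terms):
    """Count context tokens that also occur in the sense description."""
    sense = set(sense_terms)
    counts = Counter(context_terms)
    return sum(n for term, n in counts.items() if term in sense)

def disambiguate(sense_descriptions, local_context, topic_key_terms=()):
    """Pick the sense that best overlaps local context plus topic key terms."""
    combined = list(local_context) + list(topic_key_terms)
    scores = {
        sense: overlap_score(terms, combined)
        for sense, terms in sense_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy data (assumed for illustration only): two senses of the term "cold".
SENSES = {
    "common_cold": ["virus", "infection", "respiratory", "nasal"],
    "cold_temperature": ["temperature", "low", "exposure", "hypothermia"],
}
LOCAL = ["patients", "exposure", "reported", "symptoms"]
TOPIC = ["virus", "respiratory", "infection"]  # hypothetical topic key terms

local_only, _ = disambiguate(SENSES, LOCAL)
with_topic, _ = disambiguate(SENSES, LOCAL, TOPIC)
print(local_only, with_topic)
```

In this toy case the local context alone favours the temperature sense, while adding the topic key terms shifts the choice to the infection sense, which is the kind of effect the combination is intended to capture.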
A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset.
The authors report a significant improvement when these key terms are combined with local context, showing that domain information improves on the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were achieved with key terms derived through relevance feedback and weighted by inverse document frequency.
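The abstract does not spell out the relevance feedback procedure, so the following is only a simplified stand-in for the general idea: documents indexed with a given Medical Subject Heading term are treated as the relevant set, and candidate key terms are ranked by their frequency in that set multiplied by inverse document frequency over the whole collection. The function names, the toy corpus, and the choice of ranking formula are assumptions.

```python
import math
from collections import Counter

def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is its document frequency."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def top_key_terms(relevant_docs, idf, k=3):
    """Rank terms in the relevant set by term frequency times IDF."""
    term_freq = Counter(term for doc in relevant_docs for term in doc)
    weighted = {t: tf * idf.get(t, 0.0) for t, tf in term_freq.items()}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weighted, key=lambda t: (-weighted[t], t))[:k]

# Toy collection (assumed): each document is a list of tokens.
CORPUS = [
    ["virus", "infection", "patient"],
    ["virus", "respiratory", "patient"],
    ["temperature", "exposure", "patient"],
    ["surgery", "patient", "outcome"],
    ["diet", "patient", "outcome"],
]
# Hypothetical relevant set: documents indexed with one MeSH heading.
RELEVANT = CORPUS[:2]

IDF = inverse_document_frequency(CORPUS)
KEY_TERMS = top_key_terms(RELEVANT, IDF)
print(KEY_TERMS)
```

Because "patient" occurs in every document of the toy collection, its inverse document frequency is zero and it drops out of the ranking, which shows why such weighting favours topic-specific terms over common ones.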