Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK.
J Biomed Inform. 2010 Oct;43(5):762-73. doi: 10.1016/j.jbi.2010.06.001. Epub 2010 Jun 10.
Researchers have access to a vast amount of information stored in textual documents, and there is a pressing need for automated methods that enable and improve access to this resource. Lexical ambiguity, the phenomenon in which a word or phrase has more than one possible meaning, presents a significant obstacle to automated text processing. Word Sense Disambiguation (WSD) is a technology that resolves these ambiguities automatically and is an important stage in text understanding. The most accurate approaches to WSD rely on manually labeled examples, but these are usually not available and are prohibitively expensive to create. This paper offers a solution to that problem by using information in the UMLS Metathesaurus to automatically generate labeled examples. Two approaches are presented. The first is an extension of existing work (Liu et al., 2002 [1]) and the second is a novel approach that exploits information in the UMLS that has not previously been used for this purpose. The automatically generated examples are evaluated by comparing them against the manually labeled ones in the NLM-WSD data set and are found to outperform the baseline. The examples generated using the novel approach produce an improvement in WSD performance when combined with manually labeled examples.