Navigli Roberto, Velardi Paola
Dipartimento di Informatica, Università di Roma La Sapienza, via Salaria 113, 00198 Roma, Italy.
IEEE Trans Pattern Anal Mach Intell. 2005 Jul;27(7):1075-86. doi: 10.1109/TPAMI.2005.149.
Word Sense Disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many relevant Web-based applications, such as Web information retrieval, improved access to Web services, and information extraction. Early approaches to WSD, based on knowledge representation techniques, have been replaced in the past few years by more robust machine learning and statistical techniques. The results of recent comparative evaluations of WSD systems, however, show that these methods have inherent limitations. On the other hand, the increasing availability of large-scale, rich lexical knowledge resources seems to provide new challenges to knowledge-based approaches. In this paper, we present a method, called structural semantic interconnections (SSI), which creates structural specifications of the possible senses for each word in a context and selects the best hypothesis according to a grammar G that describes relations between sense specifications. Sense specifications are created from several available lexical resources that we integrated in part manually and in part with the help of automatic procedures. The SSI algorithm has been applied to different semantic disambiguation problems, such as automatic ontology population, disambiguation of sentences in generic texts, and disambiguation of words in glossary definitions. Evaluation experiments have been performed on specific knowledge domains (e.g., tourism, computer networks, enterprise interoperability), as well as on standard disambiguation test sets.
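The idea of selecting senses by how well they interconnect can be illustrated with a minimal sketch. This is not the SSI algorithm itself: the abstract does not specify the grammar G or the form of the structural specifications, so the sketch assumes a flat set of related concepts per sense and scores candidates by simple set overlap. The inventory `SENSES`, the sense labels, and the function `disambiguate` are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy inventory: word -> {sense id -> related concepts}.
# A flat stand-in for structural sense specifications (an assumption).
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bus": {
        "bus#vehicle": {"vehicle", "transport", "driver", "road"},
        "bus#computer": {"computer", "hardware", "signal", "wire"},
    },
    "driver": {
        "driver#person": {"person", "vehicle", "road", "transport"},
        "driver#software": {"software", "computer", "hardware", "device"},
    },
    "passenger": {
        "passenger#person": {"person", "vehicle", "transport", "traveler"},
    },
}


def disambiguate(words: List[str]) -> Dict[str, str]:
    """Greedily pick, for each word, the sense that shares the most
    concepts with the senses already chosen (overlap replaces grammar G)."""
    chosen: Dict[str, str] = {}
    context: Set[str] = set()
    pending: List[str] = []

    # Seed the context with words that have exactly one sense.
    for word in words:
        senses = SENSES[word]
        if len(senses) == 1:
            sense, related = next(iter(senses.items()))
            chosen[word] = sense
            context |= related
        else:
            pending.append(word)

    # Repeatedly commit the best-connected remaining (word, sense) pair.
    while pending:
        best = None  # (score, word, sense)
        for word in pending:
            for sense, related in SENSES[word].items():
                score = len(related & context)
                if best is None or score > best[0]:
                    best = (score, word, sense)
        _, word, sense = best
        chosen[word] = sense
        context |= SENSES[word][sense]
        pending.remove(word)

    return chosen


result = disambiguate(["bus", "driver", "passenger"])
print(result)
```

In this toy context the single-sense word "passenger" anchors the interpretation, so the vehicle-related senses of "driver" and "bus" are selected. A closer approximation of the paper's method would replace the overlap score with paths over typed semantic relations.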