Yang Junggi, Lee Youngho
Department of IT Convergence Engineering, Gachon University, Seongnam, Korea.
IT Department, Gachon University, Seongnam, Korea.
Healthc Inform Res. 2014 Oct;20(4):272-9. doi: 10.4258/hir.2014.20.4.272. Epub 2014 Oct 31.
Anaphora recognition is the process of identifying which previously mentioned noun a pronoun in a later sentence refers to. It is therefore an essential element of a dialogue agent system. In the current study, the merits of rule-based, machine learning-based, and semantic-based anaphora recognition systems were combined to design and implement a new hybrid anaphora recognition system with optimal performance.
Anaphora recognition rules were encoded from the internal traits of referring expressions and their adjacent contexts to realize a rule-based system that also served as a baseline. A semantic database of predicate instances from sentences containing referring expressions was constructed to identify semantic correlations between the referent candidates (to which semantic tags were attached) and the semantic information of the predicates. This approach improves anaphora recognition by reducing the number of referent candidates. Additionally, to realize a machine learning-based system, an anaphora recognition model was trained on data annotated with referring expressions and their referents. The three methods were then combined into a single hybrid anaphora recognition system.
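The combination of the three methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` fields, the `SEMANTIC_DB` contents, the scoring rules (number agreement plus recency), the `model_score` stand-in for a trained model, and the `weight` parameter are all hypothetical, chosen only to show how semantic filtering might reduce the candidate set before rule-based and learned scores are combined.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    text: str
    semantic_tag: str  # hypothetical tag set, e.g. "PERSON", "OBJECT"
    number: str        # "sg" or "pl"
    distance: int      # sentences between the candidate and the pronoun


# Hypothetical semantic database: predicate -> semantic tags it accepts.
SEMANTIC_DB = {"open": {"PERSON"}, "eat": {"PERSON", "ANIMAL"}}


def semantic_filter(candidates: Sequence[Candidate], predicate: str) -> list:
    """Drop candidates whose semantic tag does not fit the predicate."""
    allowed = SEMANTIC_DB.get(predicate)
    if allowed is None:
        return list(candidates)  # unknown predicate: no filtering
    kept = [c for c in candidates if c.semantic_tag in allowed]
    return kept or list(candidates)  # never filter down to nothing


def rule_score(c: Candidate, pronoun_number: str) -> float:
    """Toy rule-based score: number agreement plus a recency bonus."""
    agreement = 1.0 if c.number == pronoun_number else 0.0
    recency = 1.0 / (1 + c.distance)
    return agreement + recency


def resolve(
    pronoun_number: str,
    predicate: str,
    candidates: Sequence[Candidate],
    model_score: Callable[[Candidate], float] = lambda c: 0.0,  # stand-in for a trained model
    weight: float = 0.5,  # assumed mixing weight between rules and model
) -> Optional[Candidate]:
    """Hybrid resolution: semantic filtering, then a weighted score."""
    pool = semantic_filter(candidates, predicate)
    if not pool:
        return None
    return max(
        pool,
        key=lambda c: weight * rule_score(c, pronoun_number)
        + (1 - weight) * model_score(c),
    )


candidates = [
    Candidate("the doctor", "PERSON", "sg", 1),
    Candidate("the door", "OBJECT", "sg", 0),
]
# "He opened ...": the predicate "open" only accepts PERSON in this toy database.
print(resolve("sg", "open", candidates).text)
```

In this toy example the rule score alone would prefer the more recent candidate, "the door"; the semantic filter removes it because the predicate does not accept its tag, which mirrors the abstract's point that semantic information reduces the number of referent candidates.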
The precision of the rule-based system was 54.9%, whereas the precision of the hybrid system was 63.7%, indicating that the hybrid approach was the most effective of the methods evaluated.
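To put the two reported figures side by side, the short Python sketch below computes the absolute and relative gain. The `precision` helper assumes the conventional definition (correct resolutions divided by resolutions attempted); the abstract does not state how precision was computed, so that definition is an assumption.

```python
def precision(correct: int, attempted: int) -> float:
    """Conventional precision: correct resolutions / resolutions attempted (assumed definition)."""
    return correct / attempted if attempted else 0.0


rule_based = 0.549  # reported precision of the rule-based baseline
hybrid = 0.637      # reported precision of the hybrid system

absolute_gain = hybrid - rule_based         # in precision units (0.088 = 8.8 points)
relative_gain = absolute_gain / rule_based  # gain relative to the baseline

print(f"absolute gain: {absolute_gain * 100:.1f} percentage points")
print(f"relative gain: {relative_gain * 100:.1f}%")
```

The hybrid system is thus 8.8 percentage points above the baseline, a relative improvement of roughly 16%.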
The hybrid method, which combines rule-based and machine learning-based methods, yields a system with improved performance compared with the pre-existing individual methods.