Han Huihui, Wang Jian, Wang Xiaowen
National Computer Integrated Manufacturing System Research Center, College of Electronics and Information Engineering, Tongji University, Shanghai, China.
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China.
Front Neurorobot. 2022 Jul 4;16:914705. doi: 10.3389/fnbot.2022.914705. eCollection 2022.
Extracting entity relations in the form of triples from unstructured text is a key step in constructing self-learning knowledge graphs. Two main approaches have been proposed to extract relation triples: the pipeline method and the joint learning method. However, these models do not handle the overlapping-relation problem well. To overcome this challenge, we present a relation-oriented model with global context information for joint entity relation extraction, named ROMGCJE, which follows an encoder-decoder architecture. The encoder layer builds long-term dependencies among words and captures a rich global context representation. In addition, a relation-aware attention mechanism exploits relation information to guide entity detection. The decoder consists of a multi-relation classifier for the relation classification task and an improved long short-term memory network for the entity recognition task. Finally, a minimum risk training mechanism is introduced to jointly train the model and generate the final relation triples. Comprehensive experiments on two public datasets, NYT and WebNLG, show that our model effectively extracts overlapping relation triples and outperforms current state-of-the-art methods.
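The abstract does not include an implementation, so the following PyTorch sketch is only a rough illustration of the kind of architecture it describes: a context encoder, a relation-aware attention step, a multi-label relation classifier, and an LSTM-based entity tagger trained jointly. All layer choices, dimensions, class names, and the tagging scheme are assumptions for illustration, not the authors' ROMGCJE model, and the minimum risk training objective is not shown.

```python
# Illustrative sketch only (assumed design, not the paper's code): encoder,
# relation-aware attention, multi-relation classifier, LSTM entity tagger.
import torch
import torch.nn as nn


class JointRelationEntityModel(nn.Module):
    def __init__(self, vocab_size, num_relations, num_entity_tags,
                 emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder: builds global context representations of the sentence.
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                               bidirectional=True)
        # Learned relation embeddings used by the relation-aware attention.
        self.rel_embed = nn.Embedding(num_relations, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                          batch_first=True)
        # Multi-label relation classifier over the sentence representation.
        self.rel_classifier = nn.Linear(hidden_dim, num_relations)
        # Entity tagger (sequence labeling) over relation-informed states.
        self.entity_decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.entity_tagger = nn.Linear(hidden_dim, num_entity_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)                      # (B, T, emb_dim)
        h, _ = self.encoder(x)                         # (B, T, hidden_dim)
        # Relation-aware attention: relation embeddings attend over token
        # states so that relation information can guide entity detection.
        rel_q = self.rel_embed.weight.unsqueeze(0).expand(h.size(0), -1, -1)
        rel_ctx, _ = self.attn(rel_q, h, h)            # (B, R, hidden_dim)
        rel_logits = self.rel_classifier(rel_ctx.mean(dim=1))   # (B, R)
        # Entity recognition over tokens enriched with relation context.
        tok_ctx, _ = self.attn(h, rel_ctx, rel_ctx)    # (B, T, hidden_dim)
        dec_out, _ = self.entity_decoder(tok_ctx)
        tag_logits = self.entity_tagger(dec_out)       # (B, T, num_tags)
        return rel_logits, tag_logits


if __name__ == "__main__":
    model = JointRelationEntityModel(vocab_size=10000, num_relations=24,
                                     num_entity_tags=9)
    tokens = torch.randint(0, 10000, (2, 20))          # toy batch
    rel_logits, tag_logits = model(tokens)
    print(rel_logits.shape, tag_logits.shape)          # (2, 24) (2, 20, 9)
```

In such a joint setup, the relation logits and the per-token entity tags would typically be trained with a combined loss so that relation classification and entity recognition inform each other, which is the role the abstract assigns to the minimum risk training mechanism.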