A Joint Extraction Model for Entity Relationships Based on Span and Cascaded Dual Decoding.

Author information

Liao Tao, Sun Haojie, Zhang Shunxiang

Affiliation

College of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China.

Publication information

Entropy (Basel). 2023 Aug 16;25(8):1217. doi: 10.3390/e25081217.

DOI:10.3390/e25081217

PMID:37628247

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10453911/

Abstract

Joint entity-relationship extraction models play a significant role in entity relationship extraction. Existing joint models, however, cannot effectively identify entity-relationship triples when the relationships overlap. This paper proposes a new joint entity-relationship extraction model based on spans and cascaded dual decoding. The model consists of a Bidirectional Encoder Representations from Transformers (BERT) encoding layer, a relation decoding layer, and an entity decoding layer. The model first feeds the input text into the BERT pretrained language model to obtain word vectors. It then divides the word vectors into spans to form a span sequence and decodes the relationships over the span sequence to obtain the relationship types it contains. Finally, the entity decoding layer fuses the span sequence with the relationship types obtained by relation decoding and uses a bidirectional long short-term memory (Bi-LSTM) neural network to identify the head entity and tail entity in the span sequence. Combining span division with cascaded dual decoding allows the overlapping relationships in the text to be identified effectively. Experiments show that the model improves the F1 score over the baseline models on the NYT and WebNLG datasets.
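The control flow described in the abstract (divide the text into spans, decode relationship types first, then decode head and tail entities conditioned on each relationship) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the BERT encoder, the learned relation decoder, and the Bi-LSTM entity decoder are replaced by hypothetical lookup-table stand-ins (`TOY_LEXICON`, `toy_relation_decoder`, `toy_entity_decoder`), and the span width limit `max_width` and the example sentence are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # token offsets: inclusive start, exclusive end

# Hypothetical stand-in for learned knowledge: relation -> (head, tail) pairs.
# Both relations share the same entity pair, i.e. an overlapping case.
TOY_LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "capital_of": [("New Delhi", "India")],
    "located_in": [("New Delhi", "India")],
}


def enumerate_spans(tokens: List[str], max_width: int) -> List[Span]:
    """Return every contiguous token span of width 1..max_width."""
    n = len(tokens)
    return [(i, j) for i in range(n)
            for j in range(i + 1, min(i + max_width, n) + 1)]


def span_text(tokens: List[str], span: Span) -> str:
    return " ".join(tokens[span[0]:span[1]])


def toy_relation_decoder(tokens: List[str], spans: List[Span]) -> List[str]:
    """Stage 1 stand-in: which relation types are expressed over the spans?"""
    texts = {span_text(tokens, s) for s in spans}
    return [rel for rel, pairs in TOY_LEXICON.items()
            if any(h in texts and t in texts for h, t in pairs)]


def toy_entity_decoder(tokens: List[str], spans: List[Span],
                       relation: str) -> List[Tuple[Span, Span]]:
    """Stage 2 stand-in: head/tail spans for one given relation type."""
    by_text = {span_text(tokens, s): s for s in spans}
    return [(by_text[h], by_text[t]) for h, t in TOY_LEXICON[relation]
            if h in by_text and t in by_text]


def cascaded_decode(tokens: List[str], max_width: int,
                    relation_decoder: Callable, entity_decoder: Callable
                    ) -> List[Tuple[str, str, str]]:
    """Decode relations first, then entities once per decoded relation."""
    spans = enumerate_spans(tokens, max_width)
    triples = []
    for rel in relation_decoder(tokens, spans):
        for head, tail in entity_decoder(tokens, spans, rel):
            triples.append((span_text(tokens, head), rel,
                            span_text(tokens, tail)))
    return triples


tokens = "New Delhi is the capital of India".split()
triples = cascaded_decode(tokens, 2, toy_relation_decoder, toy_entity_decoder)
print(triples)
```

Because entity decoding runs once per decoded relationship type, the same span pair can appear in more than one triple, which is how a cascade of this shape can represent overlapping relationships. In the real model each stand-in would be a trained network operating on BERT span representations rather than a string lookup.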
