Zhang Yanzi
College of Chinese Language and Culture, Jinan University, Guangzhou, China.
PeerJ Comput Sci. 2023 Aug 17;9:e1509. doi: 10.7717/peerj-cs.1509. eCollection 2023.
Relation extraction is an important topic in information extraction, as it is used to create large-scale knowledge graphs for a variety of downstream applications. Its goal is to find and extract semantic links between entity pairs in natural language sentences. Deep learning has substantially advanced neural relation extraction by allowing semantic features to be learned automatically. In this study, we present an effective Chinese relation extraction model that uses a bidirectional LSTM (Bi-LSTM) and an attention mechanism to extract crucial semantic information from sentences, without relying on domain knowledge from lexical resources or language systems. The attention mechanism incorporated into the Bi-LSTM network allows the model to focus automatically on key words. Our models were built and evaluated on two benchmark datasets: Chinese SanWen and FinRE. The experimental results show that the model trained on the SanWen dataset outperforms the one trained on the FinRE dataset, with areas under the receiver operating characteristic curve of 0.70 and 0.50, respectively. The models trained on the SanWen and FinRE datasets achieve areas under the precision-recall curve of 0.44 and 0.19, respectively. In addition, repeated modeling experiments indicated that the proposed method is robust and reproducible.
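The word-level attention described above can be illustrated with a minimal sketch: each token's Bi-LSTM output is scored against a learned attention vector, the scores are normalized with a softmax, and the hidden states are pooled into a single sentence representation weighted toward the key words. The function name, the scoring form (a dot product with a tanh-transformed hidden state), and all values below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def attention_pool(hidden_states, w):
    # hidden_states: per-token Bi-LSTM outputs, each a list of floats
    # w: learned attention vector of the same dimension (values here are hypothetical)
    # Score each token: u_t = w . tanh(h_t)
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    # Softmax over tokens -> attention weights alpha_t (subtract max for stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    # Weighted sum of hidden states -> sentence representation r = sum_t alpha_t * h_t
    dim = len(hidden_states[0])
    rep = [sum(a * h[i] for a, h in zip(alphas, hidden_states))
           for i in range(dim)]
    return alphas, rep

# Toy example: the second token has the largest activations,
# so it should receive the highest attention weight.
H = [[0.1, 0.2], [2.0, 1.5], [0.0, -0.1]]
alphas, rep = attention_pool(H, [1.0, 1.0])
```

In a full model this pooled representation would be passed to a softmax classifier over the relation labels; the attention weights `alphas` indicate which words the model focused on.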