Zhang Chong, Lyu Jiagao, Xu Ke
State Key Lab of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China.
Knowl Inf Syst. 2023;65(2):827-853. doi: 10.1007/s10115-022-01781-7. Epub 2022 Nov 3.
As more and more news articles appear on the Internet, discovering causal relations between them is important for helping people understand how news events develop. Extracting causal relations between news articles is an inter-document relation extraction task. Existing work on relation extraction does not solve it well, for two reasons: (1) Most relation extraction models are intra-document models that focus on relations between entities. News articles, however, are many times longer and more complex than entities, which makes inter-document relation extraction harder than intra-document extraction. (2) Existing inter-document relation extraction models rely on similarity information between news articles, which can limit extraction performance. In this paper, we propose an inter-document model based on storytree information to extract causal relations between news articles. We incorporate storytree information into integer linear programming (ILP) and design storytree constraints for the ILP objective function. Experimental results show that all the constraints are effective and that the proposed method outperforms widely used machine learning models and a state-of-the-art deep learning model, with F1 improved by more than 5% on three different datasets. Further analysis shows that the five constraints in our model improve the results to varying degrees and that their effects differ across the three datasets. An experiment on link features also suggests that link information has a positive influence.
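To make the idea of constrained ILP inference concrete, the following is a minimal sketch and not the paper's formulation. The abstract does not spell out the five storytree constraints, so the two constraints here (each article has at most one cause, and no pair of articles cause each other) are illustrative assumptions, as are the function names, the pairwise scores, and the 0.5 threshold. The tiny 0/1 program is solved by brute-force enumeration rather than a real ILP solver, which is only practical for a handful of candidate pairs.

```python
from itertools import product


def feasible(edges):
    """Check the two illustrative (assumed) structural constraints."""
    # Assumed constraint 1: each article has at most one cause, as in a tree.
    effects = [j for _, j in edges]
    if len(effects) != len(set(effects)):
        return False
    # Assumed constraint 2: two articles cannot cause each other.
    edge_set = set(edges)
    return not any((j, i) in edge_set for i, j in edges)


def solve_causal_ilp(scores, threshold=0.5):
    """Pick the feasible set of causal links with the highest total gain.

    scores[(i, j)] is a hypothetical classifier confidence that article i
    causes article j. Each candidate link is a 0/1 decision variable, and
    selecting it contributes (score - threshold) to the objective.
    """
    pairs = sorted(scores)
    best_value, best_edges = 0.0, []
    # Enumerate every 0/1 assignment; fine for a few pairs only.
    for assignment in product((0, 1), repeat=len(pairs)):
        edges = [p for p, x in zip(pairs, assignment) if x]
        if not feasible(edges):
            continue
        value = sum(scores[p] - threshold for p in edges)
        if value > best_value:
            best_value, best_edges = value, edges
    return best_edges


# Made-up scores for three articles, numbered 0 to 2.
scores = {(0, 1): 0.9, (1, 0): 0.6, (0, 2): 0.7, (1, 2): 0.8, (2, 1): 0.4}
selected = solve_causal_ilp(scores)
print(selected)
```

With these made-up scores, thresholding each pair independently at 0.5 would accept both (0, 1) and (1, 0), and would give article 2 two causes. The constrained search instead keeps the chain from article 0 to article 1 to article 2, which is the kind of global consistency that structural constraints are meant to enforce.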