Liu Xinfang, Yang Wenzhong, Wei Fuyuan, Wu Zhonghua
School of Computer Science and Technology, Xinjiang University, Urumqi, 830017, China.
Xinjiang Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi, 830017, China.
Sci Rep. 2024 Dec 30;14(1):32078. doi: 10.1038/s41598-024-83678-9.
Event Causality Identification (ECI) aims to predict causal relations between events in a text. Existing research primarily leverages external knowledge, such as knowledge graphs and dependency trees, to construct explicit structured features that enrich event representations. However, this approach underuses the semantic features of the original input sentences and performs poorly in capturing implicit causal relations. To address these issues, this paper proposes a new framework based on Hierarchical Feature Extraction and Prompt-aware Attention (HFEPA). First, we introduce a Hierarchical Feature Extraction (HFE) module that extracts features from the input sentences at two levels, the event-mention level and the segment level, enriching the semantic information of events through the interaction between event pairs and different segments. Second, we design a Prompt-aware Attention (PAA) module that uses implicit causal knowledge in pre-trained language models to capture potential relationship information between events. This information is then combined with the contextual information of the text sequence to enhance the model's ability to identify implicit causal relations between events. Additionally, ECI research on Chinese text has progressed relatively slowly because annotated datasets are limited in scale. To help address this data scarcity, we introduce a new Chinese ECI dataset, Chinese News Causality (CNC). This dataset contains 25,629 event mentions and 5,569 causal event pairs, making it, to our knowledge, the largest Chinese ECI dataset to date. We evaluate HFEPA on both the EventStoryLine and Chinese News Causality datasets, and experimental results show that HFEPA significantly outperforms previous methods. The CNC dataset is available at https://github.com/twinkle121/CNC.
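The abstract does not specify how the Prompt-aware Attention module is implemented, so the following is only a minimal sketch of the general idea, under stated assumptions: a cloze-style prompt is built for an event pair, and a vector standing in for the prompt's mask position is used as the query in scaled dot-product attention over the context token vectors. The template wording, the function names `build_prompt` and `prompt_aware_attention`, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def build_prompt(sentence: str, event_a: str, event_b: str,
                 mask_token: str = "[MASK]") -> str:
    """Build a cloze-style prompt for an event pair.

    Hypothetical template: the paper's actual prompt is not given in the abstract.
    """
    return f"{sentence} In this sentence, '{event_a}' {mask_token} '{event_b}'."


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_aware_attention(prompt_vec: Vector,
                           context_vecs: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention with the prompt vector as the query.

    Assumption: prompt_vec stands in for the encoder's hidden state at the
    mask position, and context_vecs for the hidden states of the context tokens.
    Returns the attention weights and the weighted sum of the context vectors.
    """
    dim = len(prompt_vec)
    scores = [
        sum(q * k for q, k in zip(prompt_vec, token)) / math.sqrt(dim)
        for token in context_vecs
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * token[i] for w, token in zip(weights, context_vecs))
        for i in range(dim)
    ]
    return weights, pooled


# Toy usage with made-up two-dimensional vectors.
prompt = build_prompt("The flood destroyed the bridge.", "flood", "destroyed")
weights, pooled = prompt_aware_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In a real system the vectors would come from a pre-trained language model encoder, and the pooled vector would be combined with the event-pair representation before classification; both steps are omitted here because the abstract gives no details on them.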