Dukić David, Došilović Filip Karlo, Pluščec Domagoj, Šnajder Jan
University of Zagreb Faculty of Electrical Engineering and Computing, TakeLab, Zagreb, Croatia.
PeerJ Comput Sci. 2024 Oct 10;10:e2355. doi: 10.7717/peerj-cs.2355. eCollection 2024.
News event monitoring systems allow real-time monitoring of a large number of events reported in the news, including the urgent and critical events that make up so-called hard news. These systems rely heavily on natural language processing (NLP) to perform automatic event extraction at scale. While state-of-the-art event extraction models are readily available, integrating them into a news event monitoring system is not as straightforward as it seems because of practical issues related to model selection, robustness, and scale. To address this gap, we present a study on the practical use of event extraction models for news event monitoring. Our study focuses on the key task of closed-domain main event extraction (CDMEE), which aims to determine the type of the story's main event and extract its arguments from the text. We evaluate a range of state-of-the-art NLP models for this task, including those based on pre-trained language models. To obtain a more realistic evaluation than is typical in the literature, we introduce a new dataset manually labeled with event types and their arguments. Additionally, we assess the scalability of CDMEE models and analyze the trade-off between accuracy and inference speed. Our results give insights into the performance of state-of-the-art NLP models on the CDMEE task and provide recommendations for developing effective, robust, and scalable news event monitoring systems.
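As a rough illustration of the task described above, the following minimal sketch shows what a CDMEE output (one event type plus its arguments) might look like, and how accuracy and inference speed could be measured side by side. Everything here is an assumption made for illustration: the `EVENT_TYPES` inventory, the `MainEvent` structure, the `toy_extract` keyword rule, and the `evaluate` helper are hypothetical and are not the models, labels, or evaluation protocol used in the study.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical closed set of event types; the study's actual inventory
# is not given in the abstract.
EVENT_TYPES = ("Attack", "Protest", "Other")


@dataclass
class MainEvent:
    """One main event per story: its type and a role -> text span mapping."""
    event_type: str
    arguments: Dict[str, str] = field(default_factory=dict)


def toy_extract(text: str) -> MainEvent:
    """Keyword placeholder standing in for a real CDMEE model (assumption)."""
    lowered = text.lower()
    if "attack" in lowered:
        return MainEvent("Attack", {"trigger": "attack"})
    if "protest" in lowered:
        return MainEvent("Protest", {"trigger": "protest"})
    return MainEvent("Other")


def evaluate(
    model: Callable[[str], MainEvent],
    docs: List[str],
    gold_types: List[str],
) -> Tuple[float, float]:
    """Return (event-type accuracy, documents processed per second)."""
    start = time.perf_counter()
    predictions = [model(doc) for doc in docs]
    elapsed = time.perf_counter() - start

    correct = sum(p.event_type == g for p, g in zip(predictions, gold_types))
    accuracy = correct / len(docs)
    # Guard against a zero reading from the timer on very small inputs.
    docs_per_second = len(docs) / elapsed if elapsed > 0 else float("inf")
    return accuracy, docs_per_second
```

Running `evaluate` with several candidate models on the same documents would give one (accuracy, throughput) pair per model, which is the kind of comparison a trade-off analysis needs. A real evaluation would also score the extracted arguments, which this sketch leaves out.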