Hossain K S M Tozammel, Harutyunyan Hrayr, Ning Yue, Kennedy Brendan, Ramakrishnan Naren, Galstyan Aram
Institute for Data Science & Informatics, University of Missouri, Columbia, MO, United States.
Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States.
Front Artif Intell. 2022 Oct 31;5:893875. doi: 10.3389/frai.2022.893875. eCollection 2022.
Forecasting societal events such as civil unrest, mass protests, and violent conflicts is a challenging problem with several important real-world applications in planning and policy making. While traditional forecasting approaches have typically relied on historical time series for generating such forecasts, recent research has focused on using open-source surrogate data for more accurate and timely forecasts. Furthermore, leveraging such data can also help to identify precursors of those events that can be used to gain insights into the generated forecasts. The key challenge is to develop a unified framework for forecasting and precursor identification that can deal with missing historical data. Other challenges include sufficient flexibility in handling different types of events and providing interpretable representations of identified precursors. Although existing methods exhibit promising performance for predictive modeling in event detection, these models do not adequately address the above challenges. Here, we propose a unified framework based on an attention-based long short-term memory (LSTM) model to simultaneously forecast events from sequential text datasets and identify precursors at different granularities, such as documents and document excerpts. The key idea is to leverage word context in sequential and time-stamped documents such as news articles and blogs for learning a rich set of precursors. We validate the proposed framework by conducting extensive experiments with two real-world datasets: military action and violent conflicts in the Middle East, and mass protests in Latin America. Our results show that, overall, the proposed approach generates more accurate forecasts than existing state-of-the-art methods, while at the same time producing a rich set of precursors for the forecasted events.
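The idea of using attention both to forecast and to surface precursors can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each document has already been encoded as a fixed-length vector (in an attention-based LSTM this would come from a recurrent encoder over the document's words; here random vectors stand in), and the names `attention_pool`, `rank_precursors`, `doc_vectors`, and `query` are hypothetical. It shows only how softmax attention weights over documents yield a pooled representation for a downstream forecast and, at the same time, a ranking of candidate precursor documents.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(doc_vectors, query):
    """Score each document vector against a query vector, then pool.

    Returns (weights, pooled): one attention weight per document and the
    attention-weighted sum of the document vectors.
    """
    scores = [sum(d * q for d, q in zip(vec, query)) for vec in doc_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, doc_vectors))
        for i in range(dim)
    ]
    return weights, pooled


def rank_precursors(doc_ids, weights, top_k=2):
    """Return the top_k document ids by attention weight (candidate precursors)."""
    ranked = sorted(zip(doc_ids, weights), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Assumption: random vectors stand in for encoder outputs of four
# time-stamped documents; a real model would learn these and the query.
rng = random.Random(0)
doc_ids = ["doc_a", "doc_b", "doc_c", "doc_d"]
doc_vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in doc_ids]
query = [rng.uniform(-1, 1) for _ in range(8)]  # hypothetical context vector

weights, pooled = attention_pool(doc_vectors, query)
top_docs = rank_precursors(doc_ids, weights)
```

In this sketch, `pooled` is what a classifier would consume to produce the forecast, while `top_docs` lists the documents that received the most attention. The same weighting could in principle be applied one level down, over sentences or excerpts within a document, to obtain finer-grained precursors.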