Causal graph extraction from news: a comparative study of time-series causality learning techniques.

Authors

Maisonnave Mariano, Delbianco Fernando, Tohme Fernando, Milios Evangelos, Maguitman Ana G

Affiliations

Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur, Bahía Blanca, Buenos Aires, Argentina.

Faculty of Computer Science, Dalhousie University, Halifax, Canada.

Publication information

PeerJ Comput Sci. 2022 Aug 3;8:e1066. doi: 10.7717/peerj-cs.1066. eCollection 2022.

Abstract

Causal graph extraction from news has the potential to aid in the understanding of complex scenarios. In particular, it can help explain and predict events, as well as conjecture about possible cause-effect connections. However, limited work has addressed the problem of large-scale extraction of causal graphs from news articles. This article presents a novel framework for extracting causal graphs from digital text media. The framework relies on topic-relevant variables representing terms and ongoing events that are selected from a domain under analysis by applying specially developed information retrieval and natural language processing methods. Events are represented as event-phrase embeddings, which make it possible to group similar events into semantically cohesive clusters. A time series of the selected variables is given as input to a causal structure learning technique to learn a causal graph associated with the topic that is being examined. The complete framework is applied to the New York Times dataset, which covers news for a period of 246 months (roughly 20 years), and is illustrated through a case study. An initial evaluation based on synthetic data is carried out to gain insight into the most effective time-series causality learning techniques. This evaluation comprises a systematic analysis of nine state-of-the-art causal structure learning techniques and two novel ensemble methods derived from the most effective techniques. Subsequently, the complete framework based on the most promising causal structure learning technique is evaluated with domain experts in a real-world scenario through the use of the presented case study. The proposed analysis offers valuable insights into the problems of identifying topic-relevant variables from large volumes of news and learning causal graphs from time series.
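
To make the "time series in, causal graph out" step concrete, the following is a minimal sketch of one simple time-series causality test, a pairwise Granger-style F-test, run on synthetic data. It is an illustration only: the abstract does not name the nine techniques that were compared, so this should not be read as the authors' method. The function names (`make_synthetic`, `granger_f`, `causal_edges`), the variable names `x`, `y`, `z`, the lag, and the F threshold are all assumptions chosen for the example.

```python
import numpy as np


def make_synthetic(n=500, seed=0):
    """Hypothetical toy data: x drives y with a one-step delay; z is unrelated."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    y = np.zeros(n)
    y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=n - 1)
    return np.column_stack([x, y, z])


def lagged_design(series, lag):
    """Stack the previous `lag` values of every column of `series` (T x k)."""
    T = series.shape[0]
    return np.hstack([series[lag - l - 1 : T - l - 1] for l in range(lag)])


def rss(y, X):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(data, cause, effect, lag=1):
    """F statistic: does the past of `cause` improve prediction of `effect`
    beyond the past of `effect` alone?"""
    y = data[lag:, effect]
    own_past = lagged_design(data[:, [effect]], lag)
    both_past = lagged_design(data[:, [effect, cause]], lag)
    rss_restricted = rss(y, own_past)
    rss_full = rss(y, both_past)
    dof = len(y) - (2 * lag + 1)  # observations minus parameters in the full model
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)


def causal_edges(data, names, lag=1, threshold=20.0):
    """Return directed edges (cause, effect) whose F statistic exceeds `threshold`.
    The threshold is an arbitrary illustrative cut-off, not a calibrated p-value."""
    k = data.shape[1]
    edges = []
    for c in range(k):
        for e in range(k):
            if c != e and granger_f(data, c, e, lag) > threshold:
                edges.append((names[c], names[e]))
    return edges


if __name__ == "__main__":
    data = make_synthetic()
    print(causal_edges(data, ["x", "y", "z"]))
```

Pairwise tests like this ignore confounding by third variables, which is one reason a systematic comparison of several causal structure learning techniques on synthetic data, as the abstract describes, is useful before applying any of them to real news-derived series.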
