DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis.

Authors

Petkov Hristo, MacLellan Calum, Dong Feng

Affiliation

Department of Computer and Information Sciences, University of Strathclyde, 16 Richmond Street, Glasgow, Lanarkshire G1 1XQ United Kingdom.

Publication

Appl Intell (Dordr). 2025;55(7):602. doi: 10.1007/s10489-025-06410-8. Epub 2025 Mar 31.

Abstract

Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods apply a single identifiable causal model, such as the Additive Noise Model (ANM) or the Linear non-Gaussian Acyclic Model (LiNGAM), to discover the dependencies exhibited in observational data. We improve on this approach by introducing a novel dual-step framework that performs both causal structure learning and tabular data synthesis under multiple causal model assumptions. Our approach uses Directed Acyclic Graphs (DAGs) to represent causal relationships among data variables. By applying various functional causal models, including ANM, LiNGAM and the Post-Nonlinear model (PNL), we implicitly learn the contents of the DAG to simulate the generative process of observational data, effectively replicating the real data distribution. A theoretical analysis explains the multiple loss terms that make up the objective function of the framework. Experimental results demonstrate that DAGAF outperforms many existing methods in structure learning, achieving significantly lower Structural Hamming Distance (SHD) scores across both real-world and benchmark datasets (improvements over the state of the art of 47% on Sachs, 11% on Child, 5% on Hailfinder and 7% on Pathfinder), while producing diverse, high-quality samples.
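
For readers unfamiliar with the SHD metric reported in the abstract, the following is a minimal Python sketch of one common way to compute it. It is not the authors' evaluation code. It assumes that graphs are given as square 0/1 adjacency matrices in which entry [i, j] = 1 means an edge from node i to node j, and that a reversed edge counts as a single error (some implementations count it as two). The function name `structural_hamming_distance` and the three-node example graphs are hypothetical.

```python
import numpy as np


def structural_hamming_distance(true_adj, est_adj):
    """Count the edge additions, deletions and reversals separating two graphs.

    Assumption: entry [i, j] == 1 means an edge i -> j, and a reversed edge
    is counted once rather than twice.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    if true_adj.ndim != 2 or true_adj.shape[0] != true_adj.shape[1]:
        raise ValueError("adjacency matrices must be square")
    if true_adj.shape != est_adj.shape:
        raise ValueError("adjacency matrices must have the same shape")

    # Every position where the two graphs disagree about an edge.
    mismatches = true_adj != est_adj

    # A reversal (i -> j in the true graph, j -> i in the estimate) produces
    # two mismatches, at [i, j] and at [j, i]; detect it so it is counted once.
    reversals = true_adj & ~true_adj.T & est_adj.T & ~est_adj

    return int(mismatches.sum() - reversals.sum())


# Hypothetical example: true graph 0 -> 1 -> 2.
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]
# Estimate reverses 0 -> 1 and adds a spurious edge 0 -> 2.
estimated_graph = [[0, 0, 1],
                   [1, 0, 1],
                   [0, 0, 0]]

print(structural_hamming_distance(true_graph, estimated_graph))  # prints 2
```

A lower SHD means the estimated graph is structurally closer to the reference graph; a value of 0 means the two adjacency matrices are identical.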

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cf1/11958450/1053e4cc2d73/10489_2025_6410_Figd_HTML.jpg