Cai Ruichu, Zhang Zhenjie, Hao Zhifeng, Winslett Marianne
IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3623-3635. doi: 10.1109/TNNLS.2017.2734804. Epub 2017 Aug 24.
Scalable causal discovery is an essential technology for a wide spectrum of applications, including biomedical studies and social network evolution analysis. To tackle the difficulty of high dimensionality, a number of solutions have been proposed in the literature, generally dividing the original variable domain into smaller subdomains using computation-intensive partitioning strategies. These approaches usually suffer significant structural errors when the partitioning strategies fail to recognize true causal edges that cross the output subdomains. Such structural errors accumulate quickly as the depth of recursive partitioning grows, because no mechanism corrects causally connected variables once they are wrongly divided into two subdomains, ultimately jeopardizing the robustness of the integrated results. This paper proposes a completely different strategy, powered by a lightweight random partitioning scheme together with a carefully designed merging algorithm over the results from the random partitions. Based on the randomness properties of the partitioning scheme, we design a suite of techniques for the merging algorithm to support propagation-based significance enhancement, maximal acyclic subgraph causal ordering, and order-sensitive redundancy elimination. Theoretical studies and empirical evaluations verify the genericity, effectiveness, and scalability of our proposal on both simulated and real-world causal structures when the scheme is combined with a variety of causal solvers known to be effective on smaller domains.
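The partition-then-merge idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how merging is done, so the sketch assumes simple vote counting over repeated random partitions and substitutes a greedy cycle-avoiding edge insertion for the maximal acyclic subgraph step. The names `random_partitions`, `merge_edges`, `reaches`, and the parameters `block_size`, `rounds`, and `min_votes` are all hypothetical, and `solver` stands for any base causal solver that returns directed `(cause, effect)` pairs for a small set of variables.

```python
import random
from collections import Counter


def random_partitions(variables, block_size, rounds, seed=0):
    """Yield blocks of variables from repeated random shuffles (assumed scheme)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        shuffled = list(variables)
        rng.shuffle(shuffled)
        for i in range(0, len(shuffled), block_size):
            yield shuffled[i:i + block_size]


def reaches(graph, start, target):
    """Return True if target is reachable from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def merge_edges(variables, solver, block_size=3, rounds=50, min_votes=2, seed=0):
    """Run the base solver on random blocks and merge its edges by vote count.

    Assumption: solver(block) returns an iterable of (cause, effect) pairs.
    """
    votes = Counter()
    for block in random_partitions(variables, block_size, rounds, seed):
        votes.update(solver(block))

    # Keep edges reported at least min_votes times, most frequent first.
    candidates = [edge for edge, count in votes.most_common() if count >= min_votes]

    # Greedy stand-in for an acyclic subgraph: skip any edge that closes a cycle.
    graph = {v: set() for v in variables}
    for cause, effect in candidates:
        if not reaches(graph, effect, cause):
            graph[cause].add(effect)
    return graph
```

Because every round reshuffles the variables, a pair that is separated in one round can still land in the same block in a later one, which is the intuition behind using random rather than fixed partitions. The vote threshold and the greedy ordering are illustrative choices only.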