Jabbari Fattaneh, Ramsey Joseph, Spirtes Peter, Cooper Gregory
Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA.
Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, USA.
Mach Learn Knowl Discov Databases. 2017 Sep;2017:142-157. doi: 10.1007/978-3-319-71246-8_9. Epub 2017 Dec 30.
Discovering causal structure from observational data in the presence of latent variables remains an active research area. Constraint-based causal discovery algorithms are relatively efficient at discovering such causal models from data using independence tests. Typically, however, they derive and output only one such model. In contrast, Bayesian methods can generate and probabilistically score multiple models, outputting the most probable one; however, they are often computationally infeasible to apply when modeling latent variables. We introduce a hybrid method that derives a Bayesian probability that the set of independence tests associated with a given causal model is jointly correct. Using this constraint-based scoring method, we are able to score multiple causal models, which possibly contain latent variables, and output the most probable one. The structure-discovery performance of the proposed method is compared to an existing constraint-based method (RFCI) using data generated from several previously published Bayesian networks. The structural Hamming distances of the output models improved when using the proposed method compared to RFCI, especially for small sample sizes.
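The scoring idea in the abstract, assigning each candidate model the probability that its associated independence constraints are jointly correct and then keeping the most probable model, can be illustrated with a minimal sketch. The sketch assumes that the constraints are mutually independent, so the joint probability is a simple product; the abstract does not state how the joint probability is actually derived, so this is a simplification for illustration only. The function names, model names, and probability values are all hypothetical.

```python
import math


def joint_log_score(constraint_probs):
    """Log-probability that every constraint is correct.

    Assumption (not from the abstract): the constraints are treated as
    mutually independent, so their probabilities multiply.
    """
    if any(p <= 0.0 for p in constraint_probs):
        return float("-inf")  # an impossible constraint rules the model out
    return sum(math.log(p) for p in constraint_probs)


def most_probable_model(candidates):
    """Return the name of the candidate with the highest joint log score."""
    return max(candidates, key=lambda name: joint_log_score(candidates[name]))


# Hypothetical posterior probabilities that each independence-test result
# associated with a candidate model is correct.
candidates = {
    "model_a": [0.90, 0.80, 0.95],
    "model_b": [0.99, 0.60, 0.70],
}

best = most_probable_model(candidates)
print(best, math.exp(joint_log_score(candidates[best])))
```

Working in log space avoids numerical underflow when a model is associated with many constraints, which is the usual case once the number of variables grows.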