Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy.
Biometrics. 2021 Mar;77(1):136-149. doi: 10.1111/biom.13281. Epub 2020 May 8.
We assume that multivariate observational data are generated from a distribution whose conditional independencies are encoded in a Directed Acyclic Graph (DAG). For any given DAG, the causal effect of one variable on another can be evaluated through intervention calculus. A DAG is typically not identifiable from observational data alone. However, its Markov equivalence class (a collection of DAGs) can be estimated from the data. As a consequence, for the same intervention, a set of causal effects, one for each DAG in the equivalence class, can be evaluated. In this paper, we propose a fully Bayesian methodology to make inference on the causal effects of any intervention in the system. The main features of our method are: (a) uncertainty on both the equivalence class and the causal effects is jointly modeled; (b) priors on the parameters of the modified Cholesky decomposition of the precision matrices across all DAG models are constructively assigned starting from a unique prior on the complete (unrestricted) DAG; (c) an efficient algorithm to sample from the posterior distribution on graph space is adopted; (d) an objective Bayes approach, requiring virtually no user specification, is used throughout. We demonstrate the merits of our methodology in simulation studies, in which it compares favorably with current state-of-the-art procedures. Finally, we examine a real data set of gene expressions for Arabidopsis thaliana.
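To make concrete the statement that one intervention yields a set of causal effects, one per DAG in the equivalence class, here is a minimal sketch. It is not the Bayesian procedure proposed in the paper. It assumes a Gaussian linear model with a known covariance matrix and uses the standard result that the total effect of an intervention on X_v on a response X_y is the coefficient of X_v in the regression of X_y on X_v and its parents, and is zero when X_y is itself a parent of X_v. The function name `causal_effect`, the three-variable chain, and the coefficients 0.8 and 0.5 are illustrative assumptions.

```python
import numpy as np


def causal_effect(sigma, v, y, parents):
    """Total effect of do(X_v) on X_y in a Gaussian DAG where `parents` = pa(v).

    Hypothetical helper for illustration: it takes the covariance matrix as
    known, whereas in practice it would have to be estimated from data.
    """
    if y in parents:
        # X_y is a parent of X_v, so intervening on X_v cannot affect it.
        return 0.0
    idx = [v] + list(parents)
    # Coefficients of the regression of X_y on (X_v, X_pa(v)).
    coef = np.linalg.solve(sigma[np.ix_(idx, idx)], sigma[idx, y])
    return float(coef[0])


# Assumed data-generating model: X1 -> X2 -> X3 with unit-variance noise.
# B[i, j] is the coefficient of X_j in the structural equation for X_i.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
A = np.linalg.inv(np.eye(3) - B)
sigma = A @ A.T  # implied covariance of (X1, X2, X3)

# The three DAGs Markov equivalent to the chain (no v-structure at X2),
# each represented only by the parent set of X2 (index 1).
parent_sets = {
    "1->2->3": [0],
    "1<-2<-3": [2],
    "1<-2->3": [],
}

# Effect of intervening on X2 (index 1) on X3 (index 2) under each DAG.
effects = {name: causal_effect(sigma, 1, 2, pa) for name, pa in parent_sets.items()}
print(effects)
```

Under these assumptions the effect is 0.5 in the two DAGs where X2 is a cause of X3 and 0 in the DAG where the arrow is reversed, so the observational distribution alone supports a set of candidate effects rather than a single value.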