Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany.
Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany.
Stat Med. 2022 Oct 15;41(23):4716-4743. doi: 10.1002/sim.9535. Epub 2022 Jul 31.
Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms were unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than in the context of estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both, neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrative example.
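To make the difference between test-wise and list-wise deletion concrete, here is a minimal sketch of a single conditional independence test. It assumes jointly Gaussian variables and uses the standard Fisher z test on the partial correlation. The helper name `fisher_z_test`, its `deletion` parameter and the simulated data are hypothetical illustrations, not the authors' implementation. Test-wise deletion drops only the rows that are missing one of the variables involved in the current test. List-wise deletion drops every row with any missing value in the dataset. The missingness below is simulated completely at random. Whether test-wise deletion recovers the true structure in general depends on the missingness mechanism, which is what the article's recoverability conditions address.

```python
import math

import numpy as np


def fisher_z_test(data, x, y, cond=(), deletion="testwise"):
    """Hypothetical helper: Gaussian CI test of X _||_ Y | cond on data with NaNs.

    deletion="testwise": keep rows observed on x, y and cond only.
    deletion="listwise": keep rows observed on every column of the dataset.
    Returns (two-sided p-value, number of rows used).
    """
    cols = [x, y, *cond]
    if deletion == "testwise":
        keep = ~np.isnan(data[:, cols]).any(axis=1)
    elif deletion == "listwise":
        keep = ~np.isnan(data).any(axis=1)
    else:
        raise ValueError("deletion must be 'testwise' or 'listwise'")
    sub = data[keep][:, cols]
    n = sub.shape[0]
    dof = n - len(cond) - 3
    if dof <= 0:
        raise ValueError("too few complete rows for this test")
    # Partial correlation of x and y given cond, from the inverse correlation matrix.
    precision = np.linalg.pinv(np.corrcoef(sub, rowvar=False))
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # guard the log against |r| = 1
    # Fisher z statistic; erfc(|z| / sqrt(2)) is the two-sided normal p-value.
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(dof)
    return math.erfc(abs(z) / math.sqrt(2)), n


# Simulated chain X -> Z -> Y plus an unrelated W (columns 0=X, 1=Y, 2=Z, 3=W).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)
w = rng.normal(size=n)
data = np.column_stack([x, y, z, w])
data[rng.random(n) < 0.5, 3] = np.nan  # W missing completely at random (~50%)
data[rng.random(n) < 0.1, 0] = np.nan  # X missing completely at random (~10%)

# Test X _||_ Y | Z. W plays no role in this test, but list-wise deletion
# still discards every row where W is missing.
p_tw, n_tw = fisher_z_test(data, 0, 1, cond=(2,))
p_lw, n_lw = fisher_z_test(data, 0, 1, cond=(2,), deletion="listwise")
print(f"test-wise: n={n_tw}, p={p_tw:.3f}; list-wise: n={n_lw}, p={p_lw:.3f}")
```

In this toy setting, list-wise deletion throws away roughly half of the rows available to the test because of missingness in an irrelevant variable. Test-wise deletion keeps them, which is one reason it can retain more power across the many tests a constraint-based algorithm runs.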