Nabi Razieh, Bhattacharya Rohit, Shpitser Ilya
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
Proc Mach Learn Res. 2020 Jul;119:7153-7163.
Missing data can affect analyses in every field of scientific study, including healthcare, economics, and the social sciences. Several approaches to unbiased inference in the presence of non-ignorable missingness rely on specifying the target distribution and its missingness process as a probability distribution that factorizes with respect to a directed acyclic graph. In this paper, we address the longstanding question of characterizing the models that are identifiable within this class of missing data distributions. We provide the first completeness result in this field of study: necessary and sufficient graphical conditions under which the full data distribution can be recovered from the observed data distribution. We then simultaneously address issues that may arise from the presence of both missing data and unmeasured confounding by extending these graphical conditions and proofs of completeness to settings where some variables are not just missing, but completely unobserved.
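To make "recovering the full data distribution from the observed data distribution" concrete, the following is a minimal sketch of the simplest graphical case, not the identification procedure developed in the paper. It assumes a hypothetical missing data DAG with edges X -> Y and X -> R_Y, where X is always observed and Y is observed only when R_Y = 1. Under that assumed graph, Y is independent of R_Y given X, so p(x, y) = p(x) p(y | x, R_Y = 1). All numeric parameters and function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical parameters for the assumed graph X -> Y, X -> R_Y.
p_x = {0: 0.6, 1: 0.4}            # p(X = x)
p_y1_given_x = {0: 0.2, 1: 0.7}   # p(Y = 1 | X = x)
p_r1_given_x = {0: 0.9, 1: 0.3}   # p(R_Y = 1 | X = x); missingness depends on X only


def full_law():
    """Full data distribution p(x, y), normally unavailable to the analyst."""
    law = {}
    for x, y in product((0, 1), repeat=2):
        p_y = p_y1_given_x[x] if y == 1 else 1.0 - p_y1_given_x[x]
        law[(x, y)] = p_x[x] * p_y
    return law


def observed_law(full):
    """Observed data distribution: Y is recorded as None whenever R_Y = 0."""
    obs = {}
    for (x, y), p in full.items():
        obs[(x, y, 1)] = p * p_r1_given_x[x]
        key = (x, None, 0)
        obs[key] = obs.get(key, 0.0) + p * (1.0 - p_r1_given_x[x])
    return obs


def recover_full_law(obs):
    """Apply p(x, y) = p(x) * p(y | x, R_Y = 1), using observed quantities only."""
    rec = {}
    for x in (0, 1):
        p_x_obs = sum(p for (xx, _, _), p in obs.items() if xx == x)
        p_x_r1 = obs[(x, 0, 1)] + obs[(x, 1, 1)]
        for y in (0, 1):
            rec[(x, y)] = p_x_obs * obs[(x, y, 1)] / p_x_r1
    return rec


def complete_case_law(obs):
    """Naive alternative: discard incomplete rows and renormalize."""
    total = sum(p for (_, _, r), p in obs.items() if r == 1)
    return {(x, y): p / total for (x, y, r), p in obs.items() if r == 1}


if __name__ == "__main__":
    full = full_law()
    obs = observed_law(full)
    print("full     :", full)
    print("recovered:", recover_full_law(obs))
    print("complete :", complete_case_law(obs))
```

In this sketch the recovered distribution matches the full law exactly, while the complete-case distribution does not, because rows with X = 1 are dropped more often. Non-ignorable settings, where missingness can depend on values that are themselves missing, are the harder cases that the graphical conditions in the abstract concern.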