Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.
MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK.
Int J Epidemiol. 2019 Aug 1;48(4):1294-1304. doi: 10.1093/ije/dyz032.
BACKGROUND: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations.
METHODS: We provide guidance on the choice of analysis when data are incomplete. Using causal diagrams to depict missingness mechanisms, we describe when CCA will not be biased by missing data and compare MI and CCA, with respect to bias and efficiency, in a range of missing data situations. We illustrate the selection of an appropriate method in practice.
RESULTS: For most regression models, CCA gives unbiased results when the chance of being a complete case does not depend on the outcome after taking the covariates into consideration. This includes situations where data are missing not at random (MNAR). Consequently, there are situations in which CCA is unbiased while MI, which assumes data are missing at random (MAR), is biased. By contrast, MI, unlike CCA, is valid for all MAR situations and has the potential to use information contained in the incomplete cases and auxiliary variables to reduce bias and/or improve precision. For this reason, MI was preferred over CCA in our real data example.
CONCLUSIONS: The choice of method for dealing with missing data is crucial for the validity of conclusions, and should be based on careful consideration of the reasons for the missing data, the missing data patterns and the availability of auxiliary information.
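The condition stated in the RESULTS can be illustrated with a small simulation. The following is a minimal sketch, not the authors' analysis: it assumes a linear model with a single covariate, a logistic missingness mechanism, and a true slope of 0.5. The function names (`ols_slope`, `simulate`), the sample size and all coefficients are hypothetical choices made for illustration. It compares the CCA slope estimate when the chance of being a complete case depends only on the covariate (an MNAR mechanism if the covariate is the incomplete variable) with the estimate when that chance depends on the outcome.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    x_centered = x - x.mean()
    return float(np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered))


def simulate(n=200_000, true_slope=0.5, seed=0):
    """Compare CCA slope estimates under two illustrative missingness mechanisms."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(size=n)

    # Scenario A (assumed): the chance of being a complete case depends on
    # the covariate x only, so it is independent of y given x.
    complete_a = rng.random(n) < 1.0 / (1.0 + np.exp(-x))

    # Scenario B (assumed): the chance of being a complete case depends on
    # the outcome y, even after conditioning on x.
    complete_b = rng.random(n) < 1.0 / (1.0 + np.exp(-y))

    return {
        "full_data": ols_slope(x, y),
        "cca_depends_on_covariate": ols_slope(x[complete_a], y[complete_a]),
        "cca_depends_on_outcome": ols_slope(x[complete_b], y[complete_b]),
    }


if __name__ == "__main__":
    for label, estimate in simulate().items():
        print(f"{label}: {estimate:.3f}")
```

Under these assumptions, the scenario A estimate should sit close to the full-data estimate, while the scenario B estimate should be attenuated towards zero. The sketch does not implement MI; it only demonstrates the condition under which CCA is expected to be unbiased.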