Accounting for missing data in statistical analyses: multiple imputation is not always the answer.

Affiliations

Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.

MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK.

Publication information

Int J Epidemiol. 2019 Aug 1;48(4):1294-1304. doi: 10.1093/ije/dyz032.


DOI: 10.1093/ije/dyz032
PMID: 30879056
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6693809/
Abstract

BACKGROUND: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations.

METHODS: We provide guidance on choice of analysis when data are incomplete. Using causal diagrams to depict missingness mechanisms, we describe when CCA will not be biased by missing data and compare MI and CCA, with respect to bias and efficiency, in a range of missing data situations. We illustrate selection of an appropriate method in practice.

RESULTS: For most regression models, CCA gives unbiased results when the chance of being a complete case does not depend on the outcome after taking the covariates into consideration, which includes situations where data are missing not at random. Consequently, there are situations in which CCA analyses are unbiased while MI analyses, assuming missing at random (MAR), are biased. By contrast, MI, unlike CCA, is valid for all MAR situations and has the potential to use information contained in the incomplete cases and auxiliary variables to reduce bias and/or improve precision. For this reason, MI was preferred over CCA in our real data example.

CONCLUSIONS: Choice of method for dealing with missing data is crucial for validity of conclusions, and should be based on careful consideration of the reasons for the missing data, missing data patterns and the availability of auxiliary information.
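The condition stated in the results (CCA is unbiased when the chance of being a complete case does not depend on the outcome, given the covariates) can be checked with a small simulation. The sketch below is illustrative only and is not taken from the paper: the linear model, the sample size, the logistic missingness mechanisms and the helper names `cca_slope` and `expit` are all assumptions chosen for the demonstration. It compares the complete-case regression slope when completeness depends on the covariate alone (so the covariate would be missing not at random) with the slope when completeness depends on the outcome.

```python
import numpy as np


def expit(z):
    """Logistic function, used here to turn a score into a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def cca_slope(x, y, observed):
    """Least-squares slope of y on x using only the complete cases."""
    slope, _intercept = np.polyfit(x[observed], y[observed], deg=1)
    return slope


rng = np.random.default_rng(0)
n = 200_000          # assumed sample size, large so sampling noise is small
true_slope = 2.0     # assumed true effect of x on y

x = rng.normal(size=n)
y = 1.0 + true_slope * x + rng.normal(size=n)

# Scenario A (assumed mechanism): completeness depends on the covariate x only.
# If x is the variable with missing values, this is missing not at random,
# yet completeness is independent of y given x.
complete_a = rng.random(n) < expit(1.5 * x)

# Scenario B (assumed mechanism): completeness depends on the outcome y.
complete_b = rng.random(n) < expit(1.5 * y)

slope_a = cca_slope(x, y, complete_a)
slope_b = cca_slope(x, y, complete_b)

print(f"true slope:                         {true_slope:.3f}")
print(f"CCA, completeness depends on x:     {slope_a:.3f}")
print(f"CCA, completeness depends on y:     {slope_b:.3f}")
```

Under these assumptions the first complete-case estimate should sit close to the true slope, while the second is pulled away from it. The sketch does not implement MI; it only illustrates the condition under which CCA remains unbiased.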


