Multiple imputation of missing data under missing at random: compatible imputation models are not sufficient to avoid bias if they are mis-specified.

Author affiliations

Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK.

Department of Medical Statistics, London School of Hygiene and Tropical Medicine, University of London, London, UK; Medical Research Council Clinical Trials Unit at University College London, University of London, London, UK.

Publication information

J Clin Epidemiol. 2023 Aug;160:100-109. doi: 10.1016/j.jclinepi.2023.06.011. Epub 2023 Jun 19.

DOI:10.1016/j.jclinepi.2023.06.011

PMID:37343895

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7615471/

Abstract

OBJECTIVES

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). Standard (default) MI procedures use simple linear covariate functions in the imputation model. We examine the bias that may be caused by acceptance of this default option and evaluate methods to identify problematic imputation models, providing practical guidance for researchers.

STUDY DESIGN AND SETTING

Using simulation and real data analysis, we investigated how imputation model mis-specification affected MI performance, comparing results with complete records analysis (CRA). We considered scenarios in which imputation model mis-specification occurred because (i) the analysis model was mis-specified or (ii) the relationship between exposure and confounder was mis-specified.

RESULTS

Mis-specification of the relationship between outcome and exposure, or between exposure and confounder, can cause biased CRA and MI estimates (in addition to any bias in the full-data estimate due to analysis model mis-specification). MI by predictive mean matching can mitigate model mis-specification. Methods for examining model mis-specification were effective in identifying mis-specified relationships.
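The exposure-confounder scenario can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: the data-generating model, the sample size, the missingness probabilities, the helper names (`simulate`, `fit_line`, `impute_once`, `mi_mean`), and the choice of the exposure mean as the estimand are all assumptions made for illustration. The exposure X is quadratic in the confounder Z, X is missing at random given Z, and X is imputed either from a default linear model or by predictive mean matching (PMM) built on the same linear model.

```python
import numpy as np

def simulate(n, rng):
    """Hypothetical data: exposure x is quadratic in confounder z; x is MAR given z."""
    z = rng.uniform(-1.0, 2.0, size=n)
    x = z**2 + rng.normal(0.0, 0.5, size=n)
    p_miss = np.where(z > 1.0, 0.9, 0.1)   # missingness depends on observed z only
    missing = rng.uniform(size=n) < p_miss
    return z, x, missing

def fit_line(z, x):
    """OLS fit of x on (1, z); returns coefficients and residual SD."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid_sd = np.std(x - design @ coef, ddof=2)
    return coef, resid_sd

def impute_once(z, x_obs, rng, method, k=5):
    """One imputed copy of x. A bootstrap refit stands in for parameter uncertainty."""
    missing = np.isnan(x_obs)
    obs, mis = np.flatnonzero(~missing), np.flatnonzero(missing)
    boot = rng.choice(obs, size=obs.size, replace=True)
    coef, resid_sd = fit_line(z[boot], x_obs[boot])
    pred = coef[0] + coef[1] * z            # linear (mis-specified) predictions
    filled = x_obs.copy()
    if method == "linear":
        # Default-style imputation: linear prediction plus normal noise.
        filled[mis] = pred[mis] + rng.normal(0.0, resid_sd, size=mis.size)
    elif method == "pmm":
        # PMM: borrow an observed x from one of the k donors with closest prediction.
        dist = np.abs(pred[mis][:, None] - pred[obs][None, :])
        nearest = np.argpartition(dist, k, axis=1)[:, :k]
        pick = rng.integers(0, k, size=mis.size)
        filled[mis] = x_obs[obs[nearest[np.arange(mis.size), pick]]]
    else:
        raise ValueError(f"unknown method: {method}")
    return filled

def mi_mean(z, x_obs, rng, method, m=10):
    """Pooled point estimate of E[x]: average over m imputed data sets."""
    return float(np.mean([impute_once(z, x_obs, rng, method).mean() for _ in range(m)]))

rng = np.random.default_rng(2023)
z, x_full, missing = simulate(3000, rng)
x_obs = np.where(missing, np.nan, x_full)

results = {
    "full data": float(x_full.mean()),
    "CRA": float(np.nanmean(x_obs)),
    "MI linear": mi_mean(z, x_obs, rng, "linear"),
    "MI PMM": mi_mean(z, x_obs, rng, "pmm"),
}
for name, value in results.items():
    print(f"{name:>10}: {value:.3f}")
```

Under these assumptions the linear imputation model should under-predict X where Z is large, so both the CRA and the linear MI estimates of the exposure mean would be expected to fall well below the full-data value, while PMM should land close to it because donors are matched on a prediction that is monotone in Z. The estimand here is a mean rather than an exposure coefficient, so the sketch only mirrors the qualitative pattern reported in the abstract.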

CONCLUSION

When using MI methods that assume data are missing at random (MAR), compatibility between the analysis and imputation models is necessary but not sufficient to avoid bias. We propose a step-by-step procedure for identifying and correcting mis-specification of imputation models.
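The abstract does not spell out the individual steps of the procedure. As one plausible check, assumed here rather than taken from the paper, a candidate imputation model can be refitted in the complete records with an added quadratic term, and the t-statistic of that term inspected. The function name `quadratic_term_t_stat` and the simulated data are hypothetical.

```python
import numpy as np

def quadratic_term_t_stat(z, x):
    """t-statistic of the z**2 coefficient in x ~ 1 + z + z**2, using complete records."""
    keep = ~np.isnan(x) & ~np.isnan(z)
    z, x = z[keep], x[keep]
    design = np.column_stack([np.ones_like(z), z, z**2])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coef
    sigma2 = resid @ resid / (x.size - design.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)       # OLS covariance matrix
    return float(coef[2] / np.sqrt(cov[2, 2]))

rng = np.random.default_rng(0)
z = rng.normal(size=1000)
x_curved = z**2 + rng.normal(size=1000)        # truly quadratic in z
x_linear = 0.5 * z + rng.normal(size=1000)     # truly linear in z
x_curved[rng.uniform(size=1000) < 0.3] = np.nan  # some values missing completely at random

t_curved = quadratic_term_t_stat(z, x_curved)
t_linear = quadratic_term_t_stat(z, x_linear)
print(f"quadratic data: t = {t_curved:.1f}; linear data: t = {t_linear:.1f}")
```

A large absolute t-statistic suggests that a linear imputation model for x given z is mis-specified and that a non-linear term, or a more flexible method such as PMM, is worth considering. This check uses complete records only, so it can itself be distorted when missingness depends on the variable being modelled.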

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd80/7615471/0bce2858eb23/EMS192889-f001.jpg
