Selecting the model for multiple imputation of missing data: Just use an IC!

Affiliations

Discipline of Biomedical Informatics and Digital Health, The University of Sydney, Sydney, New South Wales, Australia.

School of Mathematics and Statistics, The University of New South Wales, Sydney, New South Wales, Australia.

Publication information

Stat Med. 2021 May 10;40(10):2467-2497. doi: 10.1002/sim.8915. Epub 2021 Feb 24.


DOI: 10.1002/sim.8915
PMID: 33629367
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8248419/
Abstract

Multiple imputation and maximum likelihood estimation (via the expectation-maximization algorithm) are two well-known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation-maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood-based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. Poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies, and demonstrate their practicality on two classical missing data examples. An interesting result we saw in simulations was that not only can parameter estimates be biased by misspecifying the imputation model, but also by overfitting the imputation model. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity.
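
The model-selection idea in the abstract can be illustrated with a small Python sketch. This is not the authors' procedure. To keep the observed-data log-likelihood available in closed form, it assumes a covariate x that is always observed and an outcome y that is missing at random (MAR) given x. Under those assumptions the observed-data likelihood of a candidate imputation model for y given x reduces to the likelihood over the rows where y is observed, so AIC = -2 log L + 2k and BIC = -2 log L + k log n can be computed directly (n = number of observed y values, k = number of model parameters). The candidate models (Gaussian polynomial regressions of degree 0 to 3), the simulated data, and the function names (solve, predict, fit_poly, select_imputation_model, impute_improper) are hypothetical choices made for illustration. The selected model is then used for improper imputation: missing values are drawn from the fitted model with its parameters held at their point estimates instead of being drawn from a posterior.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict(coefs, x):
    """Evaluate the polynomial coefs[0] + coefs[1]*x + ... at x."""
    return sum(c * x ** j for j, c in enumerate(coefs))


def fit_poly(xs, ys, degree):
    """Gaussian polynomial regression of y on x; returns (coefs, MLE of sigma^2)."""
    p = degree + 1
    xtx = [[0.0] * p for _ in range(p)]  # normal equations X^T X b = X^T y
    xty = [0.0] * p
    for x, y in zip(xs, ys):
        row = [x ** j for j in range(p)]
        for i in range(p):
            xty[i] += row[i] * y
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    coefs = solve(xtx, xty)
    rss = sum((y - predict(coefs, x)) ** 2 for x, y in zip(xs, ys))
    return coefs, rss / len(xs)


def select_imputation_model(xs, ys, max_degree=3):
    """Score candidate imputation models y | x ~ N(poly_d(x), sigma^2) by AIC and BIC.

    Assumption: x is fully observed and y is MAR given x, so the observed-data
    likelihood of y | x is the likelihood over the rows where y is observed.
    """
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    ox, oy = [x for x, _ in obs], [y for _, y in obs]
    n = len(obs)
    scores = {}
    for d in range(max_degree + 1):
        coefs, s2 = fit_poly(ox, oy, d)
        loglik = -0.5 * n * (math.log(2 * math.pi * s2) + 1)  # maximized Gaussian log-likelihood
        k = d + 2  # d + 1 regression coefficients plus the residual variance
        scores[d] = {"coefs": coefs, "sigma2": s2,
                     "aic": -2 * loglik + 2 * k, "bic": -2 * loglik + k * math.log(n)}
    aic_d = min(scores, key=lambda d: scores[d]["aic"])
    bic_d = min(scores, key=lambda d: scores[d]["bic"])
    return {"scores": scores, "aic_degree": aic_d, "bic_degree": bic_d}


def impute_improper(xs, ys, coefs, sigma2, rng):
    """One improper imputation: draw each missing y from the fitted model, parameters held fixed."""
    sd = math.sqrt(sigma2)
    return [predict(coefs, x) + rng.gauss(0.0, sd) if y is None else y
            for x, y in zip(xs, ys)]


# Hypothetical simulated data: the true model for y given x is quadratic.
rng = random.Random(1)
xs = [rng.uniform(-2, 2) for _ in range(400)]
full_y = [1 + 0.5 * x + 2 * x * x + rng.gauss(0, 1) for x in xs]
# MAR: whether y is missing depends only on the fully observed x.
ys = [None if rng.random() < (0.6 if x > 0 else 0.1) else y for x, y in zip(xs, full_y)]

result = select_imputation_model(xs, ys, max_degree=3)
best = result["scores"][result["bic_degree"]]
completed = [impute_improper(xs, ys, best["coefs"], best["sigma2"], rng) for _ in range(5)]
print("AIC picks degree", result["aic_degree"], "| BIC picks degree", result["bic_degree"])
```

Because log n exceeds 2 once n is larger than about 7, BIC charges each extra parameter more than AIC does, so on the same set of candidate fits it never picks a model with more parameters than AIC; this is one reason it may be preferred when overfitting the imputation model is a concern. In the general setting of the paper, where the observed-data likelihood has no closed form, it would have to be approximated, for example through the stochastic EM connection the abstract describes.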

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/86e0f877f689/SIM-40-2467-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/773b68c5451e/SIM-40-2467-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/9f8c022b3487/SIM-40-2467-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/a55bb104dfe2/SIM-40-2467-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/53bfcdf1099f/SIM-40-2467-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/725deedbe8bf/SIM-40-2467-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00f0/8248419/09723f8aa467/SIM-40-2467-g007.jpg
