Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables.

Authors

White Ian R, Daniel Rhian, Royston Patrick

Affiliations

MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR, Cambridge, UK.

Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, UK.

Publication information

Comput Stat Data Anal. 2010 Oct 1;54(10):2267-2275. doi: 10.1016/j.csda.2010.04.005.

DOI:10.1016/j.csda.2010.04.005

PMID:24748700

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3990447/

Abstract

Multiple imputation is a popular way to handle missing data. Automated procedures are widely available in standard software. However, such automated procedures may hide many assumptions and possible difficulties from the view of the data analyst. Imputation procedures such as monotone imputation and imputation by chained equations often involve the fitting of a regression model for a categorical outcome. If perfect prediction occurs in such a model, then automated procedures may give severely biased results. This is a problem in some standard software, but it may be avoided by bootstrap methods, penalised regression methods, or a new augmentation procedure.
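
The perfect prediction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure or any package's API: the function name `perfectly_predicted_levels` and the toy data are hypothetical, and the sketch assumes a single categorical predictor and a binary outcome. It only flags predictor levels within which every observed outcome is identical; in such a level the maximum-likelihood estimate of the log odds is not finite, which is the situation an automated imputation model may mishandle.

```python
from collections import defaultdict


def perfectly_predicted_levels(x, y):
    """Return the levels of x within which the observed outcome y takes one value only.

    Hypothetical helper for illustration. Pairs with a missing value (None)
    in either variable are skipped, as they would be when fitting the
    imputation model to the observed data.
    """
    outcomes = defaultdict(set)
    for xi, yi in zip(x, y):
        if xi is not None and yi is not None:
            outcomes[xi].add(yi)
    # A level is perfectly predicted if only one outcome value was ever seen in it.
    return sorted(level for level, seen in outcomes.items() if len(seen) == 1)


# Toy data (assumed): level "b" always has outcome 1, level "a" has both outcomes.
x = ["a", "a", "a", "b", "b", "b", "b"]
y = [0, 1, 0, 1, 1, 1, 1]

print(perfectly_predicted_levels(x, y))
```

A check of this kind only diagnoses the problem. The remedies named in the abstract (bootstrap methods, penalised regression, or the augmentation procedure) are what change the imputation model itself.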
