Guided multiple imputation of missing data: using a subsample to strengthen the missing-at-random assumption.

Authors

Fraser Gary, Yan Ru

Affiliation

Loma Linda University, Loma Linda, California, USA.

Publication details

Epidemiology. 2007 Mar;18(2):246-52. doi: 10.1097/01.ede.0000254708.40228.8b.

DOI:10.1097/01.ede.0000254708.40228.8b

PMID:17259903

Abstract

Multiple imputation can be a good solution to handling missing data if data are missing at random. However, this assumption is often difficult to verify. We describe an application of multiple imputation that makes this assumption plausible. This procedure requires contacting a random sample of subjects with incomplete data to fill in the missing information, and then adjusting the imputation model to incorporate the new data. Simulations with missing data that were decidedly not missing at random showed, as expected, that the method restored the original beta coefficients, whereas other methods of dealing with missing data failed. Using a dataset with real missing data, we found that different approaches to imputation produced moderately different results. Simulations suggest that filling in 10% of data that was initially missing is sufficient for imputation in many epidemiologic applications, and should produce approximately unbiased results, provided there is a high response on follow-up from the subsample of those with some originally missing data. This response can probably be achieved if this data collection is planned as an initial approach to dealing with the missing data, rather than at later stages, after further attempts that leave only data that is very difficult to complete.
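The procedure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation; it is a minimal example under stated assumptions: a single variable `z` that is missing not at random (larger values are more likely to be missing), one fully observed covariate `x`, a linear imputation model, a follow-up subsample of 10% of the incomplete records with full response, and the mean of `z` as the target (the paper itself studies regression coefficients). All function names and parameter values are hypothetical. Parameter uncertainty is approximated by bootstrapping the rows used to fit the imputation model, and only the point estimates are pooled.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Generate data in which z is missing NOT at random (assumed setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # fully observed covariate
    z = 1.0 + 0.5 * x + rng.normal(size=n)    # variable subject to missingness
    # Missingness depends on z itself: larger z is more likely to be missing.
    p_miss = 1.0 / (1.0 + np.exp(-(z - 1.0)))
    missing = rng.random(n) < p_miss
    return x, z, missing


def draw_followup(missing, fraction=0.10, seed=2):
    """Pick a random subsample of incomplete records to recontact.

    Assumes every recontacted subject responds.
    """
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(missing)
    size = int(round(fraction * idx.size))
    chosen = rng.choice(idx, size=size, replace=False)
    recovered = np.zeros_like(missing)        # boolean array of False
    recovered[chosen] = True
    return recovered


def fit_linear(x, z):
    """Ordinary least squares of z on x: (intercept, slope, residual sd)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sd = np.sqrt(resid @ resid / (len(z) - 2))
    return beta[0], beta[1], sd


def mi_mean(x, z, fit_rows, impute_rows, m=20, seed=1):
    """Multiply impute z on impute_rows and pool the m estimates of mean(z).

    The imputation model is fitted on a bootstrap resample of fit_rows in
    each round; true values on impute_rows are never used.
    """
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(fit_rows)
    k = int(impute_rows.sum())
    estimates = []
    for _ in range(m):
        boot = rng.choice(idx, size=idx.size, replace=True)
        a, b, sd = fit_linear(x[boot], z[boot])
        z_imp = z.copy()
        z_imp[impute_rows] = a + b * x[impute_rows] + rng.normal(scale=sd, size=k)
        estimates.append(z_imp.mean())
    return float(np.mean(estimates))


def run():
    x, z, missing = simulate()
    recovered = draw_followup(missing)
    return {
        "truth": float(z.mean()),
        "complete_case": float(z[~missing].mean()),
        # Standard MI: model fitted on originally observed rows (assumes MAR).
        "standard_mi": mi_mean(x, z, ~missing, missing),
        # Guided MI: model fitted on the followed-up subsample only.
        "guided_mi": mi_mean(x, z, recovered, missing & ~recovered),
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(f"{name:14s} {value:.3f}")
```

In this setup the guided estimate should land close to the full-data mean, while the complete-case and standard estimates are pulled downward because the records they learn from under-represent large values of `z`. Fitting the imputation model on the followed-up subsample works here because that subsample is a random draw from the incomplete records, so within that group the remaining values are missing at random by design.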

Similar articles

Dealing with missing data in a multi-question depression scale: a comparison of imputation methods.

BMC Med Res Methodol. 2006 Dec 13;6:57. doi: 10.1186/1471-2288-6-57.

Missing data in the American College of Surgeons National Surgical Quality Improvement Program are not missing at random: implications and potential impact on quality assessments.

J Am Coll Surg. 2010 Feb;210(2):125-139.e2. doi: 10.1016/j.jamcollsurg.2009.10.021.

[Multiple imputation of missing at random data: General points and presentation of a Monte-Carlo method].

Rev Epidemiol Sante Publique. 2009 Oct;57(5):361-72. doi: 10.1016/j.respe.2009.04.011. Epub 2009 Aug 11.

Methods for handling missing data in palliative care research.

Palliat Med. 2006 Dec;20(8):791-8. doi: 10.1177/0269216306072555.

Missing data on the Center for Epidemiologic Studies Depression Scale: a comparison of 4 imputation techniques.

Res Social Adm Pharm. 2007 Mar;3(1):1-27. doi: 10.1016/j.sapharm.2006.04.001.

[Roaming through methodology. XVI. What to do about missing data].

Ned Tijdschr Geneeskd. 1999 Oct 2;143(40):1996-2000.

Comparison of imputation and modelling methods in the analysis of a physical activity trial with missing outcomes.

Int J Epidemiol. 2005 Feb;34(1):89-99. doi: 10.1093/ije/dyh297. Epub 2004 Aug 27.

Sequential imputation for missing values.

Comput Biol Chem. 2007 Oct;31(5-6):320-7. doi: 10.1016/j.compbiolchem.2007.07.001. Epub 2007 Jul 10.

Using the outcome for imputation of missing predictor values was preferred.

J Clin Epidemiol. 2006 Oct;59(10):1092-101. doi: 10.1016/j.jclinepi.2006.01.009. Epub 2006 Jun 19.

Cited by

Longitudinal associations between vegetarian dietary habits and site-specific cancers in the Adventist Health Study-2 North American cohort.

Am J Clin Nutr. 2025 Aug;122(2):535-543. doi: 10.1016/j.ajcnut.2025.06.006. Epub 2025 Jun 9.

The association between time spent outdoors during daylight and mortality among participants of the Adventist Health Study 2 Cohort.

Environ Epidemiol. 2025 May 28;9(3):e401. doi: 10.1097/EE9.0000000000000401. eCollection 2025 Jun.

Living longer and lifestyle: A report on the oldest of the old in the Adventist Health Study-2.

JAR Life. 2025 Apr 11;14:100010. doi: 10.1016/j.jarlif.2025.100010. eCollection 2025.

Maternal Folic Acid and Dietary Folate Intake in Relation to Sex Ratio at Birth and Sex-Specific Birth Weight in China.

Nutrients. 2024 Sep 16;16(18):3122. doi: 10.3390/nu16183122.

The impact of green space on nonaccidental and cause-specific mortality in the Adventist Health Study-2 population.

Environ Epidemiol. 2024 Aug 14;8(5):e332. doi: 10.1097/EE9.0000000000000332. eCollection 2024 Oct.

Cause-specific and all-cause mortalities in vegetarian compared with those in nonvegetarian participants from the Adventist Health Study-2 cohort.

Am J Clin Nutr. 2024 Oct;120(4):907-917. doi: 10.1016/j.ajcnut.2024.07.028. Epub 2024 Aug 2.

Synergistic effect of non-alcoholic fatty liver disease and history of gestational diabetes to increase risk of type 2 diabetes.

Eur J Epidemiol. 2023 Aug;38(8):901-911. doi: 10.1007/s10654-023-01016-1. Epub 2023 May 31.

The benefit of vegetarian diets for reducing blood pressure in Taiwan: a historically prospective cohort study.

J Health Popul Nutr. 2023 May 9;42(1):41. doi: 10.1186/s41043-023-00377-3.

Interpersonal psychotherapy versus sertraline for women with posttraumatic stress disorder following recent sexual assault: a randomized clinical trial.

Eur J Psychotraumatol. 2022 Oct 14;13(2):2127474. doi: 10.1080/20008066.2022.2127474. eCollection 2022.

Are total omega-3 and omega-6 polyunsaturated fatty acids predictors of fatal stroke in the Adventist Health Study 2 prospective cohort?

PLoS One. 2022 Sep 9;17(9):e0274109. doi: 10.1371/journal.pone.0274109. eCollection 2022.
