Multiple imputation using an iterative hot-deck with distance-based donor selection.

Author information

Siddique Juned, Belin Thomas R

Affiliation

Department of Health Studies, University of Chicago, Chicago, IL 60637, U.S.A.

Publication information

Stat Med. 2008 Jan 15;27(1):83-102. doi: 10.1002/sim.3001.

DOI:10.1002/sim.3001

PMID:17634973

Abstract

Hot-deck imputation offers advantages in reflecting salient features of data distributions in missing-data problems, but previous implementations have lacked the appeal associated with modern Bayesian statistical-computing techniques. We outline a strategy of iterative hot-deck multiple imputation with distance-based donor selection. With distance defined as a monotonic function of the difference in predictive means between cases, donors are chosen with probability inversely proportional to their distance from the donee. This method retains the implementation ease of ad hoc techniques, while incorporating the desirable features of Bayesian approaches. Special cases of our method include nearest-neighbor imputation and a simple random hot-deck. Iterating the procedure provides an analogy to Markov Chain Monte Carlo methods and is intended to mitigate dependence on starting values. Results from imputing missing values in a longitudinal depression treatment trial as well as a simulation study are presented. We evaluate how different definitions of distance, choices of starting values, the order in which variables are chosen for imputation, and the number of iterations impact inferences. We show that our measure of distance controls the tradeoff between bias and variance of our estimates. We find that inferences from the depression treatment trial are not sensitive to most definitions of distance. In addition, while differences exist between 1 iteration and 10 iterations, there are no meaningful differences between inferences based on 10 iterations and those based on 500 iterations. The choice of starting value did not have an impact on inferences but the order in which the variables were chosen for imputation was significant even after iteration.
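The donor-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The distance form `(|difference| + delta) ** k`, the parameter names `k` and `delta`, and the function names are assumptions made for illustration, and the predictive means are assumed to come from a regression model fitted elsewhere. Under this assumed form, `k = 0` gives every donor the same probability (a simple random hot-deck), and a large `k` concentrates the probability on the closest donor (approaching nearest-neighbor imputation).

```python
import numpy as np


def donor_probabilities(pred_donee, pred_donors, k=1.0, delta=1e-6):
    """Selection probabilities inversely proportional to distance.

    Assumed distance: (|difference in predictive means| + delta) ** k,
    a monotonic function of the difference. delta avoids division by zero
    when a donor has exactly the same predictive mean as the donee.
    """
    diff = np.abs(np.asarray(pred_donors, dtype=float) - pred_donee)
    # Work on the log scale so that a large k does not overflow.
    log_weights = -k * np.log(diff + delta)
    log_weights -= log_weights.max()
    weights = np.exp(log_weights)
    return weights / weights.sum()


def hot_deck_draw(pred_donee, pred_donors, donor_values, k=1.0, rng=None):
    """Draw one observed donor value to fill in the donee's missing value."""
    rng = np.random.default_rng() if rng is None else rng
    probs = donor_probabilities(pred_donee, pred_donors, k=k)
    idx = rng.choice(len(donor_values), p=probs)
    return donor_values[idx]


# Hypothetical data: three donors with predictive means and observed values.
pred_donors = [9.5, 10.1, 14.0]
donor_values = [11.0, 9.0, 15.0]

p_random = donor_probabilities(10.0, pred_donors, k=0)    # equal probabilities
p_weighted = donor_probabilities(10.0, pred_donors, k=1)  # closer donors favored
p_nearest = donor_probabilities(10.0, pred_donors, k=50)  # nearly all mass on closest
imputed = hot_deck_draw(10.0, pred_donors, donor_values, k=1,
                        rng=np.random.default_rng(0))
```

In a multiple-imputation setting, a draw like this would be repeated for each missing value and each imputed data set, and, as the abstract describes, the whole procedure would be iterated over variables so that the result depends less on the starting values.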

Similar articles

Multiple imputation using an iterative hot-deck with distance-based donor selection.

Stat Med. 2008 Jan 15;27(1):83-102. doi: 10.1002/sim.3001.

The relationship between hot-deck multiple imputation and weighted likelihood.

Stat Med. 1997;16(1-3):5-19. doi: 10.1002/(sici)1097-0258(19970115)16:1<5::aid-sim469>3.0.co;2-8.

Bayesian Extended Redundancy Analysis: A Bayesian Approach to Component-based Regression with Dimension Reduction.

Multivariate Behav Res. 2020 Jan-Feb;55(1):30-48. doi: 10.1080/00273171.2019.1598837. Epub 2019 Apr 25.

Missing data on the Center for Epidemiologic Studies Depression Scale: a comparison of 4 imputation techniques.

Res Social Adm Pharm. 2007 Mar;3(1):1-27. doi: 10.1016/j.sapharm.2006.04.001.

[Markov Chain Monte Carlo Method of multiple imputation for longitudinal data with missing values in the survey of maternal and children health].

Sichuan Da Xue Xue Bao Yi Xue Ban. 2005 May;36(3):422-5.

Multiple imputation in the presence of high-dimensional data.

Stat Methods Med Res. 2016 Oct;25(5):2021-2035. doi: 10.1177/0962280213511027. Epub 2013 Nov 25.

The HCUP SID Imputation Project: Improving Statistical Inferences for Health Disparities Research by Imputing Missing Race Data.

Health Serv Res. 2018 Jun;53(3):1870-1889. doi: 10.1111/1475-6773.12704. Epub 2017 May 4.

A nonparametric multiple imputation approach for missing categorical data.

BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.

Cox regression analysis with missing covariates via nonparametric multiple imputation.

Stat Methods Med Res. 2019 Jun;28(6):1676-1688. doi: 10.1177/0962280218772592. Epub 2018 May 2.

Multiple imputation by predictive mean matching in cluster-randomized trials.

BMC Med Res Methodol. 2020 Mar 30;20(1):72. doi: 10.1186/s12874-020-00948-6.

Cited by

Challenge of missing data in observational studies: investigating cross-sectional imputation methods for assessing disease activity in axial spondyloarthritis.

RMD Open. 2025 Feb 20;11(1):e004844. doi: 10.1136/rmdopen-2024-004844.

The impact of misclassifications and outliers on imputation methods.

J Appl Stat. 2024 Mar 5;51(14):2894-2928. doi: 10.1080/02664763.2024.2325969. eCollection 2024.

Hemagglutination Inhibition Antibody Titers as Mediators of Influenza Vaccine Efficacy Against Symptomatic Influenza A(H1N1), A(H3N2), and B/Victoria Virus Infections.

J Infect Dis. 2024 Jul 25;230(1):152-160. doi: 10.1093/infdis/jiae122.

Effect of supplementary private health insurance on out-of-pocket inpatient medical expenditure: evidence from Malaysia.

Health Policy Plan. 2024 Mar 12;39(3):268-280. doi: 10.1093/heapol/czae004.

Usability and feasibility of E-nergEYEze: a blended vision-specific E-health based cognitive behavioral therapy and self-management intervention to reduce fatigue in adults with visual impairment.

BMC Health Serv Res. 2023 Nov 16;23(1):1271. doi: 10.1186/s12913-023-10193-4.

Graphical and numerical diagnostic tools to assess multiple imputation models by posterior predictive checking.

Heliyon. 2023 Jun 13;9(6):e17077. doi: 10.1016/j.heliyon.2023.e17077. eCollection 2023 Jun.

A multiple imputation-based sensitivity analysis approach for regression analysis with a missing not at random covariate.

Stat Med. 2023 Jun 30;42(14):2275-2292. doi: 10.1002/sim.9723. Epub 2023 Mar 30.

Imputing Missing Data in Hourly Traffic Counts.

Sensors (Basel). 2022 Dec 15;22(24):9876. doi: 10.3390/s22249876.

Same same, but different: A psychometric examination of three frequently used experimental tasks for cognitive bias assessment in a sample of healthy young adults.

Behav Res Methods. 2023 Apr;55(3):1332-1351. doi: 10.3758/s13428-022-01804-9. Epub 2022 Jun 1.

Self-Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study.

JMIR Public Health Surveill. 2021 Oct 13;7(10):e30824. doi: 10.2196/30824.
