Multiple imputation using an iterative hot-deck with distance-based donor selection.

Authors

Siddique Juned, Belin Thomas R

Affiliation

Department of Health Studies, University of Chicago, Chicago, IL 60637, U.S.A.

Publication

Stat Med. 2008 Jan 15;27(1):83-102. doi: 10.1002/sim.3001.

DOI: 10.1002/sim.3001
PMID: 17634973
Abstract

Hot-deck imputation offers advantages in reflecting salient features of data distributions in missing-data problems, but previous implementations have lacked the appeal associated with modern Bayesian statistical-computing techniques. We outline a strategy of iterative hot-deck multiple imputation with distance-based donor selection. With distance defined as a monotonic function of the difference in predictive means between cases, donors are chosen with probability inversely proportional to their distance from the donee. This method retains the implementation ease of ad hoc techniques, while incorporating the desirable features of Bayesian approaches. Special cases of our method include nearest-neighbor imputation and a simple random hot-deck. Iterating the procedure provides an analogy to Markov Chain Monte Carlo methods and is intended to mitigate dependence on starting values. Results from imputing missing values in a longitudinal depression treatment trial as well as a simulation study are presented. We evaluate how different definitions of distance, choices of starting values, the order in which variables are chosen for imputation, and the number of iterations impact inferences. We show that our measure of distance controls the tradeoff between bias and variance of our estimates. We find that inferences from the depression treatment trial are not sensitive to most definitions of distance. In addition, while differences exist between 1 iteration and 10 iterations, there are no meaningful differences between inferences based on 10 iterations and those based on 500 iterations. The choice of starting value did not have an impact on inferences but the order in which the variables were chosen for imputation was significant even after iteration.
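The donor-selection step described in the abstract can be illustrated with a minimal sketch. The abstract says only that distance is a monotonic function of the difference in predictive means and that donors are drawn with probability inversely proportional to distance. The power form `(|difference| + offset) ** k`, the parameter names `k` and `offset`, and the function names below are assumptions made for illustration, not the authors' published specification.

```python
import numpy as np


def donor_probabilities(pred_donors, pred_donee, k=1.0, offset=1e-8):
    """Selection probability for each donor, inversely proportional to distance.

    Assumed distance (hypothetical form): (|pred_donor - pred_donee| + offset) ** k.
    Weights are computed in log space so that large k does not overflow.
    """
    diff = np.abs(np.asarray(pred_donors, dtype=float) - float(pred_donee))
    log_weights = -k * np.log(diff + offset)   # log(1 / distance)
    log_weights -= log_weights.max()           # stabilise before exponentiating
    weights = np.exp(log_weights)
    return weights / weights.sum()


def draw_donor_value(y_donors, pred_donors, pred_donee, k=1.0, rng=None):
    """Impute one missing value by sampling an observed value from a donor."""
    rng = np.random.default_rng() if rng is None else rng
    probs = donor_probabilities(pred_donors, pred_donee, k=k)
    idx = rng.choice(len(probs), p=probs)
    return np.asarray(y_donors)[idx]


if __name__ == "__main__":
    y_obs = np.array([10.0, 12.0, 20.0])    # observed outcomes of donors
    pred_obs = np.array([1.0, 2.0, 5.0])    # predictive means of donors
    rng = np.random.default_rng(0)
    print(donor_probabilities(pred_obs, 1.9, k=1.0))
    print(draw_donor_value(y_obs, pred_obs, 1.9, k=1.0, rng=rng))
```

Under this assumed form, `k = 0` gives every donor equal probability, which corresponds to a simple random hot-deck, while a large `k` concentrates nearly all probability on the closest donor, which approximates nearest-neighbor imputation. Intermediate values trade bias against variance, in line with the tradeoff the abstract attributes to the distance measure.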


Similar articles

1. Multiple imputation using an iterative hot-deck with distance-based donor selection.
Stat Med. 2008 Jan 15;27(1):83-102. doi: 10.1002/sim.3001.
2. The relationship between hot-deck multiple imputation and weighted likelihood.
Stat Med. 1997;16(1-3):5-19. doi: 10.1002/(sici)1097-0258(19970115)16:1<5::aid-sim469>3.0.co;2-8.
3. Bayesian Extended Redundancy Analysis: A Bayesian Approach to Component-based Regression with Dimension Reduction.
Multivariate Behav Res. 2020 Jan-Feb;55(1):30-48. doi: 10.1080/00273171.2019.1598837. Epub 2019 Apr 25.
4. Missing data on the Center for Epidemiologic Studies Depression Scale: a comparison of 4 imputation techniques.
Res Social Adm Pharm. 2007 Mar;3(1):1-27. doi: 10.1016/j.sapharm.2006.04.001.
5. [Markov Chain Monte Carlo Method of multiple imputation for longitudinal data with missing values in the survey of maternal and children health].
Sichuan Da Xue Xue Bao Yi Xue Ban. 2005 May;36(3):422-5.
6. Multiple imputation in the presence of high-dimensional data.
Stat Methods Med Res. 2016 Oct;25(5):2021-2035. doi: 10.1177/0962280213511027. Epub 2013 Nov 25.
7. The HCUP SID Imputation Project: Improving Statistical Inferences for Health Disparities Research by Imputing Missing Race Data.
Health Serv Res. 2018 Jun;53(3):1870-1889. doi: 10.1111/1475-6773.12704. Epub 2017 May 4.
8. A nonparametric multiple imputation approach for missing categorical data.
BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.
9. Cox regression analysis with missing covariates via nonparametric multiple imputation.
Stat Methods Med Res. 2019 Jun;28(6):1676-1688. doi: 10.1177/0962280218772592. Epub 2018 May 2.
10. Multiple imputation by predictive mean matching in cluster-randomized trials.
BMC Med Res Methodol. 2020 Mar 30;20(1):72. doi: 10.1186/s12874-020-00948-6.

Cited by

1. Challenge of missing data in observational studies: investigating cross-sectional imputation methods for assessing disease activity in axial spondyloarthritis.
RMD Open. 2025 Feb 20;11(1):e004844. doi: 10.1136/rmdopen-2024-004844.
2. The impact of misclassifications and outliers on imputation methods.
J Appl Stat. 2024 Mar 5;51(14):2894-2928. doi: 10.1080/02664763.2024.2325969. eCollection 2024.
3. Hemagglutination Inhibition Antibody Titers as Mediators of Influenza Vaccine Efficacy Against Symptomatic Influenza A(H1N1), A(H3N2), and B/Victoria Virus Infections.
J Infect Dis. 2024 Jul 25;230(1):152-160. doi: 10.1093/infdis/jiae122.
4. Effect of supplementary private health insurance on out-of-pocket inpatient medical expenditure: evidence from Malaysia.
Health Policy Plan. 2024 Mar 12;39(3):268-280. doi: 10.1093/heapol/czae004.
5. Usability and feasibility of E-nergEYEze: a blended vision-specific E-health based cognitive behavioral therapy and self-management intervention to reduce fatigue in adults with visual impairment.
BMC Health Serv Res. 2023 Nov 16;23(1):1271. doi: 10.1186/s12913-023-10193-4.
6. Graphical and numerical diagnostic tools to assess multiple imputation models by posterior predictive checking.
Heliyon. 2023 Jun 13;9(6):e17077. doi: 10.1016/j.heliyon.2023.e17077. eCollection 2023 Jun.
7. A multiple imputation-based sensitivity analysis approach for regression analysis with a missing not at random covariate.
Stat Med. 2023 Jun 30;42(14):2275-2292. doi: 10.1002/sim.9723. Epub 2023 Mar 30.
8. Imputing Missing Data in Hourly Traffic Counts.
Sensors (Basel). 2022 Dec 15;22(24):9876. doi: 10.3390/s22249876.
9. Same same, but different: A psychometric examination of three frequently used experimental tasks for cognitive bias assessment in a sample of healthy young adults.
Behav Res Methods. 2023 Apr;55(3):1332-1351. doi: 10.3758/s13428-022-01804-9. Epub 2022 Jun 1.
10. Self-Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study.
JMIR Public Health Surveill. 2021 Oct 13;7(10):e30824. doi: 10.2196/30824.