Multiple Imputation for Bounded Variables.

Affiliation

Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 915 Greene Street, Columbia, SC, 29208, USA.

Publication information

Psychometrika. 2018 Dec;83(4):919-940. doi: 10.1007/s11336-018-9616-y. Epub 2018 Apr 26.

DOI:10.1007/s11336-018-9616-y

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6662738/

Abstract

Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These approaches can be unsatisfactory when applied to bounded variables as they can produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Proper support of the imputed values is ensured by applying a family of transformations with singly or doubly bounded range. Simulation studies demonstrate that our method is able to deal with skewness, bimodality, and heteroscedasticity and has superior properties as compared to competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
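To make the idea of "ensuring proper support through a bounded-range transformation" concrete, here is a minimal Python sketch. It is a simplified stand-in and not the quantile-based model proposed in the paper: it assumes a plain logit-type transform, a normal linear model on the transformed scale, and a single covariate. The function names (`to_unbounded`, `to_bounded`, `impute_bounded`) and the simulated data are hypothetical. For brevity it also skips the parameter-uncertainty draw that a fully proper multiple imputation would include, and it assumes all observed values lie strictly inside the bounds.

```python
import numpy as np


def to_unbounded(y, lower, upper):
    """Logit-type map from the open interval (lower, upper) to the real line."""
    p = (y - lower) / (upper - lower)
    return np.log(p / (1.0 - p))


def to_bounded(z, lower, upper):
    """Inverse map from the real line back to (lower, upper)."""
    return lower + (upper - lower) / (1.0 + np.exp(-z))


def impute_bounded(y, x, lower, upper, rng):
    """Fill NaNs in y with random draws that respect the bounds."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    missing = np.isnan(y)

    # Fit a straight line on the transformed scale, observed cases only.
    design = np.column_stack([np.ones(x.size), x])
    z_obs = to_unbounded(y[~missing], lower, upper)
    beta, *_ = np.linalg.lstsq(design[~missing], z_obs, rcond=None)
    resid = z_obs - design[~missing] @ beta
    sigma = resid.std(ddof=2)  # two estimated coefficients

    # Draw on the unbounded scale, then back-transform into (lower, upper).
    z_draw = design[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
    completed = y.copy()
    completed[missing] = to_bounded(z_draw, lower, upper)
    return completed


# Illustrative data only: a score bounded on (0, 100), about 30% missing.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = to_bounded(0.5 + 1.5 * x + rng.normal(0.0, 1.0, 200), 0.0, 100.0)
y[rng.random(200) < 0.3] = np.nan

# Five completed data sets, as in a typical multiple-imputation workflow.
completed_sets = [impute_bounded(y, x, 0.0, 100.0, rng) for _ in range(5)]
```

Because every draw is made on the unbounded scale and then mapped back, no imputed value can fall outside the interval. Imputing directly on the original scale with a normal model offers no such guarantee.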

Similar articles

1

Multiple Imputation for Bounded Variables.

Psychometrika. 2018 Dec;83(4):919-940. doi: 10.1007/s11336-018-9616-y. Epub 2018 Apr 26.

2

Missing data on the Center for Epidemiologic Studies Depression Scale: a comparison of 4 imputation techniques.

Res Social Adm Pharm. 2007 Mar;3(1):1-27. doi: 10.1016/j.sapharm.2006.04.001.

3

Multiple imputation in the presence of non-normal data.

Stat Med. 2017 Feb 20;36(4):606-617. doi: 10.1002/sim.7173. Epub 2016 Nov 15.

4

Comparison of methods for imputing limited-range variables: a simulation study.

BMC Med Res Methodol. 2014 Apr 26;14:57. doi: 10.1186/1471-2288-14-57.

5

Multiple imputation strategies for a bounded outcome variable in a competing risks analysis.

Stat Med. 2021 Apr 15;40(8):1917-1929. doi: 10.1002/sim.8879. Epub 2021 Jan 19.

6

Dealing with missing data in a multi-question depression scale: a comparison of imputation methods.

BMC Med Res Methodol. 2006 Dec 13;6:57. doi: 10.1186/1471-2288-6-57.

7

A bias-corrected estimator in multiple imputation for missing data.

Stat Med. 2018 Oct 15;37(23):3373-3386. doi: 10.1002/sim.7833. Epub 2018 May 29.

8

Nonlinear multiple imputation for continuous covariate within semiparametric Cox model: application to HIV data in Senegal.

Stat Med. 2013 Nov 20;32(26):4651-65. doi: 10.1002/sim.5854. Epub 2013 May 28.

9

Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study.

BMC Med Res Methodol. 2013 Nov 20;13:144. doi: 10.1186/1471-2288-13-144.

10

A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.

Cited by

1

Comparison performance of the Bayesian Approach with the Weibull and Birnbaum-Saunders distributions in imputation of time-to-event censors.

PLoS One. 2024 Jan 22;19(1):e0295977. doi: 10.1371/journal.pone.0295977. eCollection 2024.

2

Survival analysis of recurrent breast cancer patients using mix Bayesian network.

Heliyon. 2023 Sep 21;9(10):e20360. doi: 10.1016/j.heliyon.2023.e20360. eCollection 2023 Oct.

3

Measurement error and misclassification in electronic medical records: methods to mitigate bias.

Curr Epidemiol Rep. 2018 Dec;5(4):343-356. doi: 10.1007/s40471-018-0164-x. Epub 2018 Sep 10.

4

Are spatial models advantageous for predicting county-level HIV epidemiology across the United States?

Spat Spatiotemporal Epidemiol. 2021 Aug;38:100436. doi: 10.1016/j.sste.2021.100436. Epub 2021 Jun 16.
