Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 915 Greene Street, Columbia, SC, 29208, USA.
Psychometrika. 2018 Dec;83(4):919-940. doi: 10.1007/s11336-018-9616-y. Epub 2018 Apr 26.
Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These approaches can be unsatisfactory when applied to bounded variables as they can produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Proper support of the imputed values is ensured by applying a family of transformations with singly or doubly bounded range. Simulation studies demonstrate that our method is able to deal with skewness, bimodality, and heteroscedasticity and has superior properties as compared to competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
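The idea of imputing through a transformation with bounded range can be illustrated with a minimal sketch. It is not the authors' implementation: the plain logit stands in for the paper's family of transformations, the quantile regression is fitted by a simple iteratively reweighted least squares approximation, and all function names, the simulated data, and the bounds (0, 100) are hypothetical. For each missing value, a quantile level is drawn at random, a linear quantile model is fitted on the transformed (unbounded) scale, and the prediction is mapped back so that it necessarily falls inside the bounds.

```python
import numpy as np

def logit_transform(y, lower, upper):
    """Map values in (lower, upper) to the real line (stand-in transformation)."""
    p = (y - lower) / (upper - lower)
    return np.log(p / (1.0 - p))

def inverse_logit_transform(z, lower, upper):
    """Map real values back into (lower, upper)."""
    return lower + (upper - lower) / (1.0 + np.exp(-z))

def fit_quantile(X, z, tau, n_iter=50, eps=1e-6):
    """Approximate linear quantile regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, z, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = z - X @ beta
        # Check-loss weights: tau for positive residuals, 1 - tau for negative ones.
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
    return beta

def impute_bounded(x, y, lower, upper, rng):
    """Return a copy of y with NaNs replaced by draws that respect the bounds."""
    y = y.copy()
    missing = np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    z_obs = logit_transform(y[~missing], lower, upper)
    for i in np.flatnonzero(missing):
        # Assumption: quantile levels are kept away from 0 and 1 for numerical stability.
        tau = rng.uniform(0.01, 0.99)
        beta = fit_quantile(X[~missing], z_obs, tau)
        y[i] = inverse_logit_transform(X[i] @ beta, lower, upper)
    return y

# Simulated example: a score bounded on (0, 100) with about 25% of values missing.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = inverse_logit_transform(0.5 + x + rng.normal(size=200), 0.0, 100.0)
y[rng.random(200) < 0.25] = np.nan
completed = impute_bounded(x, y, 0.0, 100.0, rng)
```

This produces a single completed dataset. Multiple imputation would repeat the procedure several times and combine the resulting analyses. The sketch also ignores uncertainty in the fitted coefficients, which a proper imputation procedure would need to address.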