A nonparametric multiple imputation approach for missing categorical data.

Author information

Zhou Muhan, He Yulei, Yu Mandi, Hsu Chiu-Hsieh

Affiliations

Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, 1295 N. Martin Ave., Tucson, 85724, USA.

Division of Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, 20782, USA.

Publication information

BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.

Abstract

BACKGROUND

Incomplete categorical variables with more than two categories are common in public health data. However, most existing missing-data methods do not use information from nonresponse (missingness) probabilities.

METHODS

We propose a nearest-neighbour multiple imputation approach to impute a categorical outcome that is missing at random and to estimate the proportion of each category. The donor set for imputation is formed by measuring the distance between each missing value and the non-missing values. The distance function is calculated from a predictive score, which is derived from two working models: one fits a multinomial logistic regression to predict the missing categorical outcome (the outcome model), and the other fits a logistic regression to predict missingness probabilities (the missingness model). A weighting scheme balances the contributions of the two working models when generating the predictive score. Each missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation study to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented.
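The donor-selection step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two working models are assumed to have been fitted elsewhere, and each is reduced to a single scalar score per subject (`score_out` from the outcome model, `score_miss` from the missingness model). The weighted Euclidean distance on standardized scores, the default weight `w_out=0.8`, the donor-set size `n_donors=5`, and all function names are hypothetical choices for illustration. A proper multiple imputation would also refit the working models on a bootstrap sample before each imputation; that step is omitted here, and the category proportions are simply averaged across the `m` completed data sets (the Rubin's-rules point estimate).

```python
import math
import random
from collections import Counter


def standardize(values):
    """Center and scale scores to mean 0 and sample SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def nn_impute(y, score_out, score_miss, w_out=0.8, n_donors=5, rng=None):
    """Return one completed copy of y, where None marks a missing value.

    Assumption: each working model is summarized by one scalar score per
    subject; fitting those models is not shown here.
    """
    if rng is None:
        rng = random.Random()
    z_out, z_miss = standardize(score_out), standardize(score_miss)
    observed = [j for j, v in enumerate(y) if v is not None]
    completed = list(y)
    for i, v in enumerate(y):
        if v is not None:
            continue
        # Assumed distance: weighted Euclidean distance on the two scores,
        # with weight w_out on the outcome model and 1 - w_out on missingness.
        ranked = sorted(
            (math.sqrt(w_out * (z_out[i] - z_out[j]) ** 2
                       + (1 - w_out) * (z_miss[i] - z_miss[j]) ** 2), j)
            for j in observed
        )
        donors = [j for _, j in ranked[:n_donors]]
        # Impute by drawing one donor's observed category at random.
        completed[i] = y[rng.choice(donors)]
    return completed


def mi_proportions(y, score_out, score_miss, m=5, seed=1, **kwargs):
    """Average each category's proportion over m imputed data sets."""
    rng = random.Random(seed)
    categories = sorted({v for v in y if v is not None})
    totals = {c: 0.0 for c in categories}
    for _ in range(m):
        counts = Counter(nn_impute(y, score_out, score_miss, rng=rng, **kwargs))
        for c in categories:
            totals[c] += counts[c] / len(y)
    return {c: totals[c] / m for c in categories}


if __name__ == "__main__":
    # Hypothetical toy data with two missing outcomes.
    y = ["A", None, "B", "C", None, "A", "B", "C"]
    score_out = [0.1, 0.2, 0.5, 0.9, 0.85, 0.15, 0.55, 0.8]
    score_miss = [0.2, 0.6, 0.3, 0.4, 0.7, 0.25, 0.35, 0.45]
    print(mi_proportions(y, score_out, score_miss, m=10, n_donors=2))
```

Setting `w_out=1.0` uses only the outcome-model score and `w_out=0.0` only the missingness-model score; intermediate values trade off the two, which is where the choice of weights enters.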

RESULTS

The simulation study suggests that the proposed method performs well under some misspecifications of the working models, provided that missingness probabilities are not extreme. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, the weights must be chosen appropriately to balance the contributions of the two working models and achieve optimal results for the proposed method.
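The instability described above can be illustrated with a few lines of arithmetic. The abstract does not define the calibration estimator, so this sketch swaps in a simpler Hajek-type inverse-probability-weighted (IPW) estimator of a category proportion, which shares the key ingredient: each respondent is weighted by the inverse of its estimated response probability (one minus its missingness probability). The function name `ipw_proportion` and the toy data are hypothetical. When one respondent's missingness probability is 0.99, its weight of 100 swamps everyone else's and drags the estimate toward that single value.

```python
def ipw_proportion(y, resp_prob, category):
    """Hajek-type inverse-probability-weighted estimate of P(Y == category).

    y: outcome values, with None marking a missing value.
    resp_prob: estimated probability that each value is observed
    (1 minus the missingness probability).
    """
    num = den = 0.0
    for value, p in zip(y, resp_prob):
        if value is None:
            continue  # missing outcomes contribute nothing directly
        weight = 1.0 / p  # grows without bound as p approaches 0
        num += weight * (value == category)
        den += weight
    return num / den


if __name__ == "__main__":
    # Hypothetical data: nine respondents in "A", one in "B", one nonrespondent.
    y = ["A"] * 9 + ["B", None]
    moderate = [0.9] * 10 + [0.5]
    extreme = [0.9] * 9 + [0.01, 0.5]  # the "B" respondent had a 99% chance of being missing
    print(round(ipw_proportion(y, moderate, "B"), 3))  # → 0.1
    print(round(ipw_proportion(y, extreme, "B"), 3))   # → 0.909
```

Changing a single respondent's response probability from 0.9 to 0.01 moves the estimated proportion of "B" from 0.1 to about 0.909, because that one unit now carries 100 of the 110 total weight units.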

CONCLUSIONS

We conclude that the proposed multiple imputation method is a reasonable approach for handling missing categorical outcome data with more than two levels when the goal is to assess the distribution of the outcome. For the working models, we suggest a multinomial logistic regression to predict the missing outcome and a binary logistic regression to predict the missingness probability.

Similar articles

Doubly robust multiple imputation using kernel-based techniques.
Biom J. 2016 May;58(3):588-606. doi: 10.1002/bimj.201400256. Epub 2015 Dec 9.
