Institute of Health Statistics and Intelligent Analysis, School of Public Health, Lanzhou University, Lanzhou, Gansu, P. R. China.
Department of Epidemiology and Health Statistics, School of Public Health, Lanzhou University, Lanzhou, Gansu, P. R. China.
Proc Inst Mech Eng H. 2023 Dec;237(12):1427-1440. doi: 10.1177/09544119231206456. Epub 2023 Oct 24.
Missing values often limit how fully epidemiological survey data can be used. In this study, we use the cut-off value from the medical diagnostic standard of fasting blood glucose for diabetes to divide fasting blood glucose test data from the 2009 China Health and Nutrition Survey (CHNS) for Shandong province into two classes: normal and abnormal. For missing fasting blood glucose values, we propose a two-stage prediction filling method based on support vector techniques optimized by either particle swarm optimization (PSO) or grey wolf optimizer (GWO), with the two optimizers competing at each stage. The first stage predicts the class of the missing value with a support vector machine (SVM); the second stage predicts the missing value itself with support vector regression (SVR) within the predicted class. We use LIBSVM as the gold standard for training both the SVM and the SVR at their respective stages. Comparing the two optimizers stage by stage, GWO gives the highest classification accuracy (91.1%) in the first stage, and PSO gives the smallest in-class mean absolute error (0.48) in the second stage. GWO-SVM-PSO-SVR is therefore selected as the optimal model, and its predicted value serves as the fill value. The model comparison in the empirical analysis also shows that it outperforms every other filling model in terms of mean absolute error and mean absolute percentage error. In addition, the sensitivity analysis shows that it tolerates changes in sample size well and remains stable.
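The two-stage filling idea and the two error measures named in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the function names, the single toy feature, the threshold rule, and the constant per-class predictions are hypothetical stand-ins and are not the study's SVM, SVR, PSO, or GWO components, and the numbers are made-up toy values, not CHNS data. In practice the two callables would be replaced by a trained classifier and per-class regressors.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]
Regressor = Callable[[Features], float]


def two_stage_fill(x: Features, classify: Classifier,
                   regressors: Dict[str, Regressor]) -> float:
    """Stage 1 predicts the class; stage 2 predicts the value within that class."""
    label = classify(x)
    return regressors[label](x)


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_absolute_percentage_error(actual: Sequence[float],
                                   predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| / |actual|, in percent (actual must be nonzero)."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical stand-in for the stage-1 classifier: a rule on one toy feature.
def toy_classify(x: Features) -> str:
    return "abnormal" if x[0] >= 30.0 else "normal"


# Hypothetical stand-ins for the stage-2 regressors: one constant per class.
toy_regressors: Dict[str, Regressor] = {
    "normal": lambda x: 5.0,
    "abnormal": lambda x: 8.0,
}

# Made-up rows whose glucose value is treated as missing, plus held-out true values.
rows = [(22.0,), (35.0,), (27.0,)]
actual = [5.0, 10.0, 4.0]

filled = [two_stage_fill(r, toy_classify, toy_regressors) for r in rows]
print(filled)
print(mean_absolute_error(actual, filled))
print(mean_absolute_percentage_error(actual, filled))
```

Keeping the stage models as plain callables means the routing logic stays the same whichever classifier and regressors are plugged in, so different optimizer and model pairings can be compared on the same held-out values with the same two metrics.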