Zhao Jian, Gao Hanlin, Sun Lei, Shi Lijuan, Kuang Zhejun, Wang Haiyan
College of Computer Science and Technology, Changchun University, Changchun, 130022, China.
Jilin Provincial Key Laboratory of Human Health Status Identification Function & Enhancement, Changchun University, Changchun, 130022, China.
Sci Rep. 2025 Jan 2;15(1):133. doi: 10.1038/s41598-024-83902-6.
Diabetes prediction is an important topic in medical health: accurate prediction supports early intervention and reduces patients' health risks and medical costs. This study proposes a new method for type 2 diabetes classification based on a dual Convolutional Neural Network (CNN) teacher-student distillation model (DCTSD-Model), with the aim of improving the accuracy and reliability of diabetes prediction. It also proposes a data preprocessing method that removes outliers, fills missing values, and applies sparse autoencoder (SAE) feature enhancement, in which the SAE expands the variables of the original data to strengthen the expressive power of the features. The proposed CNN and the DCTSD-Model are evaluated on the feature-enhanced dataset using 10-fold cross-validation. In the DCTSD-Model, two teacher models generate soft labels through knowledge distillation, which helps the student model learn rich category information, and weighted random samplers are used when drawing samples of the different categories, which addresses the class imbalance problem. After data preprocessing, the DCTSD-Model reached an accuracy of 98.57% on the classification task, which is significantly higher than that of the other models compared and indicates stronger classification ability and reliability. The method provides an effective solution for diabetes prediction and a foundation for further research and application.
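The two training ideas named in the abstract, soft labels from two teachers and weighted random sampling, can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how the two teachers' outputs are combined, which temperature is used, or how the soft and hard losses are weighted, so the simple averaging, `temperature=2.0`, `alpha=0.5`, and all function names below are assumptions chosen for illustration. The `temperature**2` factor follows a common knowledge-distillation convention.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dual_teacher_soft_labels(logits_a, logits_b, temperature=2.0):
    """Average the softened outputs of two teachers (assumed combination rule)."""
    probs_a = softmax(logits_a, temperature)
    probs_b = softmax(logits_b, temperature)
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def distillation_loss(student_logits, soft_labels, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label cross-entropy term."""
    student_soft = softmax(student_logits, temperature)
    soft_ce = -sum(t * math.log(s) for t, s in zip(soft_labels, student_soft))
    hard_ce = -math.log(softmax(student_logits)[hard_label])
    # alpha and the temperature**2 scaling are illustrative assumptions.
    return alpha * temperature ** 2 * soft_ce + (1 - alpha) * hard_ce


def inverse_frequency_weights(labels):
    """Per-sample weight 1 / (class count), so each class has equal total weight."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample(labels, k, seed=0):
    """Draw k sample indices with replacement, favouring the rarer classes."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical two-class logits (index 0 = non-diabetic, 1 = diabetic).
    soft = dual_teacher_soft_labels([3.0, 0.0], [2.0, 0.0])
    print("soft labels:", soft)
    print("loss:", distillation_loss([2.0, 0.0], soft, hard_label=0))
    print("sampled indices:", weighted_sample([0, 0, 0, 1], k=8))
```

With inverse-frequency weights, a minority-class sample is drawn about as often as all samples of a majority class combined, which is one common way of countering class imbalance during training.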