Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective.

Authors

Olisah Chollette C, Smith Lyndon, Smith Melvyn

Affiliations

Centre for Machine Vision, Bristol Robotics Laboratory, University of the West of England, Bristol, UK.

Publication information

Comput Methods Programs Biomed. 2022 Jun;220:106773. doi: 10.1016/j.cmpb.2022.106773. Epub 2022 Mar 31.

Abstract

BACKGROUND AND OBJECTIVE

Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to adequately secrete or respond to insulin. If not diagnosed in time or properly managed, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and so can be life-threatening. Many years of research in the computational diagnosis of diabetes have pointed to machine learning as a viable solution for diabetes prediction. However, the accuracy rates reported to date suggest that there is still much room for improvement. In this paper, we propose a machine learning framework for diabetes prediction and diagnosis using the PIMA Indian dataset and the laboratory of the Medical City Hospital (LMCH) diabetes dataset. We hypothesize that adopting feature selection and missing value imputation methods can improve the performance of classification models in diabetes prediction and diagnosis.

METHODS

This paper proposes a robust framework for building a diabetes prediction model to aid in the clinical diagnosis of diabetes. The framework adopts Spearman correlation for feature selection and polynomial regression for missing value imputation, each applied from a perspective that strengthens its performance. Three supervised machine learning models are then proposed for classification: the random forest (RF) model, the support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model. The hyperparameters of the models are tuned using grid search and repeated stratified k-fold cross-validation, and the models are evaluated for their ability to scale to the prediction problem.
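The two preprocessing steps named above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the correlation threshold, the choice of a single predictor column, and the polynomial degree are all assumptions made for illustration, and the abstract does not say how the paper configures either step.

```python
from typing import Dict, List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:  # a constant column carries no rank information
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(columns: Dict[str, Sequence[float]],
                    target: Sequence[float],
                    threshold: float) -> List[str]:
    """Keep features whose |rho| with the target reaches the threshold
    (the threshold rule is a hypothetical selection criterion)."""
    return [name for name, col in columns.items()
            if abs(spearman(col, target)) >= threshold]


def polyfit(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients (constant term first),
    solved from the normal equations by Gauss-Jordan elimination."""
    m = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def impute_with_polynomial(predictor: Sequence[float],
                           column: Sequence[Optional[float]],
                           degree: int = 2) -> List[float]:
    """Fill None entries of `column` with predictions from a polynomial
    fitted on the rows where `column` is observed."""
    known = [(p, c) for p, c in zip(predictor, column) if c is not None]
    coeffs = polyfit([p for p, _ in known], [c for _, c in known], degree)
    return [c if c is not None
            else sum(w * p ** k for k, w in enumerate(coeffs))
            for p, c in zip(predictor, column)]
```

In a complete pipeline the selected and imputed features would then be passed to the classifiers, with hyperparameters chosen by grid search under repeated stratified k-fold cross-validation; that part is omitted here.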

RESULTS

In experiments, the proposed 2GDNN model achieves precision, sensitivity, F1-score, train-accuracy, and test-accuracy scores of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% on the PIMA Indian dataset, and 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% on the LMCH diabetes dataset, respectively.
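For reference, the reported metrics follow the standard confusion-matrix definitions. The sketch below computes them for a binary labelling with 1 as the positive (diabetic) class; the function name and the toy labels are illustrative assumptions, and the abstract does not state whether the paper averages these scores across classes.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, sensitivity (recall), F1-score, and accuracy for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}


# Toy labels, not data from either dataset.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
```

Train accuracy and test accuracy are the same accuracy formula applied to the training and held-out splits, respectively.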

CONCLUSION

The data preprocessing approaches and the classifiers with hyperparameter optimization proposed within the machine learning framework yield a robust machine learning model that outperforms state-of-the-art results in diabetes mellitus prediction and diagnosis. The source code for the models of the proposed machine learning framework has been made publicly available.
