Two Machine-learning Hybrid Models for Predicting Type 2 Diabetes Mellitus.

Author information

Farnoosh Rahman, Abnoosian Karlo, Isewid Rasha Abbas

Affiliation

The School of Mathematics and Computer Science, Statistics, Iran University of Science and Technology, Tehran, Iran.

Publication information

J Med Signals Sens. 2025 Apr 19;15:11. doi: 10.4103/jmss.jmss_29_24. eCollection 2025.

DOI:10.4103/jmss.jmss_29_24

PMID:40351779

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12063970/

Abstract

BACKGROUND

The global increase in diabetes prevalence necessitates advanced diagnostic methods. Machine learning has shown promise in disease diagnosis, including diabetes.

MATERIALS AND METHODS

We used a dataset collected from the Medical City Hospital laboratory and the Specialized Center for Endocrinology and Diabetes at Al-Kindy Teaching Hospital in Iraq. The dataset comprises 1000 physical examination samples from male and female patients, categorized into three classes: diabetic (Y), nondiabetic (N), and predicted diabetic (P). It contains twelve attributes and includes outliers. Outliers in medical studies can result from unusual disease attributes, so it is necessary to consult a specialist physician when identifying and handling them with statistical methods. The main contribution of this study is two hybrid models for diabetes diagnosis, one for each of two scenarios: (1) Scenario 1 (outliers present): Hybrid Model 1 combines the K-medoids clustering algorithm with a Gaussian naive Bayes (GNB) classifier based on kernel density estimation (KDE) to handle outliers; (2) Scenario 2 (outliers removed): Hybrid Model 2 combines the K-means clustering algorithm with a GNB classifier based on KDE with a suitable bandwidth. We applied principal component analysis to reduce dimensionality and evaluated the models using fivefold cross-validation.
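To make the KDE-based classifier concrete, the following is a minimal sketch of a naive Bayes classifier whose per-feature class likelihoods are Gaussian kernel density estimates. It is not the authors' implementation: the abstract does not say how the clustering step feeds the classifier, so clustering, PCA, and cross-validation are omitted here. The class name `KDENaiveBayes`, the helper `log_kde`, the bandwidth value, and the toy feature values are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def log_kde(x, samples, bandwidth):
    """Log of a 1-D Gaussian kernel density estimate evaluated at x."""
    # Log-sum-exp over one kernel per training sample, for numerical stability.
    exps = [-0.5 * ((x - s) / bandwidth) ** 2 for s in samples]
    m = max(exps)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exps))
    return log_sum - math.log(len(samples) * bandwidth * math.sqrt(2 * math.pi))


class KDENaiveBayes:
    """Naive Bayes with Gaussian-kernel KDE likelihoods (illustrative only)."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth  # assumed value; would be tuned in practice

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(y)
        # Class priors from relative class frequencies.
        self.log_prior_ = {c: math.log(len(rows) / n) for c, rows in by_class.items()}
        # For each class, keep the training values of every feature as a column.
        self.columns_ = {c: list(zip(*rows)) for c, rows in by_class.items()}
        return self

    def log_posterior(self, row):
        """Unnormalized log posterior of each class for one sample."""
        scores = {}
        for c, cols in self.columns_.items():
            score = self.log_prior_[c]
            # Naive independence assumption: sum per-feature log densities.
            for value, col in zip(row, cols):
                score += log_kde(value, col, self.bandwidth)
            scores[c] = score
        return scores

    def predict(self, X):
        predictions = []
        for row in X:
            scores = self.log_posterior(row)
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy data with two made-up features; these values are NOT from the study.
X_train = [[5.0, 22.0], [5.2, 23.5], [5.4, 24.0],
           [8.9, 31.0], [9.4, 33.0], [10.1, 30.5]]
y_train = ["N", "N", "N", "Y", "Y", "Y"]

model = KDENaiveBayes(bandwidth=1.0).fit(X_train, y_train)
pred = model.predict([[5.1, 23.0], [9.8, 32.0]])
print(pred)
```

The bandwidth controls how smooth each estimated density is, which is presumably why the abstract stresses a suitable bandwidth for Hybrid Model 2: too small a value overfits individual samples, too large a value blurs the classes together.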

RESULTS

All experiments were conducted under identical settings. In both scenarios (outliers retained and outliers removed), the proposed hybrid models outperformed the other machine-learning models evaluated in this study for diabetes prediction: support vector machines (with radial basis function, polynomial, linear, and sigmoid kernels), decision trees (J48), and GNB classifiers. The average accuracy was 0.9743 for Hybrid Model 1 in Scenario 1 and 0.9867 for Hybrid Model 2 in Scenario 2. We also evaluated precision, sensitivity, and F1-score as performance metrics.
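As a reminder of how the reported metrics are defined, the sketch below computes accuracy, precision, sensitivity (recall), and F1-score for a three-class problem. The abstract does not state how per-class scores were averaged, so macro-averaging is an assumption here, and the function name `classification_metrics` and the toy labels are hypothetical, not results from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, macro (precision, sensitivity, F1), and per-class scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        per_class[c] = (precision, sensitivity, f1)

    # Macro-average: unweighted mean of the per-class scores (an assumption).
    k = len(labels)
    macro = tuple(sum(v[i] for v in per_class.values()) / k for i in range(3))
    return accuracy, macro, per_class


# Toy labels using the dataset's class codes; values are made up for illustration.
y_true = ["Y", "Y", "Y", "N", "N", "P"]
y_pred = ["Y", "Y", "N", "N", "N", "P"]

accuracy, macro, per_class = classification_metrics(y_true, y_pred)
print(accuracy, macro)
```

In a fivefold cross-validation setting, these quantities would be computed on each held-out fold and then averaged, which matches the "average accuracy" wording above.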

CONCLUSION

This study presents two hybrid models for diabetes diagnosis, demonstrating high accuracy in distinguishing between diabetic and nondiabetic patients and effectively handling outliers. The findings highlight the potential of machine-learning techniques for improving the early diagnosis and treatment of diabetes.
