Two Machine-learning Hybrid Models for Predicting Type 2 Diabetes Mellitus.

Author information

Farnoosh Rahman, Abnoosian Karlo, Isewid Rasha Abbas

Affiliation

The School of Mathematics and Computer Science, Statistics, Iran University of Science and Technology, Tehran, Iran.

Publication information

J Med Signals Sens. 2025 Apr 19;15:11. doi: 10.4103/jmss.jmss_29_24. eCollection 2025.

DOI: 10.4103/jmss.jmss_29_24
PMID: 40351779
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12063970/
Abstract

BACKGROUND

The global increase in diabetes prevalence necessitates advanced diagnostic methods. Machine learning has shown promise in disease diagnosis, including diabetes.

MATERIALS AND METHODS

We used a dataset collected from the Medical City Hospital laboratory and the Specialized Center for Endocrinology and Diabetes at Al-Kindy Teaching Hospital in Iraq. The dataset comprises 1000 physical examination samples from male and female patients, grouped into three classes: diabetic (Y), nondiabetic (N), and predicted diabetic (P). It contains twelve attributes and includes outliers. Because outliers in medical data can arise from unusual disease characteristics, identifying and handling them with statistical methods requires consultation with a specialist physician. The main contribution of this study is two hybrid models for diabetes diagnosis, one for each of two scenarios: (1) Scenario 1 (outliers retained): Hybrid Model 1 combines the K-medoids clustering algorithm with a Gaussian naive Bayes (GNB) classifier based on kernel density estimation (KDE) to handle outliers; and (2) Scenario 2 (outliers removed): Hybrid Model 2 combines the K-means clustering algorithm with a KDE-based GNB classifier using a suitable bandwidth. We applied principal component analysis (PCA) to reduce dimensionality and evaluated the models using fivefold cross-validation.
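
The classifier component described above can be illustrated as a naive Bayes model in which each feature's class-conditional density is a Gaussian kernel density estimate rather than a single fitted normal distribution. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the bandwidth value, the names `KDENaiveBayes` and `k_fold_accuracy`, and the toy data are hypothetical. The clustering stage (K-medoids or K-means) and PCA are omitted because the abstract does not say how cluster output is combined with the classifier.

```python
import math
import random
from collections import defaultdict


def gaussian_kernel(u):
    """Standard normal density, used as the KDE kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


class KDENaiveBayes:
    """Hypothetical sketch: naive Bayes with per-feature Gaussian KDE densities.

    Features are treated as conditionally independent given the class, and
    each feature's class-conditional density is a Gaussian KDE with a fixed
    bandwidth h instead of a single fitted normal distribution.
    """

    def __init__(self, bandwidth=0.5):
        # Assumed default; the abstract only says "suitable bandwidth".
        self.bandwidth = bandwidth

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        n = len(y)
        self.log_priors = {c: math.log(len(rows) / n) for c, rows in grouped.items()}
        # Store each class's training values column by column.
        self.columns = {c: list(zip(*rows)) for c, rows in grouped.items()}
        return self

    def _log_density(self, value, column):
        h = self.bandwidth
        density = sum(gaussian_kernel((value - v) / h) for v in column) / (len(column) * h)
        # Tiny floor avoids log(0) when value is far from every training point.
        return math.log(density + 1e-300)

    def predict_one(self, x):
        # Log prior plus the sum of per-feature log densities for each class.
        scores = {
            c: self.log_priors[c]
            + sum(self._log_density(value, cols[j]) for j, value in enumerate(x))
            for c, cols in self.columns.items()
        }
        return max(scores, key=scores.get)

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def k_fold_accuracy(X, y, k=5, bandwidth=0.5, seed=0):
    """Mean accuracy over k shuffled folds.

    Assumes every class keeps at least one sample in each training split.
    """
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = KDENaiveBayes(bandwidth).fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in fold])
        scores.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(scores) / k


# Hypothetical two-feature toy data with two well-separated classes.
X_train = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.2]]
y_train = ["N", "N", "N", "Y", "Y", "Y"]
model = KDENaiveBayes(bandwidth=0.5).fit(X_train, y_train)
print(model.predict([[1.1, 2.0], [5.1, 6.1]]))  # ['N', 'Y']
print(k_fold_accuracy(X_train, y_train, k=5))
```

Replacing the single Gaussian per feature with a kernel estimate lets each class-conditional density be skewed or multimodal. The bandwidth controls the trade-off: a small h follows individual points (including outliers) closely, while a large h smooths them out.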

RESULTS

All experiments were conducted under identical settings. In both scenarios (outliers handled and outliers removed), the proposed hybrid models outperformed the other machine-learning models evaluated in this study for diabetes prediction: support vector machines (with radial basis function, polynomial, linear, and sigmoid kernels), decision trees (J48), and GNB classifiers. The average accuracy was 0.9743 for Hybrid Model 1 in Scenario 1 and 0.9867 for Hybrid Model 2 in Scenario 2. Precision, sensitivity, and F1-score were also evaluated as performance metrics.
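
To make the reported metrics concrete, the sketch below computes accuracy and per-class precision, sensitivity (recall), and F1-score for the three-class Y/N/P labels, then macro-averages the per-class values. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, as are the function name `classification_metrics` and the toy labels. Under fivefold cross-validation, these values would be computed on each held-out fold and then averaged.

```python
def classification_metrics(y_true, y_pred, labels):
    """Accuracy plus per-class and macro-averaged precision, sensitivity, F1.

    Macro-averaging (unweighted mean over classes) is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        per_class[c] = (precision, sensitivity, f1)
    macro = tuple(sum(m[i] for m in per_class.values()) / len(labels) for i in range(3))
    return accuracy, per_class, macro


# Hypothetical toy predictions for the three classes.
y_true = ["Y", "Y", "N", "N", "P", "P"]
y_pred = ["Y", "N", "N", "N", "P", "Y"]
acc, per_class, macro = classification_metrics(y_true, y_pred, labels=["Y", "N", "P"])
print(round(acc, 4))                       # 0.6667
print(tuple(round(v, 4) for v in macro))   # (0.7222, 0.6667, 0.6556)
```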

CONCLUSION

This study presents two hybrid models for diabetes diagnosis, demonstrating high accuracy in distinguishing between diabetic and nondiabetic patients and effectively handling outliers. The findings highlight the potential of machine-learning techniques for improving the early diagnosis and treatment of diabetes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9450/12063970/7830e9145164/JMSS-15-11-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9450/12063970/3192f3e3ee21/JMSS-15-11-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9450/12063970/a09c61df71f7/JMSS-15-11-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9450/12063970/9fcc2f69adce/JMSS-15-11-g005.jpg
