Machine learning models for classification and identification of significant attributes to detect type 2 diabetes.

Author information

Howlader Koushik Chandra, Satu Md Shahriare, Awal Md Abdul, Islam Md Rabiul, Islam Sheikh Mohammed Shariful, Quinn Julian M W, Moni Mohammad Ali

Affiliations

Department of CSTE, Noakhali Science and Technology University, Noakhali, Bangladesh.

Department of MIS, Noakhali Science and Technology University, Noakhali, Bangladesh.

Publication information

Health Inf Sci Syst. 2022 Feb 9;10(1):2. doi: 10.1007/s13755-021-00168-2. eCollection 2022 Dec.

DOI:10.1007/s13755-021-00168-2

PMID:35178244

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8828812/

Abstract

UNLABELLED

Type 2 Diabetes (T2D) is a chronic disease characterized by abnormally high blood glucose levels due to insulin resistance and reduced pancreatic insulin production. The challenge addressed in this work is to identify T2D-associated features that can distinguish T2D sub-types for prognosis and treatment purposes. We therefore employed machine learning (ML) techniques to categorize T2D patients using the Pima Indian Diabetes Dataset from the Kaggle ML repository. After data preprocessing, we used several feature selection techniques to extract feature subsets and analyzed these subsets with a range of classification techniques. We then compared the classification results to identify the best classifiers in terms of accuracy, kappa statistic, area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and logarithmic loss (logloss). To evaluate the performance of the different classifiers, we examined summary statistics of their resampling distributions. Generalized Boosted Regression modeling showed the highest accuracy (90.91%), with a kappa statistic of 78.77% and a specificity of 85.19%. In addition, Sparse Distance Weighted Discrimination gave the maximum sensitivity (100%), the Generalized Additive Model using LOESS the highest AUROC (95.26%), and Boosted Generalized Additive Models the lowest logarithmic loss (30.98%). Notably, the Generalized Additive Model using LOESS was the top-ranked algorithm according to the non-parametric Friedman test. Among the features identified by these ML models, glucose level, body mass index, diabetes pedigree function, and age were consistently identified as the best outcome predictors. These results indicate the utility of ML methods for constructing improved prediction models for T2D, and these methods successfully identified outcome predictors for this Pima Indian population.
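The evaluation metrics and the Friedman ranking named above can be sketched in standard-library Python. This is an illustrative sketch and not the authors' code. Several details are assumptions:

- The function names `classification_metrics` and `friedman_average_ranks` are hypothetical.
- Labels are coded 0/1, with 1 meaning diabetic.
- The decision threshold is 0.5.
- Log loss uses the natural log, with probabilities clipped away from 0 and 1.
- AUROC is computed with the pairwise (Mann-Whitney) formulation.
- The Friedman statistic omits the tie correction.

The toy scores are made up and are not results from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Binary metrics for one resample (assumed coding: 1 = diabetic, 0 = not)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0  # 0.0 when undefined
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # Log loss (natural log); probabilities clipped to avoid log(0).
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    logloss = -sum(math.log(q) if t == 1 else math.log(1 - q)
                   for t, q in zip(y_true, clipped)) / n

    # AUROC: probability that a random positive scores above a random
    # negative (Mann-Whitney formulation); ties count as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    auroc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "kappa": kappa, "sensitivity": sensitivity,
            "specificity": specificity, "logloss": logloss, "auroc": auroc}


def friedman_average_ranks(
        scores: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Average rank of each model over resamples (rank 1 = highest score)
    and the Friedman chi-square statistic, without tie correction."""
    models = list(scores)
    k = len(models)
    n_blocks = len(scores[models[0]])
    rank_sums = {m: 0.0 for m in models}
    for b in range(n_blocks):
        # Highest score first; sorted() is stable, so ties keep input order.
        ordered = sorted(models, key=lambda m: scores[m][b], reverse=True)
        i = 0
        while i < k:
            j = i
            while j + 1 < k and scores[ordered[j + 1]][b] == scores[ordered[i]][b]:
                j += 1
            shared = (i + j) / 2 + 1  # tied positions i..j share their mean rank
            for m in ordered[i:j + 1]:
                rank_sums[m] += shared
            i = j + 1
    avg = {m: rank_sums[m] / n_blocks for m in models}
    chi2 = 12 * n_blocks / (k * (k + 1)) * (
        sum(r * r for r in avg.values()) - k * (k + 1) ** 2 / 4)
    return avg, chi2


m = classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.6])
print(m["accuracy"], m["kappa"], m["auroc"])  # 0.5 0.0 0.75

ranks, chi2 = friedman_average_ranks(
    {"A": [0.90, 0.80, 0.85], "B": [0.70, 0.75, 0.80], "C": [0.60, 0.50, 0.70]})
print(ranks, chi2)  # {'A': 1.0, 'B': 2.0, 'C': 3.0} 6.0
```

To use it, score each classifier on every resample and collect the per-resample values of one metric (for example accuracy). Pass those values to `friedman_average_ranks`. A lower average rank indicates a better classifier. The statistic is conventionally compared against a chi-square distribution with k − 1 degrees of freedom.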

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s13755-021-00168-2.

Similar articles

Prediction of Weight Loss to Decrease the Risk for Type 2 Diabetes Using Multidimensional Data in Filipino Americans: Secondary Analysis.

JMIR Diabetes. 2023 Apr 11;8:e44018. doi: 10.2196/44018.

Establishment of noninvasive diabetes risk prediction model based on tongue features and machine learning techniques.

Int J Med Inform. 2021 May;149:104429. doi: 10.1016/j.ijmedinf.2021.104429. Epub 2021 Feb 22.

Histologic subtype classification of non-small cell lung cancer using PET/CT images.

Eur J Nucl Med Mol Imaging. 2021 Feb;48(2):350-360. doi: 10.1007/s00259-020-04771-5. Epub 2020 Aug 10.

Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.

J Med Syst. 2018 Apr 10;42(5):92. doi: 10.1007/s10916-018-0940-7.

Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

Application of Machine Learning Models for Early Detection and Accurate Classification of Type 2 Diabetes.

Diagnostics (Basel). 2023 Jul 15;13(14):2383. doi: 10.3390/diagnostics13142383.

Using Wearable Activity Trackers to Predict Type 2 Diabetes: Machine Learning-Based Cross-sectional Study of the UK Biobank Accelerometer Cohort.

JMIR Diabetes. 2021 Mar 19;6(1):e23364. doi: 10.2196/23364.

[Predicting prolonged length of intensive care unit stay machine learning].

Beijing Da Xue Xue Bao Yi Xue Ban. 2021 Dec 18;53(6):1163-1170. doi: 10.19723/j.issn.1671-167X.2021.06.026.

Machine Learning-Based Prediction of COVID-19 Mortality With Limited Attributes to Expedite Patient Prognosis and Triage: Retrospective Observational Study.

JMIRx Med. 2021 Oct 15;2(4):e29392. doi: 10.2196/29392. eCollection 2021 Oct-Dec.

Cited by

AI-driven prediction of insulin resistance in non-diabetic populations using minimal invasive tests: comparing models and criteria.

Diabetol Metab Syndr. 2025 Aug 18;17(1):338. doi: 10.1186/s13098-025-01920-4.

The study of an integrated diabetes prediction model based on user-defined risk decision making strategy.

Medicine (Baltimore). 2025 Jun 13;104(24):e42680. doi: 10.1097/MD.0000000000042680.

OptiStack classifier: optimized stacking framework with ensemble feature engineering for enhanced cardiovascular risk prediction.

Inflamm Res. 2025 May 31;74(1):88. doi: 10.1007/s00011-025-02039-y.

Can some algorithms of machine learning identify osteoporosis patients after training and testing some clinical information about patients?

BMC Med Inform Decis Mak. 2025 Mar 11;25(1):127. doi: 10.1186/s12911-025-02943-7.

Improved feature reduction framework for sign language recognition using autoencoders and adaptive Grey Wolf Optimization.

Sci Rep. 2025 Jan 17;15(1):2300. doi: 10.1038/s41598-024-82785-x.

Next-generation diabetes diagnosis and personalized diet-activity management: A hybrid ensemble paradigm.

PLoS One. 2025 Jan 8;20(1):e0307718. doi: 10.1371/journal.pone.0307718. eCollection 2025.

A novel RFE-GRU model for diabetes classification using PIMA Indian dataset.

Sci Rep. 2025 Jan 6;15(1):982. doi: 10.1038/s41598-024-82420-9.

Automated sample annotation for diabetes mellitus in healthcare integrated biobanking.

Comput Struct Biotechnol J. 2024 Oct 23;24:724-733. doi: 10.1016/j.csbj.2024.10.033. eCollection 2024 Dec.

Enhanced detection of diabetes mellitus using novel ensemble feature engineering approach and machine learning model.

Sci Rep. 2024 Oct 7;14(1):23274. doi: 10.1038/s41598-024-74357-w.

Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications.

Digit Health. 2024 Aug 21;10:20552076241271867. doi: 10.1177/20552076241271867. eCollection 2024 Jan-Dec.
