Leveraging Shapley Additive Explanations for Feature Selection in Ensemble Models for Diabetes Prediction.

Authors

Mohanty Prasant Kumar, Francis Sharmila Anand John, Barik Rabindra Kumar, Roy Diptendu Sinha, Saikia Manob Jyoti

Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Meghalaya 793003, India.

Department of Computer Science, King Khalid University, Abha Campus, Rijal Alma, Abha 61421, Saudi Arabia.

Publication information

Bioengineering (Basel). 2024 Nov 30;11(12):1215. doi: 10.3390/bioengineering11121215.

DOI: 10.3390/bioengineering11121215

PMID: 39768033

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11673338/

Abstract

Diabetes, a significant global health crisis, is driven in India primarily by unhealthy diets and sedentary lifestyles. Rapid urbanization amplifies these effects through convenience-oriented living and limited opportunities for physical activity, underscoring the need for advanced preventative strategies and technology for effective management. This study integrates SHapley Additive exPlanations (SHAP) into ensemble machine learning models to improve the accuracy and efficiency of diabetes prediction. By identifying the most influential features with SHAP, the study examines their role in maintaining high predictive performance while minimizing computational demands. The impact of feature selection on model accuracy was assessed across ten models using three feature sets: all features, the top three influential features, and all features except these top three. Models restricted to the top three features achieved superior performance, with the ensemble model performing better on most metrics and outperforming comparable approaches. Notably, excluding these features led to a significant decline in performance, reinforcing their critical influence. These findings support the effectiveness of targeted feature selection for efficient and robust clinical applications.
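The selection procedure described in the abstract (rank features by their Shapley contribution, then compare "all", "top three", and "all except top three" feature sets) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy scoring function, its weights, the four sample rows, and the mean-of-samples baseline are all hypothetical stand-ins, and the Shapley values are computed exactly by enumerating coalitions rather than with the SHAP library or a trained ensemble.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline):
    """Order feature indices by mean absolute Shapley value, largest first."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    importance = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order, importance


# Hypothetical toy model: weighted sum plus one interaction term.
# It stands in for a trained ensemble; the weights are made up.
WEIGHTS = [2.0, 0.1, -1.5, 0.05, 1.0]


def toy_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x)) + 0.5 * x[0] * x[2]


# Hypothetical samples (rows) over five anonymous features (columns).
SAMPLES = [
    [1.0, 0.0, 2.0, 1.0, 0.5],
    [0.0, 1.0, 1.0, 0.0, 1.5],
    [2.0, 2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 2.0, 0.0],
]
# Assumed baseline: the per-feature mean of the samples.
BASELINE = [sum(col) / len(SAMPLES) for col in zip(*SAMPLES)]

order, importance = rank_features(toy_model, SAMPLES, BASELINE)

# The three feature sets compared in the abstract, as index lists.
all_features = list(range(len(BASELINE)))
top_three = order[:3]
all_except_top_three = order[3:]
```

In a real study each of the three index lists would be used to retrain and score every model. Exact enumeration costs on the order of 2^n model evaluations per sample, so it is only practical for a handful of features; approximate explainers are the usual choice for larger feature sets.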

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0950/11673338/78878a45ca6d/bioengineering-11-01215-g001.jpg
