Leveraging Shapley Additive Explanations for Feature Selection in Ensemble Models for Diabetes Prediction.

Author information

Mohanty Prasant Kumar, Francis Sharmila Anand John, Barik Rabindra Kumar, Roy Diptendu Sinha, Saikia Manob Jyoti

Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Meghalaya 793003, India.

Department of Computer Science, King Khalid University, Abha Campus, Rijal Alma, Abha 61421, Saudi Arabia.

Publication information

Bioengineering (Basel). 2024 Nov 30;11(12):1215. doi: 10.3390/bioengineering11121215.

Abstract

Diabetes is a significant global health crisis. In India it is driven primarily by unhealthy diets and sedentary lifestyles, and rapid urbanization amplifies these effects through convenience-oriented living and limited opportunities for physical activity. This underscores the need for advanced preventive strategies and technology for effective management. This study integrates SHapley Additive exPlanations (SHAP) into ensemble machine learning models to improve the accuracy and efficiency of diabetes prediction. Using SHAP to identify the most influential features, the study examined their role in maintaining high predictive performance while minimizing computational demands. The impact of feature selection on model accuracy was assessed across ten models using three feature sets: all features, the top three most influential features, and all features except these top three. Models restricted to the top three features achieved superior performance, with the ensemble model performing better on most metrics and outperforming comparable approaches. Notably, excluding these features led to a significant decline in performance, reinforcing their critical influence. These findings validate the effectiveness of targeted feature selection for efficient and robust clinical applications.
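
The feature-set construction described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline. It assumes a made-up scoring function (`toy_model`), generic feature names (`f0` to `f4`), invented sample rows, and an all-zero baseline, and it computes exact Shapley values by enumerating feature coalitions instead of calling the SHAP library. Features are ranked by mean absolute Shapley value and then split into the three sets the study compares.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, relative to a baseline sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline):
    """Rank feature indices by mean absolute Shapley value, largest first."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    importance = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order, importance


# Hypothetical names, model, and data: none of these come from the paper.
FEATURES = ["f0", "f1", "f2", "f3", "f4"]


def toy_model(x):
    """Made-up risk score with one interaction term (assumption for the demo)."""
    return 3.0 * x[0] + 2.0 * x[1] + 1.5 * x[2] + 0.1 * x[3] + 0.05 * x[4] + 0.5 * x[0] * x[1]


samples = [
    [1.0, 2.0, 1.0, 3.0, 2.0],
    [2.0, 1.0, 3.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 2.0, 3.0],
]
baseline = [0.0] * len(FEATURES)

order, importance = rank_features(toy_model, samples, baseline)
top_three = order[:3]
rest = order[3:]

# The three feature sets compared in the abstract
feature_sets = {
    "all": [FEATURES[i] for i in range(len(FEATURES))],
    "top_three": [FEATURES[i] for i in top_three],
    "all_except_top_three": [FEATURES[i] for i in rest],
}

for name, columns in feature_sets.items():
    print(name, columns)
```

In the study, each feature set would then be used to train and evaluate the ten models. That step is omitted here because the abstract does not specify the models or the dataset.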

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0950/11673338/78878a45ca6d/bioengineering-11-01215-g001.jpg
