Analyzing classification and feature selection strategies for diabetes prediction across diverse diabetes datasets.

Author information

Kaliappan Jayakumar, Saravana Kumar I J, Sundaravelan S, Anesh T, Rithik R R, Singh Yashbir, Vera-Garcia Diana V, Himeur Yassine, Mansoor Wathiq, Atalla Shadi, Srinivasan Kathiravan

Affiliations

School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.

Radiology, Mayo Clinic, Rochester, MN, United States.

Publication information

Front Artif Intell. 2024 Aug 21;7:1421751. doi: 10.3389/frai.2024.1421751. eCollection 2024.

Abstract

INTRODUCTION

In the evolving landscape of healthcare and medicine, the merging of extensive medical datasets with the powerful capabilities of machine learning (ML) models presents a significant opportunity for transforming diagnostics, treatments, and patient care.

METHODS

This paper examines data-driven healthcare, with a focus on identifying the most effective ML models for diabetes prediction and the features that contribute most to that prediction. Prediction performance is analyzed using a variety of ML models, such as Random Forest (RF), XGBoost (XGB), Linear Regression (LR), Gradient Boosting (GB), and Support Vector Machine (SVM), across numerous medical datasets. Feature importance is studied using filter-based and wrapper-based techniques together with Explainable Artificial Intelligence (Explainable AI). Explainable AI techniques, specifically Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), are used to make the models' decision-making process transparent, thereby bolstering trust in AI-driven decisions.
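
To make the filter-based idea concrete, the following is a minimal sketch of ranking categorical features by a Pearson chi-square statistic against the class label. It is an illustration only: the abstract does not say which chi-square implementation the authors used, the function name `chi_square_score` is hypothetical, and the toy records are invented for the example rather than taken from any of the study's datasets.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and a class label.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (feature value, label)
    feature_totals = Counter(feature)          # marginal counts per feature value
    label_totals = Counter(labels)             # marginal counts per label
    score = 0.0
    for f, f_count in feature_totals.items():
        for y, y_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * y_count / n
            score += (observed[(f, y)] - expected) ** 2 / expected
    return score


# Invented toy records (assumption, not study data): 1 = present, 0 = absent.
diabetic = [1, 1, 1, 1, 0, 0, 0, 0]
polyuria = [1, 1, 1, 0, 0, 0, 0, 1]
itching = [1, 0, 1, 0, 1, 0, 1, 0]

scores = {
    "polyuria": chi_square_score(polyuria, diabetic),
    "itching": chi_square_score(itching, diabetic),
}

# A filter method keeps the highest-scoring features before any model is trained.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A filter method of this kind scores each feature independently of any classifier, which is what separates it from wrapper-based selection, where candidate feature subsets are judged by the performance of a trained model such as RF.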

RESULTS

Features identified by RF in Wrapper-based techniques and by the Chi-square test in Filter-based techniques have been shown to enhance prediction performance. Notable precision and recall values, reaching up to 0.9, are achieved in predicting diabetes.
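
For readers less familiar with the two reported metrics, here is a small sketch of how precision and recall are computed from true and predicted labels. The function name `precision_recall` and the label vectors are assumptions made for illustration; they do not reproduce the study's predictions or its reported values.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented labels (assumption, not study data): 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of predicted diabetic cases that are truly diabetic, and recall is the share of truly diabetic cases that the model finds.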

DISCUSSION

Both approaches are found to assign considerable importance to features like age, family history of diabetes, polyuria, polydipsia, and high blood pressure, which are strongly associated with diabetes. In this age of data-driven healthcare, the research presented here aspires to substantially improve healthcare outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2863/11371799/26fe35bedacd/frai-07-1421751-g0001.jpg
