An Ensemble Approach for the Prediction of Diabetes Mellitus Using a Soft Voting Classifier with an Explainable AI.

Diabetes is a chronic disease that remains a major worldwide health concern. Over the years, many researchers have attempted to develop reliable diabetes prediction models using machine learning (ML) algorithms. However, this research has had minimal impact on clinical practice, because current studies focus mainly on improving the performance of complicated ML models while ignoring their explainability in clinical settings. As a result, physicians find these models difficult to understand and rarely trust them for clinical use. This study proposes a carefully constructed, efficient, and interpretable diabetes detection method using explainable AI. The Pima Indian diabetes dataset was used; it contains 768 instances, of which 268 are diabetic and 500 are non-diabetic, described by several diabetes-related attributes. Six ML algorithms (artificial neural network (ANN), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost, and XGBoost) were used along with an ensemble classifier to diagnose diabetes. For each ML model, global and local explanations were produced using Shapley additive explanations (SHAP) and presented in different types of graphs to help physicians understand the model predictions. The developed weighted ensemble model achieved a balanced accuracy of 90% and an F1 score of 89% under five-fold cross-validation (CV). Missing values were imputed with median values, and SMOTETomek (the synthetic minority oversampling technique combined with Tomek links) was used to balance the classes of the dataset. The proposed approach can improve the clinical understanding of a diabetes diagnosis and help in taking necessary action at the very early stages of the disease.
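To make the two central ideas of the abstract concrete, the following is a minimal pure-Python sketch of weighted soft voting and of the two reported metrics. It is an illustration under stated assumptions, not the authors' implementation: the function names `soft_vote` and `balanced_accuracy_and_f1`, the equal weights, and all toy probabilities and labels are hypothetical and are not taken from the paper or the Pima dataset.

```python
from typing import List, Sequence, Tuple


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Weighted soft voting: average class probabilities, then take the argmax.

    probas[i] holds base model i's probabilities, here ordered as
    [P(non-diabetic), P(diabetic)]. weights[i] is that model's weight.
    """
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return avg.index(max(avg)), avg


def balanced_accuracy_and_f1(y_true: Sequence[int],
                             y_pred: Sequence[int]) -> Tuple[float, float]:
    """Balanced accuracy = mean of sensitivity and specificity; F1 for class 1.

    Assumes binary labels (1 = diabetic) and that both classes occur in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return (sensitivity + specificity) / 2, f1


# Toy example (hypothetical numbers): two models lean weakly towards
# "non-diabetic", one is strongly confident in "diabetic".
toy_probas = [[0.55, 0.45], [0.52, 0.48], [0.10, 0.90]]
label, avg = soft_vote(toy_probas, weights=[1.0, 1.0, 1.0])

# Hard (majority) voting on the same outputs, for comparison.
hard_votes = [p.index(max(p)) for p in toy_probas]
hard_label = max(set(hard_votes), key=hard_votes.count)

# Toy labels (hypothetical) with a class imbalance of 3 positives to 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
bal_acc, f1 = balanced_accuracy_and_f1(y_true, y_pred)
```

In the toy example, majority voting picks the non-diabetic class while soft voting picks the diabetic class, because the single confident model outweighs two marginal ones once probabilities are averaged. Balanced accuracy suits an imbalanced dataset such as the 268/500 split described above, since it weights the error rate on each class equally. The abstract does not name the library used; in scikit-learn, the same voting scheme is available as `VotingClassifier` with `voting="soft"` and a `weights` argument.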



