

Gallstone Classification Using Random Forest Optimized by Sand Cat Swarm Optimization Algorithm with SHAP and DiCE-Based Interpretability.

Author information

Sarker Proshenjit, Tiang Jun-Jiat, Nahid Abdullah-Al

Affiliations

Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh.

Centre for Wireless Technology, CoE for Intelligent Network, Faculty of Artificial Intelligence & Engineering, Multimedia University, Persiaran Multimedia, Cyberjaya 63100, Selangor, Malaysia.

Publication information

Sensors (Basel). 2025 Sep 3;25(17):5489. doi: 10.3390/s25175489.

Abstract

Gallstone disease affects approximately 10–20% of the global adult population, and early diagnosis is essential for effective treatment and management. Image-based machine learning (ML) models have shown high accuracy in gallstone detection, but approaches based on tabular data remain less explored. In this study, we propose a Random Forest (RF) classifier optimized with the Sand Cat Swarm Optimization (SCSO) algorithm for gallstone prediction from a tabular dataset. Experiments were conducted under four frameworks: RF without cross-validation (CV), RF with CV, RF-SCSO without CV, and RF-SCSO with CV. Using all 38 features, the RF model without CV achieved an accuracy of 81.25%, an F-score of 79.07%, a precision of 85%, and a recall of 73.91%. On the same feature set, RF with CV obtained a 10-fold cross-validation accuracy of 78.42%. With SCSO-based feature reduction, the RF-SCSO models without and with CV achieved comparable accuracies of 79.17% and 78.32%, respectively, using only 13 features, which indicates effective dimensionality reduction. SHAP analysis identified CRP, Vitamin D, and AAST as the most influential features. DiCE further illustrated the model's behavior by generating corrective counterfactuals for misclassified instances. These findings demonstrate the potential of interpretable, feature-optimized ML models for gallstone diagnosis from structured clinical data.
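The abstract does not describe how SCSO drives feature selection. The following is a minimal sketch that assumes a wrapper approach. Each sand cat is a point in [0, 1]^d, and that point is thresholded at 0.5 into a feature mask. The swarm then maximizes a user-supplied fitness function. The update rules follow the published SCSO equations: the general sensitivity r_G decays from 2 to 0, and the switch R selects between search and attack. One change from the published method is that the roulette-wheel angle is replaced by a uniform random angle. The function name `scso_select`, the 0.5 threshold, and the toy objective are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def scso_select(
    fitness: Callable[[Sequence[bool]], float],
    n_features: int,
    n_agents: int = 10,
    max_iter: int = 50,
    seed: int = 0,
) -> Tuple[List[bool], float]:
    """Maximize fitness(mask) with a simplified Sand Cat Swarm Optimization."""
    rng = random.Random(seed)

    def decode(pos: Sequence[float]) -> List[bool]:
        # Assumed binarization rule: select feature j when its coordinate > 0.5.
        return [p > 0.5 for p in pos]

    cats = [[rng.random() for _ in range(n_features)] for _ in range(n_agents)]
    scores = [fitness(decode(c)) for c in cats]
    best_i = max(range(n_agents), key=scores.__getitem__)
    best, best_fit = cats[best_i][:], scores[best_i]

    s_m = 2.0  # s_M = 2, as in the original SCSO formulation
    for t in range(max_iter):
        r_g = s_m - s_m * t / max_iter  # general sensitivity decays linearly 2 -> 0
        for cat in cats:
            r = r_g * rng.random()                # per-agent sensitivity range
            big_r = 2 * r_g * rng.random() - r_g  # |R| <= 1 -> attack, else search
            partner = cats[rng.randrange(n_agents)][:]  # random candidate for search
            for d in range(n_features):
                if abs(big_r) <= 1:
                    # Exploitation: approach the best position at a random angle.
                    theta = math.radians(rng.uniform(0, 360))
                    x_rnd = abs(rng.random() * best[d] - cat[d])
                    cat[d] = best[d] - r * x_rnd * math.cos(theta)
                else:
                    # Exploration: reposition relative to a random candidate.
                    cat[d] = r * (partner[d] - rng.random() * cat[d])
                cat[d] = min(1.0, max(0.0, cat[d]))  # stay inside [0, 1]
            score = fitness(decode(cat))
            if score > best_fit:
                best, best_fit = cat[:], score
    return decode(best), best_fit


# Hypothetical toy objective: agreement with an arbitrary 13-of-38 mask.
TARGET = [j % 3 == 0 for j in range(38)]


def toy_fitness(mask: Sequence[bool]) -> float:
    return sum(a == b for a, b in zip(mask, TARGET)) / len(TARGET)


mask, score = scso_select(toy_fitness, n_features=38, n_agents=20, max_iter=100)
print(f"selected {sum(mask)} of {len(mask)} features, fitness {score:.3f}")
```

The toy objective exists only so the sketch runs without a dataset. Replacing it with a cross-validated RF accuracy on the selected columns, optionally with a small penalty on mask size, would give the kind of wrapper the abstract implies.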

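The four hold-out figures reported for RF without CV are mutually consistent, because the F-score is the harmonic mean of precision and recall, 2PR/(P+R). The check below uses hypothetical confusion-matrix counts: TP = 17, FP = 3, FN = 6, and TN = 22, which assumes a 48-sample test split. These are one set of counts that reproduces the reported percentages. They are not taken from the paper.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),  # harmonic mean
    }


# Hypothetical counts, chosen to be consistent with the reported RF (no CV) figures.
m = binary_metrics(tp=17, fp=3, fn=6, tn=22)
print({k: round(100 * v, 2) for k, v in m.items()})
# prints {'accuracy': 81.25, 'precision': 85.0, 'recall': 73.91, 'f1': 79.07}
```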

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa1f/12431373/37871fd6209a/sensors-25-05489-g001.jpg
