Sarker Proshenjit, Tiang Jun-Jiat, Nahid Abdullah-Al
Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh.
Centre for Wireless Technology, CoE for Intelligent Network, Faculty of Artificial Intelligence & Engineering, Multimedia University, Persiaran Multimedia, Cyberjaya 63100, Selangor, Malaysia.
Sensors (Basel). 2025 Sep 3;25(17):5489. doi: 10.3390/s25175489.
Gallstone disease affects approximately 10-20% of the global adult population, and early diagnosis is essential for effective treatment and management. While image-based machine learning (ML) models have shown high accuracy in gallstone detection, approaches based on tabular data remain less explored. In this study, we propose a Random Forest (RF) classifier optimized with the Sand Cat Swarm Optimization (SCSO) algorithm for gallstone prediction from a tabular dataset. Experiments were conducted under four frameworks: RF alone without cross-validation (CV), RF with CV, RF-SCSO without CV, and RF-SCSO with CV. Using all 38 features, RF alone without CV achieved an accuracy of 81.25%, an F-score of 79.07%, a precision of 85%, and a recall of 73.91%, while RF with CV obtained a 10-fold cross-validation accuracy of 78.42% on the same feature set. With SCSO-based feature reduction, the RF-SCSO models without and with CV delivered comparable accuracies of 79.17% and 78.32%, respectively, using only 13 features, indicating effective dimensionality reduction. SHAP analysis identified CRP, Vitamin D, and AAST as the most influential features, and DiCE further illustrated the model's behavior by highlighting corrective counterfactuals for misclassified instances. These findings demonstrate the potential of interpretable, feature-optimized ML models for gallstone diagnosis using structured clinical data.
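As a quick check on how the four reported metrics relate to one another, the following minimal Python sketch computes accuracy, precision, recall, and F-score from confusion-matrix counts. The counts used here (17 true positives, 3 false positives, 6 false negatives, 22 true negatives) are an assumption: the abstract does not report a confusion matrix, and these values are simply one set of counts that reproduces the figures stated for RF without CV. The function name `classification_metrics` is likewise illustrative.

```python
# Hypothetical confusion-matrix counts (NOT reported in the abstract); they are
# chosen only because they reproduce the stated RF-without-CV metrics.
TP, FP, FN, TN = 17, 3, 6, 22


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-score as percentages."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    values = {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }
    return {name: round(100 * value, 2) for name, value in values.items()}


metrics = classification_metrics(TP, FP, FN, TN)
print(metrics)
```

Under these assumed counts the sketch yields the same accuracy, precision, recall, and F-score as the abstract, which also shows that an F-score below the accuracy is expected here because recall is noticeably lower than precision.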