Polynomial-SHAP analysis of liver disease markers for capturing of complex feature interactions in machine learning models.

Affiliations

College of Nuclear Technology and Automation Engineering, Sichuan Industrial Internet Intelligent Monitoring and Application Engineering Research Center, Chengdu University of Technology, Sichuan, Chengdu, China; Network and Data Security Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.

Publication information

Comput Biol Med. 2024 Nov;182:109168. doi: 10.1016/j.compbiomed.2024.109168. Epub 2024 Sep 28.

DOI:10.1016/j.compbiomed.2024.109168

PMID:39342675

Abstract

Liver disease diagnosis is pivotal for effective patient management, and machine learning techniques have shown promise in this domain. In this study, we investigate the impact of Polynomial-SHapley Additive exPlanations analysis on the performance and interpretability of machine learning models for liver disease classification. Our results demonstrate significant improvements in accuracy, precision, recall, F1 score, and Matthews correlation coefficient across various algorithms when Polynomial-SHapley Additive exPlanations analysis is applied. Specifically, the Light Gradient Boosting Machine model achieves 100% accuracy in both scenarios. Furthermore, comparing the results obtained with and without the approach reveals substantial differences in performance, highlighting the importance of incorporating Polynomial-SHapley Additive exPlanations analysis. The polynomial features and SHapley Additive exPlanations values also enhance the interpretability of machine learning models by capturing complex feature interactions, enabling users to gain deeper insights into the underlying mechanisms driving the diagnosis. Moreover, data rebalancing using the Synthetic Minority Over-sampling Technique (SMOTE) and parameter tuning were employed to optimize the performance of the models. These findings underscore the significance of employing this analytical approach in machine-learning-based diagnostic systems for liver diseases, offering superior performance and enhanced interpretability for informed decision-making in clinical practice.
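The abstract does not spell out how polynomial features and Shapley values are combined, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a degree-2 expansion (squares and pairwise products) followed by exact Shapley attribution computed by enumerating feature coalitions, with absent features replaced by a baseline value. The marker names, the weights, and the toy linear model are hypothetical, and each product term is treated as an independent feature, which is a simplification. Practical SHAP implementations use faster model-specific or sampling-based estimators rather than this brute-force enumeration.

```python
from itertools import combinations, combinations_with_replacement
from math import factorial


def polynomial_features(x, names):
    """Degree-2 expansion: original terms, then squares and pairwise products (no bias term)."""
    feats = list(x)
    labels = list(names)
    for i, j in combinations_with_replacement(range(len(x)), 2):
        feats.append(x[i] * x[j])
        labels.append(f"{names[i]}*{names[j]}")
    return feats, labels


def shapley_values(model, x, baseline):
    """Exact Shapley values by coalition enumeration.

    Features outside a coalition S are replaced by their baseline value.
    Cost grows as 2^n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for r in range(n):
            # Shapley weight for coalitions of size r: r! (n - r - 1)! / n!
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical inputs: two generic markers and an arbitrary reference sample.
names = ["marker_a", "marker_b"]
x, labels = polynomial_features([2.0, 3.0], names)
baseline, _ = polynomial_features([1.0, 1.0], names)

# Hypothetical linear model over the expanded features (weights are made up).
weights = [0.5, -0.2, 0.1, 0.3, 0.0]


def model(z):
    return sum(w * v for w, v in zip(weights, z))


phi = shapley_values(model, x, baseline)
for label, contribution in zip(labels, phi):
    print(f"{label}: {contribution:+.3f}")
```

In this sketch the attribution assigned to the `marker_a*marker_b` term is what exposes the interaction between the two markers. The attributions sum to the difference between the model output for the sample and for the baseline, which is the efficiency property of Shapley values.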
