Explainable Thyroid Cancer Diagnosis Through Two-Level Machine Learning Optimization with an Improved Naked Mole-Rat Algorithm.

Author information

Książek Wojciech

Affiliation

Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Cracow, Poland.

Publication information

Cancers (Basel). 2024 Dec 10;16(24):4128. doi: 10.3390/cancers16244128.

DOI:10.3390/cancers16244128

PMID:39766028

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11674737/

Abstract

Modern technologies, particularly artificial intelligence methods such as machine learning, hold great potential for supporting doctors in cancer diagnostics. This study explores enhancing popular machine learning methods with a bio-inspired algorithm, the naked mole-rat algorithm (NMRA), to assess the malignancy of thyroid tumors. The study used a new dataset released in 2022, containing data collected at Shengjing Hospital of China Medical University. The dataset comprises 1232 records described by 19 features. Ten well-known classifiers, including XGBoost, LightGBM, and random forest, were employed to evaluate the malignancy of thyroid tumors. A key innovation of this study is the application of the naked mole-rat algorithm to parameter optimization and feature selection within the individual classifiers. Among the models tested, the LightGBM classifier performed best, achieving a classification accuracy of 81.82% and an F1-score of 86.62% after two-level parameter optimization and feature selection using the naked mole-rat algorithm. Additionally, the decisions of the LightGBM model were analyzed with SHAP values, providing insight into its decision-making process.
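The abstract does not spell out how the optimizer works. As a rough illustration only, the Python sketch below shows a simplified population search loosely modeled on NMRA's split of the colony into breeders and workers, plus a threshold decoding of a continuous position into a feature mask, which is one common way to adapt a continuous metaheuristic to feature selection. The update rules, the greedy acceptance, the 0.5 threshold, and every name (`nmra_minimize`, `decode_mask`, `n_breeders`, `bp`) are hypothetical assumptions for illustration, not the author's implementation or the improved NMRA used in the study. In a real wrapper, the objective would return something like the validation error of a classifier (e.g. LightGBM) trained with the candidate parameters or the decoded feature subset of the 19 features.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def nmra_minimize(
    objective: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    n_breeders: int = 5,
    iters: int = 200,
    bp: float = 0.5,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Simplified NMRA-style minimizer (hypothetical sketch, assumed update rules).

    The n_breeders best solutions act as breeders: with probability bp each one
    steps toward the current best. All other solutions are workers that step
    along the difference of two other random colony members. A move is kept
    only if it lowers the objective (greedy selection).
    """
    rng = random.Random(seed)

    def clip(x: Vector) -> Vector:
        # Keep every coordinate inside its [lo, hi] bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    for _ in range(iters):
        order = sorted(range(pop_size), key=lambda i: fit[i])  # best first
        best = pop[order[0]][:]
        for rank, i in enumerate(order):
            lam = rng.random()  # random step size in [0, 1)
            if rank < n_breeders:
                if rng.random() >= bp:
                    continue  # this breeder skips the current iteration
                cand = [x + lam * (b - x) for x, b in zip(pop[i], best)]
            else:
                j, k = rng.sample([m for m in range(pop_size) if m != i], 2)
                cand = [x + lam * (xj - xk)
                        for x, xj, xk in zip(pop[i], pop[j], pop[k])]
            cand = clip(cand)
            f = objective(cand)
            if f < fit[i]:  # greedy: accept only improvements
                pop[i], fit[i] = cand, f

    i_best = min(range(pop_size), key=lambda i: fit[i])
    return pop[i_best], fit[i_best]


def decode_mask(position: Vector, threshold: float = 0.5) -> List[bool]:
    """Map a continuous position in [0, 1]^d to a feature mask (True = keep)."""
    return [v > threshold for v in position]


def sphere(x: Vector) -> float:
    # Toy objective standing in for a classifier's validation error.
    return sum(v * v for v in x)


if __name__ == "__main__":
    x_best, f_best = nmra_minimize(sphere, [(-5.0, 5.0)] * 2)
    print(x_best, f_best)
    print(decode_mask([0.9, 0.1, 0.5, 0.7]))  # [True, False, False, True]
```

The published NMRA, and the improved variant used in this study, may use different update equations, breeding probabilities, and stopping rules; the sketch only illustrates the breeder/worker structure and how one search vector can be reused for either parameter tuning (continuous bounds) or feature selection (decoded mask).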
