Liu Yuxi, Wu Yunfeng, Xia Qing, He Hao, Yu Haining, Che Ying
Department of Ultrasound, The First Affiliated Hospital of Dalian Medical University, Liaoning, Dalian, P.R. China.
Department of Ultrasound, Affiliated Hospital of Shandong Second Medical University, Shandong, Weifang, P.R. China.
J Transl Med. 2025 Aug 11;23(1):892. doi: 10.1186/s12967-025-06686-x.
To develop explainable machine learning models that integrate multimodal imaging and pathological biomarkers to predict axillary lymph node metastasis (ALNM) in breast cancer patients and assess their clinical utility.
A retrospective study was conducted on clinical data from 401 patients with pathologically confirmed breast cancer. Ten machine learning algorithms, including Naïve Bayes, Random Forest, Logistic Regression, and Support Vector Machines, were implemented to construct predictive models. Model performance was assessed using standard metrics such as the area under the receiver operating characteristic curve (AUC). To enhance interpretability, SHapley Additive exPlanations (SHAP) were applied to quantify feature importance and explain individual model predictions.
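The modelling and interpretation steps can be sketched in plain Python. The sketch below is a minimal illustration, not the study's code: it hand-writes a Gaussian Naive Bayes classifier (the abstract does not say which Naïve Bayes variant or preprocessing was used) and computes exact Shapley values by enumerating every feature coalition, with absent features replaced by a single baseline vector of feature means. This "baseline Shapley" setup is a simplification; SHAP explainers typically average over a background dataset and may use approximations such as KernelSHAP. The class name `GaussianNB` (unrelated to scikit-learn's class of the same name), the function `shapley_values`, the three feature columns, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


class GaussianNB:
    """Minimal Gaussian Naive Bayes for binary labels 0/1 (hypothetical, illustrative only)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            self.means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
            # Per-class variance with a small floor so a constant feature cannot divide by zero
            self.vars[c] = [
                max(sum((r[j] - self.means[c][j]) ** 2 for r in rows) / len(rows), 1e-9)
                for j in range(n_features)
            ]
        return self

    def _log_joint(self, x, c):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj): features treated as independent given c
        total = math.log(self.priors[c])
        for j, v in enumerate(x):
            m, s2 = self.means[c][j], self.vars[c][j]
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total

    def predict_proba(self, x):
        """P(class 1 | x), normalised with the log-sum-exp trick for numerical stability."""
        logs = {c: self._log_joint(x, c) for c in self.classes}
        top = max(logs.values())
        norm = sum(math.exp(v - top) for v in logs.values())
        return math.exp(logs[1] - top) / norm


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):  # k = coalition size, not counting feature i
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


# Synthetic toy data, NOT study data:
# [node parenchymal thickness (mm), node enlarged (0/1), tumor width (mm)]; label 1 = ALNM
X = [[1.5, 0, 12.0], [2.0, 0, 15.0], [1.8, 1, 10.0], [2.2, 0, 18.0],
     [4.5, 1, 25.0], [5.0, 1, 30.0], [3.8, 0, 22.0], [4.2, 1, 28.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = GaussianNB().fit(X, y)
baseline = [sum(col) / len(col) for col in zip(*X)]  # feature means as the reference input
x = [1.6, 0, 11.0]
phi = shapley_values(model.predict_proba, x, baseline)
print(round(model.predict_proba(x), 4), [round(p, 4) for p in phi])
```

By the efficiency property, the Shapley values sum to the change in predicted probability between the baseline input and `x`. Exact enumeration needs a number of model calls that grows exponentially with the number of features, which is manageable for a handful of features but is why approximate explainers are normally used for larger feature sets.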
The most influential predictive features included lymph node parenchymal thickness, lymph node enlargement, and tumor width. Among all models, the Naïve Bayes classifier demonstrated the highest performance. In the training cohort, the accuracy, precision, recall, and F1-score were 81.0%, 84.0%, 82.0%, and 82.0%, respectively. In the validation cohort, the corresponding values were 82.6%, 83.4%, 82.6%, and 82.0%. The AUCs for the training and validation cohorts were 0.880 and 0.902, respectively.
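The reported metrics can be computed from predicted labels and predicted probabilities as sketched below. The abstract does not state whether precision, recall, and F1-score refer to the positive (ALNM) class or are averaged over classes. In the validation cohort recall equals accuracy (82.6%), which support-weighted recall always does, but the training figures (81.0% vs 82.0%) do not show this, so the sketch computes both conventions rather than assuming one. AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). All function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_precision_recall_f1(y_true, y_pred):
    """Per-class metrics averaged with weights proportional to class support."""
    support = Counter(y_true)
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for c, count in support.items():
        for k, value in enumerate(precision_recall_f1(y_true, y_pred, c)):
            totals[k] += value * count / n
    return tuple(totals)


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels (1 = ALNM) and predicted probabilities, not study data
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.6, 0.7, 0.2, 0.1, 0.4, 0.3, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print("accuracy", accuracy(y_true, y_pred))
print("positive-class P/R/F1", precision_recall_f1(y_true, y_pred, positive=1))
print("weighted P/R/F1", weighted_precision_recall_f1(y_true, y_pred))
print("AUC", roc_auc(y_true, scores))
```

On these toy values, positive-class precision, recall, and F1 are all 0.75, while accuracy and the support-weighted metrics are 0.8 and the AUC is 0.875, which shows how much the averaging convention can shift the reported numbers.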
The Naïve Bayes model demonstrated robust performance and interpretability in predicting ALNM. As a non-invasive and explainable tool, it provides clinical value for risk stratification, accurate diagnosis, and the development of individualized treatment strategies.