Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics.

Authors

Fang Shan, Zhang Jun, Han Chengyan, Kong Mingxiang, Zhang Haibo, Zhong Miaochun, Chen Wuzhen, Yuan Hongjun, Xia Wenjie, Zhang Wei

Affiliations

Center for Rehabilitation Medicine, Rehabilitation & Sports Medicine Research Institute of Zhejiang Province, Department of Rehabilitation Medicine, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, Hangzhou, Zhejiang, China.

Department of Breast Surgery, Weifang People's Hospital, Weifang, Shandong, China.

Publication information

Cancer Med. 2025 Sep;14(17):e71221. doi: 10.1002/cam4.71221.

Abstract

BACKGROUND

The pathological response to neoadjuvant chemotherapy (NAC) has become a vital prognostic indicator for patients with breast cancer (BC). Recently developed prediction models have relied on rather basic imaging and pathology characteristics and have not sufficiently explained the importance of the data they incorporate. This study aimed to develop and validate a machine learning (ML) model that predicts pathological complete response (pCR) to NAC from baseline clinical and pathological features in BC patients.

METHODS

Data were collected from hospitalized BC patients treated with NAC at Zhejiang Provincial People's Hospital between January 2014 and August 2023. The dataset was randomly split, with 70% allocated to model training and 30% to validation. LASSO regression was used to select predictive features. Six ML models (XGBoost, LightGBM, CatBoost, logistic regression, random forest [RF], and support vector machine [SVM]) were developed, and their performance was assessed using the area under the curve (AUC), accuracy, precision, recall, F1 score, and Brier score. Clinical benefit was evaluated using decision curve analysis (DCA), and SHapley Additive exPlanation (SHAP) was applied to interpret the features of the optimal ML model.
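The evaluation metrics named above can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the labels and probabilities are made-up toy values, the 0.5 classification threshold is an assumption, and the function names are hypothetical. Net benefit, the quantity plotted in a decision curve, is computed with the standard formula TP/n - (FP/n) * pt/(1 - pt) for a threshold probability pt.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1 and Brier score for binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
    }


def auc(y_true, y_prob):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample size, fine for a sketch.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability (one point on a decision curve)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (hypothetical, NOT from the study): 1 = pCR, 0 = no pCR.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]

metrics = classification_metrics(y_true, y_prob)
metrics["auc"] = auc(y_true, y_prob)
print(metrics)
print("net benefit at pt=0.3:", net_benefit(y_true, y_prob, 0.3))
```

A decision curve repeats the `net_benefit` call over a range of thresholds and compares the model against the "treat all" and "treat none" strategies. A lower Brier score indicates better-calibrated probabilities, whereas higher is better for the other metrics.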

RESULTS

A total of 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). Twelve features, including age, menopausal status, progesterone receptor (PR), HER2 status, Ki-67 expression, and stromal tumor-infiltrating lymphocytes (sTILs), were selected for model construction. Among the six models, the CatBoost model demonstrated the best predictive performance, achieving an AUC of 0.853 after Bayesian hyperparameter tuning. SHAP analysis ranked sTILs as the most important predictive feature. In fivefold cross-validation, the CatBoost model incorporating sTILs achieved an average AUC of 0.83.
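To make the idea behind a SHAP ranking concrete, the sketch below computes exact Shapley values for a single sample by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library or the study's CatBoost model: the scoring function, its coefficients, the feature values, and the all-zero baseline are hypothetical and chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features not yet added to the coalition keep their baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for a handful of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            updated = predict(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_score(features):
    """Hypothetical score with one interaction term (NOT the study's model)."""
    return (0.15 * features["stils"]
            + 0.20 * features["her2"]
            + 0.05 * features["ki67"]
            + 0.10 * features["stils"] * features["her2"])


# Hypothetical encoded feature values for one patient and a zero baseline.
sample = {"stils": 2, "her2": 1, "ki67": 1}
baseline = {"stils": 0, "her2": 0, "ki67": 0}

phi = shapley_values(toy_score, sample, baseline)
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
print(phi)
print("ranking by absolute contribution:", ranking)
```

The contributions sum to the difference between the score for the sample and the score for the baseline, and the interaction term is shared equally between the two features involved. A global ranking like the one reported above is typically obtained by averaging absolute contributions over many patients rather than from a single sample.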

CONCLUSIONS

The ML-based model enables more accurate baseline prediction of pCR in BC patients, which may help optimize treatment strategies. In addition, the interpretable SHAP framework improves model transparency, fostering trust and understanding among clinicians.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e6d/12415587/7d4a0187f090/CAM4-14-e71221-g004.jpg