Li Zhaoyi, Miao Hao, Bao Wei, Zhang Lansheng
Department of Radiotherapy, The Second Affiliated Hospital of Xuzhou Medical University, Meijian Road 32, Xuzhou, 221000, China.
Xuzhou Medical University, Tongshan Road 209, Xuzhou, 221000, China.
BMC Cancer. 2025 Apr 14;25(1):692. doi: 10.1186/s12885-025-14101-3.
The relationship between cytokines and lung metastasis (LM) in breast cancer (BC) remains unclear, and current clinical methods for identifying breast cancer lung metastasis (BCLM) lack precision, underscoring the need for an accurate risk prediction model. This study aimed to apply machine learning algorithms to identify the key risk factors for BCLM and then to develop a reliable, cytokine-centered prediction model.
This population-based retrospective study included 326 BC patients admitted to the Second Affiliated Hospital of Xuzhou Medical University between September 2018 and September 2023. Patients were randomly assigned to a training cohort (70%; n = 228) or a validation cohort (30%; n = 98), and the risk factors for BCLM were then identified using Least Absolute Shrinkage and Selection Operator (LASSO), Extreme Gradient Boosting (XGBoost) and Random Forest (RF) models. Significant risk factors were visualized with a Venn diagram and incorporated into a nomogram model. The nomogram's performance was evaluated on three criteria: discrimination, using receiver operating characteristic (ROC) curves; calibration, using calibration plots; and clinical utility, using decision curve analysis (DCA).
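The cohort split and the Venn-diagram step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the fixed seed, the function names `split_cohort` and `consensus_features`, and the placeholder features `extra_A`, `extra_B` and `extra_C` are assumptions made for the example. The abstract does not report which variables each individual model selected, only the five that entered the nomogram, and it does not say that the five were obtained as a strict three-way intersection.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor: 326 * 0.7 gives 228
    return ids[:n_train], ids[n_train:]


def consensus_features(*selected_sets):
    """Return the features chosen by every selector (centre of the Venn diagram)."""
    return set.intersection(*(set(s) for s in selected_sets))


# 326 patients, as reported in the abstract.
train_ids, valid_ids = split_cohort(range(326))
print(len(train_ids), len(valid_ids))

# Hypothetical per-model selections. Only the five shared variables come from
# the abstract; extra_A, extra_B and extra_C are placeholders.
shared = {"endocrine therapy", "hsCRP", "IL6", "IFN-alpha", "TNF-alpha"}
lasso_selected = shared | {"extra_A"}
xgboost_selected = shared | {"extra_B"}
rf_selected = shared | {"extra_A", "extra_C"}

nomogram_inputs = consensus_features(lasso_selected, xgboost_selected, rf_selected)
print(sorted(nomogram_inputs))
```

Taking the intersection keeps only variables that all three selectors agree on, which is one common way to read the overlap region of a three-set Venn diagram.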
Of the 326 patients, 70 developed LM. A nomogram was developed to predict the 5-year and 10-year BCLM risk from five key variables: endocrine therapy, hsCRP, IL6, IFN-α and TNF-α. For the 5-year prediction model, the AUC values were 0.786 (95% CI: 0.691-0.881) in the training cohort and 0.627 (95% CI: 0.441-0.813) in the validation cohort; for the 10-year prediction model, the corresponding AUC values were 0.687 (95% CI: 0.528-0.847) and 0.797 (95% CI: 0.605-0.988). ROC analysis further confirmed the model's strong discriminative ability, while calibration plots indicated that the predicted and observed outcomes were in good agreement in both cohorts. Finally, DCA demonstrated the model's effectiveness in clinical practice.
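For readers unfamiliar with the two summary measures reported above, the sketch below shows how an AUC and a decision-curve net benefit can be computed for a binary outcome. It is a simplified illustration under stated assumptions: the abstract's 5-year and 10-year models are time-dependent, and the method used to handle follow-up time is not described here, so the plain binary versions below are a stand-in. The function names and the toy risks and outcomes are invented for the example and are not study data.

```python
def auc(scores, labels):
    """AUC as the chance that a random case outranks a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(scores, labels, threshold):
    """Net benefit at risk threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy predicted risks and outcomes (1 = lung metastasis); not study data.
toy_risk = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
toy_lm = [1, 1, 0, 1, 0, 0]

toy_auc = auc(toy_risk, toy_lm)
toy_nb = net_benefit(toy_risk, toy_lm, threshold=0.5)
print(round(toy_auc, 3), round(toy_nb, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. A decision curve plots the net benefit across a range of thresholds and compares it with the treat-all and treat-none strategies.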
Using machine learning algorithms, this study developed a nomogram that could effectively identify BC patients at higher risk of developing LM, thus providing a valuable tool for clinical decision-making.