Li Lin-Wei, Liu Xin, Shen Meng-Lu, Zhao Meng-Jun, Liu Hong
The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin 300060, China.
Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China.
Am J Cancer Res. 2024 Apr 15;14(4):1609-1621. doi: 10.62347/OJTY4008. eCollection 2024.
Young breast cancer (YBC) patients often face a poor prognosis, so a model that accurately predicts their long-term survival in early-stage disease is needed. We used data from the Surveillance, Epidemiology, and End Results (SEER) databases collected between January 2010 and December 2020 and enrolled an independent external cohort from Tianjin Medical University Cancer Institute and Hospital. The study aimed to develop and validate a prediction model built with the Random Survival Forest (RSF) machine learning algorithm. Least Absolute Shrinkage and Selection Operator (LASSO) regression identified the key prognostic factors for YBC patients, which were then used to build a model that predicts 3-year, 5-year, 7-year, and 10-year survival rates. The RSF model achieved C-index values of 0.920 in the training set, 0.789 in the internal validation set, and 0.701 in the external validation set, outperforming the Cox regression model. Brier scores at several time points supported the model's calibration and predictive accuracy. Decision curve analysis (DCA) indicated the model's clinical utility, and Shapley Additive Explanations (SHAP) plots highlighted the contribution of key variables. The RSF model also proved useful for risk stratification, effectively separating patients by survival risk. In summary, this study constructed a well-performing model for predicting long-term survival and evaluating prognostic factors in early-stage YBC patients, which can support risk stratification when physicians manage YBC patients in clinical settings.
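The C-index values reported above measure how often a model ranks a pair of patients in the same order as their observed survival. As an illustration only (this is not the authors' code, and the toy data are hypothetical), a minimal sketch of Harrell's concordance index for right-censored data might look like the following. It assumes that a higher score means higher predicted risk, and it simply skips pairs with tied follow-up times, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the patient with
        # the shorter time had an observed event (simplification: tied
        # times are skipped entirely).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranked the earlier death as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in years, event flags, predicted risk.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.5, 0.3, 0.1]

# 7 of the 8 comparable pairs are ranked correctly, giving 0.875.
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the training, internal validation, and external validation results above are reported.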