School of Public Health, Xiamen University, Xiang'an South Road, Xiang'an District, Xiamen, Fujian 361102, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen, Fujian, China; School of Nursing, Faculty of Health and Social Sciences, The Hong Kong Polytechnic University, Hong Kong SAR, China.
School of Public Health, Xiamen University, Xiang'an South Road, Xiang'an District, Xiamen, Fujian 361102, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen, Fujian, China.
Comput Methods Programs Biomed. 2024 Sep;254:108310. doi: 10.1016/j.cmpb.2024.108310. Epub 2024 Jun 25.
Studies have found that first primary cancer (FPC) survivors are at high risk of developing second primary breast cancer (SPBC). However, there is a lack of prognostic studies specifically focusing on patients with SPBC.
This retrospective study used data from the Surveillance, Epidemiology, and End Results (SEER) Program. We selected female FPC survivors diagnosed with SPBC from 12 registries (January 1998 to December 2018) to construct prognostic models. SPBC patients from another five registries (January 2010 to December 2018) served as the validation set to test the models' generalization ability. Four machine learning models and a Cox proportional hazards regression (CoxPH) model were constructed to predict the overall survival of SPBC patients. Univariate and multivariable Cox regression analyses were used for feature selection. Model performance was assessed using the time-dependent area under the ROC curve (t-AUC) and the integrated Brier score (iBrier).
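To make the iBrier metric concrete, the following is a minimal sketch of how a Brier score can be computed at one time point and then integrated over a time grid. It is an illustration only, not the study's implementation: it assumes every survival time is fully observed (real survival data are censored and are usually handled with inverse-probability-of-censoring weights), and the names `brier_score_at`, `integrated_brier`, `times`, and `surv_fns` are hypothetical.

```python
from typing import Callable, Sequence


def brier_score_at(
    t: float,
    times: Sequence[float],
    surv_fns: Sequence[Callable[[float], float]],
) -> float:
    """Mean squared error between predicted S(t) and observed status at time t.

    Simplifying assumption: no censoring, so `times` are true survival times.
    """
    total = 0.0
    for time, surv in zip(times, surv_fns):
        alive = 1.0 if time > t else 0.0  # observed status at time t
        total += (alive - surv(t)) ** 2
    return total / len(times)


def integrated_brier(
    grid: Sequence[float],
    times: Sequence[float],
    surv_fns: Sequence[Callable[[float], float]],
) -> float:
    """Trapezoidal integral of the Brier score over `grid`, divided by its span."""
    scores = [brier_score_at(t, times, surv_fns) for t in grid]
    area = 0.0
    for i in range(1, len(grid)):
        area += 0.5 * (scores[i] + scores[i - 1]) * (grid[i] - grid[i - 1])
    return area / (grid[-1] - grid[0])
```

Lower values are better: a model that predicts each patient's status perfectly scores 0, while one that always predicts a survival probability of 0.5 scores 0.25.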
A total of 10,321 female FPC survivors with SPBC (mean age [SD]: 66.03 [11.17]) were included for model construction. These patients were randomly split into a training set (mean age [SD]: 65.98 [11.15]) and a test set (mean age [SD]: 66.15 [11.23]) at a ratio of 7:3. In the validation set, a total of 3,638 SPBC patients (mean age [SD]: 66.28 [10.68]) were enrolled. Sixteen features were selected for model construction through univariate and multivariable Cox regression analyses. Among the five models, the random survival forest model showed excellent performance, with a t-AUC of 0.805 (95% CI: 0.803-0.807) and an iBrier of 0.123 (95% CI: 0.122-0.124) on the test set, and a t-AUC of 0.803 (95% CI: 0.801-0.807) and an iBrier of 0.098 (95% CI: 0.096-0.103) on the validation set. Feature importance ranking identified age as the top predictive feature of the random survival forest model, with stage, regional nodes positive, latency, radiotherapy, and surgery as the other five key features.
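The random 7:3 split described above can be sketched as follows. This is a hypothetical illustration using only the Python standard library; the function name `split_train_test`, the seed, and the rounding rule are assumptions, and the abstract does not state how the authors performed the split.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    ids: Sequence[int], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle patient identifiers reproducibly and cut them at `train_frac`."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)  # assumed rounding rule
    return shuffled[:cut], shuffled[cut:]


# Example with placeholder identifiers for a cohort of 10,321 patients.
train_ids, test_ids = split_train_test(range(10321))
```

Fixing the seed makes the partition reproducible, which matters when several models are compared on the same test set.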
The random survival forest model outperformed CoxPH and the other machine learning models in predicting the overall survival of patients with SPBC, which is helpful for monitoring high-risk populations.