Li Yi, Li Handong, Ye Xuan, Zhu Zhigang, Qiu Yixuan
Department of Geriatrics, Hematology and Oncology Ward, The Second Affiliated Hospital, School of Medicine, South China University of Technology, No.1 Panfu Road, Guangzhou, China.
Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China.
Discov Oncol. 2024 Dec 18;15(1):809. doi: 10.1007/s12672-024-01676-9.
With major advances in adjuvant therapies, breast cancer (BC)-related deaths have decreased significantly. Increasing attention has turned to the effect of cardiac disease on BC survivors, yet few population-based studies have focused on younger patients.
Data on BC patients younger than 50 years were collected from the Surveillance, Epidemiology, and End Results (SEER) database. A competing risk model was used to analyze the effects of clinicopathological variables on the risk of cardiac disease-specific death (CDSD) in these patients. An XGBoost model was then constructed to predict CDSD risk. Predictive performance was assessed using receiver operating characteristic (ROC) analysis, area under the ROC curve (AUC) values, calibration curves, decision curves, and confusion matrices, and SHapley Additive exPlanations (SHAP) were used to interpret the models.
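The competing-risk idea behind the CDSD analysis can be illustrated with the nonparametric cumulative incidence function (Aalen-Johansen estimator), in which deaths from other causes remove patients from the risk set instead of being treated as censored. The following is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say which competing-risk model was fitted, and the function name `cumulative_incidence` and the event coding (0 = censored, 1 = cardiac death, 2 = other death) are hypothetical.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Aalen-Johansen cumulative incidence for one cause (illustrative sketch).

    times  : follow-up times
    events : 0 = censored; 1, 2, ... = competing event types (hypothetical coding)
    cause  : event type of interest, e.g. 1 = cardiac disease-specific death
    Returns (time, CIF) pairs at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0  # all-cause event-free survival just before the current time
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count every record tied at time t, grouped by event type.
        counts = Counter()
        j = i
        while j < len(data) and data[j][0] == t:
            counts[data[j][1]] += 1
            j += 1
        d_all = sum(n for e, n in counts.items() if e != 0)
        d_cause = counts[cause]
        if d_all > 0:
            # Probability of surviving to t, then dying of `cause` at t.
            cif += surv * d_cause / at_risk
            # Any event (of any cause) lowers all-cause survival.
            surv *= 1 - d_all / at_risk
            out.append((t, cif))
        # Events and censorings at t both leave the risk set afterwards.
        at_risk -= j - i
        i = j
    return out


# Toy data: one cardiac death, one other-cause death, one more cardiac death.
print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 0], cause=1))
```

Because other-cause deaths are removed from the risk set rather than censored, the cause-specific incidences and the all-cause survival sum to 1 at every time point, which a naive Kaplan-Meier estimate of a single cause would not guarantee.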
The competing risk analysis showed that young BC patients with the following characteristics were at higher risk of CDSD: older age, low household income, non-metropolitan residence, Black race, unmarried status, HR+ subtype, higher T stage (T2-4), receipt of chemotherapy, and no surgery. Five machine learning models were then constructed to predict CDSD risk in young BC patients; among them, the XGBoost model achieved the highest AUC (training set: AUC = 0.846; test set: AUC = 0.836). The confusion matrix of the XGBoost model showed sensitivity, specificity, and correction of 0.81, 0.94, and 0.94 for the training set and 0.82, 0.95, and 0.96 for the test set, respectively. SHAP analysis indicated that median household income, marital status, race, and age at diagnosis were the four strongest predictors.
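The sensitivity, specificity, and AUC figures follow standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and the AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch, assuming binary labels (1 = CDSD, 0 = no CDSD) and a hypothetical 0.5 classification threshold; the abstract does not report the threshold actually used, and the function names are illustrative.

```python
def predict_labels(scores, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_metrics(y_true, y_pred):
    """Sensitivity and specificity from binary labels (1 = CDSD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # share of true CDSD cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
print(confusion_metrics(y_true, predict_labels(scores)))
print(roc_auc(y_true, scores))
```

The pairwise AUC is threshold-free, whereas sensitivity and specificity depend on where the cut-off is placed; this is why a model can report a single AUC but different confusion-matrix values under different thresholds.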
Independent CDSD risk factors for young BC patients were identified, and machine learning prognostic models were constructed to predict their CDSD risk. Validation indicated that the probabilities predicted by the XGBoost model agreed well with the observed CDSD risk, suggesting that the model can help identify high-risk patients and thereby guide effective cardioprotection strategies. We hope these findings will support the emerging field of cardio-oncology.
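Agreement between predicted probabilities and observed risk is what a calibration curve displays. A minimal sketch of the underlying table, assuming equal-width probability bins (the number of bins and the binning scheme used in the study are not stated; `calibration_table` is a hypothetical name): each row compares the mean predicted probability in a bin with the observed event rate, and rows where the two are close correspond to points near the diagonal of the curve.

```python
def calibration_table(y_true, probs, n_bins=5):
    """Binned calibration summary (equal-width bins are an assumption).

    Returns (bin_index, n, mean_predicted, observed_rate) for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((t, p))
    table = []
    for k, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(t for t, _ in members) / len(members)
        table.append((k, len(members), mean_pred, observed))
    return table


for row in calibration_table([0, 0, 1, 0, 1, 1], [0.1, 0.15, 0.1, 0.9, 0.85, 0.95], n_bins=2):
    print(row)
```

Equal-width bins are the simplest choice; quantile bins (equal counts per bin) are a common alternative when predicted risks cluster near zero, as they often do for a rare outcome.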