Jin Yudi, Su Tong, Fan Yanjia, Zheng Yineng, Tian Cheng, Ouyang Zubin, Lv Fajin
Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
BMC Med Inform Decis Mak. 2025 Jul 25;25(1):277. doi: 10.1186/s12911-025-03086-5.
Breast cancer is a prevalent malignancy globally, with approximately 1 in 10 breast cancer patients at risk of developing additional primary malignant tumors. This study seeks to explore the risk factors linked to the development of multiple primary cancers (MPCs) in breast cancer patients and to develop predictive models to aid in clinical decision-making.
A cohort of patients from the Surveillance, Epidemiology, and End Results (SEER) database was analyzed to identify key factors contributing to the occurrence of MPCs. Machine learning models, including logistic regression and random forest, were established and tested to predict the risk of developing multiple primary cancers.
A total of 120,434 breast cancer patients were included in the study. After random undersampling of the majority class and random selection of one quarter of the population, the one primary breast cancer (OPBC) group and the MPCs group each contained 3432 patients. A logistic regression model and a random forest model were constructed based on age, marital status, laterality, histological type, tumor grade, American Joint Committee on Cancer (AJCC) stage, T and N stage, molecular subtype, surgery, chemotherapy, and radiotherapy. The logistic regression model achieved an area under the curve (AUC) of 0.902, a specificity of 0.905, and a sensitivity of 0.767 in the training set, and an AUC of 0.886, a specificity of 0.882, and a sensitivity of 0.782 in the testing set. The random forest model achieved an AUC of 0.955, a specificity of 0.916, and a sensitivity of 0.859 in the training set, and an AUC of 0.874, a specificity of 0.858, and a sensitivity of 0.769 in the testing set. A nomogram was plotted based on the logistic regression model. The Kaplan-Meier (K-M) curves demonstrated statistically significant differences in prognosis among the risk groups stratified by the nomogram.
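As a rough illustration of the class balancing and the metrics reported above, the following minimal sketch shows one way random undersampling, sensitivity, specificity, and AUC can be computed. It is not the authors' code: the function names, the `label` key (1 for MPCs, 0 for OPBC), and the fixed seed are assumptions made for this example only, and the AUC here uses the plain pairwise-ranking definition rather than any particular library routine.

```python
import random


def undersample_majority(records, label_key="label", seed=0):
    """Randomly drop majority-class records until both classes are equal in size.

    `records` is a list of dicts; `label_key` (hypothetical name) holds 0 or 1.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    # The shorter list is the minority class; sample the longer one down to it.
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample size, so only for small demos.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sensitivity is the fraction of MPCs patients the model flags, and specificity is the fraction of OPBC patients it correctly leaves unflagged. Both depend on the chosen probability threshold, whereas the AUC summarizes ranking quality across all thresholds.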
This study assessed several risk factors influencing the development of MPCs in breast cancer patients. The machine learning model could offer a practical tool for personalized risk assessment in this patient population.