Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China.
Medical Big-Data Center, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China.
Cancer Sci. 2024 Nov;115(11):3755-3766. doi: 10.1111/cas.16327. Epub 2024 Sep 2.
This study used data on 140,294 prostate cancer cases from the Surveillance, Epidemiology, and End Results (SEER) database. Ten machine learning algorithms were applied to develop models that predict the treatment option for patients with prostate cancer, differentiating between surgical and non-surgical treatment. Algorithm performance was measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. The Shapley Additive Explanations (SHAP) method was used to identify the key factors influencing the predictions. Survival analysis was used to compare survival rates across treatment options. The CatBoost model yielded the best results (AUC = 0.939, sensitivity = 0.877, accuracy = 0.877). SHAP analysis revealed that T stage, cancer stage, age, percentage of positive cores, prostate-specific antigen, and Gleason score were the most critical factors in predicting treatment options. The study found that surgery significantly improved survival, with patients undergoing surgery experiencing a 20.36% increase in 10-year survival rate compared with those receiving non-surgical treatment. Among surgical options, radical prostatectomy had the highest 10-year survival rate, at 89.2%. This study developed a predictive model to guide treatment decisions for prostate cancer. Moreover, the SHAP-based interpretation made the model's decision-making process more transparent, providing clinicians with a reference for formulating personalized treatment plans.
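The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made purely for illustration, with 1 standing for surgical and 0 for non-surgical treatment.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. The pairwise loop is quadratic, which is fine for a toy
    example but not for a cohort of the size described in the abstract.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from SEER.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(classification_metrics(toy_labels, toy_preds))
print("AUC:", roc_auc(toy_labels, toy_scores))
```

AUC is computed from the continuous scores, whereas the other five metrics depend on the chosen threshold, so a model can have a high AUC and still trade sensitivity against specificity differently at different cut-offs.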