Lee Jeong Hyun, Jeong Jaeyun, Ahn Young Jin, Lee Kwang Suk, Lee Jong Soo, Lee Seung Hwan, Ham Won Sik, Chung Byung Ha, Koo Kyo Chul
Department of Urology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Republic of Korea.
SKTelecom, Seoul 04539, Republic of Korea.
J Pers Med. 2025 Sep 8;15(9):432. doi: 10.3390/jpm15090432.
Accurate survival prediction is essential for optimizing treatment planning in patients with castration-resistant prostate cancer (CRPC). However, traditional statistical models often underperform because they include few variables and cannot account for complex, multidimensional interactions in the data. We retrospectively collected 46 clinical, laboratory, and pathological variables from 801 patients with CRPC, covering the disease course from initial diagnosis to CRPC progression. Multiple machine learning (ML) models, including random survival forests (RSFs), XGBoost, LightGBM, and logistic regression, were developed to predict cancer-specific mortality (CSM), overall mortality (OM), and 2- and 3-year survival status. The dataset was split into training and test cohorts (80:20), with 10-fold cross-validation. Performance was assessed using the C-index for regression models and the AUC, accuracy, precision, recall, and F1-score for classification models. Model interpretability was assessed using SHapley Additive exPlanations (SHAP). Over a median follow-up of 24 months, cancer-specific death occurred in 70.6% of patients. RSFs achieved the highest C-index in the test set for both CSM (0.772) and OM (0.771). For classification tasks, RSFs performed best in predicting 2-year survival, while XGBoost yielded the highest F1-score for 3-year survival. The SHAP analysis identified time to first-line CRPC treatment, hemoglobin level, and alkaline phosphatase level as key predictors of survival outcomes. The RSF and XGBoost models outperformed traditional statistical methods in predicting survival in CRPC. These models offer accurate and interpretable prognostic tools that may inform personalized treatment strategies. External validation and the integration of emerging therapies are warranted for broader clinical applicability.
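For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' code: the function name `concordance_index`, the toy follow-up data, and the decision to skip pairs with tied follow-up times are all illustrative assumptions. The idea is that, among pairs of patients whose order of death is known, the C-index is the fraction in which the patient who died earlier was assigned the higher predicted risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- follow-up time per patient (e.g. months)
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier patient actually died.
        # Simplifying assumption: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the study): five patients.
times = [5, 10, 12, 20, 24]          # months of follow-up
events = [1, 1, 0, 1, 0]             # 1 = died, 0 = censored
risk = [0.9, 0.4, 0.6, 0.3, 0.1]     # predicted risk scores

print(f"C-index: {concordance_index(times, events, risk):.3f}")  # C-index: 0.875
```

In this toy example 8 pairs are comparable and 7 are ordered correctly, giving 0.875. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported test-set values of 0.772 and 0.771.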