Dong Jinkai, Duan Minjie, Liu Xiaozhu, Li Huan, Zhang Yang, Zhang Tingting, Fu Chengwei, Yu Jie, Hu Weike, Peng Shengxian
Senior Department of Urology, the Third Medical Center of PLA General Hospital, Beijing, People's Republic of China.
Medical School of Chinese PLA, Beijing, People's Republic of China.
J Multidiscip Healthc. 2025 Jan 16;18:195-207. doi: 10.2147/JMDH.S480747. eCollection 2025.
Traditional tools for predicting distant metastasis in renal cell carcinoma (RCC) remain insufficient. We aimed to establish an interpretable machine learning model for predicting distant metastasis in RCC patients.
We included a population-based cohort of 121433 patients (mean age = 63 years; 63.58% men) diagnosed with RCC between 2004 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) database. The LightGBM algorithm was used to develop the prediction model, which was assessed by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. The model was then externally validated in 36395 RCC patients drawn from the SEER database between 2016 and 2018. The SHapley Additive exPlanations (SHAP) method was applied to interpret the model's predictions.
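As an illustration of the four evaluation metrics named above, the following minimal sketch computes AUC, accuracy, sensitivity, and specificity from labels and predicted probabilities using only the Python standard library. The function names, the toy data, and the 0.5 decision threshold are assumptions made for illustration; the abstract does not report the threshold or the software used for evaluation.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at an assumed decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy example (hypothetical data): 1 = distant metastasis, 0 = none.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
print(threshold_metrics(labels, scores))
```

AUC is independent of the threshold, whereas the other three metrics change with it. With a minority class of roughly 9%, as in this cohort, accuracy alone can look high even for a model that rarely predicts metastasis, which is one reason to report sensitivity and specificity alongside it.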
Of the 121433 patients in the study cohort, 10730 (8.84%) had distant metastasis. The LightGBM model performed well in the internal validation set (AUC: 0.955, 95% CI: 0.951-0.959) and in the temporal external validation sets (AUC: 0.963, 95% CI: 0.959-0.967; AUC: 0.961, 95% CI: 0.954-0.966). The model also performed well in sub-cohorts stratified by age, gender, and ethnicity. The calibration curve indicated that the predicted values were highly consistent with the observed values. SHAP plots showed that chemotherapy was the most important variable for predicting distant metastasis in RCC patients.
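The abstract does not state how the 95% confidence intervals for AUC were obtained. One common approach is the percentile bootstrap, sketched below under that assumption; the function names, the number of resamples, and the toy data are hypothetical and are not taken from the study.

```python
import random
from typing import Sequence, Tuple


def pairwise_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(pairwise_auc(ys, [y_score[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return pairwise_auc(y_true, y_score), lower, upper


# Synthetic example (hypothetical data) with roughly 9% positives.
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.09 else 0 for _ in range(300)]
scores = [data_rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]
auc, lower, upper = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC {auc:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The pairwise AUC used here is quadratic in the number of cases, which is acceptable for a toy example but would be slow on a cohort of this size; a rank-based AUC would be the more practical choice there.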
We developed an interpretable machine learning model that accurately predicts the risk of distant metastasis in RCC patients. The model could help identify high-risk patients who require additional treatment strategies and follow-up regimens.