Jiang Weixing, Chen Zhenghao, Chen Cancan, Wang Lei, Han Tiandong, Wen Li
Department of Urology, Beijing Friendship Hospital, Capital Medical University, Beijing, China.
Department of Urology, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
Transl Androl Urol. 2024 Jan 31;13(1):53-63. doi: 10.21037/tau-23-319. Epub 2024 Jan 23.
Clinical prognosis assessment of renal cell carcinoma (RCC) still relies on nuclear grading and nuclear scoring by visual inspection under a microscope, which is time-consuming, inefficient, and subject to inconsistent evaluation criteria. Few machine learning (ML) studies in the RCC literature have investigated prognosis, although such models could quantify the risk of postoperative recurrence in RCC patients and guide individualized postoperative clinical management. This study evaluated the suitability of ML algorithms for survival prediction in patients with RCC.
Data on a total of 192,912 RCC patients from 2004 to 2015 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Six ML algorithms, namely support vector machine (SVM), Bayesian method, decision tree, random forest, neural network, and Extreme Gradient Boosting (XGBoost), were applied to predict overall survival (OS) in RCC.
The SEER cohort had a median age of 62 years, and the pathological types were clear cell RCC (47.6%), papillary RCC (9.5%), chromophobe RCC (4.0%), and others (4.1%). In the dataset that excluded patients with missing data, the most accurate model was XGBoost [area under the curve (AUC) 67.0%]. In the dataset that excluded patients with missing data and survival time <5 years, random forest, neural network, and XGBoost were highly accurate, with AUCs of 80.8%, 81.5%, and 81.8%, respectively. In the dataset that excluded only patients with missing tumor diameter and imputed the remaining missing values with missForest, the most accurate model was random forest (AUC: 71.9%). The overall accuracy of the SVM model was not high, except in the dataset that excluded patients with missing tumor diameter and survival time <5 years and imputed the missing data with missForest. Random forest, neural network, and XGBoost had high accuracy, with AUCs of 84.1%, 84.7%, and 84.8%, respectively.
ML algorithms could be used to predict the prognosis of RCC. They could quantify patients' probability of recurrence and support more individualized postoperative clinical management. Given the limitations and complexity of the datasets, ML may serve as an auxiliary tool for analyzing and processing larger and more complex datasets.
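The model comparisons reported here are all expressed as AUC. As a minimal sketch of what that metric measures, the AUC of a binary classifier can be computed from ranked scores using the rank-sum (Mann-Whitney U) formulation. The abstract does not state how the authors computed AUC, so the function name `roc_auc`, the label coding (1 = event, 0 = no event), and the example values below are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 outcomes (assumed coding: 1 = event, 0 = no event).
    scores: iterable of predicted risks; higher means more likely to be 1.
    Tied scores receive their average rank, so ties count as half a win.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, averaging over runs of tied scores.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # U statistic: number of (positive, negative) pairs ranked correctly.
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: four patients, two with the event.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading values such as 67.0% or 84.8%: they are the estimated probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it.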