Wei Liwei, Huang Yongdi, Chen Zheng, Lei Hongyu, Qin Xiaoping, Cui Lihong, Zhuo Yumin
Department of Urology, the First Affiliated Hospital of Jinan University, Guangzhou, China.
College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China.
Front Oncol. 2021 Oct 14;11:763381. doi: 10.3389/fonc.2021.763381. eCollection 2021.
A more accurate preoperative prediction of lymph node involvement (LNI) in prostate cancer (PCa) would improve clinical treatment and follow-up strategies for this disease. We developed a machine learning (ML) model trained on large-scale registry data for this purpose.
Clinicopathological characteristics of 2,884 PCa patients who underwent extended pelvic lymph node dissection (ePLND) were collected from the U.S. National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database for the period 2010 to 2015. Eight variables were included to build the ML model. Predictive accuracy was evaluated with receiver operating characteristic (ROC) curves and calibration plots. Clinical utility was assessed with decision curve analysis (DCA) and cutoff values.
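The two headline evaluation measures named above, the area under the ROC curve and the net benefit used in decision curve analysis, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `roc_auc` and `net_benefit` and the ten toy patients are hypothetical, and the net benefit uses the standard definition TP/N - FP/N * pt/(1 - pt) at threshold probability pt.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit of treating every patient whose predicted risk is at or
    above `threshold`: TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = LNI present, 0 = absent.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.12, 0.30, 0.40, 0.70]

auc = roc_auc(labels, scores)
# A decision curve is the net benefit evaluated over a grid of thresholds.
curve = {pt: net_benefit(labels, scores, pt) for pt in (0.05, 0.10, 0.15, 0.20)}
```

Plotting `curve` for each candidate model against the "treat all" and "treat none" strategies gives the decision curve; the pairwise AUC loop is quadratic and is only meant for small illustrative inputs.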
Three hundred and forty-four (11.9%) patients were identified with LNI. The five most important factors were Gleason score, T stage, percentage of positive cores, tumor size, and prostate-specific antigen level, scoring 158, 137, 128, 113, and 88 points, respectively. The XGBoost (XGB) model showed the best predictive performance of the algorithms compared, achieving an area under the curve of 0.883. At cutoff values from 5% to 20%, the XGB model performed best at reducing missed LNI and avoiding overtreatment: it had a lower false-negative rate and avoided a higher percentage of ePLND procedures. In addition, DCA showed that it had the highest net benefit across the whole range of threshold probabilities.
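The cutoff analysis described above can be illustrated as follows. This is a sketch under stated assumptions rather than the study's procedure: patients whose predicted LNI risk falls below the cutoff are assumed to be spared ePLND, a "missed" case is a spared patient who actually has LNI, and the function name `cutoff_summary` and the toy data are hypothetical. The abstract does not state which denominator its misdiagnosis figure uses, so the sketch reports missed cases both as a share of all patients and as a false-negative rate among LNI-positive patients.

```python
def cutoff_summary(labels, scores, cutoff):
    """Summarize the effect of sparing ePLND for every patient whose
    predicted LNI risk is below `cutoff` (an assumed decision rule)."""
    n = len(labels)
    n_pos = sum(labels)
    # Labels of the patients who would be spared the dissection.
    spared = [y for y, s in zip(labels, scores) if s < cutoff]
    missed = sum(spared)  # spared patients who actually have LNI
    return {
        "eplnd_avoided": len(spared) / n,
        "missed_of_all": missed / n,
        "false_negative_rate": missed / n_pos if n_pos else 0.0,
    }


# Hypothetical toy data (NOT from the study): 1 = LNI present, 0 = absent.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.12, 0.30, 0.40, 0.70]

for cutoff in (0.05, 0.10, 0.15, 0.20):
    print(cutoff, cutoff_summary(labels, scores, cutoff))
```

Raising the cutoff spares more patients the dissection but can only increase the number of missed LNI cases, which is the trade-off the 5% to 20% range is meant to explore.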
We established an ML model based on big data for predicting LNI in PCa that could reduce ePLND procedures by approximately 50%. In addition, no more than 3% of patients were misdiagnosed at cutoff values ranging from 5% to 20%. These promising results warrant further validation in a larger prospective dataset.