Department of Urology, the Affiliated Hospital of Xuzhou Medical University, Xuzhou, China.
Nanjing First Hospital, Nanjing, China.
Front Public Health. 2022 Jun 29;10:916513. doi: 10.3389/fpubh.2022.916513. eCollection 2022.
Distant metastasis to sites other than non-regional lymph nodes and lung (i.e., M1b stage) contributes significantly to the poor survival prognosis of patients with germ cell testicular cancer (GCTC). The aim of this study was to develop machine learning (ML) models that predict the risk of patients with GCTC developing M1b stage disease, which could be used to support early intervention.
The clinical and pathological data of patients with GCTC were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Combining the patients' characteristic variables, we applied six ML algorithms to develop the predictive models: logistic regression (LR), eXtreme Gradient Boosting (XGBoost), light Gradient Boosting Machine (lightGBM), random forest (RF), multilayer perceptron (MLP), and k-nearest neighbor (kNN). Model performance was evaluated with receiver operating characteristic (ROC) curves under 10-fold cross-validation, using the area under the curve (AUC) as the measure of predictive accuracy. A total of 54 patients from our own center (October 2006 to June 2021) served as the external validation cohort.
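The evaluation step described above (AUC averaged over 10 cross-validation folds) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the data are synthetic, the helper names (`auc`, `stratified_folds`, `cross_validated_auc`, `fit_centroid_scorer`) are hypothetical, and a toy one-feature scorer stands in for the six algorithms named in the abstract.

```python
import random
from statistics import mean

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Deal sample indices into k folds per class, so each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def cross_validated_auc(features, labels, fit, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, and average the k held-out AUCs."""
    aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        aucs.append(auc([labels[i] for i in held_out],
                        [score(features[i]) for i in held_out]))
    return mean(aucs)

def fit_centroid_scorer(xs, ys):
    """Toy stand-in model (assumption): orient the single feature so positives score higher."""
    mu1 = mean(x for x, y in zip(xs, ys) if y == 1)
    mu0 = mean(x for x, y in zip(xs, ys) if y == 0)
    direction = 1.0 if mu1 >= mu0 else -1.0
    return lambda x: direction * x

# Synthetic cohort (assumption): a rare positive class and one informative feature.
rng = random.Random(42)
labels = [1 if rng.random() < 0.1 else 0 for _ in range(1000)]
features = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

cv_auc = cross_validated_auc(features, labels, fit_centroid_scorer)
print(f"mean 10-fold AUC: {cv_auc:.3f}")
```

Stratifying the folds matters when the outcome is rare, as here: without it, a fold could contain no positive cases, and its AUC would be undefined.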
A total of 4,323 eligible patients were enrolled from the SEER database, of whom 178 (4.12%) had developed M1b stage disease. Multivariate logistic regression showed that lymph node dissection (LND), T stage, N stage, lung metastases, and distant lymph node metastases were independent predictors of the risk of developing M1b stage disease. The models based on the XGBoost and RF algorithms both showed stable and efficient prediction performance in the training and external validation groups.
S stage is not an independent predictor of the risk of developing M1b stage disease in patients with GCTC. The ML models based on the XGBoost and RF algorithms showed high predictive performance and may be used to predict this risk, which is of promising value in clinical decision-making. The models still need to be tested on a larger sample of real-world data.