Seo Ji Won, Park Ki Bum, Lim Seung Taek, Jun Kyong Hwa, Chin Hyung Min
Department of Surgery, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Am J Cancer Res. 2024 Aug 25;14(8):3842-3851. doi: 10.62347/KREL8138. eCollection 2024.
The prognosis of early gastric cancer (EGC) patients is associated with lymph node metastasis (LNM). Considering the relatively high rate of LNM in T1b EGC patients, it is crucial to determine the factors associated with LNM. In this study, we constructed and validated predictive models based on machine learning (ML) algorithms for LNM in patients with T1b EGC. Data from patients with T1b gastric cancer were extracted from the Korean Gastric Cancer Association database. Four ML algorithms, namely logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM), were applied for model construction using five-fold cross-validation. The performance of these models was assessed in terms of discrimination, calibration, and clinical applicability. Moreover, external validation of the XGBoost models was performed using the T1b gastric cancer database of The Catholic University Medical Center. In total, 3,468 T1b EGC patients were included in the analysis, of whom 550 (15.9%) had LNM. Eleven variables were selected to construct the models. The LR, RF, XGBoost, and SVM models yielded area under the receiver operating characteristic curve values of 0.8284, 0.7921, 0.8776, and 0.8323, respectively. Among the models, the XGBoost model exhibited the best predictive performance in terms of discrimination, calibration, and clinical applicability. ML models are reliable for predicting LNM in T1b EGC patients. The XGBoost model exhibited the best predictive performance and can be used by surgeons to identify EGC patients at high risk of LNM, thereby facilitating treatment selection.
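The evaluation terms used in the abstract (five-fold cross-validation, discrimination, calibration) can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function names `roc_auc`, `brier_score`, and `stratified_folds` are hypothetical, the toy labels and scores are invented for illustration, and the Brier score is an assumed stand-in for calibration, since the abstract does not state which calibration measure was used.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Discrimination: probability that a random LNM-positive case
    receives a higher score than a random LNM-negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Assumed calibration summary: mean squared gap between the
    predicted probability and the observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def stratified_folds(y_true: Sequence[int], k: int = 5) -> List[List[int]]:
    """Split indices into k folds, dealing each class out round-robin so
    every fold keeps roughly the overall positive rate."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y_true)):
        members = [i for i, y in enumerate(y_true) if y == label]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


# Toy data, invented for illustration only (1 = LNM present).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
brier = brier_score(labels, scores)
folds = stratified_folds([0, 0, 0, 0, 1, 1], k=2)
```

In a cross-validated comparison, each fold would serve once as the held-out set, and the per-fold AUC values would be averaged for each model. Stratifying matters for a cohort like this one, where only about one patient in six has LNM, because an unstratified split could leave a fold with very few positive cases.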