Department of President's Office, Youjiang Medical University for Nationalities, Baise, China.
Department of ECG Diagnostics, The People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, China.
Sci Rep. 2024 Nov 6;14(1):26969. doi: 10.1038/s41598-024-77988-1.
This study aimed to construct and assess a machine-learning model to predict survival and stratify risk in patients with gastric neuroendocrine neoplasms (gNENs) after diagnosis. Data on patients with gNENs were extracted from the Surveillance, Epidemiology, and End Results database and randomly divided into training and validation sets. We developed a prediction model using 10 machine learning algorithms across 101 combinations to forecast cancer-related mortality in patients with gNENs, selecting the best model by the highest mean time-dependent area under the receiver operating characteristic (ROC) curve (AUC) across a sequence of time points. The performance of the final model was assessed using time-dependent ROC curves for discrimination and calibration curves for calibration. The maximally selected rank statistics method was used to determine the optimal prognostic risk-score threshold for classifying patients into high- and low-risk groups. Kaplan-Meier analysis and the log-rank test were then used to compare survival between these groups. Our study examined 775 patients with gNENs, dividing them into training and validation sets. The training set comprised 543 patients, with a median follow-up of 42 months and cumulative mortality rates of 40.0% at 1 year, 48.6% at 3 years, and 54.0% at 5 years. The validation set comprised 232 patients, with cumulative mortality rates of 29.1% at 1 year, 43.5% at 3 years, and 53.2% at 5 years. The optimal random survival forest (RSF) model (mtry = 4, node size = 5) achieved an AUC of 0.839 for survival prediction in the training set. Comprising 11 variables covering demographics, treatment details, tumor characteristics, T staging, N staging, and M staging, the RSF model showed high predictive accuracy, with AUCs of 0.92, 0.96, and 0.96 for 1-, 3-, and 5-year survival, respectively; this was consistently reflected in the validation set, with AUCs of 0.88, 0.92, and 0.89, respectively.
Moreover, patients were risk-stratified. Although our RSF model effectively stratified patients into different prognostic groups, it needs external validation to confirm its utility for noninvasive prognostic prediction and risk stratification in gNENs. Further research is required to verify its broader clinical applicability.
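The threshold-selection and group-comparison steps described above can be illustrated with a small, dependency-free sketch. This is not the authors' code: the abstract does not state which software was used, and the function names (`logrank_chi2`, `best_cutpoint`), the `min_frac` group-size constraint, and the synthetic cohort are all assumptions made for illustration. The sketch scans candidate risk-score cutpoints and keeps the one that maximizes the two-group log-rank chi-square statistic, which is the core idea behind maximally selected rank statistics. It omits the multiplicity adjustment that a proper p-value for the selected cutpoint would require.

```python
import random


def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    # Visit each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = sum(1 for ti in times if ti >= t)                          # at risk, all
        n1 = sum(1 for ti, g in zip(times, in_high) if ti >= t and g)  # at risk, high group
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)  # events, all
        d1 = sum(1 for ti, ei, g in zip(times, events, in_high)
                 if ti == t and ei and g)                              # events, high group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(scores, times, events, min_frac=0.1):
    """Return (threshold, chi2) for the cutpoint with the largest log-rank statistic.

    min_frac (an assumed constraint) keeps each group above a minimum size.
    """
    n = len(scores)
    best = (None, -1.0)
    for c in sorted(set(scores))[:-1]:
        high = [s > c for s in scores]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_frac * n:
            continue
        chi2 = logrank_chi2(times, events, high)
        if chi2 > best[1]:
            best = (c, chi2)
    return best


# Synthetic cohort (purely hypothetical, not SEER data): risk scores in [0, 1],
# with a higher hazard for scores above 0.6 and censoring at 60 months.
random.seed(0)
scores = [random.random() for _ in range(120)]
true_times = [random.expovariate(0.08 if s > 0.6 else 0.02) for s in scores]
events = [t <= 60.0 for t in true_times]
times = [min(t, 60.0) for t in true_times]

threshold, chi2 = best_cutpoint(scores, times, events)
print(f"selected threshold: {threshold:.3f}, log-rank chi-square: {chi2:.2f}")
```

Patients with a score above the selected threshold would form the high-risk group. In practice, the risk scores would come from the fitted RSF model rather than from a random generator, and an established survival-analysis package would normally be preferred over a hand-rolled statistic.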