Zhou Huali, Gu Qiong, Bao Rong, Qiu Liping, Zhang Yuhan, Wang Fang, Liu Wenlian, Wu Lingling, Li Li, Ren Yihua, Qiu Lei, Wang Qian, Zhang Gaomin, Qiao Xiaoqing, Yuan Wenjie, Ren Juan, Luo Min, Huang Rong, Yang Qing
School of Nursing, Chengdu Medical College, Chengdu, China.
Department of Gastric Surgery, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China.
Front Oncol. 2025 Jan 13;14:1503047. doi: 10.3389/fonc.2024.1503047. eCollection 2024.
Presentation delay prevents cancer patients from receiving timely diagnosis and treatment, leading to poor prognosis. Predicting the risk of presentation delay is therefore crucial for improving treatment outcomes. This study aimed to develop and validate machine learning models that predict the risk of presentation delay in patients with gastric cancer.
A total of 875 patients with gastric cancer admitted to a tertiary oncology hospital from July 2023 to June 2024 formed the derivation cohort, and 200 patients with gastric cancer admitted to four other tertiary hospitals formed the external validation cohort. After data collection, statistical analysis was performed to identify variables that discriminate between patients with and without presentation delay, and 13 statistically significant variables were selected to develop the machine learning models. The derivation cohort was randomly split into a training set and an internal validation set at a ratio of 7:3. Prediction models were developed with six machine learning algorithms: logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosted decision trees (GBDT), extreme gradient boosting (XGBoost), and multi-layer perceptron (MLP). The discrimination and calibration of each model were assessed using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, area under the curve (AUC), calibration curves, and Brier scores. The best model was selected by comparing these metrics. For the selected model, the contribution of each feature to the prediction was analyzed with the permutation feature importance method.
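The permutation feature importance method mentioned above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the toy data, the `toy_model` function, and the choice of accuracy as the score are all assumptions made for illustration. The idea is to shuffle one feature column at a time and record how much the model's score drops.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's 0/1 prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 fully determines the label, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 1), data_rng.randint(0, 1)] for _ in range(200)]
y = [row[0] for row in X]


def toy_model(row):
    """Stand-in for a fitted classifier; it looks only at feature 0."""
    return row[0]


importances = permutation_importance(toy_model, X, y)
print(importances)
```

A feature the model relies on shows a large drop when shuffled, while a feature the model ignores shows a drop of zero. In practice the score would be computed on held-out data with the fitted model rather than with a hand-written rule.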
The incidence of presentation delay among gastric cancer patients was 39.3%. On the internal validation set, the developed models achieved an AUC of 0.893-0.925, accuracy of 0.817-0.847, sensitivity of 0.857-0.905, specificity of 0.783-0.854, PPV of 0.728-0.798, NPV of 0.897-0.927, F1 score of 0.791-0.826, and Brier score of 0.107-0.138, indicating good discrimination and calibration for predicting presentation delay in gastric cancer patients. Among all models, the RF-based model was selected as the best because it achieved good discrimination and calibration on both the internal and external validation sets. Feature ranking indicated that both subjective and objective factors have a significant impact on the occurrence of presentation delay in gastric cancer patients.
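For readers less familiar with the metrics reported above, the following standard-library Python sketch shows how each one is conventionally computed from true labels and predicted probabilities. The function names, the 0.5 decision threshold, and the eight example values are assumptions for illustration only; they are not taken from the study.

```python
def ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus the Brier score for a binary classifier."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "brier": brier,
    }


def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return ratio(wins, len(pos) * len(neg))


# Hypothetical labels (1 = presentation delay) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.3, 0.7]

metrics = binary_metrics(y_true, y_prob)
metrics["auc"] = auc(y_true, y_prob)
print(metrics)
```

Accuracy, sensitivity, specificity, PPV, NPV, and F1 depend on the chosen threshold, whereas AUC and the Brier score are computed directly from the predicted probabilities. A lower Brier score indicates probabilities that sit closer to the observed outcomes.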
This study demonstrated that the RF-based model performs favorably in predicting presentation delay in gastric cancer patients. It can help medical staff identify gastric cancer patients at high risk of presentation delay and take appropriate, targeted interventions to reduce that risk.