Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Asian Pac J Cancer Prev. 2024 Jan 1;25(1):333-342. doi: 10.31557/APJCP.2024.25.1.333.
Colorectal cancer (CRC) ranks as the second leading cause of cancer-related deaths. This study aimed to predict survival outcomes of CRC patients using machine learning (ML) methods.
This retrospective analysis included 1853 CRC patients admitted to three prominent tertiary hospitals in Iran from October 2006 to July 2019. Six ML models, namely Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Neural Network (NN), Decision Tree (DT), and Light Gradient Boosting Machine (LGBM), were developed using 10-fold cross-validation. Features were selected with the Random Forest method, ranked by mean decrease in Gini impurity. Model performance was assessed using the area under the curve (AUC).
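As an illustration of the 10-fold cross-validation step, the following is a minimal sketch using only the Python standard library. The abstract does not state whether the folds were stratified by outcome or how they were generated; stratification, the function names, and the 600/1253 outcome split used in the usage note are all assumptions made for illustration, not details from the study.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Assumption: stratification by outcome label. The source only says
    that 10-fold cross-validation was used.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # carry the position over so fold sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For example, with 1853 hypothetical outcome labels (say 600 events and 1253 non-events, numbers chosen only for illustration), `stratified_k_folds(labels, k=10)` returns ten folds of 185 or 186 patients each, and `train_test_splits` yields the ten train/test partitions on which a model would be fitted and scored.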
Time from diagnosis, age, tumor size, metastatic status, lymph node involvement, and treatment type emerged as the most important predictors of survival based on mean decrease in Gini impurity. The NB (AUC = 0.70, 95% confidence interval [CI] 0.65-0.75) and LGBM (AUC = 0.70, 95% CI 0.65-0.75) models achieved the highest AUC values for predicting CRC patient survival.
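To make the reported metric concrete, the sketch below computes an AUC from predicted scores and attaches a 95% interval. The abstract does not say how its confidence intervals were obtained; the percentile bootstrap used here is an assumption, and the function names are hypothetical. The AUC itself is computed with the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve for binary labels coded 0/1.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties in the scores receive average ranks, so tied pairs count as 0.5.
    """
    n = len(y_true)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    pairs = sorted(zip(scores, y_true))
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for t in range(i, j) if pairs[t][1] == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    auc(y_true, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper
```

Calling `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, since three of the four positive-negative pairs are ordered correctly. An interval such as the reported 0.65-0.75 would correspond to the `(lower, upper)` pair returned by `bootstrap_auc_ci` on held-out predictions, if a bootstrap of this kind had been used.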
This study highlights the importance of time from diagnosis, age, tumor size, metastatic status, lymph node involvement, and treatment type in predicting CRC survival. The NB model performed best in mortality prediction, with balanced sensitivity and specificity. Policy recommendations include early diagnosis and prompt treatment initiation for CRC patients, improved data collection through digital health records and standardized protocols, support for integrating predictive analytics into clinical decision-making, and inclusion of the identified prognostic variables in treatment guidelines to improve patient outcomes.