Zhou Yongjie, Zhao Jinhong, Zou Fei, Tan Yongming, Zeng Wei, Jiang Jiahui, Hu Jiale, Zeng Qiao, Gong Lianggeng, Liu Lan, Zhong Linhua
Department of Radiology, Jiangxi Cancer Hospital & Institute, Jiangxi Clinical Research Center for Cancer, The Second Affiliated Hospital of Nanchang Medical College, Nanchang, China.
Department of Radiology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China.
Comput Methods Programs Biomed. 2025 Sep;269:108874. doi: 10.1016/j.cmpb.2025.108874. Epub 2025 May 22.
Colorectal cancer (CRC) ranks among the most prevalent cancers worldwide, with early postoperative recurrence remaining a major cause of mortality. Body composition and inflammatory-nutritional indices (BCINI) have demonstrated potential in reflecting patients' physiological states; however, their association with early recurrence (ER) after CRC resection remains unclear. This study aimed to establish and validate interpretable machine learning (ML) models based on BCINI to predict ER after CRC resection.
Data, including CT-based body composition metrics and blood test variables, were collected from three hospitals. After variable selection, six ML algorithms (XGBoost, Complement Naive Bayes (CNB), support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), and Gaussian Naive Bayes (GNB)) were used to construct ER prediction models. The optimal model was selected based on receiver operating characteristic (ROC) curve analysis. The selected model was externally validated on independent datasets to assess its generalizability, and its calibration and clinical utility were evaluated with calibration curves and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) were also used to visualize the prediction process and support clinical interpretability.
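Model selection by ROC AUC and the net-benefit quantity plotted in decision curve analysis can both be computed in a few lines. The sketch below is a minimal, dependency-free illustration, not the authors' code: the abstract does not say which software was used, and the candidate names `model_a` and `model_b` and every number are hypothetical toy values. AUC is computed in its Mann-Whitney form (the probability that a randomly chosen recurrence case is scored higher than a randomly chosen non-recurrence case).

```python
from typing import Dict, List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores higher than random negative).

    Positive/negative ties count as 0.5 (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of intervening when predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def select_best(y_true: Sequence[int], candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate model with the highest validation AUC."""
    return max(candidates, key=lambda name: roc_auc(y_true, candidates[name]))


if __name__ == "__main__":
    # Hypothetical validation labels (1 = early recurrence) and predicted risks.
    y_val = [0, 0, 1, 1, 0, 1, 0, 0]
    scores = {
        "model_a": [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.15],
        "model_b": [0.50, 0.20, 0.30, 0.60, 0.40, 0.45, 0.55, 0.10],
    }
    for name, s in scores.items():
        print(name, "AUC", round(roc_auc(y_val, s), 3))
    best = select_best(y_val, scores)
    print("selected:", best)
    # Points on the decision curve for the selected model.
    for t in (0.1, 0.2, 0.3):
        print("threshold", t, "net benefit", round(net_benefit(y_val, scores[best], t), 3))
```

In practice, `sklearn.metrics.roc_auc_score` returns the same AUC. A full decision curve also plots the "treat all" and "treat none" (net benefit 0) reference strategies across a range of thresholds.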
In model selection, the XGBoost algorithm outperformed the other methods, showing the best accuracy and stability, with area under the ROC curve (AUC) values of 0.837 and 0.777 in the internal training and validation sets, respectively. In calibration analysis it achieved the lowest Brier score (0.131) of the six ML algorithms. External validation further confirmed its generalizability, with AUC values of 0.783 and 0.773 in the two independent datasets. Predictive performance was consistent across age subgroups (<60 years: AUC 0.762-0.834; ≥60 years: AUC 0.777-0.800) and tumor location subgroups (colon: AUC 0.785-0.845; rectum: AUC 0.751-0.799).
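The Brier score is the mean squared difference between each predicted probability and the observed 0/1 outcome, so lower values indicate better-calibrated, sharper predictions. A subgroup AUC is the ordinary AUC recomputed within one stratum, such as an age band or tumor site. The following minimal sketch illustrates both. It is not the study's code: the labels, probabilities, and the group names `"<60"` and `">=60"` are hypothetical toy values.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mann-Whitney AUC; positive/negative ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the subgroup")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc(
    y_true: Sequence[int], y_score: Sequence[float], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Compute the AUC separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))  # group -> (labels, scores)
    for y, s, g in zip(y_true, y_score, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}


if __name__ == "__main__":
    # Hypothetical outcomes, predicted risks, and age-group labels.
    y = [0, 1, 0, 1, 1, 0, 0, 1]
    p = [0.2, 0.7, 0.4, 0.6, 0.3, 0.1, 0.5, 0.9]
    age_group = ["<60", "<60", "<60", "<60", ">=60", ">=60", ">=60", ">=60"]
    print("Brier score:", round(brier_score(y, p), 5))
    print("AUC by age group:", subgroup_auc(y, p, age_group))
```

As a reference point, a model that always predicts the event rate π has a Brier score of π(1 - π), which is 0.25 when half the patients recur. `sklearn.metrics.brier_score_loss` computes the same quantity. Small subgroups can yield unstable AUCs, which is one reason such results are usually reported as ranges or with confidence intervals.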
The interpretable BCINI-based ML model shows promise for predicting ER after CRC resection. This approach may inform clinical decision-making, enabling early detection and intervention to improve patient outcomes.