Prognostic prediction models for postoperative patients with stage I to III colorectal cancer based on machine learning.

Author information

Ji Xiao-Lin, Xu Shuo, Li Xiao-Yu, Xu Jin-Huan, Han Rong-Shuang, Guo Ying-Jie, Duan Li-Ping, Tian Zi-Bin

Affiliations

Department of Gastroenterology, Beijing Key Laboratory for Helicobacter Pylori Infection and Upper Gastrointestinal Diseases, Peking University Third Hospital, Beijing 100191, China.

Beijing Aerospace Wanyuan Science Technology Co., Ltd., China Academy of Launch Vehicle Technology, Beijing 100176, China.

Publication information

World J Gastrointest Oncol. 2024 Dec 15;16(12):4597-4613. doi: 10.4251/wjgo.v16.i12.4597.

PMID:39678810

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11577370/

Abstract

BACKGROUND

Colorectal cancer (CRC) is highly heterogeneous and aggressive, with high morbidity and mortality rates. Machine learning (ML) algorithms can use patient, tumor, and treatment features to develop and validate models that predict survival. They can also screen for important variables and support different applications, which could serve as vital references for clinical decision-making and potentially improve patient outcomes in clinical settings.

AIM

To construct prognostic prediction models and screen important variables for patients with stage I to III CRC.

METHODS

More than 1000 postoperative CRC patients were grouped into three survival-time categories (cutoff values of 3 years and 5 years) and assigned to training and testing cohorts at a 7:3 ratio. For each 3-category survival-time outcome, predictions were made with 4 ML algorithms on both all-variable and important-variable-only datasets, and each model was validated with 5-fold cross-validation and bootstrap validation. Important variables were screened with multivariable regression methods. Model performance before and after variable screening was evaluated and compared using the area under the curve (AUC). SHapley Additive exPlanations (SHAP) further demonstrated the impact of the important variables on model decision-making. Nomograms were constructed for practical model application.
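The grouping, 7:3 split, and 5-fold cross-validation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the label convention (0 = under 3 years, 1 = 3 to under 5 years, 2 = 5 years or more), the handling of survival times that fall exactly on a cutoff, the use of stratified sampling, and the helper names `survival_category`, `stratified_split`, and `stratified_kfold` are all hypothetical, and censoring is ignored. The survival times in the usage lines are invented for illustration.

```python
import random
from collections import defaultdict


def survival_category(years, cutoffs=(3.0, 5.0)):
    """Map a survival time in years to a 3-category label (assumed convention).

    0 = under 3 years, 1 = 3 to under 5 years, 2 = 5 years or more.
    Censoring is ignored here, which is a simplifying assumption.
    """
    low, high = cutoffs
    if years < low:
        return 0
    if years < high:
        return 1
    return 2


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into training/testing sets, stratified by label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)  # about 70% of each class goes to training
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs for stratified k-fold CV over `indices`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:  # deal samples round-robin so fold sizes differ by at most 1
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val


# Hypothetical survival times (years) for 100 patients, for illustration only.
times = [1.2, 2.8, 3.0, 4.5, 5.0, 7.1, 0.9, 6.3, 3.9, 8.0] * 10
labels = [survival_category(t) for t in times]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # → 70 30
print([len(val) for _, val in stratified_kfold(train_idx, labels)])  # → [14, 14, 14, 14, 14]
```

Stratifying by class keeps the proportions of the three survival categories similar across the training cohort, the testing cohort, and each validation fold, which matters when one category is much smaller than the others.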

RESULTS

Our ML models performed well; model performance was consistent before and after important-variable identification, indicating that variable screening was effective. The highest pre- and post-screening model AUCs (95% confidence intervals) in the testing set were 0.87 (0.81-0.92) and 0.89 (0.84-0.93) for overall survival, 0.75 (0.69-0.82) and 0.73 (0.64-0.81) for disease-free survival, 0.95 (0.88-1.00) and 0.88 (0.75-0.97) for recurrence-free survival, and 0.76 (0.47-0.95) and 0.80 (0.53-0.94) for distant metastasis-free survival. Repeated cross-validation and bootstrap validation were performed in both the training and testing datasets. The SHAP values of the important variables were consistent with the clinicopathological characteristics of the patients. Nomograms were created.
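The results above report AUCs with 95% confidence intervals, but the abstract does not say how the intervals were obtained. One common option is a percentile bootstrap over patients; the sketch below assumes that approach. It computes the binary AUC through the Mann-Whitney statistic and, for a 3-category outcome, averages one-vs-rest AUCs. The functions `auc`, `macro_ovr_auc`, and `bootstrap_ci` are hypothetical helpers, and the macro one-vs-rest averaging is an assumption about how a 3-class AUC might be summarized, not a description of the authors' procedure.

```python
import random


def auc(labels, scores):
    """Binary ROC AUC via the Mann-Whitney U statistic; tied scores count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels, prob_rows, n_classes=3):
    """Assumed 3-class summary: mean of one-vs-rest AUCs.

    prob_rows[i][c] is the predicted probability of class c for sample i.
    """
    per_class = [
        auc([1 if y == c else 0 for y in labels], [row[c] for row in prob_rows])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


def bootstrap_ci(labels, scores, metric=auc, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap (1 - alpha) CI, resampling patients."""
    point = metric(labels, scores)  # raises early if the full data are invalid
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        try:
            stats.append(metric([labels[i] for i in idx], [scores[i] for i in idx]))
        except ValueError:
            continue  # resample lacked a class; draw again
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return point, lo, hi
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive-negative pairs are ranked correctly. Passing `metric=macro_ovr_auc` with per-class probability rows as `scores` applies the same bootstrap to a 3-category outcome. Very wide intervals, such as those reported for distant metastasis-free survival, are what a bootstrap typically yields when one class contains few patients.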

CONCLUSION

We constructed a comprehensive, high-accuracy ML architecture based on important variables for predicting 3-category survival times. This architecture could serve as a vital reference for managing CRC patients.

Similar articles

A Prediction Model for Tumor Recurrence in Stage II-III Colorectal Cancer Patients: From a Machine Learning Model to Genomic Profiling.

Biomedicines. 2022 Feb 1;10(2):340. doi: 10.3390/biomedicines10020340.

Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study.

J Med Internet Res. 2023 Oct 26;25:e44417. doi: 10.2196/44417.

A machine learning-based model for predicting distant metastasis in patients with rectal cancer.

Front Oncol. 2023 Aug 15;13:1235121. doi: 10.3389/fonc.2023.1235121. eCollection 2023.

Prediction of peripheral lymph node metastasis (LNM) in thyroid cancer using delta radiomics derived from enhanced CT combined with multiple machine learning algorithms.

Eur J Med Res. 2025 Mar 13;30(1):164. doi: 10.1186/s40001-025-02438-1.

Development and validation of prognostic nomograms based on De Ritis ratio and clinicopathological features for patients with stage II/III colorectal cancer.

BMC Cancer. 2023 Jul 3;23(1):620. doi: 10.1186/s12885-023-11125-5.

Application of machine learning model in predicting the likelihood of blood transfusion after hip fracture surgery.

Aging Clin Exp Res. 2023 Nov;35(11):2643-2656. doi: 10.1007/s40520-023-02550-4. Epub 2023 Sep 21.

Machine-learning Models Predict 30-Day Mortality, Cardiovascular Complications, and Respiratory Complications After Aseptic Revision Total Joint Arthroplasty.

Clin Orthop Relat Res. 2022 Nov 1;480(11):2137-2145. doi: 10.1097/CORR.0000000000002276. Epub 2022 Jun 20.

Development and Validation of an Explainable Machine Learning Model for Predicting Myocardial Injury After Noncardiac Surgery in Two Centers in China: Retrospective Study.

JMIR Aging. 2024 Jul 26;7:e54872. doi: 10.2196/54872.

Development and validation of a machine learning-based predictive model for assessing the 90-day prognostic outcome of patients with spontaneous intracerebral hemorrhage.

J Transl Med. 2024 Mar 4;22(1):236. doi: 10.1186/s12967-024-04896-3.

Cited by

Constructing a Prognostic Model for Subtypes of Colorectal Cancer Based on Machine Learning and Immune Infiltration-Related Genes.

J Cell Mol Med. 2025 Feb;29(4):e70437. doi: 10.1111/jcmm.70437.
