Liu Xin, Shu Xingming, Zhou Yejiang, Jiang Yifan
Department of Clinical Medicine, Southwest Medical University, Luzhou, China.
Department of Gastrointestinal Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, China.
Front Oncol. 2024 Nov 27;14:1499794. doi: 10.3389/fonc.2024.1499794. eCollection 2024.
Colorectal cancer is a prevalent malignancy of the digestive system, with an increasing incidence. Lower extremity deep vein thrombosis (DVT) is a frequent postoperative complication, occurring in up to 40% of cases.
This study aims to develop and validate a machine learning (ML) model to predict the risk of postoperative lower extremity DVT in patients with colorectal cancer, supporting preventive and therapeutic measures that improve recovery and safety.
In this retrospective cohort study, we collected data on 429 colorectal cancer patients treated between January 2021 and January 2024. The medical records included age, blood test results, body mass index, underlying diseases, clinical staging, histological typing, surgical methods, and postoperative complications. We applied the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance and split the dataset into training and validation sets in a 7:3 ratio. Feature selection was performed using Random Forest (RF), XGBoost, and Least Absolute Shrinkage and Selection Operator (LASSO) algorithms. We then trained six machine learning models: Logistic Regression (LR), Naive Bayes (NB), Gaussian Process (GP), RF, XGBoost, and Multilayer Perceptron (MLP). Model performance was evaluated using the area under the Receiver Operating Characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and the confusion matrix. Additionally, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to improve the interpretability of the results.
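The oversampling and splitting steps described above can be sketched in a few lines. This is a minimal illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) followed by a 7:3 split, using only the Python standard library. It is not the authors' implementation: the function names `smote_oversample` and `split_7_3`, the toy feature values, the choice of `k`, and the random seeds are all assumptions made for illustration.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def split_7_3(rows, seed=0):
    """Shuffle the rows and split them 70% / 30%."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: (age, prealbumin) pairs, NOT taken from the study.
majority = [(55.0, 250.0), (60.0, 240.0), (48.0, 260.0), (62.0, 245.0),
            (51.0, 255.0), (58.0, 238.0), (65.0, 230.0), (47.0, 262.0)]
minority = [(72.0, 180.0), (75.0, 170.0), (70.0, 190.0)]

# Oversample the minority class until both classes have the same size.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced = [(x, 0) for x in majority] + [(x, 1) for x in minority + synthetic]

# Split the balanced data into training and validation sets.
train, valid = split_7_3(balanced)
```

In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version like this one.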
Combining Random Forest, XGBoost, and LASSO feature selection with univariate regression analysis identified the following significant predictors: age, preoperative prealbumin, preoperative albumin, preoperative hemoglobin, operation time, PIVKA-II, CEA, and preoperative neutrophil count. The XGBoost model outperformed the other ML algorithms, achieving an AUC of 0.996, an accuracy of 0.9636, a specificity of 0.9778, and an F1 score of 0.9576. SHAP analysis identified age and preoperative prealbumin as the primary determinants of the model's predictions. Finally, LIME was used to interpret individual predictions.
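For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1 score follow from the four cells of a binary confusion matrix, and how AUC can be computed as the fraction of positive/negative pairs ranked correctly. The function names, the confusion-matrix counts, and the scores are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common metrics from the cells of a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive case scores higher than a
    negative case (ties count as half), i.e. the Mann-Whitney formulation."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and predicted probabilities, NOT from the study.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
example_auc = auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])
```

The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; rank-based implementations are the usual choice for larger datasets.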
The machine learning algorithms effectively predicted postoperative lower limb deep vein thrombosis in colorectal cancer patients. The XGBoost model demonstrated strong potential for improving early detection and treatment in clinical settings.