Development and validation of machine learning models for predicting venous thromboembolism in colorectal cancer patients: A cohort study in China.

Authors

Hu Zuhai, Li Xiaosheng, Yuan Yuliang, Xu Qianjie, Zhang Wei, Lei Haike

Affiliations

Chongqing Cancer Multiomics Big Data Application Engineering Research Center, Chongqing University Cancer Hospital, Chongqing 400030, China.

Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing 400030, China.

Publication information

Int J Med Inform. 2025 Mar;195:105770. doi: 10.1016/j.ijmedinf.2024.105770. Epub 2024 Dec 19.

Abstract

BACKGROUND

With advancements in healthcare, traditional venous thromboembolism (VTE) risk assessment tools are increasingly insufficient to meet the demands of high-quality care, underscoring the need for innovative and specialized assessment methods.

OBJECTIVE

Given the success of machine learning in supervised learning and disease prediction, we aimed to develop a reliable and efficient model for assessing VTE risk, using the baseline data and clinical characteristics of colorectal cancer patients at our institution.

METHODS

We used six common machine learning algorithms to predict the occurrence of VTE in patients with rectal cancer. During modeling, LASSO regression was employed to identify and exclude variables not associated with VTE. Hyperparameters were tuned via 5-fold cross-validation to mitigate overfitting, and 200 bootstrap samples were used to adjust the apparent performance on the training set. The final VTE assessment model was selected after a thorough evaluation of performance criteria, including the area under the ROC curve (AUC), accuracy (ACC), and F1 score.
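The bootstrap adjustment of apparent performance described above can be illustrated with a minimal sketch. It assumes the adjustment follows the common optimism-correction scheme (refit on each resample, then subtract the average gap between resample performance and original-data performance); the abstract does not state the exact procedure. The centroid scorer `fit_centroid_model`, the helper names, and the synthetic data are all hypothetical stand-ins, not the study's random forest or cohort.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroid_model(X, y):
    """Hypothetical stand-in model: score = projection onto the difference of class means."""
    n_feat = len(X[0])
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    mu1 = mean([x for x, t in zip(X, y) if t == 1])
    mu0 = mean([x for x, t in zip(X, y) if t == 0])
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))

def optimism_corrected_auc(X, y, fit, n_boot=200, seed=0):
    """Return (apparent AUC, optimism-adjusted AUC) using n_boot bootstrap resamples."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = auc([model(x) for x in X], y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:                         # AUC needs both classes present
            continue
        Xb = [X[i] for i in idx]
        m = fit(Xb, yb)                              # refit on the resample
        auc_boot = auc([m(x) for x in Xb], yb)       # performance on the resample itself
        auc_orig = auc([m(x) for x in X], y)         # same model scored on the original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimism) / n_boot

# Synthetic, imbalanced toy data (about 20% events); purely illustrative.
data_rng = random.Random(1)
y = [1 if data_rng.random() < 0.2 else 0 for _ in range(150)]
X = [[data_rng.gauss(0.8 * t, 1.0), data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for t in y]

apparent, adjusted = optimism_corrected_auc(X, y, fit_centroid_model)
print(f"apparent AUC = {apparent:.3f}, adjusted AUC = {adjusted:.3f}")
```

In a setup closer to the one described, the `fit` argument would wrap the whole pipeline (variable selection and hyperparameter tuning included), so that each resample repeats every data-driven step.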

RESULTS

The random forest (RF) model exhibited consistent and efficient performance. In internal validation, with performance adjusted for generalizability (prefix AD), the RF model achieved the highest scores across multiple metrics: AD-AUC (0.895), AD-ACC (0.871), AD-F1 (0.311), AD-MCC (Matthews correlation coefficient, 0.316), AD-Precision (0.241), and AD-Specificity (0.888). In external validation on unseen colon cancer data, the RF model also performed best in terms of ACC (0.728), F1 (0.292), MCC (0.225), Precision (0.192), and Specificity (0.740), with a suboptimal AUC of 0.745 and a Sensitivity (Recall) of 0.615. The RF model also performed strongly on datasets processed with alternative imbalance-handling techniques, not only on the original dataset.
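The external-validation figures can be cross-checked against one another with standard identities. The sketch below assumes all five reported values come from a single confusion matrix at one decision threshold, which the abstract does not state explicitly; the function names are illustrative only.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_prevalence(precision, sensitivity, specificity):
    """Solve precision = s*p / (s*p + (1 - spec)*(1 - p)) for the event rate p."""
    false_pos_rate = 1 - specificity
    return precision * false_pos_rate / (
        sensitivity * (1 - precision) + precision * false_pos_rate
    )

def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

# Values reported for the RF model in external validation.
precision, sensitivity, specificity = 0.192, 0.615, 0.740

f1 = f1_from_precision_recall(precision, sensitivity)
prevalence = implied_prevalence(precision, sensitivity, specificity)
accuracy = accuracy_from_rates(sensitivity, specificity, prevalence)

print(f"F1 = {f1:.3f}, implied VTE rate = {prevalence:.3f}, accuracy = {accuracy:.3f}")
```

Under that assumption, the recomputed F1 and accuracy land within rounding error of the reported 0.292 and 0.728, and the implied VTE rate in the external set is roughly 9%. That low event rate is why precision and F1 stay low even though accuracy and specificity look high.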

CONCLUSIONS

Our research successfully established and validated a model for assessing the risk of VTE in colorectal cancer patients.
