Application of machine-learning model to optimize colonic adenoma detection in India.

Affiliations

Department of Medical Gastroenterology, Asian Institute of Gastroenterology, Hyderabad, 500 082, India.

Department of Pathology, Asian Institute of Gastroenterology, Hyderabad, 500 082, India.

Publication information

Indian J Gastroenterol. 2024 Oct;43(5):995-1001. doi: 10.1007/s12664-024-01530-4. Epub 2024 May 17.

Abstract

AIMS

Data on the prevalence and risk factors of colonic adenoma from the Indian subcontinent are limited. We aimed to develop a machine-learning model to optimize colonic adenoma detection in a prospective cohort.

METHODS

All consecutive adult patients undergoing diagnostic colonoscopy between October 2020 and November 2022 were enrolled. Patients at high risk of colonic adenoma were excluded. The predictive model was developed using the gradient-boosting machine (GBM) learning method. The GBM model was further optimized by tuning the learning rate and the number of trees, with 10-fold cross-validation.
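The tuning step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the study cohort, and the grid values for the learning rate and number of trees are hypothetical, since the abstract does not report them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (NOT the study cohort): roughly 11% positives,
# loosely mirroring the reported overall adenoma prevalence.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.89, 0.11], random_state=0,
)

# Hypothetical search grid; the abstract does not state which values were tried.
param_grid = {"learning_rate": [0.05, 0.1], "n_estimators": [50, 100]}

# 10-fold cross-validation, stratified so each fold keeps the class ratio.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid, scoring="roc_auc", cv=cv,
)
search.fit(X, y)

# Mean and standard deviation of the AUC across the 10 folds
# for the best-scoring parameter combination.
best = search.best_index_
mean_auc = search.cv_results_["mean_test_score"][best]
sd_auc = search.cv_results_["std_test_score"][best]
print(search.best_params_, f"AUC {mean_auc:.3f} (SD {sd_auc:.3f})")
```

Reporting the fold-wise mean together with its standard deviation matches the "AUC (SD)" form used in the Results, although whether the authors computed their SD this way is an assumption.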

RESULTS

A total of 10,320 patients (mean age 45.18 ± 14.82 years; 69% men) were included in the study. In the overall population, 1152 (11.2%) patients had at least one adenoma. Among patients older than 50 years, the hospital-based adenoma prevalence was 19.5% (808/4144). The area under the receiver operating characteristic curve (AUC), reported with its standard deviation (SD), was 72.55% (4.91%) for the logistic regression model, while the AUCs for the deep learning, decision tree, random forest and gradient-boosted tree models were 76.25% (4.22%), 65.95% (4.01%), 79.38% (4.91%) and 84.76% (2.86%), respectively. After model optimization and cross-validation, the AUC of the gradient-boosted tree model increased to 92.2% (1.1%).
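As a quick check of the figures above, the sketch below recomputes the two prevalence percentages from the reported counts and shows what an AUC measures on a toy example. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Prevalence figures recomputed from the counts reported in the abstract.
overall_pct = round(100 * 1152 / 10320, 1)  # all enrolled patients
over50_pct = round(100 * 808 / 4144, 1)     # patients older than 50 years
print(overall_pct, over50_pct)

# Toy example (hypothetical values): 1 = adenoma present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 positive/negative pairs are ranked correctly: 0.75
```

An AUC of 50% corresponds to chance-level ranking and 100% to perfect ranking, which is the scale on which the reported values from 65.95% to 92.2% should be read.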

CONCLUSIONS

Machine-learning models may predict colorectal adenoma more accurately than logistic regression. A machine-learning model may help optimize the use of colonoscopy to prevent colorectal cancers.

TRIAL REGISTRATION

ClinicalTrials.gov (ID: NCT04512729).
