Impact of the Volume and Distribution of Training Datasets in the Development of Deep-Learning Models for the Diagnosis of Colorectal Polyps in Endoscopy Images.

Author information

Gong Eun Jeong, Bang Chang Seok, Lee Jae Jun, Yang Young Joo, Baik Gwang Ho

Affiliations

Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Korea.

Institute of New Frontier Research, Hallym University College of Medicine, Chuncheon 24253, Korea.

Publication information

J Pers Med. 2022 Aug 24;12(9):1361. doi: 10.3390/jpm12091361.

Abstract

BACKGROUND

No standardized dataset exists for developing artificial intelligence models in gastrointestinal endoscopy, and the optimal volume and class distribution of training datasets have not been evaluated. The authors previously created an artificial intelligence model that classifies endoscopic images of colorectal polyps into four categories: advanced colorectal cancer, early cancers/high-grade dysplasia, tubular adenoma, and nonneoplasm. This study aimed to evaluate the impact of the volume and class distribution of training datasets on the development of deep-learning models that predict colorectal polyp histopathology from endoscopic images.

METHODS

The same 3828 endoscopic images used to create the earlier models were used. An additional 6838 images were used to search for the optimal volume and class distribution for a deep-learning model. Deep-learning models were established with various data volumes and class distributions. All models were trained with the no-code platform Neuro-T. The primary outcome was accuracy on four-class prediction.
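The idea of training on a fixed class ratio can be illustrated with a minimal Python sketch. This is not the authors' procedure (they used the no-code platform Neuro-T), and the function name `subsample_to_ratio`, the class labels, and the per-class counts below are all hypothetical. The sketch assumes one label per image and draws a random subset whose class counts follow a given integer ratio, limited by the scarcest class.

```python
import random
from collections import Counter, defaultdict


def subsample_to_ratio(labels, ratio, seed=0):
    """Return sorted image indices whose class counts follow `ratio`.

    labels: list of class names, one per image (hypothetical labels).
    ratio:  dict mapping class name -> positive integer weight.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Largest unit size such that every class can supply weight * unit images.
    unit = min(len(by_class[c]) // w for c, w in ratio.items())

    rng = random.Random(seed)
    chosen = []
    for c, w in ratio.items():
        # Sample without replacement inside each class.
        chosen.extend(rng.sample(by_class[c], w * unit))
    return sorted(chosen)


# Hypothetical class counts, chosen only to exercise the function.
labels = (
    ["advanced_cancer"] * 50
    + ["early_cancer_hgd"] * 60
    + ["tubular_adenoma"] * 200
    + ["nonneoplasm"] * 120
)
ratio = {
    "advanced_cancer": 2,
    "early_cancer_hgd": 2,
    "tubular_adenoma": 1,
    "nonneoplasm": 1,
}

picked = subsample_to_ratio(labels, ratio)
counts = Counter(labels[i] for i in picked)
print(dict(counts))
```

With these made-up counts, the scarcest class relative to its weight (50 advanced-cancer images at weight 2) fixes the unit at 25, so the subset holds 50, 50, 25, and 25 images per class.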

RESULTS

Across the original, doubled, and tripled datasets, the highest internal-test classification accuracy was consistently obtained by doubling the proportion of data for the categories with fewer images (2:2:1:1 for advanced colorectal cancer : early cancers/high-grade dysplasia : tubular adenoma : nonneoplasm). Applying this doubled proportion to the original dataset gave the highest accuracy (86.4%, 95% confidence interval: 85.0-97.8%), exceeding that of the doubled and tripled datasets. Reaching this performance required only 2418 images in total. Gradient-weighted class activation mapping confirmed that the regions the deep-learning model attends to coincide with the regions the endoscopist attends to.
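As a quick arithmetic check, 2418 images divide evenly under a 2:2:1:1 ratio. The abstract does not report per-class counts, so the sketch below rests on two assumptions: that the ratio holds exactly and that the 2418 images span all four classes. The helper name `split_by_ratio` is hypothetical.

```python
def split_by_ratio(total, weights):
    """Split `total` into parts proportional to integer `weights`.

    Raises ValueError when the total is not an exact multiple of the
    weight sum, since the ratio could not then hold exactly.
    """
    unit, remainder = divmod(total, sum(weights))
    if remainder:
        raise ValueError("total is not divisible by the sum of the weights")
    return [w * unit for w in weights]


# Order: advanced colorectal cancer, early cancers/high-grade dysplasia,
# tubular adenoma, nonneoplasm.
per_class = split_by_ratio(2418, [2, 2, 1, 1])
print(per_class)  # [806, 806, 403, 403]
```

Under those assumptions, each of the two doubled categories would contribute 806 images and each of the other two 403.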

CONCLUSION

Because the colonoscopy classification model shows a data-volume-dependent performance plateau, doubling or tripling the dataset does not always benefit training. Deep-learning models would be more accurate if the proportion of lesions in the categories with fewer images were increased.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48aa/9505038/81d4a451e4f1/jpm-12-01361-g001.jpg
