Gong Eun Jeong, Bang Chang Seok, Lee Jae Jun, Yang Young Joo, Baik Gwang Ho
Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Korea.
Institute of New Frontier Research, Hallym University College of Medicine, Chuncheon 24253, Korea.
J Pers Med. 2022 Aug 24;12(9):1361. doi: 10.3390/jpm12091361.
There is no standardized dataset for establishing artificial intelligence models in gastrointestinal endoscopy, and the optimal volume or class distribution of training datasets has not been evaluated. The authors previously created an artificial intelligence model that classifies endoscopic images of colorectal polyps into four categories: advanced colorectal cancer, early cancer/high-grade dysplasia, tubular adenoma, and non-neoplasm. The aim of this study was to evaluate the impact of the volume and class distribution of the training dataset on the development of deep-learning models that predict colorectal polyp histopathology from endoscopic images.
The same 3828 endoscopic images used to create the earlier models were used, and an additional 6838 images were collected to find the optimal volume and class distribution for a deep-learning model. Deep-learning models were established with various data volumes and class distributions, all trained uniformly on the no-code platform Neuro-T. The primary outcome was accuracy of the four-class prediction.
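To illustrate what varying the class distribution of a training set can look like in practice, the following is a minimal sketch of resampling a labeled image list to a chosen ratio such as 2:2:1:1. It is a hypothetical example only: the function, label names, and the `unit` parameter are assumptions for demonstration, not part of the study, whose models were trained on the no-code Neuro-T platform.

```python
# Hypothetical sketch: build a training subset with a chosen class ratio,
# e.g. 2:2:1:1 for advanced CRC : early cancer/HGD : tubular adenoma : non-neoplasm.
# Labels, paths, and the sampling scheme are illustrative assumptions.
import random
from collections import defaultdict

def subset_by_ratio(samples, ratio, unit, seed=42):
    """samples: list of (image_path, label) pairs.
    ratio: {label: weight}, e.g. {"advanced_crc": 2, ...}.
    unit: number of images corresponding to a ratio weight of 1."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    subset = []
    for label, weight in ratio.items():
        pool = by_label[label]
        n = min(len(pool), weight * unit)          # cap at available images
        subset += [(p, label) for p in rng.sample(pool, n)]
    rng.shuffle(subset)
    return subset

# Example ratio that doubles the share of the two rarer categories.
ratio = {"advanced_crc": 2, "early_cancer_hgd": 2,
         "tubular_adenoma": 1, "non_neoplasm": 1}
# train_subset = subset_by_ratio(all_samples, ratio, unit=400)  # all_samples: your labeled data
```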
For the original, doubled, and tripled datasets alike, the highest internal-test classification accuracy was obtained by doubling the proportion of data in the less-represented categories (2:2:1:1 for advanced colorectal cancer : early cancer/high-grade dysplasia : tubular adenoma : non-neoplasm). Doubling the proportion of the less-represented categories in the original dataset yielded the highest accuracy (86.4%, 95% confidence interval: 85.0-97.8%) compared with the doubled or tripled dataset, and only 2418 images were required to reach this performance. Gradient-weighted class activation mapping confirmed that the regions the deep-learning model attends to coincide with those the endoscopist attends to.
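Gradient-weighted class activation mapping (Grad-CAM) produces a heatmap of the image regions that most influence a model's prediction by weighting a convolutional layer's activations with the globally averaged gradients of the target class score. The sketch below is an illustrative PyTorch implementation, not the study's Neuro-T pipeline; the ResNet-50 backbone, the target layer, and the four-class head are assumptions for demonstration only.

```python
# Illustrative Grad-CAM sketch; model, layer, and four-class head are assumed.
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_layer, class_idx=None):
    """Return a normalized heatmap for the predicted (or given) class."""
    activations, gradients = [], []

    def fwd_hook(_, __, output):
        activations.append(output)

    def bwd_hook(_, grad_input, grad_output):
        gradients.append(grad_output[0])

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    logits = model(image)                              # image: (1, 3, H, W)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    h1.remove()
    h2.remove()

    acts, grads = activations[0], gradients[0]         # (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)     # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze().detach()

# Hypothetical usage with a generic four-class backbone.
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 4)   # four histopathology classes
model.eval()
heatmap = grad_cam(model, torch.randn(1, 3, 224, 224), target_layer=model.layer4[-1])
```

Overlaying such a heatmap on the endoscopic image allows a visual check that the model's attended regions coincide with the lesion the endoscopist inspects.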
Because the colonoscopy classification model reaches a data-volume-dependent performance plateau, doubling or tripling the dataset is not always beneficial to training. Deep-learning models would be more accurate if the proportion of lesions in the less-represented categories were increased.