Enhanced gastrointestinal disease classification using a ConvViT hybrid model on endoscopic images.

Author information

Utku Anıl

Affiliation

Computer Engineering, Faculty of Engineering, Munzur University, Tunceli, Turkey.

Publication information

Phys Eng Sci Med. 2025 Jul 21. doi: 10.1007/s13246-025-01600-7.

Abstract

Endoscopy is a procedure that uses an endoscope to examine the gastrointestinal system, including the stomach, esophagus, large intestine, and duodenum. Processing endoscopic images is important for the early detection and treatment of gastrointestinal diseases. In this study, a hybrid ConvViT model was developed by combining a convolutional neural network (CNN) and a vision transformer (ViT) to increase the classification accuracy of pathologies in gastrointestinal endoscopic images. CNNs are well suited to capturing local spatial features through hierarchical convolutions, making them highly effective at detecting fine-grained textures and edge patterns. These capabilities complement the ViT's global attention mechanism, which excels at modeling long-range dependencies in images. The motivation of this study is to increase classification accuracy and reliability with the ConvViT model, which combines the practical strengths of CNN and ViT models that are individually successful in different aspects of image processing. The ConvViT model was compared with VGG-16, ResNet-50, Inception-V3, and ViT. The compared models were tested on a gastrointestinal endoscopic image dataset containing ulcers, polyps, inflammation, bleeding, and normal anatomical features. Experiments showed that ConvViT outperformed the compared models, achieving 95.87% classification accuracy.
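
The abstract does not give architectural details, so the following is only a toy illustration of the two ingredients it names, not the ConvViT model from the paper. It assumes a single hand-written edge kernel, non-overlapping 2x2 patches, and single-head attention with identity query/key/value projections; the function names (conv2d, to_tokens, self_attention) and the 6x6 test image are hypothetical. A convolution first extracts a local edge response, and self-attention then lets every patch token weigh every other token, however far apart they are.

```python
import math


def conv2d(image, kernel):
    """Valid 2D cross-correlation: each output value sees only a local window."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def to_tokens(feature_map, patch):
    """Split a feature map into non-overlapping patch x patch tokens (flattened)."""
    h, w = len(feature_map), len(feature_map[0])
    tokens = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            tokens.append([feature_map[i + a][j + b]
                           for a in range(patch) for b in range(patch)])
    return tokens


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Simplifying assumption: queries, keys, and values are the tokens
    themselves (no learned projections, one head).
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[t] * tokens[t][c] for t in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights


# Hypothetical 6x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # responds to vertical edges

features = conv2d(image, edge_kernel)          # local stage: 4x4 feature map
tokens = to_tokens(features, patch=2)          # four tokens of dimension 4
attended, weights = self_attention(tokens)     # global stage: all-pairs mixing
```

In this toy input, the first and third tokens come from different rows of the feature map but carry the same edge pattern, so the first token assigns more attention weight to the third than to its horizontal neighbor. That is a small-scale picture of the long-range dependency modeling the abstract attributes to the ViT component.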
