Tanwar Vishesh, Sharma Bhisham, Yadav Dhirendra Prasad, Mehbodniya Abolfazl
Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India.
Centre of Research Impact and Outcome, Chitkara University, Rajpura, Punjab, 140401, India.
Sci Rep. 2025 Jul 24;15(1):26982. doi: 10.1038/s41598-025-12128-x.
Gastrointestinal (GI) diseases are among the leading causes of morbidity and mortality worldwide, making early and accurate diagnosis essential. Traditional methods such as endoscopy are time-consuming and depend heavily on the physician's judgment. We propose the Efficient Vision Transformer (EfficientViT), a deep learning model that combines EfficientNetB0 with the Vision Transformer (ViT) to classify eight types of GI disease. EfficientViT uses EfficientNetB0 to capture local textures and multi-scale features that reflect structural changes in the GI tract. At the same time, it draws on the ViT's ability to model the global context of GI tract images, which helps detect subtle disease patterns and early signs of disease spread. Furthermore, we designed a dual block that splits the input into two parts (q1, q2): q1 is processed by EfficientNet to extract local details, and q2 is processed by an encoder block to capture global dependencies, which enables EfficientViT to attend to multiple image regions simultaneously. Evaluated with fivefold cross-validation, the model achieved an accuracy of 99.82%, compared with 99.60% for a MobileNetV2-based model. EfficientViT also achieved high precision, recall, and F1 scores. Overall, our model outperforms existing methods, offering a promising tool to help clinicians diagnose GI diseases from endoscopic images more reliably and accurately.
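The evaluation protocol named in the abstract (fivefold cross-validation, reported as accuracy, precision, recall, and F1) can be illustrated with a minimal, dependency-free sketch. The abstract does not say how the folds were formed or how the per-class metrics were averaged, so the shuffled unstratified split and the macro averaging below are assumptions, and `train_and_predict` is a hypothetical callback standing in for training and running the classifier.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumption: metrics are averaged over the classes that occur in
    y_true or y_pred; the abstract does not state the averaging used.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(classes)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def cross_validate(labels, train_and_predict, k=5, seed=0):
    """Average the four metrics over k held-out folds.

    train_and_predict(train_idx, test_idx) is a hypothetical callback that
    fits a model on train_idx and returns predicted labels for test_idx.
    """
    folds = k_fold_indices(len(labels), k, seed)
    per_fold = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        predictions = train_and_predict(train_idx, test_idx)
        truth = [labels[j] for j in test_idx]
        per_fold.append(macro_scores(truth, predictions))
    return tuple(sum(metric) / k for metric in zip(*per_fold))
```

With real data, `labels` would hold the eight disease classes and `train_and_predict` would wrap model training. A stratified split would keep class proportions equal across folds and may be preferable when classes are imbalanced.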