Oukdach Yassine, Garbaz Anass, Kerkaou Zakaria, El Ansari Mohamed, Koutti Lahcen, Papachrysos Nikolaos, El Ouafdi Ahmed Fouad, de Lange Thomas, Distante Cosimo
Ibn Zohr University, LabSIV, Department of Computer Science, Faculty of Sciences, Agadir, Morocco.
Moulay Ismail University, Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Meknes, Morocco.
J Med Imaging (Bellingham). 2025 Jan;12(1):014505. doi: 10.1117/1.JMI.12.1.014505. Epub 2025 Feb 5.
Wireless capsule endoscopy (WCE) is a non-invasive technology used for diagnosing gastrointestinal abnormalities. A single examination generates a large number of images, making manual review both time-consuming and costly for doctors. Therefore, the development of computer vision-assisted systems is highly desirable to aid in the diagnostic process.
We present a deep learning approach that uses knowledge distillation (KD) from a convolutional neural network (CNN) teacher model to a vision transformer (ViT) student model for gastrointestinal abnormality recognition. The CNN teacher model uses attention mechanisms and depth-wise separable convolutions to extract features from WCE images, and it supervises the ViT in learning these representations.
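To make the teacher-student supervision concrete, the following is a minimal sketch of a common response-based distillation loss (hard-label cross-entropy plus a temperature-softened KL term). The abstract does not state which KD objective the authors used, so this form, the function names `softmax` and `distillation_loss`, and the values of `temperature` and `alpha` are all illustrative assumptions rather than the paper's method.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Per-sample KD loss; temperature and alpha are hypothetical defaults."""
    # Hard-label term: cross-entropy of the student against the true class.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Example: two classes (e.g. normal vs. abnormal), made-up logits.
teacher = [2.0, -1.0]  # stands in for the CNN teacher's output
student = [0.5, 0.1]   # stands in for the ViT student's output
loss = distillation_loss(student, teacher, label=0)
```

In this sketch the loss shrinks as the student's softened distribution approaches the teacher's, which is the sense in which the CNN "supervises" the ViT; a real training loop would compute the same quantity on batched tensors in a deep learning framework.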
The proposed method achieves accuracies of 97% and 96% on the Kvasir and KID datasets, respectively, demonstrating its effectiveness in distinguishing normal from abnormal regions and bleeding from non-bleeding cases. The approach is computationally efficient, generalizes to unseen datasets, and outperforms several state-of-the-art methods.
We propose a deep learning approach that combines CNNs and a ViT through KD to classify gastrointestinal diseases in WCE images. It demonstrates promising performance on public datasets, distinguishing normal from abnormal regions and bleeding from non-bleeding cases while offering favorable computational efficiency compared with existing methods, making it suitable for gastrointestinal disease applications.