Habe Tsedeke Temesgen, Haataja Keijo, Toivanen Pekka
School of Computing, University of Eastern Finland, Kuopio, North Savo, Finland.
Front Artif Intell. 2025 Apr 30;8:1529814. doi: 10.3389/frai.2025.1529814. eCollection 2025.
Wireless Capsule Endoscopy (WCE) enables non-invasive imaging of the gastrointestinal tract but generates vast amounts of video data, which makes real-time, accurate abnormality detection challenging. Traditional detection methods struggle with uncontrolled illumination, complex textures, and high-speed processing demands.
This study presents a novel approach using Real-Time Detection Transformer (RT-DETR), a transformer-based object detection model, specifically optimized for WCE video analysis. The model captures contextual information between frames and handles variable image conditions. It was evaluated using the Kvasir-Capsule dataset, with performance assessed across three RT-DETR variants: Small (S), Medium (M), and X-Large (X).
RT-DETR-X achieved the highest detection precision. RT-DETR-M offered a practical trade-off between accuracy and speed, while RT-DETR-S processed 270 frames per second (FPS), enabling real-time performance. All three models demonstrated improved detection accuracy and computational efficiency compared to baseline methods.
The RT-DETR framework significantly enhances precision and real-time performance in gastrointestinal abnormality detection using WCE. Its clinical potential lies in supporting faster and more accurate diagnosis. Future work will focus on further optimization and deployment in endoscopic video analysis systems.
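To put the 270 FPS figure reported for RT-DETR-S in concrete terms, the following minimal Python sketch converts a throughput into an average per-frame time budget and shows one generic way to time a detector over a sequence of frames. It is an illustration only and not the authors' benchmarking procedure: `frame_budget_ms`, `measure_fps`, and the `detect` callable are hypothetical names introduced here, and `detect` stands in for whatever model inference call is actually used.

```python
import time
from typing import Any, Callable, Iterable


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(detect: Callable[[Any], Any], frames: Iterable[Any]) -> float:
    """Run `detect` once per frame and return the frames processed per second.

    `detect` is a hypothetical stand-in for a model's inference call.
    """
    count = 0
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("need at least one frame and a measurable elapsed time")
    return count / elapsed


if __name__ == "__main__":
    # At 270 FPS, each frame has roughly 3.7 ms of processing time on average.
    print(f"{frame_budget_ms(270):.2f} ms per frame")
```

Wall-clock timing of this kind is only a rough guide. When inference runs on a GPU, work is typically queued asynchronously, so a careful measurement would also need warm-up iterations and device synchronization before reading the clock.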