Jaspers Tim J M, Boers Tim G W, Kusters Carolus H J, Jong Martijn R, Jukema Jelmer B, de Groof Albert J, Bergman Jacques J, de With Peter H N, van der Sommen Fons
Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
Med Image Anal. 2024 May;94:103157. doi: 10.1016/j.media.2024.103157. Epub 2024 Mar 29.
Computer-aided detection and diagnosis systems (CADe/CADx) in endoscopy are commonly trained on high-quality imagery, which is not representative of the heterogeneous input typically encountered in clinical practice. In endoscopy, image quality depends heavily on both the skill and experience of the endoscopist and the specifications of the system used for screening. Factors such as poor illumination, motion blur, and specific post-processing settings can significantly alter the quality and general appearance of these images. This so-called domain gap between the data used to develop a system and the data it encounters after deployment, and its impact on the performance of the deep neural networks (DNNs) underlying endoscopic CAD systems, remain largely unexplored. As many such systems, e.g. for polyp detection, are already being rolled out in clinical practice, this poses severe risks to patients, particularly in community hospitals, where both imaging equipment and endoscopist experience vary considerably. Therefore, this study aims to evaluate the impact of this domain gap on the clinical performance of CADe/CADx for various endoscopic applications. For this, we leverage two publicly available datasets (KVASIR-SEG and GIANA) and two in-house datasets. We investigate the performance of commonly used DNN architectures under synthetic, clinically calibrated image degradations and on a prospectively collected dataset of 342 endoscopic images of lower subjective quality. Additionally, we assess the influence of DNN architecture and complexity, data augmentation, and pretraining techniques on robustness. The results reveal a considerable performance decline of 11.6% (±1.5) compared to the reference, within the clinically calibrated boundaries of image degradation. Nevertheless, employing more advanced DNN architectures and self-supervised in-domain pretraining effectively mitigates this drop, reducing it to 7.7% (±2.03). These enhancements also yield the highest performance on the manually collected test set of images with lower subjective quality. By comprehensively assessing the robustness of popular DNN architectures and training strategies across multiple datasets, this study provides valuable insights into their performance and limitations for endoscopic applications. The findings highlight the importance of including robustness evaluation when developing DNNs for endoscopy applications, and the study proposes strategies to mitigate performance loss.
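The abstract describes evaluating DNNs under synthetic image degradations such as poor illumination and motion blur, then reporting the resulting performance decline as a mean with a spread. The following is a minimal Python sketch of that evaluation idea, not the authors' implementation. The helpers `darken`, `horizontal_motion_blur`, `relative_drop`, and `summarize_drops` are hypothetical. The sketch assumes grayscale images stored as nested lists with intensities in [0, 1], models poor illumination as a linear intensity scale and motion blur as a horizontal box kernel, treats the reported decline as relative to the reference score, and treats the ± value as a sample standard deviation over runs. The clinically calibrated severity ranges used in the study are not reproduced here.

```python
import statistics
from typing import List, Tuple

# Hypothetical representation: grayscale image, intensities in [0, 1].
Image = List[List[float]]


def darken(img: Image, factor: float) -> Image:
    """Simulate poor illumination by scaling all intensities (assumed linear model)."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be in [0, 1]")
    return [[px * factor for px in row] for row in img]


def horizontal_motion_blur(img: Image, length: int) -> Image:
    """Simulate horizontal motion blur with a 1-D box kernel of `length` pixels.

    Near the image border, each output pixel averages only the pixels that exist,
    so no padding values are introduced.
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    half = length // 2
    out: Image = []
    for row in img:
        width = len(row)
        new_row = []
        for x in range(width):
            lo = max(0, x - half)
            hi = min(width, x - half + length)
            window = row[lo:hi]  # never empty, since lo <= x < hi
            new_row.append(sum(window) / len(window))
        out.append(new_row)
    return out


def relative_drop(reference: float, degraded: float) -> float:
    """Performance decline in percent, relative to the reference (clean-image) score."""
    if reference <= 0:
        raise ValueError("reference score must be positive")
    return 100.0 * (reference - degraded) / reference


def summarize_drops(drops: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of relative drops across runs (assumed reporting)."""
    return statistics.mean(drops), statistics.stdev(drops)


if __name__ == "__main__":
    # Illustrative values only; these are not results from the study.
    img = [[0.0, 0.3, 0.6, 0.9]]
    degraded = horizontal_motion_blur(darken(img, 0.5), 3)
    print(degraded)
    print(relative_drop(0.80, 0.70))
    print(summarize_drops([10.0, 12.0, 14.0]))
```

In a real robustness study, the degraded images would be fed to the trained model and `relative_drop` would be computed on its scores at each severity level; composing simple, parameterized degradations like this makes it easy to sweep severity while keeping every other factor fixed.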