Roß Tobias, Bruno Pierangela, Reinke Annika, Wiesenfarth Manuel, Koeppel Lisa, Full Peter M, Pekdemir Bünyamin, Godau Patrick, Trofimova Darya, Isensee Fabian, Adler Tim J, Tran Thuy N, Moccia Sara, Calimeri Francesco, Müller-Stich Beat P, Kopp-Schneider Annette, Maier-Hein Lena
Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; Medical Faculty, Heidelberg University, Heidelberg, Germany; Helmholtz Imaging, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; Department of Mathematics and Computer Science, University of Calabria, Rende, Italy.
Med Image Anal. 2023 May;86:102765. doi: 10.1016/j.media.2023.102765. Epub 2023 Mar 1.
Challenges have become the state-of-the-art approach to benchmark image analysis algorithms in a comparative manner. While validation on identical data sets was a great step forward, the analysis of results is often restricted to pure ranking tables, leaving relevant questions unanswered. Specifically, little effort has been put into systematically investigating what characterizes the images on which state-of-the-art algorithms fail. To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Our framework relies on the semantic metadata annotation of images, which serves as the foundation for a General Linear Mixed Models (GLMM) analysis. Based on 51,542 metadata annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge 2019 and revealed underexposure, motion and occlusion of instruments, as well as the presence of smoke or other objects in the background, as major sources of algorithm failure. Our subsequent method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in processing images on which previous methods tended to fail. Owing to its objectivity and generic applicability, our approach could become a valuable tool for validation in the field of medical image analysis and beyond.
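To illustrate how image-level metadata can be related to algorithm performance, the following is a minimal descriptive sketch, not the GLMM analysis described in the abstract. It first averages a per-image score over algorithms (so each image counts once, loosely mirroring the role of the image as a grouping unit) and then compares images with and without each binary metadata property. The record layout, the property names `smoke` and `underexposed`, the score name `dsc`, and all numbers are hypothetical and made up for illustration; they are not challenge results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: one row per (image, algorithm) pair.
# Scores and metadata flags are invented for illustration only.
records = [
    {"image": "img1", "algorithm": "A", "dsc": 0.90, "smoke": 0, "underexposed": 0},
    {"image": "img1", "algorithm": "B", "dsc": 0.80, "smoke": 0, "underexposed": 0},
    {"image": "img2", "algorithm": "A", "dsc": 0.60, "smoke": 1, "underexposed": 0},
    {"image": "img2", "algorithm": "B", "dsc": 0.50, "smoke": 1, "underexposed": 0},
    {"image": "img3", "algorithm": "A", "dsc": 0.70, "smoke": 0, "underexposed": 1},
    {"image": "img3", "algorithm": "B", "dsc": 0.50, "smoke": 0, "underexposed": 1},
    {"image": "img4", "algorithm": "A", "dsc": 0.30, "smoke": 1, "underexposed": 1},
    {"image": "img4", "algorithm": "B", "dsc": 0.10, "smoke": 1, "underexposed": 1},
]


def per_image_summary(rows, properties):
    """Average the score over algorithms and keep each image's metadata flags."""
    scores = defaultdict(list)
    flags = {}
    for row in rows:
        scores[row["image"]].append(row["dsc"])
        flags[row["image"]] = {p: row[p] for p in properties}
    return {img: {"dsc": mean(vals), **flags[img]} for img, vals in scores.items()}


def effect_by_property(rows, properties):
    """Mean per-image score with the property minus mean score without it."""
    images = per_image_summary(rows, properties).values()
    effects = {}
    for prop in properties:
        with_flag = [im["dsc"] for im in images if im[prop] == 1]
        without_flag = [im["dsc"] for im in images if im[prop] == 0]
        if with_flag and without_flag:  # skip properties lacking a comparison group
            effects[prop] = mean(with_flag) - mean(without_flag)
    return effects


effects = effect_by_property(records, ["smoke", "underexposed"])
for name, delta in sorted(effects.items(), key=lambda kv: kv[1]):
    print(f"{name}: {delta:+.3f}")
```

A negative difference suggests that images with the property tend to score lower. Unlike a mixed model, this comparison does not adjust for the other properties or for between-algorithm variability, so it is only a first look at the data.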