Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir, Turkey.
Department of Radiology, Faculty of Medicine, Dokuz Eylul University, Izmir, Turkey.
Med Image Anal. 2021 Apr;69:101950. doi: 10.1016/j.media.2020.101950. Epub 2020 Dec 25.
Segmentation of abdominal organs has been a comprehensive, yet unresolved, research field for many years. In the last decade, intensive developments in deep learning (DL) introduced new state-of-the-art segmentation systems. Although these systems surpass earlier ones in overall accuracy, the effects of DL model properties and parameters on performance are hard to interpret. This makes comparative analysis a necessary tool for interpretable studies and systems. Moreover, the performance of DL in emerging learning approaches such as cross-modality and multi-modal semantic segmentation has rarely been discussed. To expand the knowledge on these topics, the CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation challenge was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2019 in Venice, Italy. Abdominal organ segmentation from routine acquisitions plays an important role in several clinical applications, such as pre-surgical planning or morphological and volumetric follow-ups for various diseases. These applications require a certain level of performance on a diverse set of metrics, such as maximum symmetric surface distance (MSSD) to determine the surgical error margin, or overlap errors to track size and shape differences. Previous abdomen-related challenges have mainly focused on tumor/lesion detection and/or classification with a single modality. Conversely, CHAOS provides both abdominal CT and MR data from healthy subjects for single and multiple abdominal organ segmentation. Five different but complementary tasks were designed to analyze the capabilities of participating approaches from multiple perspectives. The results were investigated thoroughly and compared with manual annotations and interactive methods.
The analysis shows that DL models for a single modality (CT / MR) can deliver reliable volumetric analysis performance (DICE: 0.98 ± 0.00 / 0.95 ± 0.01), but the best MSSD performance remains limited (21.89 ± 13.94 / 20.85 ± 10.63 mm). The performance of participating models decreases dramatically for cross-modality tasks on the liver (DICE: 0.88 ± 0.15, MSSD: 36.33 ± 21.97 mm). Despite contrary examples in other applications, multi-tasking DL models designed to segment all organs are observed to perform worse than organ-specific ones (a performance drop of around 5%). Nevertheless, some of the successful models perform better in their multi-organ versions. We conclude that exploring these pros and cons in both single vs multi-organ and cross-modality segmentation is poised to have an impact on further research into effective algorithms that would support real-world clinical applications. Finally, with more than 1500 participants and more than 550 submissions, the challenge also supports another important contribution of this study: an analysis of the shortcomings of challenge organizations, such as the effects of multiple submissions and the peeking phenomenon.
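The two metrics quoted above can be illustrated with a minimal sketch. It is not the challenge's evaluation code: it assumes masks are given as sets of integer voxel coordinates, treats a voxel as a surface voxel when at least one face-neighbour lies outside the mask, and takes MSSD to be the Hausdorff distance between the two surfaces after scaling by the voxel spacing. The function names `dice`, `surface`, and `mssd` are hypothetical, and the brute-force distance search is only practical for small toy masks.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dice(a, b):
    """Dice coefficient of two masks given as sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels of the mask with at least one face-neighbour outside it."""
    border = set()
    for voxel in mask:
        for axis in range(len(voxel)):
            for step in (-1, 1):
                neighbour = voxel[:axis] + (voxel[axis] + step,) + voxel[axis + 1:]
                if neighbour not in mask:
                    border.add(voxel)
    return border


def mssd(a, b, spacing):
    """Maximum symmetric surface distance, in the units of `spacing`."""
    def scaled(mask):
        # Convert voxel indices to physical coordinates (e.g. mm).
        return [tuple(c * s for c, s in zip(v, spacing)) for v in surface(mask)]

    def directed(src, dst):
        # Largest distance from a source surface point to the nearest target point.
        return max(min(dist(p, q) for q in dst) for p in src)

    sa, sb = scaled(a), scaled(b)
    return max(directed(sa, sb), directed(sb, sa))


# Toy 2D example: a 4x4 reference square and a prediction shifted by one voxel in x.
reference = {(x, y) for x in range(0, 4) for y in range(0, 4)}
prediction = {(x, y) for x in range(1, 5) for y in range(0, 4)}

print(dice(reference, prediction))              # 0.75
print(mssd(reference, prediction, (2.0, 1.0)))  # 2.0 (x spacing of 2 mm)
```

The toy case shows why the two metrics are reported together: a one-voxel shift still leaves a high overlap score, while the surface distance grows directly with the physical size of the misalignment.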