Guérendel Corentin, Petrychenko Liliana, Chupetlovska Kalina, Bodalal Zuhir, Beets-Tan Regina G H, Benson Sean
Department of Radiology, Antoni van Leeuwenhoek-The Netherlands Cancer Institute, Amsterdam, The Netherlands.
GROW-Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands.
Eur Radiol. 2024 Dec 31. doi: 10.1007/s00330-024-11321-2.
This study aims to assess and compare two state-of-the-art deep learning approaches for segmenting four thoracic organs at risk (OARs), namely the esophagus, trachea, heart, and aorta, in CT images in the context of radiotherapy planning.
We compare a multi-organ segmentation approach with the fusion of multiple single-organ models, each dedicated to one OAR. All models were trained using nnU-Net with the default parameters and the full-resolution configuration. We evaluate their robustness under adversarial perturbations and their generalizability on external datasets, and we explore potential biases introduced by expert corrections compared to fully manual delineations.
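The fusion of single-organ models described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not specify how overlapping predictions between organs are resolved, so the fixed overwrite order used here is an assumption.

```python
import numpy as np

# Hypothetical label order; how overlaps are resolved is an assumption,
# not stated in the paper. Later organs overwrite earlier ones.
ORGANS = ["esophagus", "trachea", "heart", "aorta"]

def fuse_single_organ_masks(masks: dict) -> np.ndarray:
    """Fuse binary masks from four single-organ models into one
    multi-class label map (0 = background, 1..4 = organs)."""
    shape = next(iter(masks.values())).shape
    fused = np.zeros(shape, dtype=np.uint8)
    for label, name in enumerate(ORGANS, start=1):
        fused[masks[name].astype(bool)] = label
    return fused

# Toy 2x2 example with one overlapping voxel (heart vs. aorta)
masks = {
    "esophagus": np.array([[1, 0], [0, 0]]),
    "trachea":   np.array([[0, 1], [0, 0]]),
    "heart":     np.array([[0, 0], [1, 0]]),
    "aorta":     np.array([[0, 0], [1, 1]]),
}
print(fuse_single_organ_masks(masks).tolist())  # [[1, 2], [4, 4]]
```

In the toy example the overlapping voxel is assigned to the aorta simply because it comes last in the assumed order; any real fusion rule (e.g. using model probabilities) would need to be taken from the actual pipeline.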
Both approaches show excellent performance, with an average Dice score of 0.928 for the multi-class setting and 0.930 when fusing the four single-organ models. Evaluation on external datasets and under common procedural adversarial noise demonstrates the good generalizability of both models. In addition, expert corrections of both models' outputs show a significant bias toward the original automated segmentations. The average Dice score between the two corrections is 0.93, ranging from 0.88 for the trachea to 0.98 for the heart.
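The Dice scores reported above are the standard overlap metric for segmentation. A minimal reference implementation for a pair of binary masks (the helper name and the toy arrays are illustrative, not from the paper):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Convention: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total > 0 else 1.0

# Toy masks: 2 overlapping voxels, 3 foreground voxels in each mask
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_score(a, b), 3))  # 0.667
```

In a multi-class setting such as this study's, the score is computed per organ label (each label binarized in turn) and then averaged, which is how an "average Dice score" over four OARs is obtained.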
Both approaches demonstrate excellent performance and generalizability in segmenting four thoracic OARs, potentially improving efficiency in radiotherapy planning. However, the multi-organ setting is advantageous for its efficiency, requiring less training time and fewer resources, making it the preferable choice for this task. Moreover, corrections of AI segmentations by clinicians may introduce bias into the evaluation of AI approaches. A fully manually annotated test set should therefore be used to assess the performance of such methods.
Question: While manual delineation of thoracic organs at risk is labor-intensive, prone to errors, and time-consuming, evaluation of AI models performing this task lacks robustness.
Findings: The deep-learning models built on the nnU-Net framework showed excellent performance, generalizability, and robustness in segmenting thoracic organs in CT, enhancing radiotherapy planning efficiency.
Clinical relevance: Automatic segmentation of thoracic organs at risk can save clinicians time without compromising the quality of the delineations, and extensive evaluation across diverse settings demonstrates the potential of integrating such models into clinical practice.