Trade-off of different deep learning-based auto-segmentation approaches for treatment planning of pediatric craniospinal irradiation autocontouring of OARs for pediatric CSI.
Author information
Thibodeau-Antonacci Alana, Popovic Marija, Ates Ozgur, Hua Chia-Ho, Schneider James, Skamene Sonia, Freeman Carolyn, Enger Shirin Abbasinejad, Tsui James Man Git
Affiliations
Medical Physics Unit, Department of Oncology, McGill University, Montreal, Quebec, Canada.
Department of Radiation Oncology, St. Jude Children's Research Hospital, Memphis, USA.
Publication information
Med Phys. 2025 Jun;52(6):3541-3556. doi: 10.1002/mp.17782. Epub 2025 Apr 1.
BACKGROUND
As auto-segmentation tools become integral to radiotherapy, more commercial products are emerging. However, these products may not always suit the needs of a given patient population. One notable example is the use of commercial software trained on adult data to contour organs at risk (OARs) in pediatric patients.
PURPOSE
This study aimed to compare three auto-segmentation approaches in the context of pediatric craniospinal irradiation (CSI): commercial, out-of-the-box, and in-house.
METHODS
CT scans from 142 pediatric patients undergoing CSI were obtained from St. Jude Children's Research Hospital (training: 115; validation: 27). A test dataset of 16 CT scans was collected from the McGill University Health Centre. In all images, 18 OARs were delineated manually. LimbusAI v1.7 served as the commercial product, and nnU-Net was trained as a benchmark. Additionally, a two-step in-house approach was developed: smaller 3D CT volumes containing the OAR of interest were first extracted and then used as input to train organ-specific models. Three variants of the U-Net architecture were explored: a basic U-Net, an attention U-Net, and a 2.5D U-Net. The Dice similarity coefficient (DSC) was used to assess segmentation accuracy, and the trend of the DSC with age was investigated using the Mann-Kendall test. A radiation oncologist rated the clinical acceptability of all contours on a five-point Likert scale.
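As a rough illustration of two of the methods named above, the following minimal sketch (assuming NumPy arrays of equal shape) computes the DSC between two binary masks, 2|A ∩ B| / (|A| + |B|), and shows a generic crop-and-paste-back step of the kind a two-step, organ-specific scheme needs. It is not the authors' implementation: the abstract does not say how the sub-volume containing the OAR is located, so the coarse input mask, the hypothetical helpers `crop_around_organ` and `paste_back`, the `margin` parameter, and the convention of returning 1.0 when both masks are empty are all assumptions.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def crop_around_organ(ct: np.ndarray, coarse_mask: np.ndarray, margin: int = 8):
    """Hypothetical step 1: cut a smaller 3D sub-volume around a coarse organ
    mask, padded by `margin` voxels per side and clipped to the volume."""
    coords = np.argwhere(coarse_mask)
    if coords.size == 0:
        raise ValueError("coarse mask is empty; nothing to crop around")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, ct.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return ct[slices], slices


def paste_back(sub_mask: np.ndarray, slices, full_shape) -> np.ndarray:
    """Hypothetical step 2 output: place an organ-specific prediction made on
    the sub-volume back into a full-size mask."""
    full = np.zeros(full_shape, dtype=bool)
    full[slices] = sub_mask
    return full


if __name__ == "__main__":
    ref = np.zeros((10, 10, 10), dtype=bool)
    ref[2:6, 2:6, 2:6] = True            # 64 voxels
    pred = np.zeros_like(ref)
    pred[3:7, 2:6, 2:6] = True           # 64 voxels, 48 of them overlap ref
    print(dice_coefficient(pred, ref))   # 2 * 48 / (64 + 64) = 0.75
```

Cropping before segmentation keeps each organ-specific model's input small, which is one plausible reason such models can run on a CPU, as reported for the in-house approach.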
RESULTS
Differences in the contours between the validation and test datasets reflected the distinct institutional contouring standards. With LimbusAI, the DSC values of the lungs and left kidney increased with age on both the validation and test datasets. For younger patients, LimbusAI contours of the esophagus were often truncated distally and confused with the trachea, resulting in a DSC below 0.5 on both datasets. LimbusAI kidney contours also frequently contained false negatives, yielding mean DSC values up to 0.11 lower on the validation set and 0.07 lower on the test set than those of the other models. Overall, nnU-Net performed well for body organs but had difficulty distinguishing the laterality of head structures, which produced a large spread in DSC values, with the standard deviation reaching 0.35 for the lenses. The in-house models generally achieved DSC values similar to one another and to nnU-Net. Inference on the test data took 47 to 55 min on a central processing unit (CPU) for the in-house models, compared with 1 h 21 min on a V100 graphics processing unit (GPU) for nnU-Net.
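The age-related DSC trends above were assessed with the Mann-Kendall test. The following is a minimal sketch of the textbook two-sided version (statistic S as the sum of signs of all later-minus-earlier differences, normal approximation with a tie correction on the values). The abstract does not state the authors' software, tie handling, or significance level, so those choices are assumptions here, and the DSC values in the usage example are made up for illustration.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected).

    `values` must already be ordered by the covariate of interest, e.g. DSC
    values sorted by patient age. Returns (S, Z, p)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += int(diff > 0) - int(diff < 0)  # sign of each pairwise difference
    # Variance of S under "no trend", reduced for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    # Continuity-corrected standardized statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


if __name__ == "__main__":
    ages = [3, 5, 7, 9, 11, 13, 15, 17]                     # hypothetical
    dsc = [0.71, 0.74, 0.73, 0.78, 0.80, 0.82, 0.81, 0.85]  # hypothetical
    ordered = [d for _, d in sorted(zip(ages, dsc))]
    s, z, p = mann_kendall(ordered)
    # S is 24 here: 26 increasing pairs minus 2 decreasing pairs.
    print(f"S={s}, Z={z:.2f}, p={p:.4f}")
```

A positive S with a small p-value indicates a monotonic increase of the DSC with age, which is the pattern the abstract reports for the lungs and left kidney with LimbusAI.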
CONCLUSIONS
LimbusAI did not adapt well to pediatric anatomy for the esophagus and kidneys. When commercial products do not suit the study population, nnU-Net is a viable option but requires adjustments. In resource-constrained settings, the in-house approach, which ran inference on a CPU, provides an alternative. Regardless of the approach, implementing an automated segmentation tool requires careful monitoring and quality assurance.