Abdulkadir Yasin, Luximon Dishane, Morris Eric, Chow Phillip, Kishan Amar U, Mikaeilian Argin, Lamb James M
Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California, USA.
Med Phys. 2023 Oct;50(10):5969-5977. doi: 10.1002/mp.16676. Epub 2023 Aug 30.
Deep neural networks have revolutionized the science of auto-segmentation and hold great promise for treatment planning automation. However, few data exist regarding clinical implementation and human factors. We evaluated the performance and clinical implementation of a novel deep learning-based auto-contouring workflow for 0.35T magnetic resonance imaging (MRI)-guided pelvic radiotherapy, focusing on automation bias and objective measures of workflow savings.
An auto-contouring model was developed using a UNet-derived architecture for the femoral heads, bladder, and rectum in 0.35T MR images. Training data were taken from 75 patients treated with MRI-guided radiotherapy at our institution. The model was tested against 20 retrospective cases outside the training set and was subsequently implemented clinically. Usability was evaluated on the first 30 clinical cases by computing the Dice similarity coefficient (DSC), Hausdorff distance (HD), and the fraction of slices that were used unmodified by planners. Final contours were retrospectively reviewed by an experienced planner, and the clinical significance of deviations was graded as having negligible, low, moderate, or high probability of leading to actionable dosimetric variations. To assess whether the use of auto-contouring led to final contours more or less in agreement with an objective standard, 10 pre-treatment and 10 post-treatment blinded cases were re-contoured from scratch by three expert planners to obtain expert consensus contours (EC). EC were compared to clinically used (CU) contours using DSC. Student's t-test and Levene's statistic were used to test the statistical significance of differences in mean and standard deviation, respectively. Finally, the dosimetric significance of the contour differences was assessed by comparing the difference in bladder and rectum maximum point doses between EC and CU before and after the introduction of automation.
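The contour-agreement measures named above (DSC, HD, and the fraction of unmodified slices) can be illustrated with a minimal sketch. This is not the authors' code: contours are represented here as sets of (z, y, x) voxel indices, the function names are hypothetical, the brute-force Hausdorff search is chosen for clarity rather than speed, and counting only slices that contain the auto-contour in the denominator is an assumption, since the abstract does not define it.

```python
import math
from collections import defaultdict


def dice(a, b):
    """Dice coefficient between two voxel sets: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty contours agree
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty voxel sets.

    Brute force over all point pairs; `spacing` converts voxel indices
    to physical units (assumed 1.0 per axis in this sketch).
    """
    def scaled(p):
        return tuple(c * s for c, s in zip(p, spacing))

    pa = [scaled(p) for p in a]
    pb = [scaled(p) for p in b]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


def unmodified_slice_fraction(auto, final):
    """Fraction of auto-contoured slices whose final contour is identical."""
    def by_slice(voxels):
        slices = defaultdict(set)
        for z, y, x in voxels:
            slices[z].add((y, x))
        return slices

    auto_slices, final_slices = by_slice(auto), by_slice(final)
    if not auto_slices:
        return 0.0
    same = sum(1 for z, s in auto_slices.items() if final_slices.get(z) == s)
    return same / len(auto_slices)


# Toy example (made-up geometry, not study data): a 4x4x4 auto-contour
# that a planner extends by one voxel on slice z=3 only.
auto = {(z, y, x) for z in range(4) for y in range(4) for x in range(4)}
final = set(auto)
final.add((3, 4, 0))

print(dice(auto, final))
print(hausdorff(auto, final))
print(unmodified_slice_fraction(auto, final))
```

In this toy case one of the four slices is edited, so the unmodified fraction is 3/4 while DSC stays close to 1, which shows why the two measures carry different information about planner behavior.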
Median (interquartile range) DSC for the post-processed contours in the retrospective test data were 0.92 (0.02), 0.92 (0.06), 0.93 (0.06), and 0.87 (0.04) for the right and left femoral heads, bladder, and rectum, respectively. Post-implementation median DSC were 1.0 (0.0), 1.0 (0.0), 0.98 (0.04), and 0.98 (0.06), respectively. For the same four organs, 96.2%, 95.4%, 59.5%, and 68.21% of slices, respectively, were used unmodified by the planner. DSC between EC and pre-implementation CU contours were 0.91 (0.05*), 0.91* (0.05*), 0.95 (0.04), and 0.88 (0.04) for the right and left femoral heads, bladder, and rectum, respectively. The corresponding DSC for post-implementation CU contours were 0.93 (0.02*), 0.93* (0.01*), 0.96 (0.01), and 0.85 (0.02) (asterisks indicate statistically significant differences). In a retrospective review of contours used for planning, a total of four deviating slices in two patients were graded as having low potential clinical significance. No deviations were graded as moderate or high. Mean differences between EC and CU rectum maximum doses were 0.1 ± 2.6 Gy and -0.9 ± 2.5 Gy for pre- and post-implementation, respectively. Mean differences between EC and CU bladder/bladder wall maximum doses were -0.9 ± 4.1 Gy and 0.0 ± 0.6 Gy for pre- and post-implementation, respectively. These differences were not statistically significant according to Student's t-test.
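The two tests behind the asterisks, Student's t-test for a difference in means and Levene's statistic for a difference in spread, can be sketched as follows. This is an illustration under stated assumptions rather than the study's analysis: the pooled-variance (equal-variance) form of the t statistic and mean-centered Levene deviations are assumed, the function names are hypothetical, and the input values are made-up toy numbers, not study data. Only the test statistics are computed; converting them to p-values requires the t and F distributions, for example from a statistics library.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    # Transform each observation into its absolute deviation from the group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n_total
    # One-way ANOVA F ratio computed on the transformed values.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((x - mean(g)) ** 2 for g in z for x in g)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical per-case DSC values for two groups (illustration only).
group_a = [0.86, 0.88, 0.90, 0.92, 0.94]
group_b = [0.91, 0.92, 0.93, 0.94, 0.95]

t_stat = student_t(group_a, group_b)   # compare against t with na + nb - 2 df
w_stat = levene_w(group_a, group_b)    # compare against F with (k - 1, N - k) df
print(t_stat, w_stat)
```

Separating the two statistics mirrors the abstract's reporting, where a group can differ significantly in interquartile spread without differing in its central value.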
We have presented an analysis of the clinical implementation of a novel auto-contouring workflow. Substantial workflow savings were obtained. The introduction of auto-contouring into the clinical workflow changed the contouring behavior of planners. Automation bias was observed, but it had little deleterious effect on treatment planning.