Bakx Nienke, van der Sangen Maurice, Theuws Jacqueline, Bluemink Hanneke, Hurkmans Coen
Catharina Hospital, Department of Radiation Oncology, 5602ZA Eindhoven, the Netherlands.
Technical University Eindhoven, Faculties of Physics and Electrical Engineering, 5600MB Eindhoven, the Netherlands.
Tech Innov Patient Support Radiat Oncol. 2023 May 13;26:100209. doi: 10.1016/j.tipsro.2023.100209. eCollection 2023 Jun.
Deep learning (DL) models for auto-segmentation are increasingly being developed, and more of them are becoming commercially available. Commercial models are mostly trained on external data. To study the effect of using a model trained on external data rather than the same model trained on in-house data, the performance of these two DL models was evaluated.
The evaluation was performed on in-house data from 30 breast cancer patients. Quantitative analysis used the Dice similarity coefficient (DSC), the surface DSC (sDSC) and the 95th percentile of the Hausdorff distance (95% HD). These values were compared with previously reported inter-observer variations (IOV).
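As an illustration of the three metrics named above, the following is a minimal sketch and not the implementation used in the study. It assumes that each segmentation is given as a set of voxel indices (for DSC) and that each surface is given as a list of point coordinates in mm (for sDSC and 95% HD). The function names, the point-based approximation of sDSC, the tolerance `tau_mm`, and the choice to take the larger of the two directed 95th percentiles are all assumptions; published definitions differ in these details.

```python
import math


def dice(a, b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def directed_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(surf_a, surf_b):
    """95% HD, taken here as the larger of the two directed
    95th-percentile surface distances (an assumed variant)."""
    d_ab = directed_distances(surf_a, surf_b)
    d_ba = directed_distances(surf_b, surf_a)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))


def surface_dice(surf_a, surf_b, tau_mm):
    """Point-based approximation of sDSC: the fraction of surface
    points lying within tau_mm of the other surface."""
    d_ab = directed_distances(surf_a, surf_b)
    d_ba = directed_distances(surf_b, surf_a)
    close = sum(d <= tau_mm for d in d_ab) + sum(d <= tau_mm for d in d_ba)
    return close / (len(surf_a) + len(surf_b))


# Toy example: two 2x2 masks that overlap in one column.
mask_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
mask_b = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice(mask_a, mask_b))
```

The brute-force nearest-point search is quadratic in the number of surface points, so it suits only small examples; for clinical volumes a distance transform or a spatial index would be the usual choice.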
For several structures, statistically significant differences were found between the two models. For organs at risk, mean DSC values ranged from 0.63 to 0.98 for the in-house model and from 0.71 to 0.96 for the external model. For target volumes, mean DSC values ranged from 0.57 to 0.94 and from 0.33 to 0.92, respectively. The difference in 95% HD values between the two models ranged from 0.08 to 3.23 mm, except for CTVn4, where it was 9.95 mm. For the external model, both DSC and 95% HD fell outside the range of IOV for CTVn4; for the in-house model, this was the case for the DSC of the thyroid.
Statistically significant differences were found between the two models, but these were mostly within published inter-observer variations, showing clinical usefulness of both models. Our findings could encourage discussion and revision of existing guidelines, to further decrease not only inter-observer but also inter-institute variability.