Philips Research, HTC 34, North Brabant, 5656 AE, NL, The Netherlands.
University Medical Centre Utrecht, Department of Radiotherapy, Heidelberglaan 100 Utrecht, 3584 CX, NL, The Netherlands.
Phys Med Biol. 2024 Jan 24;69(3). doi: 10.1088/1361-6560/ad1a26.
Prior to radiation therapy planning, accurate delineation of gross tumour volumes (GTVs) and organs at risk (OARs) is crucial. In current clinical practice, tumour delineation is performed manually by radiation oncologists, which is time-consuming and prone to large inter-observer variability. With the advent of deep learning (DL) models, automated contouring has become possible, speeding up procedures and assisting clinicians. However, these tools are currently used in the clinic mostly for contouring OARs, since they are not yet reliable enough for contouring GTVs. To improve the reliability of these systems, researchers have started exploring probabilistic neural networks. However, there is still limited knowledge of the practical implementation of such networks in real clinical settings.

In this work, we developed a 3D probabilistic system that generates DL-based uncertainty maps for lung cancer CT segmentations. We employed the Monte Carlo (MC) dropout technique to generate probabilistic and uncertainty maps, and evaluated model calibration using reliability diagrams. A clinical validation was conducted in collaboration with a radiation oncologist to qualitatively assess the value of the uncertainty estimates. We also proposed two novel metrics, namely mean uncertainty (MU) and relative uncertainty volume (RUV), as potential indicators for clinicians to assess the need for independent visual checks of the DL-based segmentation.

Our study showed that uncertainty mapping effectively identified cases of under- or over-contouring. Despite the overconfidence of the model, a strong correlation was observed between the clinical opinion and the MU metric. Moreover, both MU and RUV achieved high AUC values in discriminating between low- and high-uncertainty cases.

Our study is one of the first attempts to clinically validate uncertainty estimates in DL-based contouring. The two proposed metrics exhibited promising potential as indicators for clinicians to independently assess the quality of tumour delineation.
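
To make the uncertainty pipeline more concrete, the following is a minimal sketch of how MC dropout outputs could be aggregated into a probability map, a voxel-wise uncertainty map, and two summary scores in the spirit of MU and RUV. The abstract does not give the formal definitions, so everything here is an assumption for illustration: uncertainty is taken as the binary predictive entropy of the mean probability, MU as the mean uncertainty inside the predicted segmentation, and RUV as the volume of high-uncertainty voxels relative to the predicted segmentation volume. The function names, the 0.5 threshold, and the array layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def mc_dropout_maps(samples):
    """Aggregate T stochastic forward passes into probability and uncertainty maps.

    samples: array of shape (T, D, H, W) holding foreground probabilities from
    T forward passes with dropout kept active (assumed input layout).
    """
    mean_prob = samples.mean(axis=0)
    # Clip to avoid log(0); entropy in bits lies in [0, 1] for a binary label.
    p = np.clip(mean_prob, 1e-7, 1.0 - 1e-7)
    uncertainty = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return mean_prob, uncertainty


def mean_uncertainty(uncertainty, mask):
    """Assumed MU: average uncertainty over voxels of the predicted segmentation."""
    if not mask.any():
        return 0.0
    return float(uncertainty[mask].mean())


def relative_uncertainty_volume(uncertainty, mask, threshold=0.5):
    """Assumed RUV: high-uncertainty voxel count divided by segmented voxel count."""
    segmented = mask.sum()
    if segmented == 0:
        return 0.0
    return float((uncertainty > threshold).sum() / segmented)


if __name__ == "__main__":
    # Synthetic stand-in for 20 MC dropout passes over a small 3D volume.
    rng = np.random.default_rng(0)
    samples = rng.uniform(0.0, 1.0, size=(20, 8, 16, 16))
    mean_prob, uncertainty = mc_dropout_maps(samples)
    mask = mean_prob >= 0.5  # hypothetical binarisation threshold
    print("MU :", mean_uncertainty(uncertainty, mask))
    print("RUV:", relative_uncertainty_volume(uncertainty, mask))
```

Under these assumptions, a case with high MU or RUV would be one to flag for an independent visual check, while low values would suggest the contour is more likely acceptable as is. Any cut-off values would need to be chosen and validated on clinical data.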