University of New South Wales, South Western Sydney Clinical School, Sydney, Australia; Ingham Institute for Applied Medical Research, Sydney, Australia; Liverpool and Macarthur Cancer Therapy Centres, Department of Radiation Oncology, Sydney, Australia.
University of New South Wales, South Western Sydney Clinical School, Sydney, Australia; Ingham Institute for Applied Medical Research, Sydney, Australia; CSIRO Australian e-Health Research Centre, Herston, Australia.
Comput Med Imaging Graph. 2024 Sep;116:102403. doi: 10.1016/j.compmedimag.2024.102403. Epub 2024 Jun 2.
BACKGROUND AND OBJECTIVES: Biomedical image segmentation models typically attempt to predict a single segmentation that resembles a ground-truth structure as closely as possible. However, because medical images are not perfect representations of anatomy, obtaining this ground truth is not possible. A commonly used surrogate is to have multiple expert observers define the same structure for a dataset. When multiple observers define the same structure on the same image, there can be significant differences depending on the structure, the image quality and modality, and the region being defined. It is often desirable to estimate this type of aleatoric uncertainty in a segmentation model to help understand the region in which the true structure is likely to lie. Furthermore, obtaining these datasets is resource intensive, so such models may need to be trained using limited data. With a small dataset, differing patient anatomy is unlikely to be well represented, which causes epistemic uncertainty. This should also be estimated so that it can be determined for which cases the model is effective.
METHODS: We use a 3D probabilistic U-Net to train a model from which several segmentations can be sampled to estimate the range of uncertainty seen between multiple observers. To ensure that regions where observers disagree most are emphasised in model training, we extend the Generalised Evidence Lower Bound (ELBO) with Constrained Optimisation (GECO) loss function with an additional contour loss term that gives attention to these regions. Ensemble and Monte-Carlo dropout (MCDO) uncertainty quantification methods are used during inference to estimate model confidence on an unseen case. We apply our methodology to two radiotherapy clinical trial datasets: a gastric cancer trial (TOPGEAR, TROG 08.08) and a post-prostatectomy prostate cancer trial (RAVES, TROG 08.03). Each dataset contains only 10 cases for model development to segment the clinical target volume (CTV), which was defined by multiple observers on each case. An additional 50 cases are available as a hold-out dataset for each trial, in which only one observer defined the CTV structure on each case. Up to 50 samples were generated using the probabilistic model for each case in the hold-out dataset. To assess performance, each manually defined structure was matched to the closest sampled segmentation based on commonly used metrics.
RESULTS: The TOPGEAR CTV model achieved a Dice Similarity Coefficient (DSC) and Surface DSC (sDSC) of 0.7 and 0.43 respectively, and the RAVES model achieved 0.75 and 0.71 respectively. Segmentation quality across cases in the hold-out datasets was variable; however, both the ensemble and MCDO uncertainty estimation approaches were able to accurately estimate model confidence, with a p-value < 0.001 for both TOPGEAR and RAVES when compared against the DSC using the Pearson correlation coefficient.
CONCLUSIONS: We demonstrated that it is possible to train auto-segmentation models that can estimate aleatoric and epistemic uncertainty using limited datasets. Having the model estimate prediction confidence is important for understanding for which unseen cases a model is likely to be useful.
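The evaluation step described in the methods, matching each manually defined structure to the closest sampled segmentation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dice` and `closest_sample` are hypothetical, masks are assumed to be binary NumPy arrays of equal shape, and only the volumetric DSC is used as the matching metric (the surface DSC is not covered here).

```python
import numpy as np


def dice(a, b):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (an assumption).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def closest_sample(manual, samples):
    """Return (index, DSC) of the sampled mask that best matches the manual mask."""
    scores = [dice(manual, s) for s in samples]
    best = int(np.argmax(scores))
    return best, scores[best]


# Toy 3D example: a manual contour and two hypothetical model samples.
manual = np.zeros((4, 4, 4), dtype=bool)
manual[1:3, 1:3, 1:3] = True          # 8 voxels

sample_a = np.zeros_like(manual)
sample_a[0:2, 0:2, 0:2] = True        # 8 voxels, 1 voxel of overlap

sample_b = np.zeros_like(manual)
sample_b[1:3, 1:3, 1:4] = True        # 12 voxels, 8 voxels of overlap

best_idx, best_dsc = closest_sample(manual, [sample_a, sample_b])
print(best_idx, round(best_dsc, 3))
```

In the toy case the second sample is selected because it overlaps the manual mask far more than the first. With up to 50 samples per case, the same call would simply receive a longer list of masks.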