Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Kasteelpark Arenberg 10/2440, Leuven 3001, Belgium; Medical Imaging Research Center, UZ Leuven, Herestraat 49/7003, Leuven 3000, Belgium.
Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Kasteelpark Arenberg 10/2440, Leuven 3001, Belgium; icometrix, Kolonel Begaultlaan 1b/12, Leuven 3000, Belgium.
Med Image Anal. 2021 Jan;67:101833. doi: 10.1016/j.media.2020.101833. Epub 2020 Oct 7.
The clinical interest is often to measure the volume of a structure, which is typically derived from a segmentation. To evaluate and compare segmentation methods, the similarity between a segmentation and a predefined ground truth is measured using popular discrete metrics, such as the Dice score. Recent segmentation methods use a differentiable surrogate metric, such as soft Dice, as part of the loss function during the learning phase. In this work, we first briefly describe how to derive volume estimates from a segmentation that is, potentially, inherently uncertain or ambiguous. This is followed by a theoretical analysis and an experimental validation linking the inherent uncertainty to common loss functions for training CNNs, namely cross-entropy and soft Dice. We find that, even though soft Dice optimization leads to improved performance with respect to the Dice score and other measures, it may introduce a volume bias for tasks with high inherent uncertainty. These findings indicate some of the method's clinical limitations and suggest a closer ad hoc volume analysis, optionally followed by a re-calibration step.
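The link between inherent uncertainty and volume bias can be illustrated with a toy calculation. The sketch below is not taken from the paper's experiments. It assumes a hypothetical structure made of `M` voxels that are always foreground plus an ambiguous region of `N` voxels that is entirely foreground with probability `P` and entirely background otherwise. The prediction is 1 on the certain region and a constant `q` on the ambiguous one. Soft Dice is assumed to take the common form 2·Σ(y·ŷ) / (Σy + Σŷ). All names and values are illustrative.

```python
import numpy as np

P, M, N = 0.3, 100, 100  # hypothetical: P(ambiguous region is foreground), region sizes


def expected_cross_entropy(q, p, eps=1e-7):
    """Expected per-voxel cross-entropy on the ambiguous region."""
    qc = np.clip(q, eps, 1.0 - eps)  # avoid log(0) at the grid end points
    return -(p * np.log(qc) + (1.0 - p) * np.log(1.0 - qc))


def expected_soft_dice_loss(q, p, m, n):
    """Expected (1 - soft Dice) over the two possible ground truths."""
    pred_volume = m + q * n
    # Ground truth includes the ambiguous region: overlap is m + q*n, true volume m + n.
    dice_present = 2.0 * (m + q * n) / (pred_volume + m + n)
    # Ground truth excludes the ambiguous region: overlap is m, true volume m.
    dice_absent = 2.0 * m / (pred_volume + m)
    return 1.0 - (p * dice_present + (1.0 - p) * dice_absent)


# Grid search for the constant prediction q that minimizes each expected loss.
grid = np.linspace(0.0, 1.0, 1001)
q_ce = grid[np.argmin(expected_cross_entropy(grid, P))]
q_dice = grid[np.argmin(expected_soft_dice_loss(grid, P, M, N))]

# Volume estimate: sum of the soft predictions over all voxels.
expected_true_volume = M + P * N
volume_ce = M + q_ce * N
volume_dice = M + q_dice * N

print(f"expected true volume: {expected_true_volume:.1f}")
print(f"cross-entropy optimum q={q_ce:.3f}, volume={volume_ce:.1f}")
print(f"soft Dice optimum     q={q_dice:.3f}, volume={volume_dice:.1f}")
```

In this toy setting the cross-entropy minimizer is the calibrated value `q = P`, so the summed probabilities recover the expected volume. The soft Dice minimizer collapses to `q = 0`, which underestimates the expected volume by `P * N` voxels. Other choices of `P`, `M`, and `N` can push the soft Dice optimum the other way, so the direction of the bias should not be read as general.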