IEEE Trans Med Imaging. 2018 Apr;37(4):1045-1057. doi: 10.1109/TMI.2017.2778748.
The most commonly used metrics for assessing the quality of retinal vessel segmentation are sensitivity, specificity, and accuracy, all of which rely on pixel-to-pixel matching. However, because of the inter-observer problem, in which vessels annotated by different observers vary in both thickness and location, pixel-to-pixel matching is too restrictive to evaluate vessel segmentation results fairly. In this paper, we propose a skeletal similarity metric that is constructed by comparing the skeleton maps generated from the reference and the source vessel segmentation maps. To address the inter-observer problem, instead of using a pixel-to-pixel matching strategy, each skeleton segment in the reference skeleton map is adaptively assigned a searching range whose radius is determined by its vessel thickness. Pixels in the source skeleton map that lie within the searching range are then selected for similarity calculation. The skeletal similarity consists of a curve similarity, which measures the structural similarity between the reference and the source skeleton maps, and a thickness similarity, which measures the thickness consistency between the reference and the source vessel segmentation maps. In contrast to other metrics that provide only a global score for overall performance, we redefine true positive, false negative, true negative, and false positive in terms of the skeletal similarity, so that sensitivity, specificity, accuracy, and other objective measurements can be constructed from the redefined counts. More importantly, the skeletal similarity metric shows better potential for use as a pixelwise loss function for training deep learning models for retinal vessel segmentation. Through comparison of a set of examples, we demonstrate that the redefined metrics based on the skeletal similarity are more effective for quality evaluation, in particular because they are more tolerant of the inter-observer problem.
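The motivation in the abstract can be illustrated with a small sketch. It is not the paper's skeletal similarity metric: it omits the curve similarity and the thickness similarity, it assumes the input masks are already one-pixel-wide skeletons, and it uses a single fixed `radius` where the paper assigns a thickness-dependent searching range to each skeleton segment. The function names `pixelwise_scores` and `tolerant_sensitivity` are hypothetical and chosen only for this illustration. The sketch shows how strict pixel-to-pixel matching scores a one-pixel shift as a complete miss, while matching within a searching range does not.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]   # binary image: 1 = vessel, 0 = background
Pixel = Tuple[int, int]  # (row, col)


def vessel_pixels(mask: Mask) -> Set[Pixel]:
    """Return the coordinates of all vessel pixels in a binary mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def pixelwise_scores(reference: Mask, source: Mask) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy from strict pixel-to-pixel matching."""
    tp = fn = tn = fp = 0
    for ref_row, src_row in zip(reference, source):
        for ref, src in zip(ref_row, src_row):
            if ref and src:
                tp += 1
            elif ref and not src:
                fn += 1
            elif src:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def tolerant_sensitivity(reference: Mask, source: Mask, radius: int) -> float:
    """Fraction of reference pixels with a source pixel within `radius`.

    Assumption: a fixed radius and Chebyshev distance stand in for the
    paper's adaptive, thickness-based searching range.
    """
    ref = vessel_pixels(reference)
    src = vessel_pixels(source)
    if not ref:
        return 0.0
    matched = sum(
        1
        for (r, c) in ref
        if any(max(abs(r - sr), abs(c - sc)) <= radius for (sr, sc) in src)
    )
    return matched / len(ref)


if __name__ == "__main__":
    # A vertical vessel centerline, and the same line shifted right by one pixel,
    # mimicking two observers who disagree slightly on vessel location.
    reference = [[0, 0, 1, 0, 0] for _ in range(5)]
    source = [[0, 0, 0, 1, 0] for _ in range(5)]
    print(pixelwise_scores(reference, source))         # (0.0, 0.75, 0.6)
    print(tolerant_sensitivity(reference, source, 1))  # 1.0
```

With strict matching the shifted centerline receives a sensitivity of 0 even though it traces the same vessel; allowing a one-pixel searching range recovers every reference pixel. The full metric described in the abstract goes further by also scoring curve shape and thickness consistency within that range.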