IEEE Trans Image Process. 2016 Mar;25(3):1047-55. doi: 10.1109/TIP.2015.2510284. Epub 2015 Dec 17.
Many benchmarks exist for evaluating the performance of boundary detection algorithms, most of them relying on some form of comparison between automatically generated boundaries and human-labeled ones. Such benchmarks consist of a representative image data set together with a comparison measure on the universe of boundary images. Although many such data sets and measures have been proposed, there is no clear way of knowing which combinations of them are best suited to the task. In this paper, we introduce four criteria that allow for a sensible evaluation of the performance of a comparison measure on a given data set. The criteria mimic the way in which humans understand boundary images, as well as their ability to recognize the underlying scenes. As a final goal, these criteria can quantify the ability of boundary detection benchmarks to evaluate the performance of boundary detection methods, whether edge-based or segmentation-based.
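To make the setup concrete, below is a minimal sketch (not from the paper) of one common family of comparison measures on boundary images: a tolerance-based F-measure between a machine-generated boundary map and a human-labeled one. The function name `boundary_f_measure` and the default tolerance are illustrative assumptions; standard benchmarks such as BSDS use a stricter one-to-one bipartite matching of boundary pixels, for which this distance-transform version is only a simplified approximation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def boundary_f_measure(pred, gt, tol=2.0):
    """Tolerance-based F-measure between two binary boundary maps.

    pred, gt : 2-D boolean arrays, True on boundary pixels.
    tol      : matching tolerance in pixels (illustrative default).

    Simplified sketch: each boundary pixel matches if any boundary
    pixel of the other map lies within `tol`, without the one-to-one
    assignment used by benchmarks such as BSDS.
    """
    n_pred = np.count_nonzero(pred)
    n_gt = np.count_nonzero(gt)
    if n_pred == 0 or n_gt == 0:
        return 0.0

    # Distance from every pixel to the nearest boundary pixel of the
    # other map (Euclidean distance transform of the complement).
    dist_to_gt = distance_transform_edt(~gt)
    dist_to_pred = distance_transform_edt(~pred)

    # A predicted pixel counts as correct if a ground-truth boundary
    # lies within the tolerance, and symmetrically for recall.
    tp_pred = np.count_nonzero(pred & (dist_to_gt <= tol))
    tp_gt = np.count_nonzero(gt & (dist_to_pred <= tol))

    precision = tp_pred / n_pred
    recall = tp_gt / n_gt
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, two parallel one-pixel-thick boundaries one row apart score 1.0 under `tol=2.0` but 0.0 under `tol=0.5`, which is precisely the kind of behavior the paper's criteria are designed to assess against human judgments of boundary similarity.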