IEEE Trans Pattern Anal Mach Intell. 2020 Nov;42(11):2825-2841. doi: 10.1109/TPAMI.2019.2915233. Epub 2019 May 10.
In this paper, a novel benchmark is introduced for evaluating local image descriptors. We demonstrate limitations of the commonly used datasets and evaluation protocols that lead to ambiguities and contradictory results in the literature. Furthermore, these benchmarks are nearly saturated due to the recent improvements in local descriptors obtained by learning from large annotated datasets. To address these issues, we introduce a new large dataset suitable for training and testing modern descriptors, together with strictly defined evaluation protocols for several tasks such as matching, retrieval and verification. This allows for more realistic, and thus more reliable, comparisons in different application scenarios. We evaluate the performance of several state-of-the-art descriptors and analyse their properties. We show that a simple normalisation of traditional hand-crafted descriptors is able to boost their performance to the level of deep-learning-based descriptors once realistic benchmarks are considered. Additionally, we specify a protocol for learning and evaluating using cross-validation. We show that when training state-of-the-art descriptors on this dataset, the traditional verification task is almost entirely saturated.