IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):769-790. doi: 10.1109/TPAMI.2017.2699960. Epub 2017 May 2.
Nearest neighbor search is a problem of finding the data points from the database such that the distances from them to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations. We separate quantization from pairwise similarity preserving as the objective function is very different though quantization, as we show, can be derived from preserving the pairwise similarities. In addition, we present the evaluation protocols, and the general performance analysis, and point out that the quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
最近邻搜索是指在数据库中查找距离查询点最近的数据点的问题。学习哈希是解决这个问题的主要方法之一,最近得到了广泛的研究。在本文中,我们对学习哈希算法进行了全面的调查,根据保持相似度的方式将其分类为:两两相似度保持、多对相似度保持、隐式相似度保持以及量化,并讨论了它们之间的关系。我们将量化与两两相似度保持分开,因为虽然量化可以从保持两两相似度中得出,但目标函数却非常不同。此外,我们还介绍了评估协议和一般性能分析,并指出量化算法在搜索准确性、搜索时间成本和空间成本方面表现更优。最后,我们介绍了一些新兴的主题。