IEEE Trans Neural Netw Learn Syst. 2018 Jan;29(1):87-103. doi: 10.1109/TNNLS.2016.2615085. Epub 2016 Oct 24.
Hashing-based semantic similarity search is becoming increasingly important for building large-scale content-based retrieval system. The state-of-the-art supervised hashing techniques use flexible two-step strategy to learn hash functions. The first step learns binary codes for training data by solving binary optimization problems with millions of variables, thus usually requiring intensive computations. Despite simplicity and efficiency, locality-sensitive hashing (LSH) has never been recognized as a good way to generate such codes due to its poor performance in traditional approximate neighbor search. We claim in this paper that the true merit of LSH lies in transforming the semantic labels to obtain the binary codes, resulting in an effective and efficient two-step hashing framework. Specifically, we developed the locality-sensitive two-step hashing (LS-TSH) that generates the binary codes through LSH rather than any complex optimization technique. Theoretically, with proper assumption, LS-TSH is actually a useful LSH scheme, so that it preserves the label-based semantic similarity and possesses sublinear query complexity for hash lookup. Experimentally, LS-TSH could obtain comparable retrieval accuracy with state of the arts with two to three orders of magnitudes faster training speed.
基于哈希的语义相似性搜索在构建大规模基于内容的检索系统方面变得越来越重要。最先进的监督哈希技术使用灵活的两步策略来学习哈希函数。第一步通过解决具有数百万个变量的二进制优化问题为训练数据学习二进制代码,因此通常需要大量的计算。尽管局部敏感哈希 (LSH) 简单高效,但由于其在传统近似邻居搜索中的性能较差,从未被认为是生成此类代码的好方法。我们在本文中声称,LSH 的真正优点在于将语义标签转换以获得二进制代码,从而形成一个有效且高效的两步哈希框架。具体来说,我们开发了局部敏感两步哈希 (LS-TSH),通过 LSH 而不是任何复杂的优化技术生成二进制代码。从理论上讲,在适当的假设下,LS-TSH 实际上是一种有用的 LSH 方案,因此它保留了基于标签的语义相似性,并具有亚线性的哈希查询复杂度。实验上,LS-TSH 可以以快两到三个数量级的速度获得与最先进技术相当的检索精度。