IEEE Trans Cybern. 2014 Aug;44(8):1362-71. doi: 10.1109/TCYB.2013.2283497. Epub 2013 Oct 23.
Nearest neighbor search is a fundamental problem in various research fields like machine learning, data mining and pattern recognition. Recently, hashing-based approaches, for example, locality sensitive hashing (LSH), are proved to be effective for scalable high dimensional nearest neighbor search. Many hashing algorithms found their theoretic root in random projection. Since these algorithms generate the hash tables (projections) randomly, a large number of hash tables (i.e., long codewords) are required in order to achieve both high precision and recall. To address this limitation, we propose a novel hashing algorithm called density sensitive hashing (DSH) in this paper. DSH can be regarded as an extension of LSH. By exploring the geometric structure of the data, DSH avoids the purely random projections selection and uses those projective functions which best agree with the distribution of the data. Extensive experimental results on real-world data sets have shown that the proposed method achieves better performance compared to the state-of-the-art hashing approaches.
最近邻搜索是机器学习、数据挖掘和模式识别等各个研究领域的一个基本问题。最近,基于哈希的方法,例如局部敏感哈希(LSH),已被证明对于可扩展的高维最近邻搜索是有效的。许多哈希算法在随机投影中找到了它们的理论根源。由于这些算法随机生成哈希表(投影),因此需要大量的哈希表(即长码字)才能同时实现高精度和召回率。为了解决这个限制,我们在本文中提出了一种称为密度敏感哈希(DSH)的新哈希算法。DSH 可以被视为 LSH 的扩展。通过探索数据的几何结构,DSH 避免了纯粹的随机投影选择,并使用了与数据分布最匹配的投影函数。在真实数据集上的大量实验结果表明,与最先进的哈希方法相比,所提出的方法具有更好的性能。