Data-dependent hashing based on p-stable distribution.

Publication information

IEEE Trans Image Process. 2014 Dec;23(12):5033-46. doi: 10.1109/TIP.2014.2352458. Epub 2014 Aug 27.

Abstract

The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on the p-stable distribution. We begin by formulating the Euclidean distance-preserving property in terms of variance estimation. Based on this property, we develop a projection method that maps the original data to vectors of arbitrary dimension. Each projection vector is a linear combination of multiple random vectors drawn from a p-stable distribution, with the combination weights learned from the training data. An orthogonal matrix is then learned from the data to minimize the thresholding error in quantization. Combining the projection method and the orthogonal matrix, we develop an unsupervised hashing scheme that preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into account and gives more accurate hashing results with compact hash codes. Unlike many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted in the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This yields a supervised hashing scheme that preserves the semantic similarity of the data. Experimental results show that our methods outperform several state-of-the-art hashing approaches in both effectiveness and efficiency.
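The link between p-stable projections and variance estimation can be illustrated with a small sketch. It is not the paper's learned projection method: it only shows the well-known data-independent fact that, for a random vector r with i.i.d. standard Gaussian entries (the Gaussian is 2-stable), the projection r·(x − y) has variance ‖x − y‖², so the mean squared projection estimates the squared Euclidean distance. The function names, sample vectors, and number of projections below are assumptions chosen for illustration.

```python
import random


def gaussian_vector(dim, rng):
    """Draw a vector of i.i.d. N(0, 1) entries (the Gaussian is 2-stable)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def estimate_sq_distance(x, y, num_projections, rng):
    """Estimate ||x - y||^2 as the mean squared projection of (x - y).

    Each projection r . (x - y) is N(0, ||x - y||^2), so its second
    moment equals the squared Euclidean distance.
    """
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for _ in range(num_projections):
        r = gaussian_vector(len(diff), rng)
        total += dot(r, diff) ** 2
    return total / num_projections


# Illustrative data (hypothetical, not from the paper).
rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 0.0, 3.0, 1.0]

true_sq = sum((a - b) ** 2 for a, b in zip(x, y))
est_sq = estimate_sq_distance(x, y, 20000, rng)

print("true squared distance:     ", true_sq)
print("estimated squared distance:", est_sq)
```

With more projections the estimate tightens around the true value. The data-dependent method described in the abstract goes further by learning the weights that combine such random vectors, which this sketch does not attempt.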
