

Distributed Adaptive Binary Quantization for Fast Nearest Neighbor Search.

Publication Information

IEEE Trans Image Process. 2017 Nov;26(11):5324-5336. doi: 10.1109/TIP.2017.2729896. Epub 2017 Jul 24.

Abstract

Hashing has proved to be an attractive technique for fast nearest neighbor search over big data. Compared with projection-based hashing methods, prototype-based ones have stronger power to generate discriminative binary codes for data with complex intrinsic structure. However, existing prototype-based methods, such as spherical hashing and K-means hashing, still suffer from ineffective coding that uses the complete set of binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization (ABQ) method that learns a discriminative hash function with prototypes associated with small, unique binary codes. Our alternating optimization adaptively and efficiently discovers the prototype set and a code set of varying size, which together robustly approximate the data relations. The method generalizes naturally to the product space for long hash codes, and its training time is linear in the number of training samples. We further devise a distributed framework for large-scale learning, which significantly speeds up the training of ABQ in the distributed environments now widely deployed. Extensive experiments on four large-scale data sets (up to 80 million samples) demonstrate that our method significantly outperforms state-of-the-art hashing methods, with relative performance gains of up to 58.84%.
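To make the prototype-based quantization idea in the abstract concrete, here is a minimal illustrative sketch, not the authors' released code or their exact ABQ algorithm. It shows the general scheme the abstract describes: each data point inherits the short binary code of its nearest prototype, and long codes are formed by concatenating codes learned independently on subspaces (the product-space generalization). The class name `ABQSketch`, its parameters, and the fixed index-based code assignment are assumptions for illustration; the paper's method instead learns the code set adaptively and with varying size.

```python
# Illustrative sketch only -- assumptions, not the paper's ABQ implementation.
import numpy as np

class ABQSketch:
    def __init__(self, n_prototypes=16, code_bits=4, n_subspaces=8, n_iters=20, seed=0):
        self.n_prototypes = n_prototypes  # prototypes per subspace
        self.code_bits = code_bits        # bits of the small code attached to each prototype
        self.n_subspaces = n_subspaces    # product-space split used to build long codes
        self.n_iters = n_iters
        self.rng = np.random.default_rng(seed)

    def _fit_subspace(self, X):
        # k-means-style alternating updates: assign points to the nearest prototype,
        # then move each prototype to the mean of its assigned points.
        protos = X[self.rng.choice(len(X), self.n_prototypes, replace=False)]
        for _ in range(self.n_iters):
            d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for k in range(self.n_prototypes):
                pts = X[assign == k]
                if len(pts):
                    protos[k] = pts.mean(0)
        # give each prototype a small unique binary code (here simply its index in binary)
        codes = np.array([[int(b) for b in format(k, f"0{self.code_bits}b")]
                          for k in range(self.n_prototypes)], dtype=np.uint8)
        return protos, codes

    def fit(self, X):
        self.subdim = X.shape[1] // self.n_subspaces
        self.models = []
        for m in range(self.n_subspaces):
            Xm = X[:, m * self.subdim:(m + 1) * self.subdim]
            self.models.append(self._fit_subspace(Xm))
        return self

    def encode(self, X):
        # concatenate the per-subspace prototype codes into one long hash code
        parts = []
        for m, (protos, codes) in enumerate(self.models):
            Xm = X[:, m * self.subdim:(m + 1) * self.subdim]
            d = ((Xm[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
            parts.append(codes[d.argmin(1)])
        return np.concatenate(parts, axis=1)

# usage: 8 subspaces x 4 bits gives 32-bit codes
X = np.random.randn(1000, 64).astype(np.float32)
B = ABQSketch().fit(X).encode(X)  # shape (1000, 32), entries in {0, 1}
```

Because encoding reduces to a nearest-prototype lookup per subspace, both training and encoding scale linearly with the number of data points, which is consistent with the training cost the abstract claims; the distributed framework described in the paper is not sketched here.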

