Suppr超能文献

用于联合哈希表学习的分布式互补二进制量化

Distributed Complementary Binary Quantization for Joint Hash Table Learning.

作者信息

Liu Xianglong, Fu Qiang, Wang Deqing, Bai Xiao, Wu Xinyu, Tao Dacheng

出版信息

IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5312-5323. doi: 10.1109/TNNLS.2020.2965992. Epub 2020 Nov 30.

Abstract

Building multiple hash tables serves as a very successful technique for gigantic data indexing, which can simultaneously guarantee both the search accuracy and efficiency. However, most of existing multitable indexing solutions, without informative hash codes and strong table complementarity, largely suffer from the table redundancy. To address the problem, we propose a complementary binary quantization (CBQ) method for jointly learning multiple tables and the corresponding informative hash functions in a centralized way. Based on CBQ, we further design a distributed learning algorithm (D-CBQ) to accelerate the training over the large-scale distributed data set. The proposed (D-)CBQ exploits the power of prototype-based incomplete binary coding to well align the data distributions in the original space and the Hamming space and further utilizes the nature of multi-index search to jointly reduce the quantization loss. (D-)CBQ possesses several attractive properties, including the extensibility for generating long hash codes in the product space and the scalability with linear training time. Extensive experiments on two popular large-scale tasks, including the Euclidean and semantic nearest neighbor search, demonstrate that the proposed (D-)CBQ enjoys efficient computation, informative binary quantization, and strong table complementarity, which together help significantly outperform the state of the arts, with up to 57.76% performance gains relatively.

摘要

构建多个哈希表是一种非常成功的用于海量数据索引的技术,它可以同时保证搜索的准确性和效率。然而,现有的大多多表索引解决方案由于缺乏信息丰富的哈希码和强大的表互补性,在很大程度上存在表冗余问题。为了解决这个问题,我们提出了一种互补二进制量化(CBQ)方法,用于以集中方式联合学习多个表和相应的信息丰富的哈希函数。基于CBQ,我们进一步设计了一种分布式学习算法(D-CBQ),以加速在大规模分布式数据集上的训练。所提出的(D-)CBQ利用基于原型的不完全二进制编码的能力,使原始空间和汉明空间中的数据分布良好对齐,并进一步利用多索引搜索的特性来联合减少量化损失。(D-)CBQ具有几个吸引人的特性,包括在乘积空间中生成长哈希码的可扩展性和线性训练时间的可扩展性。在包括欧几里得和语义最近邻搜索在内的两个流行的大规模任务上进行的大量实验表明,所提出的(D-)CBQ具有高效的计算、信息丰富的二进制量化和强大的表互补性,这些共同有助于显著超越现有技术水平,相对性能提升高达57.76%。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验