Semi-supervised hashing for large-scale search.

Affiliation

Business Analytics and Mathematical Sciences Department, IBM T.J. Watson Research Center, RM 31-229, 1101 Kitchawan Rd, Rte. 134, Yorktown Heights, NY 10598, USA.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2393-406. doi: 10.1109/TPAMI.2012.48.

DOI:10.1109/TPAMI.2012.48

PMID:22331853

Abstract

Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
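The abstract does not give the objective in closed form, so the following is only a minimal sketch of what the orthogonal variant could look like under stated assumptions: linear projections, a pairwise label matrix `S` (+1 for similar pairs, -1 for dissimilar pairs, 0 for unknown), the variance of the projected data standing in for the information-theoretic regularizer, and an eigendecomposition to obtain orthogonal directions. The function names `fit_ssh_orthogonal` and `hash_codes` and the weight `eta` are hypothetical and chosen for illustration.

```python
import numpy as np


def fit_ssh_orthogonal(X, labeled_idx, S, n_bits, eta=1.0):
    """Learn orthogonal projections from labeled pairs plus all data.

    X           : (n, d) array holding labeled and unlabeled samples.
    labeled_idx : indices of the rows of X that carry pairwise labels.
    S           : (l, l) symmetric matrix of pairwise labels (+1, -1 or 0).
    n_bits      : number of hash bits (projection directions) to learn.
    eta         : assumed weight of the variance regularizer.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                      # center so that sign() splits the data
    Xl = Xc[labeled_idx]

    # Supervised term (fit to pairwise labels) plus an unsupervised
    # variance term computed over labeled and unlabeled samples.
    M = Xl.T @ S @ Xl + eta * (Xc.T @ Xc)
    M = (M + M.T) / 2.0                # guard against round-off asymmetry

    # The top eigenvectors of a symmetric matrix are mutually orthogonal.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_bits]
    W = eigvecs[:, order]
    return mean, W


def hash_codes(X, mean, W):
    """Map samples to binary codes by thresholding each projection at zero."""
    return ((X - mean) @ W > 0).astype(np.uint8)
```

In this sketch a query would be encoded with `hash_codes` and compared to database codes by Hamming distance. The sequential variant described in the abstract would instead learn one projection at a time and re-weight the label matrix toward pairs that earlier bits got wrong; that step is not shown here.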

Similar articles

Semi-supervised hashing for large-scale search.

IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2393-406. doi: 10.1109/TPAMI.2012.48.

Neighborhood Discriminant Hashing for Large-Scale Image Retrieval.

IEEE Trans Image Process. 2015 Sep;24(9):2827-40. doi: 10.1109/TIP.2015.2421443.

IEEE Trans Image Process. 2014 Jul;23(7):3025-39. doi: 10.1109/TIP.2014.2326010.

Unsupervised Semantic-Preserving Adversarial Hashing for Image Search.

IEEE Trans Image Process. 2019 Aug;28(8):4032-4044. doi: 10.1109/TIP.2019.2903661. Epub 2019 Mar 13.

A General Framework for Linear Distance Preserving Hashing.

IEEE Trans Image Process. 2018 Feb;27(2):907-922. doi: 10.1109/TIP.2017.2751150. Epub 2017 Sep 11.

Robust hashing with local models for approximate similarity search.

IEEE Trans Cybern. 2014 Jul;44(7):1225-36. doi: 10.1109/TCYB.2013.2289351.

Compact Structure Hashing via Sparse and Similarity Preserving Embedding.

IEEE Trans Cybern. 2016 Mar;46(3):718-29. doi: 10.1109/TCYB.2015.2414299. Epub 2015 Apr 20.

Hierarchical Recurrent Neural Hashing for Image Retrieval With Hierarchical Convolutional Features.

IEEE Trans Image Process. 2018;27(1):106-120. doi: 10.1109/TIP.2017.2755766.

Efficient Semi-Supervised Multimodal Hashing With Importance Differentiation Regression.

IEEE Trans Image Process. 2022;31:5881-5892. doi: 10.1109/TIP.2022.3203216. Epub 2022 Sep 13.

Label Consistent Matrix Factorization Hashing for Large-Scale Cross-Modal Similarity Search.

IEEE Trans Pattern Anal Mach Intell. 2019 Oct;41(10):2466-2479. doi: 10.1109/TPAMI.2018.2861000. Epub 2018 Jul 30.

Cited by

Enhanced Image Retrieval Using Multiscale Deep Feature Fusion in Supervised Hashing.

J Imaging. 2025 Jan 12;11(1):20. doi: 10.3390/jimaging11010020.

Dual Attention Triplet Hashing Network for Image Retrieval.

Front Neurorobot. 2021 Oct 18;15:728161. doi: 10.3389/fnbot.2021.728161. eCollection 2021.

Deep Convolutional Hashing for Low-Dimensional Binary Embedding of Histopathological Images.

IEEE J Biomed Health Inform. 2019 Mar;23(2):805-816. doi: 10.1109/JBHI.2018.2827703. Epub 2018 Apr 16.

Categorization of Images Using Autoencoder Hashing and Training of Intra Bin Classifiers for Image Classification and Annotation.

J Med Syst. 2018 Jun 11;42(7):132. doi: 10.1007/s10916-018-0986-6.

Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis.

JMIR Med Inform. 2018 Apr 13;6(2):e20. doi: 10.2196/medinform.7744.

Rapid Retrieval of Lung Nodule CT Images Based on Hashing and Pruning Methods.

Biomed Res Int. 2016;2016:3162649. doi: 10.1155/2016/3162649. Epub 2016 Nov 22.

High-throughput histopathological image analysis via robust cell segmentation and hashing.

Med Image Anal. 2015 Dec;26(1):306-15. doi: 10.1016/j.media.2015.10.005. Epub 2015 Nov 9.
