Jin Lu, Li Zechao, Tang Jinhui
IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):1838-1851. doi: 10.1109/TNNLS.2020.2997020. Epub 2023 Apr 4.
Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval. The proposed deep hashing framework leverages a 2-D convolutional neural network (CNN) as the backbone to capture the spatial information for image-text retrieval, and a 3-D CNN as the backbone to capture the spatial and temporal information for video-text retrieval. In the DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both intermodality similarities and intramodality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for the classification task, the two stream networks are jointly trained to learn the hash functions by embedding the semantic labels into the resultant hash codes. Moreover, a unified deep multimodal hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, intermodality similarity-preserving learning, semantic label-preserving learning, and hash function learning with different types of loss functions. The proposed DSMHN is a generic and scalable deep hashing framework for both image-text and video-text retrieval, and it can be flexibly integrated with different types of loss functions. We conduct extensive experiments on both single-modal and cross-modal retrieval tasks on four widely used multimodal retrieval datasets. Experimental results on both image-text and video-text retrieval tasks demonstrate that DSMHN significantly outperforms state-of-the-art methods.
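To make the joint objective described in the abstract concrete, the sketch below is a minimal, hypothetical PyTorch-style illustration rather than the authors' implementation. It assumes an already-extracted feature vector per modality (e.g., from a 2-D/3-D CNN or a text encoder), a tanh relaxation of the binary codes, a pairwise inter-modality similarity-preserving loss, and a classification head on the codes for semantic label preservation; all names, layer sizes, and loss weights are illustrative assumptions.

```python
# Conceptual sketch only (not the released DSMHN code): a two-stream hashing
# setup with (1) an inter-modality similarity-preserving loss on the relaxed
# hash codes and (2) a semantic label-preserving classification loss on the codes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Maps a modality-specific feature vector to a K-bit (relaxed) hash code."""
    def __init__(self, feat_dim, code_len, num_classes):
        super().__init__()
        self.hash_layer = nn.Linear(feat_dim, code_len)
        # Classifier on top of the code keeps the code discriminative for the
        # semantic labels (label-preserving learning).
        self.classifier = nn.Linear(code_len, num_classes)

    def forward(self, feat):
        code = torch.tanh(self.hash_layer(feat))  # continuous relaxation of sign(.)
        logits = self.classifier(code)
        return code, logits

def joint_hashing_loss(code_v, code_t, logits_v, logits_t, labels, sim,
                       alpha=1.0, beta=1.0):
    """Combine inter-modality similarity preservation with label preservation.

    sim[i, j] = 1 if visual sample i and text sample j share a label, else 0.
    alpha and beta are assumed trade-off weights.
    """
    # Inter-modality similarity-preserving term: negative log-likelihood of the
    # pairwise similarity under an inner-product model of the two code sets.
    inner = code_v @ code_t.t() / 2.0
    sim_loss = (F.softplus(inner) - sim * inner).mean()
    # Semantic label-preserving term on both streams (multi-label setting).
    cls_loss = F.binary_cross_entropy_with_logits(logits_v, labels) \
             + F.binary_cross_entropy_with_logits(logits_t, labels)
    return alpha * sim_loss + beta * cls_loss

# At retrieval time, the relaxed codes would be binarized, e.g. b = torch.sign(code).
```

In this reading, the visual and text heads share the same loss and are optimized jointly, so the binary codes of semantically related image/video and text pairs are pushed close in Hamming space while remaining predictive of their class labels.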