IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.
Cross-modal retrieval aims to identify relevant data across different modalities. In this work, we focus on cross-modal retrieval between images and text sentences, which we formulate as similarity measurement for each image-text pair. To this end, we propose a Cross-modal Relation Guided Network (CRGN) that embeds images and text into a latent feature space. The CRGN model uses a GRU to extract text features and a ResNet to learn globally guided image features. Based on global feature guiding and sentence generation learning, the relations between image regions can be modeled. The final image embedding is generated by a relation embedding module with an attention mechanism. With the image and text embeddings, we conduct cross-modal retrieval based on cosine similarity. The learned embedding space captures the inherent relevance between image and text well. We evaluate our approach with extensive experiments on two public benchmark datasets, i.e., MS-COCO and Flickr30K. Experimental results demonstrate that our approach achieves better or comparable performance to state-of-the-art methods with notable efficiency.
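The retrieval pipeline described above (a GRU text encoder, ResNet-based image features, attention over image regions guided by a global feature, and cosine-similarity matching) can be illustrated with a minimal PyTorch sketch. The module names, dimensions, and the single-step guided attention below are illustrative assumptions rather than the authors' exact CRGN architecture; precomputed ResNet region features are assumed as input.

```python
# Minimal sketch of a CRGN-style image-text retrieval pipeline (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Encodes a tokenized sentence into a joint-space embedding with a GRU."""
    def __init__(self, vocab_size, word_dim=300, embed_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.gru = nn.GRU(word_dim, embed_dim, batch_first=True)

    def forward(self, tokens):                      # tokens: (B, L)
        x = self.embed(tokens)                      # (B, L, word_dim)
        _, h = self.gru(x)                          # h: (1, B, embed_dim)
        return F.normalize(h.squeeze(0), dim=-1)    # unit-norm text embedding

class ImageEncoder(nn.Module):
    """Aggregates precomputed ResNet region features into one image embedding,
    using the mean (global) feature to attend over regions; this single guided
    attention step stands in for the paper's relation embedding module."""
    def __init__(self, feat_dim=2048, embed_dim=1024):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)  # project region features to the joint space
        self.attn = nn.Linear(2 * embed_dim, 1)     # score each region against the global feature

    def forward(self, regions):                     # regions: (B, R, feat_dim)
        r = self.proj(regions)                      # (B, R, embed_dim)
        g = r.mean(dim=1, keepdim=True)             # global guiding feature: (B, 1, embed_dim)
        scores = self.attn(torch.cat([r, g.expand_as(r)], dim=-1))  # (B, R, 1)
        weights = torch.softmax(scores, dim=1)      # attention over regions
        v = (weights * r).sum(dim=1)                # attention-weighted image embedding
        return F.normalize(v, dim=-1)

def retrieve(image_emb, text_emb):
    """Cross-modal retrieval: rank candidates by cosine similarity.
    Both inputs are unit-normalized, so a matrix product gives cosine scores."""
    return image_emb @ text_emb.t()                 # (num_images, num_texts)

if __name__ == "__main__":
    img_enc, txt_enc = ImageEncoder(), TextEncoder(vocab_size=10000)
    regions = torch.randn(4, 36, 2048)              # e.g. 36 ResNet region features per image
    tokens = torch.randint(0, 10000, (5, 12))       # 5 sentences of length 12
    sims = retrieve(img_enc(regions), txt_enc(tokens))
    print(sims.shape)                               # torch.Size([4, 5])
```

In this sketch, image-to-text retrieval ranks sentences by row of the similarity matrix and text-to-image retrieval ranks images by column; normalizing both embeddings makes the matrix product equal to cosine similarity.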