Deep Relation Embedding for Cross-Modal Retrieval

Publication Information

IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.

Abstract

Cross-modal retrieval aims to identify relevant data across different modalities. In this work, we focus on cross-modal retrieval between images and text sentences, which we formulate as similarity measurement for each image-text pair. To this end, we propose a Cross-modal Relation Guided Network (CRGN) that embeds images and text into a latent feature space. The CRGN model uses a GRU to extract text features and a ResNet model to learn globally guided image features. Based on global feature guiding and sentence generation learning, the relations between image regions can be modeled. The final image embedding is generated by a relation embedding module with an attention mechanism. With the image embeddings and text embeddings, we conduct cross-modal retrieval based on cosine similarity. The learned embedding space captures the inherent relevance between images and text well. We evaluate our approach with extensive experiments on two public benchmark datasets, MS-COCO and Flickr30K. Experimental results demonstrate that our approach achieves performance better than or comparable to state-of-the-art methods, with notable efficiency.
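
Only the abstract is reproduced here, so the following is a minimal PyTorch sketch of the kind of pipeline it describes: a GRU text encoder, a ResNet image encoder whose global feature guides attention over region features, and cosine-similarity retrieval in a shared embedding space. All module names, dimensions, the resnet18 backbone, and the specific attention form are illustrative assumptions, not the authors' CRGN implementation, and the sentence-generation objective is omitted.

```python
# Minimal sketch of a GRU/ResNet cross-modal embedding pipeline with
# global-feature-guided attention and cosine-similarity retrieval.
# Hypothetical names and sizes; not the authors' CRGN code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class TextEncoder(nn.Module):
    """Embed a token-id sequence into the joint space with a GRU."""
    def __init__(self, vocab_size=10000, embed_dim=300, joint_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, joint_dim, batch_first=True)

    def forward(self, tokens):                      # tokens: (B, L)
        _, h = self.gru(self.embed(tokens))         # h: (1, B, joint_dim)
        return F.normalize(h.squeeze(0), dim=-1)    # unit-norm text embedding


class ImageEncoder(nn.Module):
    """ResNet backbone; its global feature guides attention over regions."""
    def __init__(self, joint_dim=1024):
        super().__init__()
        backbone = resnet18()                       # assumption: any CNN backbone
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # keep spatial map
        self.proj = nn.Linear(512, joint_dim)       # 512 = resnet18 channel dim

    def forward(self, images):                      # images: (B, 3, H, W)
        fmap = self.cnn(images)                     # (B, 512, h, w)
        regions = self.proj(fmap.flatten(2).transpose(1, 2))  # (B, h*w, joint_dim)
        global_feat = regions.mean(dim=1)           # global guiding feature
        # Weight each region by its agreement with the global feature.
        attn = torch.softmax(
            (regions * global_feat.unsqueeze(1)).sum(-1), dim=1)   # (B, h*w)
        embedded = (attn.unsqueeze(-1) * regions).sum(dim=1)       # (B, joint_dim)
        return F.normalize(embedded, dim=-1)        # unit-norm image embedding


def retrieve(image_emb, text_emb):
    """Cosine similarities between unit-norm embeddings: one row per image."""
    return image_emb @ text_emb.t()                 # (num_images, num_texts)


if __name__ == "__main__":
    imgs = torch.randn(2, 3, 224, 224)
    toks = torch.randint(0, 10000, (4, 12))
    sims = retrieve(ImageEncoder()(imgs), TextEncoder()(toks))
    print(sims.shape)                               # torch.Size([2, 4])
```

In such a setup, ranking candidates reduces to a single matrix product because both embeddings are L2-normalized, which is where the efficiency claim of embedding-based (rather than pairwise cross-attention) retrieval typically comes from.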
