Zhang Feifei, Xu Mingliang, Xu Changsheng
IEEE Trans Image Process. 2022;31:1000-1011. doi: 10.1109/TIP.2021.3138302. Epub 2022 Jan 10.
Composed Query Based Image Retrieval (CQBIR) aims at retrieving images relevant to a composed query containing a reference image and a requested modification expressed via a textual sentence. Compared with conventional image retrieval, which takes one modality as the query to retrieve relevant data of another modality, CQBIR poses a great challenge: bridging the semantic gap between the reference image and the modification text in the composed query. To address this challenge, previous methods either resort to feature composition, which cannot model interactions within the query, or explore inter-modal attention while ignoring spatial structure and the visual-semantic relationship. In this paper, we propose a geometry-sensitive cross-modal reasoning network for CQBIR that jointly models the geometric information of the image and the visual-semantic relationship between the reference image and the modification text in the query. Specifically, it contains two key components: a geometry-sensitive inter-modal attention module (GS-IMA) and a text-guided visual reasoning module (TG-VR). The GS-IMA introduces spatial structure into inter-modal attention in both implicit and explicit manners. The TG-VR models the semantics expressed in the text but not included in the reference image, and uses them to guide further visual reasoning. As a result, our method can learn effective features for composed queries whose image and text components are not literally aligned. Comprehensive experimental results on three standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods.
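To make the GS-IMA idea concrete, below is a minimal PyTorch sketch of geometry-sensitive inter-modal attention in which modification-text tokens attend to reference-image regions. Spatial structure enters implicitly, by embedding region bounding boxes into the region features, and explicitly, by adding a learned per-head bias derived from box geometry to the attention logits. All module names, tensor shapes, and the exact form of the geometric bias are assumptions for illustration; the paper's actual GS-IMA formulation is not given in the abstract and may differ.

```python
# Hypothetical sketch of geometry-sensitive inter-modal attention (not the
# authors' implementation). Assumes region features with normalized boxes.
import torch
import torch.nn as nn


class GeometrySensitiveAttention(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)   # queries from text tokens
        self.k_proj = nn.Linear(dim, dim)   # keys from image regions
        self.v_proj = nn.Linear(dim, dim)   # values from image regions
        # Implicit geometry: fold box coordinates into the region features.
        self.box_embed = nn.Linear(4, dim)
        # Explicit geometry: map box geometry to a per-head attention bias.
        self.geo_bias = nn.Linear(4, num_heads)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, text_feats, region_feats, boxes):
        # text_feats:   (B, T, dim)  modification-text token features
        # region_feats: (B, R, dim)  reference-image region features
        # boxes:        (B, R, 4)    normalized (cx, cy, w, h) per region
        B, T, _ = text_feats.shape
        R = region_feats.size(1)

        # Implicitly inject spatial structure into keys and values.
        region_feats = region_feats + self.box_embed(boxes)

        q = self.q_proj(text_feats).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(region_feats).view(B, R, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(region_feats).view(B, R, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention logits: (B, H, T, R).
        logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5

        # Explicit geometric bias, broadcast over all text queries:
        # a simple stand-in for the paper's explicit spatial term.
        bias = self.geo_bias(boxes)                           # (B, R, H)
        logits = logits + bias.permute(0, 2, 1).unsqueeze(2)  # (B, H, 1, R)

        attn = logits.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out)
```

In this sketch the bias depends only on each region's own box; a richer variant could condition on pairwise region offsets, which is one common way to make attention spatially aware.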