Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph.

Author information

Zeng Yawen, Wang Yiru, Liao Dongliang, Li Gongfu, Huang Weijie, Xu Jin, Cao Da, Man Hong

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10528-10537. doi: 10.1109/TNNLS.2022.3168431. Epub 2023 Nov 30.

DOI:10.1109/TNNLS.2022.3168431

Abstract

The task of cross-modal image retrieval has recently attracted considerable research attention. In real-world scenarios, keyword-based queries issued by users are usually short and have broad semantics. Semantic diversity is therefore as important as retrieval accuracy in such user-oriented services, because diverse results improve user experience. However, most typical cross-modal image retrieval methods rely on a single-point query embedding, which inevitably results in low semantic diversity, while existing diverse retrieval approaches frequently suffer from low accuracy due to a lack of cross-modal understanding. To address this challenge, we introduce an end-to-end solution termed variational multiple instance graph (VMIG), in which a continuous semantic space is learned to capture diverse query semantics, and the retrieval task is formulated as a multiple instance learning problem to connect diverse features across modalities. Specifically, a query-guided variational autoencoder is employed to model the continuous semantic space instead of learning a single-point embedding. Afterward, multiple instances of the image and query are obtained by sampling in the continuous semantic space and applying multihead attention, respectively. Thereafter, an instance graph is constructed to remove noisy instances and align cross-modal semantics. Finally, heterogeneous modalities are robustly fused under multiple losses. Extensive experiments on two real-world datasets verify the effectiveness of the proposed solution in both retrieval accuracy and semantic diversity.
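
As a rough illustration of two ideas named in the abstract, the sketch below draws several instances from a Gaussian in a continuous space (the reparameterization step commonly used with variational autoencoders) and scores two bags of instances by their best-matching pair, a generic multiple instance learning aggregation. This is not the VMIG implementation: the function names, the cosine similarity, the max-pooling rule, and all vectors are assumptions made for illustration, and the abstract does not specify any of them.

```python
import math
import random


def sample_instances(mu, log_var, k, rng):
    """Draw k instances z = mu + sigma * eps with eps ~ N(0, 1).

    Hypothetical helper: mu and log_var stand in for the output of an
    encoder, which is not modeled here.
    """
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(k)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def bag_score(bag_a, bag_b):
    """Score two bags by their most similar instance pair (assumed rule)."""
    return max(cosine(a, b) for a in bag_a for b in bag_b)


rng = random.Random(0)
# Toy 3-dimensional semantic space; all values are made up.
sampled = sample_instances([1.0, 0.0, 0.0], [-2.0, -2.0, -2.0], k=4, rng=rng)
candidate_bag = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(bag_score(sampled, candidate_bag))
```

Because the score is a maximum over instance pairs, a candidate needs to match only one sampled instance to rank well, which is one simple way a set of instances can cover several meanings of a short query.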


Similar articles

Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph.

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10528-10537. doi: 10.1109/TNNLS.2022.3168431. Epub 2023 Nov 30.

Geometric Matching for Cross-Modal Retrieval.

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5509-5521. doi: 10.1109/TNNLS.2024.3381347. Epub 2025 Feb 28.

Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval.

IEEE Trans Cybern. 2020 Jun;50(6):2400-2413. doi: 10.1109/TCYB.2019.2928180. Epub 2019 Jul 24.

Semantics Disentangling for Cross-Modal Retrieval.

IEEE Trans Image Process. 2024;33:2226-2237. doi: 10.1109/TIP.2024.3374111. Epub 2024 Mar 25.

Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.

IEEE Trans Image Process. 2020 Sep 10;PP. doi: 10.1109/TIP.2020.3020383.

Cross-modal semantic autoencoder with embedding consensus.

Sci Rep. 2021 Oct 13;11(1):20319. doi: 10.1038/s41598-021-92750-7.

Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval.

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2194-2207. doi: 10.1109/TNNLS.2022.3188569. Epub 2024 Feb 5.

Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval.

Sensors (Basel). 2022 Apr 11;22(8):2921. doi: 10.3390/s22082921.

Relevance feedback for enhancing content based image retrieval and automatic prediction of semantic image features: Application to bone tumor radiographs.

J Biomed Inform. 2018 Aug;84:123-135. doi: 10.1016/j.jbi.2018.07.002. Epub 2018 Jul 5.

Video Moment Retrieval With Noisy Labels.

IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6779-6791. doi: 10.1109/TNNLS.2022.3212900. Epub 2024 May 2.

PMID:35482693
