

Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph.

Authors

Zeng Yawen, Wang Yiru, Liao Dongliang, Li Gongfu, Huang Weijie, Xu Jin, Cao Da, Man Hong

Publication

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10528-10537. doi: 10.1109/TNNLS.2022.3168431. Epub 2023 Nov 30.

DOI: 10.1109/TNNLS.2022.3168431
PMID: 35482693
Abstract

The task of cross-modal image retrieval has recently attracted considerable research attention. In real-world scenarios, keyword-based queries issued by users are usually short and have broad semantics. Therefore, semantic diversity is as important as retrieval accuracy in such user-oriented services, as it improves user experience. However, most typical cross-modal image retrieval methods based on single-point query embedding inevitably result in low semantic diversity, while existing diverse retrieval approaches frequently lead to low accuracy due to a lack of cross-modal understanding. To address this challenge, we introduce an end-to-end solution termed variational multiple instance graph (VMIG), in which a continuous semantic space is learned to capture diverse query semantics, and the retrieval task is formulated as a multiple instance learning problem to connect diverse features across modalities. Specifically, a query-guided variational autoencoder is employed to model the continuous semantic space instead of learning a single-point embedding. Afterward, multiple instances of the image and query are obtained by sampling in the continuous semantic space and applying multihead attention, respectively. Thereafter, an instance graph is constructed to remove noisy instances and align cross-modal semantics. Finally, heterogeneous modalities are robustly fused under multiple losses. Extensive experiments on two real-world datasets verify the effectiveness of our proposed solution in both retrieval accuracy and semantic diversity.
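The first two stages the abstract describes (a query-guided variational encoder over a continuous semantic space, then sampling multiple query instances and attending over image features) can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the linear maps `W_mu` and `W_logvar`, the single attention head, and all dimensions are hypothetical simplifications chosen only to show the reparameterized sampling and attention steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_query(q_emb, W_mu, W_logvar):
    """Query-guided variational encoder: map a query embedding to the
    parameters of a Gaussian over the continuous semantic space.
    (Hypothetical linear maps stand in for the paper's encoder.)"""
    return q_emb @ W_mu, q_emb @ W_logvar

def sample_instances(mu, logvar, k):
    """Reparameterization trick: draw k query instances from N(mu, sigma^2),
    so one short keyword query yields k diverse semantic views."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal((k, mu.shape[0]))
    return mu + eps * std

def attention_pool(query_instances, image_feats):
    """Scaled dot-product attention of each query instance over candidate
    image features, producing one attended image instance per query view."""
    d = image_feats.shape[1]
    scores = query_instances @ image_feats.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ image_feats

d_query, d_sem, k = 8, 4, 5
q = rng.standard_normal(d_query)                  # toy query embedding
W_mu = rng.standard_normal((d_query, d_sem))
W_logvar = 0.01 * rng.standard_normal((d_query, d_sem))

mu, logvar = encode_query(q, W_mu, W_logvar)
query_instances = sample_instances(mu, logvar, k)      # k diverse query views
image_feats = rng.standard_normal((10, d_sem))         # 10 candidate image features
image_instances = attention_pool(query_instances, image_feats)
print(query_instances.shape, image_instances.shape)
```

The key point the sketch makes concrete is that the query is represented by a distribution rather than a single point, so repeated sampling yields distinct instances that cover broader semantics than one embedding could.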

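The instance-graph step ("an instance graph is constructed to remove noisy instances") can likewise be sketched under stated assumptions: here a cosine-similarity threshold defines edges and a minimum-degree rule marks isolated instances as noise. Both criteria are hypothetical stand-ins, not the paper's actual graph construction.

```python
import numpy as np

def build_instance_graph(instances, sim_thresh=0.8):
    """Connect instances whose cosine similarity exceeds a threshold
    (a hypothetical criterion standing in for the paper's construction)."""
    norms = np.linalg.norm(instances, axis=1, keepdims=True)
    unit = instances / np.clip(norms, 1e-8, None)
    adj = (unit @ unit.T > sim_thresh).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

def drop_noisy(instances, adj, min_degree=1):
    """Treat instances with too few graph neighbours as noise and remove them."""
    keep = adj.sum(axis=1) >= min_degree
    return instances[keep], keep

instances = np.array([[1.0, 0.0],
                      [0.9, 0.1],
                      [0.0, 1.0],
                      [0.1, 0.9],
                      [-1.0, -1.0]])   # last row: an outlier instance
adj = build_instance_graph(instances)
clean, keep = drop_noisy(instances, adj)
print(keep.tolist())   # the outlier has no neighbours and is dropped
```

The first two rows form one cluster and the next two another, so each keeps a neighbour; the outlier is isolated in the graph and filtered out before cross-modal fusion.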

Similar Articles

1. Geometric Matching for Cross-Modal Retrieval.
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5509-5521. doi: 10.1109/TNNLS.2024.3381347. Epub 2025 Feb 28.
2. Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval.
IEEE Trans Cybern. 2020 Jun;50(6):2400-2413. doi: 10.1109/TCYB.2019.2928180. Epub 2019 Jul 24.
3. Semantics Disentangling for Cross-Modal Retrieval.
IEEE Trans Image Process. 2024;33:2226-2237. doi: 10.1109/TIP.2024.3374111. Epub 2024 Mar 25.
4. Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.
IEEE Trans Image Process. 2020 Sep 10;PP. doi: 10.1109/TIP.2020.3020383.
5. Cross-modal semantic autoencoder with embedding consensus.
Sci Rep. 2021 Oct 13;11(1):20319. doi: 10.1038/s41598-021-92750-7.
6. Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval.
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2194-2207. doi: 10.1109/TNNLS.2022.3188569. Epub 2024 Feb 5.
7. Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval.
Sensors (Basel). 2022 Apr 11;22(8):2921. doi: 10.3390/s22082921.
8. Relevance feedback for enhancing content based image retrieval and automatic prediction of semantic image features: Application to bone tumor radiographs.
J Biomed Inform. 2018 Aug;84:123-135. doi: 10.1016/j.jbi.2018.07.002. Epub 2018 Jul 5.
9. Video Moment Retrieval With Noisy Labels.
IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6779-6791. doi: 10.1109/TNNLS.2022.3212900. Epub 2024 May 2.