Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval.

Authors

Peng Shu-Juan, He Yi, Liu Xin, Cheung Yiu-Ming, Xu Xing, Cui Zhen

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2194-2207. doi: 10.1109/TNNLS.2022.3188569. Epub 2024 Feb 5.

DOI:10.1109/TNNLS.2022.3188569
PMID:35830398
Abstract

Fine-grained image-text retrieval is an active research topic that bridges vision and language, and its main challenge is learning the semantic correspondence across modalities. Existing methods mainly learn either the global semantic correspondence or the intramodal relation correspondence in separate data representations, and rarely consider the intermodal relations that interactively provide complementary hints for fine-grained semantic correlation learning. To address this issue, we propose a relation-aggregated cross-graph (RACG) model that explicitly learns the fine-grained semantic correspondence by aggregating both intramodal and intermodal relations, which can then guide the feature correspondence learning process. More specifically, we first build semantic-embedded graphs to explore both the fine-grained objects and their relations in each media type; these graphs aim not only to characterize object appearance in each modality but also to capture the intrinsic relation information needed to differentiate intramodal discrepancies. Then, a newly designed cross-graph relation encoder explores the intermodal relations across modalities, mutually boosting the cross-modal correlations to learn more precise intermodal dependencies. In addition, a feature reconstruction module and multihead similarity alignment are used to optimize the node-level semantic correspondence, yielding discriminative relation-aggregated cross-modal embeddings of image and text that benefit various image-text retrieval tasks. Extensive experiments on benchmark datasets verify, quantitatively and qualitatively, the advantages of the proposed framework for fine-grained image-text retrieval and show that its performance is competitive with the state of the art.
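
The idea of aggregating intramodal and intermodal relations before comparing nodes can be illustrated with a toy sketch. This is not the authors' RACG implementation: the abstract gives no equations, so everything below is an assumption made for illustration, including the fully connected graphs, the unlearned dot-product attention, the additive aggregation, and the hypothetical function names. The feature reconstruction module and multihead similarity alignment are not modeled.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def attend(queries, keys):
    """For each query node, return the attention-weighted mean of the key nodes."""
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(len(q))])
    return out

def relation_aggregate(nodes, other_nodes):
    """Add intramodal (same graph) and intermodal (other graph) context to each node.

    Assumption: both graphs are fully connected and nothing is learned.
    """
    intra = attend(nodes, nodes)        # relations inside one modality
    inter = attend(nodes, other_nodes)  # relations across modalities
    return [[x + a + b for x, a, b in zip(node, ia, ie)]
            for node, ia, ie in zip(nodes, intra, inter)]

def node_level_similarity(image_nodes, text_nodes):
    """Average, over text nodes, of the best cosine match among image nodes."""
    img = relation_aggregate(image_nodes, text_nodes)
    txt = relation_aggregate(text_nodes, image_nodes)
    return sum(max(cosine(t, i) for i in img) for t in txt) / len(txt)

# Toy data: two image-region nodes and two word nodes in a shared 2-D space.
image_nodes = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
mismatched_text = [[-1.0, 0.0], [0.0, -1.0]]

matched_score = node_level_similarity(image_nodes, matched_text)
mismatched_score = node_level_similarity(image_nodes, mismatched_text)
```

In this toy setting the matched pair scores higher than the mismatched one, which is the ranking behavior a retrieval model needs; a real system would learn the node features and attention parameters from data.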

Similar articles

1. Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval.
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2194-2207. doi: 10.1109/TNNLS.2022.3188569. Epub 2024 Feb 5.
2. Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image-Text Matching.
   IEEE Trans Cybern. 2024 Feb;54(2):948-961. doi: 10.1109/TCYB.2022.3179020. Epub 2024 Jan 17.
3. Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval.
   IEEE Trans Image Process. 2022;31:7154-7164. doi: 10.1109/TIP.2022.3220051. Epub 2022 Nov 16.
4. Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval.
   Sensors (Basel). 2022 Apr 11;22(8):2921. doi: 10.3390/s22082921.
5. Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
   Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
6. Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval.
   IEEE Trans Image Process. 2021;30:9193-9207. doi: 10.1109/TIP.2021.3123553. Epub 2021 Nov 10.
7. Efficient Token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training.
   IEEE Trans Image Process. 2023;32:3622-3633. doi: 10.1109/TIP.2023.3286710. Epub 2023 Jul 3.
8. Image-Specific Information Suppression and Implicit Local Alignment for Text-Based Person Search.
   IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17973-17986. doi: 10.1109/TNNLS.2023.3310118. Epub 2024 Dec 2.
9. Unsupervised Visual-Textual Correlation Learning With Fine-Grained Semantic Alignment.
   IEEE Trans Cybern. 2022 May;52(5):3669-3683. doi: 10.1109/TCYB.2020.3015084. Epub 2022 May 19.
10. CLIP-Driven Fine-Grained Text-Image Person Re-Identification.
    IEEE Trans Image Process. 2023;32:6032-6046. doi: 10.1109/TIP.2023.3327924. Epub 2023 Nov 7.