• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

记忆、关联与匹配:通过细粒度对齐进行图像-文本检索的嵌入增强

Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval.

作者信息

Li Jiangtong, Liu Liu, Niu Li, Zhang Liqing

出版信息

IEEE Trans Image Process. 2021;30:9193-9207. doi: 10.1109/TIP.2021.3123553. Epub 2021 Nov 10.

DOI:10.1109/TIP.2021.3123553
PMID:34739375
Abstract

Image-text retrieval aims to capture the semantic correlation between images and texts. Existing image-text retrieval methods can be roughly categorized into embedding learning paradigm and pair-wise learning paradigm. The former paradigm fails to capture the fine-grained correspondence between images and texts. The latter paradigm achieves fine-grained alignment between regions and words, but the high cost of pair-wise computation leads to slow retrieval speed. In this paper, we propose a novel method named MEMBER by using Memory-based EMBedding Enhancement for image-text Retrieval (MEMBER), which introduces global memory banks to enable fine-grained alignment and fusion in embedding learning paradigm. Specifically, we enrich image (resp., text) features with relevant text (resp., image) features stored in the text (resp., image) memory bank. In this way, our model not only accomplishes mutual embedding enhancement across two modalities, but also maintains the retrieval efficiency. Extensive experiments demonstrate that our MEMBER remarkably outperforms state-of-the-art approaches on two large-scale benchmark datasets.

摘要

图像-文本检索旨在捕捉图像与文本之间的语义关联。现有的图像-文本检索方法大致可分为嵌入学习范式和成对学习范式。前一种范式无法捕捉图像与文本之间的细粒度对应关系。后一种范式实现了区域与单词之间的细粒度对齐,但成对计算的高成本导致检索速度较慢。在本文中,我们提出了一种名为MEMBER的新方法,即基于内存的嵌入增强图像-文本检索(MEMBER),该方法引入全局内存库,以在嵌入学习范式中实现细粒度对齐和融合。具体来说,我们用存储在文本(或图像)内存库中的相关文本(或图像)特征来丰富图像(或文本)特征。通过这种方式,我们的模型不仅实现了跨两种模态的相互嵌入增强,还保持了检索效率。大量实验表明,我们的MEMBER在两个大规模基准数据集上显著优于现有最先进的方法。

相似文献

1
Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval.记忆、关联与匹配:通过细粒度对齐进行图像-文本检索的嵌入增强
IEEE Trans Image Process. 2021;30:9193-9207. doi: 10.1109/TIP.2021.3123553. Epub 2021 Nov 10.
2
Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval.用于细粒度图像-文本检索的关系聚合跨图相关性学习
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2194-2207. doi: 10.1109/TNNLS.2022.3188569. Epub 2024 Feb 5.
3
Efficient Token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training.高效的基于令牌的图像-文本检索与一致的多模态对比训练。
IEEE Trans Image Process. 2023;32:3622-3633. doi: 10.1109/TIP.2023.3286710. Epub 2023 Jul 3.
4
Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval.基于知识蒸馏的潜在空间语义监督用于跨模态检索
IEEE Trans Image Process. 2022;31:7154-7164. doi: 10.1109/TIP.2022.3220051. Epub 2022 Nov 16.
5
Deep Relation Embedding for Cross-Modal Retrieval.深度关系嵌入的跨模态检索。
IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.
6
A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval.一种用于跨模态遥感图像检索的、特定于聚合多尺度信息的细粒度语义对齐方法。
Sensors (Basel). 2023 Oct 13;23(20):8437. doi: 10.3390/s23208437.
7
HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval.学习一种用于图像-文本检索的分层自适应对齐网络。
Sensors (Basel). 2023 Feb 25;23(5):2559. doi: 10.3390/s23052559.
8
Image-Specific Information Suppression and Implicit Local Alignment for Text-Based Person Search.基于文本的行人搜索中的图像特定信息抑制与隐式局部对齐
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17973-17986. doi: 10.1109/TNNLS.2023.3310118. Epub 2024 Dec 2.
9
Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.从多任务视角看自然保护图像数据中的细粒度跨模态语义一致性
Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
10
CLIP-Driven Fine-Grained Text-Image Person Re-Identification.基于CLIP的细粒度文本-图像人物重识别
IEEE Trans Image Process. 2023;32:6032-6046. doi: 10.1109/TIP.2023.3327924. Epub 2023 Nov 7.