
Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval.

Author Information

Li Jiangtong, Liu Liu, Niu Li, Zhang Liqing

Publication Information

IEEE Trans Image Process. 2021;30:9193-9207. doi: 10.1109/TIP.2021.3123553. Epub 2021 Nov 10.

Abstract

Image-text retrieval aims to capture the semantic correlation between images and texts. Existing image-text retrieval methods can be roughly categorized into the embedding learning paradigm and the pair-wise learning paradigm. The former fails to capture the fine-grained correspondence between images and texts. The latter achieves fine-grained alignment between regions and words, but the high cost of pair-wise computation leads to slow retrieval. In this paper, we propose a novel method named MEMBER (Memory-based EMBedding Enhancement for image-text Retrieval), which introduces global memory banks to enable fine-grained alignment and fusion within the embedding learning paradigm. Specifically, we enrich image (resp., text) features with relevant text (resp., image) features stored in the text (resp., image) memory bank. In this way, our model not only accomplishes mutual embedding enhancement across the two modalities, but also maintains retrieval efficiency. Extensive experiments demonstrate that MEMBER remarkably outperforms state-of-the-art approaches on two large-scale benchmark datasets.
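The core idea described above — enriching an embedding of one modality with relevant features retrieved from a memory bank of the other modality — can be sketched as a simple attention-weighted memory read. The snippet below is a minimal illustration of that idea only; the function name, fusion rule (residual addition), and softmax retrieval are illustrative assumptions, not the authors' exact MEMBER architecture.

```python
import numpy as np

def enhance_with_memory(query, memory, temperature=1.0):
    """Enrich a query embedding (e.g., an image feature) with relevant
    entries from a cross-modal memory bank (e.g., stored text features).

    Hypothetical sketch: softmax attention over memory slots, followed
    by residual fusion and re-normalization.
    """
    # Similarity between the query and every memory slot.
    scores = memory @ query / temperature            # shape: (num_slots,)
    # Softmax attention weights (numerically stabilized).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted read of the memory bank: the "relevant" cross-modal context.
    retrieved = weights @ memory                     # shape: (dim,)
    # Fuse retrieved context with the original embedding and re-normalize,
    # so the enhanced feature stays comparable under cosine similarity.
    enhanced = query + retrieved
    return enhanced / np.linalg.norm(enhanced)

# Toy usage: an image embedding enhanced by a bank of 8 text features.
rng = np.random.default_rng(0)
text_memory = rng.normal(size=(8, 4))   # 8 text features of dimension 4
image_emb = rng.normal(size=4)
enhanced_emb = enhance_with_memory(image_emb, text_memory)
```

Because the memory read happens at encoding time, each enhanced embedding can still be indexed and compared with a single dot product at query time, which is how an embedding-paradigm method keeps retrieval fast while borrowing cross-modal context.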

