

Few-Shot Image and Sentence Matching via Aligned Cross-Modal Memory.

Author Information

Huang Yan, Wang Jingdong, Wang Liang

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2968-2983. doi: 10.1109/TPAMI.2021.3052490. Epub 2022 May 5.

DOI: 10.1109/TPAMI.2021.3052490
PMID: 33460367
Abstract

Image and sentence matching has attracted much attention recently, and many effective methods have been proposed for it. Yet even the current state-of-the-art methods still cannot reliably associate challenging pairs of images and sentences whose regions and words contain few-shot content. In fact, this few-shot matching problem is seldom studied and has become a bottleneck for further performance improvement in real-world applications. In this work, we formulate this challenging problem as few-shot image and sentence matching, and accordingly propose an Aligned Cross-Modal Memory (ACMM) model to deal with it. The model can not only softly align few-shot regions and words in a weakly supervised manner, but also persistently store and update cross-modal prototypical representations of few-shot classes as references, without using any ground-truth region-word correspondence. The model can also adaptively balance the relative importance of few-shot and common content in the image and sentence, which leads to a better measure of overall similarity. We perform extensive experiments on both few-shot and conventional image and sentence matching, and demonstrate the effectiveness of the proposed model by achieving state-of-the-art results on two public benchmark datasets.

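The memory mechanism sketched in the abstract (soft alignment against stored class prototypes, with persistent updates) can be illustrated with a toy example. This is a hypothetical simplification, not the paper's actual ACMM formulation: the class `CrossModalMemory`, the softmax read, and the moving-average write rule are all illustrative assumptions.

```python
import numpy as np

class CrossModalMemory:
    """Toy prototype memory: one slot per few-shot class.

    Reads softly align a query embedding (a region or word feature)
    against all stored prototypes; writes update one prototype with an
    exponential moving average. Both rules are illustrative only.
    """

    def __init__(self, num_slots: int, dim: int,
                 momentum: float = 0.9, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.memory = rng.normal(size=(num_slots, dim))  # prototype per class
        self.momentum = momentum

    def read(self, query: np.ndarray) -> np.ndarray:
        """Soft-attention read: softmax over slot similarities."""
        scores = self.memory @ query                 # (num_slots,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # attention weights
        return weights @ self.memory                 # weighted prototype mix

    def write(self, slot: int, observation: np.ndarray) -> None:
        """Moving-average update of one class prototype."""
        self.memory[slot] = (self.momentum * self.memory[slot]
                             + (1.0 - self.momentum) * observation)
```

Repeated writes of embeddings from the same few-shot class pull that slot toward the class mean, so later reads can retrieve a stable cross-modal reference even when the class appears rarely in training pairs.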

Similar Articles

1. Fs-DSM: Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model.
   IEEE Trans Image Process. 2021;30:8102-8115. doi: 10.1109/TIP.2021.3112294. Epub 2021 Sep 27.
2. Image and Sentence Matching via Semantic Concepts and Order Learning.
   IEEE Trans Pattern Anal Mach Intell. 2020 Mar;42(3):636-650. doi: 10.1109/TPAMI.2018.2883466. Epub 2018 Nov 28.
3. Cross-Modal Attention With Semantic Consistence for Image-Text Matching.
   IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5412-5425. doi: 10.1109/TNNLS.2020.2967597. Epub 2020 Nov 30.
4. Decoupled Cross-Modal Phrase-Attention Network for Image-Sentence Matching.
   IEEE Trans Image Process. 2024;33:1326-1337. doi: 10.1109/TIP.2022.3197972. Epub 2024 Feb 13.
5. Unpaired Image-Text Matching via Multimodal Aligned Conceptual Knowledge.
   IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5160-5176. doi: 10.1109/TPAMI.2024.3432552.
6. Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation.
   Sensors (Basel). 2023 Jul 22;23(14):6612. doi: 10.3390/s23146612.
7. Learning Aligned Image-Text Representations Using Graph Attentive Relational Network.
   IEEE Trans Image Process. 2021;30:1840-1852. doi: 10.1109/TIP.2020.3048627. Epub 2021 Jan 18.
8. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation.
   IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.
9. Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.
   IEEE Trans Image Process. 2020 Sep 10;PP. doi: 10.1109/TIP.2020.3020383.