

Few-Shot Image and Sentence Matching via Aligned Cross-Modal Memory.

Author Information

Huang Yan, Wang Jingdong, Wang Liang

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2968-2983. doi: 10.1109/TPAMI.2021.3052490. Epub 2022 May 5.

DOI: 10.1109/TPAMI.2021.3052490
PMID: 33460367
Abstract

Image and sentence matching has attracted much attention recently, and many effective methods have been proposed for it. Yet even the current state-of-the-art methods still cannot reliably associate challenging pairs of images and sentences whose regions and words contain few-shot content. In fact, this few-shot matching problem is seldom studied and has become a bottleneck for further performance improvement in real-world applications. In this work, we formulate this challenging problem as few-shot image and sentence matching, and accordingly propose an Aligned Cross-Modal Memory (ACMM) model to deal with it. The model can not only softly align few-shot regions and words in a weakly supervised manner, but also persistently store and update cross-modal prototypical representations of few-shot classes as references, without using any ground-truth region-word correspondence. The model can also adaptively balance the relative importance of few-shot and common content in the image and sentence, which leads to a better measure of overall similarity. We perform extensive experiments on both few-shot and conventional image and sentence matching, and demonstrate the effectiveness of the proposed model by achieving state-of-the-art results on two public benchmark datasets.

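The memory mechanism sketched in the abstract (soft alignment against stored class prototypes, with persistent updates) can be illustrated with a toy example. This is a hypothetical simplification, not the paper's actual ACMM formulation: the class `CrossModalMemory`, the softmax read, and the moving-average write rule are all illustrative assumptions.

```python
import numpy as np

class CrossModalMemory:
    """Toy prototype memory: one slot per few-shot class.

    Reads softly align a query embedding (a region or word feature)
    against all stored prototypes; writes update one prototype with an
    exponential moving average. Both rules are illustrative only.
    """

    def __init__(self, num_slots: int, dim: int,
                 momentum: float = 0.9, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.memory = rng.normal(size=(num_slots, dim))  # prototype per class
        self.momentum = momentum

    def read(self, query: np.ndarray) -> np.ndarray:
        """Soft-attention read: softmax over slot similarities."""
        scores = self.memory @ query                 # (num_slots,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # attention weights
        return weights @ self.memory                 # weighted prototype mix

    def write(self, slot: int, observation: np.ndarray) -> None:
        """Moving-average update of one class prototype."""
        self.memory[slot] = (self.momentum * self.memory[slot]
                             + (1.0 - self.momentum) * observation)
```

Repeated writes of embeddings from the same few-shot class pull that slot toward the class mean, so later reads can retrieve a stable cross-modal reference even when the class appears rarely in training pairs.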

Similar Articles

1. Fs-DSM: Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model.
   IEEE Trans Image Process. 2021;30:8102-8115. doi: 10.1109/TIP.2021.3112294. Epub 2021 Sep 27.
2. Image and Sentence Matching via Semantic Concepts and Order Learning.
   IEEE Trans Pattern Anal Mach Intell. 2020 Mar;42(3):636-650. doi: 10.1109/TPAMI.2018.2883466. Epub 2018 Nov 28.
3. Cross-Modal Attention With Semantic Consistence for Image-Text Matching.
   IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5412-5425. doi: 10.1109/TNNLS.2020.2967597. Epub 2020 Nov 30.
4. Decoupled Cross-Modal Phrase-Attention Network for Image-Sentence Matching.
   IEEE Trans Image Process. 2024;33:1326-1337. doi: 10.1109/TIP.2022.3197972. Epub 2024 Feb 13.
5. Unpaired Image-Text Matching via Multimodal Aligned Conceptual Knowledge.
   IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5160-5176. doi: 10.1109/TPAMI.2024.3432552.
6. Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation.
   Sensors (Basel). 2023 Jul 22;23(14):6612. doi: 10.3390/s23146612.
7. Learning Aligned Image-Text Representations Using Graph Attentive Relational Network.
   IEEE Trans Image Process. 2021;30:1840-1852. doi: 10.1109/TIP.2020.3048627. Epub 2021 Jan 18.
8. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation.
   IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.
9. Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.
   IEEE Trans Image Process. 2020 Sep 10;PP. doi: 10.1109/TIP.2020.3020383.