

Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network

Authors

Peng Yuxin, Qi Jinwei, Yuan Yuxin

Publication

IEEE Trans Image Process. 2018 Jul 2. doi: 10.1109/TIP.2018.2852503.

DOI: 10.1109/TIP.2018.2852503
PMID: 29994397
Abstract

Nowadays, cross-modal retrieval plays an important role in flexibly finding useful information across different modalities of data. Effectively measuring the similarity between different modalities of data is the key to cross-modal retrieval. Different modalities such as image and text have an imbalanced and complementary relationship, and they contain unequal amounts of information when describing the same semantics. For example, images often contain more details that cannot be demonstrated by textual descriptions, and vice versa. Existing works based on Deep Neural Networks (DNN) mostly construct one common space for different modalities to find the latent alignments between them, which loses their exclusive modality-specific characteristics. Therefore, we propose a modality-specific cross-modal similarity measurement (MCSM) approach that constructs an independent semantic space for each modality and adopts an end-to-end framework to directly generate modality-specific cross-modal similarity without explicit common representation. For each semantic space, modality-specific characteristics within one modality are fully exploited by a recurrent attention network, while the data of the other modality is projected into this space with attention-based joint embedding, which utilizes the learned attention weights to guide fine-grained cross-modal correlation learning and captures the imbalanced and complementary relationship between different modalities. Finally, the complementarity between the semantic spaces for different modalities is explored by adaptive fusion of the modality-specific cross-modal similarities to perform cross-modal retrieval. Experiments on the widely used Wikipedia, Pascal Sentence, and MS-COCO datasets, as well as our constructed large-scale XMediaNet dataset, verify the effectiveness of our proposed approach, which outperforms 9 state-of-the-art methods.
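The scoring scheme the abstract describes can be sketched in miniature: within one modality's semantic space, attention weights over fine-grained parts (image regions or words) reweight similarities to the other modality's projected representation, and the two modality-specific scores are then adaptively fused. This is a minimal illustrative sketch only; the attention scores, projections, and fusion weight here are assumed given, whereas the paper learns them end to end with a recurrent attention network and joint embedding, and all function names are hypothetical.

```python
import math

def softmax(scores):
    # Normalize raw attention scores into weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

def modality_specific_similarity(anchor_parts, attention_scores, projected_other):
    # Attention-weighted similarity between the other modality's vector
    # (already projected into this modality's space) and each fine-grained
    # part (region or word) of the anchor modality.
    weights = softmax(attention_scores)
    sims = [cosine(part, projected_other) for part in anchor_parts]
    return sum(w * s for w, s in zip(weights, sims))

def adaptive_fusion(sim_in_image_space, sim_in_text_space, alpha):
    # Convex combination of the two modality-specific similarities;
    # the paper learns the fusion adaptively, here alpha is fixed.
    return alpha * sim_in_image_space + (1 - alpha) * sim_in_text_space

# Toy usage: two image regions, attention favoring the second region,
# and a text vector projected into the image's semantic space.
image_regions = [[1.0, 0.0], [0.0, 1.0]]
region_attention = [0.2, 0.8]
projected_text = [0.6, 0.8]
s_img = modality_specific_similarity(image_regions, region_attention, projected_text)
```

The fused score `adaptive_fusion(s_img, s_txt, alpha)` would then rank candidates for retrieval; because each modality keeps its own space, the image-side score can exploit region-level detail that a single common space would average away.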


Similar Articles

1. Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network.
   IEEE Trans Image Process. 2018 Jul 2. doi: 10.1109/TIP.2018.2852503.
2. Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning.
   Entropy (Basel). 2023 Aug 16;25(8):1216. doi: 10.3390/e25081216.
3. Deep Relation Embedding for Cross-Modal Retrieval.
   IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.
4. HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval.
   Sensors (Basel). 2023 Feb 25;23(5):2559. doi: 10.3390/s23052559.
5. Structure-aware contrastive hashing for unsupervised cross-modal retrieval.
   Neural Netw. 2024 Jun;174:106211. doi: 10.1016/j.neunet.2024.106211. Epub 2024 Feb 27.
6. Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval.
   Neural Netw. 2021 Feb;134:143-162. doi: 10.1016/j.neunet.2020.11.011. Epub 2020 Nov 28.
7. Unsupervised Visual-Textual Correlation Learning With Fine-Grained Semantic Alignment.
   IEEE Trans Cybern. 2022 May;52(5):3669-3683. doi: 10.1109/TCYB.2020.3015084. Epub 2022 May 19.
8. Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval.
   Sensors (Basel). 2022 Apr 11;22(8):2921. doi: 10.3390/s22082921.
9. MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval.
   IEEE Trans Cybern. 2020 Mar;50(3):1047-1059. doi: 10.1109/TCYB.2018.2879846. Epub 2018 Dec 5.
10. Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval.
    IEEE Trans Image Process. 2019 Sep;28(9):4299-4312. doi: 10.1109/TIP.2019.2908774. Epub 2019 Apr 2.

Cited By

1. Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning.
   Entropy (Basel). 2023 Aug 16;25(8):1216. doi: 10.3390/e25081216.
2. Bilinear pooling in video-QA: empirical challenges and motivational drift from neurological parallels.
   PeerJ Comput Sci. 2022 Jun 3;8:e974. doi: 10.7717/peerj-cs.974. eCollection 2022.