Suppr 超能文献


Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning.

Author Information

Huang Zhao, Hu Haowu, Su Miao

Affiliations

Key Laboratory of Modern Teaching Technology, Ministry of Education, Xi'an 710062, China.

School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.

Publication Information

Entropy (Basel). 2023 Aug 16;25(8):1216. doi: 10.3390/e25081216.

DOI: 10.3390/e25081216
PMID: 37628246
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10452985/
Abstract

Information retrieval across multiple modes has attracted much attention from academics and practitioners. One key challenge of cross-modal retrieval is to eliminate the heterogeneous gap between different patterns. Most of the existing methods tend to jointly construct a common subspace. However, very little attention has been given to the study of the importance of different fine-grained regions of various modalities. This lack of consideration significantly influences the utilization of the extracted information of multiple modalities. Therefore, this study proposes a novel text-image cross-modal retrieval approach that constructs a dual attention network and an enhanced relation network (DAER). More specifically, the dual attention network tends to precisely extract fine-grained weight information from text and images, while the enhanced relation network is used to expand the differences between different categories of data in order to improve the computational accuracy of similarity. The comprehensive experimental results on three widely-used major datasets (i.e., Wikipedia, Pascal Sentence, and XMediaNet) show that our proposed approach is effective and superior to existing cross-modal retrieval methods.
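The pipeline the abstract describes — attention over the fine-grained regions of each modality to weight the informative parts, followed by a similarity computation on the pooled embeddings — can be sketched as follows. This is an illustrative toy, not the authors' DAER implementation: the learned dual-attention modules are replaced by simple dot-product attention, and the enhanced relation network is replaced by cosine similarity, with hand-made toy features standing in for CNN/text-encoder outputs.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(regions, query):
    """Weight each fine-grained region by its relevance to `query`,
    then return the weighted sum as a single modality embedding."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]

def relation_score(text_emb, img_emb):
    """Toy stand-in for the relation network: cosine similarity
    between the pooled text and image embeddings."""
    na = math.sqrt(dot(text_emb, text_emb))
    nb = math.sqrt(dot(img_emb, img_emb))
    return dot(text_emb, img_emb) / (na * nb)

# Toy data: 3 text "word" features and 4 image "region" features (dim 2).
text_regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
img_regions = [[1.0, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
query = [1.0, 0.0]  # shared query vector driving both attention modules

t = attention_pool(text_regions, query)
v = attention_pool(img_regions, query)
print(round(relation_score(t, v), 3))
```

In the actual DAER model the attention weights and the relation scoring are learned end-to-end, so that inter-class distances in the common subspace are enlarged; here the fixed query and cosine score only show where those learned components sit in the retrieval pipeline.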


Figures 1-7 (full-resolution images):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/56db73afa2d5/entropy-25-01216-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/fdef32b12863/entropy-25-01216-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/260b4735afd2/entropy-25-01216-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/af1fde725b22/entropy-25-01216-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/8c6ffe41ab8a/entropy-25-01216-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/fb4313649e81/entropy-25-01216-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cfe/10452985/b8c4b065dfd5/entropy-25-01216-g007.jpg

Similar Articles

1. Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning.
Entropy (Basel). 2023 Aug 16;25(8):1216. doi: 10.3390/e25081216.
2. Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network.
IEEE Trans Image Process. 2018 Jul 2. doi: 10.1109/TIP.2018.2852503.
3. MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval.
IEEE Trans Cybern. 2020 Mar;50(3):1047-1059. doi: 10.1109/TCYB.2018.2879846. Epub 2018 Dec 5.
4. Deep Relation Embedding for Cross-Modal Retrieval.
IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.
5. Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
6. Improvement of deep cross-modal retrieval by generating real-valued representation.
PeerJ Comput Sci. 2021 Apr 27;7:e491. doi: 10.7717/peerj-cs.491. eCollection 2021.
7. Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval.
Neural Netw. 2021 Feb;134:143-162. doi: 10.1016/j.neunet.2020.11.011. Epub 2020 Nov 28.
8. SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval.
IEEE Trans Cybern. 2022 Feb;52(2):1086-1097. doi: 10.1109/TCYB.2020.2985716. Epub 2022 Feb 16.
9. [Cross-modal retrieval method for thyroid ultrasound image and text based on generative adversarial network].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2020 Aug 25;37(4):641-651. doi: 10.7507/1001-5515.201812042.
10. Cross-modal dual subspace learning with adversarial network.
Neural Netw. 2020 Jun;126:132-142. doi: 10.1016/j.neunet.2020.03.015. Epub 2020 Mar 19.
