

HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval

Affiliations

School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China.

Shandong Provincial Key Laboratory of Digital Media Technology, Jinan 250014, China.

Publication

Sensors (Basel). 2023 Feb 25;23(5):2559. doi: 10.3390/s23052559.

DOI: 10.3390/s23052559
PMID: 36904776
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10007124/
Abstract

Image-text retrieval aims to search related results of one modality by querying another modality. As a fundamental and key problem in cross-modal retrieval, image-text retrieval is still a challenging problem owing to the complementary and imbalanced relationship between different modalities (i.e., Image and Text) and different granularities (i.e., Global-level and Local-level). However, existing works have not fully considered how to effectively mine and fuse the complementarities between images and texts at different granularities. Therefore, in this paper, we propose a hierarchical adaptive alignment network, whose contributions are as follows: (1) We propose a multi-level alignment network, which simultaneously mines global-level and local-level data, thereby enhancing the semantic association between images and texts. (2) We propose an adaptive weighted loss to flexibly optimize the image-text similarity with two stages in a unified framework. (3) We conduct extensive experiments on three public benchmark datasets (Corel 5K, Pascal Sentence, and Wiki) and compare them with eleven state-of-the-art methods. The experimental results thoroughly verify the effectiveness of our proposed method.
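The abstract describes fusing global-level (whole image vs. whole sentence) and local-level (image regions vs. words) similarity under an adaptive weighting scheme. As a rough illustration only, not the paper's actual architecture, here is a minimal sketch of such a two-granularity similarity; the function names, the word-to-best-region matching rule, and the fixed `alpha` weight are all hypothetical simplifications (in the paper the weighting is optimized, not a constant):

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity: a is (na, d), b is (nb, d) -> (na, nb)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def hierarchical_similarity(img_global, txt_global, img_regions, txt_words,
                            alpha=0.5):
    """Fuse global-level and local-level image-text similarity.

    Global level: cosine similarity of whole-image vs. whole-sentence
    embeddings. Local level: each word is matched to its most similar image
    region (max over regions), then averaged over words. `alpha` weights
    the two granularities.
    """
    s_global = float(cosine_sim(img_global[None, :], txt_global[None, :])[0, 0])
    region_word = cosine_sim(txt_words, img_regions)   # (n_words, n_regions)
    s_local = float(region_word.max(axis=1).mean())    # best region per word
    return alpha * s_global + (1.0 - alpha) * s_local
```

In a trained model the embeddings would come from image and text encoders and `alpha` would be a learnable parameter; the point here is only the shape of the fusion: one score per granularity, adaptively combined into a single image-text similarity.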


Figures (g001-g009):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/1afbc225f2ac/sensors-23-02559-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/11529b330b3b/sensors-23-02559-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/7aedeb8a885a/sensors-23-02559-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/5e3eed0c94d8/sensors-23-02559-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/cd67580bb970/sensors-23-02559-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/5f519b249d4f/sensors-23-02559-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/9d65657089c9/sensors-23-02559-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/e0e85ef5cb13/sensors-23-02559-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/10007124/14c308054c5d/sensors-23-02559-g009.jpg

Similar Articles

1. HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval. Sensors (Basel). 2023 Feb 25;23(5):2559. doi: 10.3390/s23052559.
2. Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network. IEEE Trans Image Process. 2018 Jul 2. doi: 10.1109/TIP.2018.2852503.
3. Deep Relation Embedding for Cross-Modal Retrieval. IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.
4. Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval. IEEE Trans Image Process. 2021;30:9193-9207. doi: 10.1109/TIP.2021.3123553. Epub 2021 Nov 10.
5. Image-Specific Information Suppression and Implicit Local Alignment for Text-Based Person Search. IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17973-17986. doi: 10.1109/TNNLS.2023.3310118. Epub 2024 Dec 2.
6. Efficient Token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training. IEEE Trans Image Process. 2023;32:3622-3633. doi: 10.1109/TIP.2023.3286710. Epub 2023 Jul 3.
7. Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval. IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2194-2207. doi: 10.1109/TNNLS.2022.3188569. Epub 2024 Feb 5.
8. Cross-Modal Attention With Semantic Consistence for Image-Text Matching. IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5412-5425. doi: 10.1109/TNNLS.2020.2967597. Epub 2020 Nov 30.
9. MAVA: Multi-level Adaptive Visual-textual Alignment by Cross-media Bi-attention Mechanism. IEEE Trans Image Process. 2019 Nov 22. doi: 10.1109/TIP.2019.2952085.
10. Hierarchical matching and reasoning for multi-query image retrieval. Neural Netw. 2024 May;173:106200. doi: 10.1016/j.neunet.2024.106200. Epub 2024 Feb 22.

References Cited in This Article

1. Image-Text Embedding Learning via Visual and Textual Semantic Reasoning. IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):641-656. doi: 10.1109/TPAMI.2022.3148470. Epub 2022 Dec 5.
2. MAVA: Multi-level Adaptive Visual-textual Alignment by Cross-media Bi-attention Mechanism. IEEE Trans Image Process. 2019 Nov 22. doi: 10.1109/TIP.2019.2952085.
3. Multimodal Machine Learning: A Survey and Taxonomy. IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):423-443. doi: 10.1109/TPAMI.2018.2798607. Epub 2018 Jan 25.
4. Learning Two-Branch Neural Networks for Image-Text Matching Tasks. IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):394-407. doi: 10.1109/TPAMI.2018.2797921. Epub 2018 Jan 24.
5. Joint Feature Selection and Subspace Learning for Cross-Modal Retrieval. IEEE Trans Pattern Anal Mach Intell. 2016 Oct;38(10):2010-23. doi: 10.1109/TPAMI.2015.2505311. Epub 2015 Dec 3.