ALSA: Adversarial Learning of Supervised Attentions for Visual Question Answering.

Publication Information

IEEE Trans Cybern. 2022 Jun;52(6):4520-4533. doi: 10.1109/TCYB.2020.3029423. Epub 2022 Jun 16.

DOI: 10.1109/TCYB.2020.3029423
PMID: 33175690
Abstract

Visual question answering (VQA) has gained increasing attention in both natural language processing and computer vision. The attention mechanism plays a crucial role in relating the question to meaningful image regions for answer inference. However, most existing VQA methods: 1) learn the attention distribution either from free-form regions or detection boxes in the image, which is intractable in answering questions about the foreground object and background form, respectively; and 2) neglect the prior knowledge of human attention and learn the attention distribution with an unguided strategy. To fully exploit the advantages of attention, the learned attention distribution should focus more on the question-related image regions, such as human attention, for both the questions about the foreground object and background form. To achieve this, this article proposes a novel VQA model, called adversarial learning of supervised attentions (ALSAs). Specifically, two supervised attention modules: 1) free form-based and 2) detection-based, are designed to exploit the prior knowledge for attention distribution learning. To effectively learn the correlations between the question and image from different views, that is, free-form regions and detection boxes, an adversarial learning mechanism is implemented as an interplay between two supervised attention modules. The adversarial learning reinforces the two attention modules mutually to make the learned multiview features more effective for answer inference. The experiments performed on three commonly used VQA datasets confirm the favorable performance of ALSA.

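The abstract describes two supervised attention modules whose learned attention distributions are guided by a human-attention prior. A minimal numpy sketch of that supervision signal, assuming a single attention head over a handful of region features and a toy human-attention map; all shapes, names, and values are illustrative, and the paper's actual model (learned parameters, the detection-based branch, and the adversarial interplay between the two modules) is not reproduced here:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(question_vec, region_feats):
    # Score each image region against the question embedding,
    # then normalize into an attention distribution.
    scores = region_feats @ question_vec          # shape: (n_regions,)
    return softmax(scores)

def kl_supervision_loss(learned, human_prior, eps=1e-9):
    # KL(human_prior || learned): penalizes attention mass placed
    # away from the regions a human annotator attended to.
    return float(np.sum(human_prior * np.log((human_prior + eps) / (learned + eps))))

rng = np.random.default_rng(0)
q = rng.normal(size=8)                  # question embedding (assumed dim 8)
regions = rng.normal(size=(5, 8))       # 5 free-form region features
att = attention(q, regions)             # learned attention distribution
prior = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # toy human-attention map
loss = kl_supervision_loss(att, prior)
```

Minimizing this KL term pulls the learned distribution toward the human prior; in the full ALSA setup, an analogous supervised loss on the detection-based branch plus an adversarial objective couples the two views.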

Similar Articles

1. ALSA: Adversarial Learning of Supervised Attentions for Visual Question Answering.
IEEE Trans Cybern. 2022 Jun;52(6):4520-4533. doi: 10.1109/TCYB.2020.3029423. Epub 2022 Jun 16.
2. Adversarial Learning With Multi-Modal Attention for Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2021 Sep;32(9):3894-3908. doi: 10.1109/TNNLS.2020.3016083. Epub 2021 Aug 31.
3. Adversarial Learning with Bidirectional Attention for Visual Question Answering.
Sensors (Basel). 2021 Oct 28;21(21):7164. doi: 10.3390/s21217164.
4. Robust visual question answering via polarity enhancement and contrast.
Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.
5. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning.
Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.
6. Multi-modal adaptive gated mechanism for visual question answering.
PLoS One. 2023 Jun 28;18(6):e0287557. doi: 10.1371/journal.pone.0287557. eCollection 2023.
7. MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network.
IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):318-329. doi: 10.1109/TPAMI.2020.3004830. Epub 2021 Dec 7.
8. A Bi-level representation learning model for medical visual question answering.
J Biomed Inform. 2022 Oct;134:104183. doi: 10.1016/j.jbi.2022.104183. Epub 2022 Aug 28.
9. Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering.
Sensors (Basel). 2020 Nov 26;20(23):6758. doi: 10.3390/s20236758.
10. Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4362-4373. doi: 10.1109/TNNLS.2020.3017530. Epub 2021 Oct 5.

Cited By

1. Protein-ligand binding affinity prediction with edge awareness and supervised attention.
iScience. 2022 Dec 28;26(1):105892. doi: 10.1016/j.isci.2022.105892. eCollection 2023 Jan 20.
2. Adversarial Learning with Bidirectional Attention for Visual Question Answering.
Sensors (Basel). 2021 Oct 28;21(21):7164. doi: 10.3390/s21217164.