Interpretable Visual Question Answering by Reasoning on Dependency Trees.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):887-901. doi: 10.1109/TPAMI.2019.2943456. Epub 2021 Feb 4.

DOI: 10.1109/TPAMI.2019.2943456
PMID: 31562071

Abstract

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems. Although very recent studies have attempted to use explicit compositional processes to assemble multiple subtasks embedded in questions, their models heavily rely on annotations or handcrafted rules to obtain valid reasoning processes, which leads to either heavy workloads or poor performance on compositional reasoning. In this paper, to better align image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question; thus, our model is called a parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module that exploits the local visual evidence of each word parsed from the question, ii) a gated residual composition module that composes the previously mined evidence, and iii) a parse-tree-guided propagation module that passes the mined evidence along the parse tree. Thus, PTGRN is capable of building an interpretable visual question answering (VQA) system that gradually derives image cues following question-driven parse-tree reasoning. Experiments on relational datasets demonstrate the superiority of PTGRN over current state-of-the-art VQA methods, and the visualization results highlight the explainable capability of our reasoning system.
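The three modules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations, so the dot-product attention, the sigmoid gate, the leaves-to-root propagation order, the example tree, and every name below (`attend`, `gated_residual`, `reason`, `gate_rows`) are assumptions chosen only to make the data flow concrete. Random vectors stand in for learned word and image-region features.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word_vec, regions):
    # Attention module (assumed dot-product form): weight image regions
    # by their similarity to a single word, then pool them.
    scores = [sum(w * r for w, r in zip(word_vec, region)) for region in regions]
    weights = softmax(scores)
    dim = len(word_vec)
    evidence = [sum(a * region[d] for a, region in zip(weights, regions))
                for d in range(dim)]
    return evidence, weights

def gated_residual(own, child_sum, gate_rows):
    # Gated residual composition (assumed form): a sigmoid gate decides how
    # much child evidence is added to the node's own evidence.
    joint = own + child_sum  # list concatenation, length 2 * dim
    gates = [1.0 / (1.0 + math.exp(-sum(g * x for g, x in zip(row, joint))))
             for row in gate_rows]
    return [o + g * c for o, g, c in zip(own, gates, child_sum)]

def reason(node, children, word_vecs, regions, gate_rows, trace):
    # Tree-guided propagation (assumed direction: leaves to root).
    own, weights = attend(word_vecs[node], regions)
    trace[node] = weights  # per-word attention map, kept for inspection
    child_sum = [0.0] * len(own)
    for child in children.get(node, []):
        child_out = reason(child, children, word_vecs, regions, gate_rows, trace)
        child_sum = [a + b for a, b in zip(child_sum, child_out)]
    return gated_residual(own, child_sum, gate_rows)

random.seed(0)
DIM, NUM_REGIONS = 4, 5
words = ["color", "cube", "left", "sphere"]
# Hypothetical head -> dependents tree for "color of the cube left of the sphere".
children = {"color": ["cube"], "cube": ["left"], "left": ["sphere"]}
word_vecs = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in words}
regions = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_REGIONS)]
gate_rows = [[random.gauss(0, 0.1) for _ in range(2 * DIM)] for _ in range(DIM)]

trace = {}
root_evidence = reason("color", children, word_vecs, regions, gate_rows, trace)
```

In this sketch, `trace` holds one attention distribution over image regions per question word. Inspecting those distributions node by node is a rough stand-in for the kind of visualization the abstract says supports interpretability.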

Similar articles

1. Interpretable Visual Question Answering by Reasoning on Dependency Trees.
   IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):887-901. doi: 10.1109/TPAMI.2019.2943456. Epub 2021 Feb 4.
2. Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering.
   Sensors (Basel). 2022 Feb 17;22(4):1575. doi: 10.3390/s22041575.
3. An effective spatial relational reasoning networks for visual question answering.
   PLoS One. 2022 Nov 28;17(11):e0277693. doi: 10.1371/journal.pone.0277693. eCollection 2022.
4. Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks.
   Front Robot AI. 2020 Aug 21;7:109. doi: 10.3389/frobt.2020.00109. eCollection 2020.
5. Medical visual question answering based on question-type reasoning and semantic space constraint.
   Artif Intell Med. 2022 Sep;131:102346. doi: 10.1016/j.artmed.2022.102346. Epub 2022 Jun 30.
6. Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering.
   IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4362-4373. doi: 10.1109/TNNLS.2020.3017530. Epub 2021 Oct 5.
7. Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering.
   IEEE Trans Med Imaging. 2024 Dec;43(12):4430-4441. doi: 10.1109/TMI.2024.3425533. Epub 2024 Dec 2.
8. Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding.
   IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):2758-2767. doi: 10.1109/TNNLS.2020.3045034. Epub 2022 Jul 6.
9. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning.
   Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.
10. Structured Multimodal Attentions for TextVQA.
    IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9603-9614. doi: 10.1109/TPAMI.2021.3132034. Epub 2022 Nov 7.

Cited by

1. Beyond top-k: knowledge reasoning for multi-answer temporal questions based on revalidation framework.
   PeerJ Comput Sci. 2023 Dec 8;9:e1725. doi: 10.7717/peerj-cs.1725. eCollection 2023.
2. Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering.
   Sensors (Basel). 2022 Feb 17;22(4):1575. doi: 10.3390/s22041575.
3. Challenges and Prospects in Vision and Language Research.
   Front Artif Intell. 2019 Dec 13;2:28. doi: 10.3389/frai.2019.00028. eCollection 2019.