Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering.

Affiliations

Shanghai Institute of Technical Physics of the Chinese Academy of Sciences, Shanghai 200083, China.

School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China.

Publication Information

Sensors (Basel). 2022 Feb 17;22(4):1575. doi: 10.3390/s22041575.

Abstract

Collaborative reasoning for knowledge-based visual question answering is challenging but vital for understanding the features of images and questions. Previous methods either jointly fuse all kinds of features with an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; both approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image-question pairs, we propose a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), to address these problems. Our model consists of four neural module networks: an attention model that locates attended regions from image features and question embeddings via an attention mechanism; a gated reasoning model that forgets and updates the fused features; a fusion reasoning model that mines high-level semantics of the attended visual features and the knowledge base; and a knowledge-based fact model that compensates for missing visual and textual information with external knowledge. Our model therefore performs visual analysis and reasoning based on tree structures, a knowledge base, and the four neural module networks. Experimental results show that our model achieves superior performance over existing methods on the VQA v2.0 and CLEVR datasets, and visual reasoning experiments demonstrate the interpretability of the model.

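The abstract describes the architecture only at a high level: four neural module networks assembled on a question-guided tree and supported by an external knowledge base. The PyTorch sketch below is a minimal illustration of how such modules might be wired together and executed bottom-up over a tree. The hidden size, the GRU-style gate, the MLP fusion, the TreeNode interface, and all class names are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 512  # assumed shared hidden size for image-region, question, and knowledge embeddings


class AttentionModule(nn.Module):
    """Locates attended regions: scores each image region against the question embedding."""
    def __init__(self):
        super().__init__()
        self.score = nn.Linear(2 * D, 1)

    def forward(self, regions, q):                      # regions: (N, D), q: (D,)
        q_exp = q.unsqueeze(0).expand_as(regions)
        alpha = F.softmax(self.score(torch.cat([regions, q_exp], -1)).squeeze(-1), dim=0)
        return (alpha.unsqueeze(-1) * regions).sum(0)   # attended visual feature, (D,)


class KnowledgeFactModule(nn.Module):
    """Retrieves a question-relevant fact embedding from an external knowledge base."""
    def __init__(self):
        super().__init__()
        self.score = nn.Linear(2 * D, 1)

    def forward(self, facts, q):                        # facts: (M, D), q: (D,)
        q_exp = q.unsqueeze(0).expand_as(facts)
        w = F.softmax(self.score(torch.cat([facts, q_exp], -1)).squeeze(-1), dim=0)
        return (w.unsqueeze(-1) * facts).sum(0)         # knowledge feature, (D,)


class FusionReasoningModule(nn.Module):
    """Fuses the attended visual feature with the retrieved knowledge feature."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * D, D), nn.ReLU(), nn.Linear(D, D))

    def forward(self, visual, knowledge):               # both (D,)
        return self.mlp(torch.cat([visual, knowledge]))


class GatedReasoningModule(nn.Module):
    """Forgets part of the running state and updates it with a child's result (GRU-style gate, assumed)."""
    def __init__(self):
        super().__init__()
        self.z = nn.Linear(2 * D, D)
        self.h = nn.Linear(2 * D, D)

    def forward(self, state, fused):                    # both (D,)
        z = torch.sigmoid(self.z(torch.cat([state, fused])))
        h = torch.tanh(self.h(torch.cat([state, fused])))
        return (1 - z) * state + z * h


class TreeNode:
    """One node of the question-guided tree; q_emb is the embedding of the word or phrase at this node."""
    def __init__(self, q_emb, children=()):
        self.q_emb, self.children = q_emb, list(children)


class QGTSKBSketch(nn.Module):
    """Hypothetical assembly of the four modules plus an answer classifier."""
    def __init__(self, num_answers=3000):
        super().__init__()
        self.attend = AttentionModule()
        self.fact = KnowledgeFactModule()
        self.fuse = FusionReasoningModule()
        self.gate = GatedReasoningModule()
        self.classify = nn.Linear(D, num_answers)

    def execute(self, node, regions, facts):
        """Bottom-up execution: each node attends, retrieves, fuses, then folds in its children's results."""
        visual = self.attend(regions, node.q_emb)
        knowledge = self.fact(facts, node.q_emb)
        state = self.fuse(visual, knowledge)
        for child in node.children:
            state = self.gate(state, self.execute(child, regions, facts))
        return state

    def forward(self, root, regions, facts):
        return self.classify(self.execute(root, regions, facts))
```

In such a setup the tree could be derived from a syntactic parse of the question, with one node per content word: each node attends to the image and the knowledge base using its own embedding, the gated reasoning model folds the children's results into the parent, and the root state summarizes the whole reasoning chain before the answer classifier.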

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16a/8874875/39c1f3960633/sensors-22-01575-g001.jpg

Similar Articles

Interpretable Visual Question Answering by Reasoning on Dependency Trees.
IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):887-901. doi: 10.1109/TPAMI.2019.2943456. Epub 2021 Feb 4.

Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4362-4373. doi: 10.1109/TNNLS.2020.3017530. Epub 2021 Oct 5.

MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network.
IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):318-329. doi: 10.1109/TPAMI.2020.3004830. Epub 2021 Dec 7.

Robust visual question answering via polarity enhancement and contrast.
Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.

Structured Multimodal Attentions for TextVQA.
IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9603-9614. doi: 10.1109/TPAMI.2021.3132034. Epub 2022 Nov 7.
