
Knowledge-Augmented Visual Question Answering With Natural Language Explanation.

Authors

Xie Jiayuan, Cai Yi, Chen Jiali, Xu Ruohang, Wang Jiexin, Li Qing

Publication information

IEEE Trans Image Process. 2024;33:2652-2664. doi: 10.1109/TIP.2024.3379900. Epub 2024 Apr 3.

DOI:10.1109/TIP.2024.3379900
PMID:38546994
Abstract

Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency between the answers and explanations due to their disregard of the crucial interactions between these factors. Moreover, existing methods overlook the potential benefits of incorporating additional knowledge, which hinders their ability to effectively bridge the semantic gap between questions and images, leading to less accurate explanations. In this paper, we present a novel approach denoted the knowledge-based iterative consensus VQA-NLE (KICNLE) model to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, enabling multiple iterations of the answer and explanation in each generation. In each iteration, the current answer is utilized to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge, guide the generation process, effectively bridge the gap between questions and images, and enable the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
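The alternation described in the abstract (the current answer produces an explanation, which then guides a new answer) can be illustrated with a minimal sketch. This is not the authors' KICNLE implementation, which is available at the repository linked above. The function `iterative_consensus`, its parameters, the fixed iteration count, and the toy generators `toy_answer` and `toy_explanation` are all hypothetical stand-ins; in particular, the string-matching generators only take the place of learned models, and `knowledge` stands in for the output of a knowledge retrieval module.

```python
from typing import Callable, List, Tuple

# Hypothetical generator signature: (question, image_features, knowledge, context) -> text
Generator = Callable[[str, List[float], List[str], str], str]


def iterative_consensus(
    question: str,
    image_features: List[float],
    knowledge: List[str],
    generate_answer: Generator,
    generate_explanation: Generator,
    num_iterations: int = 3,
) -> Tuple[str, str]:
    """Alternate explanation and answer generation for a fixed number of rounds."""
    # No explanation exists yet, so the first answer is produced without one.
    explanation = ""
    answer = generate_answer(question, image_features, knowledge, explanation)
    for _ in range(num_iterations):
        # The current answer conditions the explanation ...
        explanation = generate_explanation(question, image_features, knowledge, answer)
        # ... and the explanation in turn guides a new answer.
        answer = generate_answer(question, image_features, knowledge, explanation)
    return answer, explanation


def toy_explanation(question: str, image_features: List[float],
                    knowledge: List[str], answer: str) -> str:
    # Toy stand-in: justify the answer with the first retrieved fact, if any.
    fact = knowledge[0] if knowledge else "no retrieved fact"
    return f"because {fact}"


def toy_answer(question: str, image_features: List[float],
               knowledge: List[str], explanation: str) -> str:
    # Toy stand-in: commit to "yes" only when the explanation cites an umbrella.
    return "yes" if "umbrella" in explanation else "unsure"


answer, explanation = iterative_consensus(
    question="Is it raining?",
    image_features=[0.0, 1.0],  # placeholder for visual features
    knowledge=["people carry an umbrella when it rains"],
    generate_answer=toy_answer,
    generate_explanation=toy_explanation,
)
print(answer, "|", explanation)
```

In this toy run the first answer is "unsure" because no explanation exists yet; once an explanation grounded in the retrieved fact is produced, the answer changes to "yes". That mirrors, in a very reduced form, the kind of answer-explanation consistency the abstract says the iterative generator is meant to maintain.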


Similar articles

1
Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4362-4373. doi: 10.1109/TNNLS.2020.3017530. Epub 2021 Oct 5.
2
Interpretable medical image Visual Question Answering via multi-modal relationship graph learning.
Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.
3
Dual modality prompt learning for visual question-grounded answering in robotic surgery.
Vis Comput Ind Biomed Art. 2024 Apr 22;7(1):9. doi: 10.1186/s42492-024-00160-z.
4
SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.
5
Bridging the Cross-Modality Semantic Gap in Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4519-4531. doi: 10.1109/TNNLS.2024.3370925. Epub 2025 Feb 28.
6
Robust visual question answering via polarity enhancement and contrast.
Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.
7
Radial Graph Convolutional Network for Visual Question Generation.
IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1654-1667. doi: 10.1109/TNNLS.2020.2986029. Epub 2021 Apr 2.
8
Medical visual question answering based on question-type reasoning and semantic space constraint.
Artif Intell Med. 2022 Sep;131:102346. doi: 10.1016/j.artmed.2022.102346. Epub 2022 Jun 30.
9
3D Question Answering.
IEEE Trans Vis Comput Graph. 2024 Mar;30(3):1772-1786. doi: 10.1109/TVCG.2022.3225327. Epub 2024 Jan 30.