Suppr 超能文献


COIN: Counterfactual Image Generation for Visual Question Answering Interpretation.

Affiliations

Faculty of Computer Science, University of Koblenz-Landau, 56070 Koblenz, Germany.

Fraunhofer Institute for Software and Systems Engineering ISST, 44227 Dortmund, Germany.

Publication

Sensors (Basel). 2022 Mar 14;22(6):2245. doi: 10.3390/s22062245.

DOI:10.3390/s22062245
PMID:35336415
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8953790/
Abstract

Due to the significant advancement of Natural Language Processing and Computer Vision-based models, Visual Question Answering (VQA) systems are becoming more intelligent and advanced. However, they are still error-prone when dealing with relatively complex questions. Therefore, it is important to understand the behaviour of VQA models before adopting their results. In this paper, we introduce an interpretability approach for VQA models by generating counterfactual images. Specifically, the generated image is supposed to apply the minimal possible change to the original image while leading the VQA model to give a different answer. In addition, our approach ensures that the generated image is realistic. Since quantitative metrics cannot be employed to evaluate the interpretability of the model, we carried out a user study to assess different aspects of our approach. In addition to interpreting the results of VQA models on single images, the obtained results and the discussion provide an extensive explanation of VQA models' behaviour.
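The minimal-change criterion described in the abstract can be illustrated with a toy sketch: search for a perturbed input that flips the model's answer while penalizing its distance from the original. The linear "VQA model", the `counterfactual` function, and the `lam` trade-off weight below are illustrative assumptions, not the paper's implementation (which operates on real images and additionally enforces realism, which this sketch omits).

```python
import numpy as np

# Toy stand-in for a VQA model: a linear classifier over a flattened
# 16-dim "image" with 2 candidate answers. Purely illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))

def vqa_logits(x):
    return W @ x

def answer(x):
    return int(np.argmax(vqa_logits(x)))

def counterfactual(x, target, lam=0.1, lr=0.05, steps=500):
    """Gradient search for x' that flips the answer to `target`
    while staying close to x (the minimal-change criterion)."""
    x_cf = x.copy()
    for _ in range(steps):
        logits = vqa_logits(x_cf)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        # cross-entropy gradient toward the target answer,
        # plus a quadratic penalty pulling x_cf back toward x
        grad = W.T @ (p - np.eye(2)[target]) + lam * (x_cf - x)
        x_cf -= lr * grad
        if answer(x_cf) == target:
            break
    return x_cf

x = rng.normal(size=16)
tgt = 1 - answer(x)          # aim for the other answer
x_cf = counterfactual(x, tgt)
print(answer(x), answer(x_cf), np.linalg.norm(x_cf - x))
```

The `lam` term is what makes the result a counterfactual rather than an arbitrary adversarial example: it keeps the edit as small as possible, so the difference between `x` and `x_cf` localizes what the model's answer depends on.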


Figures (g001–g006):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47e8/8953790/3968d67e1590/sensors-22-02245-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47e8/8953790/a810a85e7fee/sensors-22-02245-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47e8/8953790/a5dfffa76dd8/sensors-22-02245-g003a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47e8/8953790/f991787894d5/sensors-22-02245-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47e8/8953790/f05abcd326a2/sensors-22-02245-g005a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47e8/8953790/0cb0d72fbd15/sensors-22-02245-g006.jpg

Similar Articles

1. COIN: Counterfactual Image Generation for Visual Question Answering Interpretation. Sensors (Basel). 2022 Mar 14;22(6):2245. doi: 10.3390/s22062245.
2. Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering. IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):13218-13234. doi: 10.1109/TPAMI.2023.3290012. Epub 2023 Oct 3.
3. Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering. IEEE Trans Med Imaging. 2024 Dec;43(12):4430-4441. doi: 10.1109/TMI.2024.3425533. Epub 2024 Dec 2.
4. Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering. Sensors (Basel). 2020 Nov 26;20(23):6758. doi: 10.3390/s20236758.
5. MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network. IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):318-329. doi: 10.1109/TPAMI.2020.3004830. Epub 2021 Dec 7.
6. Vision-Language Transformer for Interpretable Pathology Visual Question Answering. IEEE J Biomed Health Inform. 2023 Apr;27(4):1681-1690. doi: 10.1109/JBHI.2022.3163751. Epub 2023 Apr 4.
7. Advancing surgical VQA with scene graph knowledge. Int J Comput Assist Radiol Surg. 2024 Jul;19(7):1409-1417. doi: 10.1007/s11548-024-03141-y. Epub 2024 May 23.
8. Vision-Language Model for Visual Question Answering in Medical Imagery. Bioengineering (Basel). 2023 Mar 20;10(3):380. doi: 10.3390/bioengineering10030380.
9. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning. Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.
10. Robust visual question answering via polarity enhancement and contrast. Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.

References Cited in This Article

1. Image manipulation with natural language using Two-sided Attentive Conditional Generative Adversarial Network. Neural Netw. 2021 Apr;136:207-217. doi: 10.1016/j.neunet.2020.09.002. Epub 2020 Sep 12.
2. Interpretable CNNs for Object Classification. IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3416-3431. doi: 10.1109/TPAMI.2020.2982882. Epub 2021 Sep 2.
3. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci U S A. 2019 Oct 29;116(44):22071-22080. doi: 10.1073/pnas.1900654116. Epub 2019 Oct 16.
4. Visual Turing test for computer vision systems. Proc Natl Acad Sci U S A. 2015 Mar 24;112(12):3618-23. doi: 10.1073/pnas.1422953112. Epub 2015 Mar 9.
5. Edge focusing. IEEE Trans Pattern Anal Mach Intell. 1987 Jun;9(6):726-41. doi: 10.1109/tpami.1987.4767980.