

R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut

Authors

Niu Yingjie, Ding Ming, Ge Maoning, Karlsson Robin, Zhang Yuxiao, Carballo Alexander, Takeda Kazuya

Affiliations

Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan.

Graduate School of Engineering, Gifu University, Gifu 501-1112, Japan.

Publication

Sensors (Basel). 2024 Apr 24;24(9):2695. doi: 10.3390/s24092695.

DOI: 10.3390/s24092695
PMID: 38732800
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11085337/
Abstract

Transformer-based models have gained popularity in the field of natural language processing (NLP) and are extensively utilized in computer vision tasks and multi-modal models such as GPT4. This paper presents a novel method to enhance the explainability of transformer-based image classification models. Our method aims to improve trust in classification results and empower users to gain a deeper understanding of the model for downstream tasks by providing visualizations of class-specific maps. We introduce two modules: the "Relationship Weighted Out" and the "Cut" modules. The "Relationship Weighted Out" module focuses on extracting class-specific information from intermediate layers, enabling us to highlight relevant features. Additionally, the "Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color. By integrating these modules, we generate dense class-specific visual explainability maps. We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset. Furthermore, we conduct a large number of experiments on the LRN dataset, which is specifically designed for automatic driving danger alerts, to evaluate the explainability of our method in scenarios with complex backgrounds. The results demonstrate a significant improvement over previous methods. Moreover, we conduct ablation experiments to validate the effectiveness of each module. Through these experiments, we are able to confirm the respective contributions of each module, thus solidifying the overall effectiveness of our proposed approach.
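The abstract describes a two-stage pipeline: class-specific relevance is first gathered from intermediate attention layers ("Relationship Weighted Out"), then refined by a feature-space decomposition ("Cut"). The paper's implementation is not reproduced here; the following is only a minimal sketch under stated assumptions: a ViT-B/16-style model (197 tokens, 14×14 patch grid) whose attention maps and their gradients have already been captured, hypothetical helper names, and a plain cosine-affinity threshold standing in for the paper's exact Cut formulation.

```python
# Illustrative sketch only, NOT the authors' code: a gradient-weighted
# attention rollup ("Relationship Weighted Out") followed by a simplified
# cosine-affinity "Cut". Function names and the threshold tau are
# hypothetical; the paper's Cut step is a finer-grained decomposition.
import torch
import torch.nn.functional as F

def relationship_weighted_out(attns, grads):
    """Fold intermediate-layer attention maps, weighted by gradients of
    the target class score, into one class-specific relevance vector.

    attns, grads: lists of [heads, tokens, tokens] tensors (token 0 = [CLS]).
    """
    relevance = torch.zeros(attns[0].shape[-1])
    for a, g in zip(attns, grads):
        # keep only attention that pushes the class score up, head-averaged
        relevance += (a * g.clamp(min=0)).mean(dim=0)[0]  # [CLS] row
    return relevance[1:]  # drop [CLS] -> one score per image patch

def cut(patch_feats, relevance, tau=0.6, grid=14):
    """Decompose patches via a feature-affinity graph (position/texture/
    color are assumed to be encoded in patch_feats) and keep the region
    attached to the most class-relevant seed patch."""
    feats = F.normalize(patch_feats, dim=-1)
    sim = feats @ feats.T                  # cosine similarity between patches
    seed = relevance.argmax()              # strongest class-specific patch
    region = sim[seed] > tau               # patches bound to that seed
    return (relevance * region).reshape(grid, grid)  # dense class map
```

Upsampling the resulting 14×14 map to the input resolution would yield the kind of dense class-specific visualization the abstract refers to.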


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/0d50d47f83b8/sensors-24-02695-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/2c5313360d19/sensors-24-02695-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/29b01d4c8f8d/sensors-24-02695-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/20f9b11d4975/sensors-24-02695-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/9af6dc2a933d/sensors-24-02695-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/e2aa9e59a57e/sensors-24-02695-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/3b66e060ba23/sensors-24-02695-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdb0/11085337/322bb4d8fcfa/sensors-24-02695-g0A1.jpg

Similar Articles

1. R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut.
   Sensors (Basel). 2024 Apr 24;24(9):2695. doi: 10.3390/s24092695.
2. SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.
   Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
3. What Does a Language-And-Vision Transformer See: The Impact of Semantic Information on Visual Representations.
   Front Artif Intell. 2021 Dec 3;4:767971. doi: 10.3389/frai.2021.767971. eCollection 2021.
4. Dual encoder network with transformer-CNN for multi-organ segmentation.
   Med Biol Eng Comput. 2023 Mar;61(3):661-671. doi: 10.1007/s11517-022-02723-9. Epub 2022 Dec 29.
5. Visualizing and Understanding Patch Interactions in Vision Transformer.
   IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13671-13680. doi: 10.1109/TNNLS.2023.3270479. Epub 2024 Oct 7.
6. TransVG++: End-to-End Visual Grounding With Language Conditioned Vision Transformer.
   IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):13636-13652. doi: 10.1109/TPAMI.2023.3296823. Epub 2023 Oct 3.
7. TCI-UNet: transformer-CNN interactive module for medical image segmentation.
   Biomed Opt Express. 2023 Oct 23;14(11):5904-5920. doi: 10.1364/BOE.499640. eCollection 2023 Nov 1.
8. Vicinity Vision Transformer.
   IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12635-12649. doi: 10.1109/TPAMI.2023.3285569. Epub 2023 Sep 5.
9. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
   Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
10. Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.
   BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):90. doi: 10.1186/s12911-021-01459-0.
