
A comprehensive and reliable feature attribution method: Double-sided remove and reconstruct (DoRaR).

Affiliations

Department of Electrical and Computer Engineering, Iowa State University, 2215 Coover Hall, 2520 Osborn Drive, Ames, 50011-1046, IA, USA.

Department of Computer Science, Kansas State University, 2184 Engineering Hall, 1701D Platt St., Manhattan, 66506, KS, USA.

Publication information

Neural Netw. 2024 May;173:106166. doi: 10.1016/j.neunet.2024.106166. Epub 2024 Feb 10.

Abstract

The limited transparency of the inner decision-making mechanism in deep neural networks (DNNs) and other machine learning (ML) models has hindered their application in several domains. To tackle this issue, feature attribution methods have been developed to identify the crucial features that heavily influence decisions made by these black-box models. However, many feature attribution methods have inherent downsides. For example, one category of feature attribution methods suffers from the artifacts problem, which arises from feeding out-of-distribution masked inputs directly through a classifier that was originally trained on natural data points. Another category of feature attribution methods finds explanations by using jointly trained feature selectors and predictors. While avoiding the artifacts problem, this category suffers from the Encoding Prediction in the Explanation (EPITE) problem, in which the predictor's decisions rely not on the features but on the masks that select those features. These downsides undermine the credibility of attribution results. In this research, we introduce the Double-sided Remove and Reconstruct (DoRaR) feature attribution method, which builds on several improvements that address these issues. Through thorough testing on MNIST, CIFAR10, and our own synthetic dataset, we demonstrate that the DoRaR feature attribution method can effectively bypass the above issues and can aid in training a feature selector that outperforms other state-of-the-art feature attribution methods. Our code is available at https://github.com/dxq21/DoRaR.
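To make the artifacts problem concrete, the following is a minimal illustrative sketch (not the paper's implementation) of removal-based attribution: features judged most important are masked out of the input, and the masked input is fed back through a classifier trained only on natural data. The toy linear "classifier", the |w_i · x_i| attribution score, and the zero baseline are all simplifying assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy setup: a linear "classifier" with fixed weights,
# scored through a sigmoid, standing in for a model trained on
# natural (unmasked) inputs.
rng = np.random.default_rng(0)
w = rng.normal(size=8)   # classifier weights
x = rng.normal(size=8)   # one natural input

def predict(v):
    """Sigmoid score of the linear model for input v."""
    return 1.0 / (1.0 + np.exp(-w @ v))

# Simple stand-in attribution: rank features by |w_i * x_i|.
scores = np.abs(w * x)
top_k = np.argsort(scores)[-3:]   # indices of the 3 highest-scoring features

# "Remove" step: mask the top-k features with a zero baseline and
# re-classify. The masked input is out-of-distribution for a model
# trained on natural data, which is exactly the artifacts problem
# the abstract describes: the prediction drop conflates feature
# importance with the model's reaction to unnatural inputs.
x_removed = x.copy()
x_removed[top_k] = 0.0
drop = predict(x) - predict(x_removed)
```

DoRaR's remove-and-reconstruct idea mitigates this by not evaluating raw masked inputs directly; this sketch only shows why naive removal is problematic.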

