Zhuang Juntang, Dvornek Nicha C, Li Xiaoxiao, Yang Junlin, Duncan James S
Biomedical Engineering, Yale University, New Haven, CT USA.
Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT USA.
IEEE Int Conf Comput Vis Workshops. 2019 Oct;2019:4235-4239. doi: 10.1109/iccvw.2019.00521. Epub 2020 Mar 5.
Deep neural networks are vulnerable to adversarial attacks and hard to interpret because of their black-box nature. The recently proposed invertible network can accurately reconstruct the inputs to a layer from its outputs, and thus has the potential to unravel the black-box model. An invertible network classifier can be viewed as a two-stage model: (1) an invertible transformation from the input space to the feature space; (2) a linear classifier in the feature space. We can determine the decision boundary of the linear classifier in the feature space; since the transformation is invertible, we can then map that decision boundary back from the feature space to the input space. Furthermore, we propose to compute the projection of a data point onto the decision boundary, and define the explanation as the difference between the data point and its projection. Finally, we propose to locally approximate the neural network with its first-order Taylor expansion, and define feature importance using this local linear model. We provide the implementation of our method: https://github.com/juntang-zhuang/explain_invertible.
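The following is a minimal sketch of the two geometric ideas in the abstract (boundary projection in feature space, and a first-order Taylor view of feature importance), not the authors' implementation (see the linked repository for that). The `ToyInvertible` class, the function names, and the particular definition of `feature_importance` are illustrative assumptions; the sketch assumes a binary linear classifier with weight `w` and bias `b` acting on the feature vector.

```python
import torch

class ToyInvertible:
    """Stand-in for an invertible network: an invertible affine map z = A x + c
    with an explicit inverse. (Illustrative only; the paper uses a deep invertible net.)"""
    def __init__(self, dim, seed=0):
        g = torch.Generator().manual_seed(seed)
        # Add dim * I so the matrix is well-conditioned and easily invertible.
        self.A = torch.randn(dim, dim, generator=g) + dim * torch.eye(dim)
        self.c = torch.randn(dim, generator=g)

    def forward(self, x):
        return x @ self.A.T + self.c

    def inverse(self, z):
        # Solve A y = z - c for y, i.e. y = A^{-1}(z - c).
        return torch.linalg.solve(self.A, (z - self.c).unsqueeze(-1)).squeeze(-1)


def project_onto_boundary(z, w, b):
    """Orthogonal projection of a feature vector z onto the hyperplane {z : w·z + b = 0}."""
    return z - ((z @ w + b) / (w @ w)) * w


def explain(net, x, w, b):
    """Explanation = data point minus its projection: project onto the decision
    boundary in feature space, then map the projection back through the inverse."""
    z = net.forward(x)                              # input -> feature space
    x_proj = net.inverse(project_onto_boundary(z, w, b))  # boundary point in input space
    return x - x_proj


def feature_importance(net, x, w, b):
    """One plausible reading of the Taylor-expansion idea: use the gradient of the
    classifier logit as local linear coefficients and weight it element-wise by the
    explanation (assumption, not necessarily the paper's exact definition)."""
    x_req = x.clone().requires_grad_(True)
    logit = net.forward(x_req) @ w + b
    (grad,) = torch.autograd.grad(logit, x_req)
    return grad * explain(net, x, w, b)


if __name__ == "__main__":
    dim = 4
    net = ToyInvertible(dim)
    w, b = torch.randn(dim), torch.tensor(0.5)
    x = torch.randn(dim)
    e = explain(net, x, w, b)
    # The projected point x - e should lie (numerically) on the decision boundary,
    # i.e. its logit should be close to zero.
    print(e, (net.forward(x - e) @ w + b).item())
    print(feature_importance(net, x, w, b))
```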