Department of General Systems Studies, The University of Tokyo, Tokyo, Japan.
Neural Netw. 2020 Sep;129:75-90. doi: 10.1016/j.neunet.2020.05.026. Epub 2020 May 29.
Research on explaining the behavior of convolutional neural networks (CNNs) has gained considerable attention over the past few years. Although many visualization methods have been proposed to explain network predictions, most fail to provide a clear correlation between the target output and the features extracted by convolutional layers. In this work, we introduce the concept of class-discriminative feature groups: features extracted by groups of convolutional kernels that are correlated with a particular image class. We propose a method to detect class-discriminative feature groups and a visualization method that highlights the image regions correlated with a particular output and makes these feature groups intuitive to interpret. Experiments showed that the proposed method can disentangle features by image class and shed light on which feature groups are extracted from which regions of the image. We also applied the method to visualize "lost" features in adversarial samples and features in an image containing a non-class object, demonstrating its ability to debug why the network failed or succeeded.
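The abstract does not specify how feature groups are detected or visualized, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes that "correlated with a class" can be approximated by the Pearson correlation between each channel's spatially averaged activation and a binary class indicator, and that a group can be visualized by summing the activation maps of its channels. The function names `class_correlated_channels` and `group_heatmap`, the `top_k` parameter, and the synthetic activations are all hypothetical.

```python
import numpy as np


def class_correlated_channels(activations, labels, target_class, top_k=3):
    """Rank channels by correlation with membership in target_class.

    activations: array of shape (N, C, H, W) from one convolutional layer.
    labels: integer array of shape (N,).
    Returns the indices of the top_k channels and all C correlations.
    """
    pooled = activations.mean(axis=(2, 3))               # (N, C) average over space
    indicator = (labels == target_class).astype(float)   # (N,) 1 if in class, else 0
    pooled_c = pooled - pooled.mean(axis=0)
    ind_c = indicator - indicator.mean()
    # Pearson correlation per channel; the small constant avoids division by zero.
    denom = np.sqrt((pooled_c ** 2).sum(axis=0) * (ind_c ** 2).sum()) + 1e-12
    corr = (pooled_c * ind_c[:, None]).sum(axis=0) / denom
    top = np.argsort(corr)[::-1][:top_k]
    return top, corr


def group_heatmap(activation, channels):
    """Sum the selected channels' maps for one image and scale to [0, 1].

    activation: array of shape (C, H, W) for a single image.
    """
    heat = activation[channels].sum(axis=0)
    heat = heat - heat.min()
    peak = heat.max()
    return heat / peak if peak > 0 else heat


# Synthetic stand-in for real CNN activations (assumption: two classes, 8 channels).
rng = np.random.default_rng(0)
n, c, h, w = 200, 8, 6, 6
labels = rng.integers(0, 2, size=n)
activations = rng.random((n, c, h, w))

# Make channels 2 and 5 respond strongly in the top-left corner for class 1 only.
mask = labels == 1
activations[mask, 2, :3, :3] += 2.0
activations[mask, 5, :3, :3] += 2.0

channels, corr = class_correlated_channels(activations, labels, target_class=1, top_k=2)
first_class1_image = activations[np.flatnonzero(mask)[0]]
heat = group_heatmap(first_class1_image, channels)

print("channels most correlated with class 1:", sorted(channels.tolist()))
print("mean heat, top-left vs bottom-right:", heat[:3, :3].mean(), heat[3:, 3:].mean())
```

With real data, `activations` would come from a forward pass through a trained network, and the low-resolution heatmap would typically be upsampled to the input size before being overlaid on the image.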