Cao Kelei, Liu Mengchen, Su Hang, Wu Jing, Zhu Jun, Liu Shixia
IEEE Trans Vis Comput Graph. 2021 Jul;27(7):3289-3304. doi: 10.1109/TVCG.2020.2969185. Epub 2021 May 27.
Adversarial examples, generated by adding small, intentionally crafted perturbations that are imperceptible to humans, can mislead deep neural networks (DNNs) into making incorrect predictions. Although much work has been done on both adversarial attacks and defenses, a fine-grained understanding of adversarial examples is still lacking. To address this issue, we present a visual analysis method to explain why adversarial examples are misclassified. The key is to compare and analyze the datapaths of adversarial and normal examples. A datapath is a group of critical neurons along with their connections. We formulate datapath extraction as a subset selection problem and solve it by constructing and training a neural network. We designed a multi-level visualization, consisting of a network-level visualization of data flows, a layer-level visualization of feature maps, and a neuron-level visualization of learned features, to help investigate how the datapaths of adversarial and normal examples diverge and merge during prediction. A quantitative evaluation and a case study demonstrate the promise of our method in explaining the misclassification of adversarial examples.
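To make the notion of a datapath concrete: the abstract frames extraction as selecting a subset of critical neurons. The paper solves this by training a neural network; the sketch below is only a simpler greedy proxy for the same idea, on a hypothetical toy two-layer model, where each hidden neuron is scored by how much ablating it lowers the predicted-class logit and the top-k neurons are kept as an approximate datapath. This is an illustration of the subset-selection framing, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: logits = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(16, 8))   # 16 hidden neurons, 8 input features
W2 = rng.normal(size=(3, 16))   # 3 output classes

def predict_logits(x, mask):
    """Forward pass with a binary mask over the hidden neurons."""
    h = np.maximum(W1 @ x, 0.0) * mask
    return W2 @ h

x = rng.normal(size=8)
full_mask = np.ones(16)
target = int(np.argmax(predict_logits(x, full_mask)))
base = predict_logits(x, full_mask)[target]

# Score each neuron by the drop in the target logit when it is ablated.
scores = np.empty(16)
for i in range(16):
    m = full_mask.copy()
    m[i] = 0.0
    scores[i] = base - predict_logits(x, m)[target]

# Keep the k highest-impact neurons as the approximate datapath.
k = 4
datapath = np.argsort(scores)[::-1][:k]
```

Comparing the datapaths extracted this way for a normal example and its adversarial counterpart would show where the two sets of critical neurons overlap and where they diverge, which is the comparison the visualization in the paper is built around.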