Zhang Yonggang, Tian Xinmei, Li Ya, Wang Xinchao, Tao Dacheng
IEEE Trans Image Process. 2020 Feb 28. doi: 10.1109/TIP.2020.2975918.
Despite having achieved excellent performance on various tasks, deep neural networks have been shown to be susceptible to adversarial examples, i.e., visual inputs crafted with structurally imperceptible noise. To explain this phenomenon, previous works attribute it to the limited capability of classification models and the difficulty of the classification tasks. These explanations appear to account for some of the empirical observations but lack deep insight into the intrinsic nature of adversarial examples, such as their generation method and transferability. Furthermore, previous works generate adversarial examples by relying entirely on a specific classifier (model). Consequently, the attack ability of adversarial examples depends strongly on that specific classifier. More importantly, adversarial examples cannot be generated without a trained classifier. In this paper, we raise a question: what is the real cause of the generation of adversarial examples? To answer this question, we propose a new concept, called the adversarial region, which explains the existence of adversarial examples as perturbations perpendicular to the tangent plane of the data manifold. This view yields a clear explanation of why adversarial examples transfer across different models. Moreover, with the notion of the adversarial region, we propose a novel target-free method to generate adversarial examples via principal component analysis. We verify our adversarial region hypothesis on a synthetic dataset and demonstrate through extensive experiments on real datasets that the adversarial examples generated by our method exhibit transferability that is competitive with, or even stronger than, that of model-dependent adversarial example generation methods. Moreover, our experiments show that the proposed method is more robust to defense methods than previous approaches.
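The abstract describes perturbing inputs perpendicular to the tangent plane of the data manifold, estimated via principal component analysis, without any trained classifier. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes the local tangent plane at a point is estimated from its k nearest neighbors, that the trailing principal components approximate off-manifold (normal) directions, and that the function name, parameters, and default values (k, d_tangent, epsilon) are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def off_manifold_perturbation(X, x, k=50, d_tangent=10, epsilon=0.05):
    """Illustrative sketch: perturb x along a direction orthogonal to the
    estimated tangent plane of the data manifold at x.

    X         : (n, D) array of clean data points (e.g., flattened images).
    x         : (D,) point to perturb.
    k         : number of neighbours used to estimate the local manifold.
    d_tangent : assumed intrinsic dimension of the manifold (hypothetical).
    epsilon   : perturbation magnitude.
    """
    # Collect the local neighbourhood around x.
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(x.reshape(1, -1))
    neighbours = X[idx[0]]

    # PCA of the neighbourhood: leading components span the tangent plane,
    # trailing components approximate normal (off-manifold) directions.
    pca = PCA(n_components=min(k, neighbours.shape[1]))
    pca.fit(neighbours)
    normal = pca.components_[d_tangent]  # first direction beyond the tangent plane

    # Step off the manifold; the sign of the step is arbitrary in this sketch.
    return x + epsilon * normal / np.linalg.norm(normal)
```

Because the perturbation direction is derived from the data alone, such a procedure is target-free in the sense the abstract uses: no classifier is queried during generation.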