Li Xingjian, Goodman Dou, Liu Ji, Wei Tao, Dou Dejing
Big Data Lab, Baidu Research, Beijing, China.
X-Lab, Baidu Inc., Beijing, China.
Front Artif Intell. 2022 Jan 27;4:752831. doi: 10.3389/frai.2021.752831. eCollection 2021.
Though deep neural networks have achieved state-of-the-art performance in visual classification, recent studies have shown that they are all vulnerable to adversarial examples. In this paper, we develop improved techniques for defending against adversarial examples. First, we propose an enhanced defense technique that encourages both the attention maps and the logits of paired examples to be similar. When applied to clean examples and their adversarial counterparts, it improves accuracy on adversarial examples over adversarial training. We show that the technique effectively increases the average activations of adversarial examples in the key regions and demonstrate that it focuses on discriminative features to improve the robustness of the model. Finally, we conduct extensive experiments on a wide range of datasets, and the results show that our method achieves strong defense performance. For example, under strong 200-iteration Projected Gradient Descent (PGD) gray-box and black-box attacks where prior art attains 34% and 39% accuracy, our method achieves higher accuracy. Compared with previous work, ours is evaluated under a highly challenging PGD attack: the maximum perturbation ε ∈ {0.25, 0.5} with 10 to 200 attack iterations. To the best of our knowledge, such a strong attack has not been previously explored on a wide range of datasets.
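The abstract describes a defense that, during adversarial training, pairs the attention maps and logits of each clean example with those of its adversarial counterpart. The PyTorch sketch below illustrates one plausible form of such a paired loss; the function names (`pgd_attack`, `paired_defense_loss`), the mean-squared-error pairing terms, the attention-map definition (channel-wise squared mean of the last feature map), and the loss weights are assumptions made for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.25, alpha=0.01, iters=10):
    """Illustrative L-infinity PGD attack: repeatedly ascend the loss and
    project back into the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        logits, _ = model(x_adv)  # model is assumed to return (logits, features)
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def paired_defense_loss(model, x_clean, y, eps=0.25, lam_logit=0.5, lam_attn=0.5):
    """Hypothetical adversarial-training loss with logit pairing and
    attention-map pairing between clean and adversarial examples.
    `model` is assumed to return (logits, last_conv_features)."""
    x_adv = pgd_attack(model, x_clean, y, eps=eps)

    logits_clean, feat_clean = model(x_clean)
    logits_adv, feat_adv = model(x_adv)

    # Spatial attention map: channel-wise squared mean of the feature map.
    attn_clean = feat_clean.pow(2).mean(dim=1)
    attn_adv = feat_adv.pow(2).mean(dim=1)

    ce = F.cross_entropy(logits_adv, y)                # adversarial training term
    logit_pair = F.mse_loss(logits_adv, logits_clean)  # keep paired logits similar
    attn_pair = F.mse_loss(attn_adv, attn_clean)       # keep paired attention maps similar
    return ce + lam_logit * logit_pair + lam_attn * attn_pair
```

In this reading, the attention-pairing term pushes the model to attend to the same discriminative regions for an adversarial example as for its clean counterpart, which is consistent with the abstract's claim that the method increases the average activations of adversarial examples in the key regions.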