Liu Hongying, Ge Zhijin, Zhou Zhenyu, Shang Fanhua, Liu Yuanyuan, Jiao Licheng
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18419-18430. doi: 10.1109/TNNLS.2023.3315414. Epub 2024 Dec 2.
Deep neural networks (DNNs) play key roles in various artificial intelligence applications such as image classification and object recognition. However, a growing number of studies have shown that adversarial examples exist for DNNs: inputs that are almost imperceptibly different from the original samples but can greatly change a DNN's output. Recently, many white-box attack algorithms have been proposed, most of which concentrate on how to make the best use of the gradients in each iteration to improve adversarial performance. In this article, we focus on the properties of the widely used activation function, the rectified linear unit (ReLU), and find that two phenomena (i.e., wrong blocking and over transmission) misguide the calculation of gradients through ReLU during backpropagation. Both issues enlarge the gap between the change in the loss function predicted from the gradients and the corresponding actual change, and misguide the optimization direction, which results in larger perturbations. Therefore, we propose a universal gradient-correction method for adversarial example generation, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms such as the fast gradient sign method (FGSM), iterative FGSM (I-FGSM), momentum I-FGSM (MI-FGSM), and variance-tuning MI-FGSM (VMI-FGSM). Through backpropagation, our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a part of them to update the misguided gradients. Comprehensive experimental results on ImageNet and CIFAR10 demonstrate that ADV-ReLU can be easily integrated into many state-of-the-art gradient-based white-box attack algorithms, as well as transferred to black-box attacks, to further decrease perturbations measured in the ℓ2-norm.
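The two phenomena can be seen in a toy example. Below is a minimal sketch (assuming PyTorch; the values are illustrative and not from the paper) showing how ReLU's backpropagation rule makes the first-order prediction of the loss change disagree with the actual change for pre-activations near zero.

```python
import torch
import torch.nn.functional as F

# Toy illustration of the two ReLU backpropagation phenomena named in the
# abstract; the inputs and step size are hypothetical.
x = torch.tensor([-0.01, 0.01], requires_grad=True)  # pre-activations
loss = F.relu(x).sum()
loss.backward()
print(x.grad)  # tensor([0., 1.])

# "Wrong blocking": x[0] = -0.01 gets zero gradient, yet a small positive
# perturbation (+0.02) would activate the unit and change the loss.
# "Over transmission": x[1] = 0.01 passes its gradient unchanged, yet a
# small negative perturbation would deactivate the unit, so the linear
# prediction grad * delta can misstate the true effect.
delta = 0.02
with torch.no_grad():
    actual = F.relu(x + delta).sum() - F.relu(x).sum()
    predicted = (x.grad * delta).sum()
print(predicted.item(), actual.item())  # predicted 0.02 vs. actual 0.03
```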
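As a rough illustration of how such a correction could plug into a gradient-based attack, the sketch below wraps a single FGSM step: it backpropagates to obtain the input gradient, maps the values to scores, selects a fraction of entries, and re-estimates them before the sign step. The magnitude-based scoring rule, the selected fraction `frac`, and the finite-difference re-estimation are all assumptions made for illustration; the abstract does not specify ADV-ReLU's concrete rules.

```python
import torch
import torch.nn.functional as F

def fgsm_with_gradient_correction(model, x, y, eps=8 / 255, frac=0.1):
    """One FGSM step with a hypothetical ADV-ReLU-style correction.

    The scoring rule (gradient magnitude), the corrected fraction `frac`,
    and the finite-difference re-estimation are illustrative stand-ins,
    not the paper's exact procedure.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]

    # Map gradient values to scores; treat the smallest magnitudes as the
    # entries most likely misguided by ReLU (assumption).
    flat_grad = grad.flatten(1)
    scores = flat_grad.abs()
    num = max(1, int(frac * scores.shape[1]))
    idx = scores.topk(num, dim=1, largest=False).indices

    # Re-estimate the selected entries with a finite-difference probe.
    # This costs O(num) extra forward passes per sample: acceptable for a
    # sketch, far too slow for practical attacks.
    corrected = flat_grad.clone()
    probe = 1e-3
    with torch.no_grad():
        for b in range(x.shape[0]):
            for i in idx[b]:
                x_probe = x.detach().flatten(1).clone()
                x_probe[b, i] += probe
                loss_probe = F.cross_entropy(model(x_probe.view_as(x)), y)
                corrected[b, i] = (loss_probe - loss) / probe

    # Standard FGSM step using the corrected gradient.
    return (x + eps * corrected.view_as(grad).sign()).detach()
```

In an iterative attack such as I-FGSM, MI-FGSM, or VMI-FGSM, the same correction could be applied to the gradient at every step, before the sign, momentum, or variance-tuning updates.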