Liu Hongying, Ge Zhijin, Zhou Zhenyu, Shang Fanhua, Liu Yuanyuan, Jiao Licheng
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18419-18430. doi: 10.1109/TNNLS.2023.3315414. Epub 2024 Dec 2.
Deep neural networks (DNNs) play key roles in various artificial intelligence applications such as image classification and object recognition. However, a growing number of studies have shown that adversarial examples exist for DNNs: inputs that are almost imperceptibly different from the original samples but can greatly change a DNN's output. Recently, many white-box attack algorithms have been proposed, most of which concentrate on how to make the best use of the gradients in each iteration to improve adversarial performance. In this article, we focus on the properties of the widely used activation function, the rectified linear unit (ReLU), and find that two phenomena (i.e., wrong blocking and over transmission) misguide the calculation of gradients through ReLU during backpropagation. Both issues enlarge the gap between the change in the loss function predicted from the gradients and the corresponding actual change, and misguide the optimization direction, which results in larger perturbations. Therefore, we propose a universal gradient-correction method for adversarial example generation, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms such as the fast gradient sign method (FGSM), iterative FGSM (I-FGSM), momentum I-FGSM (MI-FGSM), and variance-tuning MI-FGSM (VMI-FGSM). Through backpropagation, our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a part of them to update the misguided gradients. Comprehensive experimental results on ImageNet and CIFAR10 demonstrate that ADV-ReLU can be easily integrated into many state-of-the-art gradient-based white-box attack algorithms, as well as transferred to black-box attacks, to further decrease perturbations measured in the ℓ2-norm.
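The two phenomena can be seen in a toy example. Below is a minimal sketch (assuming PyTorch; the values are illustrative and not from the paper) showing how ReLU's backpropagation rule makes the first-order prediction of the loss change disagree with the actual change for pre-activations near zero.

```python
import torch
import torch.nn.functional as F

# Toy illustration of the two ReLU backpropagation phenomena named in the
# abstract; the inputs and step size are hypothetical.
x = torch.tensor([-0.01, 0.01], requires_grad=True)  # pre-activations
loss = F.relu(x).sum()
loss.backward()
print(x.grad)  # tensor([0., 1.])

# "Wrong blocking": x[0] = -0.01 gets zero gradient, yet a small positive
# perturbation (+0.02) would activate the unit and change the loss.
# "Over transmission": x[1] = 0.01 passes its gradient unchanged, yet a
# small negative perturbation would deactivate the unit, so the linear
# prediction grad * delta can misstate the true effect.
delta = 0.02
with torch.no_grad():
    actual = F.relu(x + delta).sum() - F.relu(x).sum()
    predicted = (x.grad * delta).sum()
print(predicted.item(), actual.item())  # predicted 0.02 vs. actual 0.03
```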
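As a rough illustration of how such a correction could plug into a gradient-based attack, the sketch below wraps a single FGSM step: it backpropagates to obtain the input gradient, maps the values to scores, selects a fraction of entries, and re-estimates them before the sign step. The magnitude-based scoring rule, the selected fraction `frac`, and the finite-difference re-estimation are all assumptions made for illustration; the abstract does not specify ADV-ReLU's concrete rules.

```python
import torch
import torch.nn.functional as F

def fgsm_with_gradient_correction(model, x, y, eps=8 / 255, frac=0.1):
    """One FGSM step with a hypothetical ADV-ReLU-style correction.

    The scoring rule (gradient magnitude), the corrected fraction `frac`,
    and the finite-difference re-estimation are illustrative stand-ins,
    not the paper's exact procedure.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]

    # Map gradient values to scores; treat the smallest magnitudes as the
    # entries most likely misguided by ReLU (assumption).
    flat_grad = grad.flatten(1)
    scores = flat_grad.abs()
    num = max(1, int(frac * scores.shape[1]))
    idx = scores.topk(num, dim=1, largest=False).indices

    # Re-estimate the selected entries with a finite-difference probe.
    # This costs O(num) extra forward passes per sample: acceptable for a
    # sketch, far too slow for practical attacks.
    corrected = flat_grad.clone()
    probe = 1e-3
    with torch.no_grad():
        for b in range(x.shape[0]):
            for i in idx[b]:
                x_probe = x.detach().flatten(1).clone()
                x_probe[b, i] += probe
                loss_probe = F.cross_entropy(model(x_probe.view_as(x)), y)
                corrected[b, i] = (loss_probe - loss) / probe

    # Standard FGSM step using the corrected gradient.
    return (x + eps * corrected.view_as(grad).sign()).detach()
```

In an iterative attack such as I-FGSM, MI-FGSM, or VMI-FGSM, the same correction could be applied to the gradient at every step, before the sign, momentum, or variance-tuning updates.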