Agarwal Akshay, Ratha Nalini, Vatsa Mayank, Singh Richa
IEEE Trans Image Process. 2022;31:7338-7349. doi: 10.1109/TIP.2022.3204206. Epub 2022 Nov 30.
Adversarial attacks have been demonstrated to fool deep classification networks. These attacks have two key characteristics: first, the perturbations are mostly additive noise carefully crafted from the deep neural network itself; second, the noise is added to the whole image, without treating the image as a combination of the multiple components from which it is made. Motivated by these observations, in this research we first study the role of various image components and their impact on image classification. These manipulations require neither knowledge of the network nor external noise to function effectively, and hence have the potential to be one of the most practical options for real-world attacks. Based on the significance of particular image components, we also propose a transferable adversarial attack against unseen deep networks. The proposed attack uses the projected gradient descent (PGD) strategy to add the adversarial perturbation to the manipulated component image. Experiments are conducted on a wide range of networks and four databases, including ImageNet and CIFAR-100, and show that the proposed attack achieves better transferability and hence gives an attacker the upper hand. On the ImageNet database, the success rate of the proposed attack reaches 88.5%, while the current state-of-the-art attack success rate on that database is 53.8%. We further test the resilience of the attack against one of the most successful defenses, namely adversarial training, to measure its strength. Comparison with several challenging attacks shows that (i) the proposed attack has a higher transferability rate against multiple unseen networks, and (ii) its impact is hard to mitigate. We claim that, based on this understanding of image components, the proposed research identifies a new adversarial attack, unseen so far and unresolved by current defense mechanisms.
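The abstract names the projected gradient descent (PGD) strategy but gives no implementation detail, and it does not say how the image components are extracted. As a minimal sketch only, the PyTorch code below assumes a low-frequency Fourier split as the component manipulation and a standard L-infinity PGD loop; the function names low_frequency_component and pgd_on_component and the hyperparameters eps, alpha, steps, and cutoff are illustrative assumptions, not the paper's method.

import torch
import torch.nn.functional as F

def low_frequency_component(x, cutoff=0.25):
    # Hypothetical component extraction: keep only low spatial frequencies.
    # The paper does not specify its decomposition; this split is an assumption.
    Xf = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cy, cx = h // 2, w // 2
    radius = (((yy - cy) / h) ** 2 + ((xx - cx) / w) ** 2).sqrt()
    mask = (radius <= cutoff).to(device=x.device, dtype=x.dtype)
    x_low = torch.fft.ifft2(torch.fft.ifftshift(Xf * mask, dim=(-2, -1))).real
    return x_low.clamp(0, 1)

def pgd_on_component(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Standard L-infinity PGD: ascend the cross-entropy loss, projecting the
    # perturbation back into the eps-ball around the (component) image.
    x_orig = x.clone().detach()
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                 # gradient ascent step
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project into eps-ball
            x_adv = x_adv.clamp(0, 1)                           # keep a valid image
    return x_adv.detach()

# Usage (model: any pretrained classifier in eval mode, e.g. a torchvision ResNet;
# x: batch of images in [0, 1]; y: ground-truth labels):
#   x_comp = low_frequency_component(x)          # manipulated component image
#   x_adv  = pgd_on_component(model, x_comp, y)  # add PGD perturbation on top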