Attack to Fool and Explain Deep Networks.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):5980-5995. doi: 10.1109/TPAMI.2021.3083769. Epub 2022 Sep 14.

DOI:10.1109/TPAMI.2021.3083769

Abstract

Deep visual models are susceptible to adversarial perturbations to inputs. Although these signals are carefully crafted, they still appear as noise-like patterns to humans. This observation has led to the argument that deep visual representation is misaligned with human perception. We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations. We first propose an attack that fools a network to confuse a whole category of objects (source class) with a target label. Our attack also limits the unintended fooling by samples from non-source classes, thereby circumscribing human-defined semantic notions for network fooling. We show that the proposed attack not only leads to the emergence of regular geometric patterns in the perturbations, but also reveals insightful information about the decision boundaries of deep models. Exploring this phenomenon further, we alter the 'adversarial' objective of our attack to use it as a tool to 'explain' deep visual representation. We show that by careful channeling and projection of the perturbations computed by our method, we can visualize a model's understanding of human-defined semantic notions. Finally, we exploit the explainability properties of our perturbations to perform image generation, inpainting and interactive image manipulation by attacking adversarially robust 'classifiers'. In all, our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models. The article also makes secondary contributions in terms of establishing the utility of our attack beyond the adversarial objective with multiple interesting applications.
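The abstract does not spell out the attack's algorithm, so the following is only a minimal sketch of the general idea it describes: a single shared perturbation that pushes samples of a source class towards a target label while keeping samples of other classes on their original labels. It is not the authors' method. The toy linear softmax classifier, the cross-entropy objective, the L-infinity bound `eps`, and every name in the code (`class_specific_perturbation`, `W`, `b`, `source`, `target`) are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def class_specific_perturbation(W, b, X, y, source, target,
                                eps=0.5, lr=0.1, steps=200):
    """Toy sketch: one shared perturbation p for a linear softmax model.

    Source-class samples are pushed towards `target`; all other samples
    are pulled towards their original labels to limit unintended fooling.
    This is an illustrative assumption, not the published algorithm.
    """
    # Desired labels: `target` for the source class, unchanged otherwise.
    desired = np.where(y == source, target, y)
    onehot = np.eye(W.shape[1])[desired]
    p = np.zeros(X.shape[1])
    for _ in range(steps):
        probs = softmax((X + p) @ W + b)
        # Gradient of the mean cross-entropy with respect to p.
        grad = ((probs - onehot) @ W.T).mean(axis=0)
        p -= lr * grad
        p = np.clip(p, -eps, eps)  # project onto the L-infinity ball
    return p

# Hypothetical toy data: labels are the model's own clean predictions.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(3)
X = rng.normal(size=(60, 4))
y = np.argmax(X @ W + b, axis=1)
p = class_specific_perturbation(W, b, X, y, source=0, target=1)
```

In this sketch the same vector `p` is added to every input, and the clipping step keeps each of its components within `eps` in magnitude. How well it fools the source class depends on the toy data and is not claimed here.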

Similar articles

Attack to Fool and Explain Deep Networks.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):5980-5995. doi: 10.1109/TPAMI.2021.3083769. Epub 2022 Sep 14.

Enhancing robustness in video recognition models: Sparse adversarial attacks and beyond.

Neural Netw. 2024 Mar;171:127-143. doi: 10.1016/j.neunet.2023.11.056. Epub 2023 Nov 25.

Uni-image: Universal image construction for robust neural model.

Neural Netw. 2020 Aug;128:279-287. doi: 10.1016/j.neunet.2020.05.018. Epub 2020 May 21.

Crafting Adversarial Perturbations via Transformed Image Component Swapping.

IEEE Trans Image Process. 2022;31:7338-7349. doi: 10.1109/TIP.2022.3204206. Epub 2022 Nov 30.

Generalizable Data-Free Objective for Crafting Universal Adversarial Perturbations.

IEEE Trans Pattern Anal Mach Intell. 2019 Oct;41(10):2452-2465. doi: 10.1109/TPAMI.2018.2861800. Epub 2018 Jul 31.

Differential evolution based dual adversarial camouflage: Fooling human eyes and object detectors.

Neural Netw. 2023 Jun;163:256-271. doi: 10.1016/j.neunet.2023.03.041. Epub 2023 Mar 31.

K-Anonymity inspired adversarial attack and multiple one-class classification defense.

Neural Netw. 2020 Apr;124:296-307. doi: 10.1016/j.neunet.2020.01.015. Epub 2020 Feb 6.

Frequency constraint-based adversarial attack on deep neural networks for medical image classification.

Comput Biol Med. 2023 Sep;164:107248. doi: 10.1016/j.compbiomed.2023.107248. Epub 2023 Jul 25.

Frequency-Tuned Universal Adversarial Attacks on Texture Recognition.

IEEE Trans Image Process. 2022;31:5856-5868. doi: 10.1109/TIP.2022.3202366. Epub 2022 Sep 8.

DEFEAT: Decoupled feature attack across deep neural networks.

Neural Netw. 2022 Dec;156:13-28. doi: 10.1016/j.neunet.2022.09.009. Epub 2022 Sep 20.

PMID:34038356
