

Gradients Cannot Be Tamed: Behind the Impossible Paradox of Blocking Targeted Adversarial Attacks.

Publication Information

IEEE Trans Neural Netw Learn Syst. 2021 Jan;32(1):128-138. doi: 10.1109/TNNLS.2020.2977142. Epub 2021 Jan 4.

Abstract

Despite their accuracy, neural network-based classifiers are still prone to manipulation through adversarial perturbations. These perturbations are designed to be misclassified by the neural network while being perceptually identical to some valid inputs. The vast majority of such attack methods rely on white-box conditions, where the attacker has full knowledge of the attacked network's parameters. This allows the attacker to calculate the network's loss gradient with respect to some valid inputs and use this gradient in order to create an adversarial example. The task of blocking white-box attacks has proved difficult to address. While many defense methods have been suggested, they have had limited success. In this article, we examine this difficulty and try to understand it. We systematically explore the capabilities and limitations of defensive distillation, one of the most promising defense mechanisms against adversarial perturbations suggested so far, in order to understand this defense challenge. We show that contrary to commonly held belief, the ability to bypass defensive distillation is not dependent on an attack's level of sophistication. In fact, simple approaches, such as the targeted gradient sign method, are capable of effectively bypassing defensive distillation. We prove that defensive distillation is highly effective against nontargeted attacks but is unsuitable for targeted attacks. This discovery led to our realization that targeted attacks leverage the same input gradient that allows a network to be trained. This implies that blocking them comes at the cost of losing the network's ability to learn, presenting an impossible tradeoff to the research community.
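To make the attack the abstract refers to concrete, below is a minimal sketch of a targeted gradient sign step in PyTorch. The abstract does not specify an implementation, so the framework, the function name targeted_fgsm, the epsilon budget, and the [0, 1] input range are illustrative assumptions; the core idea is simply to step against the sign of the input gradient of the loss computed with the attacker-chosen target label.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class, epsilon):
    """Targeted gradient sign method (illustrative sketch, not the paper's code).

    x            : batch of valid inputs, values assumed in [0, 1]
    target_class : LongTensor of attacker-chosen class indices
    epsilon      : perturbation budget per pixel
    """
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)
    # Loss is computed against the attacker's desired (target) label,
    # not the true label.
    loss = F.cross_entropy(logits, target_class)
    loss.backward()
    # Step *against* the sign of the input gradient to decrease the loss
    # for the target class, then clamp back to the valid input range.
    perturbed = x_adv - epsilon * x_adv.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

The input gradient used here is the same quantity backpropagation relies on to train the network's weights, which is the tension the authors highlight: a defense that suppresses this gradient also undermines the network's ability to learn.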

