Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany.
Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany; Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany; BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany.
Neural Netw. 2021 May;137:1-17. doi: 10.1016/j.neunet.2020.12.024. Epub 2021 Jan 9.
Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods have been proposed, which, however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ongoing arms race, robustness against adversarial attacks remains a challenging problem. This paper proposes a novel, simple yet effective defense strategy in which off-manifold adversarial samples are driven towards high-density regions of the data-generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA), with the perceptual boundary taken into account. To achieve this, we introduce a generative model of the conditional distribution of the inputs given labels, which can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: the projection is distributed broadly, which prevents white-box attacks from accurately aligning the input to create an effective adversarial sample. MALADE is applicable to any existing classifier, providing robust defense as well as off-manifold sample detection. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
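For intuition, the core relaxation step described in the abstract is a standard MALA update: the input is nudged along an estimate of the score (the gradient of the log conditional density), with injected Gaussian noise and a Metropolis-Hastings correction. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation; `log_density` and `score_fn` are hypothetical placeholders (in the paper the score is obtained from the sDAE, roughly proportional to the reconstruction residual).

```python
import numpy as np

def mala_step(x, log_density, score_fn, step=1e-2, rng=None):
    """One Metropolis-adjusted Langevin (MALA) update of x toward high-density regions.

    Sketch only: log_density and score_fn are user-supplied placeholders standing in
    for the sDAE-based estimates used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(x.shape)
    # Langevin proposal: drift along the estimated score plus Gaussian exploration noise
    prop = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise

    def log_q(a, b):
        # Log of the proposal density q(a | b) induced by the Langevin drift from b
        return -np.sum((a - b - step * score_fn(b)) ** 2) / (4.0 * step)

    # Metropolis-Hastings acceptance keeps the chain faithful to the target density
    log_alpha = log_density(prop) - log_density(x) + log_q(x, prop) - log_q(prop, x)
    return prop if np.log(rng.uniform()) < log_alpha else x
```

Repeating such steps on a (possibly adversarial) input pulls it back toward the data manifold of the target class before it is passed to the classifier; the injected noise is what makes the resulting projection broadly dispersed rather than deterministic.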