IEEE Trans Image Process. 2021;30:1291-1304. doi: 10.1109/TIP.2020.3042083. Epub 2020 Dec 23.
Deep neural networks (DNNs) are vulnerable to adversarial examples, in which inputs carrying imperceptible perturbations mislead DNNs into producing incorrect results. Despite the potential risk they pose, adversarial examples are also valuable because they reveal the weaknesses and blind spots of DNNs. Interpreting a DNN in the adversarial setting therefore aims to explain the rationale behind its decision-making process and to deepen our understanding of the model, which in turn leads to better practical applications. To this end, we explain the adversarial robustness of deep models from a new perspective: neuron sensitivity, measured by how strongly a neuron's behavior varies between benign and adversarial examples. In this paper, we first establish a close connection between adversarial robustness and neuron sensitivity, as sensitive neurons make the most non-trivial contributions to model predictions in the adversarial setting. Based on this connection, we further propose to improve adversarial robustness by stabilizing the behaviors of sensitive neurons. Moreover, we demonstrate that state-of-the-art adversarial training methods improve model robustness by reducing neuron sensitivity, which in turn confirms the strong connection between adversarial robustness and neuron sensitivity. Extensive experiments on various datasets demonstrate that our algorithm is effective and achieves excellent results. To the best of our knowledge, we are the first to study adversarial robustness using neuron sensitivity.
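The abstract describes neuron sensitivity only in words ("neuron behavior variation intensity against benign and adversarial examples") and does not give a formula. The sketch below is one plausible formalization, not the authors' implementation. It assumes that each neuron (for example, a convolutional channel) produces D output values per input, that sensitivity is the L1 distance between a neuron's outputs on a benign input and its paired adversarial input, normalized by D and averaged over N pairs, and that "stabilizing" sensitive neurons can be expressed as a penalty on the summed sensitivity of the top-k neurons. The function names `neuron_sensitivity`, `top_k_sensitive`, and `stabilization_penalty` are hypothetical.

```python
import numpy as np


def neuron_sensitivity(benign: np.ndarray, adversarial: np.ndarray) -> np.ndarray:
    """Assumed per-neuron sensitivity from paired activations.

    benign, adversarial: arrays of shape (N, K, D) holding, for N paired
    benign/adversarial examples, the outputs of K neurons, each flattened
    to D values. Returns an array of shape (K,).
    """
    if benign.shape != adversarial.shape or benign.ndim != 3:
        raise ValueError("expected two arrays of identical shape (N, K, D)")
    # L1 distance per example and neuron, normalized by the neuron's output size D
    per_example = np.abs(benign - adversarial).sum(axis=2) / benign.shape[2]
    # Average the behavior variation over the N example pairs
    return per_example.mean(axis=0)


def top_k_sensitive(sensitivity: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most sensitive neurons, most sensitive first."""
    return np.argsort(sensitivity)[::-1][:k]


def stabilization_penalty(benign: np.ndarray, adversarial: np.ndarray, k: int) -> float:
    """Hypothetical regularizer: total sensitivity of the k most sensitive neurons.

    Adding lambda * penalty to a training loss would push those neurons to
    respond similarly to benign and adversarial inputs.
    """
    s = neuron_sensitivity(benign, adversarial)
    return float(s[top_k_sensitive(s, k)].sum())


if __name__ == "__main__":
    benign = np.zeros((2, 3, 4))
    adversarial = benign.copy()
    adversarial[:, 1, :] = 1.0   # neuron 1 shifts by 1 at every position
    adversarial[0, 2, :2] = 0.5  # neuron 2 shifts on half its positions, first pair only
    s = neuron_sensitivity(benign, adversarial)
    print(s)                                       # per-neuron sensitivities
    print(top_k_sensitive(s, 2))                   # most sensitive neurons first
    print(stabilization_penalty(benign, adversarial, 2))
```

In a real training loop the activations would come from the model itself (for example, via forward hooks in a deep learning framework), and the penalty would be computed on differentiable tensors rather than NumPy arrays so that it can be backpropagated.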