Wiedeman Christopher, Wang Ge
Rensselaer Polytechnic Institute, Department of Electrical, Computer, and Systems Engineering, Troy, NY, USA.
Rensselaer Polytechnic Institute, Department of Biomedical Engineering, Troy, NY, USA.
Patterns (N Y). 2022 Mar 24;3(5):100472. doi: 10.1016/j.patter.2022.100472. eCollection 2022 May 13.
Adversarial attack transferability is well recognized in deep learning. Previous work has partially explained transferability by identifying common adversarial subspaces and correlations between decision boundaries, but little is known beyond that. We propose that transferability between seemingly different models arises from a high linear correlation between the feature sets that different networks extract. In other words, two models trained on the same task, even if they are distant in parameter space, likely extract features in the same fashion, linked by trivial affine transformations between their latent spaces. Furthermore, we show how applying a feature correlation loss, which decorrelates the extracted features in corresponding latent spaces, can reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose a dual-neck autoencoder (DNA), which leverages this feature correlation loss to create two meaningfully different encodings of input information with reduced transferability.
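The feature correlation loss described above can be sketched as a penalty on the cross-correlation between two latent feature sets. The following is a minimal illustration, not the paper's implementation: the function name, the standardization step, and the choice of mean squared cross-correlation as the penalty are all assumptions for this sketch.

```python
import numpy as np

def feature_correlation_loss(z1, z2, eps=1e-8):
    """Hypothetical sketch of a decorrelation penalty between two
    latent feature sets z1, z2 of shape (batch, dim).

    Not the authors' exact formulation; illustrates the idea of
    penalizing linear correlation between corresponding latent spaces.
    """
    # Standardize each feature dimension to zero mean, unit variance
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    # Cross-correlation matrix between the two latent spaces
    c = z1.T @ z2 / z1.shape[0]
    # Mean squared cross-correlation: near zero when the feature
    # sets are linearly uncorrelated
    return float(np.mean(c ** 2))
```

Minimizing such a term alongside the task loss for each network would push the two encoders toward linearly uncorrelated representations, which is the mechanism the abstract credits for reducing attack transferability.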