Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks.

Authors

Zhang Lei, Zhou Yuhang, Yang Yi, Gao Xinbo

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6669-6687. doi: 10.1109/TPAMI.2024.3385745. Epub 2024 Sep 5.

Abstract

Despite providing high-performance solutions for computer vision tasks, deep neural network (DNN) models have been shown to be extremely vulnerable to adversarial attacks. Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked. Moreover, commonly used adaptive learning and fine-tuning techniques are unsuitable for adversarial defense, since defense is essentially a zero-shot problem at deployment. To tackle this challenge, we propose an attack-agnostic defense method named Meta Invariance Defense (MID). Specifically, various combinations of adversarial attacks are randomly sampled from a manually constructed Attacker Pool to constitute different defense tasks against unknown attacks, in which a student encoder is supervised by multi-consistency distillation to learn attack-invariant features via a meta principle. The proposed MID has two merits: 1) Full distillation at the pixel, feature, and prediction levels between benign and adversarial samples facilitates the discovery of attack invariance. 2) The model simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration. Theoretical and empirical studies on numerous benchmarks such as ImageNet verify the generalizable robustness and superiority of MID under various attacks.
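The two mechanisms named in the abstract, task sampling from an Attacker Pool and consistency between benign and adversarial samples at three levels, can be illustrated with a minimal sketch. It is not the authors' implementation. The pool contents, the split of each task into a training subset and a held-out "unknown" subset, the choice of mean squared error and KL divergence as distances, and the equal loss weights are all assumptions made for illustration; the abstract specifies none of them. The `pixels` entry is assumed to stand for an image regenerated from each input.

```python
import math
import random

# Hypothetical pool: the abstract does not list which attacks it contains.
ATTACKER_POOL = ["FGSM", "PGD", "CW", "BIM"]


def sample_defense_task(pool, n_train, n_test, rng):
    """Draw two disjoint attack subsets from the pool.

    Assumption: one subset is trained on, the other is held out to
    simulate an unknown attack within the same task.
    """
    chosen = rng.sample(pool, n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


def mse(a, b):
    """Mean squared error between two equal-length lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_consistency_loss(benign, adversarial, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of pixel-, feature- and prediction-level discrepancies.

    Both arguments are dicts with 'pixels', 'features' and 'logits' lists.
    The distances and the weights are illustrative assumptions.
    """
    w_pix, w_feat, w_pred = weights
    pixel = mse(benign["pixels"], adversarial["pixels"])
    feature = mse(benign["features"], adversarial["features"])
    pred = kl_divergence(softmax(benign["logits"]),
                         softmax(adversarial["logits"]))
    return w_pix * pixel + w_feat * feature + w_pred * pred
```

A call such as `sample_defense_task(ATTACKER_POOL, 2, 1, random.Random(0))` returns one task. In a training loop, a fresh task would be drawn at each iteration and the loss minimized with respect to the student encoder's parameters, with the benign-side outputs acting as the distillation target.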
