Zhang Lilin, Yang Ning, Sun Yanchao, Yu Philip S
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8302-8319. doi: 10.1109/TPAMI.2024.3400988. Epub 2024 Nov 6.
Adversarial training (AT) is widely considered the most promising strategy for defending against adversarial attacks and has drawn increasing interest from researchers. However, existing AT methods still face two challenges. First, they cannot handle unrestricted adversarial examples (UAEs), which are built from scratch, as opposed to restricted adversarial examples (RAEs), which are created by adding perturbations bounded by an ℓ_p norm to observed examples. Second, existing AT methods often achieve adversarial robustness at the expense of standard generalizability (i.e., accuracy on natural examples) because they trade one off against the other. To overcome these challenges, we propose a unique viewpoint that understands UAEs as imperceptibly perturbed unobserved examples. We also find that the tradeoff results from the separation between the distribution of adversarial examples and that of natural examples. Based on these ideas, we propose a novel AT approach called Provable Unrestricted Adversarial Training (PUAT), which can provide a target classifier with comprehensive adversarial robustness against both UAEs and RAEs while simultaneously improving its standard generalizability. Specifically, PUAT uses partially labeled data to achieve effective UAE generation by accurately capturing the natural data distribution through a novel augmented triple-GAN. At the same time, PUAT extends traditional AT by introducing the supervised loss of the target classifier into the adversarial loss and, in collaboration with the augmented triple-GAN, aligns the UAE distribution, the natural data distribution, and the distribution learned by the classifier. Finally, theoretical analysis and extensive experiments on widely used benchmarks demonstrate the superiority of PUAT.
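To make the abstract's definition of an RAE concrete, the following is a minimal sketch of adding a norm-bounded perturbation to an observed example. It is not part of PUAT and is not taken from the paper: the function names `project_to_lp_ball` and `make_restricted_example`, the budget `eps`, and the assumption that inputs lie in [0, 1] are all illustrative choices. The perturbation direction `delta` is supplied by the caller rather than computed by an attack, and only p = 2 and p = ∞ are handled. A UAE, by contrast, is generated from scratch rather than derived from an observed example, so it has no counterpart in this sketch.

```python
import math
from typing import List


def project_to_lp_ball(delta: List[float], eps: float, p: float) -> List[float]:
    """Return a copy of `delta` whose l_p norm is at most `eps`.

    Hypothetical helper for illustration; supports only p = 2 and p = inf.
    """
    if p == math.inf:
        # l_inf ball: clip each coordinate to [-eps, eps].
        return [max(-eps, min(eps, d)) for d in delta]
    if p == 2:
        # l_2 ball: rescale the whole vector if it is longer than eps.
        norm = math.sqrt(sum(d * d for d in delta))
        if norm <= eps:
            return list(delta)
        return [d * eps / norm for d in delta]
    raise ValueError("this sketch only supports p = 2 or p = inf")


def make_restricted_example(
    x: List[float], delta: List[float], eps: float, p: float
) -> List[float]:
    """Build an RAE-style input: observed example plus a bounded perturbation.

    Assumes every entry of `x` lies in [0, 1] (e.g. normalized pixels), so the
    final clipping can only shrink the perturbation, never enlarge it.
    """
    bounded = project_to_lp_ball(delta, eps, p)
    return [max(0.0, min(1.0, xi + di)) for xi, di in zip(x, bounded)]
```

For instance, `make_restricted_example([0.2, 0.5, 0.9], [0.3, -0.05, 0.3], 0.1, math.inf)` moves no coordinate by more than 0.1 and keeps the result inside [0, 1].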