Jia Xiaojun, Zhang Yong, Wei Xingxing, Wu Baoyuan, Ma Ke, Wang Jue, Cao Xiaochun
IEEE Trans Pattern Anal Mach Intell. 2024 Sep;46(9):6367-6383. doi: 10.1109/TPAMI.2024.3381180. Epub 2024 Aug 6.
Fast adversarial training (FAT) is an efficient method for improving robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, in which robustness drops dramatically and suddenly after a few training epochs. Although various FAT variants have been proposed to prevent overfitting, they incur high training costs. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of the adversarial examples deteriorates. Based on this observation, we propose a positive prior-guided adversarial initialization that prevents overfitting by improving adversarial example quality without extra training time. The initialization is generated from high-quality adversarial perturbations collected during the historical training process. We provide a theoretical analysis of the proposed initialization and propose a prior-guided regularization method that boosts the smoothness of the loss function. Additionally, we design a prior-guided ensemble FAT method that averages the weights of historical models using different decay rates. Our proposed method, called FGSM-PGK, assembles the prior-guided knowledge, i.e., the prior-guided initialization and model weights, acquired during the historical training process. The proposed method effectively improves the model's adversarial robustness in white-box attack scenarios. Evaluations on four datasets demonstrate the superiority of the proposed method.
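The two ideas sketched above can be illustrated with a minimal NumPy toy: an FGSM-style step that starts from the perturbation kept from the previous epoch (the "prior-guided initialization") rather than a zero or random one, and an exponential moving average over historical model weights. This is a hedged sketch only, not the paper's implementation; the function name `fgsm_perturb`, the budget/step values, the single decay rate, and the stand-in gradient and training updates are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, alpha = 8 / 255, 10 / 255  # assumed L_inf budget and step size

def fgsm_perturb(grad_sign, prior, eps, alpha):
    """One FGSM step started from a prior perturbation (carried over from
    the historical training process) and clipped back into the eps-ball."""
    return np.clip(prior + alpha * grad_sign, -eps, eps)

# Toy loop: the prior is the perturbation retained from the previous epoch,
# and ema_weights averages historical model weights with one decay rate
# (the paper's ensemble uses several different decay rates).
prior = np.zeros(4)
weights = rng.normal(size=4)
ema_weights, decay = weights.copy(), 0.9
for epoch in range(3):
    grad_sign = np.sign(rng.normal(size=4))  # stand-in for sign of dL/dx
    prior = fgsm_perturb(grad_sign, prior, eps, alpha)
    weights += 0.01 * rng.normal(size=4)     # stand-in training update
    ema_weights = decay * ema_weights + (1 - decay) * weights
```

Because the prior is clipped after every step, the adversarial perturbation never leaves the eps-ball even though it accumulates information across epochs.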