Lin Guang, Tao Zerui, Zhang Jianhai, Tanaka Toshihisa, Zhao Qibin
Department of Electronic and Information Engineering, Tokyo University of Agriculture and Technology, 184-8588, Tokyo, Japan; RIKEN Center for Advanced Intelligence Project (AIP), 103-0027, Tokyo, Japan.
RIKEN Center for Advanced Intelligence Project (AIP), 103-0027, Tokyo, Japan.
Neural Netw. 2025 Nov;191:107705. doi: 10.1016/j.neunet.2025.107705. Epub 2025 Jul 8.
Diffusion model (DM) based adversarial purification (AP) has proven to be a powerful defense method that removes adversarial perturbations and generates threat-free purified examples. In principle, pre-trained DMs can only ensure that purified examples follow the same distribution as the training data; purification may therefore inadvertently destroy the semantic information of the input, causing purified examples to be misclassified. Recent advances introduce guided diffusion techniques that preserve semantic information while removing the perturbations. However, these guidances often rely on distance measures between purified examples and diffused examples, which can also preserve perturbations in the purified examples. To further unlock the robustness of DM-based AP, we propose an adversarially guided diffusion model, introducing a novel adversarial guidance that carries sufficient semantic information without explicitly involving adversarial perturbations. The guidance is modeled by an auxiliary neural network obtained through adversarial training, and it measures distance in latent representations rather than in pixel-level values. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that our method simultaneously maintains semantic information and removes adversarial perturbations. In addition, comprehensive comparisons show that our method significantly enhances the robustness of existing DM-based AP, improving average robust accuracy by up to 7.30% on CIFAR-10.
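The core idea of the abstract's guidance, measuring distance in latent representations rather than pixel values during the reverse diffusion process, can be sketched in miniature. This is a toy illustration under loud assumptions, not the paper's implementation: the adversarially trained auxiliary network is replaced by a fixed random linear map `W`, the pre-trained DM's reverse step by an identity placeholder `denoise`, and the step scale is arbitrary; all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the adversarially trained auxiliary encoder:
# a fixed random linear map from pixel space (64-dim) to latent space (16-dim).
W = rng.standard_normal((16, 64)) / 8.0

def latent_guidance_grad(x, x_adv):
    """Gradient w.r.t. x of the latent distance ||W x - W x_adv||^2."""
    return 2.0 * W.T @ (W @ (x - x_adv))

def guided_reverse_step(x_t, x_adv, denoise, scale=0.05):
    """One guided reverse step: denoise, then nudge the sample toward the
    input in *latent* space (not pixel space, which would re-inject the
    adversarial perturbation)."""
    x_prev = denoise(x_t)
    return x_prev - scale * latent_guidance_grad(x_prev, x_adv)

denoise = lambda x: x  # placeholder for the pre-trained DM's denoiser

x_adv = x = rng.standard_normal(64)            # toy adversarial input
x = x_adv + 0.3 * rng.standard_normal(64)      # its diffused (noised) version
d0 = np.linalg.norm(W @ (x - x_adv))           # initial latent distance

for _ in range(20):
    x = guided_reverse_step(x, x_adv, denoise)

d_final = np.linalg.norm(W @ (x - x_adv))      # guidance shrinks this distance
```

The point of the sketch is the gradient: it pulls the purified sample toward the input only along directions the encoder `W` considers meaningful, so semantic content is preserved while pixel-level perturbation directions outside the encoder's range receive no pull.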