Kong Xiangyin, Jiang Xiaoyu, Song Zhihuan, Ge Zhiqiang
IEEE Trans Pattern Anal Mach Intell. 2025 May 21;PP. doi: 10.1109/TPAMI.2025.3572245.
Deep neural networks (DNNs) have achieved satisfactory performance in many fields. However, recent studies have shown that DNNs are easily fooled by adversarial examples. A highly effective strategy for mitigating the threat posed by adversarial attacks is to design detectors that reject adversarial examples. This article proposes an unsupervised, class-free, and classifier-free adversarial detection method. It is trained only on unlabeled clean data to identify illegal samples, and it requires no knowledge of the adversarial examples, the sample classes, or the original classifier. More specifically, motivated by the idea that adversarial examples may differ significantly from benign data in their structural information, we develop an adversarial detector that simultaneously captures the residual information and the variable-wise structural relationships of the data. We then design an attribute called data identity (ID), which combines the extracted residual and structural information to identify adversarial examples. We validate the proposed method by detecting adversarial attacks on the CIFAR-10 and ImageNet datasets; the experimental results show that our model performs best among various state-of-the-art adversarial detectors. We also conduct visualization experiments to illustrate the role of structural information in detecting adversarial examples.
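The abstract does not specify the detector's architecture, so the following is only a toy sketch of the general idea (train on unlabeled clean data, then score a sample by combining a residual term with a variable-wise structural term). It is not the authors' method: the PCA reconstruction residual, the correlation-matrix comparison, the additive combination, and all names (`fit_detector`, `score`, `is_adversarial`, `n_components`, `quantile`) are assumptions chosen for illustration.

```python
import numpy as np


def _raw_scores(model, x):
    """Return per-sample [residual, structural] scores (illustrative stand-ins)."""
    z = (x - model["mean"]) / model["std"]
    # Residual term: distance between a sample and its low-rank reconstruction.
    recon = z @ model["basis"].T @ model["basis"]
    residual = np.linalg.norm(z - recon, axis=1)
    # Structural term: how far the sample's variable-by-variable outer product
    # lies from the correlation matrix estimated on clean data.
    outer = z[:, :, None] * z[:, None, :]
    structural = np.linalg.norm(outer - model["corr"], axis=(1, 2))
    return np.stack([residual, structural], axis=1)


def score(model, x):
    """Combine both terms into one score (hypothetical stand-in for a data ID)."""
    normalised = (_raw_scores(model, x) - model["loc"]) / model["scale"]
    return normalised.sum(axis=1)


def fit_detector(clean, n_components=3, quantile=0.95):
    """Fit on unlabeled clean data only (rows: samples, columns: variables)."""
    mean = clean.mean(axis=0)
    std = clean.std(axis=0) + 1e-8
    z = (clean - mean) / std
    # Principal directions of the clean data via SVD.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    model = {
        "mean": mean,
        "std": std,
        "basis": vt[:n_components],
        "corr": (z.T @ z) / len(z),
    }
    # Put the two terms on a comparable scale using clean-data statistics.
    raw = _raw_scores(model, clean)
    model["loc"] = raw.mean(axis=0)
    model["scale"] = raw.std(axis=0) + 1e-8
    # Reject anything scoring above the chosen quantile of clean scores.
    model["threshold"] = np.quantile(score(model, clean), quantile)
    return model


def is_adversarial(model, x):
    """Flag samples whose combined score exceeds the clean-data threshold."""
    return score(model, x) > model["threshold"]
```

In this sketch no labels, classifier outputs, or attack samples are used during fitting, which mirrors the class-free and classifier-free setting described above. The threshold is set purely from clean data, so the `quantile` argument directly controls the expected false-rejection rate on benign inputs.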