Mu Hua, Li Chenggang, Peng Anjie, Wang Yangyang, Liang Zhenyu
College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China.
The First People's Hospital of Guangyuan, Guangyuan 628017, China.
Sensors (Basel). 2025 Mar 12;25(6):1770. doi: 10.3390/s25061770.
The threat posed by adversarial examples (AEs) to deep learning applications has garnered significant attention from the academic community. In response, various defense strategies have been proposed, including adversarial example detection, and a range of detection algorithms has been developed to differentiate between benign samples and adversarial examples. However, the detection accuracy of these algorithms is strongly influenced by the characteristics of the adversarial attack, such as its type and intensity. Furthermore, the impact of image preprocessing, a common step before adversarial example generation, on detection robustness has been largely overlooked in prior research. To address these challenges, this paper introduces a novel adversarial example detection algorithm based on high-level feature differences (HFDs), specifically designed to improve robustness against both attacks and preprocessing operations. For each test image, a counterpart image with the same predicted label is randomly selected from the training dataset. The high-level features of both images are extracted by an encoder and compared through a similarity measurement model; if the feature similarity is low, the test image is classified as an adversarial example. The proposed method was evaluated for detection accuracy against four comparison methods, showing significant improvements over FS, DF, and MD, with performance comparable to ESRM. The subsequent robustness experiments therefore focused exclusively on ESRM. Our results demonstrate that the proposed method exhibits superior robustness against preprocessing operations, such as downsampling and common corruptions, applied by attackers before generating adversarial examples, and that it is applicable to various target models. By exploiting semantic conflicts in high-level features between clean and adversarial examples with the same predicted label, the method achieves high detection accuracy across diverse attack types while remaining resilient to preprocessing, offering a valuable new perspective for the design of adversarial example detection algorithms.
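The detection procedure described in the abstract can be summarized in a few lines of code. The following is a minimal sketch, assuming PyTorch; the function names, the cosine-similarity stand-in for the paper's learned similarity measurement model, the `train_by_label` index, and the fixed `threshold` are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
import torch.nn.functional as F

def detect_adversarial(x, classifier, encoder, train_by_label, threshold=0.5):
    """Flag x (a CxHxW image tensor) as adversarial when its high-level
    features disagree with those of a same-label training image.

    train_by_label: dict mapping class index -> list of training image tensors.
    """
    with torch.no_grad():
        # Predicted label of the test image.
        pred = classifier(x.unsqueeze(0)).argmax(dim=1).item()
        # Randomly select a counterpart training image with the same predicted label.
        counterpart = random.choice(train_by_label[pred])
        # Extract high-level features of both images with the encoder.
        f_test = encoder(x.unsqueeze(0)).flatten(1)
        f_ref = encoder(counterpart.unsqueeze(0)).flatten(1)
        # Cosine similarity stands in here for the paper's learned
        # similarity measurement model (an assumption of this sketch).
        score = F.cosine_similarity(f_test, f_ref).item()
    # Low feature similarity -> classify as an adversarial example.
    return score < threshold
```

The intuition matches the abstract: a clean image and a same-label training image should share high-level semantics, so their encoder features align; an adversarial example carries the semantics of its true class, producing a semantic conflict and a low similarity score.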