Wang Shuo, Chen Shangyu, Chen Tianle, Nepal Surya, Rudolph Carsten, Grobler Marthie
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17070-17084. doi: 10.1109/TNNLS.2023.3299408. Epub 2024 Dec 2.
The susceptibility of deep neural networks (DNNs) to adversarial intrusions, exemplified by adversarial examples, is well documented. Conventional attacks apply unstructured, pixel-wise perturbations to mislead classifiers; these perturbations often depart noticeably from natural samples and lack human-perceptible interpretability. In this work, we present an adversarial attack strategy that applies fine-grained, semantically oriented structural perturbations. Our method manipulates the semantic attributes of images through disentangled latent codes, engineering adversarial perturbations by altering either a single latent code or a combination of codes. To this end, we propose two unsupervised semantic manipulation strategies, one based on vector-disentangled representations and the other on feature-map-disentangled representations, chosen in view of the complexity of the latent codes and the smoothness of the reconstructed images. Extensive empirical evaluations on real-world image data demonstrate the potency of our attacks, particularly against black-box classifiers. Furthermore, we establish the existence of universal semantic adversarial examples that are agnostic to specific images.
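The core idea of the abstract — perturbing a disentangled latent code and decoding, instead of perturbing pixels directly — can be sketched with toy stand-ins. The generator, black-box classifier, and latent space below are assumptions for illustration only, not the paper's actual models or attack algorithm:

```python
# Minimal sketch of a semantic attack via latent-code manipulation.
# All components here are toy stand-ins (assumptions), not the paper's models.
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": maps a 4-D disentangled latent code to an 8-D "image".
W_gen = rng.normal(size=(8, 4))
def generate(z):
    return W_gen @ z

# Toy black-box "classifier": queried only through its predicted label.
w_clf = rng.normal(size=8)
def classify(x):
    return int(w_clf @ x > 0.0)

def semantic_attack(z, target_dim, step=0.1, max_steps=2000):
    """Greedily shift a single latent code (dimension `target_dim`) until
    the decoded image flips the classifier's label; query-only access."""
    orig_label = classify(generate(z))
    for sign in (+1.0, -1.0):          # try both directions along the code
        z_adv = z.copy()
        for _ in range(max_steps):
            z_adv[target_dim] += sign * step
            if classify(generate(z_adv)) != orig_label:
                return z_adv
    return None                        # attack failed within the budget

z0 = rng.normal(size=4)
z_adv = semantic_attack(z0, target_dim=0)
```

Because only one latent dimension changes, the resulting perturbation is structural in the decoded space rather than unstructured and pixel-wise, which is the contrast the abstract draws with conventional attacks.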