Nakajima Tasuku, Maeda Keisuke, Togo Ren, Ogawa Takahiro, Haseyama Miki
Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Japan.
Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Japan.
Sensors (Basel). 2025 Apr 25;25(9):2736. doi: 10.3390/s25092736.
Adversarial attacks on large-scale vision-language foundation models, such as the contrastive language-image pretraining (CLIP) model, can significantly degrade performance across various tasks by generating adversarial examples that are indistinguishable from the original images to human perception. Although adversarial training methods, which train models with adversarial examples, have been proposed to defend against such attacks, they typically require prior knowledge of the attack. These methods also lead to a trade-off between robustness to adversarial examples and accuracy for clean images. To address these challenges, we propose an adversarial defense method based on human brain activity data by hypothesizing that such adversarial examples are not misrecognized by humans. The proposed method employs an encoder that integrates the features of brain activity and augmented images from the original images. Then, by maximizing the similarity between features predicted by the encoder and the original visual features, we obtain features with the visual invariance of the human brain and the diversity of data augmentation. Consequently, we construct a model that is robust against adversarial attacks and maintains accuracy for clean images. Unlike existing methods, the proposed method is not trained on any specific adversarial attack information; thus, it is robust against unknown attacks. Extensive experiments demonstrate that the proposed method significantly enhances robustness to adversarial attacks on the CLIP model without degrading accuracy for clean images. The primary contribution of this study is that the performance trade-off can be overcome using brain activity data.
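The abstract's training objective, maximizing the similarity between the encoder's predicted features and the original CLIP visual features, can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not the paper's actual implementation: it assumes the objective is a cosine-similarity alignment loss over feature vectors, with the 512-dimensional feature size chosen only as an example (matching CLIP ViT-B/32).

```python
import numpy as np

def cosine_alignment_loss(predicted, target, eps=1e-8):
    """Loss minimized when predicted features point in the same direction
    as the target (original image) features.

    Hypothetical sketch of a similarity-maximization objective; the paper's
    exact loss and encoder architecture are not specified in the abstract.
    """
    # L2-normalize both feature batches (eps avoids division by zero).
    p = predicted / (np.linalg.norm(predicted, axis=-1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cos = np.sum(p * t, axis=-1)       # per-sample cosine similarity in [-1, 1]
    return float(np.mean(1.0 - cos))   # 0 when features are perfectly aligned

# Toy check: identical features yield (near-)zero loss,
# orthogonal features yield a loss of 1.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))      # e.g. a batch of CLIP-sized features
print(cosine_alignment_loss(feats, feats) < 1e-6)   # -> True
```

In this framing, minimizing the loss for both brain-activity-derived and augmented-image features would pull the encoder's outputs toward the original visual features, which is how the abstract describes obtaining features with human visual invariance and augmentation diversity.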