

Strongly concealed adversarial attack against text classification models with limited queries.

Authors

Cheng Yao, Luo Senlin, Wan Yunwei, Pan Limin, Li Xinshuai

Affiliations

School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, PR China.

Publication

Neural Netw. 2025 Mar;183:106971. doi: 10.1016/j.neunet.2024.106971. Epub 2024 Nov 30.

DOI: 10.1016/j.neunet.2024.106971
PMID: 39662200
Abstract

In black-box scenarios, adversarial attacks against text classification models face challenges in producing highly usable adversarial samples; in particular, long texts incur a large number of invalid queries. Existing methods select distractor words by comparing the confidence vectors obtained before and after deleting each word, so the number of queries grows linearly with text length, making these methods difficult to apply in query-limited attack scenarios. Generating adversarial samples from a thesaurus can introduce semantic inconsistencies and even grammatical errors, making the adversarial samples easy for the target model to recognize and lowering the attack success rate. A parallel and highly stealthy Adversarial Attack against Text Classification Models (AdATCM) is proposed, which reinforces the dual tasks of attack and generation. The method does not query the target model while selecting distractors; instead, it uses contextual information to compute word importance directly and selects all distractors in one pass, strengthening the concealment of the attack. KL divergence loss, cross-entropy loss, and adversarial loss are integrated into an objective function for training the adversarial-sample attack model, so that the generated adversarial samples fit the original sample distribution and raise the attack success rate. Experimental results show that the method achieves a high success rate and strong concealment while effectively reducing the number of attack queries on long texts.
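The abstract names three loss terms combined into one training objective. A minimal sketch of such a weighted combination, in plain Python — note this is an illustration, not the paper's implementation: the weights `w_kl`/`w_ce`/`w_adv` and the particular untargeted form of the adversarial loss are assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q): target distribution p, predicted distribution q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def adversarial_loss(target_probs, true_label):
    """Untargeted adversarial term (assumed form): penalize probability mass
    the target model still assigns to the true label."""
    return -math.log(max(1e-12, 1.0 - target_probs[true_label]))

def combined_objective(orig_dist, gen_dist, target_probs, true_label,
                       w_kl=1.0, w_ce=1.0, w_adv=1.0):
    """Weighted sum of the three loss terms mentioned in the abstract:
    KL divergence and cross entropy pull the generated-sample distribution
    toward the original, while the adversarial term rewards fooling the model."""
    return (w_kl * kl_divergence(orig_dist, gen_dist)
            + w_ce * cross_entropy(orig_dist, gen_dist)
            + w_adv * adversarial_loss(target_probs, true_label))
```

The distribution-matching terms (KL, cross entropy) and the attack term pull in opposite directions, so the weights set the trade-off between fidelity to the original sample and attack strength.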


Similar Articles

1
Strongly concealed adversarial attack against text classification models with limited queries.
Neural Netw. 2025 Mar;183:106971. doi: 10.1016/j.neunet.2024.106971. Epub 2024 Nov 30.
2
HyGloadAttack: Hard-label black-box textual adversarial attacks via hybrid optimization.
Neural Netw. 2024 Oct;178:106461. doi: 10.1016/j.neunet.2024.106461. Epub 2024 Jun 12.
3
Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet.
IEEE Trans Pattern Anal Mach Intell. 2022 Apr;44(4):2188-2197. doi: 10.1109/TPAMI.2020.3033291. Epub 2022 Mar 4.
4
SMGEA: A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories.
IEEE Trans Neural Netw Learn Syst. 2022 Mar;33(3):1051-1065. doi: 10.1109/TNNLS.2020.3039295. Epub 2022 Feb 28.
5
An Optimized Black-Box Adversarial Simulator Attack Based on Meta-Learning.
Entropy (Basel). 2022 Sep 27;24(10):1377. doi: 10.3390/e24101377.
6
Adv-BDPM: Adversarial attack based on Boundary Diffusion Probability Model.
Neural Netw. 2023 Oct;167:730-740. doi: 10.1016/j.neunet.2023.08.048. Epub 2023 Sep 9.
7
ABCAttack: A Gradient-Free Optimization Black-Box Attack for Fooling Deep Image Classifiers.
Entropy (Basel). 2022 Mar 15;24(3):412. doi: 10.3390/e24030412.
8
A Distributed Black-Box Adversarial Attack Based on Multi-Group Particle Swarm Optimization.
Sensors (Basel). 2020 Dec 14;20(24):7158. doi: 10.3390/s20247158.
9
Auto encoder-based defense mechanism against popular adversarial attacks in deep learning.
PLoS One. 2024 Oct 21;19(10):e0307363. doi: 10.1371/journal.pone.0307363. eCollection 2024.
10
Query-Efficient Black-Box Adversarial Attacks Guided by a Transfer-Based Prior.
IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9536-9548. doi: 10.1109/TPAMI.2021.3126733. Epub 2022 Nov 7.