
Strongly concealed adversarial attack against text classification models with limited queries.

Author Information

Cheng Yao, Luo Senlin, Wan Yunwei, Pan Limin, Li Xinshuai

Affiliation

School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, PR China.

Publication Information

Neural Netw. 2025 Mar;183:106971. doi: 10.1016/j.neunet.2024.106971. Epub 2024 Nov 30.

Abstract

In black-box scenarios, adversarial attacks against text classification models struggle to produce highly usable adversarial samples, in particular because long texts incur a large number of invalid queries. Existing methods select distractor words by comparing the confidence vectors obtained before and after deleting each word, so the number of queries grows linearly with text length, making these methods hard to apply when queries are limited. Generating adversarial samples from a thesaurus can introduce semantic inconsistencies and even grammatical errors, making the adversarial samples easy for the target model to recognize and lowering the attack success rate. This paper proposes AdATCM, a parallel and highly stealthy Adversarial Attack against Text Classification Models that jointly reinforces the dual tasks of attack and generation. The method does not query the target model while selecting distractors; instead, it computes word importance directly from contextual information and selects all distractors in a single pass, strengthening the concealment of the attack. A KL-divergence loss, a cross-entropy loss, and an adversarial loss are combined into the objective function used to train the adversarial-sample attack model, so that the generated adversarial samples fit the original sample distribution and raise the attack success rate. Experimental results show that the method achieves a high success rate and strong concealment while effectively reducing the number of attack queries on long texts.
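The abstract's key efficiency claim is that word importance is computed from contextual information alone, with no queries to the target classifier. The sketch below illustrates one plausible way to do this: mask each token in turn and score it by how poorly a pretrained masked language model predicts it from context, batching all positions into a single forward pass. This is an illustrative stand-in, not the authors' exact scoring function, which the abstract does not specify; the choice of bert-base-uncased and of surprisal as the score are assumptions.

```python
# A minimal sketch of query-free, context-based word-importance scoring.
# Assumption: a masked-LM surprisal score is used as a proxy for the
# paper's (unspecified) contextual importance measure. The target
# classifier is never queried.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def contextual_importance(text: str) -> list[tuple[str, float]]:
    """Score each token by how surprising it is to a masked LM in context.

    A token the context cannot predict carries more of the sentence's
    meaning, so it is a stronger candidate position for substitution.
    All positions are scored in one batched pass (token-level, so subword
    pieces are scored individually in this sketch).
    """
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    ids = enc["input_ids"][0]
    n = ids.size(0)
    # One masked position per row; positions 1..n-2 skip [CLS] and [SEP].
    batch = ids.repeat(n - 2, 1)
    for row, pos in enumerate(range(1, n - 1)):
        batch[row, pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = mlm(input_ids=batch).logits  # (n-2, n, vocab)
    scores = []
    for row, pos in enumerate(range(1, n - 1)):
        log_probs = torch.log_softmax(logits[row, pos], dim=-1)
        surprisal = -log_probs[ids[pos]].item()  # higher = more important
        scores.append((tokenizer.decode([ids[pos].item()]), surprisal))
    return sorted(scores, key=lambda t: t[1], reverse=True)

print(contextual_importance("The movie was a complete waste of time.")[:5])
```

Because every masked position is scored in one batch, the cost is a single model pass rather than one target-model query per word, which is what makes the selection parallel and query-free in the sense the abstract describes.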
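The abstract also states that the attack model is trained with an objective combining a KL-divergence loss, a cross-entropy loss, and an adversarial loss. The sketch below shows one hedged reading of such a combined objective: KL to keep generated samples close to the original sample distribution (concealment), token-level cross entropy for fidelity to the original text, and an adversarial term that pushes a classifier away from the true label. The weighting scheme, the exact form of each term, and the use of a local surrogate classifier are all assumptions; the paper defines the actual objective.

```python
# A hedged sketch of a combined KL + cross-entropy + adversarial objective,
# under assumed tensor shapes and uniform default weights. Not the paper's
# exact formulation.
import torch
import torch.nn.functional as F

def adatcm_objective(gen_token_logits, orig_token_ids,
                     gen_log_probs, orig_probs,
                     surrogate_logits, true_labels,
                     alpha=1.0, beta=1.0, gamma=1.0):
    """Total loss = alpha*KL + beta*CE + gamma*adversarial (weights assumed)."""
    # KL divergence: the generated sample distribution should fit the
    # original sample distribution (concealment).
    kl = F.kl_div(gen_log_probs, orig_probs, reduction="batchmean")
    # Cross entropy: token-level reconstruction of the original text
    # (fluency / faithfulness of the generator).
    ce = F.cross_entropy(
        gen_token_logits.reshape(-1, gen_token_logits.size(-1)),
        orig_token_ids.reshape(-1))
    # Adversarial term: reduce the (assumed surrogate) classifier's
    # confidence in the true label, i.e. maximize its cross entropy.
    adv = -F.cross_entropy(surrogate_logits, true_labels)
    return alpha * kl + beta * ce + gamma * adv

# Toy shapes: batch of 2 sequences of length 5, vocab 100, 3 classes.
loss = adatcm_objective(
    gen_token_logits=torch.randn(2, 5, 100),
    orig_token_ids=torch.randint(0, 100, (2, 5)),
    gen_log_probs=torch.log_softmax(torch.randn(2, 100), dim=-1),
    orig_probs=torch.softmax(torch.randn(2, 100), dim=-1),
    surrogate_logits=torch.randn(2, 3),
    true_labels=torch.randint(0, 3, (2,)))
print(loss)
```

The KL and cross-entropy terms pull the generator toward samples that look like the originals, while the negated classification loss supplies the attack signal; balancing the three is what lets the generated samples both evade and blend in.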

