Hard label adversarial attack with high query efficiency against NLP models.

Author information

Qiu Shilin, Liu Qihe, Zhou Shijie, Gou Min, Zeng Yi, Zhang Zhun, Wu Zhewei

Affiliation

School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China.

Publication information

Sci Rep. 2025 Mar 18;15(1):9378. doi: 10.1038/s41598-025-93566-5.

Abstract

Current black-box adversarial attacks have demonstrated significant efficacy in creating adversarial texts against natural language processing models, exposing potential robustness vulnerabilities of these models. However, present attack techniques exhibit inefficiency due to their failure to account for the query counts needed in the adversarial text generation process, causing a disparity between the existing methodology and the practical adversarial attack scenario. To this end, this work proposes a query-efficient hard-label attack method called QEAttack, which leverages the genetic algorithm to produce persuasive and semantically equivalent adversarial texts relying solely on observing the final predicted label output by the victim model. To reduce query counts, a dual-gradient fusion strategy and a locality sensitive hashing based sentence-level semantic clustering strategy are proposed and applied to the crossover and mutation steps, respectively. Extensive experiments and ablation studies are conducted on three victim models with varying architectures across five benchmark datasets. The results demonstrate that QEAttack consistently achieves high attack success rates with significantly reduced query counts, while maintaining or even enhancing the imperceptibility and quality of generated adversarial texts.
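The abstract does not spell out how the locality sensitive hashing (LSH) based clustering saves queries, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes random-hyperplane LSH over precomputed sentence embeddings: candidates whose embeddings hash to the same bucket are treated as semantically similar, and only one representative per bucket is sent to the victim model. All names here (`make_hyperplanes`, `lsh_signature`, `bucket_candidates`, `query_representatives`, `predict_label`) are hypothetical, and the embeddings are toy values.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, hyperplanes):
    """Sign pattern of vec against each hyperplane; nearby vectors tend to share it."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def bucket_candidates(embeddings, hyperplanes):
    """Group candidate indices by LSH signature (one bucket per signature)."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(embeddings):
        buckets[lsh_signature(vec, hyperplanes)].append(idx)
    return dict(buckets)


def query_representatives(candidates, embeddings, hyperplanes, predict_label, true_label):
    """Query the hard-label oracle once per bucket instead of once per candidate.

    Returns (adversarial_text_or_None, number_of_queries_used).
    """
    queries = 0
    for members in bucket_candidates(embeddings, hyperplanes).values():
        representative = candidates[members[0]]
        queries += 1  # only the final predicted label is observed
        if predict_label(representative) != true_label:
            return representative, queries
    return None, queries


if __name__ == "__main__":
    # Toy stand-in for a victim model: exposes a label only, no scores.
    def predict_label(text):
        return "neg" if "bad" in text else "pos"

    candidates = ["good movie", "good film", "bad movie"]
    embeddings = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]  # assumed, not real embeddings
    planes = make_hyperplanes(dim=2, n_bits=4)
    print(query_representatives(candidates, embeddings, planes, predict_label, "pos"))
```

In this toy run the two near-identical candidates share a bucket, so the model is queried twice rather than three times. According to the abstract, the paper applies its clustering strategy in the mutation step of a genetic algorithm; how buckets and representatives are actually chosen there is not stated in the abstract.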

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b91/11920284/56c94a434365/41598_2025_93566_Fig1_HTML.jpg
