Chen Qidong, Sun Jun, Palade Vasile
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):15210-15221. doi: 10.1109/TNNLS.2023.3283308. Epub 2024 Oct 29.
Textual adversarial attack methods aim to replace some words in an input text so that the victim model misbehaves. This article proposes an effective word-level adversarial attack method based on sememes and an improved quantum-behaved particle swarm optimization (QPSO) algorithm. First, a sememe-based substitution method, which takes words sharing the same sememes as the original word as its substitutes, is used to form a reduced search space. Then, an improved QPSO algorithm, called historical information-guided QPSO with random drift local attractor (HIQPSO-RD), is proposed to search the reduced space for adversarial examples. HIQPSO-RD introduces historical information into the current mean best position of QPSO to improve the convergence speed of the algorithm by enhancing its exploration ability and preventing premature convergence of the swarm. The algorithm uses the random drift local attractor technique to strike a good balance between exploration and exploitation, so that it can find better adversarial examples with fewer grammatical errors and lower perplexity (PPL). In addition, it employs a two-stage diversity control strategy to enhance search performance. Experiments on three natural language processing (NLP) datasets, with three commonly used NLP models as victim models, show that the method achieves higher attack success rates and lower modification rates than state-of-the-art adversarial attack methods. Moreover, human evaluations show that the adversarial examples generated by the method better preserve the semantic similarity and grammatical correctness of the original input.
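The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_SEMEMES` is a hypothetical hand-made dictionary with one sense per word (a real system would query a sememe resource such as HowNet), and `qpso_minimize` is the standard continuous QPSO update with a mean best position, without the historical-information, random-drift, or diversity-control extensions of HIQPSO-RD, whose details the abstract does not give. All function and parameter names are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy sememe dictionary (one sense per word), for illustration only.
TOY_SEMEMES = {
    "film": frozenset({"shows", "entertainment"}),
    "movie": frozenset({"shows", "entertainment"}),
    "picture": frozenset({"image"}),
    "good": frozenset({"positive"}),
    "fine": frozenset({"positive"}),
    "bad": frozenset({"negative"}),
}


def sememe_substitutes(word, sememes=TOY_SEMEMES):
    """Return the words whose sememe set equals that of `word`."""
    target = sememes.get(word)
    if target is None:
        return []
    return sorted(w for w, s in sememes.items() if w != word and s == target)


def reduced_search_space(tokens, sememes=TOY_SEMEMES):
    """Per-position candidate lists: the original word plus its substitutes."""
    return [[tok] + sememe_substitutes(tok, sememes) for tok in tokens]


def qpso_minimize(f, dim, n_particles=20, iters=100, alpha=0.75, seed=0):
    """Standard QPSO on a continuous objective (assumed baseline, not HIQPSO-RD)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best objective value after each iteration
    for _ in range(iters):
        # Mean best position: average of all personal best positions.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                # Local attractor between personal best and global best.
                attractor = phi * pbest[i][d] + (1 - phi) * gbest[d]
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                step = alpha * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                xs[i][d] = attractor + step if rng.random() < 0.5 else attractor - step
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

In a word-level attack the search is discrete: each particle would hold one candidate index per token position of `reduced_search_space(tokens)`, and the objective would be derived from the victim model's output. The continuous version is shown only because its update rule is well established; how positions are mapped to word choices in HIQPSO-RD is left open here.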