Wang Siyu, Cao Yuanjiang, Chen Xiaocong, Yao Lina, Wang Xianzhi, Sheng Quan Z
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia.
School of Computer Science, University of Technology Sydney, Sydney, NSW, Australia.
Front Big Data. 2022 May 3;5:822783. doi: 10.3389/fdata.2022.822783. eCollection 2022.
Adversarial attacks, e.g., adversarial perturbations of the input and adversarial samples, pose significant challenges to machine learning and deep learning techniques, including interactive recommendation systems. The latent embedding space of those techniques makes adversarial attacks challenging to detect at an early stage. Recent advances in causality show that counterfactuals can also be regarded as a way to generate adversarial samples drawn from a distribution different from that of the training samples. We propose to explore adversarial examples and attack-agnostic detection on reinforcement learning (RL)-based interactive recommendation systems. We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors. Then, we augment recommendation systems with a deep learning-based classifier trained on the crafted data to detect potential attacks. Finally, we study the attack strength and frequency of adversarial examples and evaluate our model on standard datasets with multiple crafting methods. Our extensive experiments show that most adversarial attacks are effective, and that both attack strength and attack frequency impact the attack performance. The strategically-timed attack achieves comparable attack performance with only 1/3 to 1/2 of the attack frequency. Moreover, our white-box detector trained with one crafting method generalizes to several other crafting methods.
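The abstract does not specify which crafting methods were used; as a minimal illustration of "adding perturbations to the input," the sketch below applies an FGSM-style step (perturbing each feature by a small amount in the sign of the loss gradient) to a toy linear scorer. All names (`fgsm_perturb`, the weights `w`, the budget `eps`) are hypothetical and not taken from the paper.

```python
def fgsm_perturb(x, grad, eps=0.1):
    """FGSM-style step: shift each feature by eps in the sign of its gradient.
    The perturbation is bounded by eps in the infinity norm."""
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Toy linear scorer score(x) = w·x with squared-error loss L = (w·x - y)^2.
w = [0.5, -1.0, 2.0]
x = [1.0, 0.0, 0.5]
y = 0.0
score = sum(wi * xi for wi, xi in zip(w, x))
# dL/dx = 2 * (w·x - y) * w, computed analytically for the linear model.
grad = [2.0 * (score - y) * wi for wi in w]
x_adv = fgsm_perturb(x, grad, eps=0.05)
score_adv = sum(wi * xi for wi, xi in zip(w, x_adv))
# The perturbed input increases the loss while staying within the eps budget.
```

The paper's second crafting route, intervening on causal factors to produce counterfactual samples, operates on the data-generating process rather than the input features and is not captured by this gradient-based sketch.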