Zishun Yu, Siteng Kang, Xinhua Zhang
Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA.
Uncertain Artif Intell. 2024 Jul.
Offline-to-online reinforcement learning has recently been shown to reduce online sample complexity by first training on offline-collected data. However, this additional data source may also invite new poisoning attacks that target offline training. In this work, we reveal such vulnerabilities in offline RL by proposing a novel data poisoning attack that is stealthy in the sense that performance during offline training remains intact, while the online fine-tuning stage suffers a significant performance drop. Our method leverages techniques from bi-level optimization to promote overestimation and distribution shift under offline-to-online reinforcement learning. Experiments on four environments confirm that the attack satisfies the new stealthiness requirement and is effective with only a small budget and without white-box access to the victim model.
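The abstract only names the ingredients (bi-level optimization, a stealthiness requirement on offline training, a small perturbation budget), so the following PyTorch sketch is an illustrative assumption of how such an attack could be set up, not the authors' actual algorithm: the inner level trains a toy Q-function offline on poisoned rewards, and the outer level tunes the poison so that in-distribution Q-values stay close to a clean run (stealth) while out-of-distribution Q-values are inflated, promoting overestimation once online fine-tuning begins. All tensors, losses, and hyperparameters here are hypothetical.

```python
# Minimal sketch of a bi-level reward-poisoning attack on offline RL.
# Everything here (linear Q-network, budget, loss terms) is an assumption
# for illustration; the paper's actual method may differ substantially.
import torch

torch.manual_seed(0)

# Toy offline dataset of (state, action, reward, next_state) transitions.
S, A = 32, 4                          # transitions in the batch, action count
states      = torch.randn(S, 8)
actions     = torch.randint(0, A, (S,))
rewards     = torch.randn(S)
next_states = torch.randn(S, 8)
gamma = 0.99

def inner_offline_training(poison, n_steps=20, lr=0.1):
    """Inner level: semi-gradient TD learning on poisoned rewards.

    The inner updates are unrolled with create_graph=True so the attacker's
    outer objective can differentiate through offline training.
    """
    w = torch.zeros(8, A, requires_grad=True)   # tiny linear Q-network
    b = torch.zeros(A, requires_grad=True)
    for _ in range(n_steps):
        q = (states @ w + b).gather(1, actions[:, None]).squeeze(1)
        q_next = (next_states @ w + b).max(dim=1).values
        target = (rewards + poison) + gamma * q_next.detach()
        td_loss = ((q - target) ** 2).mean()
        gw, gb = torch.autograd.grad(td_loss, (w, b), create_graph=True)
        w, b = w - lr * gw, b - lr * gb
    return w, b

# Clean reference run (zero poison) used by the stealthiness term.
w_clean, b_clean = inner_offline_training(torch.zeros(S))
w_clean, b_clean = w_clean.detach(), b_clean.detach()

# Outer level: optimize the reward perturbation under a small budget.
poison = torch.zeros(S, requires_grad=True)
opt = torch.optim.Adam([poison], lr=0.05)
budget = 0.5                          # hypothetical per-reward L-inf budget

for step in range(100):
    w, b = inner_offline_training(poison)
    q_pois, q_ref = states @ w + b, states @ w_clean + b_clean
    idx = actions[:, None]
    # Stealthiness: Q-values on in-distribution (dataset) actions should
    # match the clean run, so offline performance looks intact.
    stealth = ((q_pois.gather(1, idx) - q_ref.gather(1, idx)) ** 2).mean()
    # Harm: inflate Q on actions the dataset never takes, promoting
    # overestimation / distribution shift once online fine-tuning starts.
    ood_mask = torch.ones(S, A).scatter_(1, idx, 0.0)
    harm = (q_pois * ood_mask).sum() / ood_mask.sum()
    outer_loss = stealth - harm       # keep stealth gap small, maximize OOD Q
    opt.zero_grad()
    outer_loss.backward()
    opt.step()
    with torch.no_grad():
        poison.clamp_(-budget, budget)  # project back onto the attack budget
```

Differentiating through unrolled inner training is one standard way to solve such bi-level problems; implicit-gradient or black-box formulations would also fit the description in the abstract, which notes the attack does not require white-box access to the victim model.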