Yao Xin, Li Enlang, Chen Yimin, Guo Jiawei, Huang Kecheng, Tang Fengxiao, Zhao Ming
School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China.
Neural Netw. 2025 Nov;191:107829. doi: 10.1016/j.neunet.2025.107829. Epub 2025 Jul 3.
A massive number of videos are released every day, particularly through video-focused social media apps such as TikTok. This trend has driven the rapid emergence of video retrieval systems, which provide retrieval services using machine learning techniques. Adversarial example (AE) attacks have been shown to be effective against such systems: they subtly perturb an unaltered video to induce false retrieval results. Such AE attacks can be easily detected, however, because the adversarial perturbations are spread across all pixels and frames. In this paper, we propose DUO, a stealthy targeted black-box AE attack that uses DUal search Over frame-pixel to generate sparse perturbations, improving both stealthiness and query efficiency. DUO is driven by three observations: only "key video frames" decide the model predictions; different pixels and frames contribute very differently to AEs; and pixels within a frame exhibit locality. Building on these observations, we propose two AE attacks: a DUO variant featuring pixel sparsity and a DUO variant featuring group sparsity. Our sequential attack pipeline consists of two components, SparseTransfer and SparseQuery: DUO uses SparseTransfer to generate initial perturbations and then SparseQuery to further rectify them. The pixel-sparse variant focuses on individual pixels, whereas the group-sparse variant targets groups of pixels. Extensive evaluations on two popular datasets confirm the improved stealthiness and efficacy of DUO over existing AE attacks on video retrieval systems. In particular, DUO achieves higher precision while reducing adversarial perturbations by more than 100× compared with the state of the art, and it requires more than 10× fewer queries.
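To make the distinction between pixel sparsity and group sparsity concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical per-pixel importance score for each frame (the abstract does not say how importance is measured), and every function name and parameter here is invented for illustration. It shows only how a dense perturbation could be restricted to a few key frames and then to either individual pixels or local blocks of pixels.

```python
from itertools import product


def top_frames(scores, n_frames):
    """Indices of the n_frames frames with the largest total |score|."""
    totals = [sum(abs(v) for row in frame for v in row) for frame in scores]
    order = sorted(range(len(scores)), key=lambda t: totals[t], reverse=True)
    return sorted(order[:n_frames])


def pixel_sparse_support(scores, n_frames, n_pixels):
    """Keep the n_pixels highest-|score| pixels in each selected frame."""
    keep = set()
    for t in top_frames(scores, n_frames):
        frame = scores[t]
        coords = list(product(range(len(frame)), range(len(frame[0]))))
        coords.sort(key=lambda hw: abs(frame[hw[0]][hw[1]]), reverse=True)
        keep.update((t, h, w) for h, w in coords[:n_pixels])
    return keep


def group_sparse_support(scores, n_frames, n_groups, block):
    """Keep the n_groups highest-scoring block x block tiles per selected frame."""
    keep = set()
    for t in top_frames(scores, n_frames):
        frame = scores[t]
        height, width = len(frame), len(frame[0])
        tiles = []
        for h0 in range(0, height, block):
            for w0 in range(0, width, block):
                # Tiles at the border are clipped to the frame size.
                cells = [(h, w)
                         for h in range(h0, min(h0 + block, height))
                         for w in range(w0, min(w0 + block, width))]
                total = sum(abs(frame[h][w]) for h, w in cells)
                tiles.append((total, cells))
        tiles.sort(key=lambda item: item[0], reverse=True)
        for _, cells in tiles[:n_groups]:
            keep.update((t, h, w) for h, w in cells)
    return keep


def apply_support(delta, keep):
    """Zero every perturbation entry outside the kept (frame, row, col) set."""
    return [[[v if (t, h, w) in keep else 0.0
              for w, v in enumerate(row)]
             for h, row in enumerate(frame)]
            for t, frame in enumerate(delta)]
```

Here `scores` and `delta` are nested lists indexed as `[frame][row][column]`. Calling `apply_support(delta, pixel_sparse_support(scores, 1, 2))` leaves at most two nonzero entries, all in a single frame, while the group variant leaves one contiguous tile. The sketch illustrates why a group-based support has far fewer candidate positions to search than a pixel-based one; it says nothing about how SparseTransfer or SparseQuery actually choose or refine the support.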