
Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos

Author Information

Wei Xingxing, Wang Songping, Yan Huanqian

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):10898-10912. doi: 10.1109/TPAMI.2023.3262592. Epub 2023 Aug 7.

Abstract

Adversarial robustness assessment for video recognition models has raised concerns owing to their wide application in safety-critical tasks. Compared with images, videos have a much higher dimensionality, which incurs huge computational costs when generating adversarial videos. This is especially serious for query-based black-box attacks, where gradient estimation for the threat model is usually employed and the high dimensionality leads to a large number of queries. To mitigate this issue, we propose to simultaneously eliminate the temporal and spatial redundancy within the video, achieving effective and efficient gradient estimation on a reduced search space so that the number of queries decreases. To implement this idea, we design the novel Adversarial spatial-temporal Focus (AstFocus) attack on videos, which performs attacks on the simultaneously focused key frames and key regions selected from the inter-frames and intra-frames of the video. The AstFocus attack is based on a cooperative Multi-Agent Reinforcement Learning (MARL) framework: one agent is responsible for selecting key frames, and another agent is responsible for selecting key regions. These two agents are jointly trained with the common rewards received from the black-box threat model to perform a cooperative prediction. Through continuous querying, the reduced search space composed of key frames and key regions becomes increasingly precise, and the total number of queries becomes smaller than that required on the original video. Extensive experiments on four mainstream video recognition models and three widely used action recognition datasets demonstrate that the proposed AstFocus attack outperforms SOTA methods, being superior simultaneously in fooling rate, query number, time, and perturbation magnitude.
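To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of how query-based gradient estimation can be restricted to a reduced search space: a binary mask marks the selected key frames and key regions, so random probes — and therefore queries to the black-box model — only perturb that subspace. The `estimate_gradient` function, the NES-style antithetic sampling, and the toy loss are all illustrative assumptions; AstFocus additionally learns the mask itself with two MARL agents, which is omitted here.

```python
import numpy as np

def estimate_gradient(loss_fn, video, mask, sigma=0.05, n_samples=20, rng=None):
    """NES-style gradient estimate restricted to a masked search space.

    `mask` has the video's shape, with 1s on key frames/regions and 0s
    elsewhere, so each probe direction (and hence each pair of queries to
    the black-box `loss_fn`) only perturbs the reduced space.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(video)
    for _ in range(n_samples):
        u = rng.standard_normal(video.shape) * mask  # probe only the focused space
        # Antithetic sampling: two black-box queries per probe direction.
        grad += (loss_fn(video + sigma * u) - loss_fn(video - sigma * u)) * u
    return grad / (2 * sigma * n_samples)

# Toy demo: 4 frames of 8x8, with frames 1 and 3 and a central region selected.
rng = np.random.default_rng(1)
video = rng.standard_normal((4, 8, 8))
mask = np.zeros_like(video)
mask[[1, 3], 2:6, 2:6] = 1
g = estimate_gradient(lambda v: float((v ** 2).sum()), video, mask)
# The estimate is exactly zero outside the focused key frames/regions,
# so the effective dimensionality of the search is 2 * 4 * 4 instead of 4 * 8 * 8.
```

Because queries scale with the dimensionality being probed, shrinking the search space this way is what lets the attack cut the query count, provided the selected frames and regions actually carry the decision-relevant content.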

