School of Integrated Technology, Yonsei University, Republic of Korea.
Department of Computer Science, Carnegie Mellon University, USA.
Neural Netw. 2024 Jan;169:388-397. doi: 10.1016/j.neunet.2023.10.033. Epub 2023 Oct 27.
Recently, video-based action recognition methods using convolutional neural networks (CNNs) have achieved remarkable recognition performance. However, the generalization mechanism of action recognition models is still poorly understood. In this paper, we suggest that action recognition models rely on motion information less than expected, and are therefore robust to randomization of frame order. Furthermore, we find that the motion monotonicity that remains after randomization also contributes to this robustness. Based on this observation, we develop a novel defense method for action recognition models that temporally shuffles input videos to counter adversarial attacks. Another observation that enables our defense is that adversarial perturbations on videos are sensitive to temporal destruction. To the best of our knowledge, this is the first attempt to design a defense method that requires no additional training for 3D CNN-based video action recognition models.
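The abstract does not spell out the shuffling procedure, so the following is only a minimal Python sketch of the general idea: permute the frame order of an input clip before classification, leaving each frame's pixels (and any perturbation on them) intact while breaking its temporal structure. It assumes a clip is a list of frames and that `model` is a hypothetical callable returning a list of class scores. Averaging scores over several shuffles (`num_shuffles`) is our own illustrative choice, not necessarily the authors' method.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def temporal_shuffle(frames: Sequence[Frame], rng: random.Random) -> List[Frame]:
    """Return a new list with the frames in a random temporal order.

    Frame contents are untouched; only their order along the time axis changes.
    The input sequence is not modified.
    """
    return rng.sample(list(frames), k=len(frames))


def shuffled_prediction(
    model: Callable[[List[Frame]], List[float]],
    frames: Sequence[Frame],
    num_shuffles: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average class scores over several temporally shuffled copies of a clip.

    `model` is a hypothetical classifier: clip (list of frames) -> class scores.
    Aggregating over `num_shuffles` permutations is an assumption for illustration.
    """
    if num_shuffles < 1:
        raise ValueError("num_shuffles must be at least 1")
    rng = random.Random(seed)
    totals: List[float] = []
    for _ in range(num_shuffles):
        scores = model(temporal_shuffle(frames, rng))
        if not totals:
            totals = [0.0] * len(scores)
        for i, s in enumerate(scores):
            totals[i] += s
    return [t / num_shuffles for t in totals]
```

For a model that is fully insensitive to frame order, `shuffled_prediction` returns the same scores as classifying the original clip; the abstract's point is that real 3D CNN classifiers come close to this behavior, while adversarial perturbations tuned to a particular frame order are disrupted by the shuffle.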