Yuan Ye, Wu Baolei, Mo Zifan, Liu Weiye, Hong Ji, Li Zongdao, Liu Jian, Liu Na
Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China.
School of Automation and Electronic Information, Xiangtan University, Xiangtan 411105, China.
Biomimetics (Basel). 2025 Mar 21;10(4):192. doi: 10.3390/biomimetics10040192.
The existence of redundant video frames results in a substantial waste of computational resources during video-understanding tasks. Frame sampling is a crucial technique for improving resource utilization. However, existing sampling strategies typically adopt fixed-frame selection, which lacks flexibility in handling different action categories. In this paper, inspired by the neural mechanism of the human visual pathway, we propose an effective and interpretable frame-sampling method called Entropy-Guided Motion Enhancement Sampling (EGMESampler), which removes redundant spatio-temporal information from videos. Our fundamental motivation is that motion information is an important signal for adaptively selecting frames from videos. Thus, we first perform motion modeling in EGMESampler to separate motion information from irrelevant backgrounds. Then, we design an entropy-based dynamic sampling strategy driven by this motion information to ensure that the sampled frames cover the important content of the video. Finally, we apply attention operations to the motion information and the sampled frames to enhance the motion expression of the sampled frames and remove redundant spatial background information. Our EGMESampler can be embedded in existing video-processing algorithms, and experiments on five benchmark datasets demonstrate its effectiveness compared with previous fixed-sampling strategies, as well as its generalizability across different video models and datasets.
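The core idea of motion-driven adaptive sampling can be illustrated with a minimal sketch (this is an illustrative assumption, not the paper's implementation): score each frame by a simple motion-energy proxy (mean absolute frame difference), turn the scores into a probability distribution, and pick frames at equal quantiles of that distribution's CDF, so that high-motion segments receive denser coverage. The function names, the frame-difference motion model, and the CDF-inversion scheme below are all hypothetical stand-ins for the method described in the abstract.

```python
import numpy as np

def motion_energy(frames):
    """Per-frame motion proxy: mean absolute difference to the previous frame.

    frames: array of shape (T, H, W), grayscale video.
    The first frame reuses the first difference so every frame gets a score.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)).mean(axis=(1, 2))
    return np.concatenate([diffs[:1], diffs])

def entropy_guided_sample(frames, n_samples):
    """Sample n_samples frame indices, denser where motion energy is high.

    Returns (indices, entropy); the Shannon entropy of the motion
    distribution could, e.g., gate between uniform and adaptive sampling.
    """
    energy = motion_energy(frames)
    total = energy.sum()
    if total > 0:
        p = energy / total
    else:
        # Static video: fall back to a uniform distribution.
        p = np.full(len(energy), 1.0 / len(energy))
    entropy = -(p * np.log(p + 1e-12)).sum()
    # Invert the motion CDF at equally spaced quantiles: flat (low-motion)
    # stretches of the CDF are skipped, steep (high-motion) ones are hit often.
    cdf = np.cumsum(p)
    quantiles = (np.arange(n_samples) + 0.5) / n_samples
    idx = np.searchsorted(cdf, quantiles)
    return np.clip(idx, 0, len(frames) - 1), entropy
```

For example, on a clip whose first half is static and whose second half contains a moving object, all sampled indices fall in the second half, since the motion CDF is flat over the static segment.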