Suppr 超能文献

Temporal-Spatial Redundancy Reduction in Video Sequences: A Motion-Based Entropy-Driven Attention Approach.

Authors

Yuan Ye, Wu Baolei, Mo Zifan, Liu Weiye, Hong Ji, Li Zongdao, Liu Jian, Liu Na

Affiliations

Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China.

School of Automation and Electronic Information, Xiangtan University, Xiangtan 411105, China.

Publication

Biomimetics (Basel). 2025 Mar 21;10(4):192. doi: 10.3390/biomimetics10040192.

DOI: 10.3390/biomimetics10040192
PMID: 40277591
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12025262/
Abstract

The existence of redundant video frames results in a substantial waste of computational resources during video-understanding tasks. Frame sampling is a crucial technique in improving resource utilization. However, existing sampling strategies typically adopt fixed-frame selection, which lacks flexibility in handling different action categories. In this paper, inspired by the neural mechanism of the human visual pathway, we propose an effective and interpretable frame-sampling method called Entropy-Guided Motion Enhancement Sampling (EGMESampler), which can remove redundant spatio-temporal information in videos. Our fundamental motivation is that motion information is an important signal that drives us to adaptively select frames from videos. Thus, we first perform motion modeling in EGMESampler to extract motion information from irrelevant backgrounds. Then, we design an entropy-based dynamic sampling strategy based on motion information to ensure that the sampled frames can cover important information in videos. Finally, we perform attention operations on the motion information and sampled frames to enhance the motion expression of the sampled frames and remove redundant spatial background information. Our EGMESampler can be embedded in existing video processing algorithms, and experiments on five benchmark datasets demonstrate its effectiveness compared to previous fixed-sampling strategies, as well as its generalizability across different video models and datasets.
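The abstract outlines a three-step pipeline: extract motion from frame differences, sample frames adaptively so that high-motion (high-entropy) segments receive more samples, then enhance the sampled frames with attention. The core idea of entropy-guided adaptive sampling can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption, not the paper's actual EGMESampler: grayscale frames, simple absolute frame differencing as the motion model, a 32-bin histogram for Shannon entropy, and inverse-transform sampling over cumulative entropy instead of uniform sampling over time.

```python
import numpy as np

def motion_entropy(frames):
    """Shannon entropy (bits) of the frame-difference magnitude
    histogram, one value per consecutive frame pair.
    frames: (T, H, W) grayscale array with values in [0, 255]."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    entropies = []
    for d in diffs:
        hist, _ = np.histogram(d, bins=32, range=(0.0, 255.0))
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]                       # drop empty bins (0 log 0 := 0)
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)

def entropy_guided_sample(frames, k):
    """Pick k frame indices spread evenly over cumulative motion
    entropy rather than raw time, so high-motion segments get
    proportionally more samples than static ones."""
    e = motion_entropy(frames)
    cum = np.concatenate([[0.0], np.cumsum(e)])  # length T, aligned to frames
    cum /= max(cum[-1], 1e-12)                   # normalize to [0, 1]
    targets = (np.arange(k) + 0.5) / k           # evenly spaced entropy quantiles
    idx = np.searchsorted(cum, targets)          # invert the cumulative curve
    return np.clip(idx, 0, len(frames) - 1)
```

On a clip whose first half is static and second half contains motion, the sampled indices concentrate in the second half, which is the behavior the abstract attributes to its dynamic sampling strategy; the paper's actual motion modeling and attention-based enhancement are not reproduced here.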


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc4c/12025262/4bbdc6f98e49/biomimetics-10-00192-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc4c/12025262/aa0cf4a8fcc3/biomimetics-10-00192-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc4c/12025262/2c7f00866d35/biomimetics-10-00192-g003.jpg

Similar Articles

1. Video Summarization Based on Mutual Information and Entropy Sliding Window Method. Entropy (Basel). 2020 Nov 12;22(11):1285. doi: 10.3390/e22111285.
2. DSTAN: A Deformable Spatial-temporal Attention Network with Bidirectional Sequence Feature Refinement for Speckle Noise Removal in Thyroid Ultrasound Video. J Imaging Inform Med. 2024 Dec;37(6):3264-3281. doi: 10.1007/s10278-023-00935-5. Epub 2024 Jun 5.
3. Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer. IEEE Trans Image Process. 2023;32:4701-4715. doi: 10.1109/TIP.2023.3301332. Epub 2023 Aug 16.
4. Full-frame video stabilization with motion inpainting. IEEE Trans Pattern Anal Mach Intell. 2006 Jul;28(7):1150-63. doi: 10.1109/TPAMI.2006.141.
5. Study of Spatio-Temporal Modeling in Video Quality Assessment. IEEE Trans Image Process. 2023;32:2693-2702. doi: 10.1109/TIP.2023.3272480. Epub 2023 May 16.
6. MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module. Sensors (Basel). 2022 Sep 1;22(17):6595. doi: 10.3390/s22176595.
7. Video Summarization for Sign Languages Using the Median of Entropy of Mean Frames Method. Entropy (Basel). 2018 Sep 29;20(10):748. doi: 10.3390/e20100748.
8. Key frame extraction method for lecture videos based on spatio-temporal subtitles. Multimed Tools Appl. 2023 Jun 2:1-14. doi: 10.1007/s11042-023-15829-5.
9. Action recognition using attention-based spatio-temporal VLAD networks and adaptive video sequences optimization. Sci Rep. 2024 Oct 31;14(1):26202. doi: 10.1038/s41598-024-75640-6.
