

DTCM: Joint Optimization of Dark Enhancement and Action Recognition in Videos

Authors

Tu Zhigang, Liu Yuanzhong, Zhang Yan, Mu Qizi, Yuan Junsong

Publication

IEEE Trans Image Process. 2023;32:3507-3520. doi: 10.1109/TIP.2023.3286254. Epub 2023 Jun 23.

DOI: 10.1109/TIP.2023.3286254
PMID: 37335800
Abstract

Recognizing human actions in dark videos is a useful yet challenging visual task in real-world settings. Existing augmentation-based methods separate action recognition and dark enhancement into a two-stage pipeline, which leads to inconsistent learning of temporal representations for action recognition. To address this issue, we propose a novel end-to-end framework termed the Dark Temporal Consistency Model (DTCM), which jointly optimizes dark enhancement and action recognition and enforces temporal consistency to guide downstream dark feature learning. Specifically, DTCM cascades the action classification head with the dark augmentation network to perform dark video action recognition in a one-stage pipeline. Our spatio-temporal consistency loss, which uses the RGB-Difference of dark video frames to encourage temporal coherence of the enhanced video frames, is effective for boosting spatio-temporal representation learning. Extensive experiments demonstrate that DTCM performs remarkably: 1) competitive accuracy, outperforming the state of the art by 2.32% on the ARID dataset and by 4.19% on the UAVHuman-Fisheye dataset; 2) high efficiency, surpassing the current most advanced method (Chen et al., 2021) with only 6.4% of its GFLOPs and 71.3% of its parameters; 3) strong generalization, as it can be applied to various action recognition methods (e.g., TSM, I3D, 3D-ResNext-101, Video-Swin) to significantly improve their performance.
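The abstract's spatio-temporal consistency idea can be illustrated with a minimal sketch: penalize the difference between the frame-to-frame RGB-Difference of the enhanced clip and that of the original dark clip, so enhancement cannot introduce temporal flicker. This is a hypothetical reconstruction from the abstract alone, not the paper's exact loss; the function name and the mean-squared-error form are assumptions.

```python
import numpy as np

def temporal_consistency_loss(enhanced, dark):
    """Hypothetical sketch of an RGB-Difference consistency loss.

    enhanced, dark: float arrays of shape (T, H, W, 3), a video clip
    as T frames. The frame-to-frame difference of the enhanced clip is
    pushed toward the frame-to-frame difference of the dark input, so
    the enhancement preserves the original motion signal.
    """
    diff_enhanced = enhanced[1:] - enhanced[:-1]  # RGB-Difference of enhanced frames
    diff_dark = dark[1:] - dark[:-1]              # RGB-Difference of dark frames
    return float(np.mean((diff_enhanced - diff_dark) ** 2))
```

Note one property of this form: a uniform brightness lift (the same offset added to every frame) leaves both difference tensors unchanged, so the loss is zero; only temporally inconsistent enhancement, such as flicker, is penalized.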


Similar Articles

1. DTCM: Joint Optimization of Dark Enhancement and Action Recognition in Videos. IEEE Trans Image Process. 2023;32:3507-3520. doi: 10.1109/TIP.2023.3286254. Epub 2023 Jun 23.
2. Dark-DSAR: Lightweight one-step pipeline for action recognition in dark videos. Neural Netw. 2024 Nov;179:106622. doi: 10.1016/j.neunet.2024.106622. Epub 2024 Aug 8.
3. Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer. Sci Rep. 2023 Sep 5;13(1):14624. doi: 10.1038/s41598-023-39744-9.
4. Variable Temporal Length Training for Action Recognition CNNs. Sensors (Basel). 2024 May 25;24(11):3403. doi: 10.3390/s24113403.
5. A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset. Sensors (Basel). 2022 Sep 9;22(18):6841. doi: 10.3390/s22186841.
6. Learnable Feature Augmentation Framework for Temporal Action Localization. IEEE Trans Image Process. 2024;33:4002-4015. doi: 10.1109/TIP.2024.3413599. Epub 2024 Jun 28.
7. Spatio-Temporal Representation of an Electoencephalogram for Emotion Recognition Using a Three-Dimensional Convolutional Neural Network. Sensors (Basel). 2020 Jun 20;20(12):3491. doi: 10.3390/s20123491.
8. MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module. Sensors (Basel). 2022 Sep 1;22(17):6595. doi: 10.3390/s22176595.
9. DANet: Semi-supervised differentiated auxiliaries guided network for video action recognition. Neural Netw. 2023 Jan;158:121-131. doi: 10.1016/j.neunet.2022.11.009. Epub 2022 Nov 17.
10. FineTea: A Novel Fine-Grained Action Recognition Video Dataset for Tea Ceremony Actions. J Imaging. 2024 Aug 31;10(9):216. doi: 10.3390/jimaging10090216.

Citing Articles

1. Low-Light Image and Video Enhancement for More Robust Computer Vision Tasks: A Review. J Imaging. 2025 Apr 21;11(4):125. doi: 10.3390/jimaging11040125.
2. The analysis of dance teaching system in deep residual network fusing gated recurrent unit based on artificial intelligence. Sci Rep. 2025 Jan 8;15(1):1305. doi: 10.1038/s41598-025-85407-2.
3. A Survey on 3D Skeleton-Based Action Recognition Using Learning Method. Cyborg Bionic Syst. 2024 May 16;5:0100. doi: 10.34133/cbsystems.0100. eCollection 2024.
4. ASMNet: Action and Style-Conditioned Motion Generative Network for 3D Human Motion Generation. Cyborg Bionic Syst. 2024 Feb 6;5:0090. doi: 10.34133/cbsystems.0090. eCollection 2024.