

DTCM: Joint Optimization of Dark Enhancement and Action Recognition in Videos.

Authors

Tu Zhigang, Liu Yuanzhong, Zhang Yan, Mu Qizi, Yuan Junsong

Publication

IEEE Trans Image Process. 2023;32:3507-3520. doi: 10.1109/TIP.2023.3286254. Epub 2023 Jun 23.

Abstract

Recognizing human actions in dark videos is a useful yet challenging visual task in practice. Existing augmentation-based methods separate dark enhancement and action recognition into a two-stage pipeline, which leads to inconsistent learning of temporal representations for action recognition. To address this issue, we propose a novel end-to-end framework termed the Dark Temporal Consistency Model (DTCM), which jointly optimizes dark enhancement and action recognition, and enforces temporal consistency to guide downstream dark feature learning. Specifically, DTCM cascades the action classification head with the dark enhancement network to perform dark video action recognition in a one-stage pipeline. Our spatio-temporal consistency loss, which utilizes the RGB-Difference of dark video frames to encourage temporal coherence of the enhanced video frames, is effective for boosting spatio-temporal representation learning. Extensive experiments demonstrate that DTCM achieves remarkable performance: 1) Competitive accuracy, outperforming the state of the art by 2.32% on the ARID dataset and 4.19% on the UAVHuman-Fisheye dataset; 2) High efficiency, surpassing the current most advanced method (Chen et al., 2021) while using only 6.4% of its GFLOPs and 71.3% of its parameters; 3) Strong generalization, as it can be plugged into various action recognition methods (e.g., TSM, I3D, 3D-ResNext-101, Video-Swin) to significantly improve their performance.
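The abstract describes a spatio-temporal consistency loss that uses the RGB-Difference of the dark input frames to encourage temporal coherence of the enhanced frames. The exact formulation is given in the paper, not here; the following is only a minimal numpy sketch of the general idea, assuming a simple L1 penalty between the frame-to-frame differences of the dark input and the enhanced output (the function name and signature are hypothetical, not from the paper):

```python
import numpy as np

def temporal_consistency_loss(dark_frames: np.ndarray,
                              enhanced_frames: np.ndarray) -> float:
    """Hypothetical sketch of an RGB-Difference consistency penalty.

    dark_frames, enhanced_frames: arrays of shape (T, H, W, 3), T >= 2.
    The RGB-Difference of consecutive dark frames captures motion; the
    loss penalizes enhanced frames whose frame-to-frame differences
    deviate from that motion signal.
    """
    # Frame-to-frame RGB-Difference along the temporal axis (T-1 diffs).
    dark_diff = np.diff(dark_frames.astype(np.float64), axis=0)
    enh_diff = np.diff(enhanced_frames.astype(np.float64), axis=0)
    # L1 distance between the two motion signals, averaged over all pixels.
    return float(np.abs(enh_diff - dark_diff).mean())
```

Note that a per-frame brightness shift (enhancement that adds the same offset to every frame) leaves the temporal differences unchanged and thus incurs zero penalty, which is consistent with the intent: the loss constrains motion coherence, not absolute intensity.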

