Dark-DSAR: Lightweight one-step pipeline for action recognition in dark videos.

Author information

The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.

The Department of Pediatrics, Renmin Hospital, Wuhan University, Wuhan, China.

Publication information

Neural Netw. 2024 Nov;179:106622. doi: 10.1016/j.neunet.2024.106622. Epub 2024 Aug 8.

Abstract

Human action recognition in dark videos has a wide range of real-world applications. General action recognition methods focus on the actor or the action itself and ignore the dark scene in which the action takes place, which results in unsatisfactory recognition accuracy. For dark scenes, existing two-step action recognition methods are structurally complex because they introduce additional augmentation steps, and existing one-step pipeline methods are not lightweight enough. To address these issues, this paper proposes a one-step Transformer-based method named Dark Domain Shift for Action Recognition (Dark-DSAR). It integrates domain migration and classification into a single step and enhances the model's functional coherence across these two tasks, giving Dark-DSAR low computational cost but high accuracy. Specifically, the domain shift module (DSM) performs domain adaptation from dark to bright to reduce the number of parameters and the computational cost. In addition, we explore the matching relationship between the input video size and the model, which further optimizes inference efficiency by removing redundant information in videos through spatial resolution dropping. Extensive experiments were conducted on the ARID1.5, HMDB51-Dark, and UAV-human-night datasets. The results show that Dark-DSAR obtains the best Top-1 accuracy on ARID1.5 at 89.49%, which is 2.56% higher than the state-of-the-art method, and reaches 67.13% and 61.9% on HMDB51-Dark and UAV-human-night, respectively. In addition, ablation experiments show that action classifiers gain ≥1% in accuracy over the original model when equipped with the DSM.
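To make the efficiency argument behind spatial resolution dropping concrete, here is a rough token-count sketch. It is not the authors' implementation. It assumes a generic ViT-style video tokenizer with hypothetical `tubelet` and `patch` sizes and global self-attention, none of which are specified in the abstract. The input sizes are likewise illustrative.

```python
def token_count(frames, height, width, tubelet=2, patch=16):
    """Tokens produced by a generic ViT-style video tokenizer.

    `tubelet` (frames per token) and `patch` (pixels per token side) are
    hypothetical settings chosen for illustration, not values from the paper.
    """
    if frames % tubelet or height % patch or width % patch:
        raise ValueError("input size must be divisible by tubelet/patch size")
    return (frames // tubelet) * (height // patch) * (width // patch)


def attention_pairs(tokens):
    """Token pairs scored by one global self-attention layer (quadratic)."""
    return tokens * tokens


# Illustrative clip sizes (assumed): 16 frames at 224x224 vs. 112x112.
full = token_count(frames=16, height=224, width=224)
reduced = token_count(frames=16, height=112, width=112)

# Halving each spatial side gives 4x fewer tokens and 16x fewer attention pairs.
ratio = attention_pairs(full) / attention_pairs(reduced)
print(full, reduced, ratio)
```

Under these assumptions, halving the spatial resolution cuts the token count by a factor of four, and the quadratic attention term shrinks by a factor of sixteen. This is why matching input size to the model can matter for inference cost.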
