
Attention-Based Temporal Encoding Network with Background-Independent Motion Mask for Action Recognition.

Author Information

Weng Zhengkui, Jin Zhipeng, Chen Shuangxi, Shen Quanquan, Ren Xiangyang, Li Wuzhao

Affiliations

Jiaxing Vocational and Technical College, Jiaxing, Zhejiang, China.

Medical 3D Printing Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China.

Publication Information

Comput Intell Neurosci. 2021 Mar 27;2021:8890808. doi: 10.1155/2021/8890808. eCollection 2021.

Abstract

Convolutional neural networks (CNNs) have advanced rapidly in recent years. However, high dimensionality, rich human dynamics, and varied background interference make it difficult for conventional CNNs to capture complicated motion in video. We propose a novel framework, the attention-based temporal encoding network (ATEN) with a background-independent motion mask (BIMM), for video action recognition. First, we introduce a motion segmentation approach based on a boundary prior, computed from the minimal geodesic distance on a weighted undirected graph. Second, we propose a dynamic contrast segmentation strategy for segmenting moving objects in complicated environments. Third, we build the BIMM to enhance the moving object by suppressing the irrelevant background in each frame. Furthermore, we design a long-range attention mechanism within ATEN that effectively models long-term dependencies of complex, non-periodic actions by automatically focusing on semantically important frames rather than treating all sampled frames equally. In this way, the attention mechanism suppresses temporal redundancy and highlights discriminative frames. Finally, we evaluate the framework on the HMDB51 and UCF101 datasets. Experimental results show that ATEN with BIMM achieves 70.6% accuracy on HMDB51 and 94.5% on UCF101, outperforming a number of existing methods on both datasets.
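The long-range attention described in the abstract weights sampled frames by semantic importance instead of averaging them uniformly. A minimal sketch of attention-weighted temporal pooling over per-frame features (the scoring vector `w`, the function name, and the toy features are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def attention_pool(frame_features, w, temperature=1.0):
    """Score each frame, softmax-normalize the scores, and return both the
    attention weights and the weighted sum of frame features.

    frame_features: (T, D) array of per-frame descriptors
    w:              (D,) scoring vector (learned in practice; fixed here)
    """
    scores = frame_features @ w / temperature        # (T,) one score per frame
    scores = scores - scores.max()                   # subtract max for stability
    weights = np.exp(scores) / np.exp(scores).sum()  # (T,) softmax weights
    return weights, weights @ frame_features         # weights, (D,) video descriptor

# Toy example: 4 frames with 3-dim features; frame 2 is far more "salient",
# so the softmax concentrates nearly all weight on it, suppressing redundancy.
feats = np.array([[0.1, 0.0, 0.0],
                  [0.0, 0.1, 0.0],
                  [5.0, 5.0, 5.0],   # the discriminative frame
                  [0.0, 0.0, 0.1]])
w = np.ones(3)
weights, video_descriptor = attention_pool(feats, w)
```

With uniform pooling, the three near-zero frames would dilute the descriptor; here the softmax assigns almost all of the weight to the discriminative frame, which is the intuition behind highlighting key frames over equal treatment of all samples.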


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7661/8024088/ae7961669f10/CIN2021-8890808.001.jpg
