
Spatiotemporal Interaction Residual Networks with Pseudo3D for Video Action Recognition.

Affiliations

College of Information Sciences and Technology, Northeast Normal University, Changchun 130117, China.

Institute for Intelligent Elderly Care, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China.

Publication Information

Sensors (Basel). 2020 Jun 1;20(11):3126. doi: 10.3390/s20113126.

Abstract

Action recognition is a significant and challenging topic in the fields of sensors and computer vision. Two-stream convolutional neural networks (CNNs) and 3D CNNs are the two mainstream deep learning architectures for video action recognition. To combine them into one framework and further improve performance, we propose a novel deep network, named the spatiotemporal interaction residual network with pseudo3D (STINP). The STINP has three advantages. First, it consists of two branches built on residual networks (ResNets) that simultaneously learn the spatial and temporal information of the video. Second, it integrates the pseudo3D block into the residual units of the spatial branch, which ensures that this branch not only learns the appearance features of the objects and scene in the video, but also captures the potential interaction information among consecutive frames. Finally, it adopts a simple but effective multiplication operation to fuse the spatial and temporal branches, which guarantees that the learned spatial and temporal representations interact with each other throughout the training of the STINP. Experiments were conducted on two classic action recognition datasets, UCF101 and HMDB51. The experimental results show that the proposed STINP provides better performance for video action recognition than other state-of-the-art algorithms.
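The abstract names three design choices: a two-branch ResNet backbone, pseudo3D blocks placed inside the residual units of the spatial branch, and a multiplicative fusion of the two branches. The sketch below illustrates the latter two mechanics in PyTorch under stated assumptions: the pseudo3D unit is written in a serial form (a 1×3×3 spatial convolution followed by a 3×1×1 temporal convolution), and the fusion is a plain element-wise product of same-shaped feature maps. The channel counts, layer names, and the exact pseudo3D variant are illustrative assumptions, not the paper's definitive architecture.

```python
# Minimal sketch (assumptions noted above), not the authors' implementation.
import torch
import torch.nn as nn


class Pseudo3DResidualUnit(nn.Module):
    """Residual unit whose 3D conv is factorised into spatial + temporal parts."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x3x3 convolution: models appearance within each frame.
        self.spatial_conv = nn.Conv3d(
            channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False
        )
        # 3x1x1 convolution: mixes information across neighbouring frames.
        self.temporal_conv = nn.Conv3d(
            channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0), bias=False
        )
        self.bn1 = nn.BatchNorm3d(channels)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        out = self.relu(self.bn1(self.spatial_conv(x)))
        out = self.bn2(self.temporal_conv(out))
        return self.relu(out + x)  # residual connection


def multiplicative_fusion(spatial_feat: torch.Tensor,
                          temporal_feat: torch.Tensor) -> torch.Tensor:
    """Element-wise product so the two streams modulate each other."""
    return spatial_feat * temporal_feat


if __name__ == "__main__":
    # Toy features: 2 clips, 64 channels, 8 frames of 28x28 feature maps.
    rgb_feat = torch.randn(2, 64, 8, 28, 28)    # stand-in for the spatial branch input
    flow_feat = torch.randn(2, 64, 8, 28, 28)   # stand-in for the temporal branch output

    unit = Pseudo3DResidualUnit(channels=64)
    spatial_out = unit(rgb_feat)
    fused = multiplicative_fusion(spatial_out, flow_feat)
    print(fused.shape)  # torch.Size([2, 64, 8, 28, 28])
```

Because the fusion is an element-wise product applied to feature maps of identical shape, gradients flow through both branches at every training step, which is how the abstract's claim that the spatial and temporal representations "interact during the entire training process" can be realised in practice.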


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5374/7308980/1db1bf43718d/sensors-20-03126-g001.jpg
