Spatio-temporal Laplacian pyramid coding for action recognition.

Publication information

IEEE Trans Cybern. 2014 Jun;44(6):817-27. doi: 10.1109/TCYB.2013.2273174. Epub 2013 Jul 31.

Abstract

We present a novel descriptor, spatio-temporal Laplacian pyramid coding (STLPC), for holistic representation of human actions. In contrast to sparse representations based on detected local interest points, STLPC treats a video sequence as a whole and extracts spatio-temporal features directly from it, which avoids the information loss inherent in sparse representations. By decomposing each sequence into a set of band-pass-filtered components, the proposed pyramid model localizes features residing at different scales and can therefore effectively encode the motion information of actions. To make the features more invariant and more resistant to distortion and noise, a bank of 3-D Gabor filters is applied to each level of the Laplacian pyramid, followed by max pooling within filter bands and over spatio-temporal neighborhoods. Because the convolution and pooling are performed spatio-temporally, the coding model captures structural and motion information simultaneously and provides an informative representation of actions. The proposed method achieves high recognition rates on the KTH, the multiview IXMAS, the challenging UCF Sports, and the newly released HMDB51 datasets, and it outperforms state-of-the-art methods, showing its potential for action recognition.
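The band-pass decomposition and the spatio-temporal pooling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 5-tap binomial kernel, nearest-neighbor upsampling, the number of levels, the pooling size, and all function names (`blur3d`, `reduce3d`, `expand3d`, `laplacian_pyramid`, `reconstruct`, `max_pool3d`) are assumptions chosen for illustration. The 3-D Gabor filter bank is omitted, so the pooling here acts directly on band magnitudes.

```python
import numpy as np

# 5-tap binomial kernel, used here as a stand-in for a Gaussian (an assumption).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur3d(vol):
    """Separable low-pass filter along time, height, and width."""
    out = vol.astype(float)
    for axis in range(3):
        pad = [(0, 0)] * 3
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="edge")  # replicate borders
        out = np.apply_along_axis(
            lambda v: np.convolve(v, KERNEL, mode="valid"), axis, padded
        )
    return out


def reduce3d(vol):
    """Blur, then keep every second sample along each axis."""
    return blur3d(vol)[::2, ::2, ::2]


def expand3d(vol, shape):
    """Upsample by repetition to `shape`, then blur to smooth the result."""
    up = vol
    for axis in range(3):
        up = np.repeat(up, 2, axis=axis)
    return blur3d(up[: shape[0], : shape[1], : shape[2]])


def laplacian_pyramid(video, levels=3):
    """Split a (time, height, width) volume into band-pass levels."""
    bands = []
    current = video.astype(float)
    for _ in range(levels - 1):
        smaller = reduce3d(current)
        bands.append(current - expand3d(smaller, current.shape))
        current = smaller
    bands.append(current)  # coarsest level keeps the low-pass residual
    return bands


def reconstruct(bands):
    """Invert laplacian_pyramid by adding the bands back, coarse to fine."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = band + expand3d(current, band.shape)
    return current


def max_pool3d(vol, size=2):
    """Max over non-overlapping size x size x size spatio-temporal blocks."""
    t, h, w = (d // size for d in vol.shape)
    v = vol[: t * size, : h * size, : w * size]
    return v.reshape(t, size, h, size, w, size).max(axis=(1, 3, 5))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((8, 16, 16))  # synthetic clip: 8 frames of 16 x 16
    bands = laplacian_pyramid(video, levels=3)
    features = [max_pool3d(np.abs(b)) for b in bands]
    print([f.shape for f in features])
```

Each level stores the difference between a volume and the expanded version of its reduced copy, so summing the levels back from coarse to fine recovers the input. In the method as described, a bank of 3-D Gabor filters would be applied to each level before pooling.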
