School of Engineering, University of KwaZulu-Natal, Durban 4041, South Africa.
Sensors (Basel). 2020 Nov 9;20(21):6380. doi: 10.3390/s20216380.
The Bag-of-Words (BoW) framework has been widely used in action recognition tasks due to its compact and efficient feature representation. Various modifications have been made to this framework to increase its classification power, but these often increase complexity and reduce efficiency. Inspired by the success of image-based scale coded BoW representations, we propose a spatio-temporal scale coded BoW (SC-BoW) for video-based recognition. SC-BoW encodes extracted multi-scale information into BoW representations by partitioning spatio-temporal features into sub-groups based on the spatial scale from which they were extracted. We evaluate SC-BoW in two experimental setups. First, we present a general pipeline to perform real-time action recognition with SC-BoW. Second, we apply SC-BoW to the popular Dense Trajectory feature set. Results show that SC-BoW representations improve performance by 2-7% with low added computational cost. Notably, SC-BoW on Dense Trajectories outperformed more complex deep learning approaches. Thus, scale coding is a low-cost, low-level encoding scheme that increases the classification power of the standard BoW without compromising efficiency.
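The partitioning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hard nearest-codeword assignment, the scale thresholds in `scale_edges`, and the per-sub-group L1 normalisation are all assumptions made for illustration. The sketch assigns each local descriptor to a visual word, splits the descriptors into sub-groups by the spatial scale at which they were extracted, builds one histogram per sub-group, and concatenates the histograms.

```python
import numpy as np


def assign_to_codebook(descriptors, codebook):
    """Return the index of the nearest codeword for each descriptor."""
    # Squared Euclidean distance between every descriptor and every codeword.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    return dist2.argmin(axis=1)


def scale_coded_bow(descriptors, scales, codebook, scale_edges):
    """Concatenate one BoW histogram per scale sub-group (hypothetical helper).

    descriptors : (n, d) array of local features
    scales      : (n,) array, spatial scale each feature was extracted at
    codebook    : (k, d) array of visual words
    scale_edges : increasing thresholds splitting scales into sub-groups
                  (assumed values, not taken from the paper)
    """
    words = assign_to_codebook(descriptors, codebook)
    # Sub-group index per feature: 0 .. len(scale_edges).
    groups = np.digitize(scales, scale_edges)
    n_groups = len(scale_edges) + 1
    hist = np.zeros((n_groups, codebook.shape[0]))
    # Accumulate counts; np.add.at handles repeated (group, word) pairs.
    np.add.at(hist, (groups, words), 1.0)
    # Assumption: L1-normalise each sub-group; empty sub-groups stay zero.
    sums = hist.sum(axis=1, keepdims=True)
    hist = np.divide(hist, sums, out=np.zeros_like(hist), where=sums > 0)
    return hist.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 8))        # toy descriptors
    feat_scales = rng.uniform(1.0, 8.0, 200)  # toy extraction scales
    vocab = rng.normal(size=(16, 8))          # toy codebook
    vec = scale_coded_bow(feats, feat_scales, vocab, scale_edges=[3.0, 5.5])
    print(vec.shape)
```

With a codebook of k words and s sub-groups, the resulting vector has s × k dimensions, so the representation grows only linearly with the number of scale sub-groups.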