IEEE Trans Cybern. 2016 Sep;46(9):2156-65. doi: 10.1109/TCYB.2015.2466692. Epub 2015 Aug 20.
Dynamic scene classification has recently drawn increasing research attention. While existing approaches rely mainly on low-level features, little work addresses the need to exploit the rich spatial layout information in dynamic scenes. Motivated by the fact that dynamic scenes are characterized by both dynamic and static parts with spatial layout priors, we propose to represent a dynamic scene by a redundant spatial grouping of a large number of spatiotemporal patches, named scenelets. Specifically, each scenelet is associated with a category-dependent scenelet model that encodes the likelihood of a specific scene category. All scenelet models for a scene category are jointly learned to encode the spatial interactions and redundancies among them. Subsequently, a dynamic scene sequence is represented as a collection of category likelihoods estimated by these scenelet models. Such a representation effectively encodes the spatial layout prior together with the associated semantic information, and can be used to classify dynamic scenes in combination with a standard learning algorithm such as k-nearest neighbor or a linear support vector machine. The effectiveness of our approach is clearly demonstrated on two dynamic scene benchmarks and a related application, violence video classification. In the nearest neighbor classification framework, for dynamic scene classification, our method outperforms previous state-of-the-art methods on both the Maryland "in the wild" dataset and the "stabilized" dynamic scene dataset. For violence video classification on a benchmark dataset, our method achieves a promising classification rate of 87.08%, significantly improving on the previous best result of 81.30%.
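The classification stage described above — stacking per-scenelet category likelihoods into one descriptor and feeding it to a nearest-neighbor classifier — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scenelet likelihood scores, the Euclidean distance metric, and all array shapes are assumptions for the example.

```python
import numpy as np

def scenelet_descriptor(likelihoods):
    # Stack per-scenelet category likelihoods into a single feature vector.
    # `likelihoods`: (num_scenelets, num_categories) array of scores, assumed
    # to come from category-dependent scenelet models (hypothetical here).
    return np.asarray(likelihoods, dtype=float).ravel()

def knn_classify(train_feats, train_labels, query_feat, k=1):
    # k-nearest-neighbor classification over scenelet-likelihood descriptors
    # (Euclidean distance assumed; the paper does not fix the metric here).
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the k nearest training sequences.
    vals, counts = np.unique(train_labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]

# Toy example: 4 training sequences, each with 3 scenelets x 2 categories.
rng = np.random.default_rng(0)
train = rng.random((4, 6))          # rows = flattened scenelet descriptors
labels = np.array([0, 0, 1, 1])     # toy scene-category labels
query = train[2] + 0.01             # a query very close to a class-1 example
print(knn_classify(train, labels, query, k=1))  # → 1
```

A linear SVM (e.g., from a standard library) could be substituted for `knn_classify` without changing the descriptor construction, which is the point of the representation: it is classifier-agnostic.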