Li Ce, Zhang Baochang, Chen Chen, Ye Qixiang, Han Jungong, Guo Guodong, Ji Rongrong
IEEE Trans Image Process. 2019 Apr 25. doi: 10.1109/TIP.2019.2912357.
While the intrinsic structure of data in a subspace provides useful information for visual recognition, it has not yet been well studied in deep feature learning for action recognition. In this paper, we introduce a new spatio-temporal manifold network (STMN) that leverages data manifold structures to regularize deep action feature learning, aiming to simultaneously minimize the intra-class variations of the learned deep features and alleviate over-fitting. To this end, the manifold prior is imposed from the top layer of a convolutional neural network (CNN) and is propagated across convolutional layers during forward-backward propagation. The observed correspondence between manifold structures in the data space and the feature space validates that the manifold prior can be transferred across CNN layers. STMN theoretically recasts the problem of transferring the data structure prior into deep learning architectures as a projection over the manifold via an embedding method, which can be easily solved by an Alternating Direction Method of Multipliers and Backward Propagation (ADMM-BP) algorithm. STMN is generic in the sense that it can be plugged into various backbone architectures to learn more discriminative representations for action recognition. Extensive experimental results on four benchmark datasets show that our method achieves performance comparable to, or better than, state-of-the-art approaches.
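The abstract does not give STMN's actual equations, so the following is only a rough illustration of the general idea of a manifold prior enforced through an ADMM-style alternation. It assumes, as a stand-in, that the prior is a graph-Laplacian penalty tr(FᵀLF) over a k-nearest-neighbour graph built in the input space, and it replaces the CNN with a toy linear model F = XΘ whose gradient steps play the role of the back-propagation (BP) update. The names `knn_laplacian`, `manifold_penalty`, `z_step`, `admm_train` and the parameters `k`, `lam`, `rho` are hypothetical; this is not the authors' formulation or code.

```python
# Hypothetical sketch of Laplacian manifold regularization solved by an
# ADMM-style alternation; NOT the STMN implementation from the paper.
import numpy as np


def knn_laplacian(X, k=3):
    """Graph Laplacian L = D - W of a symmetric k-NN graph built in the data space."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-loops
    W = np.zeros((n, n))
    nbrs = np.argsort(d2, axis=1)[:, :k]                  # k nearest neighbours per sample
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                                # symmetrize the adjacency
    return np.diag(W.sum(axis=1)) - W, W


def manifold_penalty(F, L):
    """tr(F^T L F) = 0.5 * sum_ij W_ij ||f_i - f_j||^2: small when data-space
    neighbours also have similar features."""
    return float(np.trace(F.T @ L @ F))


def z_step(V, L, lam, rho):
    """Closed-form 'projection': argmin_Z lam*tr(Z^T L Z) + rho/2 * ||Z - V||^2."""
    n = L.shape[0]
    return np.linalg.solve(2 * lam * L + rho * np.eye(n), rho * V)


def admm_train(X, Y, L, lam=1.0, rho=1.0, outer=50, inner=20):
    """Scaled ADMM for min 0.5*||F - Y||^2 + lam*tr(Z^T L Z) s.t. F = Z, with
    F = X @ Theta. Gradient steps on Theta stand in for back-propagation."""
    Theta = np.zeros((X.shape[1], Y.shape[1]))
    Z = X @ Theta
    U = np.zeros_like(Z)                                  # scaled dual variable
    lr = 1.0 / ((1 + rho) * np.linalg.norm(X, 2) ** 2)    # safe step for this quadratic
    for _ in range(outer):
        for _ in range(inner):                            # "BP" step: task loss + coupling term
            F = X @ Theta
            grad = X.T @ ((F - Y) + rho * (F - Z + U))
            Theta -= lr * grad
        F = X @ Theta
        Z = z_step(F + U, L, lam, rho)                    # manifold step on the auxiliary copy
        U += F - Z                                        # dual update on the constraint F = Z
    return Theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))
    Y = rng.normal(size=(30, 2))
    L, _ = knn_laplacian(X, k=3)
    theta_ls = np.linalg.lstsq(X, Y, rcond=None)[0]       # unregularized baseline
    theta_mf = admm_train(X, Y, L, lam=0.5)
    print("penalty, plain least squares:", manifold_penalty(X @ theta_ls, L))
    print("penalty, ADMM with manifold prior:", manifold_penalty(X @ theta_mf, L))
```

The split into a gradient-based model update and a closed-form Z-step mirrors, in spirit only, the ADMM-BP pairing named in the abstract: the network never has to differentiate through the manifold term directly, because the penalty is handled on an auxiliary copy of the features and tied back through the dual variable `U`.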