Mi Yang, Zhang Xingyuan, Li Zhongguo, Wang Song
IEEE Trans Image Process. 2020 Apr 29. doi: 10.1109/TIP.2020.2989864.
Because microactions involve only subtle motions of body parts, video-based microaction recognition is an important but challenging problem. Most existing action recognition methods are developed for general actions, and the current state-of-the-art methods rely largely on high-layer features learned by convolutional neural networks (CNNs). High-layer CNN features usually contain more semantic information but less detail, and that detail can be important for microactions because their motion is so subtle. In this paper, we propose to learn midlayer CNN features more effectively to enhance microaction recognition. Specifically, we develop a new dual-branch network for microaction recognition: one branch uses the high-layer CNN features for classification, and the second branch further explores the midlayer CNN features for classification. In the second branch, we introduce a novel subtle motion detector consisting of three modules: 1) a discriminative spatial-temporal feature learning module, which learns the subtle motion features corresponding to the discriminative spatial-temporal regions, 2) a parallel multiplier attention module, which refines the learned features in the channel and spatial-temporal domains, and 3) an activation fusion module, which fuses the max and average activations from midlayer CNN features for classification. For the experiments, we build a new microaction video dataset in which the micromotions of interest are mixed with larger general motions such as walking. Comprehensive experimental results verify that the proposed method yields new state-of-the-art performance on two microaction video datasets, while its performance on two general-action video datasets is also very promising.
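As a rough illustration of the activation fusion idea and the dual-branch classification described above, the following minimal NumPy sketch may help. It is not the authors' implementation: the abstract does not specify how the max and average activations are fused or how the two branches are combined, so the weighted sum in `fuse_activations`, the score averaging in `dual_branch_scores`, the tensor layout `(C, T, H, W)`, and all names and sizes are assumptions made only for illustration.

```python
import numpy as np


def fuse_activations(features, weight=0.5):
    """Fuse max and average activations of a midlayer feature map.

    features: array of shape (C, T, H, W), an assumed layout with
    channels first, then time, height, and width.
    Returns a (C,) vector. The weighted sum is an assumed fusion rule.
    """
    max_act = features.max(axis=(1, 2, 3))   # strongest response per channel
    avg_act = features.mean(axis=(1, 2, 3))  # mean response per channel
    return weight * max_act + (1.0 - weight) * avg_act


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def dual_branch_scores(high_logits, mid_logits):
    """Combine class scores from the two branches.

    Averaging the two softmax outputs is an assumption; the abstract
    only states that both branches are used for classification.
    """
    return 0.5 * (softmax(high_logits) + softmax(mid_logits))


# Toy example with random data standing in for CNN outputs.
rng = np.random.default_rng(0)
mid_features = rng.standard_normal((8, 4, 7, 7))  # hypothetical midlayer map
fused = fuse_activations(mid_features)

classifier = rng.standard_normal((5, 8))  # hypothetical linear classifier, 5 classes
mid_logits = classifier @ fused
high_logits = rng.standard_normal(5)      # stand-in for the high-layer branch

scores = dual_branch_scores(high_logits, mid_logits)
predicted_class = int(np.argmax(scores))
```

The intuition behind keeping both statistics is that the max activation preserves the strongest localized response, which a subtle motion may produce in only a few positions, whereas the average summarizes the response over the whole clip.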