Wang Yancheng, Xiao Yang, Lu Junyi, Tan Bo, Cao Zhiguo, Zhang Zhenjun, Zhou Joey Tianyi
IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5332-5345. doi: 10.1109/TNNLS.2021.3070179. Epub 2022 Oct 5.
Dramatic variation in imaging viewpoint is the critical challenge for action recognition in depth video. One feasible way to address it is to make the visual features more view-tolerant while maintaining strong discriminative capacity. The multi-view dynamic image (MVDI) is a recently proposed 3-D action representation that compactly encodes human motion information and 3-D visual cues. However, it is still view-sensitive. To improve its performance, we propose a discriminative MVDI fusion method based on multi-instance learning (MIL). Specifically, the dynamic images (DIs) from different observation viewpoints are regarded as the instances for 3-D action characterization. After being encoded with the Fisher vector (FV), they are aggregated by sum-pooling to yield the representative 3-D action signature. Our insight is that viewpoint aggregation helps to enhance view-tolerance, and that FV maps the raw DI feature to a higher-dimensional feature space, which promotes discriminative power. We also propose a discriminative viewpoint instance discovery method that discards the viewpoint instances unfavorable for action characterization. Extensive experiments on five data sets demonstrate that the proposed method significantly improves the performance of cross-view 3-D action recognition. It is also applicable to cross-view 3-D object recognition. The source code is available at https://github.com/3huo/ActionView.
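The aggregation step described in the abstract (encode each viewpoint instance with a Fisher vector, then sum-pool across viewpoints) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the linked repository. It assumes a simplified Fisher vector that uses only the gradients with respect to the means of a diagonal Gaussian mixture model, and it assumes the mixture parameters are already given. The names `fisher_vector_means` and `aggregate_views` and the `keep` mask (standing in for the discriminative instance discovery step) are hypothetical.

```python
import numpy as np


def fisher_vector_means(x, weights, means, sigmas):
    """Simplified Fisher vector (mean gradients only) under a diagonal GMM.

    x: (N, D) descriptors from one viewpoint instance (N may be 1).
    weights: (K,) mixture weights; means, sigmas: (K, D) per-component
    means and standard deviations. Returns a vector of length K * D.
    """
    # Whitened differences between every descriptor and every component.
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (N, K, D)
    # Log posterior up to a constant shared by all components.
    log_prob = (
        -0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigmas), axis=1)
        + np.log(weights)
    )  # (N, K)
    log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)  # soft assignments
    # Gradient with respect to the means, averaged over descriptors.
    grad = (gamma[:, :, None] * diff).sum(axis=0)
    grad /= x.shape[0] * np.sqrt(weights)[:, None]
    return grad.ravel()


def aggregate_views(view_features, weights, means, sigmas, keep=None):
    """Sum-pool the per-view Fisher vectors into one action signature.

    view_features: list of (N_v, D) arrays, one per observation viewpoint.
    keep: optional boolean mask over views (hypothetical placeholder for
    discarding viewpoint instances judged unfavorable).
    """
    fvs = np.stack(
        [fisher_vector_means(f, weights, means, sigmas) for f in view_features]
    )
    if keep is not None:
        fvs = fvs[np.asarray(keep, dtype=bool)]
    pooled = fvs.sum(axis=0)  # sum-pooling is independent of view order
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled
```

Because sum-pooling is a symmetric function of its inputs, the resulting signature does not depend on the order in which viewpoints are enumerated, which is one plausible reading of why aggregation helps view-tolerance.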