Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition

Author information

Wang Yancheng, Xiao Yang, Lu Junyi, Tan Bo, Cao Zhiguo, Zhang Zhenjun, Zhou Joey Tianyi

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5332-5345. doi: 10.1109/TNNLS.2021.3070179. Epub 2022 Oct 5.

DOI:10.1109/TNNLS.2021.3070179

Abstract

Dramatic variation in imaging viewpoint is the critical challenge for action recognition in depth video. One feasible way to address it is to make the visual features more view-tolerant while maintaining strong discriminative capacity. The multi-view dynamic image (MVDI) is a recently proposed 3-D action representation that compactly encodes human motion information and 3-D visual cues. However, it is still view-sensitive. To improve its performance, we propose a discriminative MVDI fusion method based on multi-instance learning (MIL). Specifically, the dynamic images (DIs) from different observation viewpoints are regarded as the instances for 3-D action characterization. After being encoded with Fisher vectors (FVs), they are aggregated by sum-pooling to yield a representative 3-D action signature. Our insight is that viewpoint aggregation helps to enhance view tolerance, and that FV encoding maps the raw DI features to a higher dimensional feature space, which promotes discriminative power. In addition, a discriminative viewpoint instance discovery method is proposed to discard the viewpoint instances that are unfavorable for action characterization. Extensive experiments on five data sets demonstrate that the proposed approach significantly enhances the performance of cross-view 3-D action recognition. It is also applicable to cross-view 3-D object recognition. The source code is available at https://github.com/3huo/ActionView.
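The encode-then-pool step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes each viewpoint's DI has already been reduced to a single feature vector, uses a simplified first-order Fisher vector under a toy diagonal GMM, and takes the keep/discard mask as given, since the abstract does not describe how the discriminative instances are discovered. The names `fisher_vector`, `fuse_views`, `gmm`, and `keep` are hypothetical.

```python
import math

def fisher_vector(x, weights, means, sigmas):
    """Simplified first-order Fisher vector of one descriptor x under a
    diagonal GMM (only the gradients w.r.t. the means are kept)."""
    # Log-likelihood of x under each weighted Gaussian component.
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        log_p.append(ll)
    # Soft assignment (posterior) of x to each component, computed stably.
    top = max(log_p)
    post = [math.exp(lp - top) for lp in log_p]
    total = sum(post)
    post = [p / total for p in post]
    # Concatenate the per-component mean gradients: K * D dimensions.
    fv = []
    for g, w, mu, sd in zip(post, weights, means, sigmas):
        fv.extend(g * (xi - m) / (s * math.sqrt(w)) for xi, m, s in zip(x, mu, sd))
    return fv

def fuse_views(view_features, keep, gmm):
    """Sum-pool the Fisher vectors of the kept viewpoint instances,
    then L2-normalize the pooled signature."""
    weights, means, sigmas = gmm
    pooled = [0.0] * (len(weights) * len(means[0]))
    for x, k in zip(view_features, keep):
        if not k:
            continue  # instance judged unfavorable (mask is assumed given)
        fv = fisher_vector(x, weights, means, sigmas)
        pooled = [a + b for a, b in zip(pooled, fv)]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled

# Toy data: a 2-component GMM over 2-D features and three viewpoint instances.
gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
views = [[0.1, 0.2], [0.9, 1.1], [0.4, 0.7]]
keep = [True, False, True]  # hypothetical output of instance discovery

signature = fuse_views(views, keep, gmm)
print(len(signature), signature)
```

Because sum-pooling is order-invariant, the signature does not depend on the order in which the viewpoint instances are listed, which is one way to read the abstract's claim that viewpoint aggregation supports view tolerance. The FV output has K × D dimensions (here 2 × 2 = 4), which is larger than the raw feature dimension.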

