Zhang Haoyuan
School of Electrical and Information Engineering, North Minzu University, Yinchuan 750021, China.
Sensors (Basel). 2025 Feb 28;25(5):1521. doi: 10.3390/s25051521.
In this paper, we propose a contrastive mask learning (CML) method for self-supervised 3D skeleton-based action recognition. Specifically, mask modeling is integrated into multi-level contrastive learning so that contrastive learning and masked skeleton reconstruction form a mutually beneficial learning scheme. The contrastive objective is extended from individual skeleton instances to clusters by closing the gap between the cluster assignments of different instances of the same category, which promotes inter-instance consistency. Compared with previous methods, CML integrates contrastive and masked learning comprehensively and pursues both intra- and inter-instance consistency via multi-level contrast, which leads to more discriminative skeleton representations. Extensive evaluation on the challenging NTU RGB+D and PKU-MMD benchmarks demonstrates that representations learned via CML exhibit superior discriminability, consistently outperforming state-of-the-art methods in action recognition accuracy.
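The three ingredients named in the abstract (instance-level contrast, cluster-level consistency, and masked skeleton reconstruction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the InfoNCE form, the prototype-based soft cluster assignment, the use of paired feature lists as stand-ins for the instances being matched, and the weights `w_cluster` and `w_recon` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(queries, keys, temperature=0.1):
    """Instance-level contrast (assumed InfoNCE form).

    queries[i] and keys[i] are features of two views of sample i;
    every other key in the batch acts as a negative.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_denom - logits[i]
    return loss / len(q)


def cluster_consistency(feats_a, feats_b, prototypes, temperature=0.1):
    """Cluster-level contrast (assumed form): cross-entropy between the
    soft prototype assignments of paired features."""
    protos = [normalize(c) for c in prototypes]
    loss = 0.0
    for a, b in zip(feats_a, feats_b):
        p = softmax([dot(normalize(a), c) / temperature for c in protos])
        q = softmax([dot(normalize(b), c) / temperature for c in protos])
        loss += -sum(pi * math.log(qi + 1e-12) for pi, qi in zip(p, q))
    return loss / len(feats_a)


def masked_reconstruction(original, reconstructed, mask):
    """Mean squared error over masked coordinates only.

    mask[j] is True where coordinate j was hidden from the encoder.
    """
    errs = [(o - r) ** 2 for o, r, m in zip(original, reconstructed, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def cml_style_loss(instance, cluster, recon, w_cluster=1.0, w_recon=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return instance + w_cluster * cluster + w_recon * recon
```

In this sketch, `info_nce` is small when each query is closest to its own key, `cluster_consistency` is small when paired features receive similar prototype assignments, and `masked_reconstruction` penalizes errors only at the hidden coordinates. How the actual method pairs instances and weights the terms is not stated in the abstract.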