Xu Binqian, Shu Xiangbo, Song Yan
IEEE Trans Image Process. 2022;31:3852-3867. doi: 10.1109/TIP.2022.3175605. Epub 2022 Jun 2.
Semi-supervised skeleton-based action recognition is challenging because labeled data are scarce. To address this problem, some representative methods use contrastive learning to extract more features from pre-augmented skeleton actions. Such methods usually take a two-stage approach: they first randomly augment the samples and then learn their representations via contrastive learning. Because the skeleton samples have already been randomly augmented, the representation ability of the subsequent contrastive learning is limited by the inconsistency between the augmentations and the representations. We therefore propose a novel X-invariant Contrastive Augmentation and Representation learning (X-CAR) framework that obtains rotate-shear-scale (X for short) invariant features by learning the augmentations and representations of skeleton sequences in a single stage. In X-CAR, a new Adaptive-combination Augmentation (AA) mechanism rotates, shears, and scales the skeletons through learnable controlling factors, adaptively rather than randomly. These controlling factors are learned within the whole contrastive learning process, which promotes consistency between the learned augmentations and representations of skeleton sequences. In addition, we relax the pre-definition of positive and negative samples to avoid confusing assignments of ambiguous samples, and present a new Pull-Push Contrastive Loss (PPCL) that pulls the augmented skeleton close to the original skeleton while pushing it away from the other skeletons. Experimental results on the NTU RGB+D and North-Western UCLA datasets show that X-CAR achieves better accuracy than other competitive methods in the semi-supervised scenario.
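As a rough illustration of what rotating, shearing, and scaling a skeleton by a few controlling factors can look like, the following minimal sketch composes three 3x3 transforms and applies them to every joint. It is not the authors' implementation: the abstract does not give the parametrization, so the single rotation axis, the single shear term, the uniform scale, the composition order, and the names `theta`, `k`, `s`, and `augment` are all assumptions made for illustration. In X-CAR the factors are learned during contrastive training, whereas here they are plain numbers passed in by hand.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][n] * b[n][j] for n in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_z(theta):
    """Rotation about the z-axis by theta radians (one axis only, an assumption)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def shear_xy(k):
    """Shear that adds k * y to the x coordinate (one shear term only, an assumption)."""
    return [[1.0, k, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def scale(s):
    """Uniform scaling by factor s."""
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, s]]


def augment(skeleton, theta, k, s):
    """Apply scale, then shear, then rotation to every joint.

    skeleton: list of frames; each frame is a list of (x, y, z) joints.
    theta, k, s: hypothetical controlling factors. In a learned setting
    they would be trainable parameters rather than fixed numbers.
    """
    m = matmul3(rotation_z(theta), matmul3(shear_xy(k), scale(s)))
    return [[tuple(sum(m[i][j] * joint[j] for j in range(3)) for i in range(3))
             for joint in frame]
            for frame in skeleton]


# One frame with two joints.
skeleton = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
identity = augment(skeleton, theta=0.0, k=0.0, s=1.0)          # neutral factors
rotated = augment(skeleton, theta=math.pi / 2, k=0.0, s=2.0)   # rotate 90 degrees, double size
sheared = augment(skeleton, theta=0.0, k=0.5, s=1.0)           # shear only
```

With neutral factors (`theta=0`, `k=0`, `s=1`) the skeleton is returned unchanged, which is a convenient starting point if the factors are later treated as trainable parameters in an automatic-differentiation framework.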