IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7559-7576. doi: 10.1109/TPAMI.2022.3222871. Epub 2023 May 5.
In semi-supervised skeleton-based action recognition, extracting more discriminative information from both labeled and unlabeled data remains challenging. Contrastive learning, the current mainstream approach, learns representations from augmented data and can be regarded as a pretext task for action recognition. However, it still faces three main limitations: 1) It usually learns global-granularity features that do not reflect local motion information well. 2) The positive/negative pairs are usually pre-defined, and some of them are ambiguous. 3) It generally measures the distance between positive/negative pairs only within the same granularity, neglecting the contrast between cross-granularity positive and negative pairs. To address these limitations, we propose a novel Multi-granularity Anchor-Contrastive representation Learning method (dubbed MAC-Learning) that learns multi-granularity representations by conducting inter- and intra-granularity contrastive pretext tasks on learnable and structural-link skeletons across three granularities: local, context, and global views. To avoid the disturbance of ambiguous pairs arising from noisy and outlier samples, we design a more reliable Multi-granularity Anchor-Contrastive Loss (dubbed MAC-Loss) that measures the agreement/disagreement between high-confidence soft-positive/negative pairs based on an anchor graph, instead of the hard-positive/negative pairs used in the conventional contrastive loss. Extensive experiments on the NTU RGB+D and Northwestern-UCLA datasets show that the proposed MAC-Learning outperforms existing competitive methods in semi-supervised skeleton-based action recognition.
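The contrast between hard and soft pairs described above can be illustrated with a minimal NumPy sketch. It is not the authors' MAC-Loss: the abstract does not give the formulation, so everything here is an assumption made for illustration. The function names (`info_nce`, `anchor_graph_affinity`, `soft_anchor_contrastive`), the temperature and bandwidth parameters, and the use of the common anchor-graph construction W = Z Λ⁻¹ Zᵀ to obtain soft pair weights are all hypothetical choices. The two input views could be embeddings from the same granularity or from different ones.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(view_a, view_b, temperature=0.1):
    """Conventional contrastive loss with hard pairs: row i of view_a is
    positive only with row i of view_b; all other rows are negatives."""
    a, b = l2_normalize(view_a), l2_normalize(view_b)
    log_prob = log_softmax(a @ b.T / temperature)
    return -np.mean(np.diag(log_prob))


def anchor_graph_affinity(features, anchors, bandwidth=1.0):
    """Sample-to-sample affinity routed through a small set of anchors
    (assumed construction): Z holds soft assignments of samples to anchors,
    and W = Z diag(column sums)^-1 Z^T. Each row of W sums to 1."""
    sq_dist = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    z = np.exp(log_softmax(-sq_dist / bandwidth))  # rows of Z sum to 1
    return (z / z.sum(axis=0, keepdims=True)) @ z.T


def soft_anchor_contrastive(view_a, view_b, weights, temperature=0.1):
    """Contrastive loss with soft pairs: weights[i, j] is the confidence that
    sample j is a positive for sample i. With identity weights this reduces
    to info_nce above."""
    a, b = l2_normalize(view_a), l2_normalize(view_b)
    log_prob = log_softmax(a @ b.T / temperature)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return -np.mean((weights * log_prob).sum(axis=1))


# Toy usage with random stand-ins for two augmented views of 8 sequences.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.1 * rng.normal(size=(8, 16))
anchors = view_a[:3]  # hypothetical anchors; a real system might use cluster centers
soft_weights = anchor_graph_affinity(view_a, anchors)

print("hard-pair loss:", info_nce(view_a, view_b))
print("soft-pair loss:", soft_anchor_contrastive(view_a, view_b, soft_weights))
```

In this sketch the soft weights spread the positive mass over samples that share anchors, so a single mislabeled or outlying pair has less influence than it would under the hard one-to-one pairing.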