IEEE Trans Image Process. 2023;32:4046-4058. doi: 10.1109/TIP.2023.3293766. Epub 2023 Jul 19.
We present Skeleton-CutMix, a simple and effective skeleton augmentation framework for supervised domain adaptation, and show its advantage in skeleton-based action recognition tasks. Existing approaches usually perform domain adaptation for action recognition with elaborate loss functions that aim to achieve domain alignment. However, they fail to capture the intrinsic characteristics of the skeleton representation. Exploiting the well-defined correspondence between the bones of a pair of skeletons, we instead mitigate domain shift by fabricating skeleton data in a mixed domain that combines bones from the source and target domains. The fabricated skeletons can be used to augment the training data and to train a more general and robust action recognition model. Specifically, we hallucinate new skeletons from pairs of source-domain and target-domain skeletons: a new skeleton is generated by exchanging some bones of the source-domain skeleton with the corresponding bones of the target-domain skeleton, which resembles a cut-and-mix operation. When exchanging bones across domains, we introduce a class-specific bone sampling strategy, so that bones that are more important for an action class are exchanged with higher probability when generating augmentation samples for that class. We show experimentally that this simple bone exchange strategy is efficient and effective, and that distinctive motion features are preserved while mixing both action and style across domains. We validate our method in cross-dataset and cross-age settings on the NTU-60 and ETRI-Activity3D datasets, with an average gain of over 3% in action recognition accuracy, and demonstrate its superior performance over previous domain adaptation approaches as well as other skeleton augmentation strategies.
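The bone exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the 5-joint `PARENTS` tree, the representation of a bone as the offset of a joint from its parent, the reassembly of joints by accumulating bones from the root, and the per-bone probability vector `bone_probs`, which stands in for the class-specific sampling strategy (the abstract does not say how bone importance is estimated). The sketch also assumes both sequences have the same number of frames and does not address how source-target pairs are chosen or labeled.

```python
import numpy as np

# Hypothetical 5-joint skeleton: PARENTS[j] is the parent of joint j, -1 marks the root.
# Parents are assumed to come before their children (PARENTS[j] < j).
PARENTS = np.array([-1, 0, 1, 1, 1])


def joints_to_bones(joints, parents):
    """Convert joint positions (T, J, 3) to bone vectors (T, J, 3).

    Bone j is the offset of joint j from its parent; the root entry keeps
    the root position so the conversion can be inverted.
    """
    bones = joints.copy()
    for j, p in enumerate(parents):
        if p >= 0:
            bones[:, j] = joints[:, j] - joints[:, p]
    return bones


def bones_to_joints(bones, parents):
    """Inverse of joints_to_bones: accumulate bone vectors outward from the root."""
    joints = bones.copy()
    for j, p in enumerate(parents):
        if p >= 0:
            joints[:, j] = joints[:, p] + bones[:, j]
    return joints


def skeleton_cutmix(src, tgt, bone_probs, parents, rng):
    """Swap each non-root bone of `src` for the matching bone of `tgt`.

    bone_probs[j] is the probability of exchanging bone j. In a
    class-specific setting it would be looked up from the action class
    of the sample (an assumed lookup, not shown here).
    """
    src_bones = joints_to_bones(src, parents)
    tgt_bones = joints_to_bones(tgt, parents)
    swap = rng.random(len(parents)) < bone_probs
    swap[parents < 0] = False  # keep the root position from the source
    mixed_bones = np.where(swap[None, :, None], tgt_bones, src_bones)
    return bones_to_joints(mixed_bones, parents), swap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(4, 5, 3))  # 4 frames, 5 joints, xyz
    target = rng.normal(size=(4, 5, 3))
    # Placeholder probabilities: bones 3 and 4 are treated as more important.
    probs = np.array([0.0, 0.2, 0.2, 0.8, 0.8])
    mixed, swapped = skeleton_cutmix(source, target, probs, PARENTS, rng)
    print("swapped bones:", np.flatnonzero(swapped))
    print("mixed shape:", mixed.shape)
```

Exchanging bone vectors rather than raw joint coordinates keeps the mixed skeleton connected, because every joint is re-derived from its parent after the swap. That is one plausible reading of the cut-and-mix operation rather than a confirmed detail of the method.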