

Learning Gait Representation From Massive Unlabelled Walking Videos: A Benchmark.

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14920-14937. doi: 10.1109/TPAMI.2023.3312419. Epub 2023 Nov 3.

Abstract

Gait depicts an individual's unique and distinguishing walking pattern and has become one of the most promising biometric features for human identification. As a fine-grained recognition task, gait recognition is easily affected by many factors and usually requires a large amount of fully annotated data, which is costly and hard to obtain. This paper proposes a large-scale self-supervised benchmark for gait recognition with contrastive learning, aiming to learn general gait representations from massive unlabelled walking videos for practical applications by offering informative walking priors and diverse real-world variations. Specifically, we collect a large-scale unlabelled gait dataset, GaitLU-1M, consisting of 1.02M walking sequences, and propose a conceptually simple yet empirically powerful baseline model, GaitSSB. Experimentally, we evaluate the pre-trained model on four widely used gait benchmarks, CASIA-B, OU-MVLP, GREW, and Gait3D, with and without transfer learning. The unsupervised results are comparable to or even better than those of early model-based and GEI-based methods. After transfer learning, GaitSSB outperforms existing methods by a large margin in most cases and also exhibits superior generalization capacity. Further experiments indicate that pre-training can save about 50% and 80% of the annotation costs of GREW and Gait3D, respectively. Theoretically, we discuss the critical issues for a gait-specific contrastive framework and present some insights for further study. To the best of our knowledge, GaitLU-1M is the first large-scale unlabelled gait dataset, and GaitSSB is the first method to achieve remarkable unsupervised results on the aforementioned benchmarks.
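The abstract does not spell out GaitSSB's training objective, but contrastive pre-training of the kind described is commonly built on an InfoNCE-style loss, where two augmented views of the same walking sequence form a positive pair and all other sequences in the batch act as negatives. The sketch below is a generic, hypothetical illustration of that objective (function name, shapes, and temperature are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss over two batches of embeddings.

    z1, z2: (N, D) arrays holding embeddings of two augmented views of
    the same N walking sequences; row i of z1 and row i of z2 form a
    positive pair, and every other row serves as a negative.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with the diagonal (matching pairs) as targets.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls embeddings of the two views of the same sequence together while pushing apart embeddings of different sequences, which is how identity-relevant gait structure can emerge without labels.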

