
3D Human Pose Machines with Self-Supervised Learning

Publication

IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1069-1082. doi: 10.1109/TPAMI.2019.2892452. Epub 2019 Jan 14.

Abstract

Driven by recent computer vision and robotics applications, recovering 3D human poses has become increasingly important and has attracted growing interest. The task is challenging due to the diverse appearances, viewpoints, occlusions, and inherent geometric ambiguities in monocular images. Most existing methods focus on designing elaborate priors/constraints to directly regress 3D human poses from the corresponding 2D pose-aware features or 2D pose predictions. However, due to insufficient 3D pose data for training and the domain gap between 2D and 3D space, these methods have limited scalability to practical scenarios (e.g., outdoor scenes). To address this issue, this paper proposes a simple yet effective self-supervised correction mechanism that learns the intrinsic structures of human poses from abundant images. Specifically, the proposed mechanism involves two dual learning tasks, i.e., 2D-to-3D pose transformation and 3D-to-2D pose projection, which serve as a bridge between 3D and 2D human poses and provide a form of "free" self-supervision for accurate 3D human pose estimation. The 2D-to-3D pose transformation sequentially regresses intermediate 3D poses by lifting the pose representation from the 2D domain to the 3D domain under sequence-dependent temporal context, while the 3D-to-2D pose projection refines the intermediate 3D poses by enforcing geometric consistency between the 2D projections of the 3D poses and the estimated 2D poses. These two dual learning tasks thus enable the model to adaptively learn from 3D human pose data and from external large-scale 2D human pose data. We further apply the self-supervised correction mechanism to develop a 3D human pose machine that jointly integrates 2D spatial relationships, temporal smoothness of predictions, and 3D geometric knowledge. Extensive evaluations on the Human3.6M and HumanEva-I benchmarks demonstrate the superior performance and efficiency of our framework over competing methods.
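The core of the 3D-to-2D projection task is a geometric-consistency check: the intermediate 3D pose is projected back to the image plane and compared against the estimated 2D pose, and the discrepancy serves as a "free" self-supervision signal. The sketch below illustrates that idea only; it is not the paper's implementation. It assumes a fixed pinhole camera with unit focal length (the paper's projection module is learned), and the function names `project_to_2d` and `projection_consistency_loss` are hypothetical.

```python
import numpy as np

def project_to_2d(pose_3d, focal=1.0):
    """Pinhole projection of 3D joints (N, 3) to 2D (N, 2).

    Assumption: camera-centered coordinates with all joint depths z > 0,
    so each joint maps to (focal * x / z, focal * y / z).
    """
    z = pose_3d[:, 2:3]          # depths, kept 2D for broadcasting
    return focal * pose_3d[:, :2] / z

def projection_consistency_loss(pose_3d, pose_2d_est, focal=1.0):
    """Mean squared error between the 2D projection of an intermediate
    3D pose and the separately estimated 2D pose.

    A low value indicates the 3D pose is geometrically consistent with
    the 2D evidence; the gradient of this loss is what would drive the
    correction of the intermediate 3D pose.
    """
    proj = project_to_2d(pose_3d, focal)
    return float(np.mean((proj - pose_2d_est) ** 2))
```

A 3D pose that exactly reprojects onto the estimated 2D joints yields zero loss; any depth or position error in the intermediate 3D pose shows up as a positive reprojection discrepancy, which is the self-supervision signal the correction mechanism exploits.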

