
Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation.

Authors

Honari Sina, Constantin Victor, Rhodin Helge, Salzmann Mathieu, Fua Pascal

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):6415-6427. doi: 10.1109/TPAMI.2022.3215307. Epub 2023 Apr 3.

DOI: 10.1109/TPAMI.2022.3215307
PMID: 36251908
Abstract

In this article, we propose an unsupervised feature-extraction method that captures temporal information in monocular videos: we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally distant frames as negative pairs, as other CSS approaches do, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying a contrastive loss only to the time-variant features, while encouraging a gradual transition on them between nearby and distant frames and also reconstructing the input, extracts rich temporal features well suited to human pose estimation. Our approach reduces error by about 50% compared with standard CSS strategies, outperforms other unsupervised single-view methods, and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve 3D pose-estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
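The core idea described above — splitting each per-frame latent vector into a time-variant and a time-invariant part and applying an InfoNCE-style contrastive loss only to the time-variant part — can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the split index `k`, the cosine similarity measure, and the `temperature` value are assumptions for the example.

```python
import numpy as np

def split_latent(z: np.ndarray, k: int):
    """Disentangle a latent vector into a time-variant part (first k dims)
    and a time-invariant part (remaining dims)."""
    return z[:k], z[k:]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, with a small epsilon for numerical safety."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss on time-variant features: pull the nearby-frame
    positive toward the anchor, push temporally distant negatives away."""
    logits = np.array([cosine(anchor, positive)] +
                      [cosine(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                  # positive sits at index 0

# Toy example: 8-dim latents, first 4 dims treated as time-variant.
rng = np.random.default_rng(0)
z_t   = rng.normal(size=8)                    # frame t
z_t1  = z_t + 0.05 * rng.normal(size=8)       # nearby frame t+1
z_far = rng.normal(size=(3, 8))               # temporally distant frames

v_anchor, _ = split_latent(z_t, 4)
v_pos, _    = split_latent(z_t1, 4)
v_negs      = [split_latent(z, 4)[0] for z in z_far]

loss = contrastive_loss(v_anchor, v_pos, v_negs)
```

In the paper, this contrastive term is combined with a reconstruction loss on the input, so the time-invariant part of the latent is shaped by reconstruction rather than by the contrastive objective.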


Similar Articles

1. Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation.
   IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):6415-6427. doi: 10.1109/TPAMI.2022.3215307. Epub 2023 Apr 3.
2. A self-supervised spatio-temporal attention network for video-based 3D infant pose estimation.
   Med Image Anal. 2024 Aug;96:103208. doi: 10.1016/j.media.2024.103208. Epub 2024 May 18.
3. 3D Human Pose, Shape and Texture From Low-Resolution Images and Videos.
   IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):4490-4504. doi: 10.1109/TPAMI.2021.3070002. Epub 2022 Aug 4.
4. Cross-view motion consistent self-supervised video inter-intra contrastive for action representation understanding.
   Neural Netw. 2024 Nov;179:106578. doi: 10.1016/j.neunet.2024.106578. Epub 2024 Jul 26.
5. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos.
   Med Image Anal. 2021 Jul;71:102058. doi: 10.1016/j.media.2021.102058. Epub 2021 Apr 15.
6. TCGL: Temporal Contrastive Graph for Self-Supervised Video Representation Learning.
   IEEE Trans Image Process. 2022;31:1978-1993. doi: 10.1109/TIP.2022.3147032. Epub 2022 Feb 18.
7. Monocular Depth Estimation with Self-Supervised Learning for Vineyard Unmanned Agricultural Vehicle.
   Sensors (Basel). 2022 Jan 18;22(3):721. doi: 10.3390/s22030721.
8. ContrastivePose: A contrastive learning approach for self-supervised feature engineering for pose estimation and behavorial classification of interacting animals.
   Comput Biol Med. 2023 Oct;165:107416. doi: 10.1016/j.compbiomed.2023.107416. Epub 2023 Aug 29.
9. RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry.
   Sensors (Basel). 2022 Mar 30;22(7):2651. doi: 10.3390/s22072651.
10. Weakly Supervised Adversarial Learning for 3D Human Pose Estimation from Point Clouds.
   IEEE Trans Vis Comput Graph. 2020 May;26(5):1851-1859. doi: 10.1109/TVCG.2020.2973076. Epub 2020 Feb 13.

Cited By

1. Deep learning for genomic selection of aquatic animals.
   Mar Life Sci Technol. 2024 Sep 27;6(4):631-650. doi: 10.1007/s42995-024-00252-y. eCollection 2024 Nov.
2. A Systematic Review of Recent Deep Learning Approaches for 3D Human Pose Estimation.
   J Imaging. 2023 Dec 12;9(12):275. doi: 10.3390/jimaging9120275.