A Survey of the State of the Art in Monocular 3D Human Pose Estimation: Methods, Benchmarks, and Challenges.

Authors

Guo Yan, Gao Tianhan, Dong Aoshuang, Jiang Xinbei, Zhu Zichen, Wang Fuxin

Affiliation

Software College, Northeastern University, Shenyang 110004, China.

Publication

Sensors (Basel). 2025 Apr 10;25(8):2409. doi: 10.3390/s25082409.

Abstract

Three-dimensional human pose estimation (3D HPE) from monocular RGB cameras is a fundamental yet challenging task in computer vision, forming the basis of a wide range of applications such as action recognition, metaverse, self-driving, and healthcare. Recent advances in deep learning have significantly propelled the field, particularly with the incorporation of state-space models (SSMs) and diffusion models. However, systematic reviews that comprehensively cover these emerging techniques remain limited. This survey contributes to the literature by providing the first comprehensive analysis of recent innovative approaches, featuring diffusion models and SSMs within 3D HPE. It categorizes and analyzes various techniques, highlighting their strengths, limitations, and notable innovations. Additionally, it provides a detailed overview of commonly employed datasets and evaluation metrics. Furthermore, this survey offers an in-depth discussion on key challenges, particularly depth ambiguity and occlusion issues arising from single-view setups, thoroughly reviewing effective solutions proposed in recent studies. Finally, current applications and promising avenues for future research are highlighted to guide and inspire ongoing innovation in the area, with emerging trends such as integrating large language models (LLMs) to provide semantic priors and prompt-based supervision for improved 3D pose estimation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1818/12031093/57f8d8434482/sensors-25-02409-g001.jpg
