
Estimating a 3D Human Skeleton from a Single RGB Image by Fusing Predicted Depths from Multiple Virtual Viewpoints.

Authors

Lie Wen-Nung, Vann Veasna

Affiliations

Department of Electrical Engineering, Center for Innovative Research on Aging Society (CIRAS), Advanced Institute of Manufacturing with High-Tech Innovations (AIM-HI), National Chung Cheng University, Chia-Yi 621, Taiwan.

Publication

Sensors (Basel). 2024 Dec 15;24(24):8017. doi: 10.3390/s24248017.

DOI: 10.3390/s24248017
PMID: 39771753
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11679025/
Abstract

In computer vision, accurately estimating a 3D human skeleton from a single RGB image remains a challenging task. Inspired by the advantages of multi-view approaches, we propose a method of predicting enhanced 2D skeletons (specifically, predicting the joints' relative depths) from multiple virtual viewpoints based on a single real-view image. By fusing these virtual-viewpoint skeletons, we can then estimate the final 3D human skeleton more accurately. Our network consists of two stages. The first stage is composed of a two-stream network: the Real-Net stream predicts 2D image coordinates and the relative depth for each joint from the real viewpoint, while the Virtual-Net stream estimates the relative depths in virtual viewpoints for the same joints. Our network's second stage consists of a depth-denoising module, a cropped-to-original coordinate transform (COCT) module, and a fusion module. The goal of the fusion module is to fuse skeleton information from the real and virtual viewpoints so that it can undergo feature embedding, 2D-to-3D lifting, and regression to an accurate 3D skeleton. The experimental results demonstrate that our single-view method achieves an average per-joint position error of 45.7 mm, which is superior to that of several other prior studies of the same kind and comparable to that of sequence-based methods that accept tens of consecutive frames as input.
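
As a rough illustration of the two-stage design described in the abstract, the sketch below wires the two first-stage streams (Real-Net predicting 2D coordinates plus relative depths, Virtual-Net predicting relative depths in virtual views) into a second-stage denoising-and-fusion head. All specifics here are illustrative assumptions, not the authors' implementation: the layer sizes, joint count, number of virtual viewpoints, and input features are placeholders, and the COCT module is omitted.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 17        # assumption: Human3.6M-style 17-joint skeleton
NUM_VIRTUAL_VIEWS = 4  # assumption: placeholder number of virtual views

class TwoStageSkeletonNet(nn.Module):
    """Illustrative two-stage pipeline: stage 1 runs two streams
    (Real-Net: 2D coordinates + relative depth per joint from the real view;
    Virtual-Net: relative depths of the same joints in virtual views);
    stage 2 denoises the virtual depths, fuses real and virtual cues, and
    regresses the 3D skeleton (the COCT module is omitted for brevity)."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Stage 1, stream A (Real-Net): (u, v, relative depth) per joint.
        self.real_net = nn.Sequential(
            nn.LazyLinear(feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_JOINTS * 3),
        )
        # Stage 1, stream B (Virtual-Net): one relative depth per joint per view.
        self.virtual_net = nn.Sequential(
            nn.LazyLinear(feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_VIRTUAL_VIEWS * NUM_JOINTS),
        )
        # Stage 2: depth-denoising module for the virtual-view depths.
        self.denoise = nn.Sequential(
            nn.Linear(NUM_VIRTUAL_VIEWS * NUM_JOINTS, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, NUM_VIRTUAL_VIEWS * NUM_JOINTS),
        )
        # Stage 2: feature embedding + 2D-to-3D lifting + regression head.
        self.fusion = nn.Sequential(
            nn.Linear(NUM_JOINTS * 3 + NUM_VIRTUAL_VIEWS * NUM_JOINTS, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, NUM_JOINTS * 3),
        )

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        real = self.real_net(img_feat)              # (B, J*3)
        virtual = self.virtual_net(img_feat)        # (B, V*J)
        virtual = self.denoise(virtual)             # denoised virtual depths
        fused = torch.cat([real, virtual], dim=-1)  # fuse real + virtual cues
        return self.fusion(fused).view(-1, NUM_JOINTS, 3)

# Dummy usage: a batch of 2 backbone feature vectors -> (2, 17, 3) skeletons.
model = TwoStageSkeletonNet()
skeleton_3d = model(torch.randn(2, 512))
print(skeleton_3d.shape)
```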

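The 45.7 mm figure quoted above is the standard mean per-joint position error (MPJPE): the Euclidean distance between predicted and ground-truth joint positions, averaged over joints and test samples. A minimal sketch of that metric, with array shapes chosen as assumptions:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error, in the units of the inputs (mm here).
    pred, gt: arrays of shape (num_samples, num_joints, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy usage with a 17-joint skeleton.
gt = np.random.rand(8, 17, 3) * 1000           # ground-truth joints, mm
pred = gt + np.random.randn(8, 17, 3) * 30.0   # Gaussian perturbation
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```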

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/d137ea9dff39/sensors-24-08017-g001a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/302b05e000fd/sensors-24-08017-g002a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/f8b20cc84a6f/sensors-24-08017-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/1a434c383cb7/sensors-24-08017-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/6e87aef6ca9b/sensors-24-08017-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/ddb0a6281eac/sensors-24-08017-g006a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/324af1109f5f/sensors-24-08017-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9830/11679025/86c0fb40b4cc/sensors-24-08017-g008.jpg

Similar Articles

1. Estimating a 3D Human Skeleton from a Single RGB Image by Fusing Predicted Depths from Multiple Virtual Viewpoints.
Sensors (Basel). 2024 Dec 15;24(24):8017. doi: 10.3390/s24248017.
2. MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer.
Sensors (Basel). 2024 Oct 31;24(21):7029. doi: 10.3390/s24217029.
3. WHSP-Net: A Weakly-Supervised Approach for 3D Hand Shape and Pose Recovery from a Single Depth Image.
Sensors (Basel). 2019 Aug 31;19(17):3784. doi: 10.3390/s19173784.
4. View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition.
IEEE Trans Pattern Anal Mach Intell. 2019 Aug;41(8):1963-1978. doi: 10.1109/TPAMI.2019.2896631. Epub 2019 Jan 31.
5. An Efficient 3D Human Pose Retrieval and Reconstruction from 2D Image-Based Landmarks.
Sensors (Basel). 2021 Apr 1;21(7):2415. doi: 10.3390/s21072415.
6. Reconstructing 3D human pose and shape from a single image and sparse IMUs.
PeerJ Comput Sci. 2023 May 24;9:e1401. doi: 10.7717/peerj-cs.1401. eCollection 2023.
7. Glissando-Net: Deep Single View Category Level Pose Estimation and 3D Reconstruction.
IEEE Trans Pattern Anal Mach Intell. 2025 Apr;47(4):2298-2312. doi: 10.1109/TPAMI.2024.3519674. Epub 2025 Mar 6.
8. Fusing information from multiple 2D depth cameras for 3D human pose estimation in the operating room.
Int J Comput Assist Radiol Surg. 2019 Nov;14(11):1871-1879. doi: 10.1007/s11548-019-02044-7. Epub 2019 Aug 6.
9. Virtual view synthesis for 3D light-field display based on scene tower blending.
Opt Express. 2021 Mar 1;29(5):7866-7884. doi: 10.1364/OE.419069.
10. 3D Static Point Cloud Registration by Estimating Temporal Human Pose at Multiview.
Sensors (Basel). 2022 Jan 31;22(3):1097. doi: 10.3390/s22031097.

Cited By

1. GCN-Transformer: Graph Convolutional Network and Transformer for Multi-Person Pose Forecasting Using Sensor-Based Motion Data.
Sensors (Basel). 2025 May 15;25(10):3136. doi: 10.3390/s25103136.

References

1. Learning Temporal-Spatial Contextual Adaptation for Three-Dimensional Human Pose Estimation.
Sensors (Basel). 2024 Jul 8;24(13):4422. doi: 10.3390/s24134422.
2. HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation.
Sensors (Basel). 2024 Jan 26;24(3):829. doi: 10.3390/s24030829.
3. An Improved Mixture Density Network for 3D Human Pose Estimation with Ordinal Ranking.
Sensors (Basel). 2022 Jul 1;22(13):4987. doi: 10.3390/s22134987.
4. Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video.
Sensors (Basel). 2022 Mar 28;22(7):2573. doi: 10.3390/s22072573.
5. Deep High-Resolution Representation Learning for Visual Recognition.
IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3349-3364. doi: 10.1109/TPAMI.2020.2983686. Epub 2021 Sep 2.
6. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields.
IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):172-186. doi: 10.1109/TPAMI.2019.2929257. Epub 2020 Dec 4.
7. View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition.
IEEE Trans Pattern Anal Mach Intell. 2019 Aug;41(8):1963-1978. doi: 10.1109/TPAMI.2019.2896631. Epub 2019 Jan 31.
8. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments.
IEEE Trans Pattern Anal Mach Intell. 2014 Jul;36(7):1325-39. doi: 10.1109/TPAMI.2013.248.