IEEE Trans Image Process. 2022;31:2782-2795. doi: 10.1109/TIP.2022.3161081. Epub 2022 Apr 4.
Human detection and pose estimation are essential for understanding human activities in images and videos. Mainstream multi-human pose estimation methods take a top-down approach: human detection is performed first, and each detected person bounding box is then fed into a pose estimation network. This top-down approach suffers from early commitment to the initial detections; in crowded scenes and other cases with ambiguities or occlusions, flawed detections lead to pose estimation failures. We propose DetPoseNet, an end-to-end multi-human detection and pose estimation framework organized as a unified three-stage network. Our method consists of a coarse-pose proposal extraction sub-net, a coarse-pose-based proposal filtering module, and a multi-scale pose refinement sub-net. The coarse-pose proposal sub-net extracts whole-body bounding boxes and body keypoint proposals in a single shot. The coarse-pose filtering step, based on the person and keypoint proposals, effectively rules out unlikely detections and thereby improves subsequent processing. The pose refinement sub-net performs cascaded pose estimation on each refined proposal region. Multi-scale supervision and multi-scale regression are used in the pose refinement sub-net to strengthen context feature learning. A structure-aware loss and keypoint masking are applied to further improve the robustness of pose refinement. Our framework is flexible: most existing top-down pose estimators can serve as the pose refinement sub-net. Experiments on the COCO and OCHuman datasets demonstrate the effectiveness of the proposed framework. The proposed method is computationally efficient (5-6x speedup), estimating multi-person poses with refined bounding boxes in under a second.
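The abstract describes a three-stage pipeline (coarse proposal extraction, coarse-pose-based filtering, multi-scale refinement) but no interface. The sketch below illustrates one possible way such stages could be wired together; all class and function names (CoarseProposal, extract_coarse_proposals, filter_proposals, refine_pose), shapes, and thresholds are illustrative assumptions, with dummy stand-ins in place of the learned sub-nets, not the authors' implementation.

```python
"""Minimal structural sketch of a DetPoseNet-style three-stage pipeline.

Assumptions: names, shapes, and thresholds are placeholders; the learned
sub-nets are replaced by dummy stand-ins for illustration only.
"""
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class CoarseProposal:
    box: np.ndarray        # (4,) person box [x1, y1, x2, y2]
    box_score: float       # person detection confidence
    keypoints: np.ndarray  # (K, 3) coarse keypoints [x, y, confidence]


def extract_coarse_proposals(image: np.ndarray) -> List[CoarseProposal]:
    """Stage 1 (stand-in): single-shot whole-body boxes + coarse keypoints.

    A real implementation would run detection and keypoint heads over shared
    backbone features; here we return random dummy proposals.
    """
    h, w = image.shape[:2]
    rng = np.random.default_rng(0)
    proposals = []
    for _ in range(5):
        x1, y1 = rng.uniform(0, w / 2), rng.uniform(0, h / 2)
        box = np.array([x1, y1, x1 + w / 4, y1 + h / 4])
        kpts = np.column_stack([
            rng.uniform(x1, x1 + w / 4, size=17),   # x coordinates
            rng.uniform(y1, y1 + h / 4, size=17),   # y coordinates
            rng.uniform(0.0, 1.0, size=17),         # keypoint confidences
        ])
        proposals.append(CoarseProposal(box, float(rng.uniform(0, 1)), kpts))
    return proposals


def filter_proposals(proposals: List[CoarseProposal],
                     box_thresh: float = 0.3,
                     min_visible_kpts: int = 4,
                     kpt_conf_thresh: float = 0.2) -> List[CoarseProposal]:
    """Stage 2: coarse-pose-based filtering.

    Keep a detection only if its person score is reasonable AND enough coarse
    keypoints are confidently detected, ruling out unlikely boxes early.
    """
    kept = []
    for p in proposals:
        visible = int((p.keypoints[:, 2] > kpt_conf_thresh).sum())
        if p.box_score >= box_thresh and visible >= min_visible_kpts:
            kept.append(p)
    return kept


def refine_pose(image: np.ndarray, proposal: CoarseProposal) -> np.ndarray:
    """Stage 3 (stand-in): cascaded multi-scale refinement on the proposal
    region. Here the coarse keypoints are simply passed through."""
    return proposal.keypoints


if __name__ == "__main__":
    image = np.zeros((480, 640, 3), dtype=np.uint8)
    coarse = extract_coarse_proposals(image)
    kept = filter_proposals(coarse)
    poses = [refine_pose(image, p) for p in kept]
    print(f"{len(coarse)} coarse proposals -> {len(kept)} kept -> "
          f"{len(poses)} refined poses")
```

In this sketch, the filtering stage is the only piece with real logic: it mirrors the abstract's idea of using person and keypoint proposals jointly to discard unlikely detections before the more expensive refinement stage, which is also how the framework can accept an existing top-down pose estimator as the Stage 3 component.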