School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China.
School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China.
Sensors (Basel). 2022 Sep 9;22(18):6821. doi: 10.3390/s22186821.
Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Regression-based approaches are prominent among 2D human pose estimation (HPE) methods and have been shown to achieve excellent results. However, ground-truth labels are often inherently ambiguous in challenging cases such as motion blur, occlusion, and truncation, which leads to unreliable performance measurement and lower accuracy. In this paper, we propose Cofopose, a two-stage approach to 2D human pose estimation consisting of a person detection transformer and a keypoint detection transformer. Cofopose combines conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and a transformer encoder-decoder, which together allow it to perform both person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and a fine-tuned conditional DETR for person detection, and a transformer encoder-decoder for keypoint detection. Cofopose was extensively evaluated on two benchmark datasets, MS COCO and MPII, and improved on existing state-of-the-art frameworks by significant margins.