
Boosting Monocular 3D Human Pose Estimation With Part Aware Attention.

Publication Information

IEEE Trans Image Process. 2022;31:4278-4291. doi: 10.1109/TIP.2022.3182269. Epub 2022 Jun 29.

Abstract

Monocular 3D human pose estimation is challenging due to depth ambiguity. Convolution-based and graph-convolution-based methods have been developed to extract 3D information from temporal cues in motion videos. Among lifting-based methods, most recent works adopt the transformer to model the temporal relationship of 2D keypoint sequences. These previous works usually treat all the joints of a skeleton as a whole and then calculate temporal attention based on the overall characteristics of the skeleton. However, the human skeleton exhibits obvious part-wise inconsistency in its motion patterns, so it is more appropriate to consider each part's temporal behavior separately. To deal with such part-wise motion inconsistency, we propose the Part Aware Temporal Attention module, which extracts the temporal dependency of each part separately. Moreover, the conventional attention mechanism in 3D pose estimation usually calculates attention within a short time interval, meaning that only correlations within the local temporal context are considered. In contrast, we find that the part-wise structure of the human skeleton repeats across different periods, actions, and even subjects. Part-wise correlations at a distance can therefore be exploited to further boost 3D pose estimation. We thus propose the Part Aware Dictionary Attention module, which calculates attention between the part-wise features of the input and a dictionary containing multiple 3D skeletons sampled from the training set. Extensive experimental results show that the proposed part-aware attention mechanism helps a transformer-based model achieve state-of-the-art 3D pose estimation performance on two widely used public datasets. The code and trained models are released at https://github.com/thuxyz19/3D-HPE-PAA.
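
To make the part-aware temporal attention idea concrete, below is a minimal PyTorch sketch of per-part temporal self-attention. It is an illustration under stated assumptions, not the authors' released implementation (see the linked repository for that): the grouping of joints into five hypothetical body parts, the mean-pooling of each part's joints into a single per-frame token, and the single shared attention layer are all simplifications introduced here.

```python
# Illustrative sketch only: per-part temporal self-attention over a 2D keypoint
# sequence. The part grouping, pooling, and shared attention layer are
# assumptions made for this example, not the paper's exact architecture.
import torch
import torch.nn as nn

class PartAwareTemporalAttentionSketch(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        # Hypothetical grouping of 17 joints into 5 body parts.
        self.parts = [
            [0, 7, 8, 9, 10],   # torso + head
            [11, 12, 13],       # left arm
            [14, 15, 16],       # right arm
            [1, 2, 3],          # right leg
            [4, 5, 6],          # left leg
        ]
        self.embed = nn.Linear(2, dim)  # lift each 2D keypoint to a feature
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, 17, 2) 2D keypoint sequence
        feat = self.embed(x)  # (B, T, 17, dim)
        outputs = []
        for joint_ids in self.parts:
            # Pool the joints of one part into a single per-frame part token.
            part_tok = feat[:, :, joint_ids, :].mean(dim=2)   # (B, T, dim)
            # Temporal self-attention computed for this part only, so its
            # motion pattern is not mixed with the other parts'.
            out, _ = self.attn(part_tok, part_tok, part_tok)  # (B, T, dim)
            outputs.append(out)
        # Per-part temporal features: (B, T, num_parts, dim)
        return torch.stack(outputs, dim=2)

if __name__ == "__main__":
    model = PartAwareTemporalAttentionSketch()
    dummy = torch.randn(2, 27, 17, 2)  # batch of two 27-frame pose sequences
    print(model(dummy).shape)          # torch.Size([2, 27, 5, 64])
```

In the same spirit, the Part Aware Dictionary Attention module described in the abstract would cross-attend such per-part tokens against part features from 3D skeletons sampled from the training set, rather than against the input sequence itself.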

