Pan Huimin, Cai Yuting, Yang Jiayi, Niu Shaojia, Gao Quanli, Wang Xihan
School of Computer Science, Xi'an Polytechnic University, Xi'an 710600, China.
Sensors (Basel). 2024 Dec 27;25(1):88. doi: 10.3390/s25010088.
Interacting-hand reconstruction presents significant opportunities in various applications. However, it currently faces challenges such as the difficulty of distinguishing the features of the two hands, misalignment of hand meshes with the input image, and modeling the complex spatial relationships between interacting hands. In this paper, we propose a multilevel feature fusion interactive network for hand reconstruction (HandFI). Within this network, the hand feature separation module uses attention mechanisms and positional encoding to distinguish left-hand from right-hand features while preserving their spatial relationships. The hand fusion and attention module promotes the alignment of hand vertices with the image by integrating multi-scale hand features, and it introduces cross-attention to help resolve the complex spatial relationships between interacting hands, thereby improving the accuracy of two-hand reconstruction. We evaluated our method against existing approaches on the InterHand2.6M, RGB2Hands, and EgoHands datasets. Extensive experimental results demonstrate that our method outperforms other representative methods, achieving a mean per-joint position error (MPJPE) of 9.38 mm and a mean per-vertex position error (MPVPE) of 9.61 mm. Additionally, results obtained in real-world scenes further validate the generalization capability of our method.
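To make the cross-attention idea in the abstract concrete, the following is a minimal sketch (not the authors' code) of how features from one hand can attend to the other hand's features to capture inter-hand spatial relationships. The module name, feature dimension, token count, and residual layout are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of cross-attention between left- and right-hand
# feature tokens; names, shapes, and hyperparameters are assumptions.
import torch
import torch.nn as nn

class CrossHandAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Queries come from one hand, keys/values from the other,
        # so each hand's tokens attend to the opposite hand.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, this_hand: torch.Tensor, other_hand: torch.Tensor) -> torch.Tensor:
        # this_hand, other_hand: (batch, tokens, dim) feature sequences.
        fused, _ = self.attn(query=this_hand, key=other_hand, value=other_hand)
        return self.norm(this_hand + fused)  # residual connection

# Usage: fuse each hand's features with context from the other hand.
left = torch.randn(2, 49, 256)    # e.g., 7x7 feature-map tokens per hand
right = torch.randn(2, 49, 256)
cross = CrossHandAttention()
left_fused = cross(left, right)   # left queries attend to right features
right_fused = cross(right, left)  # and vice versa
print(left_fused.shape)           # torch.Size([2, 49, 256])
```

Applying the same module symmetrically in both directions, as shown in the usage lines, is one plausible way each hand's representation can be conditioned on the other before mesh regression.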