Zhang Jiajun, Zhang Yuxiang, An Liang, Li Mengcheng, Zhang Hongwen, Hu Zonghai, Liu Yebin
IEEE Trans Pattern Anal Mach Intell. 2025 Jul 11;PP. doi: 10.1109/TPAMI.2025.3588302.
Dynamic and dexterous manipulation of objects presents a complex challenge, requiring the synchronization of hand motions with object trajectories to achieve seamless and physically plausible interactions. In this work, we introduce ManiDext, a unified hierarchical diffusion-based framework for generating hand manipulation and grasp poses from 3D object trajectories. Our key insight is that accurately modeling the contact correspondences between objects and hands during interaction is crucial. We therefore propose a continuous correspondence embedding representation that specifies detailed vertex-level correspondences between the object and the hand. This embedding is optimized directly on the hand mesh in a self-supervised manner, with the distance between embeddings reflecting geodesic distance on the mesh. In the first stage, our framework generates contact maps and correspondence embeddings on the object's surface. In the second stage, which generates hand poses from these fine-grained correspondences, we introduce a novel approach that integrates iterative refinement into the diffusion process. At each denoising step, we feed the current hand pose residual into the network as a refinement target, guiding it to correct inaccurate hand poses. Introducing residuals at each denoising step aligns naturally with traditional optimization processes, effectively merging generation and refinement into a single unified framework. Extensive experiments demonstrate that our approach generates physically plausible and highly realistic motions for a variety of tasks, including single-hand and bimanual grasping as well as manipulation of both rigid and articulated objects.
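To make the correspondence embedding idea concrete, the following is a minimal sketch of one way to fit per-vertex embeddings whose pairwise Euclidean distances approximate geodesic distances. It is not the ManiDext implementation: the abstract does not specify the self-supervised objective, so plain stress minimization by gradient descent is swapped in here. The 8-vertex ring graph is a toy stand-in for a hand mesh, and all names (`geodesic_distances`, `fit_embedding`, `stress`) and hyperparameters are hypothetical.

```python
import numpy as np


def geodesic_distances(num_vertices, edges):
    """All-pairs shortest-path lengths along mesh edges (Floyd-Warshall)."""
    dist = np.full((num_vertices, num_vertices), np.inf)
    np.fill_diagonal(dist, 0.0)
    for i, j, length in edges:
        dist[i, j] = dist[j, i] = min(dist[i, j], length)
    for k in range(num_vertices):
        # Relax every pair (i, j) through intermediate vertex k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


def stress(emb, target):
    """Sum over unordered vertex pairs of (embedding distance - target)^2."""
    diff = emb[:, None, :] - emb[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    return float(np.sum((d - target) ** 2) / 2.0)  # each pair appears twice


def fit_embedding(target, dim=3, steps=500, lr=0.01, seed=0):
    """Gradient descent on the stress; assumed objective, not the paper's."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(target.shape[0], dim))
    for _ in range(steps):
        diff = emb[:, None, :] - emb[None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        # (d - target) / d scales each unit direction by its distance error;
        # the clamp avoids division by zero on the diagonal.
        coeff = (d - target) / np.maximum(d, 1e-9)
        grad = 2.0 * np.sum(coeff[:, :, None] * diff, axis=1)
        emb -= lr * grad
    return emb


# Toy "mesh": 8 vertices on a ring with unit-length edges (an assumption).
ring_edges = [(i, (i + 1) % 8, 1.0) for i in range(8)]
target = geodesic_distances(8, ring_edges)
emb = fit_embedding(target)

print("stress before:", stress(np.zeros((8, 3)), target))
print("stress after: ", stress(emb, target))
```

Under these assumptions, vertices that are close along the surface end up close in embedding space, so a value predicted on the object's surface can be matched to a hand vertex by nearest-neighbor lookup in the embedding. A curved surface cannot in general be embedded isometrically in a low-dimensional Euclidean space, so the fitted distances only approximate the geodesic ones.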