Zhang Mingyuan, Cai Zhongang, Pan Liang, Hong Fangzhou, Guo Xinying, Yang Lei, Liu Ziwei
IEEE Trans Pattern Anal Mach Intell. 2024 Jun;46(6):4115-4128. doi: 10.1109/TPAMI.2024.3355414. Epub 2024 May 7.
Human motion modeling is important for many modern graphics applications, but it typically requires professional skills. To remove this skill barrier for non-experts, recent motion generation methods can directly generate human motions conditioned on natural language. However, it remains challenging to achieve diverse and fine-grained motion generation from varied text inputs. To address this problem, we propose MotionDiffuse, one of the first diffusion model-based frameworks for text-driven motion generation, which demonstrates several desirable properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation.
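The "Probabilistic Mapping" property in the abstract can be illustrated with a minimal sketch of generic DDPM-style ancestral sampling. This is not the authors' implementation: the linear noise schedule, the step count, the function names (`linear_beta_schedule`, `toy_denoiser`, `sample_motion`), and the placeholder denoiser standing in for a trained text-conditioned network are all assumptions made for illustration. The sketch only shows how fresh noise injected at each denoising step lets one text prompt yield different motion sequences.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the paper's schedule may differ."""
    return np.linspace(beta_start, beta_end, num_steps)


def toy_denoiser(x_t, t, text_embedding):
    """Hypothetical stand-in for a trained noise-prediction network.

    A real model would predict the noise in x_t given the step t and the
    text condition. This placeholder only returns an array of the right shape.
    """
    return 0.1 * (x_t - text_embedding.mean())


def sample_motion(denoiser, text_embedding, num_frames, pose_dim,
                  num_steps=50, seed=None):
    """Generic DDPM ancestral sampling over a (num_frames, pose_dim) array."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise.
    x = rng.standard_normal((num_frames, pose_dim))
    for t in reversed(range(num_steps)):
        eps_hat = denoiser(x, t, text_embedding)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Inject fresh noise: this is where sample-to-sample variation comes from.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


if __name__ == "__main__":
    prompt_embedding = np.ones(8)  # placeholder for an encoded text prompt
    first = sample_motion(toy_denoiser, prompt_embedding, 16, 6, seed=0)
    second = sample_motion(toy_denoiser, prompt_embedding, 16, 6, seed=1)
    print(first.shape, np.abs(first - second).max() > 0)
```

Running the script with two different seeds and the same placeholder embedding produces two distinct arrays of identical shape, which mirrors the one-to-many text-to-motion mapping described above. With a deterministic mapping, the two outputs would coincide.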