Xia Guiyu, Ma Furong, Liu Qingshan, Zhang Du
IEEE Trans Cybern. 2023 Apr;53(4):2412-2425. doi: 10.1109/TCYB.2021.3120010. Epub 2023 Mar 16.
A realistic 2-D motion can be treated as the deformation of an individual's appearance texture driven by a sequence of human poses. In this article, we therefore propose to transform 2-D motion synthesis into a pose-conditioned realistic motion image generation task, considering the promising performance of pose estimation technology and generative adversarial networks (GANs). The problem, however, is that GANs are only suited to region-aligned image translation tasks, whereas motion synthesis involves a large number of spatial deformations. To overcome this drawback, we design a two-step, multistream network architecture. First, in step-I, we train a special GAN to generate body segment images from given poses. Then, in step-II, we feed the body segment images together with the poses into the multistream network, so that it only needs to generate the texture within each aligned body region. In addition, we provide a real face as another input to the network to improve the facial details of the generated motion image. Synthesized results with realism and sharp details on four training sets demonstrate the effectiveness of the proposed model.
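The following is a minimal PyTorch sketch of the two-step pipeline outlined above: a step-I generator maps pose heatmaps to body segment images, and a step-II multistream generator fuses the segments, the poses, and a real face crop to produce the final frame. All module names, channel counts, and the fusion strategy are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the two-step, multistream architecture described in the abstract.
# Channel sizes and fusion choices are assumptions for illustration only.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Plain conv -> norm -> ReLU block shared by both generators."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SegmentGenerator(nn.Module):
    """Step-I: generate body segment images from pose keypoint heatmaps."""

    def __init__(self, pose_ch=18, seg_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(pose_ch, 64),
            conv_block(64, 64),
            nn.Conv2d(64, seg_ch, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, pose):
        return self.net(pose)


class MultiStreamGenerator(nn.Module):
    """Step-II: one stream per input (segments, pose, real face), fused by concatenation."""

    def __init__(self, pose_ch=18, seg_ch=3, face_ch=3, out_ch=3):
        super().__init__()
        self.seg_stream = conv_block(seg_ch, 64)
        self.pose_stream = conv_block(pose_ch, 64)
        self.face_stream = conv_block(face_ch, 64)
        self.fuse = nn.Sequential(
            conv_block(64 * 3, 64),
            nn.Conv2d(64, out_ch, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, segments, pose, face):
        feats = torch.cat(
            [self.seg_stream(segments), self.pose_stream(pose), self.face_stream(face)],
            dim=1,
        )
        return self.fuse(feats)


if __name__ == "__main__":
    pose = torch.randn(1, 18, 256, 256)   # pose keypoint heatmaps
    face = torch.randn(1, 3, 256, 256)    # real face crop, resized to frame size
    step1 = SegmentGenerator()
    step2 = MultiStreamGenerator()
    segments = step1(pose)                # step-I: poses -> body segment images
    frame = step2(segments, pose, face)   # step-II: texture generation in aligned regions
    print(frame.shape)                    # torch.Size([1, 3, 256, 256])
```

Because the step-II streams consume inputs that are already spatially aligned with the target frame, each stream only has to fill in texture rather than model large spatial deformations, which is the motivation for the two-step split.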