IEEE Trans Image Process. 2021;30:9259-9269. doi: 10.1109/TIP.2021.3123549. Epub 2021 Nov 12.
Transferring human motion from a source to a target person holds great potential for computer vision and graphics applications. A crucial step is to manipulate sequential future motion while retaining the target's appearance characteristics. Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person, which is not scalable in practice. This work studies a more general setting, in which we aim to learn a single model, named the Collaborative Parsing-Flow Network (CPF-Net), that transfers motion from a source video to any target person given only one image of that person. The paucity of information about the target person makes it particularly challenging to faithfully preserve the appearance across varying designated poses. To address this issue, CPF-Net integrates structured human parsing and appearance flow to guide realistic foreground synthesis, which is then merged into the background by a spatio-temporal fusion module. In particular, CPF-Net decouples the problem into three stages: human parsing sequence generation, foreground sequence generation, and final video generation. The human parsing generation stage captures both the pose and the body structure of the target, while the appearance flow helps preserve fine details in the synthesized frames; their integration effectively guides the generation of video frames with realistic appearance. Finally, a dedicatedly designed fusion network ensures temporal coherence. We further collect a large set of human dancing videos to push this research field forward. Both quantitative and qualitative results show that our method substantially improves over previous approaches and generates appealing, photo-realistic target videos given any input person image. All source code and the dataset will be released at https://github.com/xiezhy6/CPF-Net.
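The appearance-flow idea summarized above — preserving texture detail by sampling source-image pixels through a dense per-pixel offset field rather than synthesizing them from scratch — can be sketched as follows. This is a minimal NumPy illustration with nearest-neighbour sampling, not the paper's actual implementation; the function name and flow convention are assumptions for demonstration only.

```python
import numpy as np

def warp_with_appearance_flow(src, flow):
    """Warp a source image toward a target pose via an appearance-flow field.

    src:  (H, W, C) source image.
    flow: (H, W, 2) per-pixel offsets (dy, dx); each target-frame pixel
          copies the source pixel at its own coordinates plus the offset.
    Nearest-neighbour sampling with border clamping; a real model would
    use differentiable bilinear sampling instead.
    """
    h, w, _ = src.shape
    ys, xs = np.mgrid[0:h, 0:w]                      # target pixel grid
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return src[sy, sx]                               # gather source pixels

# A zero flow field reproduces the source image unchanged, which is the
# sanity check that texture is copied rather than regenerated.
src = np.arange(6, dtype=np.float32).reshape(2, 3, 1)
assert np.array_equal(warp_with_appearance_flow(src, np.zeros((2, 3, 2))), src)
```

In the full model such a warp would be predicted per frame and combined with the parsing-guided foreground synthesis before the spatio-temporal fusion step merges it into the background.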