
3D real-time human reconstruction with a single RGBD camera.

Authors

Lu Yang, Yu Han, Ni Wei, Song Liang

Affiliations

Academy of Engineering and Technology, Fudan University, Shanghai, China.

Shanghai Key Research Laboratory of NSAI, Shanghai, China.

Publication

Appl Intell (Dordr). 2023;53(8):8735-8745. doi: 10.1007/s10489-022-03969-4. Epub 2022 Aug 2.

DOI: 10.1007/s10489-022-03969-4
PMID: 35937202
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9343569/
Abstract

3D human reconstruction is an important technology connecting the real world and the virtual world, but most previous work requires expensive computing resources, making it difficult to apply in real-time scenarios. We propose a lightweight human body reconstruction system based on a parametric model, which employs only one RGBD camera as input. To generate a human model end to end, we build a fast and lightweight deep-learning network named Fast Body Net (FBN). The network pays more attention to the face and hands to enrich local details. Additionally, we train a denoising auto-encoder to reduce unreasonable states of the human model. Owing to the lack of human datasets based on RGBD images, we propose an Indoor-Human dataset to train the network, which contains a total of 2500 frames of action data from five actors collected with an Azure Kinect camera. Using depth images directly avoids extracting depth features from RGB, which keeps FBN lightweight and fast when reconstructing the parametric human model. Qualitative and quantitative analysis of the experimental results shows that our method improves efficiency by at least 57% with similar accuracy compared to state-of-the-art methods. Our study also demonstrates that consumer-grade RGBD cameras can support real-time display and interaction applications for virtual reality.
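To see why a depth-only parametric approach can stay lightweight, note that the network only has to regress a small parameter vector (e.g., per-joint rotations plus a few shape coefficients) rather than a dense mesh. The sketch below is purely illustrative and is not the paper's FBN: the layer sizes, input resolution, and the SMPL-style 72+10 parameter split are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class TinyDepthRegressor:
    """Illustrative stand-in for a lightweight depth-to-parameters network.
    Maps a flattened depth image to SMPL-style parameters:
    72 pose values (24 joints x 3 axis-angle) + 10 shape coefficients."""
    def __init__(self, in_dim=64 * 64, hidden=128, out_dim=72 + 10):
        # Random untrained weights; only the shapes matter for this sketch.
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.01
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, out_dim)) * 0.01
        self.b2 = np.zeros(out_dim)

    def __call__(self, depth):
        h = relu(depth.reshape(-1) @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

# Fake single depth frame in metres, standing in for an Azure Kinect capture.
depth_frame = rng.uniform(0.5, 4.0, size=(64, 64))
params = TinyDepthRegressor()(depth_frame)
pose, shape = params[:72], params[72:]
print(pose.shape, shape.shape)  # (72,) (10,)
```

The output vector is all a parametric body model needs to produce a full mesh, which is what makes single-camera real-time operation plausible on modest hardware.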

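The denoising auto-encoder mentioned in the abstract projects implausible parameter vectors back toward the space of realistic poses. The following toy version uses a single linear map fit by least squares on synthetic "plausible poses" — a deliberate simplification of the paper's learned auto-encoder; the 8-D parameter space and the sinusoidal pose manifold are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "plausible poses": samples near a 1-D manifold in 8-D space.
t = rng.uniform(-1.0, 1.0, size=(500, 1))
poses = np.hstack([np.sin(k * t) for k in range(1, 9)])  # shape (500, 8)

# Corrupt the poses, then fit a linear denoiser W minimizing ||noisy @ W - poses||.
noisy = poses + rng.normal(0.0, 0.3, size=poses.shape)
W, *_ = np.linalg.lstsq(noisy, poses, rcond=None)

denoised = noisy @ W
err_before = np.linalg.norm(noisy - poses)
err_after = np.linalg.norm(denoised - poses)
# W = identity is a feasible solution, so the fit can never do worse than it.
print(err_after <= err_before)  # True
```

A real system would replace the linear map with a trained encoder/decoder pair, but the principle is the same: reconstruction through a model of plausible poses suppresses "unreasonable states" of the human model.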

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/a9a8fa4e7e6e/10489_2022_3969_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/1bdbb7677cc0/10489_2022_3969_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/6922b1aa2fa5/10489_2022_3969_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/40ce4012e34c/10489_2022_3969_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/e52e1220a1fb/10489_2022_3969_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/b69042e95f3e/10489_2022_3969_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/ea2282c312f9/10489_2022_3969_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/b2f909d0ae37/10489_2022_3969_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0825/9343569/3db0f74f0717/10489_2022_3969_Fig9_HTML.jpg

Similar Articles

1. 3D real-time human reconstruction with a single RGBD camera.
Appl Intell (Dordr). 2023;53(8):8735-8745. doi: 10.1007/s10489-022-03969-4. Epub 2022 Aug 2.
2. 3D object detection through fog and occlusion: passive integral imaging vs active (LiDAR) sensing.
Opt Express. 2023 Jan 2;31(1):479-491. doi: 10.1364/OE.478125.
3. Indoor 3D Reconstruction of Buildings via Azure Kinect RGB-D Camera.
Sensors (Basel). 2022 Nov 27;22(23):9222. doi: 10.3390/s22239222.
4. Calibration of RGBD camera and cone-beam CT for 3D intra-operative mixed reality visualization.
Int J Comput Assist Radiol Surg. 2016 Jun;11(6):967-75. doi: 10.1007/s11548-016-1396-1. Epub 2016 Apr 8.
5. Three-D Wide Faces (3DWF): Facial Landmark Detection and 3D Reconstruction over a New RGB-D Multi-Camera Dataset.
Sensors (Basel). 2019 Mar 4;19(5):1103. doi: 10.3390/s19051103.
6. Real-Time 3D Reconstruction Method Based on Monocular Vision.
Sensors (Basel). 2021 Sep 2;21(17):5909. doi: 10.3390/s21175909.
7. A Range-Independent Disparity-Based Calibration Model for Structured Light Pattern-Based RGBD Sensor.
Sensors (Basel). 2020 Jan 23;20(3):639. doi: 10.3390/s20030639.
8. Real-Time Human Action Recognition with a Low-Cost RGB Camera and Mobile Robot Platform.
Sensors (Basel). 2020 May 19;20(10):2886. doi: 10.3390/s20102886.
9. Deep Attention Models for Human Tracking Using RGBD.
Sensors (Basel). 2019 Feb 13;19(4):750. doi: 10.3390/s19040750.
10. Exploring RGB+Depth Fusion for Real-Time Object Detection.
Sensors (Basel). 2019 Feb 19;19(4):866. doi: 10.3390/s19040866.

Cited By

1. Light-Adaptive Human Body Key Point Detection Algorithm Based on Multi-Source Information Fusion.
Sensors (Basel). 2024 May 10;24(10):3021. doi: 10.3390/s24103021.

References

1. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields.
IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):172-186. doi: 10.1109/TPAMI.2019.2929257. Epub 2020 Dec 4.
2. UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction Using Commercial RGBD Cameras.
IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2508-2522. doi: 10.1109/TPAMI.2019.2915229. Epub 2019 May 7.
3. Compressing VR: Fitting Large Virtual Environments within Limited Physical Space.
IEEE Comput Graph Appl. 2017;37(5):85-91. doi: 10.1109/MCG.2017.3621226.