Wang Nanyang, Zhang Yinda, Li Zhuwen, Fu Yanwei, Yu Hang, Liu Wei, Xue Xiangyang, Jiang Yu-Gang
IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3600-3613. doi: 10.1109/TPAMI.2020.2984232. Epub 2021 Sep 2.
In this paper, we propose an end-to-end deep learning architecture that generates 3D triangular meshes from single color images. Restricted by the nature of prevalent deep learning techniques, the majority of previous works represent 3D shapes as volumes or point clouds. However, it is non-trivial to convert these representations into compact, ready-to-use mesh models. Unlike existing methods, our network represents 3D shapes as meshes, which are essentially graphs and are well suited to graph-based convolutional neural networks. Leveraging perceptual features extracted from the input image, our network produces the correct geometry by progressively deforming an ellipsoid. To keep the whole deformation procedure stable, we adopt a coarse-to-fine strategy and define several mesh- and surface-related losses that capture different properties of the shape, which helps produce visually appealing and physically accurate 3D geometry. In addition, our model can by nature be adapted to objects in specific domains, e.g., human faces, and can easily be extended to learn per-vertex properties, e.g., color. Extensive experiments show that our method not only produces mesh models with qualitatively better details, but also achieves higher 3D shape estimation accuracy than state-of-the-art methods.
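The core idea in the abstract, treating a mesh as a graph and deforming an initial shape with graph convolutions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The octahedron used as a coarse stand-in for the ellipsoid, the mean-over-neighbors form of the graph convolution, the layer sizes, the random weights, and the random placeholder for the per-vertex image features are all assumptions made for illustration, as are the function names `octahedron`, `adjacency`, `graph_conv`, and `deform_step`.

```python
import numpy as np

def octahedron(radii=(1.0, 0.5, 0.5)):
    """Coarse stand-in for an ellipsoid: an octahedron scaled by per-axis radii."""
    verts = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                      [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    verts = verts * np.asarray(radii, dtype=float)
    faces = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                      [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])
    return verts, faces

def adjacency(num_verts, faces):
    """Symmetric 0/1 vertex adjacency matrix built from the triangle edges."""
    adj = np.zeros((num_verts, num_verts))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    return adj

def graph_conv(features, adj, w_self, w_neigh):
    """One layer: each vertex mixes its own features with the mean of its neighbors'."""
    degree = adj.sum(axis=1, keepdims=True)
    neighbor_mean = adj @ features / np.maximum(degree, 1.0)
    return features @ w_self + neighbor_mean @ w_neigh

def deform_step(verts, image_feats, adj, params):
    """Predict a 3D offset per vertex from position + image features, then apply it."""
    x = np.concatenate([verts, image_feats], axis=1)
    hidden = np.tanh(graph_conv(x, adj, params["w_self1"], params["w_neigh1"]))
    offsets = graph_conv(hidden, adj, params["w_self2"], params["w_neigh2"])
    return verts + offsets

# Usage: one deformation step with untrained (random) weights.
rng = np.random.default_rng(0)
verts, faces = octahedron()
adj = adjacency(len(verts), faces)

feat_dim, hidden_dim = 8, 16  # arbitrary sizes chosen for the sketch
image_feats = rng.normal(size=(len(verts), feat_dim))  # placeholder for pooled image features
params = {
    "w_self1": rng.normal(scale=0.1, size=(3 + feat_dim, hidden_dim)),
    "w_neigh1": rng.normal(scale=0.1, size=(3 + feat_dim, hidden_dim)),
    "w_self2": rng.normal(scale=0.1, size=(hidden_dim, 3)),
    "w_neigh2": rng.normal(scale=0.1, size=(hidden_dim, 3)),
}
new_verts = deform_step(verts, image_feats, adj, params)
print(new_verts.shape)  # (6, 3)
```

In a trained system the weights would be learned against the mesh- and surface-related losses mentioned in the abstract, and a coarse-to-fine scheme would repeat such steps on progressively denser meshes. Neither is modeled here.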