IEEE Trans Image Process. 2023;32:3080-3091. doi: 10.1109/TIP.2023.3275535. Epub 2023 May 30.
In 3D face reconstruction, orthographic projection has been widely used in place of perspective projection to simplify the fitting process. This approximation works well when the camera is sufficiently far from the face. However, in scenarios where the face is very close to the camera or moving along the camera axis, such methods suffer from inaccurate reconstruction and unstable temporal fitting because of the distortion introduced by perspective projection. In this paper, we address the problem of single-image 3D face reconstruction under perspective projection. Specifically, we propose a deep neural network, the Perspective Network (PerspNet), which simultaneously reconstructs the 3D face shape in canonical space and learns the correspondence between 2D pixels and 3D points; from this correspondence, the 6DoF (six degrees of freedom) face pose can be estimated to represent the perspective projection. In addition, we contribute the large-scale ARKitFace dataset to enable the training and evaluation of 3D face reconstruction solutions under perspective projection; it contains 902,724 2D facial images with ground-truth 3D face meshes and annotated 6DoF pose parameters. Experimental results show that our approach outperforms current state-of-the-art methods by a significant margin. The code and data are available at https://github.com/cbsropenproject/6dof_face.
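The claim that the orthographic approximation degrades as the camera approaches the face can be illustrated numerically. The following minimal sketch (illustrative values only; the point coordinates, focal length, and distances are assumptions, not taken from the paper or dataset) compares a facial point's perspective projection with its scaled-orthographic approximation at two camera distances:

```python
# Illustrative sketch (not from the paper): why a scaled-orthographic
# approximation breaks down when the face is close to the camera.
# Under perspective projection a point's image coordinate depends on its
# own depth, while the orthographic approximation applies one shared
# scale to the whole face. All numbers below are hypothetical.

F = 1.0  # focal length in arbitrary units (assumed value)

def perspective_x(x, z, face_distance):
    """Perspective projection: x' = f * x / (d + z)."""
    return F * x / (face_distance + z)

def orthographic_x(x, z, face_distance):
    """Scaled-orthographic approximation: x' = (f / d) * x, ignoring z."""
    return F * x / face_distance

# A cheek point 3 cm off the optical axis and 5 cm deeper than the
# reference face plane (units: metres).
x, z = 0.03, 0.05

for d in (2.0, 0.2):  # camera far from vs. close to the face
    err = abs(perspective_x(x, z, d) - orthographic_x(x, z, d))
    print(f"face distance {d} m: approximation error {err:.5f}")
```

With these toy numbers the approximation error grows by roughly two orders of magnitude as the camera moves from 2 m to 0.2 m, which is exactly the close-range regime (e.g. selfie-style capture) where an explicit perspective model with a 6DoF pose becomes necessary.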